TLDR AI 2024-05-15

Anthropic in Europe 🤖, Mamba and vision tasks 🐍, Image Deraining 🌦️

Headlines & Launches

Google I/O (9 minute read)

Google announced many new features at I/O 2024, including Gemini Flash, Veo video generation, Imagen 3, and its newest assistant, Project Astra. In all, the announcements include an impressive number of improvements, such as a 2M-token context length, dramatically cheaper models, and improved multimodality.

Anthropic Is Expanding To Europe And Raising More Money (2 minute read)

Anthropic has expanded its AI assistant, Claude, to Europe. Claude supports multiple languages, and Anthropic is offering it through its website, its iOS app, and its business plans for teams. The company is also beginning the process of raising more money.

Research & Innovation

Mamba's Suitability for Vision Tasks (20 minute read)

Researchers investigated whether the Mamba architecture, typically used for tasks with long-sequence and autoregressive characteristics, suits vision tasks. They found that while Mamba is not effective for image classification, it shows promise for detection and segmentation, which do have long-sequence characteristics.

A New State-Free, Sequence-Parallel Inference Algorithm (16 minute read)

Researchers have developed a new state-space model for deep learning based on a dual, transfer-function representation. It features a state-free, sequence-parallel inference algorithm whose memory and compute costs do not grow with the size of the model's hidden state.
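
Roughly, the idea is that a linear state-space layer is fully described by its transfer function, a ratio of two polynomials B(z)/A(z), so its convolution kernel can be obtained with FFTs instead of by unrolling a hidden state step by step. The NumPy sketch below illustrates that general idea for a single-input, single-output filter and checks it against the sequential recurrence. It is not the paper's code; the function names and the coefficients b and a are hypothetical, chosen only for illustration.

```python
import numpy as np

def kernel_from_transfer_function(b, a, L):
    """First L taps of the impulse response of H(z) = B(z) / A(z), computed
    without any recurrent state: evaluate H at the L-th roots of unity via FFT,
    then invert. The result is the time-aliased kernel, which closely matches
    the true taps when the response decays well within L steps."""
    B = np.fft.fft(b, n=L)  # numerator polynomial evaluated at roots of unity
    A = np.fft.fft(a, n=L)  # denominator polynomial (must be nonzero there)
    return np.real(np.fft.ifft(B / A))

def causal_conv_fft(u, h):
    """Apply kernel h to the whole sequence u at once (sequence-parallel)."""
    L = len(u)
    n = 2 * L  # zero-pad so circular convolution equals causal linear convolution
    return np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(h, n), n)[:L]

def run_recurrence(b, a, x):
    """Reference: step the difference equation one element at a time."""
    y = np.zeros(len(x))
    for t in range(len(x)):
        acc = sum(b[k] * x[t - k] for k in range(len(b)) if t - k >= 0)
        acc -= sum(a[k] * y[t - k] for k in range(1, len(a)) if t - k >= 0)
        y[t] = acc / a[0]
    return y

b, a, L = [1.0, 0.5], [1.0, -0.6], 64  # hypothetical stable filter
u = np.random.default_rng(0).standard_normal(L)
y_parallel = causal_conv_fft(u, kernel_from_transfer_function(b, a, L))
y_sequential = run_recurrence(b, a, u)
print(np.allclose(y_parallel, y_sequential))
```

The FFT path never materializes a state vector; its cost depends on the sequence length and the polynomial degree rather than on a hidden state, which is the property the paper's title refers to.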
Engineering & Resources

Image Deraining (GitHub Repo)

ESDNet is a spiking neural network (SNN) designed for image deraining. It exploits the distinctive pixel values of rain streaks to strengthen spike signals.
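
For readers new to SNNs, the sketch below shows why pixel intensity can translate into spike intensity: a generic leaky integrate-and-fire (LIF) neuron fires more often when driven by a stronger input. This is a textbook LIF model with hypothetical parameters (steps, tau, v_th), not ESDNet's actual neuron or encoding scheme.

```python
import numpy as np

def lif_spike_counts(inputs, steps=20, tau=2.0, v_th=1.0):
    """Count spikes from leaky integrate-and-fire neurons driven by constant inputs.

    inputs: input intensities (e.g., normalized pixel values), one neuron each.
    """
    x = np.asarray(inputs, dtype=float)
    v = np.zeros_like(x)                  # membrane potentials
    counts = np.zeros(x.shape, dtype=int)
    for _ in range(steps):
        v = v + (x - v) / tau             # leaky integration toward the input
        fired = v >= v_th                 # spike where the threshold is crossed
        counts += fired
        v = np.where(fired, 0.0, v)       # hard reset after a spike
    return counts

print(lif_spike_counts([0.2, 0.9, 1.5, 3.0]))
```

Inputs below the threshold never fire, while stronger inputs fire more often, so brighter pixels produce denser spike trains.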

Ollama on Google Firebase (6 minute read)

Genkit is a new Firebase toolkit for building and deploying generative AI products. It can be used to launch servers for open-source language models, for example through Ollama.

Fine-Tune PaliGemma (Colab Notebook)

Google released or teased several open-source models at its launch today. One model that has actually been released, PaliGemma, is a vision-language model based on SigLIP. It is extremely easy to fine-tune and extend to a variety of tasks, and this Colab notebook shows how to do so with clean, readable code.

What OpenAI Did (6 minute read)

GPT-4o's multimodal abilities, which integrate vision and voice, promise significant advances in how AI interacts with the world and could make AI a more ubiquitous presence in daily life.

Quick Links

Gemini Flash (Website)

Gemini Flash is a new lightweight model from Google that features multimodal reasoning and a long context window of up to one million tokens.

Veo (Website)

Veo is a new video generation model from Google DeepMind that can generate 1080p videos longer than a minute.

xAI Nears $10B Deal To Rent Oracle's Servers (1 minute read)

Elon Musk's AI startup xAI is negotiating a potential $10 billion deal to rent cloud servers from Oracle, which would make it one of Oracle's largest customers and help it compete with the AI offerings of OpenAI and Google.