TLDR AI 2024-04-10

New Google models 🌐, Llama 3 coming soon 👀, Visual Autoregressive Models 🖼️

🚀

Headlines & Launches

Google Gemma Expanded Models (7 minute read)

Google has trained two new families of Gemma models: CodeGemma and RecurrentGemma. They achieve competitive performance, and the code models include fill-in-the-middle (FIM) capabilities. The recurrent model is much faster and more memory-efficient.
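Fill-in-the-middle (FIM) means the model fills a gap using the code on both sides of it, rather than only continuing from the left. Below is a minimal sketch of assembling such a prompt in prefix-suffix-middle order. The sentinel strings are an assumption modeled on CodeGemma's documented format (check the model card before relying on them), and `build_fim_prompt` and `splice` are hypothetical helpers, not part of any Gemma library.

```python
# Assumed sentinel strings, modeled on CodeGemma's published FIM format.
# Verify them against the model card before use.
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: arrange code in prefix-suffix-middle (PSM) order.

    The model reads the code before and after the gap, then generates the
    missing middle after the FIM_MIDDLE sentinel.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice(prefix: str, middle: str, suffix: str) -> str:
    """Hypothetical helper: put the generated middle back between prefix and suffix."""
    return prefix + middle + suffix


before = "def area(r):\n    return "
after = "\n\nprint(area(2.0))\n"
prompt = build_fim_prompt(before, after)
print(prompt)

# A model's completion for the gap might be "3.14159 * r * r" (made-up example).
print(splice(before, "3.14159 * r * r", after))
```

Placing the suffix before the generation point lets an ordinary left-to-right decoder condition on code that appears after the gap.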

Meta Confirms That Llama 3 Is Coming Next Month (2 minute read)

Meta has confirmed plans to release Llama 3, the next generation of its large language model for generative AI assistants, within the next month.

Four Takeaways on the Race to Amass Data for A.I. (3 minute read)

The development of AI, particularly large language models like GPT-3, relies heavily on vast amounts of data, and companies like Meta and Google are racing to gather more because high-quality online data may run out by 2026. Tech giants are using controversial methods to fuel their AI advancements, including using YouTube data and considering buying publishers. 'Synthetic' data is a potential solution, though it risks amplifying AI errors.

🧠

Research & Innovation

Improving Single-Domain Generalization in Object Detection (11 minute read)

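OA-DG (object-aware domain generalization) tackles single-domain generalization (S-DG) in object detection, where a detector trained on a single source domain must perform well on unseen target domains. It introduces OA-Mix for data augmentation and OA-Loss for training.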

Improving Stable Diffusion with Unified Feedback (17 minute read)

UniFL is a method that improves the output quality of diffusion models through a fairly complicated cascade of feedback steps, all aimed at improving the visual quality, aesthetics, and preference alignment of image generation models. The techniques are model-agnostic and can be applied to any image generation model.

Swap Anything (11 minute read)

SwapAnything is a new method that replaces objects in an image with concepts you choose while leaving the rest of the image unchanged. Unlike earlier tools, it can swap any object, not just the main subject, and it is particularly good at making the new object blend seamlessly into the original image. It combines inversion, concept vectors, and a pretrained diffusion model.

👨‍💻

Engineering & Resources

Chemistry Bench for Language Models (GitHub Repo)

A BIG-bench-compatible benchmark for evaluating how well large language models answer chemistry questions. It can help measure the scientific ability of different language models.
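BIG-bench tasks can be written as JSON files whose examples pair an `input` with `target_scores` for each answer option. Below is a hedged sketch of that layout and of a simple multiple-choice grader. The questions, the `multiple_choice_grade` helper, and the `first_option_model` stand-in are hypothetical and are not taken from the ChemBench repository.

```python
import json

# Hypothetical chemistry questions in a BIG-bench-style JSON task layout.
# "target_scores" marks the correct option with 1. These items are
# illustrative only, not drawn from the actual ChemBench dataset.
task = {
    "name": "toy_chemistry_mcq",
    "description": "Toy multiple-choice chemistry questions.",
    "keywords": ["chemistry", "multiple choice"],
    "metrics": ["multiple_choice_grade"],
    "examples": [
        {
            "input": "What is the chemical formula of water?",
            "target_scores": {"H2O": 1, "CO2": 0, "NaCl": 0},
        },
        {
            "input": "Which element has atomic number 6?",
            "target_scores": {"Oxygen": 0, "Carbon": 1, "Nitrogen": 0},
        },
    ],
}


def multiple_choice_grade(examples, answer_fn):
    """Fraction of examples where the chosen option has the top target score."""
    correct = 0
    for ex in examples:
        options = list(ex["target_scores"])
        choice = answer_fn(ex["input"], options)
        best = max(ex["target_scores"].values())
        correct += ex["target_scores"].get(choice, 0) == best
    return correct / len(examples)


def first_option_model(question, options):
    """Stand-in for a language model: always picks the first option."""
    return options[0]


# Round-trip through JSON, as a task file would be stored on disk.
loaded = json.loads(json.dumps(task))
print(multiple_choice_grade(loaded["examples"], first_option_model))  # 0.5
```

Swapping `first_option_model` for a function that queries a real model is the only change needed to score that model on the toy task.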

SqueezeAttention for LLMs (GitHub Repo)

SqueezeAttention is a new method that optimizes the key-value (KV) cache of large language models by allocating cache budget across layers according to their importance, reducing memory usage by 30% to 70% and roughly doubling throughput.
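The KV cache stores one key and one value vector per token, per KV head, per layer, so its size grows linearly with context length. The sketch below computes that size and, to illustrate giving layers different cache budgets, splits a token budget across layers in proportion to a per-layer importance score. This proportional split is a simplified stand-in, not SqueezeAttention's actual allocation algorithm; the model configuration, importance scores, and 50% budget are all hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, tokens_per_layer,
                   batch=1, bytes_per_elem=2):
    """KV cache size: keys + values (factor 2) for every cached token in every layer.

    tokens_per_layer is either one int (same length for all layers) or a list
    giving a separate cached-token budget for each layer.
    """
    if isinstance(tokens_per_layer, int):
        tokens_per_layer = [tokens_per_layer] * num_layers
    if len(tokens_per_layer) != num_layers:
        raise ValueError("need one token budget per layer")
    per_token = 2 * num_kv_heads * head_dim * bytes_per_elem * batch
    return per_token * sum(tokens_per_layer)


def allocate_budgets(importance, total_tokens):
    """Hypothetical layer-wise split: budget proportional to an importance score.

    Uses largest-remainder rounding so the budgets sum exactly to total_tokens.
    Does not cap a layer's budget at the sequence length.
    """
    total_imp = sum(importance)
    raw = [total_tokens * w / total_imp for w in importance]
    budgets = [int(r) for r in raw]
    leftover = total_tokens - sum(budgets)
    # Give the remaining tokens to the layers with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - budgets[i], reverse=True)
    for i in order[:leftover]:
        budgets[i] += 1
    return budgets


# Hypothetical 7B-class config: 32 layers, 32 KV heads, head_dim 128, fp16, 4096 tokens.
full = kv_cache_bytes(32, 32, 128, 4096)
# Illustrative only: keep half the cached tokens overall, favouring 16 "important" layers.
budgets = allocate_budgets([1.0] * 16 + [0.25] * 16, total_tokens=32 * 4096 // 2)
squeezed = kv_cache_bytes(32, 32, 128, budgets)
print(full / 2**30, squeezed / 2**30)  # 2.0 1.0  (GiB)
```

The 50% saving here follows directly from the chosen budget; it is not a measured result from the SqueezeAttention work.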

Visual Autoregressive Models (GitHub Repo)

Code for the recent “next-resolution prediction” work, which frames image generation as progressively predicting the image at increasing resolutions. The repository includes inference scripts and a demo notebook; the training code will be released soon.
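The coarse-to-fine idea can be illustrated by decomposing a 2-D map into residual maps at increasing resolutions: each scale stores only what the coarser scales have not yet explained, and summing the upsampled maps rebuilds the original. This is a simplified sketch under stated assumptions. It uses average pooling and nearest-neighbour upsampling on a raw array, whereas VAR works on quantized tokens from a learned multi-scale tokenizer and uses a transformer to predict each scale from the coarser ones. The `[1, 2, 4, 8, 16]` scale schedule is hypothetical.

```python
import numpy as np


def downsample(x, size):
    """Average-pool a square (H, H) map to (size, size); size must divide H."""
    f = x.shape[0] // size
    return x.reshape(size, f, size, f).mean(axis=(1, 3))


def upsample(x, size):
    """Nearest-neighbour upsample an (s, s) map to (size, size)."""
    f = size // x.shape[0]
    return np.kron(x, np.ones((f, f)))


def multiscale_decompose(feature, scales):
    """Split a map into per-scale residual maps, ordered coarse to fine.

    Each scale captures what the coarser scales have not yet explained.
    """
    h = feature.shape[0]
    residual = feature.copy()
    maps = []
    for s in scales:
        m = downsample(residual, s)
        maps.append(m)
        residual = residual - upsample(m, h)
    return maps


def reconstruct(maps, size, upto):
    """Sum the first `upto` upsampled scale maps: the picture 'so far'."""
    out = np.zeros((size, size))
    for m in maps[:upto]:
        out += upsample(m, size)
    return out


rng = np.random.default_rng(0)
feature = rng.standard_normal((16, 16))
scales = [1, 2, 4, 8, 16]  # hypothetical schedule; each entry must divide 16
maps = multiscale_decompose(feature, scales)
for k, s in enumerate(scales, start=1):
    err = np.abs(reconstruct(maps, 16, k) - feature).mean()
    print(f"after {s:>2}x{s:<2} scale: mean abs error {err:.3f}")
```

In a VAR-style generator, a model would predict each of these maps in turn, so generation proceeds scale by scale rather than token by token.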

🎁

Miscellaneous

Intel Details Gaudi 3 At Vision 2024 (7 minute read)

Intel has announced its new Gaudi 3 AI accelerators, claiming up to 1.7x the training performance, 50% better inference performance, and 40% better efficiency than Nvidia's H100, at a lower cost.

Can Demis Hassabis Save Google? (12 minute read)

DeepMind co-founder Demis Hassabis, known for breakthroughs like AlphaGo and AlphaFold, now leads Google's unified AI research arm and is tasked with maintaining the tech giant's edge in AI. Despite DeepMind's research successes, Google still faces challenges in turning AI into tangible products and in competing with rivals such as OpenAI's ChatGPT. Hassabis must now help steer Google's product strategy to capitalize on DeepMind's research advances.

‘It’s very easy to steal someone’s voice’: how AI is affecting video game actors (10 minute read)

Cissy Jones, a renowned voice artist, co-founded Morpheme to offer ethical AI voice modeling after her voice was used without her consent. Morpheme ensures that actors consent to, and are compensated for, AI-generated vocal content. The broader industry's rush to adopt AI may sideline human talent and ignore actors' rights. As the video game industry integrates AI, SAG-AFTRA union negotiations aim to protect performers' consent and fair compensation.

⚡️

Quick Links

Long-Context Alpaca Training (Colab Notebook)

Train with context windows of over 200k tokens on a single H100 using a new gradient accumulation offloading technique.

Aerospace AI Hackathon Projects (5 minute read)

Some 200 AI and aerospace engineers gathered to prototype solutions for the aviation and space industries, building an impressive array of tools that ranged from AI air traffic controllers and AI flight planners to Apple Vision Pro flight simulators.

Apple’s New AI Model Could Help Siri See How iOS Apps Work (1 minute read)

Apple has developed Ferret-UI, a new multimodal LLM designed to understand the user interfaces of mobile screens. It could eventually help Siri comprehend and interact with on-screen elements.
