TLDR AI 2024-05-10

Stability AI Discord Bot 🤖, Music-Generating AI 🎶, You Only Cache Once 🤟

Headlines & Launches

Yellow Raises $5M for Assistive 3D Modeling (4 minute read)

A former Google ML Head of Product and an MIT professor have raised money from a16z's new game fund for generative 3D character creation.

Stability AI Discord Bot (3 minute read)

Stability AI has introduced Stable Artisan, a Discord bot that brings all of its models together in one place. This is likely a play to compete directly with Midjourney, which also operates through Discord.

ElevenLabs Previews Music-Generating AI Model (3 minute read)

Voice AI startup ElevenLabs is previewing a new model that turns text prompts into songs, complete with lyrics. The company is using a promotional strategy similar to the one OpenAI used for Sora.

Research & Innovation

You Only Cache Once (25 minute read)

The YOCO architecture is a decoder-decoder model that reduces GPU memory demands while retaining global attention capabilities. It consists of a self-decoder, which uses efficient self-attention to produce a single global key-value cache, and a cross-decoder, whose layers all reuse that cache through cross-attention, so key-value pairs are cached only once. YOCO achieves favorable performance compared with standard Transformers, with significant improvements in inference memory, latency, and throughput, making it well suited to large language models and long context lengths.
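
To see why caching key-value pairs only once matters, here is a back-of-the-envelope sketch of KV-cache size. It assumes, as a simplification, that the first half of the layers forms the self-decoder using sliding-window attention, and the second half forms the cross-decoder, whose layers all read one shared global cache. The model dimensions (32 layers, 8 KV heads, head size 128, a 1,024-token window, 16-bit values, a 128K-token context) are illustrative assumptions, not figures from the paper.

```python
def kv_cache_bytes_transformer(n_layers, seq_len, n_kv_heads, head_dim, bytes_per_elem=2):
    # Every layer caches one key and one value vector per token per KV head.
    per_token = 2 * n_kv_heads * head_dim * bytes_per_elem
    return n_layers * seq_len * per_token


def kv_cache_bytes_yoco(n_layers, seq_len, n_kv_heads, head_dim, window, bytes_per_elem=2):
    # Assumption: the first half of the layers is the self-decoder with sliding-window
    # attention, so each of those layers keeps at most `window` tokens in its cache.
    # The second half is the cross-decoder, whose layers all read ONE shared global cache.
    per_token = 2 * n_kv_heads * head_dim * bytes_per_elem
    self_decoder_cache = (n_layers // 2) * min(window, seq_len) * per_token
    shared_global_cache = seq_len * per_token
    return self_decoder_cache + shared_global_cache


cfg = dict(n_layers=32, seq_len=131_072, n_kv_heads=8, head_dim=128)
standard = kv_cache_bytes_transformer(**cfg)
yoco = kv_cache_bytes_yoco(**cfg, window=1024)
print(f"standard: {standard / 2**30:.4f} GiB")  # standard: 16.0000 GiB
print(f"yoco:     {yoco / 2**30:.4f} GiB")      # yoco:     0.5625 GiB
```

Under these assumptions the standard cache grows with layers × tokens, while the YOCO-style cache grows roughly with tokens alone, about a 28x reduction in this configuration.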

Consistency Language Models (21 minute read)

Predicting more than one token at a time is an active area of research. If it works, it could dramatically reduce generation time for many large language models. The approach in this post, which mirrors consistency models from image synthesis, applies a parallel decoding strategy to fine-tuned LLMs to speed up generation. Early results match the roughly 3x speedup of speculative decoding.
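
Consistency LLMs build on Jacobi decoding: start from a guess for the next n tokens, then refresh every position in parallel from the current guess until nothing changes. The sketch below assumes greedy decoding and uses a hypothetical toy function, `toy_next_token`, in place of a real model; in an actual LLM, the n updates in each iteration come from a single forward pass.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a context to its greedy next token


def toy_next_token(context: List[int]) -> int:
    # Hypothetical stand-in for argmax over an LLM's logits: deterministic in the context.
    return (sum(context) * 31 + len(context)) % 50


def greedy_decode(model: Model, prefix: List[int], n: int) -> List[int]:
    # Ordinary autoregressive decoding: one token per step.
    out = list(prefix)
    for _ in range(n):
        out.append(model(out))
    return out[len(prefix):]


def jacobi_decode(model: Model, prefix: List[int], n: int, pad: int = 0) -> Tuple[List[int], int]:
    guess = [pad] * n  # arbitrary initial guess for the next n tokens
    for iteration in range(1, n + 2):
        # Update all n positions in parallel from the *current* guess.
        new_guess = [model(prefix + guess[:i]) for i in range(n)]
        if new_guess == guess:  # fixed point reached
            return guess, iteration
        guess = new_guess
    return guess, n + 1  # unreachable: n updates always suffice


prefix = [1, 2, 3]
tokens, iters = jacobi_decode(toy_next_token, prefix, n=8)
print(tokens == greedy_decode(toy_next_token, prefix, n=8))  # True
print(f"converged in {iters} iterations")
```

The fixed point always equals the greedy autoregressive output, and plain Jacobi decoding needs at most n + 1 iterations to confirm it. The speedup comes from reaching that fixed point in far fewer than n iterations, which is what consistency fine-tuning aims for.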

Engineering & Resources

Enhanced Change Detection in Images (19 minute read)

DiffMatch is a novel semi-supervised change detection method that leverages visual language models to synthesize pseudo labels for unlabeled data, providing additional supervision signals.
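
As a generic illustration of pseudo-labeling for change detection (not DiffMatch's actual procedure), the sketch below turns per-pixel class probabilities for two dates, for example from a vision-language segmentation model, into a binary change pseudo-label and marks low-confidence pixels as ignored. The `IGNORE` value of 255 and the 0.7 confidence threshold are hypothetical choices.

```python
import numpy as np

IGNORE = 255  # hypothetical label for pixels to exclude from the training loss


def change_pseudo_label(probs_t1: np.ndarray, probs_t2: np.ndarray, conf_thresh: float = 0.7) -> np.ndarray:
    """probs_t*: (num_classes, H, W) per-pixel class probabilities for each date."""
    cls_t1, cls_t2 = probs_t1.argmax(axis=0), probs_t2.argmax(axis=0)
    # A pixel is only as trustworthy as the less confident of its two predictions.
    conf = np.minimum(probs_t1.max(axis=0), probs_t2.max(axis=0))
    label = (cls_t1 != cls_t2).astype(np.uint8)  # 1 = changed, 0 = unchanged
    label[conf < conf_thresh] = IGNORE
    return label


# Two classes, a 1x3 image: an unchanged pixel, a changed pixel, and a low-confidence pixel.
p1 = np.array([[[0.9, 0.9, 0.6]], [[0.1, 0.1, 0.4]]])
p2 = np.array([[[0.9, 0.2, 0.1]], [[0.1, 0.8, 0.9]]])
print(change_pseudo_label(p1, p2).tolist())  # [[0, 1, 255]]
```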

Buzz Pretraining Dataset (Hugging Face Hub)

Buzz is a new dataset that mixes preference data into the pretraining corpus. The researchers have also released several models trained on this data and report that they perform well on a number of human preference tasks.

Achieving Fairness with a New Post-Processing Algorithm (36 minute read)

This new post-processing algorithm addresses model bias by applying a "fairness cost" to recalibrate output scores, ensuring compliance with various group fairness criteria such as statistical parity, equal opportunity, and equalized odds.
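
Here is a simplified version of the idea for a binary classifier and statistical parity only: subtract a group-specific cost from each score before thresholding at 0.5, choosing each cost so that every group ends up with the same positive rate. This is an illustrative sketch with made-up scores, not the algorithm from the article, which also targets other criteria such as equal opportunity and equalized odds.

```python
import numpy as np


def fit_parity_costs(scores, groups, target_rate):
    """Per-group cost so that (score - cost > 0.5) holds for ~target_rate of each group."""
    costs = {}
    for g in np.unique(groups):
        # Threshold at the (1 - target_rate) quantile of this group's scores.
        threshold = np.quantile(scores[groups == g], 1.0 - target_rate)
        # score - cost > 0.5  <=>  score > threshold
        costs[g] = threshold - 0.5
    return costs


def predict(scores, groups, costs):
    shifted = scores - np.array([costs[g] for g in groups])
    return (shifted > 0.5).astype(int)


def positive_rates(preds, groups):
    return {str(g): float(preds[groups == g].mean()) for g in np.unique(groups)}


# Made-up scores: group A tends to receive higher scores than group B.
scores = np.array([0.10, 0.20, 0.30, 0.40, 0.90, 0.95, 0.97, 0.99,
                   0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.60, 0.70])
groups = np.array(["A"] * 8 + ["B"] * 8)

baseline = (scores > 0.5).astype(int)
costs = fit_parity_costs(scores, groups, target_rate=0.25)
fair = predict(scores, groups, costs)
print(positive_rates(baseline, groups))  # {'A': 0.5, 'B': 0.25}
print(positive_rates(fair, groups))      # {'A': 0.25, 'B': 0.25}
```

Group A's positive cost raises its effective threshold and group B's negative cost lowers it, so both groups end up with a 25% positive rate.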

Gemma 10M context (6 minute read)

A discussion of the various ways to extend context for language models. It doesn't provide much in the way of evaluations, but it's a fascinating look at the directions the field is exploring.
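
For comparison, here is a sketch of one widely used context-extension technique, RoPE position interpolation; it is not necessarily what the post or Gemma 10M uses. Positions are divided by a scale factor so that a sequence longer than the training length maps back into the position range the model saw during training. The training length of 8,192 and target of 32,768 are arbitrary illustrative values.

```python
import numpy as np


def rope_angles(positions, head_dim, base=10000.0, scale=1.0):
    # Standard RoPE frequencies: base^(-2i / head_dim) for i = 0 .. head_dim/2 - 1.
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    # Position interpolation: compress positions by `scale` before rotating.
    return np.outer(np.asarray(positions, dtype=np.float64) / scale, inv_freq)


def apply_rope(x, angles):
    # x: (seq_len, head_dim); rotate each pair (x[:, 2i], x[:, 2i + 1]) by its angle.
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out


train_len, target_len, head_dim = 8192, 32768, 64
scale = target_len / train_len  # 4.0

# With interpolation, position 32764 gets the same rotation as position 8191 without it.
print(np.allclose(rope_angles([32764], head_dim, scale=scale),
                  rope_angles([8191], head_dim)))  # True
```

Interpolation is typically followed by a short fine-tune at the longer length so the model adapts to the more closely spaced positions.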

Vision Mamba: A Comprehensive Survey (34 minute read)

A comprehensive survey that explores Mamba's applications across a range of visual tasks and its evolving impact. Stay up to date with new findings and advances related to Mamba.

Quick Links

Alibaba Rolls Out Qwen2.5 (2 minute read)

Alibaba Cloud has launched the latest version of its large language model, Tongyi Qianwen Qwen2.5, marking significant improvements in reasoning, code comprehension, and textual understanding over Qwen2.0.

Mistral AI Raising At A $6B Valuation (2 minute read)

Paris-based Mistral AI is raising funds at a valuation of $6 billion, tripling its previous valuation from December.

Leaked Deck Reveals How OpenAI Is Pitching Publisher Partnerships (3 minute read)

OpenAI's Preferred Publishers Program, detailed in a pitch deck, offers financial incentives and enhanced visibility within ChatGPT for select news publishers.