You Only Cache Once (25 minute read)
The YOCO architecture is a decoder-decoder model that reduces GPU memory demands while retaining global attention capabilities. It pairs a self-decoder with a cross-decoder: the self-decoder produces a single global key-value cache, which every cross-decoder layer then reuses instead of maintaining its own. YOCO performs comparably to traditional Transformers while substantially cutting inference memory, latency, and throughput costs, making it well suited to large language models and long context lengths.
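A minimal sketch of the decoder-decoder idea, not the paper's implementation: the module names, layer counts, and use of plain causal attention in the self-decoder (the paper uses efficient attention variants there) are all illustrative assumptions. The point is the KV flow: the cache is built once, then shared by all cross-decoder layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Attention(nn.Module):
    """Toy single-head attention; self-attention if no cache is passed,
    cross-attention against a precomputed (k, v) cache otherwise."""
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x, kv=None):
        q = self.q(x)
        if kv is None:
            k, v = self.k(x), self.v(x)
            return F.scaled_dot_product_attention(q, k, v, is_causal=True)
        k, v = kv  # reuse the shared cache; full visibility, as at decode time
        return F.scaled_dot_product_attention(q, k, v)

class YOCO(nn.Module):
    def __init__(self, dim, n_self=2, n_cross=2):
        super().__init__()
        self.self_layers = nn.ModuleList(Attention(dim) for _ in range(n_self))
        self.kv_proj = nn.ModuleDict({"k": nn.Linear(dim, dim),
                                      "v": nn.Linear(dim, dim)})
        self.cross_layers = nn.ModuleList(Attention(dim) for _ in range(n_cross))

    def forward(self, x):
        # Self-decoder: stand-in for the paper's efficient-attention layers.
        for layer in self.self_layers:
            x = x + layer(x)
        # Cache key-value pairs exactly once ("you only cache once").
        kv = (self.kv_proj["k"](x), self.kv_proj["v"](x))
        # Cross-decoder: every layer attends to the same shared cache,
        # so cache memory stays constant in depth.
        for layer in self.cross_layers:
            x = x + layer(x, kv=kv)
        return x

out = YOCO(dim=64)(torch.randn(1, 16, 64))  # -> (1, 16, 64)
```

Because only one layer's worth of KV state is stored, cache memory scales with sequence length but not with the number of cross-decoder layers, which is where the memory savings come from.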
Consistency Language Models (21 minute read)
Predicting more than one token at a time is an active area of research. If successful, it would dramatically improve generation speed for many large language models. The approach in this post, which mirrors consistency models from image synthesis, fine-tunes LLMs so that a parallel decoding strategy converges quickly, speeding up generation. Early results show roughly 3x speedups, on par with speculative decoding.
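A minimal sketch of the Jacobi-style parallel decoding loop that consistency LMs are trained to accelerate, assuming a generic causal LM callable that returns logits of shape (batch, seq_len, vocab); the function name, the zero-token initial guess, and greedy argmax updates are illustrative choices, not the post's exact procedure.

```python
import torch

@torch.no_grad()
def jacobi_decode(model, prompt_ids, n_new=16, max_iters=32):
    # Start from an arbitrary guess for all n_new future tokens at once.
    guess = torch.zeros(1, n_new, dtype=torch.long)
    for _ in range(max_iters):
        seq = torch.cat([prompt_ids, guess], dim=1)
        logits = model(seq)  # one parallel forward pass over the whole guess
        # Logits at position j predict token j+1, so this slice re-predicts
        # every guessed token from the tokens before it, all in parallel.
        new_guess = logits[:, prompt_ids.size(1) - 1 : -1].argmax(-1)
        if torch.equal(new_guess, guess):
            break  # fixed point reached: every token is self-consistent
        guess = new_guess
    return guess
```

Vanilla Jacobi decoding can need nearly as many iterations as tokens; the consistency-model insight is to fine-tune the LLM to map intermediate guesses straight to the fixed point, so the loop converges in a handful of passes and yields the reported ~3x speedup.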