TLDR AI 2024-01-24

ElevenLabs raises $80M 💰, Google Chrome AI features 💻, evaluating LLMs 📊

🚀

Headlines & Launches

ElevenLabs raises $80M Series B (4 minute read)

The AI voice company has raised funding to continue its voice cloning, dub studio, and other AI-enabled audio work.

Google Chrome Gains AI Features (2 minute read)

Google is enhancing Chrome with AI features, including a writing assistant for composing various texts online, an automated tab organizer for managing multiple open tabs, and a custom theme creator using a text-to-image diffusion model.

🧠

Research & Innovation

Weight averaging of reward models (22 minute read)

Reward models are used in RLHF to represent human preferences, but the model being aligned often “hacks the reward”: it scores well under the reward model while its actual outputs get worse. Averaging the weights of multiple reward models, which remain linearly mode connected, yields an aligned model that is preferred 79% of the time over one aligned with a single reward model. Why model merging works is not well understood, and it may simply act as regularization, but it has worked surprisingly well for general models and has now been shown to work as a step in the language model training pipeline.
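
A minimal sketch of the weight-averaging step, assuming the reward models share one architecture and were fine-tuned from the same initialization, so that their parameters can be interpolated. Parameters are shown as plain dicts of float lists. The function name `average_weights` and the toy values are illustrative and not taken from the paper.

```python
def average_weights(state_dicts, coeffs=None):
    """Linearly interpolate the parameters of models with identical shapes."""
    n = len(state_dicts)
    if n == 0:
        raise ValueError("need at least one model")
    if coeffs is None:
        coeffs = [1.0 / n] * n  # uniform average by default
    if len(coeffs) != n:
        raise ValueError("one coefficient per model is required")

    merged = {}
    for key in state_dicts[0]:
        tensors = [sd[key] for sd in state_dicts]
        length = len(tensors[0])
        if any(len(t) != length for t in tensors):
            raise ValueError(f"shape mismatch for parameter {key!r}")
        # Weighted sum of the same parameter across all models.
        merged[key] = [
            sum(c * t[i] for c, t in zip(coeffs, tensors))
            for i in range(length)
        ]
    return merged


# Two toy "reward models" (hypothetical parameter names and values).
rm_a = {"head.weight": [0.2, -0.4], "head.bias": [0.1]}
rm_b = {"head.weight": [0.6, 0.0], "head.bias": [0.3]}
merged = average_weights([rm_a, rm_b])
```

The merged parameters would then be loaded into a single reward model, so scoring costs the same as with one model rather than growing with the number of models merged.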

Contrastive Preference Optimization (26 minute read)

Another preference optimization technique, now applied to machine translation. It is more data efficient than DPO on this task. Importantly, the objective discourages the model from proposing adequate but imperfect translations, which enables it to achieve competitive performance on WMT.

Evaluating Large Multimodal Models (18 minute read)

This technical report presents MMCBench, a new benchmark designed to test the consistency and reliability of large multimodal models (LMMs) across various tasks like text-to-image and speech-to-text. It covers over 100 popular models, aiming to improve readers' understanding of these AI systems' performance in real-world scenarios.

👨‍💻

Engineering & Resources

Automatically trained PairRM with DPO (4 minute read)

A very strong new Mistral fine-tune that uses clever weak supervision and synthetic data to generate a DPO-compatible dataset. The outlined process can be repeated a number of times and applied to a wide variety of enterprise use cases.
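
A rough sketch of how a ranker can turn sampled responses into DPO-style preference pairs, assuming the best-scored candidate becomes "chosen" and the worst becomes "rejected". The `generate` and `score` callables are hypothetical stand-ins for a language model and a ranker such as PairRM, not real library calls, and the prompt/chosen/rejected layout is one commonly used convention.

```python
def build_preference_pairs(prompts, generate, score, n_candidates=4):
    """Build prompt/chosen/rejected records from ranked candidate responses."""
    pairs = []
    for prompt in prompts:
        # Sample several candidate responses for the same prompt.
        candidates = [generate(prompt, i) for i in range(n_candidates)]
        # Sort from lowest to highest ranker score.
        ranked = sorted(candidates, key=lambda r: score(prompt, r))
        worst, best = ranked[0], ranked[-1]
        if score(prompt, best) == score(prompt, worst):
            continue  # no preference signal, skip this prompt
        pairs.append({"prompt": prompt, "chosen": best, "rejected": worst})
    return pairs


# Toy stand-ins (assumptions for illustration only).
def toy_generate(prompt, i):
    return prompt + " answer" + "!" * i


def toy_score(prompt, response):
    return len(response)


pairs = build_preference_pairs(["hi"], toy_generate, toy_score)
```

Repeating the process would mean training on `pairs`, then calling `build_preference_pairs` again with the updated model as `generate`.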

Nano ColBERT (GitHub Repo)

ColBERT is one of the better embedding models for retrieval. It is worth exploring and using since many are building RAG-enabled AI applications. This implementation is a simple and straightforward replication without the performance optimizations and their added complexity. It uses BERT from HuggingFace but achieves essentially identical performance to the original implementation.
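
For readers new to ColBERT, its scoring step is usually described as late interaction: each query token embedding is matched against its most similar document token embedding, and those maxima are summed. The sketch below shows that step on toy 2-dimensional vectors. The function names are illustrative and are not taken from the repository, and a real system would produce the embeddings with BERT.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best cosine match among document tokens."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    score = 0.0
    for qv in q:
        score += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return score


# Toy token embeddings (made-up values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has an exact match per query token
doc_b = [[1.0, 1.0]]                          # only a partial match

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

Here `doc_a` outscores `doc_b` because every query token finds an exact match among its tokens.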

Text-to-Image Generation with RPG Framework (GitHub Repo)

This project introduces RPG, a new framework that uses the “Recaption, Plan, and Generate” approach to improve text-to-image generation. It cleverly breaks down complex image creation into simpler tasks, leading to more accurate and detailed images, especially when dealing with multiple objects and attributes.

🎁

Miscellaneous

Optimizing matrix multiplication (6 minute read)

A quick read about hardware-specific matrix multiplication optimizations and a general process to follow to speed up AI code.

SyncTalk: Mastering Realism in Talking Head Videos (2 minute read)

SyncTalk is a breakthrough in the realm of realistic talking head videos. It overcomes previous challenges in synchronizing facial identity, lip movements, and expressions.

Should The Future Be Human? (7 minute read)

Elon Musk and Larry Page fundamentally disagree about AI's potential risks, with Page labeling Musk a "speciesist" for preferring humans over digital life forms, leading to a rift in their friendship. This reflects the broader debate on AI's impact, encompassing concerns about consciousness, individuation, art, science, philosophy, and the possibility of human-AI mergers, and highlights the need for cautious and thoughtful development of AI technologies.
