TLDR AI 2024-05-27

Character.ai partnerships 🤝, Golden Gate Claude 🌉, Ensembling LoRA 🤖

🚀
Headlines & Launches

xAI and Meta fight to partner with Character.ai (4 minute read)

Silicon Valley AI firms are competing to partner with Character.ai, a fast-growing role-playing chatbot startup founded by AI pioneer Noam Shazeer. The courtship comes as large tech companies increasingly pour money into smaller AI startups.

Scarlett Johansson told OpenAI not to use her voice (2 minute read)

Scarlett Johansson alleges that OpenAI created a voice for ChatGPT that mimics hers without consent, leading her to seek legal advice. OpenAI has since paused using the voice and is discussing the issue with her representatives. The situation highlights concerns over the use of celebrity likenesses in AI applications.

Golden Gate Claude (2 minute read)

New research from Anthropic maps the inner workings of Claude 3 Sonnet, revealing interpretable "features" that activate on concepts like the Golden Gate Bridge. By dialing these features' strengths up or down, researchers can steer Claude's responses to incorporate specific elements, demonstrating a novel method of modifying large language model behavior. The work aims to improve AI safety by precisely adjusting model behaviors tied to potential risks.
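
As a rough illustration of the steering idea, here is a minimal sketch that adds a scaled "feature" direction to a transformer's residual stream during generation. The model, layer choice, and random direction are placeholders; Anthropic's work steers learned sparse-autoencoder features inside Claude itself, which is not publicly accessible this way.

```python
# Minimal sketch of activation steering on an open model (placeholders only;
# Anthropic steers learned sparse-autoencoder features inside Claude itself).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

direction = torch.randn(model.config.hidden_size)   # stand-in for a learned feature
direction /= direction.norm()
strength = 8.0                                       # how hard to "clamp" the feature

def steer(module, inputs, output):
    # Add the scaled feature direction to this layer's residual-stream output.
    if isinstance(output, tuple):
        return (output[0] + strength * direction,) + output[1:]
    return output + strength * direction

hook = model.transformer.h[6].register_forward_hook(steer)  # arbitrary middle layer
ids = tok("Tell me about yourself.", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=40)[0]))
hook.remove()
```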
🧠
Research & Innovation

Reinforcement Learning at Lyft (21 minute read)

The Lyft team used online reinforcement learning, with rewards based on drivers' projected future earnings, to match drivers with riders. The system improved matching decisions dramatically in real time and generated an estimated $30M per year in additional earnings for drivers.
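
For intuition, a toy sketch of value-based online matching with rewards tied to driver earnings might look like the following; the state representation, numbers, and update rule are invented for illustration and are not Lyft's system.

```python
# Toy online matching: pick the driver that maximizes fare plus estimated
# future value, then update values from realized driver earnings.
import random
from collections import defaultdict

value = defaultdict(float)   # estimated future earnings by (zone, hour) state
alpha, gamma = 0.1, 0.95     # online learning rate, discount factor

def choose_driver(idle_drivers, pickup_zone, fare, dropoff_state):
    # Dispatch the driver whose assignment maximizes fare plus the estimated
    # value of ending up in the dropoff state, minus a simple pickup-distance cost.
    def score(driver_state):
        return fare + gamma * value[dropoff_state] - 0.5 * abs(driver_state[0] - pickup_zone)
    return max(idle_drivers, key=score)

def td_update(state, earnings, next_state):
    # TD(0)-style online update driven by the driver's realized earnings.
    value[state] += alpha * (earnings + gamma * value[next_state] - value[state])

# Tiny simulated dispatch loop.
random.seed(0)
idle_drivers = [(zone, 18) for zone in range(5)]      # (zone, hour) per idle driver
for _ in range(100):
    pickup, dropoff = random.randrange(5), (random.randrange(5), 18)
    fare = random.uniform(5, 30)
    chosen = choose_driver(idle_drivers, pickup, fare, dropoff)
    td_update(chosen, fare, dropoff)
```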

Lessons on Reproducible Evaluation of Language Models (45 minute read)

Evaluating language models is difficult, and practical details about how to do it well are rarely shared outside the largest labs. This paper lays out a reproducible and robust set of evaluation practices, with a valuable discussion of perplexity evaluation in the appendix.
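
For example, a bare-bones perplexity check on a small open model looks roughly like this; the model and text are placeholders, and real evaluations control tokenization, context length, and data contamination far more carefully.

```python
# Minimal perplexity evaluation sketch with a small open model.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "Evaluating language models reproducibly requires fixed prompts and seeds."
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    loss = model(ids, labels=ids).loss   # mean token-level cross-entropy
print(f"perplexity = {math.exp(loss.item()):.2f}")
```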

Image Personalization with Classifier-Guided Diffusion Models (16 minute read)

Researchers introduce a new approach to customize diffusion models for generating identity-preserving images from user-provided references. Unlike traditional methods that require extensive domain-specific training, this technique uses classifier guidance to steer diffusion models without additional training.
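
The core classifier-guidance step can be sketched as follows; the toy noise predictor and identity classifier below are placeholders for illustration, not the paper's actual models.

```python
# Sketch of classifier guidance in a single diffusion denoising step: the
# noise estimate is shifted along the gradient of log p(identity | x_t).
import torch
import torch.nn as nn

# Toy stand-ins: a "U-Net" noise predictor and an identity classifier on noisy images.
unet = lambda x, t: torch.zeros_like(x)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))

def guided_noise_pred(x_t, t, target_id, scale=3.0, alpha_bar_t=0.5):
    eps = unet(x_t, t)                       # base (unconditional) noise prediction
    x_in = x_t.detach().requires_grad_(True)
    log_prob = classifier(x_in).log_softmax(dim=-1)[:, target_id].sum()
    grad = torch.autograd.grad(log_prob, x_in)[0]
    # Nudge the noise estimate so sampling drifts toward the reference identity.
    return eps - scale * (1 - alpha_bar_t) ** 0.5 * grad

x_t = torch.randn(1, 3, 8, 8)
print(guided_noise_pred(x_t, t=500, target_id=3).shape)
```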
👨‍💻
Engineering & Resources

Mistral Finetune (GitHub Repo)

Mistral has released an official repository for fine-tuning its models.

Ensembling LoRA (GitHub Repo)

LoRA-Ensemble is a parameter-efficient deep ensemble method for self-attention networks. This technique, which extends Low-Rank Adaptation (LoRA) for implicit ensembling, allows for accurate and well-calibrated predictions without the high computational cost of traditional ensemble methods.
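
A compressed sketch of the idea, assuming a shared frozen linear layer with per-member low-rank updates whose outputs are averaged (a simplification for illustration, not the repo's actual code):

```python
# Several LoRA adapters share one frozen weight matrix; averaging their
# outputs gives an implicit ensemble at a fraction of the usual cost.
import torch
import torch.nn as nn

class LoRAEnsembleLinear(nn.Module):
    def __init__(self, in_dim, out_dim, rank=4, members=4):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        for p in self.base.parameters():
            p.requires_grad_(False)                      # frozen shared backbone
        self.A = nn.Parameter(torch.randn(members, rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(members, out_dim, rank))

    def forward(self, x):
        base = self.base(x)                              # shared computation
        # Each member adds its own low-rank update B_m @ A_m @ x.
        delta = torch.einsum("mor,mri,bi->mbo", self.B, self.A, x)
        return base.unsqueeze(0) + delta                 # (members, batch, out_dim)

layer = LoRAEnsembleLinear(16, 8)
member_outputs = layer(torch.randn(2, 16))               # per-member predictions
print(member_outputs.mean(dim=0).shape)                  # ensemble average: (2, 8)
```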

Modular Norm for Neural Networks (GitHub Repo)

Modular norm is a new method for normalizing weight updates in neural networks that scales training efficiently across different network sizes.
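
In the general spirit of normalized weight updates (a generic illustration, not the paper's exact modular-norm construction), a per-layer normalized step might look like this:

```python
# Scale each layer's update relative to its own weight norm so the relative
# change per step stays comparable as layers grow.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 10))
lr = 0.1

def normalized_step():
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None or p.grad.norm() == 0:
                continue
            p -= lr * (p.norm() / p.grad.norm()) * p.grad

x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
nn.functional.cross_entropy(model(x), y).backward()
normalized_step()
```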
🎁
Miscellaneous

AI's Communication Revolution: We're All Talking to Computers Now (10 minute read)

GPT-4o, OpenAI's latest model, enables real-time communication between humans and machines, extending capabilities beyond text to vision and audio. It opens a new wave of human-to-AI, and eventually AI-to-AI, interaction that is likely to reshape social behavior and business models. As the technology matures, its effect on human communication will unfold, potentially catalyzing a new generation of companies and software.

MobileNet-V4 (11 minute read)

MobileNet is a family of computer vision models designed for high accuracy at low latency, making them fast enough to run on edge devices. This blog post outlines the new MobileNet-V4 model and the modern design changes that went into its creation.

Multi-Dimensional Features in Language Models (GitHub Repo)

This project investigates whether language models use multi-dimensional features for computation, challenging the linear representation hypothesis.
⚡️
Quick Links

I Don't Want To Spend My One Wild And Precious Life Dealing With Google's AI Search (3 minute read)

Google's AI search feature is causing frustration by adding an uninvited three-second delay to search results, disrupting the user experience with unwanted information.

Mixtral 8x22b on a $400 CPU faster than reading speed (GitHub Discussion)

Thanks to recent advances from Mozilla's Llamafile project, it is now possible to run inference for Mistral's flagship Mixtral 8x22B model at 20 tokens per second on a commodity CPU.

LLMs are not suitable for (advanced) brainstorming (4 minute read)

Large language models struggle with truly innovative brainstorming, often converging to consensus-based ideas rather than generating novel concepts.