TLDR AI 2024-02-08

Microsoft Copilot new features 🧑‍✈️, Smaug-72B 🌐, Google MusicRL 🎵

Headlines & Launches

Meet Smaug-72B: The New King Of Open-Source AI (4 minute read)

A new open-source language model, Smaug-72B, developed by Abacus AI and derived from Qwen-72B by Alibaba Group's Qwen team, now leads Hugging Face's Open LLM Leaderboard. It outperforms established models such as GPT-3.5 and Mistral Medium on several benchmarks, and its average score of over 80 across major evaluations marks a milestone for open-source AI, hinting that open models could rival proprietary ones.

Microsoft Brings New AI Image Functionality To Copilot (5 minute read)

Microsoft announced a major update to Copilot, including a new design, AI image creation and editing features, and a new AI model named Deucalion.

EU’s AI Act Passes Last Big Hurdle On The Way To Adoption (2 minute read)

The European Union's AI Act, aimed at regulating artificial intelligence applications based on risk, has passed a crucial vote by Member State representatives, who have confirmed the final text of the draft law.

Research & Innovation

High-fidelity text-to-speech models with synthetic annotations (18 minute read)

These text-to-speech models, trained by Stability AI, can be guided by precise natural language instructions. Because no large dataset pairs audio with suitable text descriptions for generation, the creators synthetically annotated a large speech corpus for training. It is another example of the broader trend of annotating, up-captioning, and then training generative models.

Enhancing CLIP for Efficient Image Classification (18 minute read)

This paper revisits the classical Gaussian Discriminant Analysis (GDA) algorithm to improve the performance of CLIP in image classification tasks without additional training or resources.
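As a rough illustration of the idea, here is a minimal sketch of Gaussian Discriminant Analysis with a shared covariance, applied to pre-computed feature vectors. It is not the paper's code: the random vectors below are only stand-ins for CLIP image embeddings, and the names `fit_gda`, `predict`, and the `shrinkage` parameter are assumptions made for this example. The point is that the classifier comes from closed-form statistics (class means and a covariance matrix), so no gradient-based training is involved.

```python
import numpy as np


def fit_gda(features, labels, num_classes, shrinkage=1e-3):
    """Estimate class means and one shared covariance, return linear weights."""
    dim = features.shape[1]
    # One mean vector per class.
    means = np.stack(
        [features[labels == k].mean(axis=0) for k in range(num_classes)]
    )
    # Shared covariance of the features around their own class mean.
    centered = features - means[labels]
    cov = centered.T @ centered / len(features)
    cov += shrinkage * np.eye(dim)  # small ridge term keeps the inverse stable
    precision = np.linalg.inv(cov)
    priors = np.bincount(labels, minlength=num_classes) / len(labels)
    # Linear discriminant: score_k(x) = x . w_k + b_k
    weights = means @ precision
    bias = -0.5 * np.sum(weights * means, axis=1) + np.log(priors)
    return weights, bias


def predict(features, weights, bias):
    """Pick the class with the highest discriminant score."""
    return np.argmax(features @ weights.T + bias, axis=1)


# Synthetic stand-ins for image embeddings (assumption: 3 classes, 16 dims).
rng = np.random.default_rng(0)
class_centers = rng.normal(size=(3, 16)) * 3.0
labels = rng.integers(0, 3, size=300)
features = class_centers[labels] + rng.normal(size=(300, 16))

weights, bias = fit_gda(features, labels, num_classes=3)
accuracy = (predict(features, weights, bias) == labels).mean()
print(f"accuracy on the synthetic features: {accuracy:.2f}")
```

In practice the features would come from a frozen CLIP image encoder, and how the resulting classifier is combined with CLIP's zero-shot predictions is a detail to take from the paper itself.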

MusicRL (24 minute read)

The MusicLM team at Google used 300k pieces of feedback, along with other reward signals, to run an RL process on their music generation models. They found that the fine-tuned models outperform the base model in human preference studies, but it is unclear which RL method yields the highest-fidelity output.

Engineering & Resources

Yolo-World: Realtime open vocabulary object detection (GitHub Repo)

Object detection is the process of identifying objects and their bounding boxes. This can usually only be done for a fixed set of objects chosen before training. This work introduces a real-time method for open-vocabulary object detection, which means it can detect bounding boxes for any set of object categories specified at run time.

MobileVLM V2 is Out! (14 minute read)

MobileVLM V2 is a range of advanced vision-language models for mobile devices that showcase notable performance improvements through innovative architecture.

Self Discover Implementation (GitHub Repo)

Google proposed a novel prompting technique that allows language models to use a set of reasoning primitives to discover a larger framework for problem-specific reasoning. This means the models can select different modules and combine them to better solve complex problems. This repository is an unofficial implementation of these ideas.
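To make the idea concrete, here is a minimal sketch of such a prompting pipeline. It is not the repository's code: the function name `self_discover`, the short module list, the prompt wording, and the `llm` callable (any function that maps a prompt string to a reply string) are all assumptions made for illustration. The stages follow one reading of the technique: select relevant reasoning modules, adapt them to the task, assemble them into a plan, then solve the task by following that plan.

```python
from typing import Callable, List

# A tiny, hypothetical set of reasoning primitives; a real list would be longer.
REASONING_MODULES = [
    "Break the problem into smaller sub-problems.",
    "Think step by step.",
    "Critically examine the assumptions.",
]


def self_discover(task: str, llm: Callable[[str], str], modules: List[str]) -> str:
    """Compose a task-specific reasoning plan from modules, then follow it."""
    listed = "\n".join(f"- {module}" for module in modules)
    # Stage 1a: choose the modules that look useful for this task.
    selected = llm(
        f"Select the reasoning modules that are useful for this task.\n"
        f"Task: {task}\nModules:\n{listed}"
    )
    # Stage 1b: rephrase the chosen modules so they are specific to the task.
    adapted = llm(
        f"Rephrase the selected modules so they are specific to the task.\n"
        f"Task: {task}\nSelected modules:\n{selected}"
    )
    # Stage 1c: combine the adapted modules into one step-by-step plan.
    plan = llm(
        f"Turn the adapted modules into a step-by-step reasoning plan.\n"
        f"Task: {task}\nAdapted modules:\n{adapted}"
    )
    # Stage 2: solve the task by following the discovered plan.
    return llm(f"Follow this plan to solve the task.\nPlan:\n{plan}\nTask: {task}")
```

Any chat or completion client can be wrapped as the `llm` argument. Because the plan is built once per task type, it could in principle be reused across many instances of the same kind of problem.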

The Path to Profitability for AI (12 minute read)

Recent shifts in AI research focus on efficiency and depth over mere accuracy and breadth. NVIDIA's H100 sales and AI's growing energy demand highlight the industry's scale. Investments demand profitability, shifting research towards smaller, more efficient models like Phi 2 and emphasizing sustainable economics from model architecture to deployment. Innovations in training, fine-tuning, and design promise to improve AI's energy and computational efficiency. On-device capabilities reflect a broader trend towards more sustainable and practical AI applications.

Challenging Multi-Modal Language Models with a New Benchmark (16 minute read)

A new study reveals a weakness in multi-modal large language models (MLLMs) like GPT-4V: they struggle with specific types of image-text inputs, leading to errors. CorrelationQA is a benchmark designed to evaluate MLLMs' performance in scenarios where images may mislead or contradict text.

How design drove $10M in preorders for Rabbit R1 AI hardware (11 minute read)

The Rabbit R1 is a bright orange walkie-talkie for AI. It features one big button that users push to talk, a chunky scroll wheel to browse the screen, and a rotating camera with its own privacy door. At $199, the device can be programmed to control apps and websites. This article tells the story of how Rabbit and Teenage Engineering collaborated to create what may be the most successful AI hardware launch to date.

Quick Links

OpenAI Forms A New Team To Study Child Safety (2 minute read)

OpenAI has established a Child Safety team to explore ways to prevent misuse of its AI tools by children.

OpenWidget Chat Interface (Product)

Create your own free ChatGPT widget for your website.

Local AI filtered social media (GitHub Repo)

A Chrome extension that allows you to use a local language model to filter social media posts based on criteria you choose.