Meet Smaug-72B: The New King Of Open-Source AI (4 minute read)
Smaug-72B, a new open-source language model developed by Abacus AI and derived from Qwen-72B by Alibaba Group's Qwen team, now leads Hugging Face's Open LLM Leaderboard. It outperforms established models such as GPT-3.5 and Mistral Medium on several benchmarks and averages over 80 across the leaderboard's major evaluations, a milestone that hints at open-source models rivaling proprietary ones.
High-fidelity text-to-speech models with synthetic annotations (18 minute read)
These text-to-speech models, trained by Stability AI, can be guided by precise natural language instructions. Because no large dataset pairs speech with suitable text descriptions, the authors synthetically annotated a large speech corpus for training. It is another example of the broader trend of synthetically annotating or up-captioning data and then training generative models on it.
MusicRL (24 minute read)
The MusicLM team at Google used 300k pieces of feedback, along with other reward signals, to run an RL process on their music generation models. The fine-tuned models outperform the base model in human preference studies, but it remains unclear which RL method yields the highest-fidelity output.
Engineering & Resources
YOLO-World: Real-time open-vocabulary object detection (GitHub Repo)
Object detection is the task of identifying objects and their bounding boxes. It can usually be done only for a fixed set of object classes chosen before training. This work introduces a real-time method for open-vocabulary object detection, which means it can detect bounding boxes for any set of object classes specified at run time.
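To make the open-vocabulary idea concrete, here is a toy sketch of one common approach: compare an embedding of each candidate region against text embeddings of whatever class names are supplied at run time. It is not YOLO-World's implementation. The function names, the threshold, and the 3-dimensional vectors are all hypothetical, and a real system would get its embeddings from trained image and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def label_regions(region_embeddings, text_embeddings, threshold=0.5):
    """Give each region its best-matching run-time label, or None if no label clears the threshold."""
    results = []
    for region in region_embeddings:
        best_label, best_score = None, threshold
        for label, text_vec in text_embeddings.items():
            score = cosine(region, text_vec)
            if score > best_score:
                best_label, best_score = label, score
        results.append(best_label)
    return results

# Hypothetical embeddings, hand-written for illustration only.
vocabulary = {"dog": [1.0, 0.0, 0.0], "bicycle": [0.0, 1.0, 0.0]}
regions = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.0, 1.0]]
labels = label_regions(regions, vocabulary)

# Adding a class at run time needs no retraining, only a new text embedding.
vocabulary["cat"] = [0.0, 0.0, 1.0]
labels_with_cat = label_regions(regions, vocabulary)
```

In this sketch the third region gets no label until "cat" is added to the vocabulary, which is the property that separates open-vocabulary detectors from fixed-class ones.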
MobileVLM V2 is Out! (14 minute read)
MobileVLM V2 is a family of vision-language models for mobile devices that improves on the original MobileVLM through architectural changes.
Self Discover Implementation (GitHub Repo)
Google proposed a novel prompting technique that allows language models to use a set of reasoning primitives to discover a larger framework for problem-specific reasoning. This means the models can select different modules and combine them to better solve complex problems. This repository is an unofficial implementation of these ideas.
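The select-and-combine flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code or Google's prompts. The module list, the prompt wording, and the `llm` callable (any function that maps a prompt string to a reply string) are assumptions.

```python
from typing import Callable, Sequence

# Hypothetical reasoning modules, written here for illustration only.
REASONING_MODULES = (
    "Break the problem into smaller sub-problems.",
    "Think step by step.",
    "Consider critical assumptions and check them.",
)

def self_discover(
    task: str,
    llm: Callable[[str], str],
    modules: Sequence[str] = REASONING_MODULES,
) -> str:
    """Discover a task-specific reasoning plan, then use it to solve the task."""
    numbered = "\n".join(f"{i + 1}. {m}" for i, m in enumerate(modules))
    # Step 1: ask the model which modules are useful for this task.
    selected = llm(f"Task: {task}\nSelect the useful reasoning modules:\n{numbered}")
    # Step 2: ask the model to combine the chosen modules into one plan.
    structure = llm(
        f"Task: {task}\nCombine these modules into a step-by-step reasoning plan:\n{selected}"
    )
    # Step 3: solve the task by following the discovered plan.
    return llm(f"Task: {task}\nFollow this plan to solve the task:\n{structure}")
```

Passing the model in as a plain callable keeps the sketch independent of any particular API client. A stub function that returns canned replies is enough to exercise the control flow.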
The Path to Profitability for AI (12 minute read)
AI research is shifting its focus toward efficiency and depth over mere accuracy and breadth. NVIDIA's H100 sales and AI's growing energy demand highlight the industry's scale. Investors demand profitability, which pushes research towards smaller, more efficient models like Phi-2 and puts the emphasis on sustainable economics from model architecture through deployment. Innovations in training, fine-tuning, and design promise to improve AI's energy and computational efficiency, and on-device capabilities reflect a broader trend towards more sustainable and practical AI applications.
How design drove $10M in preorders for Rabbit R1 AI hardware (11 minute read)
The Rabbit R1 is a bright orange walkie-talkie for AI. It features one big button that users push to talk, a chunky scroll wheel to browse the screen, and a rotating camera with its own privacy door. The $199 device can be programmed to control apps and websites. This article tells the story of how Rabbit and Teenage Engineering collaborated to create what may be the most successful AI hardware launch to date.