TLDR AI 2025-04-18
OpenAI o3 visual models, Mistral Classifier Factory, Goodfire raises $50M
Is there a sustainable way to monetize AI? (Sponsor)
You've built something powerful with AI. Now you're asking: how do we price it? Many startups are getting the answer wrong right now, and putting their business's viability at risk.
In Metronome's upcoming webinar, pricing experts from 49 Palms Ventures and Metronome CEO Scott Woody will explore strategies and best practices for monetizing AI. Learn how to:
- Find a logical starting point with the 9-step AI pricing framework
- Assess and communicate your AI product's value
- Create flexible, iterable pricing systems that can evolve with the market
Register now to save your spot.
Research & Innovation
Scene Captioning (14 minute read)
3D CoCa is a unified framework that combines vision-language contrastive learning and captioning for 3D scenes.
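For readers unfamiliar with the CoCa recipe the paper builds on, here is a minimal sketch of how a contrastive alignment loss and a captioning loss can be combined into one objective. The function and tensor names are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def coca_style_loss(scene_emb, text_emb, caption_logits, caption_targets,
                    temperature=0.07, caption_weight=1.0):
    """Illustrative CoCa-style objective: contrastive alignment between scene
    and text embeddings plus an autoregressive captioning loss.
    Shapes: scene_emb/text_emb (B, D), caption_logits (B, T, V),
    caption_targets (B, T)."""
    # Normalize embeddings and compute pairwise similarities.
    scene_emb = F.normalize(scene_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = scene_emb @ text_emb.t() / temperature

    # Symmetric InfoNCE: matching scene/text pairs lie on the diagonal.
    labels = torch.arange(logits.size(0), device=logits.device)
    contrastive = (F.cross_entropy(logits, labels) +
                   F.cross_entropy(logits.t(), labels)) / 2

    # Standard next-token cross-entropy for the caption decoder.
    captioning = F.cross_entropy(caption_logits.flatten(0, 1),
                                 caption_targets.flatten())
    return contrastive + caption_weight * captioning
```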
Efficient Line Art Colorization with Broader References (12 minute read)
This paper presents an efficient, long-context framework for line art colorization with fine-grained identity preservation, aiming at high precision, efficiency, and flexible usability for comic colorization. It turns black-and-white line art into vibrant illustrations by integrating extensive contextual references.
Large Reasoning Models as a Judge (8 minute read)
JudgeLRM is a family of LLMs trained with reinforcement learning for judgment tasks. Unlike models trained with supervised fine-tuning, it excels in reasoning-heavy evaluations, outperforming models like GPT-4 and DeepSeek-R1.
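As a rough illustration of the LLM-as-a-judge setup such models are trained for, the sketch below prompts a judge to reason before emitting a verdict and derives a simple reward from agreement with a reference label. The prompt template, verdict tags, and reward scheme are assumptions for illustration, not JudgeLRM's actual format.

```python
import re

JUDGE_PROMPT = """You are an impartial judge. Compare the two answers to the question.
Think step by step, then output your verdict as <verdict>A</verdict> or <verdict>B</verdict>.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}
"""

def build_judge_prompt(question, answer_a, answer_b):
    # Fill the (hypothetical) template the judge model is prompted with.
    return JUDGE_PROMPT.format(question=question, answer_a=answer_a, answer_b=answer_b)

def judgment_reward(model_output, gold_verdict):
    """Toy RL reward: 1.0 if the parsed verdict matches the reference
    preference label, 0.0 if it is missing or wrong."""
    match = re.search(r"<verdict>([AB])</verdict>", model_output)
    return 1.0 if match and match.group(1) == gold_verdict else 0.0
```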
Engineering & Research
Speech Instruction Fine-Tuning Dataset (Hugging Face Hub)
SIFT-50M (Speech Instruction Fine-Tuning) is a 50-million-example dataset designed for instruction fine-tuning and pre-training of speech-text large language models (LLMs). It is built from publicly available speech corpora containing a total of 14K hours of speech and leverages LLMs and off-the-shelf expert models. The dataset spans five languages, covering diverse aspects of speech understanding and controllable speech generation instructions. SIFT-50M augments existing speech datasets with instruction-based question-answer (QA) pairs for speech understanding and includes approximately 5 million examples for controllable speech generation.
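A quick way to browse an instruction dataset like this is the `datasets` library in streaming mode. The repository id below is an assumption; check the actual Hugging Face dataset card for the correct name and any required config before running.

```python
from itertools import islice
from datasets import load_dataset

# Hypothetical repository id; verify the exact name (and config/language
# subset) on the dataset card before use.
ds = load_dataset("amazon-agi/SIFT-50M", split="train", streaming=True)

# Stream a few instruction-style QA examples without downloading everything.
for example in islice(ds, 3):
    print(example)
```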
End-to-End Latent Diffusion Training with REPA-E (3 minute read)
REPA-E enables stable, joint training of VAEs and latent diffusion models using a representation-alignment loss, achieving state-of-the-art results on ImageNet.
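Below is a minimal sketch of what a representation-alignment term can look like, assuming features from a frozen pretrained visual encoder (e.g., a DINOv2-style model) as the target and a small projector mapping the diffusion model's intermediate activations to the encoder's width. The names are illustrative, not REPA-E's code.

```python
import torch
import torch.nn.functional as F

def repa_alignment_loss(diffusion_feats, pretrained_feats, projector):
    """Align intermediate diffusion features with features from a frozen
    pretrained encoder via negative cosine similarity.
    diffusion_feats: (B, N, D_model), pretrained_feats: (B, N, D_enc)."""
    projected = F.normalize(projector(diffusion_feats), dim=-1)
    targets = F.normalize(pretrained_feats.detach(), dim=-1)
    return -(projected * targets).sum(dim=-1).mean()

# In end-to-end training, this term would be added (with a weight) to the
# usual denoising objective and, in REPA-E's setting, the VAE losses.
```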
DeepSpeed's DeepCompile (GitHub Repo)
The DeepSpeed team has brought compilation to its distributed training stack. DeepCompile speeds up several previously bottlenecked operations by large factors and builds on a patched version of torch.compile.
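The exact DeepCompile flags live in the repo, but the general pattern of compiling a model that DeepSpeed then manages looks roughly like the sketch below. Plain torch.compile stands in for DeepCompile's patched variant, and the config values are placeholders, not a verified DeepCompile setup.

```python
import torch
import deepspeed

# Placeholder model; any torch.nn.Module works here.
model = torch.nn.Linear(1024, 1024)

# Plain torch.compile illustrates the compilation step; DeepCompile ships a
# patched variant that is aware of ZeRO partitioning and communication.
compiled = torch.compile(model)

# Standard DeepSpeed initialization; the config values are placeholders.
engine, optimizer, _, _ = deepspeed.initialize(
    model=compiled,
    model_parameters=compiled.parameters(),
    config={
        "train_batch_size": 32,
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
        "zero_optimization": {"stage": 3},
    },
)
```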