TLDR AI 2026-05-19
Qwen 3.7, Cursor Composer 2.5, Anthropic acquires Stainless
Deep Dives & Analysis
What political censorship looks like inside an LLM's weights (109 minute read)
Qwen3.5-9B's political censorship is implemented by a small circuit that can be read out and turned off. The factual knowledge is already present after pretraining, and the censorship behavior is layered on top of those facts. The model never loses the knowledge; it just learns to route around it.
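The summary does not describe the exact intervention. Directional ablation is one widely used way to "turn off" a behavior in interpretability work, and it is assumed here purely for illustration. You estimate a direction in activation space that tracks the behavior, here as the difference of mean activations between triggering and non-triggering prompts. You then project that direction out of each hidden state: h' = h - (h . r_hat) r_hat. The helpers below are hypothetical and operate on plain Python lists. They are not the article's code.

```python
import math
from typing import List

Vector = List[float]

def normalize(v: Vector) -> Vector:
    """Scale v to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def difference_of_means(acts_on: List[Vector], acts_off: List[Vector]) -> Vector:
    """Hypothetical direction estimate: mean activation on prompts that trigger
    the behavior minus mean activation on prompts that do not."""
    dim = len(acts_on[0])
    mean_on = [sum(v[i] for v in acts_on) / len(acts_on) for i in range(dim)]
    mean_off = [sum(v[i] for v in acts_off) / len(acts_off) for i in range(dim)]
    return [a - b for a, b in zip(mean_on, mean_off)]

def ablate_direction(h: Vector, r: Vector) -> Vector:
    """Remove the component of activation h along direction r: h - (h . r_hat) r_hat."""
    r_hat = normalize(r)
    coeff = sum(a * b for a, b in zip(h, r_hat))
    return [a - coeff * b for a, b in zip(h, r_hat)]
```

Only the component along the chosen direction is removed. Every orthogonal component of the activation passes through unchanged.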
Agent Evaluation: A Detailed Guide (53 minute read)
LLM evaluation has shifted from static benchmarks toward dynamic, real-world agent systems. Effective evaluation now requires realistic harnesses that test agents over long time horizons in complex environments. This matters because agents increasingly take on high-stakes roles, such as coding and medicine, which demand rigorous performance measurement and outcome-oriented evaluation.
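To make "outcome-oriented evaluation over long horizons" concrete, here is a minimal harness sketch under deliberately simplified assumptions. A toy integer environment stands in for a real one, and `greedy_agent` stands in for an LLM agent. Each task is scored only on whether the goal state is reached within a step budget. `Task`, `run_episode`, and `evaluate` are hypothetical names, not taken from the guide.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Task:
    """A hypothetical long-horizon task: reach `goal` within `max_steps` actions."""
    task_id: str
    goal: int
    max_steps: int = 20

@dataclass
class Result:
    task_id: str
    success: bool
    steps: int
    trajectory: List[int] = field(default_factory=list)

def run_episode(agent: Callable[[int, int], int], task: Task) -> Result:
    """Let the agent act step by step in a toy environment; grade only the final outcome."""
    state, trajectory = 0, []
    for step in range(1, task.max_steps + 1):
        state += agent(state, task.goal)  # the agent chooses an increment each turn
        trajectory.append(state)
        if state == task.goal:
            return Result(task.task_id, True, step, trajectory)
    return Result(task.task_id, False, task.max_steps, trajectory)

def evaluate(agent: Callable[[int, int], int], tasks: List[Task]) -> Tuple[float, List[Result]]:
    """Run every task and report the outcome-based success rate plus per-task records."""
    results = [run_episode(agent, t) for t in tasks]
    return sum(r.success for r in results) / len(results), results

def greedy_agent(state: int, goal: int) -> int:
    """Stand-in agent: move one unit toward the goal."""
    return 1 if state < goal else -1
```

Running `evaluate(greedy_agent, [Task("near", 3), Task("far", 30), Task("neg", -2)])` gives a success rate of 2/3, because the "far" task exceeds its 20-step budget. A real harness would replace the integer state with a sandboxed environment and the equality check with a task-specific grader.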
Engineering & Research
AI Agent Security Summit | San Francisco (Sponsor)
Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation (9 minute read)
NVIDIA Cosmos Predict 2.5 generates videos from text and can be adapted to specific tasks such as robot manipulation by using LoRA or DoRA to inject small trainable adapters, which keeps memory use low. These methods enable efficient fine-tuning on a single GPU, prevent catastrophic forgetting, and allow synthetic trajectories to be generated quickly. Fine-tuning with LoRA and DoRA significantly improves video quality. LoRA is better suited to tight memory budgets, while DoRA is preferred for addressing training instability.
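The adapter math can be sketched in plain Python. This sketch assumes the notation of the original LoRA and DoRA papers rather than anything specific to Cosmos Predict 2.5. There is a frozen weight W0 (d x k) and trainable low-rank factors B (d x r) and A (r x k). DoRA adds a trainable magnitude vector m that rescales the column-normalized direction. The toy sizes and values are illustrative only.

```python
import math
import random

def matmul(X, Y):
    """Plain-Python matrix product X @ Y."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def lora_weight(W0, A, B, alpha, r):
    """LoRA: W = W0 + (alpha / r) * B @ A. W0 (d x k) stays frozen; only A (r x k) and B (d x r) train."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * dw for w, dw in zip(w_row, d_row)] for w_row, d_row in zip(W0, delta)]

def column_norms(W):
    """L2 norm of each column of W (the column-wise norm ||.||_c in the DoRA paper's notation)."""
    return [math.sqrt(sum(row[j] ** 2 for row in W)) for j in range(len(W[0]))]

def dora_weight(W0, A, B, m, alpha, r):
    """DoRA: W = m * V / ||V||_c, with V = W0 + (alpha / r) * B @ A and a trainable magnitude m (length k)."""
    V = lora_weight(W0, A, B, alpha, r)
    norms = column_norms(V)
    return [[m[j] * row[j] / norms[j] for j in range(len(row))] for row in V]

# Toy setup: a 4 x 3 frozen weight with a rank-2 adapter.
random.seed(0)
d, k, r, alpha = 4, 3, 2, 4.0
W0 = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]
A = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]  # B starts at zero, so the adapter is a no-op at init
m = column_norms(W0)               # DoRA initializes the magnitude from W0
```

With B initialized to zero, both adapters reproduce W0 at the start of training. LoRA trains r(d + k) parameters per layer instead of dk. At an illustrative d = k = 4096 with r = 16, that is 131,072 versus 16,777,216. DoRA adds only k more parameters for the magnitude vector.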
HRM-Text (GitHub Repo)
HRM-Text is a 1B-parameter text generation model based on the HRM architecture. It can be trained with 130-600x less compute and 150-900x less data than foundation models, making foundation-model pretraining accessible. The 0.6B-parameter version can be trained on 8 H100s on a single node in about 50 hours for around $800. The 1B-parameter model can be trained on 16 H100s across two nodes in about 46 hours for around $1,472.
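The quoted costs can be sanity-checked with simple arithmetic. GPU-hours equal GPUs times hours, and dividing the dollar cost by GPU-hours gives the implied rental rate. Both configurations work out to $2 per H100-hour. That rate is derived here and is not stated in the source, and the run times are approximate.

```python
def gpu_hours(gpus: int, hours: float) -> float:
    """Total accelerator time for a run."""
    return gpus * hours

def implied_rate(gpus: int, hours: float, total_usd: float) -> float:
    """Dollars per GPU-hour implied by a quoted total cost."""
    return total_usd / gpu_hours(gpus, hours)

runs = {
    "HRM-Text 0.6B": (8, 50, 800),
    "HRM-Text 1B": (16, 46, 1472),
}
for name, (gpus, hours, usd) in runs.items():
    print(f"{name}: {gpu_hours(gpus, hours)} GPU-hours, ${implied_rate(gpus, hours, usd):.2f}/GPU-hour")
# HRM-Text 0.6B: 400 GPU-hours, $2.00/GPU-hour
# HRM-Text 1B: 736 GPU-hours, $2.00/GPU-hour
```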
Generalization Dynamics of LM Pre-training (17 minute read)
During pre-training, language models (LMs) switch unpredictably between parroting patterns and exhibiting adaptive intelligence, a phenomenon termed "mode-hopping." Standard optimization techniques cannot correct this behavior, which shows up as a competition for model capacity shaped by the data in each training window. The researchers propose using these dynamics to select pre-training checkpoints more effectively, curate data for stable generalization, and evaluate metrics that predict LM behavior.
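The summary does not specify how these dynamics should drive checkpoint selection. As a hypothetical illustration of why mode-hopping matters, the helper below picks the checkpoint whose trailing window of evaluations has the best worst-case score. With this rule, a single transient spike does not win. The window size and the per-checkpoint generalization score are assumptions.

```python
from typing import List, Tuple

def select_stable_checkpoint(scores: List[Tuple[int, float]], window: int = 3) -> int:
    """Return the step whose trailing `window` evaluations have the highest minimum score."""
    if len(scores) < window:
        raise ValueError("need at least `window` evaluations")
    best_step, best_floor = scores[window - 1][0], float("-inf")
    for i in range(window - 1, len(scores)):
        floor = min(s for _, s in scores[i - window + 1 : i + 1])  # worst score in the window
        if floor > best_floor:
            best_step, best_floor = scores[i][0], floor
    return best_step

# Hypothetical (step, generalization score) pairs with a one-off spike at step 3000.
scores = [(1000, 0.40), (2000, 0.42), (3000, 0.70), (4000, 0.41),
          (5000, 0.55), (6000, 0.57), (7000, 0.58)]
print(max(scores, key=lambda p: p[1])[0])  # 3000: a transient spike
print(select_stable_checkpoint(scores))    # 7000: best sustained window
```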
Running long-horizon agents in production [LangChain Webinar] (Sponsor)
Production agents need durable execution: the ability to resume where they left off instead of starting over. Join LangChain to learn how to make it work in real deployments.
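Durable execution can be illustrated with a minimal checkpointing loop. This is a generic sketch, not LangChain's API. Each named step's result is persisted to a JSON file after it completes. Re-running the same step list after a crash therefore skips finished work and resumes at the first incomplete step. `run_durably` and the file layout are hypothetical.

```python
import json
import os
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict[str, Any]], Any]]

def run_durably(steps: List[Step], state_path: str) -> Dict[str, Any]:
    """Run named steps in order, persisting each result so a restart resumes after the last completed step."""
    done: Dict[str, Any] = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)  # results from a previous, interrupted run
    for name, fn in steps:
        if name in done:
            continue  # finished earlier: do not redo the work
        done[name] = fn(done)  # each step can read earlier results
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, state_path)  # swap in the new state with a single rename
    return done
```

Writing to a temporary file and swapping it in with `os.replace` keeps the state file from being left half-written if the process dies mid-save. In this sketch, step results must be JSON-serializable.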
Skills in web, iOS, and Android (2 minute read)
xAI launched "Skills" for Grok on web, iOS, and Android, letting users teach it a function once and have it remembered across interactions.
LLM Wiki v2 (16 minute read)
This post describes a pattern for building personal knowledge bases with LLMs.
Introducing Scheduled Tasks 2.0 (7 minute read)
Scheduled Tasks 2.0 improves automation by letting tasks run with context, preserving continuity in workflows across different projects and apps.
Turn repeated instructions into reusable skills in Lovable (14 minute read)
Skills in Lovable let users turn repeated instructions into reusable, markdown-based skills, so they no longer have to explain the same thing over and over.
TLDR is hiring a Senior Software Engineer, Applied AI ($250k-$350k, Fully Remote)
TLDR's Applied AI team is tasked with making every process at TLDR legible to code, runnable by anyone, and composable into larger workflows. Join a small, fast-moving team using the latest AI tools with an unlimited token budget.
Get the most interesting AI stories and breakthroughs delivered in a free daily email.
Join 920,000 readers for one daily email.