TLDR AI 2026-05-26
Grok Build CLI 💻, AI hardware market ⚡, Pope Leo’s AI warning ⛪
Introducing Grok Build (3 minute read)
Grok Build, a new coding agent and CLI, has launched in beta for SuperGrok and X Premium Plus subscribers. It supports complex coding projects with a plan mode for reviewing proposed work, and it adapts to a user's existing conventions. Headless mode and specialized subagents let users run Grok for automation and parallel processing.
Notes on Pope Leo XIV's encyclical on AI (12 minute read)
Pope Leo XIV recently released a document on the ethics of integrating AI into modern society. It touches on the environmental impact of the technology, covers the risks of algorithmic systems that make decisions that impact people's lives, discusses how AI amplifies the power of those with resources, and more. A link to the document is available in the article. The writing style is very approachable, even to non-Catholics.
On AI Hardware (7 minute read)
The AI hardware market is becoming a stack of memory problems. Hardware changes slowly, while software and model architectures can move quickly. Hardware companies will need to build architectures that remain useful as the bottleneck shifts.
Gemini 3.5 Flash Looks Good For How Fast It Is (8 minute read)
Zvi judges Gemini 3.5 Flash the best model at its speed point, but finds it unconvincing against Opus 4.7 or GPT-5.5 outside latency-sensitive workloads. Google positions it as a daily driver for agentic workflows that outscores 3.1 Pro on Terminal-Bench and MCP Atlas while running 4x faster.
👨‍💻 Engineering & Research
On-Policy Distillation (5 minute read)
On-policy distillation trains a student model on trajectories sampled from its own policy while a teacher provides dense token-level supervision through KL-based regularization. This closes the train-inference distribution mismatch that off-policy methods suffer from. The canonical formulation unifies forward-KL, reverse-KL, and JSD losses, with reverse-KL emerging as the default for mode-seeking smaller students. On top of an RL stack like Tinker, the technique can be implemented with a one-line code swap of the regularizer model.
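To make the idea concrete, here is a minimal sketch of the loss computation only, under stated assumptions. The student samples its own trajectory, and at every step the reverse KL between the student's and the teacher's next-token distributions is recorded. The names `toy_student`, `toy_teacher`, and `on_policy_distill_loss` are hypothetical stand-ins invented for illustration. This is not Tinker's API or the article's code, and it omits the gradient step entirely.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) for one next-token distribution."""
    return sum(
        p * math.log(p / q)
        for p, q in zip(student_probs, teacher_probs)
        if p > 0.0
    )


def on_policy_distill_loss(student_fn, teacher_fn, prompt, max_new_tokens, rng):
    """Sample a trajectory from the student and average per-token reverse KL.

    student_fn and teacher_fn are hypothetical callables that map a token
    list to next-token logits; real models would replace them.
    """
    tokens = list(prompt)
    per_token = []
    for _ in range(max_new_tokens):
        s = softmax(student_fn(tokens))
        t = softmax(teacher_fn(tokens))
        # Dense supervision: the teacher scores every position the student visits.
        per_token.append(reverse_kl(s, t))
        # On-policy: the next token comes from the student, not the teacher.
        next_token = rng.choices(range(len(s)), weights=s, k=1)[0]
        tokens.append(next_token)
    return sum(per_token) / len(per_token), tokens


VOCAB = 3


def toy_student(tokens):
    """Hypothetical student: prefers repeating the last token."""
    last = tokens[-1]
    return [1.0 if i == last else 0.0 for i in range(VOCAB)]


def toy_teacher(tokens):
    """Hypothetical teacher: prefers the token after the last one."""
    last = tokens[-1]
    return [2.0 if i == (last + 1) % VOCAB else 0.0 for i in range(VOCAB)]


if __name__ == "__main__":
    loss, tokens = on_policy_distill_loss(
        toy_student, toy_teacher, [0], 5, random.Random(0)
    )
    print("sampled tokens:", tokens)
    print("mean reverse KL:", loss)
```

Swapping `reverse_kl` for a forward-KL or JSD function changes the divergence without touching the sampling loop, which is the sense in which one formulation covers all three losses.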
Models.dev (GitHub Repo)
Models.dev consolidates specifications and pricing of various models, accessible via an API.
Introducing BenchBench (5 minute read)
BenchBench is a benchmark that tests how well models can create a benchmark. It works as a great benchmark for model abilities as well as a test of models' self-awareness. The benchmark tests creativity and not just problem-solving ability. In tests, GPT 5.2 was the only winner, with every other model, from Opus 4.6 to GPT 5.5, struggling to create an actually useful benchmark that others had a hard time solving.
Google DeepMind's AlphaProof Nexus solves decades-old math problems for a few hundred dollars (7 minute read)
Google DeepMind's AlphaProof Nexus autonomously solved nine out of 353 open Erdős problems, including questions unanswered for decades, at inference costs of a few hundred dollars per problem.
DeepSeek's 10 trillion USD grand strategy (35 minute read)
DeepSeek's aim is to enable a $10 trillion Chinese AI hardware ecosystem and achieve a $1 trillion valuation for itself.
Apple's Genmoji and Image Playground Set for Major Visual Overhaul in iOS 27 Ahead of WWDC 2026 (2 minute read)
Apple plans to upgrade its AI image tools, Genmoji and Image Playground, in iOS 27, improving visual quality and realism.
GPT-5.6 Leaks: Coming in June (1 minute read)
GPT-5.6 seems heavily focused on stronger multi-step reasoning, better agentic workflows, and improved frontend generation capabilities.
How AI Will Save Prediction Markets (10 minute read)
Prediction markets have failed to deliver Robin Hanson's 1990 Idea Futures vision.