TLDR AI 2026-05-20
Gemini 3.5 Flash, Karpathy joins Anthropic, OpenAI Guaranteed Capacity
Gemini 3.5 Flash (5 minute read)
Google has introduced Gemini 3.5 Flash, a new model focused on agentic workflows, coding, and long-horizon task execution. The release also expanded Gemini access across Search, enterprise tools, Android Studio, and Google's developer platforms.
Karpathy joins Anthropic (1 minute read)
Andrej Karpathy announced he has joined Anthropic, citing the next few years at the LLM frontier as especially formative for his return to R&D. Karpathy noted he remains passionate about education and plans to resume that work later, signaling the move is research-focused rather than a permanent pivot away from teaching.
OpenAI announces new Guaranteed Capacity offering for customers to secure compute (3 minute read)
OpenAI's new Guaranteed Capacity offering allows customers to secure long-term access to compute to power AI products, agents, and workflows. Customers can choose between one-, two-, and three-year commitments, with discounts that depend on the length of the commitment. The company will offer Guaranteed Capacity until its current allocation sells out, and it plans to offer it again in the future.
Deep Dives & Analysis
Google Detailed the Shift Toward Agentic Gemini Products (19 minute read)
At I/O 2026, Google outlined how Gemini models were being integrated across consumer products, creative tools, and developer platforms. The company also shared that monthly token usage across its AI systems had grown to more than 3.2 quadrillion.
Model half-life (4 minute read)
Model half-life, the idea that model releases will keep getting faster and faster, doesn't hold up under scrutiny. Model releases have definitely increased in pace, but the time between releases isn't halving every six months. This post looks at release dates for several of the most well-known models and lays out predictions for when the next models might come out.
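To see why a literal six-month halving is hard to sustain, here is a minimal arithmetic sketch. The 12-month starting interval is a hypothetical value chosen for illustration, not a figure from the post, and `interval_after` is an illustrative helper name.

```python
def interval_after(months_elapsed: float,
                   initial_interval: float,
                   halving_period: float = 6.0) -> float:
    """Months between releases after `months_elapsed`, assuming the
    interval halves once every `halving_period` months."""
    return initial_interval * 0.5 ** (months_elapsed / halving_period)


if __name__ == "__main__":
    # Hypothetical starting point: one release per 12 months.
    for years in range(4):
        gap = interval_after(12 * years, initial_interval=12.0)
        print(f"after {years} year(s): {gap:.4f} months between releases")
```

Under these assumptions the gap between releases falls from 12 months to under a week within three years, which is the kind of extrapolation the post argues against.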
Engineering & Research
Stop stitching databases for AI agents (Sponsor)
Using Claude Code: The unreasonable effectiveness of HTML (10 minute read)
HTML's richness allows it to convey complex information more effectively than Markdown, including layouts, data tables, and interactive elements. It enhances readability by organizing specs into well-structured, easily navigable documents and offers better sharing and interaction capabilities. Claude Code uses HTML to efficiently ingest context from various sources, aiding in specs, design prototyping, and creating custom editing interfaces with improved engagement and clarity.
Real-Time Long Video Generation (GitHub Repo)
NVIDIA's LongLive 1.0 is a framework for interactive long-form video generation that supports sequential prompting and real-time user-guided editing using streaming attention and KV-cache optimization techniques.
OlmoEarth v1.1: A more efficient family of models (5 minute read)
OlmoEarth v1.1, a new model family, reduces compute costs by up to 3x while maintaining performance, making planet-scale mapping more affordable. The models process remote sensing data efficiently by optimizing token sequence lengths, which is crucial for reducing computational costs. Methodological improvements deliver performance similar to the original version with significantly less compute, which benefits developers and scientific research in remote sensing.
A single pane of glass for managing all of your cloud agents (5 minute read)
Oz is a multi-harness control plane for cloud agents, supporting Claude Code, Codex, and Warp Agent. It offers automatic multi-agent orchestration, cross-harness Agent Memory, and improved cost and usage controls. Additionally, Oz provides expanded self-hosting options and enhanced governance features, streamlining agent management and deployment.
Introducing the Ettin Reranker Family (19 minute read)
Six new state-of-the-art CrossEncoder Ettin rerankers built on Ettin ModernBERT encoders have been released, offering models from 17M to 1B parameters. These models, trained with pointwise MSE distillation from a strong 1.54B parameter teacher, provide significant accuracy improvements over legacy models while enhancing speed, especially with Flash Attention 2. The models are particularly notable for their efficiency in retrieve-then-rerank systems and outperform models like ms-marco-MiniLM-L12-v2 on MTEB and NanoBEIR benchmarks.
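For readers unfamiliar with the two ideas named above, here is a minimal sketch of a pointwise MSE distillation loss and a retrieve-then-rerank step. This is not the Ettin training code: the function names, the toy scoring function, and the sample data are all hypothetical, and a real system would use a trained cross-encoder in place of `overlap_score`.

```python
from typing import Callable, List, Sequence


def pointwise_mse(student_scores: Sequence[float],
                  teacher_scores: Sequence[float]) -> float:
    """Mean squared error between student and teacher relevance scores,
    computed independently for each (query, document) pair."""
    if len(student_scores) != len(teacher_scores) or not student_scores:
        raise ValueError("score lists must be non-empty and equal in length")
    total = sum((s - t) ** 2 for s, t in zip(student_scores, teacher_scores))
    return total / len(student_scores)


def rerank(query: str,
           candidates: List[str],
           score_fn: Callable[[str, str], float],
           top_k: int) -> List[str]:
    """Reorder first-stage candidates by reranker score, highest first."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_k]


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: counts shared words."""
    return float(len(set(query.split()) & set(doc.split())))
```

In a retrieve-then-rerank pipeline, a cheap first-stage retriever produces `candidates` and the more expensive reranker only scores that short list, which is why reranker speed matters.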
See how WHOOP, Stripe, and DoorDash use AI to listen to their customers (Sponsor)
The third wave of American philanthropy (23 minute read)
AI is about to generate hundreds of billions in new philanthropic funding.
Advancing content provenance for a safer, more transparent AI ecosystem (6 minute read)
OpenAI strengthens content provenance by implementing C2PA standards and Google DeepMind's SynthID watermarking for AI-generated images.
Index (2 minute read)
Index is a platform for content owners that helps them understand how AI agents use their work and earn revenue when they do.
Cerebras is now running Kimi K2.6 (1 minute read)
Running on Cerebras, Kimi K2.6, a trillion-parameter model, delivers the fastest frontier model performance ever measured by Artificial Analysis, at around 1,000 tokens per second.
TLDR is hiring a Senior Software Engineer, Applied AI ($250k-$350k, Fully Remote)
TLDR's Applied AI team is tasked with making every process at TLDR legible to code, runnable by anyone, and composable into larger workflows. Join a small, fast moving team using the latest AI tools with an unlimited token budget.