TLDR AI 2026-05-12
Interaction Models 🤖, Gemini Omni surfaces 🎥, SpaceXAI 🚀
Interaction Models: A Scalable Approach to Human-AI Collaboration (9 minute read)
Thinking Machines Lab introduced a research preview of interaction models for real-time human-AI collaboration across audio, video, and text. The models are trained from scratch with a multi-stream design for real-time responsiveness, enabling continuous exchange and removing traditional turn-based limits. This scalable approach promises enhanced interactivity and intelligence, with practical applications across a range of domains.
Elon Musk Announces xAI Will Become SpaceXAI Division (2 minute read)
Elon Musk announced that xAI will dissolve and be integrated into SpaceX as a new division called SpaceXAI. SpaceXAI will handle AI projects such as Grok and the social media platform X, branding them under SpaceX. The change streamlines operations, enhances vertical integration, and aligns AI efforts with SpaceX's strategic goals.
Google's Gemini Omni video model surfaces ahead of I/O debut (2 minute read)
Google's Gemini Omni video model has surfaced ahead of I/O, integrating video remixing and editing directly into chat. Early feedback highlights strong editing capabilities, such as watermark removal and object swapping, though the model lags competitors like ByteDance's Seedance 2 in raw cinematic quality. It may launch in tiered versions, possibly Flash and Pro, as part of a broader strategy to unify modalities under Gemini.
The Inference Shift (8 minute read)
Cerebras' surging IPO signals a coming split between "answer inference" optimized for token speed and "agentic inference" optimized for memory hierarchy. Cerebras' WSE-3 has 44GB of on-chip SRAM at 21 PB/s, roughly 6,000 times the memory bandwidth of an H100, making it well suited to human-facing, low-latency answers such as voice and AI wearables, but unsuitable when KV caches and model weights exceed on-chip capacity.
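The ~6,000x figure checks out as back-of-envelope arithmetic. The 21 PB/s number is from the article; the ~3.35 TB/s H100 HBM3 bandwidth is an assumption taken from NVIDIA's published spec.

```python
# Compare WSE-3 on-chip SRAM bandwidth to H100 HBM bandwidth.
WSE3_SRAM_BW_TBS = 21_000.0   # 21 PB/s from the article, expressed in TB/s
H100_HBM_BW_TBS = 3.35        # assumed H100 HBM3 bandwidth in TB/s

ratio = WSE3_SRAM_BW_TBS / H100_HBM_BW_TBS
print(f"WSE-3 SRAM vs H100 HBM: ~{ratio:,.0f}x")  # roughly 6,000x
```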
Foundation Model Scaling (34 minute read)
AWS detailed how foundation model scaling has shifted beyond pre-training into post-training and test-time compute, alongside the distributed infrastructure required to support each stage efficiently.
👨‍💻 Engineering & Research
TLDR is hiring a Senior Software Engineer, Applied AI ($250k-$350k, Fully Remote)
TLDR's Applied AI team is tasked with making every process at TLDR legible to code, runnable by anyone, and composable into larger workflows. Join a small, fast-moving team using the latest AI tools with an unlimited token budget.
Learn more.
Trajectory Models for Few-Step Diffusion (22 minute read)
Normalizing Trajectory Models replace standard diffusion denoising steps with conditional normalizing flows, enabling four-step image generation while retaining exact likelihood training and supporting self-distillation.
Agentic Test-Time Scaling (GitHub Repo)
AutoTTS explores automated strategy discovery for test-time scaling by using coding agents to iteratively refine controller logic inside a replay environment, avoiding gradient updates and online LLM calls.
Long Video Generation (4 minute read)
A²RD introduced an agentic autoregressive diffusion framework for generating long coherent videos through iterative retrieval, synthesis, refinement, and memory updates.
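The retrieve → synthesize → refine → memory-update cycle reduces to a plain loop. In this minimal sketch, every callable is a hypothetical stand-in for the framework's actual components, which the write-up does not name.

```python
def generate_long_video(prompt, num_chunks, retrieve, synthesize, refine):
    """Agentic autoregressive loop: each new chunk is conditioned on
    context retrieved from a memory of earlier chunks, refined for
    consistency, then written back so later chunks stay coherent."""
    memory, video = {}, []
    for i in range(num_chunks):
        context = retrieve(memory, prompt)   # pull relevant history
        chunk = synthesize(prompt, context)  # autoregressive diffusion step
        chunk = refine(chunk, context)       # consistency refinement
        memory[i] = chunk                    # memory update
        video.append(chunk)
    return video
```

In the real framework each step is a learned or agentic component; the dictionary memory here is only a placeholder for whatever retrieval store A²RD uses.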
The Main Path to Truly Creative AI (4 minute read)
AI lacks human-like creativity because it has no intrinsic drives or subjective experiences. Emulating feelings could enhance AI's creative capabilities but raises ethical concerns: designing AI that genuinely feels and desires could lead to unintended consequences and a responsibility akin to parenting.
Auto-Improving Software (5 minute read)
Bedi runs an entire agent development lifecycle through five Claude Code prompts that scaffold, harden against spec, add capabilities, fix eval failures, and reconcile drift between docs, code, and config across his Agno-based platform. The Improve loop derives 8-12 probes from an agent's instructions, runs each against the live container via cURL, and judges PASS or FAIL from the container logs. It then iterates up to five rounds, picking levers such as tightening rules, swapping tools, or bumping num_history_runs, until the probes pass, while Hill Climb runs the saved eval suite and fixes regressions in place.
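The Improve loop described above is a simple control structure. A minimal sketch, assuming a `check` callable that wraps the cURL probe and an `apply_lever` hook for the corrective actions (both hypothetical names, not from Bedi's code):

```python
import subprocess

MAX_ROUNDS = 5  # the article's cap on improve iterations

def curl_probe(url: str, payload: str) -> bool:
    """Send one probe to the live container via cURL. The article
    judges PASS/FAIL from container logs; checking the HTTP response
    body is a simplification here."""
    result = subprocess.run(
        ["curl", "-s", "-X", "POST", url, "-d", payload],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and "error" not in result.stdout.lower()

def improve_loop(probes, check, apply_lever):
    """Run every probe; if any fail, apply one corrective lever
    (tighten rules, swap tools, bump num_history_runs, ...) and
    retry, up to MAX_ROUNDS rounds."""
    for _ in range(MAX_ROUNDS):
        failures = [p for p in probes if not check(p)]
        if not failures:
            return True   # all probes PASS
        apply_lever(failures)
    return False          # still failing after five rounds
```

In the real loop, `check` would be something like `lambda p: curl_probe(agent_url, p)` and `apply_lever` would hand the failing probes back to the coding agent to pick a lever.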
Codex is for prosumers - here's why (and how) to switch (4 minute read)
a16z's Olivia Moore migrated her agentic workflows from Claude Cowork and Claude in Chrome to OpenAI's Codex, and recommends most non-technical knowledge workers do the same now that the February desktop app, Plugins, and Automations collapse the switching between ChatGPT, Claude, and Cowork into one product. Codex ships one-click installable Skills that she expects to anchor an internal and cross-user marketplace, given that non-programmer Skill setup attempt rates likely sit below 10% on Claude, while Codex Pets give low-friction task status updates to users who do not live in an IDE.
Get the most interesting AI stories and breakthroughs delivered in a free daily email.
Join 920,000 readers for one daily email