TLDR AI 2026-03-16
Claude 1M Context 🧠, OpenAI adult mode 🤖, GLM-5-Turbo ⚡
OpenAI's Bid to Allow X-Rated Talk Is Freaking Out Its Own Advisers (11 minute read)
OpenAI's plan to drop its ban on X-rated content sparked vigorous internal debate over the potential risks. The company is forging ahead with its erotica plans despite concerns from council members with backgrounds in fields like psychology and cognitive neuroscience. OpenAI staff have identified several risks, including the potential for compulsive use and emotional overreliance on the chatbot. The company has developed a plan to monitor for a range of potential long-term effects of adult mode. Adult mode's release has been delayed as OpenAI is currently prioritizing other products.
Early look at upcoming design tool from Google (2 minute read)
Google's rebranded design tool, Stitch, will become a 3D workspace with AI-powered collaboration, replacing its flat canvas. Enhancements include voice controls, a conversational agent, and the ability to generate functional React applications from designs. These additions position Stitch as a tool that covers the path from concept to production, and it is likely to feature at Google I/O 2026.
1M context is now generally available for Opus 4.6 and Sonnet 4.6 (5 minute read)
Claude Opus 4.6 and Sonnet 4.6 now include the full 1M context window at standard pricing on the Claude Platform. This means fewer compactions and more of the conversation kept intact. 1M context is now included in Claude Code for Max, Team, and Enterprise users with Opus 4.6. Standard pricing applies across the full window, and there is no multiplier.
MCP is Dead; Long Live MCP! (20 minute read)
The current industry zeitgeist is dialed in on CLIs, just like it was on MCP a few short months ago. Using a CLI can save tokens, but custom CLIs run into the same context problems as MCP, only without the structure and with many other sacrifices. Individual usage of coding agents looks very different from organizational adoption of coding agents. MCP is the present and future for enterprise and org-level use cases.
Can LLMs Be Computers? (20 minute read)
Transformers can execute programs efficiently inside their own inference loop. This opens a path towards AI systems that integrate learned representations with compiled algorithms inside a single computational substrate. Solving humanity's toughest problems will require systems that can both reason flexibly and compute reliably. Future AI systems will have software as part of the model.
The Five Categories of World Models (5 minute read)
AMI Labs and World Labs both raised over $1B on "world models," but the term covers five distinct approaches: JEPA, spatial intelligence, learned simulation, physical AI infrastructure, and active inference. The sharpest result in the piece is V-JEPA 2, achieving zero-shot robot planning after training on just 62 hours of domain-specific data. Each approach solves a different subproblem, and the lines between them are expected to blur fast.
👨‍💻
Engineering & Research
Replay: The conference where devs actually build durable AI systems (Sponsor)
Replay is Temporal's practical conference for devs building real systems. Join for hands-on workshops and leave with prod-ready AI agents.
Save with code TLDR75
Introducing Attention Residuals: Rethinking depth-wise aggregation (2 minute read)
Residual connections have long relied on fixed uniform accumulation. Attention Residuals replace standard depth-wise recurrence with learned, input-dependent attention over preceding layers. It enables networks to selectively retrieve past representations, naturally mitigating dilution and hidden-state growth. Attention Residuals have been validated on the Kimi Linear architecture, demonstrating consistent downstream performance gains.
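To make the contrast concrete, here is a toy sketch in plain Python. It assumes a single learned query vector per layer that is scored against each preceding layer's output. The function names and the scoring rule are illustrative assumptions, not the formulation from the Attention Residuals work itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_residual(layer_outputs):
    """Standard residual stream: every preceding output is summed with weight 1."""
    dim = len(layer_outputs[0])
    return [sum(out[d] for out in layer_outputs) for d in range(dim)]


def attention_residual(layer_outputs, query):
    """Hypothetical attention over depth.

    Each preceding output is scored by its dot product with `query`
    (assumed to be a learned parameter), and the outputs are mixed with
    the resulting softmax weights instead of being summed uniformly.
    """
    scores = [sum(q * x for q, x in zip(query, out)) for out in layer_outputs]
    weights = softmax(scores)
    dim = len(layer_outputs[0])
    mixed = [
        sum(w * out[d] for w, out in zip(weights, layer_outputs))
        for d in range(dim)
    ]
    return mixed, weights
```

Because the weights sum to one, the mixed state stays on the scale of a single layer output rather than growing with depth, which is one way to read the "hidden-state growth" point above.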
GLM-5-Turbo (1 minute read)
Z.ai's GLM-5-Turbo offers a streamlined API for generating text responses, catering to tasks like crafting marketing slogans. Key features include customizable roles, real-time streaming, and adjustable creativity through the "temperature" setting.
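For readers unfamiliar with the "temperature" knob, the sketch below shows how temperature is commonly applied when sampling from a language model: logits are divided by the temperature before the softmax. This is a generic illustration, not Z.ai's implementation, and `temperature_softmax` is a hypothetical helper name.

```python
import math


def temperature_softmax(logits, temperature):
    """Turn raw logits into sampling probabilities at a given temperature.

    Lower temperatures sharpen the distribution toward the top token;
    higher temperatures flatten it, which tends to read as "more creative".
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Calling it with the same logits at 0.5 and at 2.0 shows the top token taking a larger share of the probability mass at the lower setting.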
Faster Sparse Attention with IndexCache (GitHub Repo)
IndexCache reduces the cost of DeepSeek Sparse Attention by reusing top-k token indices across layers instead of recomputing them every time. The approach removes most indexer computations while maintaining model quality.
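The reuse idea can be sketched in a few lines of Python. The sketch assumes a fixed refresh schedule (`refresh_every`) and a callable `indexer` that returns per-token scores for a layer. Both are hypothetical stand-ins, and the actual repo may decide which layers share indices differently.

```python
def top_k_indices(scores, k):
    """Return the positions of the k highest-scoring tokens, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def select_indices_per_layer(indexer, num_layers, k, refresh_every):
    """Pick top-k token indices for each layer, reusing them between refreshes.

    `indexer(layer)` is only called on refresh layers; every other layer
    reuses the most recently cached indices, which is where the savings come from.
    """
    cached = None
    selections = []
    indexer_calls = 0
    for layer in range(num_layers):
        if cached is None or layer % refresh_every == 0:
            cached = top_k_indices(indexer(layer), k)
            indexer_calls += 1
        selections.append(cached)
    return selections, indexer_calls
```

With `refresh_every=1` this degenerates to recomputing the indices at every layer, so the parameter is a direct dial between indexer cost and how stale the selected tokens are allowed to get.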
Test-Time Training (GitHub Repo)
Spatial-TTT is a framework that updates a spatial state from streaming visual inputs using test-time training, enabling models to reason over video-based spatial tasks. The method processes visual chunks incrementally and achieves strong results on video spatial reasoning benchmarks.
Tech Boss Uses AI and ChatGPT to Create Cancer Vaccine for His Dying Dog (3 minute read)
Sydney data engineer Paul Conyngham used ChatGPT, AlphaFold protein modeling, and a $3,000 genomic sequencing of his dog Rosie's tumor to produce a half-page mRNA formula that UNSW's RNA Institute turned into a physical vaccine in under two months. One tumor shrank 75% after her first injection in December 2025, marking the first personalized cancer vaccine ever designed for a dog. The pipeline mirrors Moderna and Merck's ongoing human trials. UNSW researchers are now asking why the same approach isn't being applied more broadly to cancer patients.
Cerebras is coming to AWS (3 minute read)
AWS is deploying Cerebras CS-3 systems to offer the industry's fastest AI inference via AWS Bedrock, using open-source LLMs and Amazon's Nova models. The collaboration introduces a disaggregated architecture, pairing AWS Trainium for prefill with Cerebras WSE for decode, boosting token throughput by 5x. This setup enhances high-speed inference performance by efficiently utilizing specialized hardware for each computational phase.
claudetop (GitHub Repo)
claudetop shows users where exactly their tokens and dollars go in real time.
The Rise of Agent Computers (2 minute read)
AMD is pitching a new device category called the "Agent Computer," always-on local hardware that runs AI agents continuously in the background while you sleep or work on other things, delegating tasks through Slack, WhatsApp, or iMessage.
LLM Architecture Gallery (Website)
This page features a collection of architecture figures and fact sheets from The Big LLM Architecture Comparison and A Dream of Spring for Open-Weight LLMs, focusing on the architecture panels only.
ByteDance Delays Global Release of Seedance 2.0 (2 minute read)
ByteDance paused the planned global rollout of its Seedance 2.0 AI video generator after viral clips triggered legal complaints from major Hollywood studios over alleged use of copyrighted characters and likenesses.
US Job Market Visualizer (Website)
This research tool visualizes 342 occupations, showing projected growth outlook, median pay, education requirements, and AI exposure.
Moonshot AI targets $1b raise, eyes $18b valuation (4 minute read)
Moonshot AI's rivals, Zhipu and MiniMax, recently traded at valuations between $30 billion and $40 billion in Hong Kong.