TLDR AI 2026-03-13
Claude Visuals 💻, xAI poaches Cursor leads 💼, CursorBench 🧑‍💻
Cursor is raising at a $50 billion valuation (3 minute read)
Elon Musk just pulled two of Cursor's product leaders into a $1.25 trillion company and told them to build xAI's coding product. AI coding is now a $5 billion+ market. xAI is the only frontier lab without a coding product printing money. AI labs have figured out that developers are the highest-willingness-to-pay users on the planet.
Claude now creates interactive charts, diagrams, and visualizations (5 minute read)
A beta version of Imagine with Claude is coming to Claude Chat. The feature can create custom charts, diagrams, and other visualizations, and then tweak and modify its creations as the conversation develops. The feature will be on by default, with Claude deciding when to build a visual for something. Users can directly request a visual. Videos showing example visualizations are available in the article.
Meta Delays Rollout of New AI Model After Performance Concerns (6 minute read)
Meta's Avocado model outperforms the company's previous model and the Google Gemini 2.5 model from March, but it doesn't perform as strongly against leading models from Google, OpenAI, and Anthropic. The company has decided to delay the model's release to at least May. Experts believe Meta still has time to catch up to rivals. The company's Llama 4 release fell short of expectations last year.
AI Writes Buggy Code. A Silicon Valley Start-Up Wants to Fix It (4 minute read)
Axiom raised a $200M Series A led by Menlo Ventures at a $1.6B valuation, one year after founding and with roughly 20 employees. It is building what it calls Verified AI: systems that produce formally verified outputs in the programming language Lean, where every reasoning step is machine-checkable and errors are caught deterministically rather than statistically. The company originally trained on math proofs and then transferred that skill to code verification, which lets it prove not only that code produces correct outputs but also that it won't create unexpected attack surfaces.
Reverse-engineering Claude's generative UI - then building it for the terminal (27 minute read)
Claude's generative UI is a tool call that returns HTML, which is injected into the DOM and parsed incrementally as tokens stream. This lets interactive widgets render inline in Claude conversations. Documentation is lazy-loaded so that context is inserted only on demand. The terminal version spawns Glimpse windows with bidirectional JSON communication, so the terminal stays a terminal while the visual content gets a real browser engine.
Agentic Commerce (11 minute read)
Commerce has historically been closer to a seller's market. Consumers are largely shown what platforms choose to surface. AI agents will gradually shift the power toward the buyer. Instead of navigating the layer of market and platform incentives, AI will enable consumers to access the best product for their needs directly. This dynamic may be most powerful in B2B commerce, which has historically been opaque, fragmented, and relationship-driven.
Institutional AI vs Individual AI (16 minute read)
Institutional AI could unlock far more value than individual AI by redesigning organizations alongside technology. Unlike individual AI, which boosts productivity with little firm value impact, institutional AI focuses on coordination, finding signals in data, countering bias, and scaling revenue. Successful organizations will be those that integrate AI as a foundational component, similar to the industrial shift to electric assembly lines.
👨‍💻
Engineering & Research
How we compare model quality in Cursor (7 minute read)
Cursor uses a hybrid online-offline eval process to keep its understanding of model quality aligned with what developers actually do. The offline part uses CursorBench, an internal eval suite based on real Cursor sessions from Cursor's engineering team. It measures multiple dimensions of agent performance, including solution correctness, code quality, efficiency, and interaction behavior. The online part involves a controlled analysis on live traffic, which catches regressions that offline suites miss. Together, this loop keeps Cursor's notion of model quality grounded in production as workflows change.
We built RLM for coding. And it F*cking rocks. Swarm native agents are here to stay (13 minute read)
Slate is a frontier agent that uses a code environment for swarm orchestration. It can programmatically orchestrate and solve tasks by running a massive number of subagents. Slate supports a range of models and automatically selects the right model for the job. It uses novel context engineering and maximizes caching to lower costs.
OpenClaw Reinforcement Learning Framework (GitHub Repo)
OpenClaw-RL is an asynchronous reinforcement learning framework for training personalized AI agents from everyday conversations. It supports large-scale environment parallelization for developing more general agents.
I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead (14 minute read)
A single run(command="...") tool with Unix-style commands outperforms a catalog of typed function calls. Unix made the design decision 50 years ago to make everything a text stream. Large language models made an almost identical decision by making everything tokens. The text-based system that Unix uses is a natural fit for large language models. They can act as terminal operators faster than any human. Developers just need to take what Unix has proven over the last half-century and hand it directly to AI.
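As a rough illustration of the single-tool idea, here is a minimal sketch in Python. It is not the author's implementation: the `timeout` and `max_chars` parameters, the `[exit N]` suffix, and the truncation marker are assumptions made for this example, and it assumes a POSIX shell is available.

```python
import subprocess


def run(command: str, timeout: float = 10.0, max_chars: int = 4000) -> str:
    """Hypothetical single agent tool: run one shell command, return plain text.

    Pipes and redirects work because the whole string is handed to the shell.
    """
    try:
        proc = subprocess.run(
            command,
            shell=True,           # let the shell interpret pipes like `a | b`
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode bytes to str
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"[error] timed out after {timeout}s"

    # Merge both streams into one text blob, mirroring the "everything is a
    # text stream" idea; truncate so a huge output cannot flood the context.
    output = proc.stdout + proc.stderr
    if len(output) > max_chars:
        output = output[:max_chars] + "\n[truncated]"
    return f"{output}\n[exit {proc.returncode}]"


if __name__ == "__main__":
    # The model composes behavior from standard Unix commands instead of
    # choosing among many typed functions.
    print(run("echo hello | tr a-z A-Z"))
```

Because this executes arbitrary shell input, a real deployment would run it inside a sandbox or container rather than on a host machine.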
The Shape of the Thing (10 minute read)
AI has moved from an era of co-intelligence to one of AI management, with systems like Claude Code completing complex tasks autonomously. Exponential improvements in AI capabilities have led to radical new ways of working, such as StrongDM's AI-driven Software Factory, and prompted significant market and employment shifts.
China's ByteDance Gets Access to Top Nvidia AI Chips (5 minute read)
ByteDance is assembling computing power with high-end Nvidia chips outside China. It is working with a Southeast Asian company called Aolani Cloud on plans to use around 500 Nvidia Blackwell computing systems in Malaysia. The hardware involved is valued at more than $2.5 billion. ByteDance plans to use the computing power for AI research and development outside of China.