TLDR AI 2025-09-15
Grok 4 Fast 🤖, post-training 101 📚, xAI layoffs 💼
Baseten Raises $150M Series D at $2.15B Valuation (Sponsor)
As the next generation of AI applications comes to market, ambitious teams are turning to
Baseten for inference infra that not only keeps up with but accelerates their innovation.
Fueled by the growth of customers like OpenEvidence, Abridge, Clay, Sourcegraph, Hex, and Zed, Baseten has announced a $150M Series D at a $2.15B valuation - just 6 months after a $75M Series C.
This funding will help the team meet surging demand and continue powering the fastest-growing AI companies. Try them out here.
xAI Lays Off 500 Data Annotators (1 minute read)
xAI has reportedly laid off a third of its data annotation team as it pivots to expand its specialized AI tutor division.
xAI launches Grok 4 Fast in early access beta with up to 10x speed (1 minute read)
Grok 4 Fast, the newest addition to xAI's lineup, is now available to users on the Grok web interface via the model selector. It can be accessed by enabling a new toggle in the Subscription settings. Marked as an early access beta, Grok 4 Fast is up to 10 times faster than the standard Grok 4. It is optimized for rapid responses, spending minimal processing time even on complex tasks, which limits its performance on creative work.
🧠
Deep Dives & Analysis
Post-Training 101 for LLMs (39 minute read)
A walkthrough of the entire post-training lifecycle of LLMs, from supervised fine-tuning and reward modeling to reinforcement learning methods such as RLHF, along with evaluation best practices.
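To make the reward-modeling step concrete: reward models are typically trained with the standard Bradley-Terry pairwise loss on human preference pairs. A minimal sketch (the scalar rewards here are placeholder values, not a real model's outputs):

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss used in reward modeling:
    -log(sigmoid(r_chosen - r_rejected)). The loss is small when
    the reward model scores the human-preferred response higher."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the margin between chosen and rejected grows,
# and equals -log(0.5) when the model cannot tell them apart.
print(reward_model_loss(2.0, 0.0))
print(reward_model_loss(1.0, 1.0))
```

During RLHF, the policy is then optimized against this learned reward, usually with a KL penalty back to the SFT model.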
The Vertical AI Playbook (Book)
Despite billions invested, 42% of enterprise AI initiatives were discontinued in 2024. The failures were caused not by the models themselves but by how they were embedded into the business. The winners redesign workflows, rethink organizational structures, and take ownership of the service layer where value is created. The next generation of CEOs will treat AI as a labor class and deploy the technology with the same discipline that the most successful serial acquirers apply to capital.
Breaking GPT-OSS: A brief investigation (6 minute read)
This article evaluates different jailbreaking methods against gpt-oss. The model appears to have had robust safety training, holding up against both system-prompt manipulation and refusal-vector attacks. It is tricky to work with, and not all libraries support its idiosyncrasies.
👨‍💻
Engineering & Research
Gartner's latest Magic Quadrant compares the top cloud infrastructure providers (Sponsor)
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs (1 minute read)
Real-world value often stems from the length of task an agent can complete, and marginal gains in single-step accuracy compound into exponential improvements in that length. Models are more likely to make mistakes when the context contains errors from previous turns, and as tasks grow longer, failures arise from mistakes in execution rather than from an inability to reason.
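The compounding claim is easy to verify arithmetically: if each step succeeds independently with probability p, an n-step task succeeds with probability p^n, so the longest horizon achievable at a fixed success rate grows sharply with small accuracy gains. A quick sketch:

```python
import math

def task_success_rate(step_accuracy: float, n_steps: int) -> float:
    # Assuming independent steps, the task succeeds only if every step does.
    return step_accuracy ** n_steps

def achievable_horizon(step_accuracy: float, target: float = 0.5) -> float:
    # Longest task length completed with at least `target` probability.
    return math.log(target) / math.log(step_accuracy)

# A one-point gain in step accuracy roughly doubles the horizon:
print(achievable_horizon(0.98))  # ~34 steps at 50% success
print(achievable_horizon(0.99))  # ~69 steps at 50% success
```

This is why modest single-step improvements can look like diminishing returns on short benchmarks while dramatically extending usable task length.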
The second wave of MCP: Building for LLMs, not developers (3 minute read)
Teams that shift from API-shaped tools to workflow-shaped tools see meaningful improvements in reliability and efficiency. MCP works best when tools handle complete user intentions rather than exposing individual API operations. Large language models don't work like developers - they have to constantly rediscover which tools exist, how to use them, and in what order - so building tools around workflows produces better results.
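The distinction can be sketched with a toy billing example (the customer/invoice data and function names below are hypothetical, not part of any real MCP server):

```python
# Stubbed billing data, for illustration only.
_CUSTOMERS = {"a@example.com": {"id": "c1"}}
_INVOICES = {"c1": [{"id": "inv1", "overdue": True},
                    {"id": "inv2", "overdue": False}]}

# API-shaped: three tools the model must chain in the right order.
def find_customer(email):
    return _CUSTOMERS[email]

def list_invoices(customer_id):
    return _INVOICES[customer_id]

def send_reminder(invoice_id):
    return f"reminded {invoice_id}"

# Workflow-shaped: one tool that captures the complete user intention,
# so the model never has to rediscover call ordering.
def remind_overdue_invoices(email: str) -> list[str]:
    customer = find_customer(email)
    overdue = [i for i in list_invoices(customer["id"]) if i["overdue"]]
    for invoice in overdue:
        send_reminder(invoice["id"])
    return [i["id"] for i in overdue]

print(remind_overdue_invoices("a@example.com"))
```

Exposing `remind_overdue_invoices` as the MCP tool trades API flexibility for far fewer model decisions per task.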
VaultGemma: The world's most capable differentially private LLM (11 minute read)
VaultGemma is a model that Google trained from scratch with Differential Privacy (DP). DP offers a mathematically robust approach to user privacy that adds calibrated noise to prevent memorization. It comes with trade-offs, such as reduced training stability and the need for significantly larger batch sizes. There is still a utility gap between DP-trained and non-DP-trained models, but that gap can be systematically narrowed with more research on mechanism design for DP training.
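The core mechanism behind DP training is DP-SGD: clip each example's gradient contribution, then add Gaussian noise calibrated to the clipping bound. A minimal scalar sketch (real implementations clip per-example gradient vectors by L2 norm; this one-dimensional version is only illustrative):

```python
import random

def dp_average_gradient(per_example_grads, clip_norm=1.0,
                        noise_multiplier=1.0, seed=0):
    """One DP-SGD aggregation step (scalar sketch):
    clip each example's gradient to [-clip_norm, clip_norm],
    sum, add Gaussian noise scaled by clip_norm, then average."""
    rng = random.Random(seed)
    clipped = [max(-clip_norm, min(clip_norm, g)) for g in per_example_grads]
    noise = rng.gauss(0.0, noise_multiplier * clip_norm)
    return (sum(clipped) + noise) / len(per_example_grads)

# Clipping bounds any single example's influence; noise masks the rest.
print(dp_average_gradient([5.0, -0.5], noise_multiplier=0.0))
```

Because the noise scale is fixed per step, averaging over larger batches shrinks its relative effect - which is exactly why DP training pushes toward the much larger batch sizes the post mentions.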
AI Will Not Make You Rich (35 minute read)
Most of the new value created by AI will be captured by consumers, who will see wider and more affordable access to services like medical care, education, and advice. Knowledge-intensive services will get cheaper, allowing consumers to buy more of them. At the same time, services that require person-to-person interaction will get more expensive and take up a greater percentage of household spending. There will be obvious opportunities in both. Think through the implications of knowledge workers becoming more efficient, imagine what markets this efficiency unlocks, and invest in those.
You should be rewriting your prompts (6 minute read)
Models aren't perfectly interchangeable - if you are switching models, rewrite your prompts. Prompts overfit to models the same way models overfit to data. They need to be tested, evaluated, and aligned with the defaults of the new model. Adapting prompts will save tokens while producing better results.
Warp announces Warp Code - the ultimate agentic development environment (Sponsor)
Warp already beat Claude Code and Cursor in agent benchmarks. Now it has a nifty editor, code review, and other tools that make it the perfect AI coding environment.
Try Warp Code for free.
Managing Agent Memory with Sessions (19 minute read)
How to manage short-term memory for AI agents using the OpenAI Agents SDK, employing trimming and compression techniques to keep sessions coherent, fast, and reliable.
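The trimming idea can be illustrated independently of any SDK: drop the oldest turns until the conversation fits a token budget, while always keeping the system message. This sketch is not the OpenAI Agents SDK API - the function name and the whitespace token count are illustrative assumptions:

```python
def trim_session(messages, max_tokens,
                 count_tokens=lambda m: len(m["content"].split())):
    """Keep the most recent messages that fit within max_tokens,
    always preserving the first (system) message. Token counting
    here is a crude whitespace split, for illustration only."""
    system, rest = messages[0], messages[1:]
    budget = max_tokens - count_tokens(system)
    kept = []
    for msg in reversed(rest):       # walk newest -> oldest
        cost = count_tokens(msg)
        if cost > budget:
            break                    # everything older is dropped
        kept.append(msg)
        budget -= cost
    return [system] + list(reversed(kept))
```

Compression goes one step further: instead of dropping old turns outright, an LLM summarizes them into a single short message that stays in context.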
Nvidia steps back from DGX Cloud - stops trying to compete with AWS and Azure (2 minute read)
Nvidia now uses its DGX Cloud capacity for internal research rather than renting it out in competition with the hyperscalers.
OpenAI Grove Program Announcement (1 minute read)
OpenAI has announced a 5-week residency for early-stage technical founders, offering mentorship, early tool access, and peer collaboration to explore new AI product ideas.
Understanding GPU Architecture (35 minute read)
Cornell's Center for Advanced Computing published an interactive workshop covering GPU memory hierarchies, streaming multiprocessors, and detailed breakdowns of NVIDIA's Tesla V100 and Quadro RTX 5000 architectures.
Get the most interesting AI stories and breakthroughs delivered in a free daily email.
Join 920,000 readers for one daily email.