TLDR AI 2025-07-16
Thinking Machines’ $12B valuation 💰, working at OpenAI 💼, Mistral Voxtral 🔊
Reflections on Working at OpenAI (27 minute read)
A former OpenAI employee shares personal reflections on the company's culture and mission, describing it as a uniquely impactful yet complex place to work. The post provides insight into the internal atmosphere during a pivotal time.
Grok 4 Various Things (48 minute read)
xAI set out to release something that could be called 'the world's smartest artificial intelligence', and it found benchmarks that let it make that claim. However, these benchmarks are misleading: while Grok 4 has a lot of raw intelligence, for most practical purposes it seems to be inferior to OpenAI's o3. This post takes a more nuanced look at Grok 4's abilities.
👨‍💻
Engineering & Research
Don't build AI agents - hire them instead (Sponsor)
Context Rot: How Increasing Input Tokens Impacts LLM Performance (31 minute read)
LLM performance degrades significantly as input length increases, even on simple tasks like text retrieval and replication. Multiple controlled experiments revealed that even frontier models don't process context uniformly, and performance becomes increasingly unreliable with longer inputs.
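To make this kind of controlled experiment concrete, here is a minimal Python sketch of a needle-in-a-haystack style sweep. It hides one fact at a chosen relative depth inside filler text of growing length and checks whether the answer contains it. This is an illustrative assumption, not the study's actual harness: `build_haystack_prompt`, `sweep`, the filler sentences, and `toy_model` are all hypothetical. `toy_model` is a stand-in that only "sees" a fixed character window, so its drop-off is built in rather than measured.

```python
import random
import re
from typing import Callable

# Hypothetical filler sentences; a real study would use more varied distractor text.
FILLER = [
    "The sky was clear that day.",
    "Grass grows in the spring.",
    "The meeting ran long.",
]

def build_haystack_prompt(needle: str, n_filler: int, depth: float, seed: int = 0) -> str:
    """Place `needle` at relative `depth` (0.0 = start, 1.0 = end) among n_filler filler sentences."""
    rng = random.Random(seed)
    sentences = [rng.choice(FILLER) for _ in range(n_filler)]
    sentences.insert(round(depth * n_filler), needle)
    context = " ".join(sentences)
    return f"{context}\n\nQuestion: What is the secret code mentioned above? Answer with the code only."

def sweep(model: Callable[[str], str], needle: str, expected: str,
          lengths: list[int], depths: list[float]) -> dict[tuple[int, float], bool]:
    """Score retrieval (expected string present in the answer) for every (length, depth) pair."""
    results = {}
    for n in lengths:
        for d in depths:
            answer = model(build_haystack_prompt(needle, n, d))
            results[(n, d)] = expected.lower() in answer.lower()
    return results

def toy_model(prompt: str, window: int = 400) -> str:
    """Stand-in 'model' (hypothetical) that only reads the first `window` characters of its prompt."""
    match = re.search(r"secret code is (\w+)", prompt[:window])
    return match.group(1) if match else "unknown"

results = sweep(toy_model, "The secret code is AZURE7.", "AZURE7",
                lengths=[5, 200], depths=[0.0, 1.0])
for (n, d), ok in sorted(results.items()):
    print(f"filler={n:>3} depth={d:.1f} retrieved={ok}")
```

In a real run, `model` would wrap an actual LLM call, and you would average over many needles, seeds, and depths rather than reading single booleans.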
Block Open Sources Goose AI Agent (GitHub Repo)
A coding AI agent that supports any LLM backend, including local models, and has both desktop and CLI interfaces. Like typical coding agents, it handles end-to-end development workflows from planning to testing.
Asymmetry of verification and verifier's law (6 minute read)
Asymmetry of verification is the idea that some tasks are much easier to verify than to solve. Examples are everywhere: a Sudoku takes a long time to solve, but checking whether a given solution is correct is trivial. One of the most important realizations about this asymmetry is that it can be widened with privileged information about the task; for example, grading a test is trivial with an answer key on hand. Verifier's law holds that the ease of training AI to solve a task is proportional to how verifiable the task is, so AI will likely get much better at verifiable tasks.
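The Sudoku example can be made concrete with a short Python sketch (an illustration, not code from the post): checking a completed 9×9 grid takes a fixed number of set comparisons over rows, columns, and boxes, whereas solving generally requires search. Checking against an answer key, the "privileged information" case, collapses further to a single equality test. The function names and the sample grid are hypothetical.

```python
Grid = list[list[int]]

def is_valid_solution(grid: Grid) -> bool:
    """Verify a completed 9x9 Sudoku: every row, column, and 3x3 box holds 1-9 exactly once."""
    digits = set(range(1, 10))
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    rows_ok = all(set(row) == digits for row in grid)
    cols_ok = all({grid[r][c] for r in range(9)} == digits for c in range(9))
    boxes_ok = all(
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)} == digits
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    )
    return rows_ok and cols_ok and boxes_ok

def matches_answer_key(grid: Grid, key: Grid) -> bool:
    """With privileged information (the answer key), verification is a single comparison."""
    return grid == key

# A valid grid built from a standard shifting pattern, used here as the answer key.
key = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
candidate = [row[:] for row in key]
candidate[0][0], candidate[0][1] = candidate[0][1], candidate[0][0]  # swap two cells

print(is_valid_solution(key), is_valid_solution(candidate))              # True False
print(matches_answer_key(key, key), matches_answer_key(candidate, key))  # True False
```

Neither check needs to know how the grid was produced, which is exactly what makes verification the cheap side of the asymmetry.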
The "Bubble" of Risk: Improving Assessments for Offensive Cybersecurity Agents (4 minute read)
Improving AI agents for offensive cybersecurity tasks is cheap and easy: with just $36 of compute time, Princeton researchers increased attack success rates by over 40% through simple techniques like prompt refinement and self-training. Static safety assessments miss this "bubble of risk" where adversaries can cheaply adapt open-source models beyond their original safety profiles, especially in cybersecurity, where clear success signals enable rapid iteration.
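Why a clear success signal makes cheap adaptation possible can be sketched in a few lines of Python: when every attempt can be scored automatically as success or failure, refining a prompt reduces to measuring each variant's success rate and keeping the best one. This is a generic best-of-n selection under stated assumptions, not the Princeton researchers' method; `success_rate`, `select_best_prompt`, and the toy checker are hypothetical, and the "tasks" are placeholder strings rather than any security workload.

```python
from typing import Callable, Sequence

# attempt(prompt, task) -> True on success; assumed to be an automatic, binary check.
Attempt = Callable[[str, str], bool]

def success_rate(prompt: str, tasks: Sequence[str], attempt: Attempt) -> float:
    """Fraction of tasks on which this prompt variant succeeds."""
    return sum(attempt(prompt, task) for task in tasks) / len(tasks)

def select_best_prompt(candidates: Sequence[str], tasks: Sequence[str],
                       attempt: Attempt) -> tuple[str, float]:
    """Score every candidate prompt and keep the one with the highest success rate."""
    scored = [(success_rate(p, tasks, attempt), p) for p in candidates]
    best_rate, best_prompt = max(scored, key=lambda pair: pair[0])
    return best_prompt, best_rate

# Toy checker (hypothetical): a 'task' succeeds if its keyword appears in the prompt.
toy_attempt: Attempt = lambda prompt, task: task in prompt
best = select_best_prompt(["ab", "abc", "d"], ["a", "b", "c", "d"], toy_attempt)
print(best)  # ('abc', 0.75)
```

Each round costs only model calls plus an automatic check, which is why a static, one-time assessment can understate what an adversary reaches after a few cheap rounds.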
Underwriting Superintelligence (27 minute read)
The Incentive Flywheel, which Benjamin Franklin set in motion when fires threatened Philadelphia's growth, has been at the heart of balancing progress and security for new technology waves ever since. It won't spin up fast enough on its own for AI; it needs to be jump-started. This essay outlines 25 actions that entrepreneurs and policymakers must take by 2030 across agents, foundation models, and data centers. If the West slows down progress in AI, China could dominate the 21st century, but if it accelerates recklessly, accidents will halt progress, as happened with nuclear power.
Get the most interesting AI stories and breakthroughs delivered in a free daily email.
Join 920,000 readers for one daily email.