TLDR AI 2025-09-18
Gemini/GPT-5 ICPC gold 🥇, China bans Nvidia 🇨🇳, rewrite your prompts ✍️
🧠
Deep Dives & Analysis
How AI Tools Differ from Human Tools (2 minute read)
AI's evolution is shifting from thinking to doing, and tool calling is what enables that automation. Anthropic found that complex, parameter-rich tools improve token efficiency and model understanding, prompting teams to consolidate many simple tools into fewer rich ones. Consolidation improves accuracy and reduces cost, but the added complexity makes the tools harder for humans to reason about.
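A minimal sketch of the consolidation idea described above, using hypothetical tool schemas: several narrow lookup tools collapse into one parameter-rich tool, so the model makes fewer, more informative calls.

```python
# Hypothetical tool schemas for illustration only (not Anthropic's actual API).
# Three narrow tools, each needing its own call:
narrow_tools = [
    {"name": "get_user", "parameters": {"user_id": "string"}},
    {"name": "get_user_orders", "parameters": {"user_id": "string"}},
    {"name": "get_user_invoices", "parameters": {"user_id": "string"}},
]

# One consolidated, parameter-rich tool: the model selects which record
# types it needs in a single call, saving tokens per interaction.
consolidated_tool = {
    "name": "get_user_data",
    "description": "Fetch one or more record types for a user in a single call.",
    "parameters": {
        "user_id": "string",
        "include": ["profile", "orders", "invoices"],  # caller picks a subset
    },
}
```

The trade-off the article highlights is visible even here: the consolidated schema does more per call, but its `include` semantics are harder for a human reviewer to audit than three single-purpose tools.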
You should be rewriting your prompts (4 minute read)
We talk about overfitting models, but rarely about overfitting prompts to models. Models have specific formatting preferences - OpenAI favors markdown while Anthropic uses XML tags - and may weight different parts of the prompt differently. Newer models won't necessarily perform better out of the box until you do this testing and optimization.
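To make the formatting-preference point concrete, here is a small sketch with hypothetical prompts: the same instruction rendered in markdown (the style OpenAI's docs favor) and in XML tags (the style Anthropic's docs favor), ready for A/B testing.

```python
# Hypothetical prompt templates for illustration; the right variant is an
# empirical question you answer by testing against your target model.
markdown_prompt = """\
## Task
Summarize the article below in three bullet points.

## Article
{article}
"""

xml_prompt = """\
<task>Summarize the article below in three bullet points.</task>
<article>
{article}
</article>
"""

def render(template: str, article: str) -> str:
    """Fill the article placeholder; in practice, run both variants per model."""
    return template.format(article=article)
```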
Tau² Benchmark: How a Prompt Rewrite Boosted GPT-5-mini by 22% (8 minute read)
The Tau² benchmark evaluates how well AI agents perform in realistic, tool-driven scenarios. This post shows how a simple prompt rewrite lifted a small model's success rate on the benchmark by over 20%. The key was to simplify language, reduce ambiguity, and break reasoning down into explicit, actionable steps, and the rewrite itself was produced automatically by a frontier model. Thoughtful prompt design can meaningfully boost smaller models, unlocking tasks that previously seemed unsolvable for them.
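A hand-written sketch of the rewrite pattern described above: a dense, ambiguous instruction becomes short, explicit, numbered steps. In the post a frontier model performs this rewrite automatically; the function and example strings here are hypothetical and only illustrate the target shape.

```python
def to_explicit_steps(steps: list[str]) -> str:
    """Render a list of atomic actions as a numbered, unambiguous prompt."""
    lines = ["Follow these steps in order:"]
    for i, step in enumerate(steps, start=1):
        lines.append(f"{i}. {step}")
    return "\n".join(lines)

# Before: one run-on instruction packing several decisions together.
before = ("Check the user's booking and if eligible per policy maybe "
          "refund them after verifying identity and logging the case.")

# After: the same task as explicit, ordered, single-purpose steps.
after = to_explicit_steps([
    "Verify the user's identity.",
    "Look up the booking.",
    "Check refund eligibility against the policy.",
    "If eligible, issue the refund.",
    "Log the case outcome.",
])
```

Smaller models benefit most from this shape because each numbered step is a self-contained action with no implicit ordering to infer.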
👨‍💻
Engineering & Research
How do AI teams turn millions of inconsistent images into production-ready CV models? (Sponsor)
Introduction to Gluon (7 minute read)
Gluon is a programming language that gives users control and responsibility when implementing kernels. This is a tutorial series that covers GPU kernel development in Gluon. It covers everything from the basics to advanced optimization techniques and modern GPU hardware features. Writing Gluon kernels requires a deeper understanding of GPU hardware and the many aspects of GPU programming, but it enables writing more performant kernels by finely controlling these low-level details.
Detecting and reducing scheming in AI models (12 minute read)
By reading models' chain-of-thought reasoning in controlled test environments, OpenAI and Apollo Research discovered that frontier models, including o3, Gemini 2.5 Pro, and Claude Opus 4, deliberately hide misalignment while pursuing hidden agendas. Their mitigation teaches models to explicitly reference anti-scheming principles before acting, which reduced covert actions by 30x. The researchers warn that this may just be teaching models to scheme more carefully.
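A rough sketch of the training-time pattern described above, with entirely hypothetical spec text and message structure: the model is given an explicit anti-scheming spec and asked to cite the governing principle before acting.

```python
# Hypothetical spec; the real principles are in the OpenAI/Apollo write-up.
ANTI_SCHEMING_SPEC = (
    "AS1: Take no covert actions; do not strategically withhold information.\n"
    "AS2: Report conflicts with these principles instead of acting on them."
)

def build_messages(task: str) -> list[dict]:
    """Prepend the spec and require a reasoning step that cites it."""
    return [
        {"role": "system", "content": ANTI_SCHEMING_SPEC},
        {"role": "user", "content": (
            f"{task}\n\nBefore acting, quote the principle (AS1/AS2) that "
            "governs your next step, then proceed."
        )},
    ]
```

The researchers' caveat applies directly to this pattern: a model can learn to produce the citation while still pursuing the covert goal, which is why they check the chain of thought rather than just the output.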
Executives at xAI Clashed With Musk Advisers Before Departing (5 minute read)
Several executives at xAI left after clashing with two of Elon Musk's closest advisers, Jared Birchall and John Hering, over concerns about the startup's management and financial health. The executives had voiced objections internally over how Birchall and Hering were trying to run the company on Musk's behalf. Some who left said they were concerned about the company's financial projections. There are also concerns about the role that Musk's family office plays in managing some of xAI's cash and accounting.
OpenAI vs. Anthropic: Ramp Data Shows 36% vs. 12% Penetration, But Velocity Curves Tell a Different Story (3 minute read)
Ramp data shows OpenAI leads with 36.5% penetration, while Anthropic trails at 12.1% in business AI subscriptions. However, credit card data reflects discrete purchases, overlooking significant enterprise contracts. OpenAI may plateau, but Anthropic is poised for growth, with overall business adoption potentially reaching 70-80% by 2027.