TLDR AI 2025-07-25
Cursor Bugbot, GPT-5 in August, startup AI budgets
Deep Dives & Analysis
Kimi K2 vs. Claude 4 Sonnet for Agentic Coding (13 minute read)
This post benchmarks Moonshot AI's Kimi K2, a low-cost open-source model optimized for agentic coding tasks, against Anthropic's Claude 4 Sonnet. The comparison covers code quality, performance, and pricing. Kimi K2 showed competitive results and a significant cost advantage.
AI As Profoundly Abnormal Technology (58 minute read)
AI technology will experience a profound jump in capabilities within the next decade. While some groups predict that AI adoption will be slow due to concerns about safety and other barriers to progress, the technology seems to be advancing very quickly. Control without alignment may not be sufficient to mitigate risk. Developers should prepare for risks that aren't immediate, as doing otherwise would be irresponsible.
Budgeting for AI in Your Startup (2 minute read)
Startups should allocate 10-15% of their R&D budget to AI, as engineer salaries average $200k and AI tools cost about $30k per year. AI adoption varies, with AI-native startups potentially spending more. Companies should adjust as AI becomes more integrated into operations.
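The arithmetic behind the 10-15% figure can be sketched as follows. This is a rough illustration, not the post's own calculation: it assumes the $30k tool cost is per engineer per year and that R&D spend consists only of salaries plus AI tools. The function name `ai_budget_share` is hypothetical.

```python
def ai_budget_share(salary_per_engineer: float, ai_cost_per_engineer: float) -> float:
    """Return AI tool spend as a fraction of total per-engineer R&D cost.

    Assumption: R&D cost per engineer = salary + AI tools, nothing else.
    """
    total = salary_per_engineer + ai_cost_per_engineer
    return ai_cost_per_engineer / total


# Figures quoted in the summary: $200k average salary, about $30k in AI tools.
share = ai_budget_share(200_000, 30_000)
print(f"AI share of R&D budget: {share:.1%}")
```

Under these assumptions the share comes out at roughly 13%, inside the suggested 10-15% range. An AI-native startup would raise the second argument to reflect its heavier tool spend.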
Engineering & Research
How Delve turned 4 newsletter ads into $1M pipeline (Sponsor)
Delve ran just 4 ads in TLDR newsletters. The payoff?
- 66 high-quality leads (including enterprise companies)
- $1M in pipeline
- 52x ROI
We break down Delve's exact strategy, detailed results, and share the actual ads that crushed it.
Read the Delve case study.
Gemini 2.5 Pro Capable of Winning Gold at IMO 2025 (1 minute read)
Solving International Mathematical Olympiad (IMO) problems requires deep insight, creativity, and formal reasoning. Large language models usually struggle with Olympiad-level tasks, but Google's Gemini 2.5 Pro solved five of the six IMO 2025 problems correctly. The result shows how important it is to develop optimal strategies to harness the full potential of powerful models for complex reasoning tasks.
Qwen-MT: Where Speed Meets Smart Translation (6 minute read)
The latest update of Qwen-MT (qwen-mt-turbo) builds upon the powerful Qwen3, achieving significant improvements in translation accuracy and linguistic fluency. The updated model leverages trillions of multilingual and translation tokens to comprehensively enhance its multilingual understanding and translation capabilities. Its key features include multilingual support for 92 languages, high customizability, low latency, and cost efficiency. This post provides a quick start guide along with benchmark results.
Memories.ai introduces a new model that remembers at superhuman scale (2 minute read)
Memories.ai is a research lab built by ex-Meta researchers to take on the challenge of true, persistent video understanding at any scale. It recently introduced an advanced memory system for video AI inspired by human memory. The system enables persistent Video Chat across entire archives. Memories.ai aims to become an indispensable tool for creators, marketers, researchers, and developers seeking to unlock the value hidden inside their video archives.
TimeScope Video Understanding Benchmark (18 minute read)
TimeScope is an open-source benchmark designed to test vision-language models on long videos by inserting short "needle" clips. It measures localized retrieval, information synthesis, and fine-grained temporal perception, showing that many top models still lack robust temporal understanding.
$1 Billion Worth of Nvidia AI Chips Smuggled to China Despite Export Controls (10 minute read)
A thriving black market has moved over $1 billion worth of banned Nvidia B200 chips to China in just 3 months by routing shipments through Southeast Asian countries like Malaysia and Thailand before final delivery. Chinese distributors are openly advertising ready-to-deploy server racks at 50% premiums and developing new routes through European countries as the US prepares to tighten controls on regional intermediaries.
The Three Layers of ROI for AI Agents (4 minute read)
This post presents a three-layer framework for determining where the ROI of AI agents comes from. The first layer is labor efficiency: it is easy to explain, but developers need to understand that AI efficiency doesn't equal immediately realized ROI. The second layer is net-new revenue: the backlog of work businesses never got to before AI, which now creates new value. The final layer is optimization: AI models bring decision fluency and ML brings decision precision, which together create value.
Google's AI Overviews have 2B monthly users, AI Mode 100M in the US and India (2 minute read)
Google's AI Overviews now serve 2 billion monthly users across 200 countries, up from 1.5 billion in May, and the company's monthly token processing has doubled to 980 trillion tokens.
I've joined Cognition (2 minute read)
Prem Qu Nair, employee #2 at Windsurf, has joined Cognition to work on the future of software engineering.
NEW LABS EXPERIMENT (1 minute read)
Opal, Google Labs' new tool that helps users build and share AI mini-apps by linking together prompts, models, and tools while using simple, natural language, is now available in US-only public beta.