TLDR AI 2025-08-15
Gemma 3 270M, Cohere raises $500M, doomprompting
Deploy the fastest open source LLMs with Baseten (Sponsor)
Deploy Kimi K2, GPT-OSS, DeepSeek, Qwen 3, and more with Baseten Model APIs. Get leading performance and the most competitive time to first token on open source LLMs. Baseten's inference runtime unlocks production performance at any scale.
What that means for you:
- Deploy state-of-the-art open source models on launch day
- Leading performance as measured on OpenRouter:
  - 500+ TPS on GPT-OSS
  - 75+ TPS on Kimi K2
  - 100+ TPS on DeepSeek R1
  - 70+ TPS on Qwen3 Instruct
  - 85+ TPS on DeepSeek v3
- Pay per million tokens, get started in 3 lines of code
- 99.99% reliability
Deploy Baseten Model APIs
Introducing Gemma 3 270M: The compact model for hyper-efficient AI (5 minute read)
Google's Gemma 3 270M is a compact 270 million parameter model designed for task-specific fine-tuning that offers strong instruction-following and text structuring capabilities. The model is highly energy efficient: it uses only 0.75% of the battery for 25 conversations on a Pixel 9 Pro SoC. Ideal for high-volume, well-defined tasks, it allows for rapid iteration and on-device processing, which reduces costs and keeps data private.
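For a rough sense of scale, the battery figure above works out to about 0.03% per conversation. The sketch below does that arithmetic and then extrapolates to a full charge. The extrapolation assumes battery use scales linearly with the number of conversations, which is an assumption made here and not a claim from Google. The function names are illustrative only.

```python
# Figures quoted in the summary: 0.75% battery for 25 conversations.
BATTERY_PERCENT_USED = 0.75
CONVERSATIONS = 25


def battery_per_conversation(percent_used: float, conversations: int) -> float:
    """Average battery percentage consumed by one conversation."""
    return percent_used / conversations


def conversations_per_full_charge(percent_used: float, conversations: int) -> float:
    """Conversations a 100% charge would cover, ASSUMING linear scaling."""
    return 100.0 / battery_per_conversation(percent_used, conversations)


per_conversation = battery_per_conversation(BATTERY_PERCENT_USED, CONVERSATIONS)
full_charge = conversations_per_full_charge(BATTERY_PERCENT_USED, CONVERSATIONS)

print(f"{per_conversation:.2f}% battery per conversation")
print(f"about {full_charge:.0f} conversations per full charge (linear estimate)")
```

Real-world numbers would also depend on conversation length, screen use, and whatever else the phone is doing, so treat the full-charge estimate as a back-of-the-envelope upper bound.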
Leak: OpenAI's browser will use ChatGPT Agent to control the browser (2 minute read)
OpenAI is preparing a Chromium-based browser that will use Agent mode to control actions. Agent mode is being updated to choose between operating on a remote cloud/virtual browser and a first-party local browser. It looks like the cloud browser feature will only be enabled as a fallback when using OpenAI's upcoming browser.
Cohere hits a $6.8B valuation as investors AMD, Nvidia, and Salesforce double down (3 minute read)
Cohere raised $500 million at a $6.8 billion valuation, marking a $1.3 billion increase from its previous round just over a year ago. The Toronto-based company has carved out a distinct position targeting enterprise security rather than consumer applications, recently nabbing longtime Meta research head Joelle Pineau as chief AI officer.
Meta Released DINOv3 (4 minute read)
Meta's DINOv3 is a scalable self-supervised learning model that delivers state-of-the-art results across varied image domains like web and satellite imagery.
Deep Dives & Analysis
Doomprompting Is the New Doomscrolling (12 minute read)
"Doomprompting", the cycle where AI interactions devolve from purposeful queries into mindless iteration loops, has become a new form of digital addiction. What starts as intentional problem-solving degrades into passive negotiation with systems that offer infinite variations without genuine understanding. Unlike doomscrolling's passive consumption, doomprompting creates an illusion of productivity where users feel they're thinking and creating while they're actually outsourcing cognitive effort.
Exploring Foundation Models' Tool-Use Efficacy (4 minute read)
Anthropic's Model Context Protocol (MCP) standardizes LLM tool use. It is quickly becoming the norm for tool integrations. Despite MCP's rapid adoption, experiments show that models often failed to use tools effectively, with GPT-5 performing best but struggling when limited to only relevant tools. This highlights ongoing challenges in generalizing tool use and optimizing AI agent performance in complex scenarios.
Engineering & Research
Cut your QA cycles down to minutes with QA Wolf (Sponsor)
If slow QA processes bottleneck you or your software engineering team and you're releasing slower because of it, you need to check out QA Wolf.
QA Wolf's fully-managed, AI-native service supports web and mobile apps, delivering 80% automated test coverage in weeks and helping teams ship 5x faster by reducing QA cycles to minutes.
With QA Wolf, Drata's team of 80+ engineers achieved 4x more test cases and 86% faster QA cycles.
Rated 4.8/5 on G2. Trusted by Cohere, AutoTrader, Salesloft, and many others.
Schedule a demo to learn more
In-Context Vector Arithmetic (17 minute read)
This paper presents a theoretical framework explaining how transformers perform factual-recall in-context learning (ICL) tasks using vector arithmetic. It builds on hierarchical concept modeling to show that nonlinear residual transformers trained with gradient descent converge on 0-1 loss.
OpenCUA: Open Foundations for Computer-Use Agents (21 minute read)
OpenCUA provides a comprehensive open-source toolkit for building computer-use agents that includes data collection tools, training pipelines, and 22K human demonstration trajectories across three operating systems and 200+ applications. Its key innovation is "reflective long Chain-of-Thought" reasoning that helps agents identify and recover from errors during multi-step tasks.
Crystal (GitHub Repo)
Crystal is an Electron desktop application that lets users run, inspect, and test multiple Claude Code instances simultaneously using git worktrees. Users can resume conversations anytime and test changes instantly without leaving Crystal. Crystal has built-in rebase and squash operations, and users can view diffs and track modifications. It allows users to test, compare approaches, and manage AI-assisted development workflows in one desktop app.
Is chain-of-thought AI reasoning a mirage? (10 minute read)
Whether chain-of-thought reasoning is real or not is primarily a philosophical question. It depends on having a clear definition of what 'real' reasoning is, and there is no single consensus definition of ideal reasoning. Good model reasoning papers either directly assess the quality of human reasoning skills or, ideally, provide a tight philosophical definition of what 'real' reasoning is, and they use tasks that actually require reasoning rather than simple computation. They also use tasks that have many paths to success and don't draw sweeping conclusions about 'real' reasoning.
Twitter's Ex-CEO Is Moving Past His Elon Musk Drama and Starting an AI Company (7 minute read)
Parag Agrawal, Twitter's former CEO, started a company called Parallel Web Systems Inc. after being fired by Elon Musk. The company has built an infrastructure product to help AI agents search the web to find the most accurate information. One of its initial product lines is an API called 'deep research' that draws from a variety of relevant data sources on the web to create an organized analysis with citations and confidence scores that measure reliability. While Agrawal likes to say his customer is the AI agent itself, for now, Parallel's customers are developers and businesses that want to integrate its tools into their AI workflows.