TLDR AI 2025-12-17
ChatGPT removes router, GPT-Image-1.5, Meta SAM Audio
OpenAI Rolls Back ChatGPT's Model Router System for Most Users (3 minute read)
OpenAI quietly abandoned the automatic model router, a key feature of its GPT-5 launch, for Free and Go tier users, defaulting them to GPT-5.2 Instant instead of routing complex queries to reasoning models. The router had increased free users' reasoning model usage from under 1% to 7%, but a source says it hurt daily active user numbers because people don't want to wait 20 seconds for a better answer.
GPT-Image-1.5 (9 minute read)
OpenAI has launched GPT-Image-1.5, a faster and more accurate image generation model with improved instruction-following and editing. The rollout follows competitive pressure from Google's Gemini 3 and Nano Banana Pro models, which recently led key benchmarks.
SAM Audio (3 minute read)
Meta has introduced SAM Audio, a new model capable of isolating specific sounds from complex audio using text, visual, or time-based prompts. As part of the Segment Anything family, it brings flexible, prompt-driven sound editing to tasks like removing background noise or isolating instruments in a recording.
Deep Dives & Analysis
Prompt caching: 10x cheaper LLM tokens, but how? (30 minute read)
Cached input tokens are currently 10 times cheaper than regular input tokens on both OpenAI's and Anthropic's APIs, and caching can even reduce latency by up to 85% for long prompts. Clearly these providers aren't just saving responses and reusing them. This post looks at what's actually cached and how prompt caching works in general.
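The economics described above can be sketched with a toy model. The assumption (consistent with the post's framing) is that providers cache per-prefix model state keyed on the exact token prefix, and bill cache hits at roughly 10% of the normal input rate; the names, prices, and cache structure below are illustrative, not any provider's actual implementation.

```python
import hashlib

CACHE_DISCOUNT = 0.10          # cached input tokens billed at ~10% of the normal rate
PRICE_PER_INPUT_TOKEN = 3e-6   # hypothetical $3 per million input tokens

kv_cache = {}  # prefix hash -> placeholder for cached model state

def prefix_key(tokens):
    """Key the cache on the exact prefix; any change invalidates it."""
    return hashlib.sha256("|".join(tokens).encode()).hexdigest()

def bill_request(prefix_tokens, suffix_tokens):
    """Return the input cost, reusing the cached prefix when it exists."""
    key = prefix_key(prefix_tokens)
    if key in kv_cache:
        cached, fresh = len(prefix_tokens), len(suffix_tokens)
    else:
        kv_cache[key] = "kv-state"  # first request pays full price and populates the cache
        cached, fresh = 0, len(prefix_tokens) + len(suffix_tokens)
    return (cached * CACHE_DISCOUNT + fresh) * PRICE_PER_INPUT_TOKEN

system_prompt = ["You", "are", "a", "helpful", "assistant"] * 200  # 1,000-token shared prefix
first = bill_request(system_prompt, ["hello"])
second = bill_request(system_prompt, ["goodbye"])
print(f"first: ${first:.6f}, second: ${second:.6f}")  # second call pays the discount on the prefix
```

Because the key covers the whole prefix, even a one-token edit near the start of a prompt forces a full-price recomputation, which is why providers advise putting stable content (system prompts, tools, documents) first.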
Inference Economics 101: Reserved Compute versus Inference APIs (20 minute read)
The infrastructure layer is no longer converging on a single 'best' model. Instead, it is splitting into two durable and economically attractive markets. One side provides reserved and hourly compute platforms designed to deliver predictability, control, and determinism for customers willing to operate infrastructure directly. The other side operates inference APIs, trading some control for scale while absorbing utilization risk and abstracting complexity in exchange for cost-efficiency and speed. Understanding the split is essential to evaluating where value will accrue as inference becomes ubiquitous.
Is resumable LLM streaming hard? No, it's just annoying (13 minute read)
The state-of-the-art in large language model streaming is surprisingly bad. Both Gemini and Claude require users to refresh the page if their conversation is interrupted mid-stream. This guide shows readers how to implement good, resumable LLM streams. This allows users to refresh tabs mid-stream, switch between chats mid-stream, navigate to other sections of an app without interrupting a stream, and continue streams through momentary internet connection disruptions.
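The core server-side pattern the guide describes can be sketched as follows: persist each stream's tokens as they are generated, and let a reconnecting client pass a cursor (e.g. an SSE `Last-Event-ID`) so the server replays only what was missed. `StreamStore` and its method names are hypothetical, a minimal in-memory sketch rather than the guide's actual implementation.

```python
class StreamStore:
    """Buffers each stream's tokens so a client can resume after a disconnect."""

    def __init__(self):
        self.streams = {}  # stream_id -> list of tokens generated so far

    def append(self, stream_id, token):
        self.streams.setdefault(stream_id, []).append(token)

    def read_from(self, stream_id, cursor):
        """Replay every (index, token) event at or after `cursor`."""
        tokens = self.streams.get(stream_id, [])
        return list(enumerate(tokens))[cursor:]

store = StreamStore()
for tok in ["The", " answer", " is", " 42"]:
    store.append("chat-1", tok)  # model output lands in the store as it streams

# The client saw events 0-1, dropped its connection, and resumes from event 2:
resumed = store.read_from("chat-1", cursor=2)
print(resumed)  # [(2, ' is'), (3, ' 42')]
```

In production the buffer would live in something durable like Redis rather than process memory, so any server instance can serve the resumed connection.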
The Shift From Text to Dynamic AI Experiences (3 minute read)
OpenAI's new CEO of Applications outlines how ChatGPT is moving beyond chat toward a "fully generative UI" that surfaces the right components based on what you're doing: a dedicated image studio, inline writing blocks, quick visual answers for measurements or sports scores, and tappable highlights that pull up additional context.
Meta AI Glasses with Audio Focus (2 minute read)
Meta updated its smart glasses with a feature to isolate conversations in noisy environments and a new Spotify integration that plays music based on what you're looking at. The updates roll out first on Ray-Ban Meta and Oakley Meta HSTN glasses in the US and Canada.