TLDR AI 2025-06-20
GPT-5 this summer 5️⃣, LLM economics 💰, Software 3.0 💻
Algolia's new MCP server makes AI search a breeze (Sponsor)
Tired of spending valuable time analyzing, monitoring and searching through your index? Algolia's
new MCP server makes these tasks simple.
AI agents can now easily handle prompts like:
- "Search my 'products' index for Nike shoes under $100."
- "Add the top 10 programming books to my 'library' index using their ISBNs as objectIDs."
- "Show me the top 10 searches with no results in the DE region from last week."
More than 18,000 customers across 150+ countries use Algolia to deploy fast, scalable search in their applications and websites.
See more examples and get started here →
Sam Altman Says GPT-5 Coming This Summer, Open to Ads on ChatGPT (1 minute read)
Early testers are calling GPT-5 "materially better" than GPT-4, though Sam Altman gave no specific launch date for the new model beyond summer. Altman floated advertising possibilities but drew a hard line against letting payments influence responses, suggesting ads might appear outside the model's output stream.
MiniMax's Hailuo 02 tops Google Veo 3 in user benchmarks at much lower video costs (4 minute read)
MiniMax's second-generation video AI model, Hailuo 02, features major upgrades in both performance and price. It uses an architecture called Noise-aware Compute Redistribution that improves training and inference efficiency by a factor of 2.5. The architecture handles long video sequences differently depending on the stage of training. Hailuo 02 has three times more parameters and four times more training data compared to its previous version. A video generated with the model is available in the article.
Rethinking Recommendation & Search in LLM Era (11 minute read)
Recommendation and search systems are shifting from item IDs to rich "Semantic IDs," generative retrieval, and multimodal embeddings, enabling cold‑start coverage, long‑tail discovery, and unified search‑recs architectures that scale efficiently.
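The Semantic ID idea can be sketched with a toy residual quantizer (the codebook sizes, dimensions, and random data here are illustrative assumptions, not the article's implementation): an item's content embedding is mapped to a short tuple of codebook indices, so semantically similar items share ID prefixes even before any interaction history exists.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins: random codebooks. Real systems learn these
# (e.g. with an RQ-VAE over content/multimodal embeddings).
codebooks = [rng.normal(size=(4, 8)), rng.normal(size=(4, 8))]

def semantic_id(embedding, codebooks):
    """Map an embedding to a tuple of codebook indices via
    residual quantization: quantize, subtract, repeat."""
    code, residual = [], embedding
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        code.append(idx)
        residual = residual - cb[idx]
    return tuple(code)

item = rng.normal(size=8)
print(semantic_id(item, codebooks))  # a 2-level Semantic ID tuple
```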
Compiling LLMs into a MegaKernel: A Path to Low-Latency Inference (7 minute read)
Traditional large language model (LLM) systems often rely on sequences of GPU kernel launches and external communication calls, which results in underutilized hardware. This post discusses how a team created a compiler that automatically transforms LLM inference into a single megakernel, which eliminates launch overhead, enables fine-grained software pipelining, and overlaps computation with communication across GPUs. The end-to-end GPU fusion approach reduces LLM inference latency by 1.2 to 6.7 times.
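The fusion idea can be illustrated with a toy analogy (plain Python/NumPy, not the actual GPU compiler): instead of running several passes that each materialize an intermediate result, a fused version computes the whole expression in one traversal, which is roughly what a megakernel does across an entire model.

```python
import math
import numpy as np

x = np.linspace(0.0, 10.0, 10_000)

def unfused(x):
    # Three separate "kernels", each a full pass over the data with
    # an intermediate in between -- like separate GPU kernel launches.
    a = x * 2.0
    b = a + 1.0
    return np.sqrt(b)

def fused(x):
    # One pass computing the whole expression per element, with no
    # intermediates -- the shape of a hand-fused (mega)kernel.
    out = np.empty_like(x)
    for i in range(x.size):
        out[i] = math.sqrt(x[i] * 2.0 + 1.0)
    return out

assert np.allclose(unfused(x), fused(x))
```

The real compiler does this across whole transformer layers and even cross-GPU communication, so launch overhead disappears and compute can overlap with data movement.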
Inference Economics of Language Models (35 minute read)
The first comprehensive model of LLM serving economics reveals why current approaches to scaling inference hit walls faster than expected, as AI companies race to serve token-intensive reasoning models and agents. Network latency, not bandwidth, creates the primary bottleneck that prevents companies from simply adding more GPUs to increase capacity. Algorithmic breakthroughs like speculative decoding, which delivers double the speed at no additional cost, continue to reshape the economic landscape as providers struggle to match surging demand.
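The speculative-decoding trick mentioned above can be sketched as follows (toy vocabulary and hand-made distributions, purely illustrative): a cheap draft model proposes several tokens, and each is accepted with probability min(1, p/q) under the large target model, resampling from the residual distribution on rejection, so the output distribution still matches the target model exactly.

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]

def draft_probs(context):
    # Hypothetical small "draft" model: cheap, approximate distribution.
    return {w: 1.0 / len(VOCAB) for w in VOCAB}

def target_probs(context):
    # Hypothetical large "target" model: the distribution we must match.
    raw = {w: 1.0 for w in VOCAB}
    raw["cat"] = 3.0  # the target prefers "cat"
    total = sum(raw.values())
    return {w: v / total for w, v in raw.items()}

def speculative_step(context, k=4):
    """Draft up to k tokens cheaply, then accept/reject them so the
    result is distributed exactly as the target model's samples."""
    out = []
    for _ in range(k):
        q = draft_probs(context + out)
        token = random.choices(list(q), weights=list(q.values()))[0]
        p = target_probs(context + out)
        if random.random() < min(1.0, p[token] / q[token]):
            out.append(token)  # accepted draft token
        else:
            # Rejected: resample from the residual max(0, p - q).
            residual = {w: max(0.0, p[w] - q[w]) for w in VOCAB}
            out.append(random.choices(list(residual),
                                      weights=list(residual.values()))[0])
            break  # stop the speculative run after a rejection
    return out

print(speculative_step(["the"]))
```

The speedup comes from scoring all k draft tokens with the target model in one batched forward pass instead of k sequential ones.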
👨‍💻
Engineering & Research
How 100+ Security Leaders Are Tackling AI Risk (Sponsor)
AI adoption is accelerating—and new research shows most security programs are still working to catch up.
Get a clear view into how real teams are securing AI in the cloud:
✅ See where AI adoption is outpacing security
✅ Learn what top orgs are doing to manage shadow AI
✅ Benchmark your AI maturity against industry peers
✅ Get practical next steps to close the AI risk gap
Get the insights
Improving Naturalness in Generative Spoken Language Models (16 minute read)
An end-to-end variational encoder augments semantic speech tokens with automatically learned prosodic features, removing hand-engineered pitch inputs and yielding more natural continuations in human preference tests.
Changes made to the Model Context Protocol (2 minute read)
This document lists major changes made to the Model Context Protocol (MCP) specification since the previous revision, 2025-03-26. Some of the changes include the removal of support for JSON-RPC batching, the added support for structured tool output, and the clarification of security considerations and best practices in the authorization spec. A link to the complete list of all changes is available.
Detecting Unlearning Traces in LLMs (GitHub Repo)
Machine-unlearned LLMs leave detectable behavioral and activation-space "fingerprints". Simple classifiers can spot unlearning with >90% accuracy, raising privacy and copyright concerns.
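A minimal version of such a detector can be sketched on synthetic data (the "activation" vectors and the size of the shift are invented for illustration; the repo's actual features and classifiers will differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for model activations: assume (hypothetically)
# that unlearning leaves a small systematic shift in activation space.
original = rng.normal(0.0, 1.0, size=(200, 64))
unlearned = rng.normal(0.5, 1.0, size=(200, 64))

# Nearest-centroid classifier -- about the simplest detector possible.
c_orig = original[:100].mean(axis=0)
c_unl = unlearned[:100].mean(axis=0)

test_x = np.vstack([original[100:], unlearned[100:]])
test_y = np.array([0] * 100 + [1] * 100)

pred = (np.linalg.norm(test_x - c_unl, axis=1) <
        np.linalg.norm(test_x - c_orig, axis=1)).astype(int)
print(f"detector accuracy: {(pred == test_y).mean():.2f}")
```

Even this trivial classifier separates the two populations once a consistent shift exists, which is why the reported fingerprints are a privacy concern.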
Improving Fine-Grained Subword Understanding in LLMs (15 minute read)
StochasTok randomly decomposes tokens during training: instead of always seeing "strawberry" as one unit, models encounter it split as "straw|berry," "str|awberry," or even "s|t|r|a|w|b|e|r|r|y," learning the internal structure humans naturally perceive. Models trained with this method achieve near-perfect accuracy on character counting and multi-digit math while maintaining performance on standard benchmarks.
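The decomposition step can be sketched in a few lines (a toy re-implementation from the description above, not the authors' code): each token may be split at a random internal point, and the pieces may be split again, so the same word appears under many segmentations across training.

```python
import random

def stochastok(tokens, split_prob=0.5, rng=None):
    """Randomly decompose tokens into sub-pieces; recursion lets a
    token shatter anywhere between whole-word and per-character."""
    rng = rng or random.Random(42)
    out = []
    for tok in tokens:
        if len(tok) > 1 and rng.random() < split_prob:
            cut = rng.randint(1, len(tok) - 1)  # random internal split
            out.extend(stochastok([tok[:cut], tok[cut:]], split_prob, rng))
        else:
            out.append(tok)
    return out

# Each call may yield a different segmentation of the same word.
print(stochastok(["strawberry"], rng=random.Random(1)))
print(stochastok(["strawberry"], rng=random.Random(2)))
```

Concatenating the pieces always reconstructs the original token, so only the segmentation varies, never the underlying text.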
Six-month-old, solo-owned vibe coder Base44 sells to Wix for $80M cash (3 minute read)
Israeli developer Maor Shlomo recently sold his six-month-old, bootstrapped vibe-coding startup, Base44, to Wix for $80 million cash. His eight employees will collectively receive $25 million of the $80 million as a 'retention' bonus. Base44 grew to 250,000 users in six months. It generated $189,000 in profit in May even after covering high LLM token costs. The startup grew mostly through word of mouth.
Andrej Karpathy on How AI is Changing Software (39 minute video)
Andrej Karpathy argues we're entering "Software 3.0" where LLMs function as cloud-based operating systems programmable through English - best captured by his concept of "vibe coding". Rather than pursuing full autonomous AI agents, he advocates for "autonomy sliders" in tools like Cursor that offset AI limitations through human oversight, and emphasizes the need for LLM-friendly documentation as AI agents become major consumers of digital information.
Get the most interesting AI stories and breakthroughs delivered in a free daily email.
Join 920,000 readers for one daily email