TLDR AI 2026-06-15
Anthropic Fable shutdown, GLM-5.2, OpenRouter Fusion
Google is working on Skills Marketplace for Gemini Business (2 minute read)
Google is integrating its products under Gemini Enterprise with a new "Skills Marketplace" tab for pre-defined, Google-optimized skills. The marketplace, aimed at helping teams develop dashboards and reporting tools without long engineering delays, includes a Skills Management UI, a Skills Builder, and the Marketplace itself.
Anthropic Suspended Access to Fable 5 and Mythos 5 (3 minute read)
Anthropic said it disabled Fable 5 and Mythos 5 for all users after receiving a US government export-control directive tied to national security concerns and reported jailbreak risks.
GLM-5.2 (1 minute read)
GLM-5.2, a new flagship model from Z.ai, is now available to all GLM Coding Plan users. It delivers powerful coding capabilities, usable 1M-context support, and continued strengths in long-horizon tasks. API and chatbot services will launch next week. The model will be open-sourced under the MIT License.
Amazon CEO's Talks With US Officials Triggered Crackdown on Anthropic Models (9 minute read)
Conversations between Amazon chief executive Andy Jassy and US officials prompted the Trump administration to halt all foreign use of Anthropic's most capable AI models. Researchers at Amazon had used a series of prompts to get Anthropic's Fable 5 model to provide them with information that could be used to aid cyberattacks. White House officials asked Anthropic to fix the vulnerabilities or take down the model. Anthropic has shut down access to its Mythos and Fable models to comply, but says that the vulnerabilities flagged by Amazon are relatively basic and that other publicly available models are also capable of discovering them.
Deep Dives & Analysis
Inference cost at scale with napkin math (13 minute read)
You need the following information to work out the dollar price-per-user: GPU hardware specs, context length, active parameter count of the model, and product-specific factors. The specifics of the model architecture matter surprisingly little, unless it's something entirely different, like diffusion. This post shows how to work out the math on paper. The exercise should reveal how various optimizations in inference engines help SaaS products remain profitable.
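As a rough illustration of this kind of napkin math, the sketch below estimates a compute-bound serving cost from the inputs listed above. It relies on the common approximation of about 2 FLOPs per active parameter per generated token, and it deliberately leaves out the context-length term (attention and KV-cache traffic). Every number and function name here is a hypothetical assumption for illustration, not a figure from the post.

```python
def dollars_per_million_tokens(active_params, gpu_flops, gpu_dollars_per_hour, utilization):
    """Compute-bound estimate: assumes ~2 FLOPs per active parameter per token."""
    flops_per_token = 2 * active_params
    tokens_per_second = gpu_flops * utilization / flops_per_token
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000


def monthly_dollars_per_user(tokens_per_user_per_day, price_per_million, days=30):
    """Product-specific factor: how many tokens a typical user consumes."""
    return tokens_per_user_per_day * days / 1_000_000 * price_per_million


# Hypothetical inputs: 30B active parameters, 1 PFLOP/s peak GPU,
# $2 per GPU-hour, 30% achieved utilization, 50k tokens per user per day.
price = dollars_per_million_tokens(30e9, 1e15, 2.0, 0.30)
per_user = monthly_dollars_per_user(50_000, price)
print(f"${price:.4f} per 1M tokens, ${per_user:.4f} per user per month")
```

Under these assumed inputs the estimate works out to roughly $0.11 per million tokens. Real deployments are often memory-bandwidth bound during decoding, so this figure is best read as a lower bound.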
Today's Frontier AI companies will never exceed the AI capability frontier again (18 minute read)
Networks of smaller AI models are outperforming every frontier AI system on speed, accuracy, and cost. Everyone in the 1960s was wrong about the mainframe computer, and everyone is now wrong about centralized AI. The future is a network of neural networks.
The Physics of a Fable (25 minute read)
Rafa Schwinger reverse-engineers Claude Mythos and Fable by arguing the moat is not architecture but the environment foundry, with capability decomposing as base foundation times gradeable signal extracted on top, and verifiable reward becoming the scarce decisive input now that text and raw compute no longer are. The recipe stacks dense pretraining, GRPO-style verifier RL where reward-hacking soundness is the actual binding constraint, long-horizon process rewards with learned context-folding that beats million-token windows at 32K active, plus best-of-N test-time compute exposed as an effort dial.
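Two of the ingredients named above have simple generic forms: the group-relative advantage used in GRPO-style RL, and best-of-N selection against a verifier. The sketch below shows those textbook forms only. It is not the recipe the article attributes to any model, the function names are hypothetical, and it uses the population standard deviation where implementations differ.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sample's reward against its own group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption
    return [(r - mean) / (std + eps) for r in rewards]


def best_of_n(candidates, verifier):
    """Test-time compute: sample N candidates, keep the one the verifier scores highest."""
    return max(candidates, key=verifier)


# Hypothetical usage: four sampled answers, two of which pass a verifier.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
longest = best_of_n(["a", "bbb", "cc"], verifier=len)
```

Raising N in `best_of_n` is one way an "effort dial" could be exposed. The output is only as sound as the verifier, which is the reward-hacking constraint the summary points to.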
Engineering & Research
Engineers say AI-generated code is better... yet 78% report more incidents? (Sponsor)
Nearly two-thirds of tech leaders say their teams ship AI code without line-by-line verification. No wonder it keeps breaking in prod! Read New Relic's 2026 State of AI Coding Report to see what engineers are doing with AI code, and what leaders think needs to happen next.
Get your copy
Count Anything (2 minute read)
Object counting is still fragmented across domain-specific data sets and task formulations. Existing counting models are often tailored to specific scenarios and struggle to generalize across categories, visual domains, object scales, and density distributions. This paper presents a generalist model for text-guided object counting that achieves strong accuracy and multi-domain generalization.
MiniMax Sparse Attention for Million-Token Contexts (GitHub Repo)
MiniMax Sparse Attention is a sparse attention architecture that uses group-specific Top-k block selection to scale long-context inference while preserving model quality. On a 109B multimodal model, it matched GQA performance while cutting attention compute by ~30x at 1M tokens.
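To make the idea of Top-k block selection concrete, here is a minimal, generic sketch for a single query: keys are split into fixed-size blocks, each block is scored cheaply, and full attention runs only over the k best blocks. The scoring rule (query dotted with the block's mean key) and all names are assumptions for illustration. This is not MiniMax's implementation, and it omits the group-specific selection the repo describes.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topk_block_attention(query, keys, values, block_size, k):
    """Attend only to the k key blocks that score highest for this query."""
    n_blocks = math.ceil(len(keys) / block_size)
    blocks = [
        range(b * block_size, min((b + 1) * block_size, len(keys)))
        for b in range(n_blocks)
    ]

    def block_score(block):
        # Cheap proxy (an assumption): query dotted with the block's mean key.
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(len(query))]
        return dot(query, mean_key)

    ranked = sorted(range(n_blocks), key=lambda b: block_score(blocks[b]), reverse=True)
    selected = sorted(ranked[:k])
    idx = [i for b in selected for i in blocks[b]]

    # Standard scaled softmax attention, restricted to the selected positions.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in idx]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [
        sum(w * values[i][d] for w, i in zip(weights, idx)) / total
        for d in range(len(values[0]))
    ]
    return out, selected
```

With k equal to the number of blocks this reduces to ordinary dense attention. The savings come from keeping k fixed while the sequence, and so the number of blocks, grows.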
Kimi K2.7 Code (Hugging Face Repo)
Kimi K2.7 Code is a coding-focused agentic model that has stronger end-to-end task completion across complex software engineering workflows and improved token efficiency compared to Kimi K2.6. The Mixture-of-Experts model has 1 trillion total parameters. It can be accessed on Moonshot's OpenAI/Anthropic compatible API. The model works best with Kimi Code CLI as its agent framework.
Introducing the Open Knowledge Format (9 minute read)
The Open Knowledge Format is an open specification that formalizes the LLM-wiki pattern into a portable, interoperable format. It is vendor-neutral and agent- and human-friendly. The standard can represent the metadata, context, and curated knowledge that modern AI systems need. The specification uses familiar patterns with no complex compression scheme, new runtime, or required SDK.
olmo-eval: An evaluation workbench for the model development loop (7 minute read)
olmo-eval is a new evaluation workbench designed for iterative LLM development. Enhancing the OLMES standard, it streamlines adding benchmarks, supports agentic and multi-turn evaluations, and facilitates analysis by comparing changes across model checkpoints. Unlike Harbor, olmo-eval offers flexibility, minimizing resource use and focusing on development rather than public benchmarking.
Why Apple built a third-party AI system for Siri and then refused to show it at WWDC (6 minute read)
The iOS 27 beta contains an Extensions system for third-party AI. The system includes a settings panel and a dedicated App Store section. Both have been built, but are toggled off in the backend. Apple was in discussions with major AI providers about granting entitlements for the framework, but it appears the company has decided against announcing the feature for now.
TLDR is hiring a Senior PMM ($180k-$225k base + $40-50k annual target bonus, Fully Remote)
We're hiring a senior PMM to own product marketing at TLDR. You'll define our positioning, build out sales enablement, and lead every launch.
Learn more.
$10,000,000 on the line: how we measure Devin's engineering output (1 minute read)
Devin's engineering system guarantees more output than cost, with $10M pledged per customer. The system's effectiveness is validated using independent data. This bold claim aims to bolster confidence in their engineering productivity.
The Oracle and the Firm (5 minute read)
OpenAI and Anthropic have taken different approaches to context management. OpenAI uses compaction: it compresses the conversation and retains only the relevant information. This results in one long thread that stays largely coherent. Anthropic splits work across multiple agents, each of which tackles a sub-problem within its own context window. Sub-agents do a large amount of work, then pass back only the relevant information to the parent agent. Anthropic's approach can result in sub-agents doing duplicate work, forgetting information, and generally wasting more tokens.