TLDR AI 2025-12-16
Nvidia Nemotron 3, Claude agentic tasks, OLMo 3 deep dive
Tools to align AI pricing with value (Sponsor)
It might seem like everyone is struggling with sustainable monetization for AI, but Metronome works with the companies who have actually figured it out - Anyscale, NVIDIA, Databricks, and many others. Their team has created two resources to help you do the same:
1️⃣ Self-assessment (5 minutes): Answer 8 quick questions to discover which pricing model best fits how your customers get value from your product.
2️⃣ The Pricing Experimentation Playbook: This guide can help you test, validate, and optimize your pricing strategy. (Complete the assessment to get the most out of the guide.)
Anthropic preparing new Agentic Tasks Mode for Claude (2 minute read)
Anthropic is testing a new interface for tasks in Claude's Agent mode. It is also introducing new modes for research, analysis, writing, and building. The updated interface introduces a toggle that allows users to switch between classic chat and agent modes. Screenshots of the new interface are available in the article.
NVIDIA Debuts Nemotron 3 Family of Open Models (4 minute read)
Nvidia released Nemotron 3 Nano (30B parameters, 3B active), with Super (100B) and Ultra (500B) coming in early 2026. Nano's benchmark scores rival or exceed those of closed-source competitors. Nvidia is publishing training data and releasing libraries for agent customization in what appears to be an attempt to undermine OpenAI, Google, and Anthropic, which are increasingly developing their own chips instead of using Nvidia's.
Disney-OpenAI Licensing Deal Is One-Year Exclusive (1 minute read)
Disney's new partnership with OpenAI allows Sora to use over 200 characters from its IP catalog for one exclusive year. After that, Disney is free to partner with other AI platforms, hinting at broader content licensing strategies.
Deep Dives & Analysis
When Machines Pay Machines: The Economics of Agentic AI (6 minute read)
x402 is a protocol that embeds payments directly into HTTP, allowing any API call to include a payment. It has processed over 100 million payments across APIs, applications, and AI agents since launching in May this year. This article takes a look at the protocol and how internet-native payments are becoming standard infrastructure.
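As a rough illustration of a payment riding along with an API call, the sketch below simulates a pay-per-call exchange entirely in memory, with no network and no real money. It assumes the server answers an unpaid request with HTTP status 402 (Payment Required) and that the client retries with proof of payment in a request header. The `paid_api` handler, the header names, the price, and the token format are all hypothetical and are not taken from the protocol's actual wire format.

```python
from dataclasses import dataclass, field

PRICE_CENTS = 5  # hypothetical price per call


@dataclass
class Response:
    status: int
    body: str
    headers: dict = field(default_factory=dict)


def paid_api(headers: dict) -> Response:
    """Toy server: asks for payment first, then serves the resource."""
    token = headers.get("X-PAYMENT")  # header name assumed for this sketch
    if token != f"paid:{PRICE_CENTS}":
        # 402 Payment Required is a standard HTTP status code
        return Response(402, "payment required", {"X-PRICE-CENTS": str(PRICE_CENTS)})
    return Response(200, "forecast: sunny")


def call_with_payment(wallet: dict) -> Response:
    """Toy client: try once, pay if asked, then retry with proof of payment."""
    resp = paid_api({})
    if resp.status == 402:
        price = int(resp.headers["X-PRICE-CENTS"])
        if wallet["balance_cents"] < price:
            return resp  # cannot afford the call, so surface the 402
        wallet["balance_cents"] -= price
        resp = paid_api({"X-PAYMENT": f"paid:{price}"})
    return resp


wallet = {"balance_cents": 12}
result = call_with_payment(wallet)
print(result.status, result.body, wallet["balance_cents"])
```

The point of the sketch is that the caller never leaves the request/response loop: a client with enough balance gets the resource on the retry, while one without it simply receives the 402.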
Self Improving Agent with Dynamic Context and Continuous Learning (11 minute read)
This guide walks readers through how to build a self-improving Text-to-SQL agent using dynamic context and 'poor-man's continuous learning'. The agent answers questions by retrieving data from a knowledge base. It learns from successful runs by adding to the knowledge base, creating a self-improving feedback loop. A video that shows the agent in action is available at the end of the article.
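The feedback loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the article's implementation: `generate_sql` is a stub standing in for an LLM call, retrieval is plain word overlap, the `orders` table is made up, and a run counts as "successful" simply when the query executes without error.

```python
import sqlite3

# In-memory demo database (illustrative schema and data)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 32.5)])

# Knowledge base of (question, sql) pairs from earlier successful runs
knowledge_base = [("how many orders are there", "SELECT COUNT(*) FROM orders")]


def retrieve(question: str, k: int = 1) -> list:
    """Rank stored examples by word overlap with the question."""
    words = set(question.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda pair: len(words & set(pair[0].split())),
        reverse=True,
    )
    return ranked[:k]


def generate_sql(question: str, examples: list) -> str:
    """Stub for an LLM call; a real agent would place `examples` in the prompt."""
    if "total" in question.lower():
        return "SELECT SUM(amount) FROM orders"
    return examples[0][1] if examples else "SELECT 1"


def answer(question: str):
    examples = retrieve(question)
    sql = generate_sql(question, examples)
    try:
        rows = db.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # failed runs are not learned from
    # Learn from success: store the pair so later runs can retrieve it
    pair = (question.lower(), sql)
    if pair not in knowledge_base:
        knowledge_base.append(pair)
    return rows


print(answer("What is the total order amount"))
print(len(knowledge_base))
```

Each successful question grows the knowledge base, so later questions with similar wording retrieve a working query as context instead of starting from nothing.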
GPT-5.2 Is Frontier Only For The Frontier (31 minute read)
GPT-5.2 is good at instruction following and a solid choice, but it isn't 'fun' to interact with. People strongly dislike its personality. The model is heavily constrained and censored. It is not likely to solve OpenAI's 'Code Red' problems. The company will likely try again in a month with GPT-5.3.
AI agents are starting to eat SaaS (11 minute read)
Basic SaaS providers now have thousands of new competitors: engineers with a spare Friday afternoon and an agent. Not all SaaS products will be affected, but the bar is going to be much higher for those that don't have a clear moat or proprietary knowledge. Companies with strong internal technical ability will be able to replace their SaaS spend, decreasing costs, while those without it will see dramatically increasing costs as SaaS providers try to recoup lost sales.
Engineering & Research
Shopify's Agentic Storefronts lets you surface products across every AI platform (Sponsor)
Virtual try-on apps, AR-VR, voice - AI is opening up exciting new ways to build consumer shopping experiences. Agentic Storefronts (part of Shopify's Winter '26 release) unlock AI-powered shopping for merchants with a new set of tools to help you ensure your products are discoverable and your brand stays authentic in agentic conversations.
Read the blog
200k Tokens Is Plenty (6 minute read)
Claude Opus 4.5 is considered the best model for coding, and it only has a context window of roughly 200,000 tokens. While some people feel like that is not a lot, that can be plenty for developers who use short threads. The best threads are short, with just the right amount of context. The longer the conversation, the more an agent's context window gets filled up with junk. You only need to give agents the context they need to get the job done, and no more.
OLMo 3: A Deep Dive Into the Fully-Open LLM (30 minute read)
AI2 published the most comprehensive starting point for open LLM research: all checkpoints, training data, and code for OLMo 3. The walkthrough covers every stage from data mixing through the full SFT/DPO/RLVR post-training stack, including OlmoRL infrastructure that cuts RL training from 15 days to 6 days. RL with random rewards, which worked on Qwen, fails on OLMo 3.
Bring your research to life with integrated visual reports from Gemini Deep Research (1 minute read)
Google AI Ultra subscribers can now use Gemini Deep Research to generate rich visual reports complete with custom images, charts, and interactive simulations. The update can automatically illustrate data and create dynamic simulation models to forecast outcomes based on different variables. The feature combines deep analysis with dynamic visuals to transform dense data into tangible, easy-to-understand insights.
Structured Outputs Create False Confidence (7 minute read)
Constrained decoding forces models to prioritize output conformance over output quality. This is because it restricts models to only consider certain tokens. Using structured output makes it harder for models to refuse requests, warn users about contradictory information, and point users to the right approach when they ask for the wrong one. Letting models respond in as natural a fashion as possible is the most effective way to get the highest quality response from them.
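A toy example may make the token restriction concrete. The sketch below uses made-up next-token scores and a made-up rule that a JSON object must start with `{`; it is not any vendor's decoding implementation, only an illustration of masking disallowed tokens before picking the next one.

```python
import math

# Made-up next-token scores: the model "wants" to flag a problem in plain text
logits = {
    "I": 2.0,      # e.g. the start of "I can't answer: the inputs conflict"
    "{": 0.5,      # the start of a JSON object
    "Sorry": 1.2,
    "[": -1.0,
}

# Assumed grammar rule for this sketch: a JSON object must begin with "{"
allowed_by_schema = {"{"}


def softmax(scores: dict) -> dict:
    total = sum(math.exp(v) for v in scores.values())
    return {tok: math.exp(v) / total for tok, v in scores.items()}


def pick(scores: dict, allowed=None) -> str:
    """Greedy decoding; when `allowed` is given, other tokens are masked out."""
    if allowed is not None:
        scores = {tok: v for tok, v in scores.items() if tok in allowed}
    probs = softmax(scores)
    return max(probs, key=probs.get)


free_choice = pick(logits)
constrained_choice = pick(logits, allowed_by_schema)
prob_of_brace = softmax(logits)["{"]

print(free_choice, constrained_choice, round(prob_of_brace, 2))
```

Without the mask, the highest-scoring token is `I`, the start of a plain-text warning. With the mask, the only option left is `{`, even though the unconstrained model gave it a low probability, so the warning never gets written.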
Google expands Gemini with NotebookLM integration (2 minute read)
NotebookLM integration within Google Gemini lets users attach and utilize their notebooks as live data sources. This update, currently limited to select accounts, supports multiple notebook attachments for personalized data processing. Google plans to broaden Gemini's practical utility and reach by rolling the feature out to free accounts.
Terence Tao doubts that anything resembling genuine "artificial general intelligence" is within reach of current AI tools (2 minute read)
While 'artificial general intelligence' may be out of reach, 'artificial general cleverness' is becoming a reality in many ways. General cleverness is the ability to solve broad classes of complex problems via somewhat ad hoc means. While this doesn't qualify as true intelligence, it can result in a non-trivial success rate in a wide spectrum of tasks. Viewing the current generation of tools as a generator of clever thoughts and outputs may be a more productive perspective when trying to use them to solve difficult problems.