TLDR AI 2025-11-24
DeepMind robotics 🤖, Gemini dynamic view 🖼️, how inference works 🧠
Google Starts to Bridge OpenAI's Product Moat (4 minute read)
Gemini's Dynamic view option takes text-based answers and wraps them in interactive, visual output. The feature is still in Labs and has not yet launched. Despite the bland name, Dynamic view produces some impressive results that are hard to convey in words; the article includes examples of the outputs the feature can produce.
Google DeepMind Hires Former CTO of Boston Dynamics as the Company Pushes Deeper Into Robotics (2 minute read)
Google DeepMind has hired Boston Dynamics' former chief technology officer, Aaron Saunders, as its VP of hardware engineering. Saunders is a key part of DeepMind CEO Demis Hassabis' vision for Gemini to become a sort of robot operating system. Hassabis is aiming to build an AI system that can work almost out-of-the-box across any body configuration. Boston Dynamics is famous for developing legged robots and humanoid machines capable of impressive acrobatic feats.
What OpenAI Did When ChatGPT Users Lost Touch With Reality (12 minute read)
The New York Times revealed OpenAI's internal struggle between user engagement and safety after the company overruled its Model Behavior team's warnings to release a sycophantic April update to GPT-4o that made users return more frequently. The company now faces five wrongful death lawsuits and declared a "Code Orange" in October after discovering its safer GPT-5 model was losing users, with executives calling it "the greatest competitive pressure we've ever seen."
How to Run Product Evals (9 minute read)
A practical guide to evaluating LLM-powered products that covers how to label data, align evaluators, and iterate on configuration changes with minimal overhead.
How LLM Inference Works (20 minute read)
Large language models (LLMs) are neural networks built on the transformer architecture. Transformers analyze entire sequences in parallel, evaluating how each word relates to the rest of the sequence, not just its neighboring words. This article discusses LLM inference and details how these models work. It covers token embeddings, the transformer architecture, the inference phases, matrix multiplication, precision and quantization, and much more.
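The parallel, all-pairs comparison the article describes is the self-attention step at the heart of the transformer. A minimal NumPy sketch (illustrative only, not any specific model's implementation — real models add multiple heads, masking, and learned projections per layer):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    X: (seq_len, d_model) token embeddings. Every token's query is
    scored against every token's key, so all pairwise relations are
    computed in parallel rather than word-by-word.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # (seq_len, seq_len) relation scores
    weights = softmax(scores, axis=-1)    # each row is a distribution over tokens
    return weights @ V                    # weighted mix of value vectors

rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((4, d))           # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                          # (4, 8): one updated vector per token
```

Note that the scores matrix grows quadratically with sequence length, which is why inference optimizations like KV caching and quantization (also covered in the article) matter so much in practice.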
Benchmark Scores = General Capability + Claudiness (8 minute read)
In a 'deep' world, there is a single underlying ability that governs how well models do at superficially unrelated tasks. If a model developer makes this ability go up, their model gets better at everything. In a 'contingent' world, there are many orthogonal abilities that models can have, so model developers have to do completely unrelated work to get a model to improve on each ability. Anthropic has focused on making models that are state-of-the-art at agentic coding, but this hasn't resulted in models that are exceptional in other areas. There is some generalization across tasks, but this is limited, suggesting that models live in a 'contingent' world.
👨‍💻
Engineering & Research
Voice AI can handle calls as well as your best reps (minus the attitude) (Sponsor)
Most people haven't tried it because other companies make you sign seven-figure contracts before they'll even let you test. At Bland, we think that's stupid. Here, you can get a custom agent built for your business - for free.
You can get it here.
MCP Apps: Extending servers with interactive user interfaces (11 minute read)
The MCP Apps Extension (SEP-1865) standardizes support for interactive user interfaces in the Model Context Protocol. It addresses one of the most requested features from the MCP community: the ability for MCP servers to deliver interactive user interfaces to hosts. The extension introduces a standardized pattern for declaring UI resources, linking them to tools, and enabling bidirectional communication between embedded interfaces and the host application.
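The pattern described — a server declares a UI resource, links it to a tool, and the host renders it — can be illustrated with plain dictionaries. The field names below (`linkedUi` in particular) are hypothetical stand-ins for clarity, not the actual SEP-1865 schema:

```python
# Hypothetical sketch of the MCP Apps pattern: field names are invented
# for illustration; consult SEP-1865 for the real wire format.

# A server declares a UI resource (an embeddable HTML interface)...
ui_resource = {
    "uri": "ui://charts/viewer",
    "mimeType": "text/html",
    "text": "<html><body><div id='chart'></div></body></html>",
}

# ...and links it to a tool, so the host knows which interface to
# render when this tool's result arrives.
tool_decl = {
    "name": "plot_data",
    "description": "Render a dataset as an interactive chart",
    "linkedUi": ui_resource["uri"],   # hypothetical linking field
}

def host_render(tool_decl, resources):
    """Host side: resolve the UI resource a tool declaration points at."""
    return resources.get(tool_decl.get("linkedUi"))

resources = {ui_resource["uri"]: ui_resource}
rendered = host_render(tool_decl, resources)
print(rendered["mimeType"])   # text/html
```

The bidirectional communication the extension adds (embedded UI talking back to the host) is not shown here; the sketch only captures the declare-and-link half of the pattern.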
Agent Design Is Still Hard (16 minute read)
Building agents is still messy. Abstractions break once you hit real tool use. Caching works better when self-managed. Reinforcement does more heavy lifting than expected. Output tooling is surprisingly tricky. Model choice still depends on the task.
Olmo 3 From Scratch (GitHub Repo)
Sebastian Raschka added a standalone notebook implementing Allen AI's OLMo 3 model architecture from scratch to his "LLMs from Scratch" repository, joining similar tutorials for Qwen 3 and Gemma 3.
Complete Developer Tutorial for Nano Banana Pro (15 minute read)
Nano Banana Pro opens up a new frontier for AI image generation. It can think, search, and render in 4K, making it a tool for serious creators. It is now available to try at Google AI Studio. This guide covers the next-generation AI model's advanced features using the Gemini Developer API.
The space of intelligences is large (2 minute read)
Large language models think very differently from animals. The biggest difference lies in the optimization pressures and objectives that shaped each kind of intelligence: evolution in the case of animals, training objectives in the case of LLMs. People who build a good internal model of this new intelligent entity will be better equipped to reason about it and make predictions about it in the future.
Discussing Blackwell's drawbacks and dissecting its architecture (42 minute read)
Nvidia's greatest moat lies in having cleanly handled much of the 'dirty work' across its entire architecture and in combining full-stack capabilities from algorithms to systems to chips. It also benefited from excellent timing in bringing architectures to market and strong marketing execution. However, every architecture has its trade-offs and shortcomings. This post examines some of the issues in Nvidia's products and discusses potential evolutionary directions.
Get the most interesting AI stories and breakthroughs delivered in a free daily email.
Join 920,000 readers for one daily email