TLDR AI 2026-02-20
Gemini 3.1 Pro, optimize anything, agent sandboxing
Google tests NotebookLM integration for Opal workflows (1 minute read)
Google tests NotebookLM integration within Opal workflows, enhancing data extraction and automation. This integration aims to streamline processes and improve workflow efficiency for users.
Gemini 3.1 Pro (5 minute read)
Google released Gemini 3.1 Pro as the upgraded core model behind recent Gemini 3 "Deep Think" improvements, and began rolling it out to the Gemini API/AI Studio, Vertex AI, Android Studio, the Gemini app, and NotebookLM. The post highlighted a verified 77.1% score on ARC-AGI-2, more than doubling Gemini 3 Pro's result.
Deep Dives & Analysis
Implementing a secure sandbox for local agents (7 minute read)
Cursor described an "agent sandboxing" system that let local coding agents run freely inside a constrained environment and only asked for approval when leaving the sandbox (often for internet access).
ARC-AGI-3 UPDATE (5 minute read)
ARC-AGI-3 is an Interactive Reasoning Benchmark designed to measure an AI Agent's ability to generalize in novel, unseen environments. Opus 4.6 demonstrates better reasoning and use of memory than Gemini 3.1 Pro and solves more levels. Current models may be able to solve ARC-AGI-3 given access to a harness with a simple memory. Memory scaffolds are likely enough for pseudo-continual learning to push us to some self-improvement or research-agent threshold within the next 2 years.
AI #156 Part 1: They Do Mean The Effect On Jobs (58 minute read)
This post rounds up what happened in AI this week. It focuses on projections of job and economic impacts and on timelines for the world being transformed. It also covers recent podcasts with Dario Amodei and Elon Musk. A linked table of contents with a short description for each section is available.
Engineering & Research
Crusoe: deploy fine-tuned models with zero infrastructure headaches (Sponsor)
Looking to deploy AI that you actually own?
Crusoe Managed Inference unlocks breakthrough speed and throughput without the infra overhead. Run SOTA fine-tuned models: DeepSeek, gpt-oss, Kimi, or bring your own. Power production apps with high reliability. Focus on innovation and leave the clusters to Crusoe's expert team.
Try a model run now
Multi-Agent Cooperation (9 minute read)
Building on Google's Transformer architecture, the authors proposed training sequence-model agents against many different opponents so they learned to adapt within each game without hardcoded assumptions about how others learn.
optimize_anything: A Universal API for Optimizing any Text Parameter (132 minute read)
optimize_anything is a declarative API that optimizes any artifact representable as text. Users declare what to optimize and how to measure it, and the system handles the search. It consistently matches or outperforms domain-specific tools. A surprisingly wide range of problems can be formulated as optimizing a text artifact. If it can be serialized to a string and its quality measured, a large language model can reason about it and propose improvements.
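To make the "declare what to optimize and how to measure it" idea concrete, here is a minimal sketch of the general pattern. It is not the optimize_anything API: the names `optimize_text`, `evaluate`, and `propose` are hypothetical, and a random character mutation stands in for the large language model that would normally propose improvements.

```python
import random
import string
from typing import Callable, Optional, Tuple

def optimize_text(
    seed: str,
    evaluate: Callable[[str], float],
    propose: Callable[[str, random.Random], str],
    steps: int = 200,
    rng: Optional[random.Random] = None,
) -> Tuple[str, float]:
    """Greedy search: keep a candidate only if it scores strictly higher."""
    if rng is None:
        rng = random.Random(0)
    best, best_score = seed, evaluate(seed)
    for _ in range(steps):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

# Toy problem (an assumption for illustration): match a hidden target string.
TARGET = "hello"

def evaluate(candidate: str) -> float:
    # The "how to measure it" part: count positions that match the target.
    return float(sum(a == b for a, b in zip(candidate, TARGET)))

def propose(current: str, rng: random.Random) -> str:
    # Stand-in for an LLM proposer: change one random character.
    i = rng.randrange(len(current))
    return current[:i] + rng.choice(string.ascii_lowercase) + current[i + 1:]

best, best_score = optimize_text("aaaaa", evaluate, propose, steps=500)
```

The caller only supplies the artifact and the metric; swapping the random proposer for a model that reads the current text and its score is what would turn this loop into the kind of system the post describes.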
Repeating Prompts (1 minute read)
When not using reasoning, repeating the input prompt improves performance for popular models without increasing the number of generated tokens or latency. It is interesting that tricks like this are still possible despite the amount of work being put into improving large language models. The discovery suggests how much room for improvement there still is for current models.
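As a rough illustration of the trick, the sketch below builds a repeated prompt before it is sent to a model. The helper name `repeat_prompt`, the default of two copies, and the blank-line separator are assumptions for illustration, not details taken from the linked post.

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Return the prompt repeated `times` times, joined by `separator`."""
    if times < 1:
        raise ValueError("times must be at least 1")
    return separator.join([prompt] * times)

# The repeated text is what would be passed as the model input;
# the model's output length is not changed by this step.
repeated = repeat_prompt("Which option is correct: A, B, or C?")
```

Only the input grows, which is consistent with the claim that the number of generated tokens stays the same.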
9 Observations from Building with AI Agents (2 minute read)
Prototype with the best and polish small gems. Use teams of agents as micromanagers, and experiment with different tools and workflows. Document everything to create improvement loops that improve success rates without manual intervention. Skills are easier to debug than code.
How will OpenAI compete? (25 minute read)
OpenAI has a big user base, but it has limited engagement and stickiness and no network effect. The company doesn't have any unique technology. The incumbents have already matched the technology and are leveraging their product and distribution. This post takes a look at OpenAI's strategy and how the company can compete in today's landscape.