TLDR AI 2026-02-17
Qwen 3.5 Plus 🚀, Manus Agents 🧑‍💻, inference economics 💰
Cut Annotation Costs by 90% With Feedback-Driven Pipelines (Sponsor)
95% of labeled data never gets used, and most teams spend 5-7 review cycles before datasets are production-ready.
The problem? Disconnected workflows between annotation tools, data curation, and model evaluation create endless handoffs and wasted iteration cycles.
This hands-on workshop on February 18th shows you how to fix it. You'll learn how to:
- Build annotation pipelines that start with real model failures and data gaps
- Use zero-shot techniques to label only high-value data
- Structure human-in-the-loop workflows that cut annotation costs while improving model performance
- Create a closed loop from annotation to QA to model training
You'll also get a step-by-step guide for implementing a data-centric annotation workflow.
Register for the February 18th workshop →
Qwen3.5: Towards Native Multimodal Agents (40 minute read)
Qwen3.5-397B-A17B is the first model in the Qwen3.5 series. The native vision-language model demonstrates outstanding results in reasoning, coding, agent capabilities, and multimodal understanding. It uses an innovative hybrid architecture that fuses linear attention with a sparse mixture-of-experts. While it contains 397 billion parameters, only 17 billion are activated per forward pass. The model supports 201 languages and dialects.
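The "397B total, 17B active" figure comes from sparse mixture-of-experts routing: a router scores every expert, but each token is sent only to the top few, so most weights sit idle on any given pass. Below is a generic toy top-k router in plain Python, not Qwen3.5's implementation. The 8 experts, k=2, and the scaling "experts" are hypothetical, and the linear-attention half of the hybrid is not modeled.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts.

    Gate weights are renormalized over the chosen experts only; every other
    expert is skipped, so its parameters are never used for this token.
    """
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    return list(zip(chosen, softmax([router_logits[i] for i in chosen])))


def moe_layer(x, experts, router, k):
    """Sparse MoE forward pass for one token vector x: weighted sum of k expert outputs."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router(x), k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# --- Toy usage (all experts and the router are hypothetical) ---
calls = []  # records which experts actually ran


def make_expert(i):
    def expert(x):
        calls.append(i)
        return [(i + 1) * v for v in x]  # expert i just scales its input by i + 1
    return expert


def toy_router(x):
    return [float(i) for i in range(8)]  # expert i gets logit i, ignoring x


experts = [make_expert(i) for i in range(8)]
out = moe_layer([1.0], experts, toy_router, k=2)
print(out, sorted(calls))  # only experts 6 and 7 are called

# Active-parameter fraction quoted for Qwen3.5-397B-A17B
active_fraction = 17 / 397
print(f"{active_fraction:.1%} of parameters active per forward pass")  # prints 4.3%
```

Compute per token scales with the active parameters, while memory still has to hold all 397B.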
Introducing Manus in Your Chat: Your Personal Agent, Everywhere You Are (3 minute read)
Manus Agents is a new way to access and use Manus directly inside messaging apps. Telegram is currently the only supported app, with more platforms coming soon. The agent supports reasoning, tools, and multi-step task execution. The feature makes agents accessible wherever users are.
Microsoft tests Researcher and Analyst agents in Copilot (2 minute read)
Microsoft is testing new Researcher and Analyst agents integrated into Copilot's upcoming "Tasks" feature. This feature will allow users to schedule complex prompts, using OpenAI models, including o3-mini, for research and data analysis. The addition of an "Auto" mode aims to streamline task automation, potentially differentiating Copilot in productivity use cases.
🧠
Deep Dives & Analysis
On Dwarkesh Patel's 2026 Podcast With Dario Amodei (11 minute read)
Anthropic's CEO, Dario Amodei, expects 'geniuses in a data center' to show up within a few years. While Anthropic's actions do not seem to fully reflect this optimism, its caution is necessary. This article contains notes from a recent podcast where Amodei discusses China, export controls, democracy, AI policy, AI risks, and continual learning.
Why I don't think AGI is imminent (12 minute read)
AGI is likely possible, but it probably won't come from Transformer-based models. Transformers are very powerful, but they have fundamental limitations. Solving these limitations could take decades. This isn't to say that LLMs aren't useful: the current technology is already fundamentally changing society.
How persistent is the inference cost burden? (10 minute read)
Inference costs may not be that much of a bottleneck for AI progress. The cost to reach a given capability level falls fast, so the inference cost burden is more transient than it might appear from looking at only frontier models at launch. The data on RL scaling is still thin, so it is difficult to draw conclusions yet. It will be interesting to see how quickly cheaper models catch up to frontier capability levels, and how inference costs for fixed tasks decrease over time.
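The "transient burden" argument amounts to exponential decay in the price of a fixed capability level. A minimal sketch, assuming a constant annual price-decline factor. The $1.00 launch cost and the 10x-per-year decline below are hypothetical placeholders, not figures from the article.

```python
import math


def cost_at(years, launch_cost, annual_decline_factor):
    """Cost of a fixed task `years` after a capability level first ships,
    assuming its price falls by a constant factor every year (an assumption)."""
    return launch_cost / annual_decline_factor ** years


def years_until(target_cost, launch_cost, annual_decline_factor):
    """Years until the fixed-task cost reaches target_cost under the same assumption."""
    return math.log(launch_cost / target_cost) / math.log(annual_decline_factor)


# Hypothetical inputs: $1.00 per task at launch, price falling 10x per year.
for year in range(4):
    print(year, round(cost_at(year, 1.0, 10.0), 4))
print(years_until(0.01, 1.0, 10.0))  # about 2 years to fall 100x
```

A constant decline factor is itself a strong assumption. The article notes that the evidence, especially for RL scaling, is still thin.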
👨‍💻
Engineering & Research
What bottleneck? 50% of agentic AI projects are in production (Sponsor)
Autonomous operations are rapidly expanding, and 74% of enterprises expect AI budgets to rise further in 2026.
Dynatrace surveyed 900+ global decision-makers about how they're operationalizing agentic AI, with observability as the foundation for trust and control. See where the market is headed with this in-depth research report.
Announcing Spreadsheet Arena (2 minute read)
Spreadsheet Arena is an open platform for evaluating LLM-generated spreadsheets. Formatting and structure often influence user preference more than formula complexity. Preferences differ significantly by domain: in academic settings, heavily formatted spreadsheets are penalized, while in finance, professional color coding is rewarded. Crowd preferences often diverge from expert ratings, particularly on color coding and formatting.
ZVEC: A lightweight, lightning-fast, in-process vector database (GitHub Repo)
Alibaba's ZVEC is an open-source, in-process vector database enabling rapid, scalable similarity searches using Alibaba's PROXIMA engine. It supports dense and sparse vectors with hybrid searches and can be deployed across various platforms, including notebooks and edge devices. Installation is straightforward via Python or Node.js, offering a lightweight solution for handling vector data efficiently.
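For readers new to hybrid search: a query carries both a dense embedding and a sparse, keyword-style vector, and documents are ranked by a blend of the two similarities. The sketch below is a brute-force, pure-Python illustration of that idea with a hypothetical linear blend weight `alpha`. It does not use or mimic ZVEC's real API or the Proxima engine. A production engine would avoid a linear scan over every stored vector.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {dimension: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller dict
    return sum(w * b.get(dim, 0.0) for dim, w in a.items())


class TinyHybridIndex:
    """In-memory, brute-force hybrid index (hypothetical, not ZVEC's API)."""

    def __init__(self):
        self.docs = {}  # doc_id -> (dense_vector, sparse_vector)

    def add(self, doc_id, dense, sparse):
        self.docs[doc_id] = (dense, sparse)

    def search(self, dense_q, sparse_q, k=3, alpha=0.5):
        """Rank by alpha * dense cosine + (1 - alpha) * sparse dot product."""
        scored = [
            (alpha * cosine(dense_q, d) + (1 - alpha) * sparse_dot(sparse_q, s), doc_id)
            for doc_id, (d, s) in self.docs.items()
        ]
        scored.sort(reverse=True)
        return [doc_id for _, doc_id in scored[:k]]


index = TinyHybridIndex()
index.add("a", [1.0, 0.0], {"gpu": 1.0})
index.add("b", [0.0, 1.0], {"memory": 1.0})
index.add("c", [0.7, 0.7], {"gpu": 0.5, "memory": 0.5})
print(index.search([1.0, 0.0], {"gpu": 1.0}, k=2))
```

A linear blend is only one option. Dense and sparse scores live on different scales, so real systems often normalize them or fuse rankings instead.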
How much are AI reasoning gains confounded by expanding the training corpus 10,000x? (5 minute read)
Benchmark performance gives biased estimates of out-of-distribution generalization if LLM training data is polluted with benchmark test data. Typical decontamination filters fail to detect semantic duplicates. This suggests that recent benchmark gains are confounded: because soft contamination is prevalent, gains reflect both genuine capability improvements and the accumulation of test data, plus near-duplicates that effectively act as test data, in ever-growing training corpora.
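Why do exact-match filters miss "soft" contamination? A common decontamination approach drops training documents that share long word n-grams with a benchmark item. The minimal sketch below uses word-level n-grams, a hypothetical n=8, and toy sentences. It shows that such a filter catches a verbatim copy but passes a paraphrase with the same content, which is the semantic-duplicate gap described above.

```python
def ngrams(text, n):
    """Set of word-level n-grams in lowercased, whitespace-tokenized text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(train_doc, test_item, n=8):
    """Flag train_doc if it shares any n-gram with test_item (exact-overlap filter)."""
    return bool(ngrams(train_doc, n) & ngrams(test_item, n))


test_item = "a train leaves the station at 3 pm traveling at 60 miles per hour toward the city"
verbatim_doc = "Practice problem: A train leaves the station at 3 pm traveling at 60 miles per hour toward the city and more"
paraphrase_doc = "departing at three in the afternoon a locomotive heads to town at sixty miles per hour"

print(is_contaminated(verbatim_doc, test_item))    # caught: shares 8-word spans
print(is_contaminated(paraphrase_doc, test_item))  # missed: same content, no shared 8-gram
```

Catching the paraphrase would need a semantic check, such as embedding similarity. That kind of check is what typical filters lack.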
Micron Is Spending $200 Billion to Break the AI Memory Bottleneck (9 minute read)
Micron Technology, the largest American maker of memory chips, is rushing to add manufacturing capacity to break the memory bottleneck. The company is spending $50 billion to more than double the size of its 450-acre campus. It will build two new chip factories, the first of which is expected to start production of DRAM in mid-2027. Micron also recently broke ground on a $100 billion fab complex in New York, and it announced a $9.6 billion fab investment in Japan last year.
On Anthropic's Consumer Marketing (4 minute read)
Anthropic has a big consumer marketing problem. Its story, while well known in Silicon Valley, isn't widely heard by, or legible to, the broader culture. It is puzzling how a company so good at aesthetics and narrative engineering can be so bad at this.
Flapping Airplanes on the future of AI: 'We want to try really radically different things' (20 minute read)
Flapping Airplanes aims to revolutionize AI by developing data-efficient training methods, reducing reliance on vast datasets. Backed by $180 million, the founders emphasize diverging from traditional methods, drawing inspiration from the human brain without replicating it exactly. They focus on creativity and fresh perspectives, employing a team oriented towards groundbreaking research rather than incremental improvements.
Startup Support That Delivers Results (Sponsor)
Let MongoDB for Startups partner with you as you build the next big thing with the tools, support, and go-to-market opportunities you need to accelerate from idea to IPO.
Get started here.
The Scarcity Trap: Why AI Still Feels Like a Metered Utility (14 minute read)
AI is constrained by silicon, supply chains, and economics.
Will reward-seekers respond to distant incentives? (10 minute read)
AIs optimized as “reward-seekers” might be influenced not just by local training incentives but also by distant retroactive rewards or simulated scenarios administered later or by powerful actors.
The Long Tail of LLM-Assisted Decompilation (17 minute read)
LLMs can assist with decompiling Nintendo 64 games up to a certain point. This post describes a developer's attempt: how their workflow evolved as the project matured, what helped, and where they're currently stuck.
You see tech and AI everywhere, but in the productivity statistics (1 minute read)
US productivity increased roughly 2.7% for 2025, a near doubling from the sluggish 1.4% annual average that characterized the past decade.
Q: I want to wash my car. The car wash is 50 meters away. Should I walk or drive? (1 minute read)
Nearly all models tested answered that the user should walk, overlooking that the car itself has to get to the car wash.
The Economics of LLM Inference: Batch Sizes, Latency Tiers, and Why Model Labs Have an Advantage (14 minute read)
Model labs have a structural cost advantage that pure inference providers will struggle to match.
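A standard way to reason about batch sizes and latency tiers is a roofline-style model of decoding. Each decode step reads all model weights once, whatever the batch size, while matrix-multiply work grows with the batch. The sketch below is a generic toy, not the article's model. The hardware numbers, the $2/hour price, and the 8B dense model are hypothetical round figures, and KV-cache traffic and attention FLOPs are ignored.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    mem_bandwidth: float  # bytes per second (assumed)
    peak_flops: float     # FLOP per second (assumed)
    cost_per_hour: float  # dollars per accelerator-hour (assumed)


def decode_step_seconds(batch, n_params, hw, bytes_per_param=2):
    """Toy roofline for one decode step: weights are read once per step
    (memory-bound term) while matmul work grows with batch (compute-bound term).
    Ignores KV-cache traffic, attention FLOPs, and overheads."""
    memory_time = n_params * bytes_per_param / hw.mem_bandwidth
    compute_time = batch * 2 * n_params / hw.peak_flops  # ~2 FLOPs per param per token
    return max(memory_time, compute_time)


def cost_per_million_tokens(batch, n_params, hw):
    """Dollar cost per million generated tokens at a given batch size."""
    step = decode_step_seconds(batch, n_params, hw)
    tokens_per_second = batch / step  # one token per sequence per step
    return hw.cost_per_hour / 3600 / tokens_per_second * 1e6


hw = Hardware(mem_bandwidth=3e12, peak_flops=1e15, cost_per_hour=2.0)
for batch in (1, 32, 256, 1024):
    step = decode_step_seconds(batch, 8e9, hw)
    print(f"batch={batch:5d}  per-token latency={step * 1e3:6.2f} ms  "
          f"cost=${cost_per_million_tokens(batch, 8e9, hw):.4f}/M tokens")
```

Under these assumptions, cost per token falls roughly in proportion to batch size until the compute term dominates. Per-token latency stays flat and then rises. One plausible reading of the labs' advantage is that whoever has enough traffic to fill large batches captures those savings and can offer cheaper, slower tiers.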
Get the most interesting AI stories and breakthroughs delivered in a free daily email.
Join 920,000 readers for one daily email