TLDR AI 2026-01-23
Claude Code Tasks 🤖, Google personal AI search 🔍, evals for skills 📝
Airia: Enterprise AI orchestration that unifies experimentation, prod, and governance (Sponsor)
You want AI to be more than a never-ending work in progress, and that means enabling no-code, low-code, and pro-code development without IT gatekeepers standing in the way. Yet that doesn't mean governance goes out the window.
Airia is the enterprise AI platform built to drive AI adoption by unifying innovation and security. Teams at Stryker, BuzzFeed, and ArcelorMittal use it to:
- Test prompts, LLMs, and agent variants in safe, prod-like environments
- End AI anxiety by controlling agent sprawl and implementing automatic guardrails
- Discover risks, orchestrate agents, and monitor threats in one place
Get a demo and make AI adoption a reality
Personalized Search with AI Mode (4 minute read)
Google is expanding "Personal Intelligence" in Search, enabling AI Mode to use private context from Gmail and Photos for more personalized results.
We're turning Todos into Tasks in Claude Code (2 minute read)
Anthropic has upgraded Todos in Claude Code to Tasks, a new primitive that helps Claude Code track and complete more complicated projects and collaborate on them across multiple sessions or subagents. Claude can create Tasks with dependencies on each other that are stored in the metadata, mirroring how projects work. Tasks are stored in the file system so that multiple subagents or sessions can collaborate on them. When one session updates a Task, the update is broadcast to all sessions currently working on the same Task List.
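To make the idea of file-backed tasks with dependencies concrete, here is a minimal sketch. It is not Claude Code's actual storage format: the one-JSON-file-per-task layout, the field names (`id`, `depends_on`, `done`), and the helper functions are all hypothetical, chosen only for illustration. It also does not model broadcasting; each session would simply re-read the shared directory.

```python
import json
import tempfile
from pathlib import Path


def save_task(root, task_id, title, depends_on=(), done=False):
    """Write one task as a JSON file (hypothetical layout: <root>/<id>.json)."""
    task = {"id": task_id, "title": title,
            "depends_on": list(depends_on), "done": done}
    (Path(root) / f"{task_id}.json").write_text(json.dumps(task))


def load_tasks(root):
    """Read every task file in the directory into a dict keyed by task id."""
    tasks = (json.loads(p.read_text()) for p in Path(root).glob("*.json"))
    return {t["id"]: t for t in tasks}


def ready_tasks(root):
    """Return ids of unfinished tasks whose dependencies are all done.

    Assumes every id listed in depends_on exists in the directory.
    """
    tasks = load_tasks(root)
    return sorted(
        t["id"] for t in tasks.values()
        if not t["done"] and all(tasks[d]["done"] for d in t["depends_on"])
    )


def complete(root, task_id):
    """Mark a task as done by rewriting its file."""
    path = Path(root) / f"{task_id}.json"
    task = json.loads(path.read_text())
    task["done"] = True
    path.write_text(json.dumps(task))


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    save_task(root, "schema", "Design schema")
    save_task(root, "api", "Build API", depends_on=["schema"])
    print(ready_tasks(root))   # only tasks with no unfinished dependencies
    complete(root, "schema")
    print(ready_tasks(root))   # tasks unblocked by the completed one
```

Because the state lives in plain files, any process pointed at the same directory sees the same task list, which is the property the summary above attributes to Tasks.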
Building Deep Agent Frontends (38 minute read)
CopilotKit published a tutorial on building a fullstack Deep Agent app that includes resume ingestion, skill extraction, sub-agents with web search, and a streaming UI.
Pantera Blockchain Letter :: January 2026 (25 minute read)
Macro, positioning, flows, and market structure effects were the dominant drivers last year, particularly for assets outside of Bitcoin. It was a difficult year for much of the token market, but the year also saw advanced institutional adoption, clarified product-market fit, and compressed valuations across large segments of the ecosystem. A strong fundamental backdrop following a year-long bear market for the broader token universe could present an opportunity. Forward-looking setups appear increasingly asymmetric, provided fundamentals stabilize and breadth returns.
Overcoming Compute and Memory Bottlenecks with FlashAttention-4 on NVIDIA Blackwell (7 minute read)
FlashAttention is an IO-aware algorithm that computes the same mathematical result as standard attention, but more efficiently. It does this by reducing memory accesses and by using memory that scales near-linearly with sequence length. These optimizations lead to faster training and inference and enable models to handle longer sequences of tokens. FlashAttention-4, the latest iteration of optimized CUDA kernels, achieves a peak performance of 1,605 TFLOP/s.
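The claim that a memory-efficient algorithm can return the same result as standard attention can be illustrated with a small sketch. This is not the FlashAttention-4 kernel and says nothing about Blackwell-specific optimizations; it is a pure-Python, single-query illustration of the block-wise "online softmax" idea, with hypothetical function names and toy data. Keys and values are processed one block at a time while a running maximum and running sum are kept, so the full score vector is never held at once.

```python
import math


def standard_attention(q, keys, values):
    """Reference: softmax(q.k / sqrt(d)) weighted sum of values, for one query."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim_v)]


def tiled_attention(q, keys, values, block_size=2):
    """Same result, but keys/values are consumed block by block."""
    d = len(q)
    dim_v = len(values[0])
    running_max = float("-inf")   # largest score seen so far
    running_sum = 0.0             # softmax denominator, relative to running_max
    acc = [0.0] * dim_v           # unnormalized output, relative to running_max
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum.
        # On the first block this factor is exp(-inf) == 0.0.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim_v):
                acc[j] += w * v[j]
        running_max = new_max
    return [a / running_sum for a in acc]


# Toy data (made up for illustration): one query, five key/value pairs.
q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.0, -0.3], [0.5, 0.5], [-1.0, 2.0], [0.0, 0.7]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.5, -1.0], [3.0, 1.0]]

reference = standard_attention(q, keys, values)
tiled = tiled_attention(q, keys, values, block_size=2)
```

The two outputs agree up to floating-point rounding. Real kernels apply the same rescaling trick to tiles of many queries at once in fast on-chip memory, which is where the reduction in memory traffic comes from.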
👨‍💻
Engineering & Research
Design, build, and publish your site in hours on Framer — free for the first year (Sponsor)
Get from idea to live site in record time when you launch on Framer. Hundreds of YC-backed founders have already designed and published beautiful sites without dev resources. Now pre-seed and seed-stage startups get their first year free. Make your website the easy part of your startup and claim your free year.
Render-of-Thought: Visualizing Reasoning Chains (18 minute read)
Render-of-Thought (RoT) converts reasoning steps into visual representations using VLMs, offering a more compact and analyzable alternative to Chain-of-Thought prompting. The method achieves better token compression and faster inference while maintaining competitive reasoning performance.
Robotics Video Benchmark and Dataset (GitHub Repo)
RBench is a benchmark for robotics video generation, and RoVid-X is a million-scale dataset for training embodied video models.
Qwen3-TTS Family is Now Open Sourced: Voice Design, Clone, and Generation! (35 minute read)
Qwen3-TTS is a series of speech generation models that supports voice cloning, voice design, ultra-high-quality human-like speech generation, and natural language-based voice control. The models support 10 mainstream languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian) along with various dialects. They exhibit strong contextual understanding, allowing them to adapt tone, rhythm, and emotional expression based on instructions and text semantics.
Small models, big results: Achieving superior intent extraction through decomposition (6 minute read)
For agents to be truly helpful, the underlying models need to be able to understand what users are trying to do when they are interacting with them. Large multimodal LLMs are already good at understanding user intent, but this typically involves sending information to a server, which can be slow, costly, and carries the potential risk of exposing sensitive information. The task can be made more tractable for small models by separating user intent understanding into two stages.
D4RT: Unified, Fast 4D Scene Reconstruction & Tracking (3 minute read)
D4RT is a unified AI model for efficient 4D scene reconstruction and tracking that achieves up to 300x efficiency over traditional methods. Utilizing a unified encoder-decoder Transformer architecture, D4RT processes scenes dynamically by answering targeted queries about 3D space and time from 2D video inputs. This speed and scalability suit applications in robotics and augmented reality.
Thoughts on Evals (11 minute read)
Raindrop's AI monitors agents in production by generating billions of labels monthly, detecting issues, and providing insights into real-world performance. Unlike Eval Driven Development, Raindrop focuses on monitoring after deployment, allowing real-world feedback and faster iteration. Because AI products behave dynamically, the post argues that monitoring tools are more effective for them, offering comprehensive real-world performance insights that static evals can't match.
Salesforce Adopts Cursor at Scale (3 minute read)
Over 90% of Salesforce's 20,000 engineers now use Cursor in daily workflows, improving development speed and code quality. The shift has helped power product releases like Agentforce and signals broader industry trends in AI-assisted software engineering.