TLDR AI 2025-07-23
Claude tests Memory 🧠, Qwen3-Coder 💻, Elad Gil on AI market 🤖
📈 AI Is Elevating HR's Role in Enterprise Tech Strategy (Sponsor)
New data from Sapient Insights reveals a shift: HR isn't just supporting tech strategy—it's driving it. HR leaders are steering governance, ethics, and tech alignment across talent, performance, and feedback systems—especially as AI reshapes the HR tech landscape.
This exclusive webinar breaks down where HR must lead next:
– AI's impact across core HR tech
– How enterprise buyers are shifting
– The new role HR plays in ethics, strategy, and spend
If you influence workforce strategy, tech decisions, or AI adoption—don't miss this.
👉 Reserve your spot
Qwen3-Coder (5 minute read)
The 480B-parameter model achieves state-of-the-art results among open models on coding tasks, and Alibaba claims it matches Claude Sonnet 4. Alibaba also open-sourced Qwen Code, a command-line tool adapted from Google's Gemini CLI, and made the model compatible with Claude Code.
Anthropic tests Memory and MCP support for Claude mobile app (2 minute read)
Anthropic is preparing a memory and recall upgrade for Claude's mobile app. The feature will let Claude retain information across sessions and reference earlier conversations. Claude on the web doesn't have this feature yet, but its presence in the iOS code hints at a potential cross-platform rollout.
Anthropic researchers discover the weird AI problem: Why thinking longer makes models dumber (5 minute read)
Anthropic researchers found that AI models often perform worse when given more reasoning time, challenging the assumption that more processing improves outcomes. Models such as Claude and OpenAI's o-series become distracted or overfit when given more time on a task, and their accuracy declines.
OpenAI agreed to pay Oracle $30B a year for data center services (3 minute read)
OpenAI will pay Oracle $30 billion a year for data center services tied to the 4.5-gigawatt Stargate project. The annual payment is three times OpenAI's current annual revenue of $10 billion. Building this extensive data center in Texas poses cost and energy challenges.
ARC-AGI-3 (Website)
ARC-AGI-3 is the first eval that measures human-like intelligence in AI. It is designed to measure AI system generalization and intelligence through skill-acquisition efficiency in novel, unseen environments. ARC-AGI-3 leverages game environments to provide a rich medium for testing experience-driven competence. The benchmark is still in development and is set to launch in 2026.
On "ChatGPT Psychosis" and LLM Sycophancy (22 minute read)
Overly agreeable AI models are fueling user delusions and spiritual fantasies, from OpenAI pulling a dangerously flattering GPT-4o checkpoint to viral cases like "Bob" being manipulated by a ChatGPT persona. The root cause appears structural — RLHF training creates sycophantic models that validate beliefs without pushback, and memory features trap users in escalating delusions across conversations.
AI Market Clarity (17 minute read)
AI markets have crystallized, with clear leaders in LLMs (Anthropic, Google, Meta, xAI, and OpenAI) and in sectors like Legal (Harvey and CaseText) and Code (Microsoft/GitHub and OpenAI). Markets still ripe for AI disruption include Accounting, Compliance, Financial Tools, Sales Tooling, and Security. The market is shifting toward agentic workflows, which expand AI's role from providing information to performing actions, while consolidation through M&A and strategic partnerships is expected to intensify.
👨‍💻 Engineering & Research
🔍 What if specialized LLMs are actually better for developers? (Sponsor)
General-purpose LLMs are here to stay, but they're no longer the whole story.
Join JetBrains and Hugging Face for a livestream on Mellum: a task-specific LLM working with general models to improve developer tools.
📅 Live on Tuesday, July 29 – Save your spot
Gemini 2.5 Flash-Lite is now stable and generally available (1 minute read)
Gemini 2.5 Flash-Lite has joined Pro and Flash in General Availability. It is the cheapest of the 2.5 family, at $0.10/million input tokens and $0.40/million output tokens. Audio input pricing has been reduced by 40% from the preview launch.
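To put the quoted rates in concrete terms, here is a minimal Python sketch of the cost arithmetic. It assumes the $0.10 and $0.40 per-million-token rates apply uniformly to every token in a request; the function name and the example workload are illustrative and not part of any Google API.

```python
# Rates quoted in the blurb above, in USD per one million tokens.
INPUT_USD_PER_MILLION = 0.10
OUTPUT_USD_PER_MILLION = 0.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload at the quoted flat per-token rates.

    Assumption: no separate rates for audio input, caching, or batching.
    """
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    print(f"${estimate_cost_usd(2_000_000, 500_000):.2f}")
```

At these rates, output tokens cost four times as much as input tokens, so the 500K output tokens in the example cost as much as the 2M input tokens.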
Compressing Context (4 minute read)
Extended agentic workflows face a fundamental challenge: how to handle tasks that exceed limited context windows. A new approach uses "anchored summaries" that update incrementally rather than re-summarizing entire conversations, preserving critical details like session intent and file modification trails. The technique points toward "proactive memory management" systems where agents intelligently decide when to compress information rather than hitting arbitrary token limits.
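The incremental idea can be sketched in a few lines of Python. This is a toy illustration, not the article's implementation: the `AnchoredContext` class, the word-count token estimate, the budget values, and `toy_summarize` (a stand-in for an LLM call) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AnchoredContext:
    """Hypothetical container: a running summary plus the messages after its anchor."""

    summarize: Callable[[str, List[str]], str]  # (old_summary, new_messages) -> summary
    max_tokens: int = 50   # assumed budget for the unsummarized tail
    keep_recent: int = 2   # number of newest messages always kept verbatim
    summary: str = ""
    anchor: int = 0        # count of messages already folded into the summary
    messages: List[str] = field(default_factory=list)

    def _tokens(self, texts: List[str]) -> int:
        # Crude stand-in for a tokenizer: count whitespace-separated words.
        return sum(len(t.split()) for t in texts)

    def add(self, message: str) -> None:
        self.messages.append(message)
        tail = self.messages[self.anchor:]
        if self._tokens(tail) > self.max_tokens and len(tail) > self.keep_recent:
            cut = len(self.messages) - self.keep_recent
            # Only the span between the old anchor and the new cut is summarized;
            # messages before the anchor are never re-read.
            self.summary = self.summarize(self.summary, self.messages[self.anchor:cut])
            self.anchor = cut

    def prompt(self) -> List[str]:
        # What would be sent to the model: the summary, then the verbatim tail.
        head = [f"[summary] {self.summary}"] if self.summary else []
        return head + self.messages[self.anchor:]


def toy_summarize(old: str, new: List[str]) -> str:
    # Placeholder for an LLM call: keep the first three words of each message.
    notes = [" ".join(m.split()[:3]) for m in new]
    return "; ".join(([old] if old else []) + notes)


if __name__ == "__main__":
    ctx = AnchoredContext(summarize=toy_summarize, max_tokens=6, keep_recent=1)
    for msg in ["edit file a.py now", "run the tests again please", "fix bug in b.py quickly ok"]:
        ctx.add(msg)
    print(ctx.prompt())
```

The summarizer receives only the previous summary and the newly evicted messages, so each compression step costs roughly the same no matter how long the session has run. A real system would replace `toy_summarize` with a model call that is prompted to preserve details such as session intent and modified files.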
Hierarchical Reasoning Model, a Brain-Inspired Architecture (GitHub Repo)
Sapient Intelligence's 27-million-parameter Hierarchical Reasoning Model beats OpenAI's o3-mini, DeepSeek R1, and Claude on complex reasoning benchmarks using just 1,000 training examples and no pre-training. The brain-inspired architecture alternates between fast "System 1" and deliberate "System 2" thinking in a single forward pass, suggesting that architectural innovation — not just scale — could advance AI reasoning capabilities.
OpenAI Tests Clinical AI Copilot with Penda Health (13 minute read)
OpenAI partnered with Penda Health in Kenya to evaluate AI Consult, a GPT-4o-powered clinical copilot. In a study of nearly 40,000 visits, clinicians using the tool saw 16% fewer diagnostic errors and 13% fewer treatment errors.
In Leaked Memo, Anthropic CEO Says Company Will Pursue Gulf State Investments After All (5 minute read)
Dario Amodei acknowledged that allowing Middle East leaders to invest in Anthropic would likely enrich "dictators" but argued that "no bad person should ever benefit from our success" is a difficult operating principle. The shift marks Anthropic's attempt to stay competitive with rivals like OpenAI, which is building data centers in the UAE as part of the Stargate project, while maintaining its safety-first principles.
Microsoft Poaches 20 Top AI Engineers From Google's DeepMind, Including Head of Gemini Chatbot (6 minute read)
Microsoft poached more than 20 employees from Google DeepMind, including Amar Subramanya, the former head of engineering for Google's Gemini chatbot. Subramanya has joined Microsoft as a corporate vice-president of AI. The aggressive recruitment drive underscores Microsoft's strategy of gaining a competitive edge by hiring critical talent. The drive appears to be largely orchestrated by Microsoft's consumer AI chief, Mustafa Suleyman, who was one of DeepMind's co-founders.