TLDR AI 2025-07-07
Grok 4 rumors, Character AI's video model, American DeepSeek project
Build the future of agentic commerce with the PayPal MCP server (Sponsor)
The PayPal MCP server delivers speed and reliability using prebuilt functions that connect directly to PayPal's APIs, reducing development time and errors during commerce operations. Its architecture is scalable and flexible, allowing you to connect with a range of platforms.
Check out this video featuring PayPal engineers Brenden Lane and Vikram Bhoomidi as they explore the Model Context Protocol, its benefits, and how it allows interactions with payment systems using natural language commands.
Watch now and visit PayPal.ai for more details.
Character AI's Real-Time Video Generation (3 minute read)
Character.AI's TalkingMachines is a real-time, audio-driven video generation model that creates FaceTime-style animations from a single image and voice input.
Grok 4 benchmarks leak with 45% score on Humanity's Last Exam (1 minute read)
Leaked benchmarks suggest that Grok 4 will be state-of-the-art. References to the model have surfaced in the xAI console. If the benchmarks are real, Grok 4 could outperform leaders like Gemini 2.5 Pro, o3 Pro, and Claude 4 Opus. xAI faces competitive pressure to release Grok 4 before the market shifts again - OpenAI, Google, and Anthropic are all rumored to be preparing fresh releases.
Deep Dives & Analysis
Adding Memory to Gemini 2.5 Chatbots (12 minute read)
A guide explaining how to use the Gemini API and the open-source mem0 tool to give Gemini 2.5 chatbots long-term memory. The setup allows bots to recall past interactions, personalize responses, and reduce repetition for more context-aware conversations.
Baba is Eval (10 minute read)
'Baba is You' is a puzzle game where the rules have to be manipulated to win. The level of abstraction required to solve most levels makes it a formidable reasoning benchmark. This study looks at how large language models fare in playing the game. At the moment, Claude 4 is pretty bad at playing the game - a reasoning model may be better equipped to play, so the next step in the study may be to test those models.
The American DeepSeek Project (10 minute read)
Meta's recent AI shortcomings have created a vacuum in the open-source AI ecosystem that has largely been filled by Chinese models. If current dynamics continue, the AI world will be split between powerful but expensive, closed-source American models and low-cost, ubiquitous but potentially compromised Chinese models. The US likely has a very small window, roughly the next two years, to counter this trend by investing $100-500 million into an open-source model that is as good as the best closed-source model.
Engineering & Research
Local-first GenAI with NVIDIA AI Workbench on Dell Pro Max (Sponsor)
agent-squad (GitHub Repo)
A framework for building collaborative multi-agent AI systems that can plan, delegate, and work together to solve complex tasks.
What can agents actually do? (18 minute read)
There's a lot of excitement about what AI can do, but many of the conversations about the technology are so abstract that they border on meaningless. This post attempts to concisely summarize how AI agents work. It looks at a handful of real-world use cases for the technology. AI agents are a multiplier on software quality and system design. If software or systems are poorly designed, agents will only cause harm.
Sakana AI's TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30% (7 minute read)
Sakana AI's TreeQuest uses Multi-LLM AB-MCTS to combine multiple LLMs. Achieving a 30% performance boost over individual models, this technique leverages the unique strengths of different models through Adaptive Branching Monte Carlo Tree Search, dynamically assigning the optimal model for each task. Available as an open-source tool, TreeQuest allows businesses to apply this approach to complex problems, enhancing AI capabilities and reducing hallucination risks.
Why I don't think AGI is right around the corner (15 minute read)
It's hard to get normal humanlike labor out of LLMs because they lack some fundamental capabilities. They don't get better over time, and this lack of continual learning is a huge problem. There's no way to give models human-level feedback, so users are stuck with the abilities they have out of the box. Messing around with system prompts doesn't produce anything close to the kind of learning and improvement that human employees experience. Humans are able to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice a task.
Google Faces EU Complaint Over AI Overviews (1 minute read)
The Independent Publishers Alliance filed an antitrust complaint with the European Commission, alleging that Google's AI Overviews misuse web content and have led to traffic and revenue loss for publishers.
Want more news from TLDR? (Sponsor)
You'll probably like our flagship newsletter. It's all about tech, science, and programming.
Same quick format. Still free.
Subscribe now.
Nvidia's deal to buy Canadian AI startup CentML could top US$400M (3 minute read)
CentML makes software that operates between users' AI models and the chips powering them, making the systems run better.
Researchers seek to influence peer review with hidden AI prompts (1 minute read)
Researchers are embedding hidden AI prompts in academic papers on arXiv to influence peer reviews positively.
NFDG: The $1.1B VC Fund That 4X'd in Two Years - Then Got Acquired by Meta (13 minute read)
This post looks at NFDG's portfolio, advisory board, performance, success factors, and more.
Grok 4 spotted ahead of launch with special coding features (2 minute read)
Grok 4 (grok-4-0629) offers unparalleled performance in natural language, math, and reasoning.
Elon Musk confirms xAI is buying an overseas power plant to power its new data center (4 minute read)
xAI's next data centers are expected to house millions of AI chips.
HOLY SMOKES! A new, 200% faster DeepSeek R1-0528 variant appears from German lab TNG Technology Consulting GmbH (10 minute read)
24-year-old German firm TNG Technology Consulting GmbH's DeepSeek-TNG R1T2 Chimera delivers a notable boost in efficiency and speed while using significantly fewer output tokens.
Get the most interesting AI stories and breakthroughs delivered in a free daily email.
Join 920,000 readers for one daily email.