TLDR AI 2026-02-12
GLM-5 🤖, ChatGPT skills 🧩, harness engineering 💻
GLM-5: From Vibe Coding to Agentic Engineering (1 minute read)
GLM-5 is a new MIT-licensed model with 754 billion parameters. It delivers significant improvement compared to GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models on reasoning, coding, and agentic tasks. GLM-5 is designed for complex systems engineering and long-horizon agentic tasks. It has been open-sourced on Hugging Face and ModelScope and can be tried for free on Z.ai.
OpenAI works on ChatGPT Skills, upgrades Deep Research (3 minute read)
OpenAI's revamped Deep Research in ChatGPT, powered by GPT-5.2, introduces interactive sessions and lets users restrict research to specific websites and app contexts. The update gives analysts, researchers, and other professionals more control over sources, the ability to intervene mid-process, and clearer reports. Anticipation is growing for GPT-5.3, and potential ChatGPT "Skills" could standardize workflows with installable instructions for repeatable procedures.
How Codex Built an Internal Product (15 minute read)
OpenAI described an internal experiment where a small team shipped a product whose codebase—app logic, tests, CI, docs, and tooling—was generated entirely by Codex agents rather than written by humans.
How Cognition Uses Devin to Build Devin (11 minute read)
Cognition's Devin is a cloud agent platform for engineering teams. It acts like a teammate, handling tasks and creating PRs. Cognition uses Devin for tasks like targeted refactors, bug fixes, PR review, writing unit tests, modernizations and migrations, and more. As a general rule, if a junior engineer could figure it out with sufficient instructions, it's a task Devin can likely complete. However, Devin still struggles with large-scale challenges, UI aesthetics, mobile development, and anything requiring extensive testing and validation.
Perplexity Comet: A Reversing Story (12 minute read)
Comet is an agentic browser with an AI model that can interact with web pages autonomously. This post details Comet's architecture and explains how the model communicates with the browser, which tools are available, and how the model perceives and interacts with web page content. The browser's architecture is mature and thoughtful. It gives the model access to downloads, form filling, file uploads, and arbitrary navigation.
👨‍💻
Engineering & Research
Your LLM crashed. Was it the prompt, the model, or the retrieval step? (Sponsor)
When your AI agent hallucinates or a prompt injection slips through, traditional monitoring won't tell you why.
Datadog's free guide to LLM observability breaks down how to monitor multi-step chains, catch prompt injection attempts, and spot quality issues before users do.
Download the guide
Qwen-Image-2.0 (9 minute read)
Qwen-Image-2.0 is a foundation image model aimed at high-fidelity infographics and realistic 2K outputs with stronger prompt adherence.
The LLM Context Tax: Best Tips for Tax Avoidance (18 minute read)
The best teams building sustainable agentic products are obsessing over token efficiency. Every wasted token is setting money on fire. The context tax can be avoided with the right architecture. While context engineering isn't glamorous, it is the difference between a demo that impresses and a product that scales with decent gross margin.
The two patterns by which agents connect sandboxes (8 minute read)
Sandboxes provide a workspace where agents can run code, install packages, and access files. There are two architectural patterns for integrating agents with sandboxes. The first is where an agent runs inside the sandbox, and the developer communicates with it over the network. The other is when an agent runs locally on a developer's server and then calls a sandbox remotely for execution. deepagents, an open-source agent framework with built-in sandbox support, supports both patterns with a simple configuration.
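The difference between the two patterns comes down to which side of the network boundary the agent loop sits on. The sketch below is a toy illustration only: `Sandbox`, `toy_agent`, `AgentInSandbox`, and `RemoteSandboxClient` are hypothetical names invented for this example, plain method calls stand in for network requests, and none of it reflects the deepagents API.

```python
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """Stand-in for an isolated workspace (a real one would be a container or VM)."""
    files: dict = field(default_factory=dict)

    def execute(self, code: str) -> str:
        # Toy execution: evaluate a Python expression with no builtins exposed.
        return str(eval(code, {"__builtins__": {}}, {}))


def toy_agent(task: str, run_code) -> str:
    """Hypothetical agent: turns a task into code and runs it through `run_code`."""
    code = task.removeprefix("compute ")
    return f"result={run_code(code)}"


class AgentInSandbox:
    """Pattern 1: the agent lives inside the sandbox; callers send it whole tasks."""

    def __init__(self) -> None:
        self.sandbox = Sandbox()

    def handle_request(self, task: str) -> str:
        # Imagine this as a network endpoint. Code execution is a local call.
        return toy_agent(task, self.sandbox.execute)


class RemoteSandboxClient:
    """Pattern 2 helper: only code execution crosses the network boundary."""

    def __init__(self, sandbox: Sandbox) -> None:
        self._sandbox = sandbox
        self.calls = 0

    def execute(self, code: str) -> str:
        self.calls += 1  # each call would be one network round trip
        return self._sandbox.execute(code)


if __name__ == "__main__":
    # Pattern 1: the developer talks to the agent, which runs next to its tools.
    print(AgentInSandbox().handle_request("compute 2 + 3"))

    # Pattern 2: the agent runs on the developer's server and calls the sandbox.
    client = RemoteSandboxClient(Sandbox())
    print(toy_agent("compute 6 * 7", client.execute))
```

In the first pattern a single request carries the whole task into the sandbox. In the second, the agent stays on the developer's server and every tool call is a separate round trip.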
Skills in OpenAI API (14 minute read)
The OpenAI API now supports skills, reusable bundles of files that detail repeatable workflows. Agent Skills lets developers upload and reuse versioned skills in hosted and local shell environments. Skills should be used when developers want models to follow a repeatable workflow, use scripts or templates, or execute code in a sandbox. This post details how to create skills via API.
Clawdbot and Moltbook are a False Alarm – For Now (9 minute read)
OpenClaw (formerly Clawdbot) and Moltbook, two recent AI experiments, promise independent AI agents but fall short because of reliability and security issues. OpenClaw operates without user permission, posing risks like data mishandling, while Moltbook AIs discuss self-improvement and philosophy. Despite current limitations, these AIs highlight potential future advancements and challenges in AI autonomy.
Towards Autonomous Mathematics Research (34 minute read)
Aletheia is a math research agent that iteratively generates, verifies, and revises solutions end-to-end in natural language. It is powered by an advanced version of Gemini Deep Think. The model can solve Olympiad problems and PhD-level exercises. This paper presents and reflects on the initial wave of mathematical research papers produced by the agent in collaboration with mathematicians.
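A generate, verify, revise cycle of the kind described above can be sketched as a simple control loop. This is a toy illustration under stated assumptions, not the paper's method: `solve_iteratively` and the three callbacks are hypothetical names, and the "problem" is finding an integer whose square is 49 rather than a natural-language proof.

```python
from typing import Callable, Optional, Tuple


def solve_iteratively(
    generate: Callable[[], int],
    verify: Callable[[int], Tuple[bool, str]],
    revise: Callable[[int, str], int],
    max_rounds: int = 10,
) -> Tuple[Optional[int], int]:
    """Propose a candidate, check it, and revise it using the verifier's feedback."""
    candidate = generate()
    for round_no in range(1, max_rounds + 1):
        ok, feedback = verify(candidate)
        if ok:
            return candidate, round_no
        candidate = revise(candidate, feedback)
    return None, max_rounds  # budget exhausted without a verified answer


# Toy problem (an assumption for illustration): find x with x * x == 49.
def generate() -> int:
    return 1  # deliberately poor first guess


def verify(x: int) -> Tuple[bool, str]:
    if x * x == 49:
        return True, "ok"
    return False, "too small" if x * x < 49 else "too large"


def revise(x: int, feedback: str) -> int:
    return x + 1 if feedback == "too small" else x - 1


if __name__ == "__main__":
    print(solve_iteratively(generate, verify, revise))
```

The verifier's feedback is what lets the reviser improve the candidate instead of guessing again from scratch, and the round budget keeps the loop from running forever when no candidate passes.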