TLDR AI 2025-12-15
OpenAI adopts Skills 🤝, Tinker GA 💻, reverse engineering Claude 👨💻
How Ramp built an AI operating system for scalable work (Sponsor)
Learn how Ramp became one of the most productive companies in the world by adopting a Builder mindset—understanding that work is fundamentally changing and actively building an AI operating system instead of waiting for the perfect tool.
Key takeaways from the story:
- The Builder mindset: Don't wait for AI to get easier—start designing your future work now
- Three steps to scale: Getting precise with AI, centralizing information, and building workflows without engineers
- Real results: 270 features shipped in H1 2025 (more than all of 2024 combined) with 90% of 1,200 employees using Notion AI monthly
The blog includes a CTA to watch the Make with Notion session with Ben Levik (Ramp's operations and AI product leader).
OpenAI is quietly adopting skills, now available in ChatGPT and Codex CLI (8 minute read)
Skills support is quietly showing up in OpenAI's Codex CLI tool and ChatGPT. The skills folder can be accessed by prompting, 'Create a zip file of /home/oai/skills'. So far, the skills cover spreadsheets, docx, and PDFs. A link to a repository with a copy of the skills is available in the article.
Tinker adds vision input and goes GA (3 minute read)
Tinker is now open to everyone, featuring a new reasoning model, Kimi K2 Thinking, and an OpenAI API-compatible interface for seamless integration. Vision input has been added via Qwen3-VL models, allowing images and text to be processed together. These updates extend Tinker to image-classification tasks, where fine-tuned models can outperform traditional approaches when labeled data is limited.
I Reverse Engineered Claude's Memory System, and Here's What I Found! (10 minute read)
Claude uses on-demand tools and selective retrieval for its memory system. This post explores Claude's memory system through conversations with the bot. Claude appears to be cooperative, transparent, and willing to share information about its internal structure, tools, and prompt format. However, it is worth noting that Claude can hallucinate, so some of the information may be inaccurate.
Text Diffusion Models are Faster at Writing Code (7 minute read)
Diffusion language models generate code faster than autoregressive language models. More structured output tends to have lower entropy, which yields higher-confidence token predictions, which in turn lets more tokens be decoded in parallel per step. Tests suggest that it really is the structuredness of the output, not memorization, that matters.
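The entropy-to-parallelism link can be sketched in a toy decoder (illustrative only; the threshold and distributions are made up, not from the article): when per-position token distributions are peaked (low entropy), a confidence-threshold rule can commit many positions in a single diffusion step, while flat distributions force the decoder to wait.

```python
import math

def entropy(p):
    # Shannon entropy (bits) of a token distribution
    return -sum(q * math.log2(q) for q in p if q > 0)

def parallel_decode_step(distributions, threshold=0.9):
    """Commit every position whose top-token probability clears the
    threshold. Low-entropy (high-confidence) distributions commit
    more positions per step."""
    committed = {}
    for i, dist in enumerate(distributions):
        top_p = max(dist)
        if top_p >= threshold:
            committed[i] = dist.index(top_p)
    return committed

# Structured output: peaked (low-entropy) distributions at each position.
structured = [[0.95, 0.03, 0.02], [0.05, 0.92, 0.03], [0.91, 0.05, 0.04]]
# Freeform output: flatter (high-entropy) distributions.
freeform = [[0.4, 0.35, 0.25], [0.5, 0.3, 0.2], [0.45, 0.3, 0.25]]

print(len(parallel_decode_step(structured)))  # 3 positions committed this step
print(len(parallel_decode_step(freeform)))    # 0 -- must wait for later steps
```

The structured distributions also have much lower entropy per position, which is the correlation the article points to.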
Can LLMs give us AGI if they are bad at arithmetic? (17 minute read)
While large language models are useful tools, it's hard to look at the frontier models and see something approximating human-level intelligence when there are such evident cognitive gaps. These models aren't being fine-tuned to make accurate judgments about even small datasets. What's needed are more efficient ways to attach data without burning tokens, while still letting models pass datasets through to tools that handle them well. That would make tooling considerably more effective.
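One hedged sketch of the pass-data-to-tools idea (the registry, handle name, and tool are all hypothetical, not from the article): keep the dataset outside the context window and let the model call a tool with just a handle string, so no tokens are spent on raw rows and the arithmetic is done by code rather than the model.

```python
import statistics

# Hypothetical registry: datasets live outside the context window and are
# referenced by handle, so the model never sees (or pays tokens for) raw rows.
DATASETS = {"sales_q3": [120, 340, 275, 410, 95]}

def tool_summarize(handle):
    """Tool the model can invoke with only the handle string;
    exact arithmetic happens here, not in the model."""
    data = DATASETS[handle]
    return {"count": len(data), "mean": statistics.mean(data), "max": max(data)}

print(tool_summarize("sales_q3"))
```

The model's judgment is then applied to a small, exact summary instead of to raw numbers it is prone to miscalculate.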
Inside our effort to improve the Mintlify assistant (3 minute read)
Mintlify's AI-powered assistant helps end users get answers from docs with clear citations and useful code examples. This article walks through how the team analyzed and improved the assistant after deciding it wasn't performing the way they wanted. The team rebuilt its feedback pipeline, moved conversation data into ClickHouse, and categorized negative interactions at scale. The analysis surfaced that search quality was the assistant's biggest weakness, while most other responses were strong.
👨💻
Engineering & Research
Tinkering with prompts can only get you so far. (Sponsor)
Most companies get stuck tinkering with prompts and wonder why their agents fail to deliver dependable results.
This guide from You.com breaks down the evolution of agent management, revealing the five stages for building a successful AI agent and why most organizations haven't gotten there yet.
Go beyond the prompt: get the guide.
Skills vs Dynamic MCP Loadouts (7 minute read)
The easiest way to work with tools is to ask agents to write their own tools as skills, which leaves control of the tool largely with the user: whenever it breaks or needs modification, the user can simply ask the agent to adjust it. Dynamic tool loading with MCP will likely arrive eventually, but it will probably take quite a few protocol changes to bring in skill-like summaries and built-in manuals for the tools.
How we used Codex to build Sora for Android in 28 days (15 minute read)
The initial version of Sora's production Android app was built in 28 days using OpenAI Codex. It took a lean engineering team and roughly 5 billion tokens to ship the project. The app has a crash-free rate of 99.9%. This article describes how OpenAI used GPT-5.1-Codex, the same version available to any developer or business, to build the app.
Agentic coding tools should give more control over message queueing (7 minute read)
Claude Code uses boundary-aware queuing, where new messages are inserted at natural break points, which changes the model's course of action smoothly without stopping ongoing generation. OpenAI Codex uses post-turn queuing, where user messages wait until the current action finishes completely before they are handled. Agentic tools should implement both types of queuing and let users choose which to use. Having that option would make a difference in agentic workflows where users are running three to four agents in parallel.
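The two queueing strategies can be contrasted in a toy model (the class and method names below are invented for illustration, not either tool's actual API): boundary-aware delivery hands a queued message to the agent at the next break point mid-turn, while post-turn delivery holds it until the whole turn has finished.

```python
from collections import deque

class AgentSession:
    """Toy model of the two message-queueing strategies."""

    def __init__(self, strategy="post_turn"):
        self.strategy = strategy
        self.queue = deque()
        self.steps_left_in_turn = 0

    def start_turn(self, steps):
        self.steps_left_in_turn = steps

    def send(self, message):
        self.queue.append(message)

    def step(self):
        """Run one unit of agent work; return any message delivered."""
        if self.steps_left_in_turn > 0:
            self.steps_left_in_turn -= 1
            # Boundary-aware: deliver at the next natural break point,
            # even though the turn as a whole is still in progress.
            if self.strategy == "boundary_aware" and self.queue:
                return self.queue.popleft()
        # Post-turn: deliver only once the current turn has fully finished.
        if self.steps_left_in_turn == 0 and self.queue:
            return self.queue.popleft()
        return None

post = AgentSession("post_turn")
post.start_turn(3)
post.send("redirect")
print([post.step() for _ in range(3)])  # [None, None, 'redirect']

ba = AgentSession("boundary_aware")
ba.start_turn(3)
ba.send("redirect")
print(ba.step())  # 'redirect' -- delivered mid-turn at a break point
```

Exposing the strategy as a user-selectable setting, as the article argues, is essentially a one-line choice in a model like this.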
Evaluating Gemini Robotics Policies in a Veo World Simulator (7 minute read)
Google used its video generation model Veo to build a world simulator that predicts how robotics algorithms will perform in novel environments without physical testing. The system accurately ranked eight policy checkpoints and identified safety vulnerabilities—like a robot knocking over a laptop or grabbing a bottle too aggressively—through 1,600+ simulated rollouts that correlated strongly with real-world results.
Claude Code's DX is too good, and that's a problem (11 minute read)
Claude Code's capabilities have grown tremendously. This means that developers have a lot more to learn. Claude Code is currently optimizing hard for the power user while trying not to lose everyone else. While the learning curve is manageable, every new capability adds weight. The risk is that Claude Code becomes so capable that you need to learn Claude Code to use it.
OpenAI Ends 'Vesting Cliff' for New Employees in Compensation-Policy Change (6 minute read)
OpenAI has ended a compensation policy that required employees to work at the company for at least six months before their equity vests. The change is designed to encourage new employees to take risks without fear of being let go before accessing their first chunk of equity. OpenAI had shortened its vesting period for new employees to six months from the industry standard of 12 months in April. xAI also made a similar change in the late summer.
Get the most interesting AI stories and breakthroughs delivered in a free daily email.
Join 920,000 readers for one daily email