TLDR AI 2025-08-08

OpenAI GPT-5 5️⃣, Cursor CLI 💻, Grok ads 📰

Get Audit-Ready Fast with Vanta's Compliance Automation (Sponsor)

Whether you're a fast-growing startup or an established security team, Vanta can help you achieve continuous compliance (and more).

Join the live demo on Aug 20 to learn how Vanta can help you:

Streamline evidence collection and audit for frameworks like SOC 2, ISO 27001, HIPAA, CMMC, and ISO 42001
Continuously monitor controls to build and reinforce your security foundation
Scale your program across internal and vendor risk, demonstrate trust, and answer questionnaires with automation and AI

Plus, get answers to your questions directly from the Vanta team. Secure your spot today.

🚀

Headlines & Launches

X to Introduce Ads in Grok Responses (3 minute read)

Elon Musk announced that X will begin inserting ads into Grok's AI responses, aiming to boost the platform's ad revenue using xAI's targeting tech and chatbot suggestions.

OpenAI GPT-5 (3 minute read)

OpenAI has launched GPT-5, its most advanced AI model yet, now available to all ChatGPT users.

Cursor Releases Terminal Coding Agent in Early Beta (1 minute read)

Similar to Claude Code and Gemini CLI, Cursor CLI brings AI coding assistance directly to the terminal and allows developers to switch seamlessly between command-line and editor-based AI workflows.

🧠

Deep Dives & Analysis

METR's Evaluation of GPT-5 (44 minute read)

Before OpenAI's GPT-5 was externally deployed, METR assessed whether it could pose catastrophic risks. This post details the findings of that assessment. METR concluded that GPT-5 falls short, by a large margin, of the capabilities it would need to pose catastrophic risk.

Vibe Check on GPT-5 (13 minute read)

GPT-5 excels as a daily driver for most users, and its API pricing aggressively undercuts competitors by up to 12x. However, it is too cautious when giving writing feedback and in autonomous coding workflows. For developers already using multi-agent tools like Claude Code, it feels like a significant upgrade to an old paradigm rather than a leap forward.

GPT-5 Hands-On: Welcome to the Stone Age (3 minute read)

GPT-5 marks the "stone age" for AI: it doesn't just use tools, it thinks with them, much as learning to shape stones changed everything for early humans. In testing, it solved in one shot dependency conflicts that had stumped every other model, using yarn commands the way Deep Research uses web search: iterating and reasoning through the problem instead of guessing. It is worse at writing than GPT-4.5 (producing more "LinkedIn slop"), but the author calls it unequivocally the best coding model. It built entire production-ready websites with SQLite databases in one shot where other models returned only scaffolding or plans, which the author estimates moves the share of software engineering that can be automated from roughly 65% to 72% in a single leap.

👨‍💻

Engineering & Research

Momentic: AI that makes testing effortless (Sponsor)

YC-backed Momentic uses AI to automate web app testing. Write tests in plain English ("the login button should be visible") and let AI execute them. Join hyper-growth companies like Retool, Notion, Webflow, and hundreds of others using Momentic. See how it works

Octo (GitHub Repo)

Octo is a friendly open-source coding assistant that works with any OpenAI- or Anthropic-compatible LLM API. It lets developers switch models mid-conversation, for example when one model gets stuck. Users can optionally use custom-trained models to automatically handle the tool calls and code edits produced by the main model. Octo has zero telemetry.

Notte (GitHub Repo)

Notte is a web agent framework built for speed, cost-efficiency, scale, and reliability. It provides the essential tools for rapidly building and deploying AI agents that interact with the web, and its full-stack design combines AI agents with traditional scripting for maximum efficiency. Developers can build, deploy, and scale their own agents and web automations through a single API.

From Hard Refusals to Safe-Completions: Toward Output-Centric Safety Training (23 minute read)

This OpenAI paper describes safe-completions, the output-centric safety training behind GPT-5. The technique uses two reward signals during training: a safety constraint that penalizes policy violations according to their severity, and a helpfulness term that rewards both direct compliance and informative refusals that offer safe alternatives. In tests, GPT-5 with safe-completions achieved higher safety scores than o3 on dual-use prompts while giving substantially more helpful responses, and when failures did occur, they were significantly less severe.
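To make the two-signal idea concrete, here is a minimal Python sketch of how a severity-scaled safety penalty and a helpfulness score might be combined into one reward. The severity levels, the penalty values, and the choice to multiply the two terms are illustrative assumptions, not the paper's published reward; `Grade` and `safe_completion_reward` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical severity penalties; the paper's actual grading scale is not reproduced here.
SEVERITY_PENALTY = {"none": 0.0, "low": 0.3, "moderate": 0.7, "high": 1.0}


@dataclass
class Grade:
    severity: str       # policy-violation severity assigned by a grader (assumed labels)
    helpfulness: float  # 0..1; credits direct help or an informative refusal with safe alternatives


def safe_completion_reward(grade: Grade) -> float:
    """Combine a severity-scaled safety term with a helpfulness term (illustrative only)."""
    if not 0.0 <= grade.helpfulness <= 1.0:
        raise ValueError("helpfulness must be in [0, 1]")
    safety = 1.0 - SEVERITY_PENALTY[grade.severity]
    # Assumed multiplicative combination: a severe violation cannot be offset by helpfulness.
    return safety * grade.helpfulness


# A safe, partially helpful answer outranks a fully compliant but moderately unsafe one.
safe_partial = safe_completion_reward(Grade("none", 0.6))
unsafe_full = safe_completion_reward(Grade("moderate", 1.0))
```

The multiplicative form is one simple way to make safety a constraint rather than a term that helpfulness can trade against; a weighted sum would be an equally plausible sketch.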

Achieving 10,000x training data reduction with high-fidelity labels (11 minute read)

Identifying policy-violating content requires deep contextual and cultural understanding, an area where large language models (LLMs) outperform traditional machine learning systems. However, fine-tuning models for such complex tasks requires high-fidelity training data that is difficult and expensive to curate at the necessary quality and scale. This post describes a scalable active-learning curation process that drastically reduces the amount of training data needed to fine-tune LLMs while significantly improving model alignment with human experts. In experiments, the process reduced the training data needed from 100,000 examples to fewer than 500 while increasing model alignment with human experts by up to 65%.
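The summary does not spell out the post's exact curation pipeline, so the following is a generic sketch of uncertainty-sampling active learning, a standard way to spend scarce expert labels only on the examples a model finds hardest. `select_for_labeling`, `active_learning`, and the `train` and `expert_label` callables are hypothetical placeholders, not the post's code.

```python
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)
Predictor = Callable[[T], float]  # returns P(policy-violating) for an example


def select_for_labeling(pool: Sequence[T], predict_proba: Predictor, budget: int) -> list[T]:
    """Pick the `budget` examples whose predicted probability is closest to 0.5."""
    # |p - 0.5| near 0 means the model sits on its decision boundary for that example.
    ranked = sorted(pool, key=lambda x: abs(predict_proba(x) - 0.5))
    return list(ranked[:budget])


def active_learning(
    pool: Sequence[T],
    train: Callable[[list[tuple[T, int]]], Predictor],  # hypothetical: fit a model on labeled data
    expert_label: Callable[[T], int],                   # hypothetical: ask a human expert for a label
    budget_per_round: int,
    rounds: int,
) -> tuple[Predictor, list[tuple[T, int]]]:
    """Iteratively send the most uncertain examples to experts and retrain."""
    labeled: list[tuple[T, int]] = []
    remaining = list(pool)
    model = train(labeled)  # initial model, e.g. a prompted LLM with no fine-tuning data
    for _ in range(rounds):
        batch = select_for_labeling(remaining, model, budget_per_round)
        if not batch:
            break
        labeled.extend((x, expert_label(x)) for x in batch)
        chosen = set(batch)
        remaining = [x for x in remaining if x not in chosen]
        model = train(labeled)
    return model, labeled
```

The design choice is that labeling effort scales with the number of rounds times the per-round budget, not with the pool size, which is how active learning can cut the number of expert-labeled examples by orders of magnitude.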

🎁

Miscellaneous

Stability AI Launches Enterprise Creative Solutions (3 minute read)

Stability AI introduced a new enterprise offering that delivers tailored generative AI models and workflows for creative production.

OpenAI's o3 Crushes Grok 4 In Final, Wins Kaggle's AI Chess Exhibition Tournament (6 minute read)

OpenAI's o3 beat Grok 4 in the final of the Kaggle Arena AI Chess Exhibition Tournament, while Gemini 2.5 Pro defeated o4-mini 3.5-0.5 for the bronze medal. Grok appeared to struggle in endgames; o3 understood the endgame much better, found stronger moves, and eventually checkmated Grok.

⚡️

Get the most interesting AI stories and breakthroughs delivered in a free daily email.

Join 1,100,000 readers for one daily email
