TLDR DevOps 2025-11-19
State of Cloud Security 🥷, Impact of Coding Agents ✨, Google’s Antigravity 🔮
Kgateway v2.1 is released! (5 minute read)
Kgateway v2.1 features agentgateway integration for AI connectivity, Kubernetes Gateway API v1.3.0 conformance, and global namespace policies.
Key learnings from Datadog's 2025 State of Cloud Security study (12 minute read)
Datadog's 2025 State of Cloud Security study revealed persistent weaknesses across AWS, Azure, and Google Cloud, including widespread use of long-lived credentials, lagging adoption of IMDSv2, and insufficient guardrails against public storage access. Datadog Cloud Security helps organizations strengthen their posture by enforcing organizational guardrails, identifying risky workloads, securing data perimeters, and detecting misconfigurations across multi-cloud environments.
Replicate is joining Cloudflare (6 minute read)
Cloudflare has acquired Replicate, a platform for running AI models, and plans to integrate Replicate's platform directly into Cloudflare and expand its model catalog. As part of the acquisition, Replicate's 50,000+ open-source models and fine-tunes will be brought to Workers AI. Cloudflare also plans to add fine-tuning capabilities and custom model support to Workers AI, leveraging Replicate's Cog tool. Current Replicate and Workers AI users can expect uninterrupted service, along with improved performance from Cloudflare's global network.
How when AWS was down, we were not (30 minute read)
Authress stayed up during the major us-east-1 AWS outage by engineering for five-nines reliability: eliminating unreliable dependencies, using multi-region and edge failover, running custom health checks, doing incremental rollouts, and applying anomaly detection, rate limits, and customer-driven incident signals. Their architecture assumes “everything fails,” so they continuously validate data, automatically fail over regions, block abusive traffic, and treat customer support as part of their reliability system.
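Health checks that drive automatic regional failover are the core of the approach summarized above. The following is a minimal Python sketch of that idea, not Authress's code: `RegionFailover`, the `check` callback, and the `max_failures` threshold are hypothetical names chosen for illustration. Requiring several consecutive misses before failing over is one common way to avoid flapping between regions on a single bad probe.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Region:
    name: str
    consecutive_failures: int = 0


class RegionFailover:
    """Route traffic to the first healthy region in priority order (illustrative only)."""

    def __init__(self, regions: list[str], check: Callable[[str], bool], max_failures: int = 3):
        self.regions = [Region(name) for name in regions]
        self.check = check                # hypothetical health probe: region name -> healthy?
        self.max_failures = max_failures  # several misses required before failing over, to avoid flapping

    def run_health_checks(self) -> None:
        for region in self.regions:
            if self.check(region.name):
                region.consecutive_failures = 0
            else:
                region.consecutive_failures += 1

    def active_region(self) -> str:
        for region in self.regions:
            if region.consecutive_failures < self.max_failures:
                return region.name
        # Every region looks unhealthy: stay on the primary rather than refusing all traffic.
        return self.regions[0].name


# Usage: simulate the primary region going dark.
status = {"us-east-1": True, "eu-west-1": True}
failover = RegionFailover(["us-east-1", "eu-west-1"], check=lambda r: status[r])
status["us-east-1"] = False
failover.run_health_checks()
print(failover.active_region())  # us-east-1 (one miss is below the threshold)
failover.run_health_checks()
failover.run_health_checks()
print(failover.active_region())  # eu-west-1
```

Once the primary passes a health check again, its failure counter resets and traffic returns to it on the next call to `active_region()`.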
The productivity impact of coding agents (3 minute read)
A large study of tens of thousands of Cursor users found that after agents became the default, organizations merged 39% more PRs with no increase in revert or bug-fix rates. Senior developers accepted agent-written code more often, planned more before coding, and generally used agents more effectively, while most user requests (61%) were for code implementation.
Azure Developer CLI: Azure Container Apps Dev-to-Prod Deployment with Layered Infrastructure (8 minute read)
This guide explains how to use Azure Developer CLI v1.20.0 with Azure Container Apps to implement a “build once, deploy everywhere” workflow through separate container operations and layered infrastructure. It demonstrates how to deploy a Flask application across development and production environments using shared resources, CI/CD pipelines, and GitHub Actions for consistent container management.
Atuin Desktop (GitHub Repo)
Atuin Desktop, a local-first runbook editor currently in open beta, aims to bridge the gap between documentation and automation for terminal workflows. The editor allows users to create executable runbooks that can be used to solve common infrastructure problems.
Google Antigravity (Resource)
Google Antigravity is an agent-first development platform that pairs an AI-powered IDE with autonomous agents capable of planning and executing complex software tasks across multiple surfaces. It adds task-level transparency, async agent management, easy feedback, and built-in learning to support higher-autonomy coding with models like Gemini 3.
How Dash uses context engineering for smarter AI (7 minute read)
Dropbox Dash has evolved from a search system into an agentic AI to better interpret, summarize, and act on information, requiring a shift towards context engineering. Dash curates context through retrieval consolidation, relevant context filtering, and specialized task agents to improve the AI's decision-making, addressing issues like "analysis paralysis" and "context rot" that arise with too many tools and excessive data.
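The summary names relevant-context filtering as one defense against "context rot." Dash's actual pipeline is not described here, so the following is only a generic Python sketch of the idea. It drops low-relevance retrieval results, then greedily packs the highest-scoring ones into a fixed token budget. `Chunk`, `select_context`, `min_score`, and `token_budget` are hypothetical names, and the relevance scores and token counts are assumed to come from some upstream retriever and tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # relevance score from some retriever (assumed)
    tokens: int   # precomputed token count (assumed)


def select_context(chunks: list[Chunk], token_budget: int, min_score: float = 0.5) -> list[Chunk]:
    """Keep the most relevant chunks that fit in the token budget; drop weak matches entirely."""
    relevant = sorted((c for c in chunks if c.score >= min_score), key=lambda c: c.score, reverse=True)
    selected: list[Chunk] = []
    used = 0
    for chunk in relevant:
        if used + chunk.tokens > token_budget:
            continue  # skip chunks that would overflow; a smaller one may still fit
        selected.append(chunk)
        used += chunk.tokens
    return selected


chunks = [
    Chunk("design doc", 0.9, 400),
    Chunk("meeting notes", 0.8, 500),
    Chunk("unrelated wiki page", 0.3, 100),
    Chunk("ticket summary", 0.7, 200),
]
print([c.text for c in select_context(chunks, token_budget=800)])  # ['design doc', 'ticket summary']
```

The low-scoring wiki page is excluded even though it would fit, which is the point: a smaller, more relevant context is preferred over filling the window.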
Cloudflare outage on November 18, 2025 (12 minute read)
Cloudflare's global outage yesterday was caused by a ClickHouse permissions change that accidentally doubled the size of a Bot Management “feature file,” exceeding a hard-coded limit and causing the core proxy (FL/FL2) to crash and return widespread 5xx errors. After initially suspecting a DDoS attack, Cloudflare stopped propagation of the bad file, restored a known-good version, and fully recovered services by 17:06 UTC.
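The failure mode generalizes: a generated config file grew past a hard-coded limit, and the consumer crashed instead of rejecting it. Below is a hedged Python sketch of the defensive alternative, which validates each candidate file and keeps serving the last known-good version when validation fails. `MAX_FEATURES`, `FeatureLoader`, and the one-feature-per-line format are hypothetical and are not Cloudflare's proxy code.

```python
MAX_FEATURES = 200  # hypothetical hard-coded limit for this sketch


class FeatureFileError(ValueError):
    """Raised when a candidate feature file fails validation."""


def parse_feature_file(lines: list[str]) -> list[str]:
    # Hypothetical format: one feature name per non-blank line.
    features = [line.strip() for line in lines if line.strip()]
    if len(features) > MAX_FEATURES:
        raise FeatureFileError(f"{len(features)} features exceeds limit of {MAX_FEATURES}")
    return features


class FeatureLoader:
    def __init__(self, initial: list[str]):
        self.active = parse_feature_file(initial)

    def reload(self, candidate: list[str]) -> bool:
        """Swap in the candidate only if it validates; otherwise keep the last known-good file."""
        try:
            parsed = parse_feature_file(candidate)
        except FeatureFileError as err:
            print(f"rejected feature file: {err}")  # in practice: alert and keep serving, don't crash
            return False
        self.active = parsed
        return True


good = [f"feature_{i}\n" for i in range(150)]
loader = FeatureLoader(good)
accepted = loader.reload(good + good)  # a file whose size accidentally doubled
print(accepted, len(loader.active))    # False 150
```

Rejecting bad input at load time turns a global crash into a stale-but-working configuration plus an alert.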
Get our free daily newsletter with curated tools 💻, trends 📈, and insights 💡 for DevOps Engineers 👨‍💻. Join 340,000 readers for one daily email.