TLDR DevOps 2026-02-20
Modernizing Prometheus 🌆, Metrics That Matter 🧱, Kyverno 1.17 🆕
Is AI creating accountability debt? (Sponsor)
VP: "Just have AI write all your Playwright tests, and you'll release features 10x faster."
Dev: "Sure, but who's accountable if something goes wrong?"
VP: "Still you."
Unreasonable? Yes. But when AI writes the code, who's actually accountable for quality?
This blog post from mabl argues that most teams pair AI code assistants with Playwright and assume they're covered. But risks accumulate, and coded automation alone isn't enough; quality governance is required.
Predictive Optimization at Scale: A Year of Innovation and What's Next (5 minute read)
Databricks' Predictive Optimization feature is now enabled by default for all new Unity Catalog managed tables after managing millions of production tables throughout 2025, delivering up to 22% faster queries and significantly reduced storage costs through automated maintenance. The company is expanding the feature in 2026 with Auto-TTL for automated row deletion based on time-to-live policies and a new Data Governance Hub dashboard that shows storage cost savings and optimization metrics.
VS Code becomes multi-agent command center for developers (5 minute read)
Visual Studio Code v1.109 introduces multi-agent orchestration with support for Anthropic Claude and OpenAI Codex alongside GitHub Copilot, unified session management, parallel subagents, and MCP Apps. The release improves context retention, security sandboxing, and performance, positioning VS Code as a universal AI interface.
Modernizing Prometheus: Native Storage for Composite Types (4 minute read)
The Prometheus community is evolving its TSDB from classic primitive sample storage to native composite sample support for histograms and other types, improving efficiency, transactionality, and reliability. Ongoing work across OpenMetrics 2.0, Remote Write 2.0, and PromQL compatibility aims to enable transparent migration without breaking existing queries.
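To illustrate the difference between primitive and composite samples, here is a minimal Python sketch. It is not the Prometheus TSDB encoding: `CompositeSample` and `fold_classic` are hypothetical names, and the metric name and values are made up. It only shows that a classic histogram is spread over several independent float series (`_bucket`, `_sum`, `_count`), while a composite sample carries the whole histogram as one value that is written or read as a unit.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Classic layout: one scrape of a histogram becomes several independent
# float series, keyed here by (series name, "le" label value).
ClassicSeries = Dict[Tuple[str, str], float]


@dataclass(frozen=True)
class CompositeSample:
    """Hypothetical composite sample: the whole histogram is a single value."""
    count: float
    total: float
    buckets: Tuple[Tuple[float, float], ...]  # (upper bound, cumulative count)


def fold_classic(name: str, series: ClassicSeries) -> CompositeSample:
    """Fold the separate classic series for `name` into one composite sample."""
    buckets = sorted(
        (float(le), value)
        for (series_name, le), value in series.items()
        if series_name == f"{name}_bucket"
    )
    return CompositeSample(
        count=series[(f"{name}_count", "")],
        total=series[(f"{name}_sum", "")],
        buckets=tuple(buckets),
    )


# Illustrative data only.
classic: ClassicSeries = {
    ("http_request_duration_seconds_bucket", "0.1"): 3.0,
    ("http_request_duration_seconds_bucket", "0.5"): 5.0,
    ("http_request_duration_seconds_bucket", "+Inf"): 6.0,
    ("http_request_duration_seconds_sum", ""): 1.9,
    ("http_request_duration_seconds_count", ""): 6.0,
}

sample = fold_classic("http_request_duration_seconds", classic)
print(sample)
```

In the classic layout, five series must all land for the histogram to be consistent; in the composite layout there is a single sample, which is one way to read the efficiency and transactionality gains described above.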
Metrics that matter: Measuring platform success and maturity (5 minute read)
Nearly 30% of platform teams don't measure success at all, and another 24% can't determine if their metrics have improved, creating an accountability gap that threatens funding and growth, according to the State of Platform Engineering Report Volume 4. The report recommends structured measurement frameworks such as DORA metrics (deployment frequency, lead time, MTTR, and change failure rate) and the SPACE framework to translate technical improvements into business value. Teams that fail to establish measurement practices by 2026 face potential budget cuts.
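As a rough illustration of how the four DORA metrics can be derived from deployment records, here is a minimal Python sketch. The `Deployment` record, its field names, and the `dora_metrics` function are assumptions made for this example and do not come from the report; real pipelines would pull this data from CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed deploy is fixed


def dora_metrics(deployments: List[Deployment], window_days: int) -> Dict[str, Optional[float]]:
    """Compute the four DORA metrics over a reporting window."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    lead_times = [(d.deployed_at - d.committed_at).total_seconds() for d in deployments]
    failures = [d for d in deployments if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at is not None]

    mttr_hours = None
    if restores:
        mttr_hours = sum(restores, timedelta()).total_seconds() / len(restores) / 3600

    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(lead_times) / 3600,
        "change_failure_rate": len(failures) / len(deployments),
        "mttr_hours": mttr_hours,
    }


# Illustrative data only: four deployments over a two-day window.
start = datetime(2026, 1, 1, 9, 0)
history = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4)),
    Deployment(start, start + timedelta(hours=6), failed=True,
               restored_at=start + timedelta(hours=7)),
    Deployment(start, start + timedelta(hours=8)),
]
metrics = dora_metrics(history, window_days=2)
print(metrics)
```

Tracking the same four numbers per reporting window is what lets a team say whether its metrics have improved, which is the gap the report highlights.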
Chat with Your App Service Logs Using GitHub Copilot (7 minute read)
An open-source MCP server integrates GitHub Copilot with Azure App Service logs, enabling natural-language debugging, log queries, deployment correlation, and root cause analysis directly in the IDE using existing Azure credentials. The proof of concept improves observability by combining real-time tooling with embedded, domain-specific debugging guidance.
Run NanoClaw in Docker Shell Sandboxes (3 minute read)
Docker Sandboxes launched a new shell sandbox feature that lets developers run AI agents like NanoClaw (a Claude-powered WhatsApp assistant) inside isolated microVMs with secure credential management. The shell sandbox provides a minimal Ubuntu environment with Node.js, Python, and Git pre-installed, so users can install any AI tool. API keys stay protected by Docker's credential proxy, which keeps the actual keys out of the sandbox.
Choosing a Language Based on Its Syntax? (7 minute read)
Choosing a programming language based solely on its surface-level syntax misunderstands what truly matters: semantics, type systems, and overall language design. Syntax affects ergonomics and readability, but it reflects deeper semantic decisions, and experienced programmers prioritize those foundations over aesthetic preferences or first-exposure bias.
AWS CloudWatch Alarm Mute Rules eliminate alert fatigue (2 minute read)
Amazon CloudWatch's Alarm Mute Rules let teams temporarily silence up to 100 alarms during deployments or maintenance while preserving visibility. When a rule expires, pending alarm actions fire automatically if the alarm state persists, which reduces alert fatigue without risking missed critical issues.