TLDR DevOps 2025-06-11
Airbnb Load Testing ⚖️, Azure’s SRE Agent ✨, Rust Build Times 🧱
Pricing and usage model updates for Amazon EC2 instances accelerated by NVIDIA GPUs (2 minute read)
AWS now offers Savings Plans for EC2 P6-B200 instances, which were previously available only through EC2 Capacity Blocks for ML. Pricing for EC2 P5, P5en, P4d, and P4de instances has been reduced by up to 45 percent, and On-Demand capacity for these instances has been expanded across multiple global regions.
Announcing Pulumi Identity and Access Management (IAM) (5 minute read)
Pulumi IAM is a new capability designed to embed granular security directly into cloud development lifecycles. In its initial phase, users can define custom roles built from fine-grained permissions and apply them specifically to Organization Access Tokens. The service is rolling out in phases: Granular Access Tokens & Custom Roles are available now, User & Team Role Assignment is coming soon, and Advanced Authorization & Scalability is planned for a future release.
Connecting Applications to Self-Service Datastores (6 minute read)
Zendesk's engineering team automated datastore credential delivery and rotation using Kubernetes. They used init containers injected by mutating admission webhooks to provide applications with credentials in a consistent location and a sidecar container to evict pods before credential expiration, ensuring automatic compatibility with regularly rotated credentials.
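As a rough illustration of the injection step described above, here is a minimal sketch of how a mutating admission webhook could build a response that adds an init container to an incoming pod. It is not Zendesk's code: the container name, image, volume name, and mount path are hypothetical, and only the AdmissionReview response shape (uid, allowed, patchType, base64-encoded JSON Patch) follows the Kubernetes admission API.

```python
import base64
import json

# Hypothetical values for illustration only; not taken from the article.
INIT_CONTAINER = {
    "name": "fetch-datastore-credentials",
    "image": "example.com/credential-fetcher:latest",
    "volumeMounts": [
        {"name": "datastore-credentials", "mountPath": "/var/run/datastore"}
    ],
}
CREDENTIALS_VOLUME = {"name": "datastore-credentials", "emptyDir": {"medium": "Memory"}}


def _append_op(pod_spec, field, value):
    """Return a JSON Patch op that appends to a list, creating the list if absent."""
    if pod_spec.get(field):
        # The list exists and is non-empty: append with the "-" index.
        return {"op": "add", "path": f"/spec/{field}/-", "value": value}
    # The list is missing or empty: set the whole field.
    return {"op": "add", "path": f"/spec/{field}", "value": [value]}


def build_admission_response(review):
    """Build an AdmissionReview response that injects the init container."""
    request = review["request"]
    pod_spec = request["object"].get("spec", {})
    patch = [
        _append_op(pod_spec, "volumes", CREDENTIALS_VOLUME),
        _append_op(pod_spec, "initContainers", INIT_CONTAINER),
    ]
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],  # must echo the request uid
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

In a real webhook this function would sit behind an HTTPS endpoint registered through a MutatingWebhookConfiguration. The eviction sidecar mentioned in the summary is a separate component and is not shown here.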
On Azure's new SRE Agent (3 minute read)
Azure's new SRE Agent provides quick insights and visualizations in response to user questions, but it drew a clearly incorrect conclusion from its own data and proceeded with an investigation based on that error. Although the agent offers impressive functionality, including proposing fixes, its flawed logic and pushy behavior raise serious concerns about reliability and safety.
Why Environments Beat Clusters For Dev Experience (9 minute read)
Most Kubernetes tools are infrastructure-focused and fail to address the real needs of developers, who prioritize environments, promotions, and version clarity over clusters, deployments, and Git hashes. Codefresh GitOps Cloud was created to fill this gap by offering a developer-centric platform centered around environments, product versions, real-time deployment health, and simplified application promotions.
Load Testing with Impulse at Airbnb (7 minute read)
Airbnb's Impulse is an internal load-testing-as-a-service framework that lets service owners run context-aware load tests. It consists of independent tools for generating synthetic load, mocking dependencies, and collecting traffic data. It has been adopted by several customer support backend services, and other teams plan to use it.
Inside GitHub: How we hardened our SAML implementation (18 minute read)
GitHub has supported enterprise SAML single sign-on since 2014, but the complexity and security risks of maintaining a custom implementation led to persistent concerns. To improve trust and sustainability, the engineering team re-evaluated their approach by auditing libraries, introducing A/B testing, tightening schema validation, and using multiple XML parsers to reduce attack surface and impact.
Why doesn't Rust care more about compiler performance? (14 minute read)
Despite frequent complaints about Rust's slow compile times, the Rust Project does care deeply about compiler performance and has made meaningful improvements, nearly halving clean build times over the past three years. However, further progress is constrained by technical complexity, limited contributor resources (often volunteer-based), necessary trade-offs in performance vs. maintainability, and competing priorities like stability, new features, and broad platform support.