TLDR DevOps 2024-03-29

Terraform 1.7 🏗️, Netflix ML Systems 🪄, Scaling AI Infrastructure ⚖️

News & Trends

CloudBees CD/RO v2024.03.0 Release Brings IPv6 Support, DSL Git Synchronization CLI, and Enhanced Security (3 minute read)

The v2024.03.0 release of CloudBees CD/RO introduces IPv6 support, a DSL Git Synchronization CLI, Red Hat Enterprise Linux 9 ARM compatibility, enhanced web server configuration, and redesigned procedure step and pipeline stage editor interfaces.

Terraform 1.7 is now GA (4 minute read)

Terraform 1.7 is now generally available. It introduces test mocking for module testing and a config-driven removal workflow for safer state manipulation. Test mocking simulates provider behavior, which reduces test runtime and improves module quality. Config-driven removal makes state removal bulk-capable and plannable, addressing challenges like workspace migration and cleanup after failures.

Opinions & Tutorials

Supporting Diverse ML Systems at Netflix (12 minute read)

Netflix’s Machine Learning Platform (MLP) team provides tools for Metaflow, an open-source machine learning infrastructure framework. Metaflow's human-friendly API and integrations with production systems enable a smooth transition from prototype to production, supporting a diverse array of ML and AI projects within Netflix. This article highlights some examples.

Scaling AI/ML Infrastructure at Uber (7 minute read)

Uber has marked eight years of advancing machine learning, transitioning from rule-based models to deep learning and Generative AI, necessitating efficient infrastructure. Some of its key metrics include uptime, training efficiency, and developer velocity. Its team has worked on projects including optimizing on-prem infrastructure through federated batch jobs and network upgrades and improving GPU allocation with memory upgrades.

Setting Up Kafka Multi-Tenancy (9 minute read)

DoorDash has enhanced its testing-in-production capabilities with a multi-tenant Kafka system that processes real-time events for both test and production traffic without interfering with live operations. Running tests on the production stack reduces operational overhead while keeping each tenant's data isolated. The tenant-aware Kafka implementation also lets DoorDash run accurate load tests directly in the live environment, which simplifies development and improves system reliability.
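
As a rough illustration of what "tenant-aware" can mean, the sketch below tags each message with a tenant and lets a consumer keep only its own tenant's traffic. This is not DoorDash's implementation: the `tenant-id` header name, the `prod` default, and the `Message` type are all hypothetical, and no real Kafka client is used.

```python
from dataclasses import dataclass, field
from typing import Dict, List

TENANT_HEADER = "tenant-id"  # hypothetical header name, assumed for illustration
PROD_TENANT = "prod"         # assumed default when a message carries no tenant


@dataclass
class Message:
    """Stand-in for a Kafka record: a payload plus string headers."""
    value: str
    headers: Dict[str, str] = field(default_factory=dict)


def tenant_of(message: Message) -> str:
    # Untagged messages are treated as production traffic.
    return message.headers.get(TENANT_HEADER, PROD_TENANT)


def messages_for(messages: List[Message], tenant: str) -> List[Message]:
    # A tenant-aware consumer only processes messages tagged for its tenant,
    # so test traffic on a shared topic never reaches production handlers.
    return [m for m in messages if tenant_of(m) == tenant]


if __name__ == "__main__":
    topic = [
        Message("order-1"),
        Message("order-2", {TENANT_HEADER: "load-test"}),
        Message("order-3"),
    ]
    print([m.value for m in messages_for(topic, PROD_TENANT)])
    print([m.value for m in messages_for(topic, "load-test")])
```

Filtering on the consumer side is only one possible design; isolation could equally be enforced with separate topics or consumer groups per tenant.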

Octopus Deploy is introducing external feed triggers (3 minute read)

Octopus Deploy is introducing external feed triggers, allowing users to initiate releases upon the arrival of new container images or Helm charts in their designated repositories. This enhanced pull-based approach enables more streamlined automation of GitOps workflows within Continuous Delivery pipelines.

Why choose async/await over threads? (9 minute read)

Threads are scheduled by the OS and carry higher per-task overhead, so they fit CPU-intensive work best. Async/await excels at I/O-bound operations, offering efficient, scalable concurrency in user space. The choice between them depends on the workload: async/await is ideal for high-I/O scenarios like web servers because of its lower overhead and better scalability.
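
A minimal sketch of the I/O-bound case, using Python's `asyncio` (the language choice is ours for illustration; the function names `fake_request` and `handle_many` are hypothetical). Many simulated requests wait concurrently on a single thread, with `asyncio.sleep` standing in for a network or disk wait.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_request(request_id: int, delay: float) -> int:
    # asyncio.sleep stands in for an I/O wait; it yields control to the
    # event loop so other coroutines can run in the meantime.
    await asyncio.sleep(delay)
    return request_id


async def handle_many(count: int, delay: float) -> Tuple[List[int], float]:
    # Start all coroutines and wait for them together on one thread.
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_request(i, delay) for i in range(count)))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    ids, elapsed = asyncio.run(handle_many(1000, 0.1))
    print(f"{len(ids)} simulated requests finished in {elapsed:.2f}s on one thread")
```

Because the waits overlap, the total time should stay close to a single delay rather than the sum of all delays. CPU-bound work would not benefit the same way, since a coroutine that never awaits blocks the event loop.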

Quick Links

Using GitHub Copilot in your IDE: Tips, tricks, and best practices (9 minute read)

The key to maximizing Copilot’s assistance lies in the quality and clarity of your inputs.

CNCF Incubates Strimzi to Simplify Kafka on Kubernetes (2 minute read)

Strimzi offers Kubernetes-native tools, including operators, to streamline configuration, deployment, and management of Kafka on Kubernetes.