TLDR DevOps 2026-01-02
2025 Cloudflare Radar ☁️, MongoBleed 🥷, Year In LLMs ⌛
The 2025 Cloudflare Radar Year in Review: The rise of AI, post-quantum, and record-breaking DDoS attacks (4 minute read)
Cloudflare Radar 2025 reported 19% internet traffic growth, Googlebot dominance, aggressive AI crawling with extreme crawl-to-refer ratios, post-quantum encryption securing about half of human web traffic, and Go-based API clients exceeding 20% adoption.
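To make "crawl-to-refer ratio" concrete, here is a tiny Python sketch. It assumes the metric means roughly how many pages a platform's crawlers fetch for each visit that platform refers back to a site. The function name and the numbers in the example are hypothetical, not Cloudflare figures.

```python
def crawl_to_refer_ratio(crawl_requests: int, referrals: int) -> float:
    """Pages crawled per visit referred back (assumed reading of the metric)."""
    if referrals == 0:
        # The platform crawled the site but never sent any traffic back.
        return float("inf")
    return crawl_requests / referrals


# Hypothetical platform: 50,000 crawled pages, 25 referred visits.
print(crawl_to_refer_ratio(50_000, 25))  # 2000.0 pages crawled per referral
```

A ratio near 1 means a crawler gives back roughly as much traffic as it takes. "Extreme" ratios mean a crawler fetches thousands of pages for every visitor it sends.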
2025: The year in LLMs (28 minute read)
2025 was defined by reasoning-driven models and practical agents, especially coding agents and CLI workflows, which unlocked longer autonomous tasks. Prompt-based image editing became widespread, and new safety risks emerged around "YOLO mode" (letting agents run without approval prompts), AI browsers, and prompt injection. Meanwhile, Chinese open-weight models surged, OpenAI's lead narrowed as Gemini advanced, cloud models pulled ahead of local ones, AI “slop” went mainstream, and data centers drew increasing backlash.
MongoBleed explained simply (7 minute read)
MongoBleed (CVE-2025-14847) is a critical vulnerability in MongoDB's zlib message compression path that lets unauthenticated attackers read arbitrary heap memory, including sensitive data, and affects most versions released since 2017. Although a fix has been issued for supported versions, over 213,000 internet-exposed MongoDB databases remain vulnerable to this "dead-easy" exploit.
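The summary does not describe the bug's internals. The Python sketch below shows only the general class of flaw it points to: a server trusts a client-declared uncompressed size and returns more bytes than decompression actually produced, which exposes stale memory. This is not MongoDB's code. The function names and message layout are hypothetical, and because Python cannot leak real heap memory, a prefilled `bytearray` stands in for a reused buffer.

```python
import zlib


def decompress_trusting(payload: bytes, declared_size: int, buf: bytearray) -> bytes:
    """VULNERABLE: returns declared_size bytes regardless of real output length."""
    out = zlib.decompress(payload)
    buf[: len(out)] = out  # buf models a reused heap buffer with stale contents
    return bytes(buf[:declared_size])  # bytes past len(out) leak old data


def decompress_checked(payload: bytes, declared_size: int) -> bytes:
    """Reject any message whose real decompressed size differs from the header."""
    d = zlib.decompressobj()
    # Cap output at declared_size + 1 so an oversized stream can't balloon memory.
    out = d.decompress(payload, declared_size + 1)
    if len(out) != declared_size or not d.eof:
        raise ValueError("decompressed length does not match declared size")
    return out


stale = bytearray(b"password=hunter2")  # leftover data from an earlier request
print(decompress_trusting(zlib.compress(b"hi"), 16, stale))  # b'hissword=hunter2'
print(decompress_checked(zlib.compress(b"hi"), 2))           # b'hi'
```

The fix pattern is to derive every returned length from what decompression actually wrote, never from a header field the client controls.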
How to integrate Kairos architecturally into an edge AI platform (6 minute read)
Aurea Imaging, a Dutch agricultural tech startup, addressed the challenge of managing and remotely updating a global fleet of NVIDIA Jetson-powered remote sensing devices by adopting a cloud-native approach, including K3s and the CNCF Kairos project. This enabled atomic, image-based OS upgrades, eliminating inconsistent "snowflake" devices and significantly improving operational efficiency.
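"Atomic, image-based OS upgrades" usually rely on an A/B (active/passive) layout. The new OS image is written to an idle slot, the device switches to it in one step, and it falls back if the new image fails a health check. The toy model below illustrates that general pattern only. It is not Kairos's implementation, and the class, slot names, and health-check callback are hypothetical.

```python
from typing import Callable, Optional


class ABDevice:
    """Toy model of an A/B (active/passive) image-based OS upgrade."""

    def __init__(self, image: str) -> None:
        self.slots: dict[str, Optional[str]] = {"A": image, "B": None}
        self.active = "A"

    @property
    def running(self) -> Optional[str]:
        return self.slots[self.active]

    def upgrade(self, image: str, healthy: Callable[[str], bool]) -> bool:
        """Write `image` to the idle slot, switch to it, roll back if unhealthy."""
        previous = self.active
        target = "B" if previous == "A" else "A"
        self.slots[target] = image  # running slot is never modified in place
        self.active = target        # one pointer flip stands in for "boot new slot"
        if not healthy(image):
            self.active = previous  # failed health check: revert to known-good slot
            return False
        return True


device = ABDevice("os-v1")
device.upgrade("os-v2", healthy=lambda img: True)
device.upgrade("os-v3", healthy=lambda img: False)  # bad image is rolled back
print(device.running)  # os-v2
```

Because every device either runs the old image or the new one and never a half-patched mix, a fleet avoids the "snowflake" drift the article describes.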
Observing and scaling MLOps infrastructure on Amazon EKS (7 minute read)
This post explains how to observe and scale MLOps infrastructure on Amazon EKS using Prometheus, Grafana, and Kubernetes autoscaling, with detailed guidance on monitoring GPUs, AWS accelerators, ML-specific metrics, and integrating open source and third-party observability tools.
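Kubernetes' Horizontal Pod Autoscaler computes its target with the documented formula `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)` and skips scaling when the ratio is within a tolerance (0.1 by default). The sketch below applies that formula to a metric such as GPU utilization scraped by Prometheus. The min/max bounds are hypothetical. The real controller also handles missing metrics, readiness, and stabilization windows, which are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Core HPA scaling rule, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: avoid churn
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 inference pods averaging 90% GPU utilization against a 60% target.
print(desired_replicas(4, 90, 60))  # 6
```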
Terraform Parallelism: How It Works, Tuning, & Best Practices (15 minute read)
This post explains Terraform parallelism, how concurrent resource operations affect provisioning speed, and how to configure and manage parallelism within Terraform and external systems, along with best practices to optimize infrastructure deployment time.
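Terraform walks a dependency graph of resources and, by default, runs up to 10 operations at once, a limit set with `-parallelism=n` on `plan` and `apply`. The Python sketch below models that behavior: a resource starts only after its dependencies finish, and no more than `parallelism` operations run concurrently. It is an illustration, not Terraform's actual Go implementation, and `apply_graph` and the resource names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def apply_graph(deps, apply_fn, parallelism=10):
    """Run apply_fn on every node of a dependency graph.

    deps maps each resource to the set of resources it depends on.
    A node starts only after all of its dependencies finish, and at most
    `parallelism` calls to apply_fn run at the same time.
    Returns resources in completion order.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    in_flight = {}
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        while sorter.is_active():
            # Queue every resource whose dependencies are all satisfied;
            # the pool's worker count caps how many actually run at once.
            for node in sorter.get_ready():
                in_flight[pool.submit(apply_fn, node)] = node
            # Block until at least one running operation completes.
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in done:
                node = in_flight.pop(future)
                future.result()  # propagate the first failure
                finished.append(node)
                sorter.done(node)  # may unblock dependent resources
    return finished


graph = {"vpc": set(), "subnet": {"vpc"}, "sg": {"vpc"}, "instance": {"subnet", "sg"}}
print(apply_graph(graph, lambda name: None, parallelism=2))
```

Raising `parallelism` only helps when the graph has many independent resources ready at the same time; a long dependency chain stays sequential regardless, and external API rate limits may cap useful concurrency anyway.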
Optimizing Datadog at scale: Cost-efficient observability at Zendesk (19 minute read)
Zendesk engineers reduced Datadog observability costs by auditing metrics, traces, and logs, adopting single-span tracing, targeted sampling, and log deduplication, flattening spend while preserving visibility, performance insights, and engineering workflows.
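Zendesk's write-up is the authority on what they actually built. The sketch below only illustrates one generic form of log deduplication: it collapses messages that differ only in numbers within a fixed time window and keeps a count plus one example, so fewer log events are indexed. The 60-second window and the digit-stripping normalization rule are assumptions.

```python
import re
from collections import OrderedDict
from typing import Iterable

_NUMBERS = re.compile(r"\d+")


def dedupe(records: Iterable[tuple[float, str]], window_s: int = 60):
    """Collapse (timestamp, message) records into one entry per template per window.

    Returns (window_start, template, first_example, count) tuples in first-seen order.
    """
    buckets: "OrderedDict[tuple[int, str], list]" = OrderedDict()
    for ts, msg in records:
        window = int(ts // window_s) * window_s
        key = (window, _NUMBERS.sub("<n>", msg))  # IDs and durations become <n>
        if key in buckets:
            buckets[key][1] += 1
        else:
            buckets[key] = [msg, 1]
    return [(w, tmpl, ex, n) for (w, tmpl), (ex, n) in buckets.items()]


logs = [(0, "timeout after 30 ms"), (5, "timeout after 45 ms"), (70, "timeout after 12 ms")]
for row in dedupe(logs):
    print(row)
```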
Efficient image and model caching strategies for AI/ML and generative AI workloads on Amazon EKS (9 minute read)
This post details caching and storage strategies for AI and ML workloads on Amazon EKS. It covers container image caching, data loading, checkpointing, and storage services like Amazon S3, S3 Express One Zone, and FSx for Lustre to optimize performance and cost.
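The common idea behind model and data caching is a read-through cache: check fast local storage first and fall back to the remote object store only on a miss. In the minimal sketch below, `cache_dir` could be a node-local disk or an FSx for Lustre mount, and `fetch` stands in for an Amazon S3 download. Both are hypothetical placeholders, not AWS APIs. It also assumes a single writer per key.

```python
import hashlib
import os
from pathlib import Path
from typing import Callable


def cached_fetch(key: str, fetch: Callable[[str], bytes], cache_dir: Path) -> Path:
    """Return a local path for `key`, downloading via `fetch` only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so arbitrary object names map to safe file names.
    path = cache_dir / hashlib.sha256(key.encode()).hexdigest()
    if path.exists():
        return path  # cache hit: no round trip to the remote store
    data = fetch(key)  # cache miss: pull from the remote store
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, path)  # atomic rename: readers never see a partial file
    return path
```

Writing to a temporary file and renaming it means a pod that starts mid-download sees either no cached file or a complete one, never a truncated set of weights.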
Get our free daily newsletter with curated tools 💻, trends 📈, and insights 💡, for DevOps Engineers 👨‍💻
Join 340,000 readers for one daily email