TLDR Data 2026-03-05
Code Mode Tooling ⚙️, Agentic BI Evaluation Gaps ⚠️, GitHub’s Search Architecture 🔎
How we rebuilt the search architecture for high availability in GitHub Enterprise Server (5 minute read)
GitHub rebuilt its search architecture on Elasticsearch's cross-cluster replication (CCR), running independent single-node clusters (a primary plus replicas) per instance. The design provides durable persistence, asynchronous replication triggered after Lucene segments are created, custom workflows for setup and failover management, zero-downtime migrations, and automatic replica promotion on failover.
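The replication setup rests on Elasticsearch's CCR "follow" API (`PUT /<follower_index>/_ccr/follow`). A minimal sketch of building such a request; the index and cluster names are invented for illustration, not GitHub's actual configuration:

```python
import json

def build_follow_request(follower_index, remote_cluster, leader_index):
    """Return (method, path, body) for Elasticsearch's CCR follow API."""
    path = f"/{follower_index}/_ccr/follow"
    body = json.dumps({
        "remote_cluster": remote_cluster,  # alias pointing at the primary cluster
        "leader_index": leader_index,      # index on the primary to replicate
    })
    return "PUT", path, body

method, path, body = build_follow_request("code-search-replica", "primary", "code-search")
```

GitHub's custom setup and failover workflows then orchestrate calls like this per instance; those workflows themselves are not public API.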
Optimizing Recommendation Systems with JDK's Vector API (9 minute read)
Netflix reduced CPU utilization for its Ranker service's serendipity scoring feature from 7.5% to ~1% per node by re-architecting its scoring logic. Key optimizations included transitioning from O(M×N) scalar dot products to batched, cache-friendly matrix multiplies with flat buffers, leveraging the JDK Vector API for SIMD performance gains in pure Java, and eliminating unnecessary allocations. These changes yielded a 7% CPU drop, 12% latency reduction, and 10% improvement in CPU/RPS.
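The core restructuring, replacing per-item scalar dot products with one pass over a flat row-major buffer, can be sketched in plain Python (names are illustrative; Netflix's actual implementation uses the JDK Vector API for SIMD, which this sketch only gestures at in comments):

```python
from array import array

def score_naive(items, user):
    # O(M*N) scalar dot products: one object-heavy loop per candidate item.
    return [sum(x * u for x, u in zip(item, user)) for item in items]

def score_flat(flat, dim, user):
    # Same math over a single flat row-major buffer: cache-friendly layout,
    # no per-item allocations. A SIMD runtime (e.g. the JDK Vector API)
    # would vectorize the inner loop over `dim`.
    out = []
    for row in range(0, len(flat), dim):
        s = 0.0
        for j in range(dim):
            s += flat[row + j] * user[j]
        out.append(s)
    return out

items = [[1.0, 2.0], [3.0, 4.0]]
flat = array("d", [x for item in items for x in item])  # flat buffer, row-major
user = [0.5, 0.5]
```

The payoff in the article comes from the flat layout plus explicit vectorization, not from the arithmetic itself, which is unchanged.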
Zero-Waste Agentic RAG: Designing Caching Architectures to Minimize Latency and LLM Costs at Scale (19 minute read)
A validation-aware, two-tier caching strategy for production-grade RAG systems reduces LLM token costs by over 30% and slashes response times from ~36 seconds to milliseconds for semantically similar queries. Combining semantic caching (embedding-based, ~95% similarity) and retrieval caching (context/topic-level, >70%), the architecture addresses redundancy, data staleness, and cache invalidation via timestamp checks, SHA-256 fingerprinting, and predicate caching.
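A minimal sketch of the two-tier idea: the thresholds match the article (0.95 semantic, 0.70 retrieval), but the similarity function here is a toy word-overlap stand-in; a real system would compare embedding vectors:

```python
import hashlib
import time

class TwoTierCache:
    def __init__(self, ttl_seconds=3600):
        self.semantic = []   # (query, response, stored_at) tuples
        self.retrieval = {}  # SHA-256 fingerprint -> (context, stored_at)
        self.ttl = ttl_seconds

    @staticmethod
    def fingerprint(text):
        # SHA-256 fingerprint for exact-match retrieval caching.
        return hashlib.sha256(text.encode()).hexdigest()

    def _fresh(self, stored_at):
        # Timestamp check guards against serving stale entries.
        return time.time() - stored_at < self.ttl

    def get_semantic(self, query, similarity, threshold=0.95):
        for q, resp, ts in self.semantic:
            if self._fresh(ts) and similarity(query, q) >= threshold:
                return resp  # cache hit: skip the LLM call entirely
        return None

    def put_semantic(self, query, response):
        self.semantic.append((query, response, time.time()))

def jaccard(a, b):
    # Toy similarity standing in for embedding cosine similarity.
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)
```

The retrieval tier would work the same way at the context level with the lower 0.70 threshold; predicate caching and invalidation policy are where most of the production complexity lives.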
The Reckoning Is Already Here (3 minute read)
AI tools are already replacing much routine data engineering and analytics work today, not at some hypothetical future date. The takeaway: prioritize deep business understanding, irreplaceable domain expertise, strong community ties, and fluency with the newest AI models.
How Long Until We Call AI Agents Data Products (7 minute read)
AI agents in production must be managed as full-fledged data products, requiring rigorous observability, security, and iterative product analytics beyond standard logging. Treating agent interactions as actionable feedback loops drives roadmap decisions, while layered security and conversational discoverability are essential for user trust and adoption.
Layer by Layer, We Built Data Systems No One Understands (6 minute read)
The modern data stack has evolved into incomprehensible "fractal" complexity through the endless layering of tools. The promise of "ease" enables rapid prototyping but also fosters departmental silos, decision avoidance, unchecked AI/LLM code generation, over-modeling of business logic, and disconnection from real business value.
SQL Is Solved. Here's Where Chat-BI Still Breaks (7 minute read)
Empirical testing of agentic chat-BI systems using BIRD and DABStep benchmarks revealed high SQL generation accuracy (over 70% correct on BIRD) but exposed critical failure nodes: ambiguous metric definitions, out-of-scope questions, and common-sense gaps. Context and rule files (e.g., RULES.md) help but induce compounding errors and overfitting as complexity grows. Iterative human-in-the-loop evaluation, structured error classification, deterministic metric definitions, and reproducible CI testing are essential for reliability.
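Two of the recommended practices, deterministic metric definitions and structured error classification, can be sketched briefly; the metric name, SQL expression, and error categories below are illustrative, not taken from the benchmarks:

```python
# Pin each business metric to one canonical SQL expression so the agent
# cannot improvise a definition (hypothetical example).
METRICS = {
    "active_users": (
        "COUNT(DISTINCT user_id) FILTER (WHERE last_seen >= CURRENT_DATE - 30)"
    ),
}

# Failure taxonomy mirroring the failure nodes the article identifies.
ERROR_CATEGORIES = ("ambiguous_metric", "out_of_scope", "common_sense_gap", "sql_error")

def classify(failures):
    # Tally labeled failures so regressions are comparable across eval runs.
    counts = {c: 0 for c in ERROR_CATEGORIES}
    for f in failures:
        counts[f["category"]] += 1
    return counts
```

Counting failures per category across runs is what makes the iterative human-in-the-loop loop reproducible in CI rather than anecdotal.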
Stop Calling Tools, Start Writing Code (Mode) (8 minute read)
The code mode pattern improves MCP tool usage by having the LLM write and execute a script that composes multiple tools in a sandbox, instead of calling tools sequentially. This reduces context window bloat and round-trip overhead, making large tool catalogs far more scalable and efficient for LLMs to use.
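A toy sketch of the pattern: instead of N sequential tool-call round trips, the model emits one script that composes the tools inside a sandbox, and only the final result returns to the context window. The tool names and the script are invented for illustration:

```python
# Stand-ins for MCP tools exposed to the sandbox.
TOOLS = {
    "fetch_orders": lambda region: [120, 80, 200],
    "summarize": lambda xs: {"total": sum(xs), "max": max(xs)},
}

def run_in_sandbox(script, tools):
    # Execute the model-written script with only the tool registry in scope;
    # stripping __builtins__ is a crude stand-in for real sandboxing.
    scope = {"__builtins__": {}, **tools}
    exec(script, scope)
    return scope["result"]  # only this value re-enters the model's context

# The kind of script a model would emit in code mode:
script = "result = summarize(fetch_orders('emea'))"
```

Note that intermediate data (the raw order list) never touches the context window, which is where the token savings over sequential tool calls come from.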
PgJitter (GitHub Repo)
PgJitter is a lightweight PostgreSQL extension that replaces the default LLVM JIT compiler with faster alternatives (sljit, AsmJIT, and MIR), enabling native code generation in microseconds instead of milliseconds. This dramatically reduces compilation overhead and makes JIT practical for a wider range of queries, especially OLTP workloads.
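Whether JIT fires at all is governed by PostgreSQL's planner cost thresholds. These are stock postgresql.conf settings with their defaults (not PgJitter-specific; any provider setting PgJitter adds is not documented in this summary):

```ini
jit = on
jit_above_cost = 100000          # queries estimated cheaper than this skip JIT
jit_inline_above_cost = 500000   # inline functions above this estimated cost
jit_optimize_above_cost = 500000 # apply expensive optimizations above this cost
```

A compiler that emits code in microseconds rather than milliseconds makes it plausible to lower `jit_above_cost` substantially, which is how JIT becomes practical for short OLTP queries.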
PostgreSQL Blink-tree Implementation (7 minute read)
PostgreSQL implements a high-concurrency version of B-tree indexes called Blink-Tree, adding a simple "link" pointer between sibling nodes and a "high-key" boundary marker in each node. This lets searches move quickly to the right sibling if needed without holding locks across multiple levels (no lock-coupling during reads), while structure changes like page splits use brief bottom-up lock-coupling on just a few nodes at a time, reducing lock contention dramatically.
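A minimal sketch of the Blink-tree "move right" step at one level: each node carries a high key (the upper bound of its key range) and a link to its right sibling, so a reader that lands on a just-split node chases the link instead of restarting or holding locks across levels. This is an illustrative data-structure sketch, not PostgreSQL's C implementation:

```python
class Node:
    def __init__(self, keys, high_key, right=None):
        self.keys = keys          # sorted keys stored in this node
        self.high_key = high_key  # upper bound of this node's range; None = rightmost
        self.right = right        # sibling link installed by page splits

def find_leaf(node, key):
    # Move right while the key falls at or beyond this node's high key:
    # after a concurrent split, such keys now live in the right sibling.
    while node.high_key is not None and key >= node.high_key:
        node = node.right
    return node
```

The high key is what makes the check safe: a reader can always tell from the node alone whether its target could have moved right.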
AI Evals in the Real World: Human Judging, LLM Judges, and the Gaps Between (4 minute read)
Regular AI test scores don't work well for customer-service bots that need to keep conversations going, understand hidden intent, and actually get users to share contact info. The team built a better scoring system that mixes human taste-testing for tricky parts with LLM-as-judge auto-scoring for scale, plus human spot checks on bad cases.
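The hybrid loop described above can be sketched as a triage function: an LLM judge scores every transcript, and low scores plus a random sample get routed to humans. The judge, threshold, and sampling rate are illustrative stand-ins:

```python
import random

def triage(transcripts, llm_judge, threshold=0.6, spot_check_rate=0.1, rng=random):
    # Route each transcript either to the auto-scored pile or to humans.
    human_queue, auto_scored = [], []
    for t in transcripts:
        score = llm_judge(t)
        if score < threshold or rng.random() < spot_check_rate:
            human_queue.append((t, score))  # bad cases + random spot checks
        else:
            auto_scored.append((t, score))
    return human_queue, auto_scored
```

The spot-check sample is what keeps the LLM judge honest: humans periodically re-grade cases the judge passed, so judge drift shows up as disagreement.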
Something is afoot in the land of Qwen (5 minute read)
The Qwen 3.5 open-weight model family from Alibaba is gaining attention for delivering strong performance across a wide range of model sizes, including very small models that run locally while still supporting reasoning and multimodal tasks. However, the project's future is uncertain after the sudden resignation of its lead researcher and several core team members following an internal Alibaba reorganization.
Curated deep dives, tools and trends in big data, data science and data engineering 📊