DuckDB npm Attack ⚠️, 2025 Data Eng Trends 📊, Kestra Orchestration 1.0 Release 🚀

Peeking Inside the SQL Server Transaction Log (9 minute read)

SQL Server's change data capture currently relies on system change tables populated by the SQL Server Agent. Debezium polls these tables at configurable intervals to stream CDC events. Parsing the SQL Server transaction log directly, in the way Oracle CDC approaches do, could reduce latency and increase efficiency. This article details the physical storage architecture of transaction logs (virtual log files, blocks, and LSNs), data files, partitions, and data pages, with practical walkthroughs for low-level analysis using system views and DBCC utilities.
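
As a rough illustration of how an LSN ties together the virtual log file, log block, and record, here is a minimal Python sketch. It assumes the colon-separated hexadecimal text form in which log-inspection output commonly displays LSNs (VLF sequence number, log block offset, slot number). The names `LSN` and `parse_lsn` are hypothetical and do not come from the article.

```python
from typing import NamedTuple


class LSN(NamedTuple):
    """Hypothetical container for the three parts of a textual LSN."""
    vlf_seq: int       # virtual log file sequence number
    block_offset: int  # offset of the log block within that VLF
    slot: int          # slot of the log record within the block


def parse_lsn(text: str) -> LSN:
    """Split an LSN such as '00000025:000001a0:0002' into integer parts.

    Assumes each field is hexadecimal and the fields are separated by colons.
    """
    parts = text.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated fields, got {text!r}")
    vlf_seq, block_offset, slot = (int(p, 16) for p in parts)
    return LSN(vlf_seq, block_offset, slot)


if __name__ == "__main__":
    first = parse_lsn("00000025:000001a0:0002")
    second = parse_lsn("00000025:000001a0:0003")
    print(first)           # the three fields as integers
    print(first < second)  # tuples compare field by field, so log order is preserved
```

Because the parsed value is a tuple, comparing two LSNs compares the VLF sequence number first, then the block offset, then the slot, which matches the order in which records are written to the log.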

TLDR Data 2025-09-11

Operationalizing first-party data (Sponsor)

Deep Dives

Peeking Inside the SQL Server Transaction Log (9 minute read)

Past Year's Data Engineering and Current Trends (2025 edition) (7 minute read)

TimescaleDB to ClickHouse Replication: Use Cases, Features, and How We Built It (6 minute read)

Opinions & Advice

Is Data Modeling Dead? (4 minute read)

Will AI Permanently Disrupt the Bundling and Unbundling Cycle? (34 minute podcast)

SCD2 Deep Dive with dlt: How Nested Data Affects Queries and Costs (5 minute read)

Launches & Tools

Unify Analytics & AI: Free Builder Workshops (Sponsor)

Can Collations Be Used Over citext? (6 minute read)

DuckDB npm Packages Compromised (2 minute read)

Kestra 1.0 — Declarative Orchestration with AI Agents and Copilot (17 minute read)

LLM Query Performance Testing (GitHub Repo)

Miscellaneous

The SELECT FOR UPDATE Trap Everyone Falls Into (7 minute read)

Big Data on the Move: DuckDB on the Framework Laptop 13 (5 minute read)

Quick Links

The CEO asked why monthly active users didn't match across reports (2 minute read)

Query Operator Structures (3 minute read)

Curated deep dives, tools and trends in big data, data science and data engineering 📊