DEN Newsletter #10

Data Engineering News

Jan 12, 2025

1. Netflix’s Distributed Counter Abstraction

How does Netflix handle billions of events daily with its distributed counter abstraction?

In this blog, Netflix engineers Rajiv Shringi, Oleksii Tkachuk and Kartik Sathyanarayanan share their innovative solution—a Distributed Counter Abstraction. Built on Netflix’s TimeSeries Abstraction, this service achieves near-real-time counting at global scale with low latency, while balancing trade-offs like consistency and performance.

https://netflixtechblog.com/netflixs-distributed-cou
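To make the idea concrete, here is a simplified, in-memory sketch of the general "event log plus background rollup" pattern. Each increment is stored as an event keyed by an idempotency token, so a retried write does not double count, and a rollup step folds new events into an aggregate. This is loosely in the spirit of the post, not Netflix's code. The class and method names are hypothetical, and the sketch ignores the TimeSeries storage, time windows and cross-region replication a real service has to handle.

```python
from dataclasses import dataclass, field


@dataclass
class EventLogCounter:
    """Hypothetical in-memory sketch; not Netflix's API or storage layer."""
    events: dict[str, int] = field(default_factory=dict)  # idempotency token -> delta
    rollup_value: int = 0                                 # last aggregated count
    rolled_up: set[str] = field(default_factory=set)      # tokens already aggregated

    def add(self, token: str, delta: int = 1) -> None:
        # A retried write reuses its token, so it overwrites instead of double counting.
        self.events[token] = delta

    def rollup(self) -> int:
        # Fold only the events that have not been aggregated yet.
        for token, delta in self.events.items():
            if token not in self.rolled_up:
                self.rollup_value += delta
                self.rolled_up.add(token)
        return self.rollup_value


counter = EventLogCounter()
counter.add("evt-1")
counter.add("evt-1")      # retry of the same event: not double counted
counter.add("evt-2", 5)
print(counter.rollup())   # 6
```

Keeping raw events and aggregating them later is what makes retries safe. The cost is that a read of the rollup value lags behind the newest writes until the next rollup runs.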

2. Mastering Airflow DAG Standardization with Python’s AST: A Deep Dive into Linting at Scale

How can Python’s AST revolutionize Airflow DAG standardization at scale?

Snir Israeli shows how Python’s AST (abstract syntax tree) can revolutionize the way we keep Airflow DAGs consistent at scale. The author explains how Next Insurance built DAGLint to enforce best practices, streamline workflows, and improve maintainability. If you’re curious about using the AST for scalable linting and cleaner data pipelines, this blog is a must-read.

https://medium.com/apache-airflow/mastering-airflow-dag-standardization-with-pythons-ast-a-deep-dive-into-linting-at-scale-1396771a9b90
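The appeal of AST-based linting is that DAG files can be checked by parsing them, without importing Airflow or running any top-level code. Below is a minimal sketch using only the standard-library `ast` module. The rule (every `DAG(...)` call must set `tags` and `catchup`) and the function names are hypothetical examples, not DAGLint's actual rules or code.

```python
import ast

# Hypothetical house rule: every DAG(...) call must set these keyword arguments.
REQUIRED_DAG_KWARGS = {"tags", "catchup"}


def is_dag_call(node: ast.Call) -> bool:
    """Match DAG(...) as well as attribute forms such as models.DAG(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id == "DAG"
    if isinstance(func, ast.Attribute):
        return func.attr == "DAG"
    return False


def lint_dag_source(source: str, filename: str = "<dag>") -> list[str]:
    """Parse DAG source without importing it and report missing keywords."""
    tree = ast.parse(source, filename=filename)
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and is_dag_call(node):
            # kw.arg is None for **kwargs unpacking, which cannot be checked statically.
            given = {kw.arg for kw in node.keywords if kw.arg is not None}
            for missing in sorted(REQUIRED_DAG_KWARGS - given):
                problems.append(f"{filename}:{node.lineno}: DAG() is missing '{missing}'")
    return problems


example = '''
from airflow import DAG

with DAG("daily_report", catchup=False) as dag:
    pass
'''
print(lint_dag_source(example, "daily_report.py"))
# ["daily_report.py:4: DAG() is missing 'tags'"]
```

Because the check only walks the syntax tree, it is fast and safe to run in CI over hundreds of DAG files. The trade-off is that values built dynamically (for example via `**kwargs`) cannot be verified this way.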

3. Parquet pruning in DataFusion

How does DataFusion make Parquet queries lightning-fast?

Xiangpeng Hao discusses how Apache DataFusion leverages metadata, row group stats, and even Bloom filters to prune unnecessary data and speed up queries. If you’re curious about cutting-edge techniques like page-level pruning and filter pushdown, this blog is a fascinating read.

Xiangpeng Hao, Parquet pruning in DataFusion (Xiangpeng’s blog)
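Row-group pruning with min/max statistics, one of the techniques the post covers, comes down to an interval-overlap check. Here is a simplified, engine-agnostic sketch. `RowGroupStats` and `prune_row_groups` are hypothetical names, not DataFusion's API (DataFusion is written in Rust). The sketch also assumes every row group has non-null integer statistics, which real files do not guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical per-row-group min/max statistics for one column."""
    index: int
    min_value: int
    max_value: int


def prune_row_groups(stats: list[RowGroupStats], lo: int, hi: int) -> list[int]:
    """Return indexes of row groups that might contain values in [lo, hi].

    A group is skipped only when its [min, max] range cannot overlap [lo, hi].
    Surviving groups may still hold no matching rows, so the filter must
    still be applied to the rows that are actually read.
    """
    return [s.index for s in stats if s.max_value >= lo and s.min_value <= hi]


row_groups = [
    RowGroupStats(0, 1, 100),
    RowGroupStats(1, 101, 200),
    RowGroupStats(2, 201, 300),
]
# WHERE x BETWEEN 150 AND 250: only row groups 1 and 2 need to be read.
print(prune_row_groups(row_groups, 150, 250))  # [1, 2]
```

The same overlap test applies at finer granularity when page-level statistics are available. Bloom filters complement it for equality predicates, where a value can fall inside the min/max range yet be absent from the group.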

4. Write Manageable Queries With The BigQuery Pipe Syntax

Can BigQuery’s pipe syntax simplify your SQL queries?

JBarti explores how this syntax reduces complexity in ELT workflows and compares it to languages like Flux from InfluxDB. If you’re curious about SQL innovation and quality-of-life improvements for data engineers, this blog is a must-read.

https://medium.com/@josip.bartulovic3/write-manageable-queries-with-the-bigquery-pipe-syntax-4263efd67487

5. Democratize Data and Information With Text-To-Code Models (text2sql)

Can text-to-code models democratize data access through SQL?

Idan Lenchner argues that tools like Simon, Fiverr’s text-to-SQL system, can democratize data access by letting non-technical users query databases without writing SQL themselves. He walks through choosing the right LLM, fine-tuning it with internal data, and implementing feedback loops to improve its performance. If you’re curious about bridging the gap between humans and data, this post is a fascinating guide.

https://medium.com/fiverr-engineering/democratize-data-and-information-with-text-to-code-models-text2sql-cb6beff6f820

6. Introducing the Prompt Engineering Toolkit

Is prompt engineering the key to unlocking LLM potential?

The author explores how Uber’s Prompt Engineering Toolkit is revolutionizing how we design, evaluate, and deploy prompts for large language models. With features like auto-prompt generation, revision control, and dynamic context enrichment, it bridges the gap between experimentation and production use. If you’re curious about advancing AI workflows responsibly and efficiently, this post is a must-read.

https://www.uber.com/en-IN/blog/introducing-the-prompt-engineering-toolkit/

All rights reserved, Den Digital, India. I have provided links for informational purposes only and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent the opinions of my current, former, or future employers.

dat-a-man — Data with Aman
