1. Best practices for cost-efficient Kafka clusters
How to Optimize Kafka for Cost Efficiency Without Compromising Performance?
Managing Kafka clusters can be expensive due to their need for high throughput, low latency, and substantial resources. But what if you could optimize them for cost efficiency without sacrificing performance? This guide breaks down the major cost drivers (compute, data transfer, and storage) and offers best practices for reducing expenses, such as using client-level compression, eliminating inactive resources, and adopting dynamic sizing. Whether you're just starting or scaling up, this is a must-read for mastering cost-efficient Kafka management.
https://stackoverflow.blog/2024/09/04/best-practices-for-cost-efficient-kafka-clusters
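One of the article's recommendations, client-level compression, is easy to see in miniature. In a real producer this is a single config knob (e.g. the Kafka producer's `compression.type`, which accepts gzip, snappy, lz4, or zstd); the hedged sketch below only uses Python's stdlib gzip on a batch of made-up JSON events to show why compressing before the wire cuts both transfer and storage costs.

```python
import gzip
import json

# Compressing at the producer means brokers receive, store, and replicate
# fewer bytes. The event shape below is illustrative, not from the article.

def batch_savings(events):
    """Return (raw_bytes, compressed_bytes) for a newline-joined batch."""
    raw = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    packed = gzip.compress(raw)
    return len(raw), len(packed)

events = [{"user_id": i, "action": "page_view", "page": "/home"}
          for i in range(1000)]
raw, packed = batch_savings(events)
print(f"raw={raw}B compressed={packed}B ratio={raw / packed:.1f}x")
```

Repetitive, schema-heavy payloads like these compress extremely well, which is why batching and compression are usually tuned together.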
2. Sampling with SQL
How can you efficiently extract meaningful data from massive datasets?
Sampling with SQL offers a fast and effective solution. In this deep dive, Tom Moertel explains clever algorithms for taking weighted samples—without replacement—using SQL, revealing the hidden connection between sampling and Poisson processes. Whether dealing with vast datasets or needing deterministic sampling, this is essential reading for anyone working with big data.
https://blog.moertel.com/posts/2024-08-23-sampling-with-sql.html
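The core trick the post builds on can be sketched in a few lines: give each row an exponentially distributed key, -ln(u)/weight with u uniform in (0, 1], and keep the k smallest keys; that yields a weighted sample without replacement in a single ORDER BY. The sketch below runs the SQL in an in-memory SQLite database; the table and column names are illustrative, not from the article.

```python
import math
import random
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite lacks ln() and a uniform-(0,1] generator, so register both.
conn.create_function("ln", 1, math.log)
conn.create_function("u", 0, lambda: random.random() or 1e-12)  # avoid ln(0)

conn.execute("CREATE TABLE items (name TEXT, weight REAL)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [("a", 1.0), ("b", 5.0), ("c", 0.5), ("d", 2.0)])

# Smallest exponential key wins; heavier weights shrink the key,
# so heavier rows are proportionally more likely to be kept.
k = 2
sample = [row[0] for row in conn.execute(
    "SELECT name FROM items ORDER BY -ln(u()) / weight LIMIT ?", (k,))]
print(sample)
```

Because u() is non-deterministic, SQLite evaluates it once per row, which is exactly what the keying scheme needs.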
3. Leveraging A/B Testing to “soft disable” unused features and reduce unnecessary calls
Can Disabling Features Help Reduce Digital Emissions?
With tech-driven activities contributing up to 4% of global emissions, leboncoin, part of Adevinta, sought a way to shrink its footprint. By using A/B testing to "soft disable" rarely used features, the team significantly cut API calls, and the carbon emissions behind them, without hurting user experience. The project, which won the "Sobriety Prize," showcases how small tweaks can drive meaningful environmental impact. This article is a must-read for anyone looking to combine eco-design with data-driven decision-making.
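The mechanics of a "soft disable" are simple to sketch: put the rarely used feature behind an A/B flag so the treatment cohort's clients never fire the supporting API calls, then compare engagement between cohorts before removing the feature for good. Everything below is hypothetical (class, cohort names, and traffic split are mine, not leboncoin's).

```python
import random

class SoftDisableExperiment:
    """Assign users to cohorts; only 'control' still triggers the API call."""

    def __init__(self, disabled_fraction=0.5, seed=42):
        self.disabled_fraction = disabled_fraction
        self.rng = random.Random(seed)   # seeded for a reproducible split
        self.cohort = {}                 # user_id -> "control" | "disabled"
        self.api_calls = 0

    def assign(self, user_id):
        if user_id not in self.cohort:
            roll = self.rng.random()
            self.cohort[user_id] = (
                "disabled" if roll < self.disabled_fraction else "control")
        return self.cohort[user_id]

    def render_page(self, user_id):
        if self.assign(user_id) == "control":
            self.api_calls += 1  # the rarely used widget still fetches data
        # "disabled" users see the page without the widget, and no call fires

exp = SoftDisableExperiment()
for uid in range(1000):
    exp.render_page(uid)
print(f"API calls issued: {exp.api_calls} / 1000 page views")
```

If metrics hold steady for the disabled cohort, the feature and its calls, compute, and emissions can be retired for everyone.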
4. Chapter One Overview: Data Engineering Described in ‘Fundamentals of Data Engineering’
What Is Data Engineering? A Dive into ‘Fundamentals of Data Engineering’
What exactly is data engineering, and why is it so vital in today’s data-driven world? In this first chapter of Fundamentals of Data Engineering, Mohamed Elaraby breaks down the lifecycle of data engineering—from data generation to transformation and serving. The chapter provides a clear overview of how data engineers prepare raw data for analysis, enabling data scientists to focus on insights. Essential reading for anyone eager to understand the backbone of modern data systems.
5. Flink SQL Development Experience Sharing
How can you efficiently process real-time data streams?
This article shares essential lessons from developing with Flink SQL, a powerful tool for handling massive data streams in real-time. From optimizing query performance to managing system resources, it offers practical tips to streamline development and make the most of Flink's capabilities. Perfect for engineers looking to enhance their data processing pipelines with high efficiency and scalability.
https://www.alibabacloud.com/blog/flink-sql-development-experience-sharing_601569
All rights reserved Den Digital, India. Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent the opinions of any current, former, or future employer.