1. BigQuery HLL: How we cut COUNT(DISTINCT) query costs by 93% using HyperLogLog
Can BigQuery’s HyperLogLog transform COUNT(DISTINCT) queries and slash costs?
Elad Shaabi explains that COUNT(DISTINCT) queries can become a major bottleneck at scale, consuming both resources and time. By adopting HyperLogLog (HLL), his team reports a 93% reduction in costs and query times cut from hours to seconds, while scaling across billions of records. This post is a must-read for anyone aiming to optimize BigQuery workflows and make smarter, faster decisions.
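To make the idea concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration of the general technique, not the author's implementation or BigQuery's internals: the class name, the choice of SHA-256 as the hash, and the default of 2^12 registers are all assumptions made for this example. The key property it shows is that small sketches can be built per partition (for example, per day) and merged later, so the raw rows never need to be rescanned.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog: 2**p registers, each holding the largest 'rank' seen."""

    def __init__(self, p: int = 12):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        # Take the first 64 bits of a SHA-256 digest as the hash (an assumption).
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                      # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # The union of two sketches is the element-wise maximum of the registers.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)         # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    day1, day2 = HyperLogLog(), HyperLogLog()
    for i in range(60_000):
        day1.add(f"user-{i}")
    for i in range(40_000, 100_000):
        day2.add(f"user-{i}")
    # The two days overlap; the merged sketch estimates the distinct union.
    print(round(day1.merge(day2).count()))
```

With 4,096 registers the expected relative error is roughly 1.6%, which is the trade-off HLL makes: an approximate count in exchange for a fixed, small memory footprint. In BigQuery itself, the comparable building blocks are the `HLL_COUNT.INIT`, `HLL_COUNT.MERGE`, and `HLL_COUNT.EXTRACT` functions; which of these the article uses is not stated in this summary.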
2. How does Netflix ensure the data quality for thousands of Apache Iceberg tables?
How does Netflix maintain high data quality across thousands of Apache Iceberg tables?
Vu Trinh explains that Netflix ensures data quality at scale using the Write-Audit-Publish (WAP) pattern, powered by Apache Iceberg's branching capabilities. By staging data changes on a branch, auditing them, and merging only data that passes the checks into production, Netflix maintains robust data standards. The post provides a deep dive into WAP implementation with Iceberg, making it essential reading for anyone curious about scalable data governance techniques.
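The Write-Audit-Publish flow described above can be sketched in a few lines of Python. This is a toy, in-memory model meant only to show the order of the steps; it does not use the Apache Iceberg API, and the `ToyTable` class, the `audit` rules, and the branch name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """In-memory stand-in for a table with named branches (not the Iceberg API)."""
    branches: dict = field(default_factory=lambda: {"main": []})

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # A branch starts as a copy of the current state of its parent.
        self.branches[name] = list(self.branches[from_branch])

    def append(self, branch: str, rows: list) -> None:
        self.branches[branch].extend(rows)

    def publish(self, branch: str) -> None:
        # Publishing points main at the audited branch and drops the branch name.
        self.branches["main"] = self.branches.pop(branch)


def audit(rows: list) -> list:
    """Hypothetical quality checks: every row needs an id and a non-negative amount."""
    problems = []
    for i, row in enumerate(rows):
        if row.get("id") is None:
            problems.append(f"row {i}: missing id")
        if row.get("amount", 0) < 0:
            problems.append(f"row {i}: negative amount")
    return problems


def write_audit_publish(table: ToyTable, new_rows: list, branch: str = "audit_branch") -> bool:
    table.create_branch(branch)            # 1. Write: stage changes away from main
    table.append(branch, new_rows)
    problems = audit(table.branches[branch])  # 2. Audit: run checks on the staged data
    if problems:
        del table.branches[branch]         # Reject: main is never touched
        return False
    table.publish(branch)                  # 3. Publish: expose the audited data
    return True


if __name__ == "__main__":
    table = ToyTable()
    print(write_audit_publish(table, [{"id": 1, "amount": 10}]))
    print(write_audit_publish(table, [{"id": None, "amount": 5}]))
    print(table.branches["main"])
```

The design point is that readers of `main` only ever see data that has passed the audit; a failed batch is discarded on its branch without any cleanup in production.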
3. Why is Kafka not Ideal for Event Sourcing?
Is Kafka the right choice for event sourcing, or are there better alternatives?
The author argues that while Kafka is a powerful tool for streaming and messaging, it falls short as an event store for event sourcing. With limitations in optimistic concurrency control, entity loading, and atomic writes, Kafka struggles to meet the specific demands of event-sourced systems. The post explores why purpose-built event stores are better suited to event sourcing and the trade-offs involved. A must-read if you're deciding between Kafka and a dedicated event store.
https://dcassisi.com/2023/05/06/why-is-kafka-not-ideal-for-event-sourcing/
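For readers unfamiliar with the optimistic concurrency control mentioned in item 3, the following Python sketch shows what a purpose-built event store typically offers: an append that succeeds only if the stream is still at the version the writer last read. The class and method names are hypothetical and the store is in-memory; it illustrates the concept, not any particular product or the linked article's code.

```python
class ConcurrencyError(Exception):
    """Raised when a stream changed since the writer last loaded it."""


class InMemoryEventStore:
    """Toy event store with per-entity streams and expected-version checks."""

    def __init__(self):
        self._streams = {}  # stream_id -> list of events

    def load(self, stream_id):
        # Entity loading: fetch all events for one entity, in order.
        return list(self._streams.get(stream_id, []))

    def append(self, stream_id, events, expected_version):
        stream = self._streams.setdefault(stream_id, [])
        # Optimistic concurrency: reject the write if someone else appended first.
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, stream is at {len(stream)}"
            )
        # Atomic write (in this toy model): all events are added or none are.
        stream.extend(events)
        return len(stream)


if __name__ == "__main__":
    store = InMemoryEventStore()
    version = store.append("account-1", [{"type": "Opened"}], expected_version=0)
    store.append("account-1", [{"type": "Deposited", "amount": 50}], expected_version=version)
    try:
        # A second writer still holding the old version is rejected.
        store.append("account-1", [{"type": "Withdrawn", "amount": 80}], expected_version=version)
    except ConcurrencyError as err:
        print("rejected:", err)
```

The rejected writer would normally reload the stream, re-evaluate its business rule against the new state, and retry. The summary above names this check, together with per-entity loading and atomic writes, as the areas where Kafka is limited.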
4. A 12 step guide to using data governance for GDPR compliance
How can a 12-step data governance strategy ensure GDPR compliance?
The author explores the UK ICO’s 12-step GDPR preparation guide, illustrating how data governance platforms like Collibra streamline compliance. From tracking data ownership and privacy notices to managing data breaches and protection impact assessments, this guide emphasizes trust, collaboration, and accountability. If you're working towards GDPR compliance, this article offers practical insights and actionable steps to stay ahead of the curve.
https://www.collibra.com/us/en/blog/data-governance-and-gdpr?utm_source=chatgpt.com
5. From Data to Insights: Segmenting Airbnb’s Supply
How does Airbnb efficiently segment its global supply to uncover actionable insights?
Alexandre Salama and Tim Abraham explain that understanding host availability patterns goes beyond raw data: it is about behavior. How does Airbnb distinguish between seasonal and always-on hosts, or occasional and event-driven listings? Using features such as streakiness and seasonality, paired with machine learning, Airbnb has developed a scalable, nuanced host segmentation. The article shows how these insights drive personalized strategies and how the approach could apply to industries beyond hospitality.
https://medium.com/airbnb-engineering/from-data-to-insights-segmenting-airbnbs-supply-c88aa2bb9399
All rights reserved Den Digital, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent the opinions of my current, former, or future employers.