1. Building a transaction data lake using Amazon Athena, Apache Iceberg and dbt
How can you build a transaction data lake using Amazon Athena, Apache Iceberg, and dbt?
The UK Ministry of Justice revamped its data architecture using Amazon Athena, Apache Iceberg, and dbt to transform its data lake. This shift led to a 99% cost reduction, faster data refreshes, and improved maintainability, all without sacrificing reliability. Their journey offers key insights into optimizing ELT pipelines for scalability and efficiency.
2. Our journey to Snowflake monitoring mastery
How did we achieve Snowflake monitoring mastery?
Rob Scriva shows how mastering Snowflake monitoring can transform data visibility and cost efficiency. How do you track performance, optimize costs, and maintain transparency in a rapidly scaling data platform? This deep dive explores Canva’s journey, covering key learnings, advanced metadata strategies, and a practical approach to Snowflake observability. A must-read for data teams navigating the challenges of modern analytics.
https://www.canva.dev/blog/engineering/our-journey-to-snowflake-monitoring-mastery/
3. Use of Time in Distributed Databases (part 5): Lessons learned
How does time influence performance and consistency in distributed databases?
This article explores how time has evolved from a simple ordering tool to a critical mechanism for coordination, performance, and correctness. From Google's Spanner to Aurora Limitless, modern systems leverage synchronized clocks, Hybrid Logical Clocks, and time-based speculation to optimize performance while maintaining consistency. A must-read for anyone interested in the future of distributed systems.
https://muratbuffalo.blogspot.com/2025/01/use-of-time-in-distributed-databases_14.html
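For readers new to Hybrid Logical Clocks, here is a minimal Python sketch of the idea, following the commonly described HLC update rules. It is not code from the linked article: the class name, the `tick` and `receive` method names, and the injectable `physical_clock` parameter are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, Tuple

# A timestamp is (logical_time, counter); tuples compare lexicographically.
Timestamp = Tuple[int, int]


class HybridLogicalClock:
    """Illustrative HLC: tracks the largest physical time seen, plus a counter."""

    def __init__(self, physical_clock: Optional[Callable[[], int]] = None):
        # Hypothetical hook: inject a fake clock in tests; defaults to wall-clock ms.
        self._now = physical_clock or (lambda: time.time_ns() // 1_000_000)
        self.l = 0  # largest physical time observed so far
        self.c = 0  # counter that breaks ties when l does not advance

    def tick(self) -> Timestamp:
        """Timestamp a local or send event."""
        prev = self.l
        self.l = max(prev, self._now())
        # If physical time did not move l forward, bump the counter instead.
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def receive(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp carried by an incoming message."""
        remote_l, remote_c = remote
        prev = self.l
        self.l = max(prev, remote_l, self._now())
        if self.l == prev and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            # The local physical clock is ahead of both, so the counter resets.
            self.c = 0
        return (self.l, self.c)
```

Because timestamps are plain tuples, a node whose wall clock lags can still order its events after a message it received: the merged timestamp is always strictly greater than the one carried by the message.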
4. Test smarter not harder: add the right tests to your dbt project
How can you add the right tests to your dbt project for smarter testing?
Faith McKenna and Jerrie Kumalah Kenney ask an important question: are you testing your dbt project effectively, or just adding noise? This article explores a structured approach to data quality, balancing critical errors with meaningful warnings. Learn how to prioritize tests that truly matter, reduce alert fatigue, and ensure your analytics drive real insights.
https://docs.getdbt.com/blog/test-smarter-not-harder
5. Duck Takes Flight: Streaming Data in DuckDB
How does DuckDB handle streaming data at scale?
Mike Ritchie explores a clever workaround for DuckDB’s biggest limitation: its lack of concurrent writes. How do you enable real-time analytics while keeping DuckDB’s speed and simplicity? The answer: Arrow Flight. This lightweight Python-based approach sidesteps concurrency issues, allowing multiple readers and writers to interact with DuckDB simultaneously. If you’re dealing with real-time data pipelines and want a simple, scalable solution, this is a must-read.
https://www.definite.app/blog/duck-takes-flight
All rights reserved Den Digital, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent the opinions of my current, former, or future employers.