1. Learnings from optimising 22 of our most expensive Snowflake pipelines.
What were the key learnings and improvements identified while optimizing 22 of our most expensive Snowflake pipelines?
The author, Raphael Montaud, explores a pragmatic approach to reducing Snowflake costs by optimizing 22 of Medium's most expensive data pipelines. This article is a must-read for engineers balancing cost efficiency with legacy system constraints. Learn actionable strategies like aligning pipeline schedules, leveraging partition pruning, simplifying predicates, and modularizing datasets to cut costs effectively without overhauling your system.
2. BigLake: BigQuery’s Evolution toward a Multi-Cloud Lakehouse.
How is BigLake transforming BigQuery into a multi-cloud lakehouse?
The authors explore how BigLake is evolving BigQuery into a true multi-cloud lakehouse, bridging the gap between data lakes and enterprise data warehouses. Learn how innovations like BigLake tables, unstructured data analysis with AI/ML, and cross-cloud capabilities empower organizations to unify their analytics across platforms. This article is a must-read for anyone navigating the complexities of modern cloud-based data systems.
3. Table format comparisons - Change queries and CDC.
Which table format is best for change data capture and query optimization?
Jack Vanlightly discusses the evolving capabilities of table formats like Apache Iceberg, Delta Lake, Hudi, and Paimon in managing Change Data Capture (CDC) and incremental change queries. How do they handle updates, deletes, and row-level tracking differently, and what does this mean for your data pipelines? This article is a must-read for those navigating modern lakehouse architectures and seeking efficient CDC solutions.
https://jack-vanlightly.com/blog/2024/9/19/table-format-comparisons-change-queries-and-cdc
4. Text-to-SQL’s Power Players: Comparing Claude 3.5 Sonnet, GPT-4o, Mistral Large 2, Llama 3.1
Which Text-to-SQL model reigns supreme: Claude 3.5 Sonnet, GPT-4o, Mistral Large 2, or Llama 3.1?
Which Text-to-SQL model handles complexity, efficiency, and cost best: GPT-4o, Claude 3.5 Sonnet, Mistral Large 2, or Llama 3.1? This post dives deep into model accuracy, query speed, and token efficiency across tasks like table joins, column selection, and handling massive schemas. Whether you're a data engineer or just curious about the evolving AI space, this analysis highlights key strengths and trade-offs you need to know.
5. Best Practices for Using QUERY_TAG in Snowflake
Unlocking Insights with QUERY_TAG in Snowflake
Jon Osborn argues that QUERY_TAG in Snowflake is more than a convenience: it can change how you track, debug, and optimize complex queries. How can a simple tag improve visibility across your Snowflake operations? By embedding structured metadata like JSON into queries, you can unlock powerful insights into query performance, resource usage, and costs. If you're looking to streamline analytics and enhance your Snowflake workflows, this is a must-read.
https://medium.com/snowflake/best-practices-for-using-query-tag-in-snowflake-32bfb8d4efba
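To make the idea of a JSON query tag concrete, here is a minimal sketch in Python. It is not taken from the article: the field names (team, pipeline, run_id) and both helper functions are hypothetical, chosen only for illustration. It builds a compact JSON string and wraps it in Snowflake's ALTER SESSION SET QUERY_TAG statement, which you would then run through whatever client you already use.

```python
import json

# Snowflake documents QUERY_TAG as a string of up to 2000 characters.
MAX_TAG_LENGTH = 2000


def build_query_tag(team: str, pipeline: str, run_id: str) -> str:
    """Serialize illustrative (hypothetical) metadata fields into compact JSON."""
    tag = json.dumps(
        {"team": team, "pipeline": pipeline, "run_id": run_id},
        sort_keys=True,
        separators=(",", ":"),
    )
    if len(tag) > MAX_TAG_LENGTH:
        raise ValueError("query tag exceeds the 2000-character limit")
    return tag


def set_query_tag_sql(tag: str) -> str:
    """Return an ALTER SESSION statement that sets QUERY_TAG for the session."""
    # Double any single quotes so the tag is a valid SQL string literal.
    escaped = tag.replace("'", "''")
    return f"ALTER SESSION SET QUERY_TAG = '{escaped}';"


tag = build_query_tag("analytics", "daily_revenue", "run-001")
statement = set_query_tag_sql(tag)
print(statement)
```

Queries run in that session should then carry the tag in the QUERY_TAG column of Snowflake's query history, where the JSON can be parsed to group cost and runtime by team or pipeline.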
All rights reserved, Den Digital, India. I have provided links for informational purposes only; they do not imply endorsement. All views expressed in this newsletter are my own and do not represent the opinions of my current, former, or future employers.