DEN Newsletter #8

Data Engineering News

Jan 12, 2025

1. Loading data into Redshift with dbt

What steps are involved in setting up a dbt pipeline to load and transform data in Amazon Redshift?

In this article, Christopher Arnold discusses how integrating dbt with Redshift Spectrum can cut down data loading times from hours to minutes while resolving common issues like schema changes and data deduplication.

https://engineeringblog.yelp.com/2024/11/loading-data-into-redshift-with-dbt.html

2. Revisiting the Outbox pattern

Could the outbox pattern be the key to simplifying your distributed systems?

In this article, Gunnar Morling delves into the outbox pattern, exploring its benefits, challenges, and alternatives. Learn how to implement this pattern effectively to ensure seamless data exchanges and robust system architecture.

https://www.decodable.co/blog/revisiting-the-outbox-pattern

3. Building data pipelines effortlessly with a DAG Builder for Apache Airflow

Are you looking for an effortless way to build and manage data pipelines in Apache Airflow?

In this blog post, Gustavo Akashi from QuintoAndar reveals their innovative DAG Builder solution for Apache Airflow. Learn how leveraging YAML configurations and CI/CD pipelines can streamline your workflow creation, enhance productivity, and ensure consistency across all your data pipelines.

https://medium.com/quintoandar-tech-blog/building-data-pipelines-effortlessly-with-a-dag-builder-for-apache-airflow-2f5f307fb781

4. How we built a new powerful JSON data type for ClickHouse

How did the ClickHouse team build a powerful new JSON data type?

In this comprehensive blog post, the ClickHouse team unveils their new JSON data type, addressing key challenges such as column-oriented storage, dynamic data types, and scalability. Discover how these innovations optimize performance and maintain data integrity, making ClickHouse a top choice for handling complex JSON data in modern analytics.

https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse

5. Enabling Infinite Retention for Upsert Tables in Apache Pinot

How can you maintain endless data history without sacrificing performance in Apache Pinot?

In this blog post, the author delves into recent enhancements that enable deletions and infinite retention for upsert tables, as implemented by Uber. Discover how these innovations optimize memory and disk usage, ensuring scalable and reliable analytics for massive datasets.

https://www.uber.com/en-IN/blog/enabling-infinite-retention-for-upsert-tables/

All rights reserved Den Digital, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent the opinions of any current, former, or future employer.

dat-a-man — Data with Aman
