Data Engineering Newsletter #33

Data Engineering News

Jul 14, 2025

1. Driving content delivery efficiency through classifying cache misses

Optimizing content delivery by tracking and categorizing cache misses

Netflix tries to stream from the closest server for speed and quality. But sometimes it misses: that’s a cache miss. When that happens, Netflix uses real-time data, health checks, and predictions to quickly reroute your stream. That’s how it keeps things running smoothly.

https://netflixtechblog.com/driving-content-delivery-efficiency-through-classifying-cache-misses-ffcf08026b6c

2. Querying Apache Iceberg with Sub-Second performance

Apache Iceberg wasn’t built for low latency, but Firebolt makes it possible. With smart metadata caching, Parquet optimizations, and subresult reuse, interactive queries run in under a second, without moving data. For anyone working with large datasets, this changes what’s possible.

https://www.firebolt.io/blog/querying-apache-iceberg-with-sub-second-performance

3. All about Iceberg partitioning and partitioning writing strategies

Optimizing data layout with Iceberg partitioning and write strategies

Apache Iceberg brings precision to partitioning with rich metadata that enables manifest pruning and file skipping, dramatically improving query performance. PartitionSpec and manifest lists let query engines bypass irrelevant data files with ease, avoiding costly scans. The blog explores four file writing strategies tailored for different ingestion needs, from high-throughput, unordered streaming to memory-efficient, sorted batch writes. Each approach balances schema evolution, memory usage, and write patterns.

https://olake.io/iceberg/iceberg-partitioning-and-writing-strategies
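
To make the pruning idea in item 3 concrete, here is a minimal Python sketch of how a planner might use partition metadata to skip work. It is a toy model, not Iceberg's actual API: the `DataFile` and `Manifest` classes, the `plan_scan` function, and the daily partition on a date column are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataFile:
    path: str
    partition_day: date  # hypothetical: value of a daily partition transform


@dataclass(frozen=True)
class Manifest:
    files: tuple  # tuple of DataFile

    @property
    def lower(self) -> date:
        # Smallest partition value tracked by this manifest
        return min(f.partition_day for f in self.files)

    @property
    def upper(self) -> date:
        # Largest partition value tracked by this manifest
        return max(f.partition_day for f in self.files)


def plan_scan(manifests, start: date, end: date) -> list:
    """Return paths of files whose partition day falls in [start, end]."""
    selected = []
    for m in manifests:
        # Manifest pruning: skip the whole manifest if its range cannot overlap
        if m.upper < start or m.lower > end:
            continue
        # File skipping: within a surviving manifest, keep only matching files
        selected.extend(
            f.path for f in m.files if start <= f.partition_day <= end
        )
    return selected


# Illustrative data only (assumed, not from the linked blog)
manifests = [
    Manifest((DataFile("a.parquet", date(2025, 7, 1)),
              DataFile("b.parquet", date(2025, 7, 2)))),
    Manifest((DataFile("c.parquet", date(2025, 7, 13)),
              DataFile("d.parquet", date(2025, 7, 14)))),
]
print(plan_scan(manifests, date(2025, 7, 14), date(2025, 7, 14)))
```

In this sketch the first manifest is discarded after a single range check, so its files are never inspected. That is the general reason coarse metadata bounds can save a query engine from opening irrelevant files.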

4. OLake + Glue + Snowflake - A deep dive into modern data partitioning

Building an open, efficient lakehouse with OLake, Glue & Snowflake

Modern data ingestion doesn’t have to be fragile, expensive, or vendor-locked. With OLake, Apache Iceberg, AWS Glue, and Snowflake, you get a fully open, scalable, and efficient pipeline from databases to your lakehouse, with no copy-paste ETL nightmares.

https://olake.io/iceberg/olake-glue-snowflake

5. How Unity Catalog managed tables automate performance at scale

Smarter tables, less work: Unity Catalog’s secret to hands-free optimization.

Unity Catalog managed tables bring AI-driven optimizations, like auto-clustering, smart vacuuming, and metadata caching, that reduce storage costs by 50%+ and boost query speeds by up to 20x. No tuning, no scripts, just better performance at scale.

https://www.databricks.com/blog/how-unity-catalog-managed-tables-automate-performance-scale

Note: I have provided links for informational purposes only; they do not imply endorsement. All views expressed in this newsletter are my own and do not represent the opinions of any current, former, or future employer.

Do you have a project or idea?

Feel free to drop me a line. If it’s interesting, let’s chat. If it’s weird, even better.

💌 amanguptanalytics@gmail.com

dat-a-man — Data with Aman
