Databricks Inc.

06/02/2025 | News release | Distributed by Public on 06/03/2025 18:00

Apache Iceberg v3: Moving the Ecosystem Towards Unification

Apache Iceberg v3, now approved by the Apache Iceberg™ community, introduces advanced new features and data types. Iceberg v3 includes major improvements such as deletion vectors, row lineage, and new types for semi-structured data and geospatial use cases. These features allow customers to efficiently process and query data. Additionally, these improvements are consistent across Delta Lake, Apache Parquet, and Apache Spark™, so customers can interoperate between Delta and Apache Iceberg™ without rewriting data or row-level delete files.

In this blog post, we cover the newest developments in Iceberg v3:

Deletion Vectors
Row Lineage
Semi-Structured Data and Geospatial Types
Interoperability across Delta Lake, Apache Parquet, and Apache Spark

Deletion Vectors

Iceberg v3 introduces a new format for row-level deletes to improve read performance: deletion vectors. Row-level deletes significantly reduce write amplification because a delete is recorded separately instead of rewriting the affected data files, which leads to faster ETL and ingestion. In Iceberg v2, engines were not required to compact delete files together during writes; the intent was for customers to run asynchronous maintenance. However, many customers did not schedule maintenance services, so their tables accumulated unmaintained delete files. Read performance then suffered, because engines had to merge many row-level delete files on read.

Iceberg v3 introduces a new deletion vector format and new compaction requirements for delete files. This new format avoids translation between Parquet files and the in-memory representations used to apply the deletes. Additionally, engines must maintain a single deletion vector per data file at write time. This requirement improves performance and statistics on data files. It also makes it easy to compare previous and current deletes, which simplifies processing a table's row-level changes as a stream.
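As an illustration of the idea above, here is a minimal Python sketch of a deletion vector as a bitmap over row positions in one data file. It is a toy model, not the Iceberg on-disk format: the `DeletionVector` class, its methods, and `read_live_rows` are hypothetical names, and a plain Python integer stands in for the compressed bitmap a real engine would use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeletionVector:
    """Toy deletion vector: bit i set means row position i is deleted."""
    bits: int = 0  # assumption: a plain int stands in for a compressed bitmap

    def delete(self, position: int) -> None:
        # Mark one row position of the data file as deleted.
        self.bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return (self.bits >> position) & 1 == 1

    def merge(self, other: "DeletionVector") -> "DeletionVector":
        # Writers fold new deletes into the existing vector so that
        # each data file keeps exactly one vector.
        return DeletionVector(self.bits | other.bits)

    def newly_deleted(self, previous: "DeletionVector") -> List[int]:
        # Positions deleted in this vector but not in the previous one.
        diff = self.bits & ~previous.bits
        return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def read_live_rows(rows: List[str], dv: DeletionVector) -> List[str]:
    # Apply the single vector while scanning; no delete files to merge.
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


rows = ["a", "b", "c", "d"]
before = DeletionVector()
before.delete(1)

later = DeletionVector()
later.delete(3)
after = before.merge(later)

print(read_live_rows(rows, after))
print(after.newly_deleted(before))
```

Because there is one vector per data file, comparing `before` and `after` is a single bitwise operation, which is the property that makes streaming row-level changes simpler.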

Row Lineage

Another major Iceberg v3 feature is row lineage, used to simplify incremental processing. With row lineage, engines find row-level changes by matching versions of rows across commits.

Iceberg v3 introduces row lineage using row-level metadata: a row ID and the sequence number at which the row was last modified or added. The IDs identify the same row across versions. Sequence numbers record when a row was last changed, as opposed to merely relocated between files. This allows engines to process changes selectively, simplifying downstream updates with faster and cheaper workflows.
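The matching described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `Row` class, its field names, and `changes_since` are hypothetical and do not mirror any engine's API. The sketch only shows how a row ID plus a last-updated sequence number separates real changes from rows that were merely rewritten into another file.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Row:
    row_id: int            # stable identity of the row across versions
    last_updated_seq: int  # sequence number of the last real change
    value: str


def changes_since(previous: List[Row], current: List[Row],
                  last_seen_seq: int) -> Dict[str, List[Row]]:
    """Classify row-level changes between two table versions."""
    prev_ids = {r.row_id for r in previous}
    cur_ids = {r.row_id for r in current}
    return {
        # New IDs did not exist in the previous version.
        "inserted": [r for r in current if r.row_id not in prev_ids],
        # Same ID, but changed after the last processed sequence number.
        "updated": [r for r in current
                    if r.row_id in prev_ids
                    and r.last_updated_seq > last_seen_seq],
        # IDs that disappeared from the current version.
        "deleted": [r for r in previous if r.row_id not in cur_ids],
    }


previous = [Row(1, 1, "a"), Row(2, 1, "b"), Row(3, 1, "c")]
# Row 1 is unchanged (perhaps moved by compaction), row 2 was updated,
# row 3 was deleted, and row 4 is new.
current = [Row(1, 1, "a"), Row(2, 2, "B"), Row(4, 2, "d")]

for kind, rows in changes_since(previous, current, last_seen_seq=1).items():
    print(kind, [r.row_id for r in rows])
```

Row 1 appears in no category: its ID and sequence number are unchanged, so a relocation between files does not register as a change.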

Row ID information is especially beneficial when combined with incremental processing objects like materialized views. These objects are optimized to compute only new or changed data since the last processing cycle.
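To make the incremental idea concrete, here is a hedged Python sketch of a running total that is refreshed from changed rows only. Everything in it is an assumption made for illustration: the `IncrementalSum` class, the `(row_id, last_updated_seq, amount)` row shape, and the refresh protocol are hypothetical, and the sketch ignores deleted rows. A real engine would also prune unchanged files using table metadata rather than iterating over every row.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (row_id, last_updated_seq, amount)
RowT = Tuple[int, int, int]


class IncrementalSum:
    """Toy materialized view holding SUM(amount), refreshed incrementally."""

    def __init__(self) -> None:
        self.total = 0
        self.last_seq = 0                    # last sequence number processed
        self._amounts: Dict[int, int] = {}   # last seen amount per row ID

    def refresh(self, table: Iterable[RowT], snapshot_seq: int) -> int:
        """Apply rows changed after last_seq; return how many were applied."""
        processed = 0
        for row_id, seq, amount in table:
            if seq <= self.last_seq:
                continue  # unchanged since the previous refresh
            # Replace the row's old contribution with its new one.
            self.total += amount - self._amounts.get(row_id, 0)
            self._amounts[row_id] = amount
            processed += 1
        self.last_seq = snapshot_seq
        return processed


view = IncrementalSum()
view.refresh([(1, 1, 10), (2, 1, 5)], snapshot_seq=1)
# Second version: row 2 updated, row 3 inserted, row 1 untouched.
applied = view.refresh([(1, 1, 10), (2, 2, 7), (3, 2, 1)], snapshot_seq=2)
print(applied, view.total)
```

The row ID is what lets the view find a row's previous contribution, and the sequence number is what lets it skip rows that have not changed.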

Databricks Inc. published this content on June 02, 2025, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on June 04, 2025 at 00:00 UTC. If you believe the information included in the content is inaccurate or outdated and requires editing or removal, please contact us at support@pubt.io
