Apache Iceberg is an open table format for huge analytical datasets that provides ACID transactions, time travel, and schema evolution on top of object storage. It was originally created at Netflix to solve the reliability and performance problems of Hive Metastore partitioning at petabyte scale, and is now a widely adopted, engine-agnostic table format for data lakehouses.

What Is Apache Iceberg?

Why Iceberg Matters for AI/ML

Core Iceberg Features

Snapshot-Based Architecture:

Time Travel:

    -- Query historical data
    SELECT * FROM orders FOR SYSTEM_TIME AS OF TIMESTAMP '2024-01-01 00:00:00';
    SELECT * FROM orders FOR SYSTEM_VERSION AS OF 5234567890;

    -- Roll back the table to a previous snapshot
    CALL catalog.system.rollback_to_snapshot('db.orders', 5234567890);
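
Both time travel by version and rollback need a snapshot ID, and the ID shown above is only a placeholder. One way to find real IDs is to query the table's `snapshots` metadata table. This is a minimal sketch, assuming Spark SQL with the Iceberg runtime; `db.orders` is the table name from the rollback call and may need a catalog prefix in your environment.

```sql
-- List the snapshots recorded for the table, newest first.
-- Assumption: Spark SQL with the Iceberg runtime; db.orders is illustrative.
SELECT snapshot_id,
       parent_id,
       committed_at,
       operation
FROM db.orders.snapshots
ORDER BY committed_at DESC;
```

A `snapshot_id` value from this result can be passed to `FOR SYSTEM_VERSION AS OF` or to `rollback_to_snapshot`.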

Partition Evolution:

    -- Change partitioning strategy without rewriting data
    ALTER TABLE orders REPLACE PARTITION FIELD year(order_date) WITH month(order_date);
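
Because the change is metadata-only, files written before it keep their original partition spec and only new writes use the new one. The sketch below shows one way to observe that, assuming Spark SQL with the Iceberg runtime and that the table is reachable as `db.orders` (an illustrative name; adjust the catalog and namespace to your setup).

```sql
-- Count files and rows per partition spec in the current snapshot.
-- After an evolution, older files typically appear under the earlier spec_id.
SELECT spec_id,
       count(*)          AS file_count,
       sum(record_count) AS record_count
FROM db.orders.files
GROUP BY spec_id
ORDER BY spec_id;

-- Queries keep filtering on the source column rather than a partition column,
-- so they do not need to change when the partitioning does.
SELECT count(*)
FROM db.orders
WHERE order_date >= DATE '2024-01-01'
  AND order_date <  DATE '2024-02-01';
```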

Metadata Pruning:
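
Iceberg keeps per-file column statistics in its manifest files, and query planning can use them to skip data files whose value ranges cannot match a filter. A minimal sketch for looking at those statistics, assuming Spark SQL with the Iceberg runtime and a table reachable as `db.orders` (an illustrative name):

```sql
-- Inspect the per-file statistics available for pruning.
-- lower_bounds and upper_bounds are maps keyed by column field ID,
-- with values stored in Iceberg's binary encoding.
SELECT file_path,
       record_count,
       lower_bounds,
       upper_bounds
FROM db.orders.files;
```

A filter such as `order_date >= DATE '2024-01-01'` can then exclude any file whose upper bound for `order_date` falls before that date, without opening the file.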

Iceberg vs Alternatives

| Format | Engine Agnostic | Multi-Writer | Row Deletes | Best For |
|---|---|---|---|---|
| Iceberg | Yes | Yes (v2) | Yes (v2) | Multi-engine, open standard |
| Delta Lake | Partial | Yes | Yes | Databricks/Spark focus |
| Hudi | Partial | Yes | Yes | Streaming upserts |
| Hive | No | No | No | Legacy only |

Apache Iceberg is an open standard for analytical table formats that frees data from single-engine lock-in. By defining a precise, engine-agnostic specification for storing metadata and data files, it lets any compute engine that implements the specification reliably read, write, and time-travel over the same petabyte-scale tables with ACID guarantees.
