Open table format that lets Spark, Trino, Flink, and other engines safely read and write the same huge analytic tables at the same time
Apache-2.0
- Java
- Scala
- Python

About Apache Iceberg
Apache Iceberg is an open table format for very large analytic datasets. It gives big data the reliability and behavior of SQL tables, so engines like Spark, Trino, Flink, Presto, Hive, and Impala can safely query and write the same tables at the same time.
Iceberg tables can be backed by Parquet, Avro, or ORC files. The project reads Parquet data into Arrow memory and integrates with the Hive metastore. Engine connectors plug Iceberg into Spark, Flink, and Hive, so existing pipelines can adopt it without changing their underlying file storage.
The table format is stable and gains new capabilities with each release. This project is the Java implementation, and separate clients cover Go, Python, Rust, and C++ for teams working outside the JVM.
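To make the "same tables at the same time" claim above more concrete, here is a minimal sketch of optimistic, snapshot-based commits, the general technique the Iceberg specification describes for concurrent writers. It is a toy model in plain Python, not Iceberg's API: `Snapshot`, `ToyCatalog`, `compare_and_swap`, and `append_file` are hypothetical names invented for illustration, and a real catalog would swap a metadata file pointer rather than an in-memory object.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: a version number plus its data files."""
    version: int
    files: tuple


class ToyCatalog:
    """Hypothetical stand-in for a catalog holding one table's current snapshot."""

    def __init__(self):
        self._current = Snapshot(version=0, files=())
        self._lock = threading.Lock()  # makes the pointer swap atomic

    def current(self):
        return self._current

    def compare_and_swap(self, expected, new):
        """Install `new` only if nobody committed since `expected` was read."""
        with self._lock:
            if self._current is not expected:
                return False
            self._current = new
            return True


def append_file(catalog, path, max_retries=100):
    """Optimistic commit: build a new snapshot, retry if another writer won."""
    for _ in range(max_retries):
        base = catalog.current()
        candidate = Snapshot(base.version + 1, base.files + (path,))
        if catalog.compare_and_swap(base, candidate):
            return candidate
    raise RuntimeError("too many concurrent commits")


catalog = ToyCatalog()
reader_view = catalog.current()  # a reader pins the snapshot it started with

# Three "engines" append concurrently; the names are illustrative only.
writers = [
    threading.Thread(target=append_file, args=(catalog, f"{engine}.parquet"))
    for engine in ("spark", "trino", "flink")
]
for w in writers:
    w.start()
for w in writers:
    w.join()

final = catalog.current()
```

In this sketch every writer's file ends up in `final`, while `reader_view` still shows the empty table it started with. Under the stated assumptions, that is the property that lets several engines share one table without seeing partial writes.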
Key features
- Same tables read and written concurrently by Spark, Trino, Flink, Presto, Hive, and Impala
- Tables backed by Parquet, Avro, and ORC files
- Reads Parquet data into Arrow memory
- Hive metastore integration for table metadata
- Engine connectors for Spark, Flink, and Hive
Details
- First released: 2018
- Type: Open table format
- Language: Java
- Storage: Parquet · Avro · ORC
- Compatibility: Spark · Trino · Flink · Presto · Hive
- Governance: Apache Software Foundation
