
Apache Iceberg

Open table format that lets Spark, Trino, and Flink read and write the same huge analytic tables at once

Repository activity
  • Stars: 9k
  • Forks: 3.3k
  • Open issues: 755
License

Apache-2.0

Languages
  • Java
  • Scala
  • Python

About Apache Iceberg

Apache Iceberg is an open table format for very large analytic datasets. It brings the reliability and simplicity of SQL tables to big data, so engines like Spark, Trino, Flink, Presto, Hive, and Impala can safely read and write the same tables at the same time.
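The concurrency guarantee above rests on optimistic commits: a writer builds new table metadata from the version it read, and the catalog atomically replaces the table's current-metadata pointer only if no other writer has changed it in the meantime; otherwise the writer retries against the newer version. The Java sketch below is a toy, in-memory model of that idea using only the JDK. ToyCatalog, commit, and the string metadata labels are hypothetical names chosen for illustration, not Iceberg's API (in Iceberg itself, commits go through the catalog's table operations and write real metadata files).

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/**
 * Toy model (hypothetical, not Iceberg's API) of an optimistic table commit:
 * new metadata is derived from the version the writer read, and the table's
 * "current metadata" pointer is swapped only if nobody else moved it first.
 */
class ToyCatalog {
    // Stand-in for the catalog's pointer to the table's current metadata file.
    private final AtomicReference<String> currentMetadata;

    ToyCatalog(String initialMetadata) {
        this.currentMetadata = new AtomicReference<>(initialMetadata);
    }

    String current() {
        return currentMetadata.get();
    }

    /** Tries to commit, rebasing on newer metadata after a conflict. Returns attempts used. */
    int commit(UnaryOperator<String> update, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String base = currentMetadata.get();   // version this attempt starts from
            String next = update.apply(base);      // new metadata built on that base
            if (currentMetadata.compareAndSet(base, next)) {
                return attempt;                    // pointer swapped: commit succeeded
            }
            // Another writer committed first; loop and rebuild on the newer version.
        }
        throw new IllegalStateException("commit failed after " + maxAttempts + " attempts");
    }
}

class OptimisticCommitDemo {
    public static void main(String[] args) {
        ToyCatalog catalog = new ToyCatalog("v1.metadata.json");
        boolean[] interfered = {false};
        // Writer A's update; on its first attempt, writer B sneaks in a commit.
        int attempts = catalog.commit(base -> {
            if (!interfered[0]) {
                interfered[0] = true;
                catalog.commit(b -> "v2.metadata.json", 1); // writer B wins the race
            }
            return base + "+A";
        }, 3);
        System.out.println(catalog.current() + " after " + attempts + " attempts");
    }
}
```

Running main forces writer B to commit while writer A is mid-attempt, so A's first compare-and-set fails and its retry succeeds on top of B's metadata, printing "v2.metadata.json+A after 2 attempts". Neither writer's change is lost, which is the property that lets several engines share one table.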

Iceberg tables store their data in Parquet, Avro, or ORC files; Parquet data can be read into Arrow memory, and table metadata can be tracked in the Hive metastore. Engine connectors plug Iceberg into Spark, Flink, and Hive, so existing pipelines can adopt it without changing their underlying file storage.
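As an example of the Hive metastore integration and the Spark connector working together, the sketch below assembles the Spark properties that register an Iceberg catalog whose table pointers live in a Hive metastore. The property keys and class names follow the Iceberg Spark configuration documentation; the catalog name hive_prod, the thrift URI, and the helper class IcebergSparkConf are placeholders. It assumes the iceberg-spark-runtime jar matching your Spark version is on the classpath when the properties are actually applied.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds Spark settings for a Hive-backed Iceberg catalog. */
class IcebergSparkConf {
    static Map<String, String> hiveCatalog(String catalogName, String metastoreUri) {
        String prefix = "spark.sql.catalog." + catalogName;
        Map<String, String> conf = new LinkedHashMap<>(); // keeps insertion order for printing
        // Enables Iceberg's SQL extensions (for example, CALL procedures).
        conf.put("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions");
        // Registers a Spark catalog implemented by Iceberg under the given name.
        conf.put(prefix, "org.apache.iceberg.spark.SparkCatalog");
        // Stores each table's current-metadata pointer in the Hive metastore.
        conf.put(prefix + ".type", "hive");
        conf.put(prefix + ".uri", metastoreUri);
        return conf;
    }

    public static void main(String[] args) {
        // Print the entries as --conf flags for spark-submit or spark-sql.
        hiveCatalog("hive_prod", "thrift://metastore-host:9083")
                .forEach((key, value) -> System.out.println("--conf " + key + "=" + value));
    }
}
```

Each entry can be passed as a --conf flag or through SparkSession.builder().config(key, value); tables in that catalog are then addressed as hive_prod.db.table in Spark SQL. Building the settings in one place keeps the catalog name and its prefixed keys from drifting apart.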

The table format specification is stable, and each release adds new capabilities. This project is the Java implementation; separate clients cover Go, Python, Rust, and C++ for teams working outside the JVM.

Key features

  • Same tables read and written concurrently by Spark, Trino, Flink, Presto, Hive, and Impala
  • Tables backed by Parquet, Avro, and ORC files
  • Reads Parquet data into Arrow memory
  • Hive metastore integration for table metadata
  • Engine connectors for Spark, Flink, and Hive

Details

First released: 2018
Type: Open table format
Language: Java
Storage: Parquet · Avro · ORC
Compatibility: Spark · Trino · Flink · Presto · Hive · Impala
Governance: Apache Software Foundation