
Apache Atlas

Metadata and data governance framework for Hadoop, with lineage, audit, and RBAC/ABAC security

Repository activity
  • Stars: 2.1k
  • Forks: 911
  • Open issues: 139
License

Apache-2.0

Languages
  • Java
  • TypeScript
  • JavaScript

About Apache Atlas

Apache Atlas is an extensible set of core governance services for Hadoop and the wider enterprise data ecosystem. It provides a common metadata store through which any metadata consumer can interoperate without point-to-point interfaces, and it helps organizations meet their compliance requirements.
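Metadata consumers typically reach the shared store through Atlas's REST API. The sketch below builds, without sending, a basic-search request against the v2 search endpoint. It assumes a server at `http://localhost:21000` (the commonly used default port) with basic authentication and the `admin`/`admin` credentials of a fresh test install; the type name `hive_table` and the search term `customers` are illustrative only. Check the REST API documentation for your Atlas version before relying on these parameter names.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AtlasBasicSearch {
    // Assumption: an Atlas server on its default port, reachable without TLS.
    static final String BASE_URL = "http://localhost:21000";

    /** Builds (but does not send) a GET request for the v2 basic search endpoint. */
    static HttpRequest basicSearch(String typeName, String query, String user, String password) {
        // Query parameters are URL-encoded so arbitrary search terms stay valid.
        String uri = BASE_URL + "/api/atlas/v2/search/basic"
                + "?typeName=" + URLEncoder.encode(typeName, StandardCharsets.UTF_8)
                + "&query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
        // Assumption: the server accepts HTTP basic authentication.
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(uri))
                .header("Authorization", "Basic " + token)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = basicSearch("hive_table", "customers", "admin", "admin");
        // prints: GET http://localhost:21000/api/atlas/v2/search/basic?typeName=hive_table&query=customers
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Sending the request is one more call, for example `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())`, which returns a JSON search result describing the matching entities.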

It provides visibility into data through prescriptive and forensic models, technical and operational audit, and lineage enriched with business taxonomical metadata. Security is both role-based and attribute-based, and Atlas integrates with Apache Ranger to prevent unauthorized access paths to data at runtime.
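To make the role-based versus attribute-based distinction concrete, here is a hypothetical access check that combines both: a role must grant the action, and an entity carrying a `PII` classification additionally requires a `pii_cleared` user attribute. The role table, the `PII` tag, the `pii_cleared` attribute and `isAllowed` are all invented for illustration; in a real deployment such policies are defined and enforced in Apache Ranger, not in application code like this.

```java
import java.util.Map;
import java.util.Set;

public class AccessCheckSketch {
    // Hypothetical role -> permitted actions mapping (the role-based part).
    static final Map<String, Set<String>> ROLE_ACTIONS = Map.of(
            "analyst", Set.of("read"),
            "steward", Set.of("read", "classify"));

    /**
     * Hypothetical combined check. RBAC: the user's role must grant the action.
     * ABAC: if the entity is classified "PII", the user must also hold "pii_cleared".
     * Illustrative only; this is not the Ranger or Atlas API.
     */
    static boolean isAllowed(String role, Set<String> userAttributes,
                             String action, Set<String> entityClassifications) {
        boolean roleGrants = ROLE_ACTIONS.getOrDefault(role, Set.of()).contains(action);
        boolean attributesSatisfied = !entityClassifications.contains("PII")
                || userAttributes.contains("pii_cleared");
        return roleGrants && attributesSatisfied;
    }

    public static void main(String[] args) {
        Set<String> piiTable = Set.of("PII");
        System.out.println(isAllowed("analyst", Set.of(), "read", piiTable));                  // false
        System.out.println(isAllowed("analyst", Set.of("pii_cleared"), "read", piiTable));     // true
        System.out.println(isAllowed("analyst", Set.of("pii_cleared"), "classify", piiTable)); // false
    }
}
```

Keeping the two checks separate means a role change and a reclassification of the data each affect access independently, which is the practical benefit of layering attributes on top of roles.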

Atlas is an Apache project that builds with Java and runs on a self-hosted server, with documented Docker build and run instructions. The distribution includes a server tarball plus hook tarballs for HBase, Hive, Impala, Kafka, Sqoop, Storm, Falcon, and Couchbase that capture metadata from those systems.

Key features

  • Prescriptive and forensic views of data
  • Technical and operational audit
  • Lineage enriched with business taxonomical metadata
  • Role-based and attribute-based access control
  • Hooks for Hive, HBase, Impala, Kafka, Sqoop, and more
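Lineage in Atlas is built from process entities that link input datasets to output datasets. The following sketch assumes a simplified in-memory model (the `Process` class, the `upstream` and `sampleProcesses` methods, and the dataset names are hypothetical, not Atlas types) and walks those links backwards with a breadth-first search to list every dataset that feeds a given table.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LineageSketch {
    /** Hypothetical process: reads input datasets and writes output datasets. */
    static final class Process {
        final String name;
        final List<String> inputs;
        final List<String> outputs;

        Process(String name, List<String> inputs, List<String> outputs) {
            this.name = name;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    /** Returns every dataset upstream of {@code dataset}, in breadth-first order. */
    static Set<String> upstream(String dataset, List<Process> processes) {
        // Index each output dataset to the processes that produce it.
        Map<String, List<Process>> producers = new HashMap<>();
        for (Process p : processes) {
            for (String out : p.outputs) {
                producers.computeIfAbsent(out, k -> new ArrayList<>()).add(p);
            }
        }
        Set<String> seen = new LinkedHashSet<>(); // also guards against cycles
        Deque<String> queue = new ArrayDeque<>(List.of(dataset));
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (Process p : producers.getOrDefault(current, List.of())) {
                for (String in : p.inputs) {
                    if (seen.add(in)) {
                        queue.add(in);
                    }
                }
            }
        }
        return seen;
    }

    /** Hypothetical two-step pipeline used for the demo. */
    static List<Process> sampleProcesses() {
        return List.of(
                new Process("load_orders", List.of("raw.orders"), List.of("staging.orders")),
                new Process("build_report", List.of("staging.orders", "dim.customers"),
                        List.of("mart.sales_report")));
    }

    public static void main(String[] args) {
        // prints: [staging.orders, dim.customers, raw.orders]
        System.out.println(upstream("mart.sales_report", sampleProcesses()));
    }
}
```

Against a live server, comparable graph data comes from the v2 lineage endpoint (`/api/atlas/v2/lineage/{guid}`), which also lets the caller choose the traversal direction and depth.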

Details

On GitHub since
2017
Languages
Java, TypeScript, JavaScript
Builds with
Java 8, 11, or 17
Security
RBAC and ABAC, Apache Ranger
Self-hosted
Yes, Docker build available
License
Apache-2.0