Data discovery and metadata engine for finding tables, dashboards, streams, and other data assets
- Stars: 4.8k
- Forks: 967
- Open Issues: 58
Apache-2.0
- Python
- TypeScript
- SCSS

About Amundsen
Amundsen is a data discovery and metadata engine that helps data analysts, scientists, and engineers find and understand data. It indexes resources such as tables, dashboards, and streams, then ranks search results by usage patterns so that frequently queried assets appear earlier.
It is built from microservices: a Flask app with a React frontend, a search service backed by Elasticsearch, and a metadata service that persists to Neo4j, Apache Atlas, or a relational database. Metadata is loaded through a Python script or an Airflow DAG, and supported entities include tables, dashboards, ML features, and people.
Amundsen is hosted by the LF AI & Data Foundation. The installation docs include a quick start that bootstraps a default deployment with sample data. Connectors are available for databases, for dashboard tools such as Superset, Tableau, and Redash, and for Airflow as the orchestrator.
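To make the usage-based ranking described above concrete, here is a minimal sketch of how a text-match score might be boosted by query frequency. It is not Amundsen's actual scoring formula: the `Asset` class, the `text_score` and `query_count` fields, the log-based boost, and the sample table names are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    text_score: float  # hypothetical relevance from the text match
    query_count: int   # hypothetical number of times the asset was queried


def rank(assets):
    """Order assets so that frequently queried ones appear earlier."""
    def score(asset):
        # Assumed formula: dampen raw usage with log1p so that very popular
        # assets do not completely drown out the text relevance.
        return asset.text_score * (1.0 + math.log1p(asset.query_count))

    return sorted(assets, key=score, reverse=True)


# Illustrative data only: three tables that match a search equally well.
assets = [
    Asset("sales.orders_raw", 1.0, 3),
    Asset("sales.orders", 1.0, 1200),
    Asset("sales.orders_backup", 1.0, 0),
]

ranked_names = [a.name for a in rank(assets)]
print(ranked_names)
```

With equal text scores, the order is decided entirely by usage, so the heavily queried table comes first and the never-queried backup comes last.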
Key features
- Ranks data search results by usage frequency
- Indexes tables, dashboards, ML features, and people
- Metadata store on Neo4j, Apache Atlas, or a relational database
- Loads metadata via a Python script or Airflow DAG
- Connectors for many databases and dashboard tools
Details
- First released: 2019
- Language: Python, TypeScript
- Search backend: Elasticsearch
- Metadata store: Neo4j, Apache Atlas, or relational
- Self-hosted: Yes
- License: Apache-2.0
