Open Source ETL Tools
Moving data between systems looks like glue code until a source changes its schema or a sync silently drops half the rows. Then the pipeline becomes the most fragile thing you own: invisible until something downstream is wrong, and impossible to trust unless you can see how the data actually moves. The open source options here run the connectors and the extract-load engine on your own infrastructure, so the data never detours through a vendor's cloud and you aren't metered by rows or connectors as volume grows.
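
A minimal Python sketch of the kind of check that catches the two failures described above, a changed schema and silently dropped rows. It assumes each loaded batch arrives as a list of dictionaries and that the source's row count is known. `EXPECTED_COLUMNS`, `BatchReport`, and `check_batch` are hypothetical names for illustration, not part of any tool listed below.

```python
from dataclasses import dataclass, field

# Hypothetical schema the destination table expects.
EXPECTED_COLUMNS = {"id", "email", "created_at"}


@dataclass
class BatchReport:
    """Summary of what a single extract-load batch looked like."""
    source_count: int
    loaded_count: int
    missing_columns: set = field(default_factory=set)
    unexpected_columns: set = field(default_factory=set)

    @property
    def ok(self) -> bool:
        # Trust the batch only if no rows were lost and the schema matches.
        return (
            self.loaded_count == self.source_count
            and not self.missing_columns
            and not self.unexpected_columns
        )


def check_batch(rows: list[dict], source_count: int) -> BatchReport:
    """Compare a loaded batch against the source row count and expected schema."""
    # Columns are collected across the whole batch, so a column counts
    # as present if any row has it.
    seen = set()
    for row in rows:
        seen.update(row.keys())
    return BatchReport(
        source_count=source_count,
        loaded_count=len(rows),
        missing_columns=EXPECTED_COLUMNS - seen,
        unexpected_columns=seen - EXPECTED_COLUMNS,
    )


rows = [
    {"id": 1, "email": "a@example.com", "created_at": "2024-01-01"},
    {"id": 2, "mail": "b@example.com", "created_at": "2024-01-02"},  # renamed column
]
report = check_batch(rows, source_count=4)  # source reported 4 rows
print(report.ok, report.unexpected_columns)  # False {'mail'}
```

A renamed upstream column shows up as an unexpected column (and as a missing one if no row still carries the old name), while the count comparison exposes rows that never arrived.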

Apache Airflow
Programmatically author, schedule, and monitor workflows as code
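
Airflow represents a workflow as a directed acyclic graph (DAG) of tasks defined in Python. The sketch below does not use Airflow's API. It is a stdlib-only illustration of why declaring dependencies in code lets a scheduler derive a valid run order, assuming three hypothetical tasks named `extract`, `transform`, and `load`.

```python
from graphlib import TopologicalSorter
from typing import Callable


# Hypothetical task functions; each returns its own name so the run order is visible.
def extract() -> str:
    return "extract"


def transform() -> str:
    return "transform"


def load() -> str:
    return "load"


TASKS: dict[str, Callable[[], str]] = {
    "extract": extract,
    "transform": transform,
    "load": load,
}

# Each key lists the tasks it depends on (its upstream tasks).
DEPENDENCIES = {
    "transform": {"extract"},
    "load": {"transform"},
}


def run_workflow() -> list[str]:
    """Run every task once, after all of its upstream tasks have run."""
    order = TopologicalSorter(DEPENDENCIES).static_order()
    return [TASKS[name]() for name in order]


print(run_workflow())  # ['extract', 'transform', 'load']
```

`graphlib.TopologicalSorter` also raises `CycleError` if the declared dependencies loop back on themselves, which is the property that keeps such a graph acyclic.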

PostHog
Open source product analytics platform with session replay, feature flags, experiments, surveys, and data pipelines

Kestra
Open-source declarative orchestration for scheduled and event-driven data, AI, and infrastructure workflows

Prefect
Workflow orchestration framework for resilient Python data pipelines with scheduling, retries, and event automations
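
Prefect can retry a failed task automatically. As a stdlib-only illustration of what a retry policy does (this is not Prefect's own decorator), here is a sketch assuming a hypothetical flaky source that fails twice before succeeding. `with_retries`, `max_retries`, `delay_seconds`, and `flaky_extract` are names chosen for this example.

```python
import time
from functools import wraps
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(max_retries: int = 3, delay_seconds: float = 0.0):
    """Re-run the wrapped function up to max_retries extra times if it raises."""
    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @wraps(func)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries:
                        raise  # out of retries: surface the last error
                    time.sleep(delay_seconds)
        return wrapper
    return decorator


calls = {"count": 0}


@with_retries(max_retries=3)
def flaky_extract() -> list[int]:
    # Hypothetical source that fails on the first two attempts.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("source temporarily unavailable")
    return [1, 2, 3]


print(flaky_extract(), calls["count"])  # [1, 2, 3] 3
```

Retrying every exception is the simplest policy; a production pipeline would usually retry only transient errors such as timeouts and let data errors fail immediately.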

Vector
Observability data pipeline that collects, transforms, and routes logs and metrics as an agent or aggregator

Airbyte
Open-source ELT platform for moving data from APIs, databases, and files to warehouses, lakes, and AI apps
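
Airbyte's incremental sync modes use a cursor field to pull only records that changed since the previous run. The sketch below is a stdlib-only illustration of that idea, not Airbyte's connector API. The `updated_at` cursor, the in-memory `SOURCE` list, and `incremental_sync` are hypothetical.

```python
from typing import Optional

# Hypothetical source table; updated_at acts as the cursor field.
SOURCE = [
    {"id": 1, "updated_at": "2024-01-01T00:00:00"},
    {"id": 2, "updated_at": "2024-01-02T00:00:00"},
    {"id": 3, "updated_at": "2024-01-03T00:00:00"},
]


def incremental_sync(
    source: list[dict], destination: list[dict], state: Optional[str]
) -> Optional[str]:
    """Append records newer than the saved cursor and return the new cursor."""
    # ISO-8601 timestamps in a single fixed format compare correctly as strings.
    new_records = [r for r in source if state is None or r["updated_at"] > state]
    destination.extend(new_records)
    if not new_records:
        return state  # nothing new: keep the previous cursor
    return max(r["updated_at"] for r in new_records)


dest: list[dict] = []
cursor = incremental_sync(SOURCE, dest, state=None)  # first run: full load
SOURCE.append({"id": 4, "updated_at": "2024-01-04T00:00:00"})
cursor = incremental_sync(SOURCE, dest, state=cursor)  # second run: only id 4
print([r["id"] for r in dest], cursor)  # [1, 2, 3, 4] 2024-01-04T00:00:00
```

Comparing with a strict `>` means a late record that shares the saved cursor value would be skipped; `>=` avoids that at the cost of re-reading boundary rows, which then need deduplication.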

Dagster
Cloud-native data pipeline orchestrator with declarative assets, lineage, and observability
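
In Dagster, an asset's upstream dependencies can be declared through its function parameters, which is what makes lineage derivable from the code itself. The sketch below imitates that idea with the standard library only; it is not Dagster's API, and the `asset` decorator, `ASSETS` registry, and the `raw_orders` and `order_total` assets are hypothetical.

```python
import inspect
from graphlib import TopologicalSorter
from typing import Any, Callable

# Registry of hypothetical asset definitions, keyed by asset name.
ASSETS: dict[str, Callable[..., Any]] = {}


def asset(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as an asset; its parameter names name its upstream assets."""
    ASSETS[func.__name__] = func
    return func


@asset
def raw_orders() -> list[dict]:
    return [{"id": 1, "amount": 30}, {"id": 2, "amount": 70}]


@asset
def order_total(raw_orders: list[dict]) -> int:
    return sum(order["amount"] for order in raw_orders)


def lineage() -> dict[str, set[str]]:
    """Map each asset to the upstream assets it reads, derived from signatures."""
    return {name: set(inspect.signature(fn).parameters) for name, fn in ASSETS.items()}


def materialize_all() -> dict[str, Any]:
    """Compute every asset in dependency order, passing upstream values downstream."""
    values: dict[str, Any] = {}
    for name in TopologicalSorter(lineage()).static_order():
        fn = ASSETS[name]
        deps = inspect.signature(fn).parameters
        values[name] = fn(**{dep: values[dep] for dep in deps})
    return values


print(lineage())  # {'raw_orders': set(), 'order_total': {'raw_orders'}}
print(materialize_all()["order_total"])  # 100
```

Because dependencies come from the signatures rather than a separate wiring step, the lineage graph cannot drift out of sync with the code that computes each asset.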

Logstash
Server-side data processing pipeline for ingesting, transforming, and forwarding logs and events
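
A Logstash pipeline is organized into inputs, filters, and outputs. The sketch below mirrors only the filter and output stages in plain Python and is not Logstash configuration. It assumes a hypothetical log format of `<timestamp> <LEVEL> <message>`; `LINE_PATTERN`, `parse`, and `forward` are illustrative names.

```python
import json
import re
from typing import Iterable, Iterator

# Hypothetical log format: "<ISO timestamp> <LEVEL> <message>".
LINE_PATTERN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Filter stage: turn raw text lines into structured events."""
    for line in lines:
        text = line.rstrip("\n")
        match = LINE_PATTERN.match(text)
        if match:
            yield match.groupdict()
        else:
            # Keep unparsed lines but tag them instead of dropping them silently.
            yield {"message": text, "tags": ["parse_failure"]}


def forward(events: Iterable[dict]) -> list[str]:
    """Output stage: serialize each event as one JSON document per line."""
    return [json.dumps(event, sort_keys=True) for event in events]


raw = [
    "2024-01-01T12:00:00Z ERROR disk full\n",
    "not a structured line\n",
]
for doc in forward(parse(raw)):
    print(doc)
```

Tagging lines that fail to parse, rather than discarding them, keeps malformed input visible downstream, which is the same concern as the dropped-rows problem in the introduction.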

Mage
Self-hosted data pipeline development environment with visual notebooks and local ETL workflows