Data & Analytics
Data Pipelines
Data integration, ETL, ELT, ingestion, and transformation tools.
Best data pipeline tools by public signals
These picks are computed from scored public evidence. Use the openness column in the ranking to separate OSI-approved, source-available, open-core, proprietary, and unverified-license tools.
Use Case Rankings
Ordered by ToolVitals score first, then by health, shipping, and confidence, with adoption as the final tie-breaker.
| # | Tool | Description | Health | Shipping | Openness | Stars | Score | Status |
|---|---|---|---|---|---|---|---|---|
| 01 | Dagster | An orchestration platform for the development, production, and observation of data assets. | 100 | 100 | OSI-approved OSS | 15.7k | 100 | Active |
| 02 | dbt | dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. | 98 | 95 | OSI-approved OSS | 13k | 98 | Active |
| 03 | CocoIndex | Incremental engine for long horizon agents 🌟 Star if you like it! | 95 | 100 | OSI-approved OSS | 10.3k | 98 | Active |
| 04 | Benthos | Data streaming processor with yaml-driven pipeline configuration | 95 | 100 | License unknown | 8.7k | 98 | Active |
| 05 | MLRun | MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications. | 93 | 100 | OSI-approved OSS | 1.7k | 97 | Active |
| 06 | Sail | Drop-in Apache Spark replacement written in Rust for batch and streaming workloads. | 93 | 100 | OSI-approved OSS | 2.9k | 97 | Active |
| 07 | RudderStack | Privacy and Security focused Segment-alternative, in Golang and React | 96 | 93 | License unknown | 4.4k | 96 | Active |
| 08 | Bruin | Data pipeline platform with SQL, Python, and quality checks. | 91 | 100 | OSI-approved OSS | 1.6k | 96 | Active |
| 09 | Apache Airflow | Apache Airflow - A platform to programmatically author, schedule, and monitor workflows | 93 | 95 | OSI-approved OSS | 45.8k | 95 | Active |
| 10 | CloudQuery | Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources. | 92 | 95 | OSI-approved OSS | 6.4k | 95 | Active |
| 11 | HPCC Systems | Open-source distributed data processing and analytics platform for large-scale data workflows. | 91 | 100 | OSI-approved OSS | 609 | 94 | Active |
| 12 | Jitsu | Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set-up a real-time data pipeline in minutes, not days | 88 | 95 | OSI-approved OSS | 4.8k | 93 | Active |
| 13 | Open Wearables | Self-hosted platform to unify wearable health data through one AI-ready API. | 91 | 89 | OSI-approved OSS | 1.9k | 92 | Active |
| 14 | Pixeltable | Data Infrastructure providing a declarative, incremental approach for multimodal AI workloads. | 83 | 98 | OSI-approved OSS | 1.6k | 92 | Active |
| 15 | Altimate Code | Open-source agentic data engineering harness for dbt, SQL, and cloud warehouses. | 88 | 92 | OSI-approved OSS | 670 | 91 | Active |
| 16 | Pathway | Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG. | 87 | 87 | License unknown | 63k | 90 | Active |
| 17 | Apache DevLake | Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth. | 87 | 63 | OSI-approved OSS | 3k | 83 | Active |
| 18 | Meltano | Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations. | 77 | 76 | OSI-approved OSS | 2.5k | 83 | Active |
| 19 | dbmazz | CDC, radically simplified. One Rust binary for source-to-target data movement. | 84 | 64 | License unknown | 11 | 81 | Active |
| 20 | Airbyte | The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted. | 83 | 69 | License unknown | 21.4k | 81 | Active |
| 21 | Multiwoven | Open-source reverse ETL and customer data sync platform. | 81 | 55 | OSI-approved OSS | 1.7k | 76 | Active |
| 22 | AtroCore | Open-source data management and system integration platform. | 72 | 69 | OSI-approved OSS | 222 | 75 | Warning |
| 23 | Arroyo | Distributed stream processing engine in Rust | 65 | 48 | OSI-approved OSS | 4.9k | 67 | Warning |
| 24 | qData | Open-source all-in-one data middle platform. | 63 | 56 | License unknown | 472 | 66 | Warning |
| 25 | Anyquery | SQL query engine for querying apps, APIs, files, and SaaS data sources. | 71 | 28 | OSI-approved OSS | 1.7k | 64 | Warning |
| 26 | Mage | 🧙 Build, run, and manage data pipelines for integrating and transforming data. | 67 | 21 | OSI-approved OSS | 8.8k | 61 | Warning |
| 27 | HRFlow | 🚀 Open source python repository of HR data Connectors. | 56 | 30 | OSI-approved OSS | 39 | 59 | Warning |
| 28 | DocETL | Agentic LLM-powered data processing and ETL. | 67 | 21 | OSI-approved OSS | 3.8k | 58 | Warning |
| 29 | SyncMaven | Open-source reverse-ETL and data activation platform. | 36 | 0 | License unknown | 22 | 38 | Critical |
| 30 | Automate DV | dbt package for creating and loading Data Vault 2.0 compliant data warehouses. | 33 | 0 | OSI-approved OSS | 588 | 36 | Critical |
| 31 | cptn.io | Open-source platform for building and deploying integrations and data pipelines. | 33 | 0 | OSI-approved OSS | 493 | 35 | Critical |
| 32 | Neosync | Open Source Data Security Platform for Developers to Monitor and Detect PII, Anonymize Production Data and Sync it across environments. | 27 | 0 | OSI-approved OSS | 4.2k | 34 | Critical |
| 33 | Documind | Open-source platform for extracting structured data from documents using AI. | 31 | 0 | OSI-approved OSS | 1.5k | 33 | Critical |
| 34 | Seldon | Deployment & monitoring for machine learning at scale | 31 | 0 | License unknown | 4.8k | 33 | Critical |
| 35 | Orchest | No-code data pipelines builder and orchestration platform. | 31 | 0 | OSI-approved OSS | 4.1k | 31 | Critical |
| 36 | Ploomber | The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️ | 18 | 0 | OSI-approved OSS | 3.6k | 31 | Critical |
| 37 | Grouparoo | Data synchronization framework | 28 | 0 | OSI-approved OSS | 775 | 29 | Critical |
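The ordering described for this ranking can be read as a multi-key descending sort. The sketch below is one possible interpretation, not ToolVitals' actual implementation: the `Tool` class, its field names, and `ranking_key` are hypothetical, star counts stand in for "adoption", and because the table does not publish a confidence value, every `confidence` is a placeholder `0.0`. The five sample rows use the score, health, shipping, and star figures from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    score: int
    health: int
    shipping: int
    confidence: float  # placeholder: not published in the table
    stars: int         # assumed proxy for "adoption"


def ranking_key(tool: Tool) -> tuple:
    # Negate each field so an ascending sort yields descending order.
    # Tuple comparison applies the keys left to right, so a later field
    # only matters when all earlier fields are tied.
    return (-tool.score, -tool.health, -tool.shipping,
            -tool.confidence, -tool.stars)


def rank(tools: list[Tool]) -> list[Tool]:
    return sorted(tools, key=ranking_key)


# Sample rows copied from the table, deliberately out of order.
tools = [
    Tool("Apache Airflow", 95, 93, 95, 0.0, 45_800),
    Tool("Dagster", 100, 100, 100, 0.0, 15_700),
    Tool("CocoIndex", 98, 95, 100, 0.0, 10_300),
    Tool("dbt", 98, 98, 95, 0.0, 13_000),
    Tool("Benthos", 98, 95, 100, 0.0, 8_700),
]

ranked_names = [t.name for t in rank(tools)]
print(ranked_names)
```

Under these assumptions the sample reproduces the table's order for these five tools: dbt leads the three tools scoring 98 on health, and CocoIndex and Benthos, tied on score, health, and shipping, are separated by stars only because the placeholder confidence values are equal.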
Evidence Watch
Tracked tools with useful public signals but no verdict score yet.