Best data pipeline tools by public signals

These picks are computed from scored public evidence. Use the Openness column in the ranking to tell apart OSI-approved, source-available, open-core, proprietary, and unverified-license tools.

Use Case Rankings

Tools are ordered by ToolVitals score, then by health, shipping, and confidence, with adoption as the final tie-breaker.
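This ordering rule is a lexicographic sort: each later key matters only when every earlier key ties. The following is a minimal Python sketch, not ToolVitals' implementation. It assumes a hypothetical `Tool` record, uses stars as the adoption signal, and sets confidence to a placeholder value because the table does not publish confidence figures.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    score: int         # ToolVitals score
    health: int
    shipping: int
    confidence: float  # hypothetical: confidence is not shown in the table
    stars: int         # assumption: stars stand in for "adoption"


def rank(tools: list[Tool]) -> list[Tool]:
    """Sort by score, health, shipping, confidence, then adoption.

    Tuples compare element by element, so each later key only breaks
    ties left by the earlier ones; reverse=True makes every key descending.
    """
    return sorted(
        tools,
        key=lambda t: (t.score, t.health, t.shipping, t.confidence, t.stars),
        reverse=True,
    )


# Four rows from the ranking, with a placeholder confidence of 1.0.
sample = [
    Tool("Benthos", 98, 95, 100, 1.0, 8_700),
    Tool("CocoIndex", 98, 95, 100, 1.0, 10_300),
    Tool("dbt", 98, 98, 95, 1.0, 13_000),
    Tool("Dagster", 100, 100, 100, 1.0, 15_700),
]

if __name__ == "__main__":
    print([t.name for t in rank(sample)])
```

The sketch reproduces the published order for these four rows. dbt outranks CocoIndex on health, and CocoIndex outranks Benthos only on stars. With equal placeholder confidence, it would place Sail (2.9k stars) ahead of MLRun (1.7k). The ranking lists MLRun first, which suggests their unpublished confidence values differ.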

| # | Tool | Description | Health | Shipping | Openness | Stars | Score | Status |
|---|---|---|---|---|---|---|---|---|
| 01 | Dagster | An orchestration platform for the development, production, and observation of data assets. | 100 | 100 | OSI-approved OSS | 15.7k | 100 | Active |
| 02 | dbt | Enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications. | 98 | 95 | OSI-approved OSS | 13k | 98 | Active |
| 03 | CocoIndex | Incremental engine for long-horizon agents. | 95 | 100 | OSI-approved OSS | 10.3k | 98 | Active |
| 04 | Benthos | Data streaming processor with YAML-driven pipeline configuration. | 95 | 100 | License unknown | 8.7k | 98 | Active |
| 05 | MLRun | An open-source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. It integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications. | 93 | 100 | OSI-approved OSS | 1.7k | 97 | Active |
| 06 | Sail | Drop-in Apache Spark replacement written in Rust for batch and streaming workloads. | 93 | 100 | OSI-approved OSS | 2.9k | 97 | Active |
| 07 | RudderStack | Privacy- and security-focused Segment alternative, written in Go and React. | 96 | 93 | License unknown | 4.4k | 96 | Active |
| 08 | Bruin | Data pipeline platform with SQL, Python, and quality checks. | 91 | 100 | OSI-approved OSS | 1.6k | 96 | Active |
| 09 | Apache Airflow | A platform to programmatically author, schedule, and monitor workflows. | 93 | 95 | OSI-approved OSS | 45.8k | 95 | Active |
| 10 | CloudQuery | Data pipelines for cloud configuration and security data. Builds cloud asset inventory, CSPM, FinOps, and vulnerability management solutions, extracting from AWS, Azure, GCP, and 70+ cloud and SaaS sources. | 92 | 95 | OSI-approved OSS | 6.4k | 95 | Active |
| 11 | HPCC Systems | Open-source distributed data processing and analytics platform for large-scale data workflows. | 91 | 100 | OSI-approved OSS | 609 | 94 | Active |
| 12 | Jitsu | Open-source Segment alternative: a fully scriptable data ingestion engine for modern data teams that sets up a real-time data pipeline in minutes rather than days. | 88 | 95 | OSI-approved OSS | 4.8k | 93 | Active |
| 13 | Open Wearables | Self-hosted platform to unify wearable health data through one AI-ready API. | 91 | 89 | OSI-approved OSS | 1.9k | 92 | Active |
| 14 | Pixeltable | Data infrastructure providing a declarative, incremental approach to multimodal AI workloads. | 83 | 98 | OSI-approved OSS | 1.6k | 92 | Active |
| 15 | Altimate Code | Open-source agentic data engineering harness for dbt, SQL, and cloud warehouses. | 88 | 92 | OSI-approved OSS | 670 | 91 | Active |
| 16 | Pathway | Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG. | 87 | 87 | License unknown | 63k | 90 | Active |
| 17 | Apache DevLake | Open-source dev data platform that ingests, analyzes, and visualizes fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth. | 87 | 63 | OSI-approved OSS | 3k | 83 | Active |
| 18 | Meltano | Declarative, code-first data integration engine for data and ML-powered products that replaces hand-written, self-maintained API integrations. | 77 | 76 | OSI-approved OSS | 2.5k | 83 | Active |
| 19 | dbmazz | Change data capture (CDC) in a single Rust binary for source-to-target data movement. | 84 | 64 | License unknown | 11 | 81 | Active |
| 20 | Airbyte | Data integration platform for ETL/ELT pipelines from APIs, databases, and files to data warehouses, data lakes, and data lakehouses; available self-hosted or cloud-hosted. | 83 | 69 | License unknown | 21.4k | 81 | Active |
| 21 | Multiwoven | Open-source reverse ETL and customer data sync platform. | 81 | 55 | OSI-approved OSS | 1.7k | 76 | Active |
| 22 | AtroCore | Open-source data management and system integration platform. | 72 | 69 | OSI-approved OSS | 222 | 75 | Warning |
| 23 | Arroyo | Distributed stream processing engine in Rust. | 65 | 48 | OSI-approved OSS | 4.9k | 67 | Warning |
| 24 | qData | Open-source all-in-one data middle platform. | 63 | 56 | License unknown | 472 | 66 | Warning |
| 25 | Anyquery | SQL query engine for querying apps, APIs, files, and SaaS data sources. | 71 | 28 | OSI-approved OSS | 1.7k | 64 | Warning |
| 26 | Mage | Build, run, and manage data pipelines for integrating and transforming data. | 67 | 21 | OSI-approved OSS | 8.8k | 61 | Warning |
| 27 | HRFlow | Open-source Python repository of HR data connectors. | 56 | 30 | OSI-approved OSS | 39 | 59 | Warning |
| 28 | DocETL | Agentic LLM-powered data processing and ETL. | 67 | 21 | OSI-approved OSS | 3.8k | 58 | Warning |
| 29 | SyncMaven | Open-source reverse-ETL and data activation platform. | 36 | 0 | License unknown | 22 | 38 | Critical |
| 30 | Automate DV | dbt package for creating and loading Data Vault 2.0 compliant data warehouses. | 33 | 0 | OSI-approved OSS | 588 | 36 | Critical |
| 31 | cptn.io | Open-source platform for building and deploying integrations and data pipelines. | 33 | 0 | OSI-approved OSS | 493 | 35 | Critical |
| 32 | Neosync | Open-source data security platform that lets developers monitor and detect PII, anonymize production data, and sync it across environments. | 27 | 0 | OSI-approved OSS | 4.2k | 34 | Critical |
| 33 | Documind | Open-source platform for extracting structured data from documents using AI. | 31 | 0 | OSI-approved OSS | 1.5k | 33 | Critical |
| 34 | Seldon | Deployment and monitoring for machine learning at scale. | 31 | 0 | License unknown | 4.8k | 33 | Critical |
| 35 | Orchest | No-code data pipeline builder and orchestration platform. | 31 | 0 | OSI-approved OSS | 4.1k | 31 | Critical |
| 36 | Ploomber | A way to build data pipelines quickly: develop iteratively, deploy anywhere. | 18 | 0 | OSI-approved OSS | 3.6k | 31 | Critical |
| 37 | Grouparoo | Data synchronization framework. | 28 | 0 | OSI-approved OSS | 775 | 29 | Critical |

Evidence Watch

Tracked tools with useful public signals but no verdict score yet.