
Apache Hudi

Apache Hudi is an open data lakehouse platform that enables efficient record-level upserts, incremental processing, and ACID transactions on data lakes.


Apache Hudi is a powerful, open-source lakehouse platform that reimagines batch processing with efficient incremental data pipelines.

Key Features:

  • Record-Level Upserts: Update and delete individual records efficiently using pluggable indexing.
  • Incremental Processing: Replace full batch recomputation with pipelines that process only changed records, for up to 10x faster data processing.
  • ACID Transactions: Guarantee atomic writes with snapshot isolation tailored for lake-scale operations.
  • Time Travel: Query historical data, audit changes, and roll back to previous table versions.
  • Multi-Engine Integration: Works with Spark, Flink, Presto, Trino, and Hive, and integrates with dbt.
  • Streaming Ingestion: Ingest from Kafka, Pulsar, and CDC sources with built-in deduplication.
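
As a rough illustration of how the upsert, incremental, and time-travel features above are typically driven from Spark, here is a minimal sketch. The option keys follow the Hudi Spark datasource configuration as documented for 0.x releases and may differ in your version. The helper function names, the table name, and the field names are hypothetical and chosen only for this example.

```python
from typing import Any, Dict


def hudi_upsert_options(table_name: str, record_key: str,
                        precombine_field: str, partition_field: str) -> Dict[str, str]:
    """Build write options for a record-level upsert (helper name is hypothetical)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        # Field that uniquely identifies a record within the table.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Field used to pick the latest version when keys collide.
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


def hudi_incremental_read_options(begin_instant: str) -> Dict[str, str]:
    """Build read options that return only records changed after begin_instant."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


def hudi_time_travel_options(as_of_instant: str) -> Dict[str, str]:
    """Build read options that query the table as of a past instant."""
    return {"as.of.instant": as_of_instant}


def upsert(df: Any, base_path: str, options: Dict[str, str]) -> None:
    # df is assumed to be a Spark DataFrame in a session with the Hudi bundle loaded.
    df.write.format("hudi").options(**options).mode("append").save(base_path)


def read(spark: Any, base_path: str, options: Dict[str, str]) -> Any:
    # spark is assumed to be an active SparkSession with the Hudi bundle loaded.
    return spark.read.format("hudi").options(**options).load(base_path)
```

With a Spark session available, a call such as `upsert(df, "/tmp/hudi/trips", hudi_upsert_options("trips", "uuid", "ts", "city"))` would write the changes, and `read(spark, "/tmp/hudi/trips", hudi_incremental_read_options("20240101000000"))` would fetch only records committed after that instant. The path and values here are placeholders.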

Whether you're building real-time analytics, managing CDC pipelines, or modernizing batch ETL, Apache Hudi delivers efficient lakehouse storage with minute-level freshness.


  • Stars: 6.1K
  • Forks: 0
  • Language: Java
