Heritrix is the Internet Archive's open-source, extensible, archival-quality web crawler designed for large-scale web preservation and data collection.
Nodriver is the successor to undetected-chromedriver, providing fast browser automation over the Chrome DevTools Protocol (CDP) with built-in anti-detection and no Selenium or chromedriver dependency.
Apache Hudi is an open data lakehouse platform that enables efficient record-level upserts, incremental processing, and ACID transactions on data lakes.
BullMQ is an open-source, Redis-backed job and message queue for Node.js, with ports for Python and other languages, used in production for high-volume background job processing.