
Apache Nutch is a highly extensible, production-ready web crawler that leverages Apache Hadoop for scalable, distributed batch crawling operations.
Key Features:
Whether you're building search engines, creating web archives, or running large-scale data acquisition jobs, Apache Nutch provides a proven, extensible crawler for distributed environments.
Stars: 3.1K
Forks: 1.3K
Last commit: 2 months ago
License: Apache-2.0
Language: Java