webscraping.app
Distributed Crawling
A curated collection of the best distributed web crawling systems for large-scale data collection across multiple nodes.
Popular Categories: Browser Automation (14), Scraping Frameworks (11), Analytics Databases (9), SERP APIs (9), ETL Tools (9), Workflow Orchestration (9), AI Web Scraping (8), Scraping APIs (8), Distributed Crawling (6), Cloud Compute (6), Proxy Services (6), Search Engines (6)
Scrapy Cluster: Distributed on-demand scraping with Scrapy
Scrapy Cluster uses Redis and Kafka to create a distributed, on-demand Scrapy crawling cluster for coordinated large-scale web scraping.
Frontera: Scalable crawl frontier framework
Frontera is a Python crawl frontier framework for managing when and what to crawl, enabling web crawlers of any scale, with Scrapy integration.
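The core job of a crawl frontier is deciding what to fetch next while never scheduling the same URL twice. The toy class below sketches that idea in plain Python; it is an illustration of the concept only, not Frontera's actual API.

```python
from collections import deque

class CrawlFrontier:
    """Toy crawl frontier: tracks what to crawl next and avoids revisits.
    Illustrative only -- not Frontera's real API."""

    def __init__(self, seeds):
        self._queue = deque()
        self._seen = set()
        for url in seeds:
            self.add(url)

    def add(self, url):
        # Only enqueue URLs that have never been scheduled before.
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)

    def next_url(self):
        # FIFO order gives breadth-first crawling; a real frontier would
        # also apply politeness delays, priorities, and per-host budgets.
        return self._queue.popleft() if self._queue else None

frontier = CrawlFrontier(["https://example.com/"])
frontier.add("https://example.com/page1")
frontier.add("https://example.com/")  # duplicate, silently ignored
```

Frontera generalizes this pattern with pluggable backends (in-memory, SQL, distributed), which is what lets the same frontier logic scale from a single process to a crawler fleet.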
Nutch: Highly extensible and scalable web crawler
Apache Nutch is a highly extensible, production-ready web crawler built on Hadoop for large-scale batch crawling and data acquisition tasks.
Heritrix: Web-scale archival web crawler
Heritrix is the Internet Archive's open-source, extensible, archival-quality web crawler, designed for large-scale web preservation and data collection.
Scrapy-Redis: Redis-based distributed components for Scrapy
Scrapy-Redis provides Redis-backed components for Scrapy, enabling distributed crawling with shared request queues and item pipelines.
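Wiring Scrapy-Redis into an existing project is typically a matter of a few lines in the project's settings.py, pointing the scheduler and duplicate filter at a shared Redis instance. A minimal sketch (the Redis URL and pipeline priority here are illustrative values, not requirements):

```python
# settings.py -- route scheduling and deduplication through a shared Redis
# instance so every worker running the same spider pulls from one queue.

# Use Scrapy-Redis's scheduler and duplicate filter instead of the defaults.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the request queue in Redis between runs so crawls can pause and resume.
SCHEDULER_PERSIST = True

# Optionally push scraped items into Redis as a shared item pipeline.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,
}

# Location of the shared Redis instance (adjust for your deployment).
REDIS_URL = "redis://localhost:6379"
```

With this in place, the same spider can be started on any number of machines and they cooperate automatically: whichever worker is free pops the next request from the shared queue, and the shared dupefilter prevents two workers from fetching the same URL.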
Scrapy: Fast, high-level web crawling framework for Python (also listed under Scraping Frameworks)
Scrapy is an open-source Python framework for building fast, scalable web crawlers that extract structured data from websites efficiently.