webscraping.app
Distributed Crawling
A curated collection of the best distributed web crawling systems for large-scale data collection across multiple nodes.
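Spreading a crawl "across multiple nodes" needs some rule for which node owns which URL. One common approach is to partition the URL space by hostname, which keeps every page of a site on one node so per-host politeness limits can be enforced locally. It is assumed here as a general illustration and is not taken from any specific tool below. The function names `assign_node` and `partition` are hypothetical. This is a minimal sketch, not a production scheduler.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to a crawler node by hashing its hostname (hypothetical helper).

    Hashing the host rather than the full URL keeps all pages of one site
    on the same node, so per-host request delays can be applied locally.
    """
    # urlsplit(...).hostname is already lowercased, so "EXAMPLE.com" and
    # "example.com" land on the same node.
    host = urlsplit(url).hostname or ""
    # Use a stable hash: Python's built-in hash() is salted per process,
    # so separate nodes would disagree on the assignment.
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(urls, num_nodes):
    """Group URLs into one bucket per node."""
    buckets = {i: [] for i in range(num_nodes)}
    for url in urls:
        buckets[assign_node(url, num_nodes)].append(url)
    return buckets


buckets = partition(
    ["https://example.com/a", "https://example.com/b", "https://example.org/"], 4
)
```

A simple modulo over the node count reshuffles most hosts when a node is added or removed; a consistent-hashing ring is the usual refinement if the cluster size changes often.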
Popular Categories:
Browser Automation (14)
Scraping Frameworks (11)
ETL Tools (9)
Analytics Databases (9)
SERP APIs (9)
Workflow Orchestration (9)
AI Web Scraping (8)
Scraping APIs (8)
Proxy Services (6)
Distributed Crawling (6)
Search Engines (6)
Cloud Compute (6)
Scrapy Cluster
This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.
Category: Distributed Crawling
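The entry above only states that Redis and Kafka are involved. The general pattern behind such a cluster is that crawl requests arrive on a message log, are loaded into a shared, deduplicated priority queue, and any number of worker processes pull from that queue. The in-memory sketch below illustrates that pattern under loud assumptions: a `deque` stands in for a Kafka topic, and a heap plus a set stand in for Redis data structures. `SharedCrawlQueue` and `drain_requests` are hypothetical names, not Scrapy Cluster's code or API.

```python
import heapq
import itertools
from collections import deque


class SharedCrawlQueue:
    """Hypothetical in-memory stand-in for a Redis-backed priority queue with dedup."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        # Monotonic counter breaks ties so equal priorities pop in FIFO order.
        self._counter = itertools.count()

    def push(self, url, priority=0):
        """Enqueue a URL once; return False if it was already seen."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), url))
        return True

    def pop(self):
        """Return the highest-priority pending URL, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


def drain_requests(log, queue):
    """Move requests from an incoming log (stand-in for a Kafka topic) into the queue."""
    while log:
        req = log.popleft()
        queue.push(req["url"], req.get("priority", 0))


incoming = deque([
    {"url": "https://example.com/", "priority": 1},
    {"url": "https://example.com/news", "priority": 5},
    {"url": "https://example.com/", "priority": 9},  # duplicate URL, ignored
])
q = SharedCrawlQueue()
drain_requests(incoming, q)
```

Here the duplicate is dropped even though it carries a higher priority; a real deployment might instead raise the priority of an already-queued URL.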
Frontera
A scalable frontier for web crawlers.
Category: Distributed Crawling
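A crawl frontier is the component that remembers which URLs have been discovered and decides which ones to fetch next. The sketch below shows that idea with two common policies: deduplication, and round-robin batching by host so that no single site dominates a batch. It is a hypothetical illustration of the frontier concept only; Frontera's own interface, backends, and crawling strategies are different, and `HostFairFrontier`, `add`, and `next_batch` are invented names.

```python
from collections import OrderedDict, deque
from urllib.parse import urlsplit


class HostFairFrontier:
    """Hypothetical crawl frontier: dedups URLs and hands out batches round-robin by host."""

    def __init__(self):
        self._queues = OrderedDict()  # host -> deque of pending URLs
        self._seen = set()

    def add(self, urls):
        """Register seed URLs or newly extracted links, skipping duplicates."""
        for url in urls:
            if url in self._seen:
                continue
            self._seen.add(url)
            host = urlsplit(url).hostname or ""
            self._queues.setdefault(host, deque()).append(url)

    def next_batch(self, max_size):
        """Return up to max_size URLs, taking one URL per host per turn."""
        batch = []
        while self._queues and len(batch) < max_size:
            host, pending = next(iter(self._queues.items()))
            batch.append(pending.popleft())
            if pending:
                # Rotate this host to the back so other hosts go next.
                self._queues.move_to_end(host)
            else:
                del self._queues[host]
        return batch


frontier = HostFairFrontier()
frontier.add(["https://a.com/1", "https://a.com/2", "https://b.com/1", "https://a.com/1"])
first = frontier.next_batch(2)
second = frontier.next_batch(5)
```

Because the host order persists between calls, fairness carries across batches rather than restarting from the first host each time.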
Nutch
Apache Nutch is an extensible and scalable web crawler.
Category: Distributed Crawling
Heritrix
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Category: Distributed Crawling
Scrapy Redis
Redis-based components for Scrapy.
Category: Distributed Crawling
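Redis-based components are typically enabled by pointing a Scrapy project's scheduler and duplicate filter at Redis, so several spider processes share one request queue and one seen-set. The `settings.py` fragment below follows the setting names shown in the scrapy-redis project's README as best recalled; verify them against the version you install. The Redis URL is an assumption (a local server on the default port), and the priority value `300` for the pipeline is an arbitrary example.

```python
# settings.py fragment (sketch): route scheduling and dedup through Redis.

# Store the request queue in Redis so multiple crawler processes share it.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Keep the request fingerprints ("seen" set) in Redis instead of memory.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and fingerprints after a spider closes, allowing pause/resume.
SCHEDULER_PERSIST = True

# Assumed local Redis server; replace with your own host, port, and credentials.
REDIS_URL = "redis://localhost:6379"

# Optionally push scraped items into Redis for downstream processing.
ITEM_PIPELINES = {
    "scrapy_redis.pipelines.RedisPipeline": 300,
}
```

With these settings, every process started with the same project configuration and Redis URL pulls from the same queue, which is what lets a single Scrapy spider scale out across machines.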
Scrapy
Scrapy is a fast, high-level web crawling and scraping framework for Python.
Categories: Distributed Crawling, Scraping Frameworks