Scrapy Cluster

Scrapy Cluster uses Redis and Kafka to create a distributed, on-demand Scrapy crawling cluster for coordinated large-scale web scraping.

Scrapy Cluster is a distributed, on-demand web scraping system that combines Scrapy with Redis and Kafka for coordinated, large-scale crawling operations.

Key Features:

  • Kafka Integration: API-driven crawl requests via Kafka for real-time, on-demand scraping
  • Redis Coordination: Distributed request management and deduplication across nodes
  • Horizontal Scaling: Add crawler nodes dynamically to increase throughput
  • Kafka Monitor: Validates crawl requests arriving on Kafka and hands them to Redis; a separate REST service accepts jobs over HTTP
  • Modular Design: Pluggable components for customizing crawl behavior and data flow
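
To make the Kafka-driven request flow concrete, here is a minimal sketch of building a crawl request as a JSON message. The field names (`url`, `appid`, `crawlid`, `maxdepth`) follow the request format described in the Scrapy Cluster documentation as best recalled and should be checked against the version you deploy; the helper `build_crawl_request` is hypothetical and not part of the project.

```python
import json
import uuid


def build_crawl_request(url, appid, crawlid=None, maxdepth=0):
    """Serialize a crawl request as UTF-8 JSON bytes, ready to publish to Kafka.

    Hypothetical helper: the field names are assumptions based on the
    Scrapy Cluster docs, not an API shipped by the project.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"unsupported URL scheme: {url!r}")
    request = {
        "url": url,                               # page to start crawling from
        "appid": appid,                           # identifies the requesting application
        "crawlid": crawlid or uuid.uuid4().hex,   # groups results of one crawl job
        "maxdepth": maxdepth,                     # 0 = fetch only the given page
    }
    # sort_keys keeps the serialized form stable, which helps logging and tests
    return json.dumps(request, sort_keys=True).encode("utf-8")


payload = build_crawl_request("https://example.com", appid="demo-app", crawlid="abc123")
print(payload.decode("utf-8"))
```

The resulting bytes could then be published to the cluster's incoming topic with any Kafka client, for example `KafkaProducer.send(topic, payload)` from kafka-python. The topic name depends on how the cluster is configured, so it is left out here.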

Whether you're building on-demand scraping services, scaling data collection infrastructure, or coordinating crawlers across a cluster, Scrapy Cluster provides a proven architecture for distributed Scrapy deployments.

  • Stars: 1.2K
  • Forks: 322
  • Last commit: 2 years ago
  • License: MIT
  • Language: Python