Favicon of Frontera

Frontera

Frontera is a Python crawl frontier framework for managing when and what to crawl, enabling web crawlers of any scale with Scrapy integration.

Screenshot of Frontera website

Frontera is a scalable, modular crawl frontier framework that manages crawl scheduling, prioritization, and goal tracking for web crawlers of any size.

Key Features:

  • Crawl Frontier — Manages URL prioritization, scheduling, and deduplication at scale
  • Scrapy Integration — Drop-in components for distributing Scrapy crawlers
  • Distributed Workers — Scale crawling across multiple workers with message bus backends
  • Pluggable Backend — Support for SQLAlchemy, HBase, and custom storage backends
  • Goal Tracking — Monitor crawl progress and detect when objectives are accomplished

Whether you're orchestrating large-scale crawls, managing URL frontiers for distributed scrapers, or building custom crawling infrastructure, Frontera provides the scheduling framework to crawl smarter at any scale.

Share:

  • Stars

    1.3K
  • Forks

    217
  • Last commit

    10 months ago
  • License

    BSD-3-Clause
  • Language

    Python
View Repository

Similar to Frontera

Favicon

 

  
  
Favicon

 

  
  
Favicon