webscraping.app
Web Scraping Tools
Discover the best tools and software on webscraping.app.
Popular Categories:
Browser Automation (14)
Scraping Frameworks (11)
ETL Tools (9)
Analytics Databases (9)
SERP APIs (9)
Workflow Orchestration (9)
AI Web Scraping (8)
Scraping APIs (8)
Proxy Services (6)
Distributed Crawling (6)
Search Engines (6)
Cloud Compute (6)
Nutch
Apache Nutch is an extensible and scalable web crawler.
Distributed Crawling
Heritrix
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
Distributed Crawling
Nodriver
Successor to undetected-chromedriver. CDP-based browser automation with built-in anti-detection. No Selenium dependency.
Anti-Detection
Browser Automation
parse5
HTML5 spec compliant parser
HTML Parsers
Algolia
Managed search, great for e-commerce
Search Engines
Inngest
The leading workflow orchestration platform. Run stateful step functions and AI workflows on serverless, servers, or the edge.
Task Queues
htmlparser2
Low-level, fast, forgiving HTML parser
HTML Parsers
dlt (data load tool)
Lightweight Python library for data loading. Auto schema inference, 5000+ sources supported.
Data Transformation
ETL Tools
MechanicalSoup
A Python library for automating interaction with websites.
Scraping Frameworks
Camoufox
Anti-detect Firefox browser with fingerprint injection & anti-bot evasion. Playwright-compatible.
Anti-Detection
Browser Automation
Scrapy Redis
Redis-based components for Scrapy.
Distributed Crawling
Huey
Tiny but feature-rich task queue with Redis or SQLite storage.
Task Queues
Apache Hudi
Lakehouse platform for upserts and incremental processing. Efficient record-level updates.
Analytics Databases
Data Lakehouse
LLM-Scraper
TypeScript library for structured data extraction
AI Web Scraping
Scraping Frameworks
Rod
Go library that drives browsers over the Chrome DevTools Protocol; handles JavaScript-rendered pages.
Browser Automation
DynamoDB
AWS-managed, auto-scaling NoSQL database.
NoSQL Databases
BullMQ
Redis-based message queue and batch processing for Node.js, Python, Elixir, and PHP.
Task Queues
Apache Iceberg
Open table format for huge analytics tables. Multi-engine (Spark, Trino, Flink). Used by Netflix, Airbnb.
Analytics Databases
Data Lakehouse
Delta Lake
Lakehouse storage framework. ACID transactions, schema evolution, time travel. Databricks standard.
Analytics Databases
Data Lakehouse
LanceDB
Serverless vector database in Rust. Embedded or cloud, no server needed. Native DataFrame integration.
Vector Databases
Mage AI
Mix Python, SQL, and R in the same pipeline.
Workflow Orchestration
Scrapling
Adaptive scraping that learns from website changes
Scraping Frameworks
Cassandra
High availability, peer-to-peer architecture
NoSQL Databases
WebdriverIO
Node.js automation framework with extensive plugin ecosystem
Browser Automation
Headless Browsers
TestCafe
WebDriver-free browser testing, no plugins needed
Browser Automation
RQ (Redis Queue)
Simple, lightweight, Redis-based
Task Queues
dbt
Transform data in your warehouse using SQL. Version control, testing, documentation for data models.
Data Transformation
ETL Tools
Great Expectations
Data quality testing with 'Expectations'. Validate scraped data, auto-generate docs, CI/CD integration.
Data Transformation
StarRocks
Faster joins, easier maintenance
Analytics Databases
Manticore Search
2.8x faster for big data, 10x faster log analytics
Search Engines
OpenSearch
AWS-backed fork of Elasticsearch under the Apache 2.0 license.
Search Engines
Browserless
Deploy headless browsers in Docker. Run on our cloud or bring your own. Free for non-commercial uses.
Headless Browsers
Scraping APIs
KeyDB
Multithreaded, 1M+ ops/sec per node
Caching Databases
Chromedp
Headless Chrome automation in Go
Browser Automation
Apache Druid
Real-time OLAP database for event-driven data. Sub-second queries, streaming ingestion.
Analytics Databases
Memcached
Lightweight in-memory caching, multi-threaded
Caching Databases
Page 2 of 4