webscraping.app
Best Open Source Web Scraping Tools
Browse Open Source web scraping tools. Compare features, limitations, and value to find the right fit for your budget.
Popular Categories:
Browser Automation (14)
Scraping Frameworks (11)
ETL Tools (9)
Analytics Databases (9)
SERP APIs (9)
Workflow Orchestration (9)
AI Web Scraping (8)
Scraping APIs (8)
Proxy Services (6)
Distributed Crawling (6)
Search Engines (6)
Cloud Compute (6)
Chroma
AI-native embedding database. Simple API for storing and querying embeddings. Popular choice for LangChain/LlamaIndex.
Vector Databases
Kestra
Event-driven orchestration and scheduling platform for mission-critical applications.
Workflow Orchestration
Pydantic
Data validation using Python type hints. Rust-powered core for speed. Define schemas for scraped data.
Data Transformation
Celery
Distributed task queue for Python.
Task Queues
MongoDB
Document-oriented NoSQL database.
NoSQL Databases
Qdrant
High-performance vector search engine in Rust. Built for production RAG and semantic search at scale.
Vector Databases
Cheerio
The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
HTML Parsers
Selenium
A browser automation framework and ecosystem.
Anti-Detection
Browser Automation
DuckDB
Embedded OLAP database with SQLite-like simplicity.
Analytics Databases
Polars
Lightning-fast DataFrame library in Rust. 10-100x faster than pandas. Lazy evaluation, out-of-core processing.
Data Transformation
Milvus
Open-source vector database for AI. Handles billion-scale embeddings with GPU acceleration. LF AI & Data Foundation project.
Vector Databases
Apache Airflow
A platform to programmatically author, schedule, and monitor workflows.
ETL Tools
Workflow Orchestration
ClickHouse
ClickHouse® is a real-time analytics database management system
Analytics Databases
rclone
rsync for cloud storage. Sync, copy, mount 70+ storage providers. Essential for moving scraped data.
Object Storage
Meilisearch
Lightning-fast open-source search engine
Search Engines
Crawl4AI
Open-source, LLM-friendly web crawler and scraper.
AI Web Scraping
Scraping Frameworks
Scrapy
A fast, high-level web crawling and scraping framework for Python.
Distributed Crawling
Scraping Frameworks
MinIO
High-performance, S3-compatible object store, open-sourced under the GNU AGPLv3 license.
Object Storage
Redis
In-memory cache, data structure server, and document and vector query engine for real-time, data-driven applications.
Caching Databases
Elasticsearch
Free and open-source, distributed, RESTful search engine.
Search Engines
Browser Use
Open-source AI browser automation
AI Web Scraping
Browser Automation
Firecrawl
The Web Data API for AI. Turns entire websites into LLM-ready markdown or structured data.
AI Web Scraping
Scraping APIs
Playwright
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
Browser Automation
Headless Browsers
Puppeteer
JavaScript API for Chrome and Firefox
Browser Automation
Headless Browsers
n8n
Low-code/no-code workflow automation platform with a large community.
Workflow Orchestration