webscraping.app
Best Java Web Scraping Tools
Discover the best web scraping tools and libraries for Java. Compare features, community support, and documentation.
Popular Categories (number of listed tools):
Browser Automation (14)
Scraping Frameworks (11)
ETL Tools (9)
Analytics Databases (9)
SERP APIs (9)
Workflow Orchestration (9)
AI Web Scraping (8)
Scraping APIs (8)
Proxy Services (6)
Distributed Crawling (6)
Search Engines (6)
Cloud Compute (6)
Algolia
Managed search service, well suited to e-commerce.
Categories: Search Engines
Apache Hudi
Lakehouse platform for upserts and incremental processing. Efficient record-level updates.
Categories: Analytics Databases, Data Lakehouse
Apache Iceberg
Open table format for huge analytics tables. Multi-engine (Spark, Trino, Flink). Used by Netflix, Airbnb.
Categories: Analytics Databases, Data Lakehouse
Cassandra
High availability, peer-to-peer architecture.
Categories: NoSQL Databases
OpenSearch
AWS-backed fork of Elasticsearch, released under the Apache 2.0 license.
Categories: Search Engines
Weaviate
AI-native vector database with built-in ML models. Supports hybrid search (vector + keyword). GraphQL API.
Categories: Vector Databases
Temporal
Durable execution platform for running long-lived, fault-tolerant workflows.
Categories: Workflow Orchestration
Typesense
Fast, simple search engine with sub-50 ms response times.
Categories: Search Engines
MongoDB
Document-oriented database that stores data as JSON-like (BSON) documents.
Categories: NoSQL Databases
Selenium
A browser automation framework and ecosystem.
Categories: Anti-Detection, Browser Automation
DuckDB
Embedded OLAP database with SQLite-like simplicity.
Categories: Analytics Databases
Milvus
Open-source vector database for AI. Handles billion-scale embeddings with GPU acceleration. LF AI & Data Foundation project.
Categories: Vector Databases
ClickHouse
ClickHouse® is a real-time analytics database management system.
Categories: Analytics Databases
Meilisearch
Lightning-fast open-source search engine.
Categories: Search Engines
MinIO
MinIO is a high-performance, S3-compatible object store, open-sourced under the GNU AGPLv3 license.
Categories: Object Storage
Redis
For developers who are building real-time, data-driven applications, Redis is the preferred, fastest, and most feature-rich cache, data structure server, and document and vector query engine.
Categories: Caching Databases
Elasticsearch
Free and open-source, distributed, RESTful search engine.
Categories: Search Engines
Playwright
Playwright is a framework for web testing and automation. It allows testing Chromium, Firefox, and WebKit with a single API.
Categories: Browser Automation, Headless Browsers