thunderbit-mcp-server: Thunderbit MCP Server is an open-source toolkit for the Thunderbit Open API that ships three packages: a CLI for scripted extraction, an MCP server exposing seven scraping and distillation tools, and a Claude Code plugin. It converts any web page to clean, LLM-ready Markdown, extracts structured data via JSON Schema, and supports batch processing, all accessible with a free API key.; crw: CRW is a single-binary web scraper for AI agents that offers CLI, MCP, and REST API interfaces. It is Firecrawl-compatible, supports JS rendering, and has a very low resource footprint. It can be self-hosted or used via the cloud service.
Feeding clean web content into LLM pipelines for RAG or summarization
RAG pipelines for vector databases