Open Source Web Crawler
Scraping at any real scale is a fight with rate limits, JavaScript-rendered pages, and shifting page structure, so the engine that fetches and parses is the actual work. A hosted crawler hides that work behind per-page quotas and a ceiling you hit just when a job gets interesting. The open source crawlers and scrapers here put the fetch, render, and extraction pipeline on your own machines, so you can crawl as deep as your hardware allows and keep the data without it detouring through someone else's cloud.
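
To make the fetch and extraction stages concrete, here is a minimal sketch using only the Python standard library. It is not how any tool listed here works internally. The names `LinkParser`, `extract_links`, and `crawl` are hypothetical, and `fetch` is an assumed callable you supply (URL in, HTML text or None out), so the sketch leaves out real HTTP handling, JavaScript rendering, and robots.txt checks.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
import time


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute, fragment-free, de-duplicated links found in html."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in links:
            links.append(absolute)
    return links


def crawl(start_url, fetch, max_pages=10, delay=0.0):
    """Breadth-first crawl restricted to start_url's host.

    fetch is a hypothetical callable: it takes a URL and returns HTML
    text, or None on failure, so any HTTP client can be plugged in.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        for link in extract_links(html, url):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay)  # crude fixed delay between requests
    return pages
```

Passing `fetch` in as a parameter keeps the crawl loop independent of the network layer, so it can be exercised against a dictionary of canned pages before pointing it at a live site. A fixed `delay` is the simplest possible form of rate limiting; the frameworks below offer far more capable throttling, retries, and rendering.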

Firecrawl
API that scrapes live sites into clean Markdown for AI agents

Crawl4AI
Open-source web crawler that turns sites into LLM-ready Markdown

Scrapy
Fast, high-level Python framework for web crawling and scraping

EasySpider
No-code visual web crawler and browser automation tool with command-line execution

ScrapeGraphAI
Python scraping library that uses LLMs and graph logic

Colly
Fast, elegant scraping and crawling framework for Go

Crawlee
Node.js library for web scraping and browser automation

Katana
Fast CLI web crawler and spider for security recon

Photon
Fast Python OSINT crawler for URLs, files, keys, and DNS data