
Crawl4AI

Open-source web crawler that turns sites into LLM-ready Markdown

Repository activity
  • Stars: 68.5k
  • Forks: 7k
  • Open issues: 98
License

Apache-2.0

Languages
  • Python
  • JavaScript
  • Shell

About Crawl4AI

Crawl4AI is an open-source, LLM-friendly web crawler and scraper. It turns web pages into clean Markdown for RAG, agents, and data pipelines, and it runs without accounts or API keys.
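As a rough illustration of the page-to-Markdown flow described above, here is a minimal sketch using the package's async crawler. It assumes the library is installed (for example with `pip install crawl4ai`, followed by its browser setup step) and that the installed release exposes `AsyncWebCrawler`, `arun`, and the `success`, `error_message`, and `markdown` result fields as in the project's quickstart; the helper name `crawl_to_markdown` is hypothetical, and details may differ between versions.

```python
import asyncio
import sys


async def crawl_to_markdown(url: str) -> str:
    """Fetch one page and return its Markdown rendering as a string."""
    # Imported lazily so this module can be loaded without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        if not result.success:
            raise RuntimeError(f"Crawl failed for {url}: {result.error_message}")
        # str() keeps this working whether markdown is a plain string or a
        # string-like result object (an assumption about version differences).
        return str(result.markdown)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only crawls when a URL is passed on the command line.
    print(asyncio.run(crawl_to_markdown(sys.argv[1]))[:500])
```

Saved as a script and run with a URL argument, this would print the first 500 characters of the generated Markdown; without an argument it only defines the helper.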

It produces structured Markdown with headings, tables, code, and citation hints, and supports structured extraction with or without an LLM. Sessions, proxies, cookies, user scripts, and hooks give fine control, while an async browser pool, caching, crash recovery, and a prefetch mode speed up large crawls.
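Structured extraction without an LLM might look like the sketch below, which uses a CSS-selector schema. It assumes the installed release provides `JsonCssExtractionStrategy`, `CrawlerRunConfig`, and `CacheMode` as in the project's documentation; the schema contents, the selectors, and the function name `extract_products` are hypothetical placeholders and would need to match the real page being crawled.

```python
import json

# Hypothetical schema: every selector here is a placeholder for illustration.
PRODUCT_SCHEMA = {
    "name": "products",
    "baseSelector": "article.product",  # one match per extracted item
    "fields": [
        {"name": "title", "selector": "h2", "type": "text"},
        {"name": "price", "selector": ".price", "type": "text"},
        {"name": "link", "selector": "a", "type": "attribute", "attribute": "href"},
    ],
}


async def extract_products(url: str) -> list:
    """Crawl one page and return the items matched by PRODUCT_SCHEMA."""
    # Imported lazily so the schema can be inspected without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig
    from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

    config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,  # skip cached copies for a fresh fetch
        extraction_strategy=JsonCssExtractionStrategy(PRODUCT_SCHEMA),
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, config=config)
        if not result.success:
            raise RuntimeError(f"Crawl failed for {url}: {result.error_message}")
        # The strategy's output is delivered as a JSON string.
        return json.loads(result.extracted_content)
```

Keeping the schema as plain data makes it easy to version and review separately from the crawling code; the coroutine could be driven with `asyncio.run(extract_products(...))`.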

A Python package at its core, Crawl4AI also ships a CLI and a self-hostable Docker API server. It is released under the Apache License 2.0 and, with about 68.5k stars, is among the most-starred crawlers on GitHub.

Key features

  • Clean LLM-ready Markdown with tables and citations
  • Structured extraction with or without an LLM
  • Sessions, proxies, cookies, and hooks for control
  • Async browser pool with caching and crash recovery
  • Python library, CLI, and self-hostable Docker server

Details

First released: 2024
Latest release: v0.8.9 · 2026
Platforms: Library · CLI · Docker
Language: Python
Output: Markdown for RAG and agents
License: Apache-2.0