Python scraping library that uses LLMs and graph logic
License: MIT
Repository languages: Python, Makefile, Dockerfile

About ScrapeGraphAI
ScrapeGraphAI is a Python web scraping library that uses LLMs and graph logic to extract structured data from websites and local documents such as XML, HTML, JSON, and Markdown. You describe the information you want, and it builds the pipeline to get it, without hand-written selectors.
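As a rough illustration of the describe-what-you-want workflow, the sketch below follows the `SmartScraperGraph` usage shown in the project's README. The helper names `build_graph_config` and `scrape` are hypothetical, the model name is a placeholder assumption, and the config keys may differ between library versions, so treat this as a starting point rather than a reference.

```python
def build_graph_config(model: str = "ollama/llama3.2") -> dict:
    """Return a minimal graph config (hypothetical helper; model name is a placeholder)."""
    return {
        "llm": {"model": model},  # provider/model string, assumed to follow the README style
        "verbose": False,
        "headless": True,
    }


def scrape(prompt: str, source: str, config: dict):
    """Run a single-page scrape; needs scrapegraphai installed and a reachable model."""
    # Imported lazily so the config helper above works without the library installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=source, config=config)
    return graph.run()
```

Calling `scrape("List the article titles on this page", "https://example.com", build_graph_config())` would return the extracted data, provided the library is installed and the chosen model is running.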
Scraping runs as configurable graphs, with pipelines for single pages, multiple pages, and search-driven extraction, plus parallel LLM calls in the multi-page variants. It works with local models through Ollama or hosted APIs such as OpenAI, Groq, Azure, and Gemini.
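To make the graph idea concrete, here is a conceptual sketch in plain Python. It is not ScrapeGraphAI's implementation: every name in it (`run_graph`, `fake_llm`, the three node functions) is hypothetical, and the LLM is replaced by a stub so the example runs offline. It only shows the general shape of a pipeline whose nodes pass along a shared state, with one model call per page issued in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def run_graph(nodes: List[Node], state: State) -> State:
    """Run nodes in order; each node reads the state and returns an extended copy."""
    for node in nodes:
        state = node(state)
    return state


def fake_llm(prompt: str, text: str) -> str:
    """Hypothetical stand-in for an LLM call: returns the first word of the text."""
    return text.split()[0]


def fetch_node(state: State) -> State:
    # Stand-in for fetching: page contents are supplied up front in state["pages"].
    return {**state, "docs": list(state["pages"])}


def extract_node(state: State) -> State:
    # One stubbed LLM call per document, issued in parallel; map() keeps input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        answers = list(
            pool.map(lambda doc: fake_llm(state["prompt"], doc), state["docs"])
        )
    return {**state, "answers": answers}


def merge_node(state: State) -> State:
    # Combine the per-page answers into a single result.
    return {**state, "result": ", ".join(state["answers"])}


result = run_graph(
    [fetch_node, extract_node, merge_node],
    {"prompt": "first word", "pages": ["alpha page", "beta page"]},
)
print(result["result"])  # prints "alpha, beta"
```

Swapping a different node list into `run_graph` is what makes such a pipeline configurable: a single-page variant would simply pass one page and could skip the merge step.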
Released under the MIT License, it ships Python and Node.js SDKs and integrates with frameworks like LangChain, LlamaIndex, and CrewAI. A hosted API is offered separately at scrapegraphai.com.
Key features
- Extract structured data from pages and documents
- LLM-driven scraping pipelines as graphs
- Single-page, multi-page, and search workflows
- Parallel LLM calls in multi-page graphs
- Works with local Ollama or hosted LLM APIs
Details
- First released: 2024
- Language: Python
- SDKs: Python · Node.js
- Models: Ollama · OpenAI · Groq · Gemini
- Deployment: Library · Hosted API
- License: MIT
