ScrapeGraphAI

Python scraping library that uses LLMs and graph logic

Repository activity
  • Stars: 27.2k
  • Forks: 2.6k
  • Open Issues: 0
License

MIT

Languages
  • Python
  • Makefile
  • Dockerfile
About ScrapeGraphAI

ScrapeGraphAI is a Python web scraping library that uses LLMs and graph logic to extract structured data from websites and local documents such as XML, HTML, JSON, and Markdown. You describe the information you want, and it builds the pipeline to get it, without hand-written selectors.

Scraping runs as configurable graphs, with pipelines for single pages, multiple pages, and search-driven extraction, plus parallel LLM calls in the multi-page variants. It works with local models through Ollama or hosted APIs such as OpenAI, Groq, Azure, and Gemini.
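As a rough illustration of the prompt-plus-configuration workflow described above, here is a minimal sketch. It assumes the `SmartScraperGraph` class and the `"provider/model"` config strings shown in the project's README at the time of writing; the helper names `build_config` and `scrape`, the model name, and the extra config keys are illustrative assumptions and may not match every library version.

```python
from typing import Optional


def build_config(model: str, api_key: Optional[str] = None) -> dict:
    """Build a graph config dict (hypothetical helper, not part of the library).

    `model` is a "provider/name" string such as "ollama/llama3.2" or
    "openai/gpt-4o-mini"; the exact names available depend on your setup.
    """
    llm = {"model": model}
    if api_key is not None:
        # Hosted APIs need a key; a local Ollama model does not.
        llm["api_key"] = api_key
    return {"llm": llm, "verbose": False, "headless": True}


def scrape(prompt: str, source: str, config: dict):
    """Run a single-page extraction and return the library's result."""
    # Imported lazily so this module loads even if the library is not installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=source, config=config)
    return graph.run()
```

With the library installed and an Ollama model available locally, a call might look like `scrape("List the article titles on this page", "https://example.com", build_config("ollama/llama3.2"))`.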

Released under the MIT License, it ships Python and Node.js SDKs and integrates with frameworks like LangChain, LlamaIndex, and CrewAI. A hosted API is offered separately at scrapegraphai.com.

Key features

  • Extract structured data from pages and documents
  • LLM-driven scraping pipelines as graphs
  • Single-page, multi-page, and search workflows
  • Parallel LLM calls in multi-page graphs
  • Works with local Ollama or hosted LLM APIs
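
To make the "pipeline as a graph" and "parallel LLM calls" ideas concrete, the following is a conceptual sketch only. It is not ScrapeGraphAI's internal implementation: the node functions, the shared `state` dict, and `fake_llm_extract` (a stand-in that counts words instead of calling a model) are all hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]


def run_graph(nodes: List[Node], state: State) -> State:
    """Run nodes in order; each takes the shared state and returns a new one."""
    for node in nodes:
        state = node(state)
    return state


def fake_llm_extract(prompt: str, text: str) -> dict:
    """Stand-in for an LLM call (assumption): just counts words in the text."""
    return {"prompt": prompt, "words": len(text.split())}


def fetch_node(state: State) -> State:
    # Pages are looked up in memory here rather than downloaded.
    pages = state["pages"]
    return {**state, "docs": [pages[url] for url in state["urls"]]}


def extract_node(state: State) -> State:
    # Fan out one extraction call per page; pool.map preserves input order.
    prompt = state["prompt"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(lambda doc: fake_llm_extract(prompt, doc), state["docs"]))
    return {**state, "results": results}


def merge_node(state: State) -> State:
    # Combine the per-page results into a single answer.
    total = sum(r["words"] for r in state["results"])
    return {**state, "answer": {"total_words": total}}


final = run_graph(
    [fetch_node, extract_node, merge_node],
    {
        "prompt": "Count the words",
        "urls": ["page-1", "page-2"],
        "pages": {"page-1": "a b c", "page-2": "d e"},
    },
)
```

Swapping, adding, or removing entries in the node list changes the pipeline without touching the individual nodes, which is the general appeal of structuring a scraper this way.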

Details

First released: 2024
Language: Python
SDKs: Python · Node.js
Models: Ollama · OpenAI · Groq · Gemini
Deployment: Library · Hosted API
License: MIT