
Scrapy

Fast, high-level Python framework for web crawling and scraping

Repository activity
  • Stars: 62.3k
  • Forks: 11.6k
  • Open issues: 621
License

BSD-3-Clause

Languages
  • Python
  • Go Template
  • HTML

About Scrapy

Scrapy is a fast, high-level web crawling and scraping framework for Python, used to extract structured data from websites. Spiders are written in Python code, which makes Scrapy a good fit for projects that need a programmable crawler rather than a point-and-click tool.

The framework is built on Twisted, an asynchronous networking framework, so it can process many requests concurrently. It is highly extensible through middlewares, item pipelines, and signals. It runs on Windows, macOS, and Linux, and requires Python 3.10 or newer.
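Item pipelines are one of the extension points mentioned above. The sketch below is a hypothetical pipeline that cleans up a `name` field and turns a price string like `"$1,299.00"` into a float; the class name, field names, and the `myproject` module path are assumptions for illustration, and it assumes items are dict-like. A Scrapy pipeline is a plain class with a `process_item(item, spider)` method, so this one needs no Scrapy import.

```python
class PriceNormalizationPipeline:
    """Hypothetical pipeline: normalizes the 'name' and 'price' fields.

    Enable it in settings.py (module path is an assumption):
        ITEM_PIPELINES = {"myproject.pipelines.PriceNormalizationPipeline": 300}
    """

    def process_item(self, item, spider):
        # Strip surrounding whitespace from the name, if it is a string
        if isinstance(item.get("name"), str):
            item["name"] = item["name"].strip()

        # Convert a '$1,299.00'-style string into a float; empty strings become None
        price = item.get("price")
        if isinstance(price, str):
            cleaned = price.replace("$", "").replace(",", "").strip()
            item["price"] = float(cleaned) if cleaned else None

        # Returning the item hands it on to the next enabled pipeline
        return item
```

The integer in `ITEM_PIPELINES` sets the order in which enabled pipelines run (lower values run first), which lets small, single-purpose pipelines like this one be chained.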

Scrapy is maintained by Zyte (formerly Scrapinghub) together with many other contributors. It is released under the BSD 3-Clause license, installs with pip, and runs entirely locally; no hosted service is required.

Key features

  • Code-defined spiders for crawling and scraping
  • Asynchronous engine for concurrent requests
  • Structured data extraction from web pages
  • Extensible via middlewares and item pipelines
  • Cross-platform, runs on Python 3.10+
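The degree of concurrency the asynchronous engine uses is controlled through project settings. The fragment below is a sketch of a `settings.py` excerpt using real Scrapy setting names; the values are illustrative choices, not Scrapy's defaults, and suitable numbers depend on the target site.

```python
# settings.py (fragment): illustrative values, not Scrapy's defaults

# Upper bound on requests in flight across all domains
CONCURRENT_REQUESTS = 32

# Upper bound on requests in flight to any single domain
CONCURRENT_REQUESTS_PER_DOMAIN = 8

# Base delay, in seconds, between consecutive requests to the same website
DOWNLOAD_DELAY = 0.5

# Let the AutoThrottle extension adjust delays based on observed latency
AUTOTHROTTLE_ENABLED = True
```

Keeping the per-domain limit below the global one lets a single crawl spread load across several sites while staying polite to each.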

Details

First released: 2010
Platforms: Windows · macOS · Linux
Language: Python
Deployment: Library
Maintainer: Zyte
License: BSD-3-Clause