Scrapy

Fast, high-level Python framework for web crawling and scraping

Repository activity

Stars: 62.3k
Forks: 11.6k
Open Issues: 621

License

BSD-3-Clause

Languages

Python
Go Template
HTML

Get it: Website · PyPI

About Scrapy

Scrapy is a fast, high-level web crawling and scraping framework for Python that extracts structured data from websites. You write spiders in code, making it a fit for projects that need a programmable crawler rather than a point-and-click tool.

The framework is built on an asynchronous engine for concurrent requests, and is highly extensible through middlewares, item pipelines, and signals. It runs cross-platform and requires Python 3.10 or newer.
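As a rough illustration of the item pipeline extension point, the sketch below shows a pipeline as a plain Python class with a `process_item` method, which is the classic documented form. The class name and the cleaning rule are hypothetical, the sketch assumes items are dict-like, and the `spider` argument is given a default because its handling may differ between Scrapy releases.

```python
class NormalizeTextPipeline:
    """Hypothetical pipeline that trims whitespace from every string field."""

    def process_item(self, item, spider=None):
        # Assumes a dict-like item; copy the pairs first so the item can be
        # updated safely while looping.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        # Returning the item hands it on to the next pipeline in the chain.
        return item
```

A pipeline is switched on through the `ITEM_PIPELINES` setting, which maps a class path to an ordering number, for example `{"myproject.pipelines.NormalizeTextPipeline": 300}` (the module path here is an assumption). The degree of concurrency mentioned above is likewise controlled through settings such as `CONCURRENT_REQUESTS`.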

Scrapy is maintained by Zyte, formerly Scrapinghub, alongside many other contributors. It is released under the BSD 3-Clause license, installs with pip, and runs entirely locally with no hosted service.

Key features

Code-defined spiders for crawling and scraping
Asynchronous engine for concurrent requests
Structured data extraction from web pages
Extensible via middlewares and item pipelines
Cross-platform, runs on Python 3.10+

Details

First released: 2010
Platforms: Windows · macOS · Linux
Language: Python
Deployment: Library
Maintainer: Zyte
License: BSD-3-Clause