Documentation Index
Fetch the complete documentation index at: https://docs.scrapegraphai.com/llms.txt
Use this file to discover all available pages before exploring further.
Overview
Crawl traverses a site starting from a URL, follows links up to a depth you set, and returns each page in the formats you request. Crawls are async — you start a job, then poll (or get notified via webhook) until it completes.Try Crawl instantly in our interactive playground.
Pricing
A Crawl job costs 2 credits to start, plus the per-page Scrape cost for every page processed. Per-page format costs:| Format | Credits |
|---|---|
markdown | 1 |
html | 1 |
links | 1 |
images | 1 |
summary | 1 |
json | 5 |
screenshot | 2 |
branding | 25 |
stealth in fetchConfig adds 5 credits per page; render mode (auto/fast/js) does not affect the cost. See the pricing page for the full breakdown.
Getting Started
Quick Start
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
url | string | Yes | Starting URL to crawl. |
formats | array | No | Output formats per page (see Scrape formats). |
maxPages / max_pages | int | No | Maximum number of pages to crawl. Default 50, max 1000. |
maxDepth / max_depth | int | No | How many levels deep to follow links. Default 2. |
maxLinksPerPage / max_links_per_page | int | No | Cap on links expanded per page. Default 10. |
allowExternal / allow_external | bool | No | Whether to follow links to other domains. Default false (same-origin only). |
includePatterns / include_patterns | array | No | URL patterns to include (e.g. ["/blog/*"]). |
excludePatterns / exclude_patterns | array | No | URL patterns to exclude (e.g. ["/admin/*"]). |
contentTypes / content_types | array | No | Limit crawled pages to these MIME types, e.g. ["text/html", "application/pdf"]. |
fetchConfig / fetch_config | object | No | Fetch options (see Scrape · FetchConfig). |
Get your API key from the dashboard.
Example Response (start)
Example Response (start)
Example Response (get)
Example Response (get)
Fetching page content
The crawl response returns each page as lightweight metadata (url, depth, scrapeRefId, …) — not the full body. Use the History service with each scrapeRefId to pull the formatted content the underlying scrape produced.
requestParentId linkage that ties each child scrape back to its parent crawl.
Managing Crawl Jobs
Advanced Usage
URL patterns and fetch config
Async Support (Python)
Key Features
Multi-Page Crawling
Traverse entire sites, following links automatically.
Flexible Formats
Request markdown, HTML, links, images, and more per page.
Job Control
Start, stop, resume, and delete crawl jobs.
URL Filtering
Include or exclude by URL pattern.
Integration Options
Official SDKs
- Python SDK
- JavaScript SDK (
scrapegraph-js≥ 2.1.0, Node ≥ 22)
AI Framework Integrations
Support & Resources
Documentation
Guides and tutorials
API Reference
Detailed API documentation
Community
Join our Discord community
GitHub
Check out our open-source projects

