POST https://v2-api.scrapegraphai.com/api/crawl
Starts an asynchronous crawl job. The response returns a job id immediately; poll GET /api/crawl/:id for progress and results, or manage the job via the control endpoints.

Request body

url
string
required
Starting URL to crawl.
formats
array
Output formats captured for each crawled page. Same shape as the Scrape formats array.
maxPages
integer
Maximum number of pages to crawl.
maxDepth
integer
How many levels of links to follow from the starting URL.
includePatterns
array
Glob-style URL patterns to include, e.g. ["/blog/*"].
excludePatterns
array
Glob-style URL patterns to exclude, e.g. ["/admin/*"].
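The pattern matching itself happens server-side, but a rough client-side model of glob-style include/exclude filtering can help when choosing patterns. This sketch uses Python's `fnmatch` and assumes patterns match against the URL path, excludes take precedence, and an empty include list means "include everything"; the API's actual semantics may differ.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def path_allowed(url, include_patterns, exclude_patterns):
    """Rough client-side model of includePatterns/excludePatterns.

    Assumptions (not confirmed by the docs): patterns match the URL
    path, excludes win over includes, and an empty include list
    means every path is eligible.
    """
    path = urlparse(url).path
    # Excludes are checked first and always win.
    if any(fnmatch(path, pat) for pat in exclude_patterns):
        return False
    # No includes given: everything not excluded is allowed.
    if not include_patterns:
        return True
    return any(fnmatch(path, pat) for pat in include_patterns)

print(path_allowed("https://scrapegraphai.com/blog/post-1",
                   ["/blog/*"], ["/admin/*"]))   # True
print(path_allowed("https://scrapegraphai.com/admin/login",
                   ["/blog/*"], ["/admin/*"]))   # False
```

Testing candidate URLs against your patterns this way before launching a large crawl can save pages against the maxPages budget.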
fetchConfig
object
Fetch-time options applied to every page. See the Scrape endpoint.

Example request

curl -X POST https://v2-api.scrapegraphai.com/api/crawl \
  -H "SGAI-APIKEY: $SGAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://scrapegraphai.com/",
    "formats": [{ "type": "markdown" }],
    "maxPages": 5,
    "maxDepth": 2,
    "includePatterns": ["/blog/*"],
    "excludePatterns": ["/admin/*"]
  }'
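The same request can be issued from Python with only the standard library. This is a sketch mirroring the curl example above; it reads the API key from the same SGAI_API_KEY environment variable.

```python
import json
import os
import urllib.request

API_URL = "https://v2-api.scrapegraphai.com/api/crawl"

# Payload mirrors the curl example above.
payload = {
    "url": "https://scrapegraphai.com/",
    "formats": [{"type": "markdown"}],
    "maxPages": 5,
    "maxDepth": 2,
    "includePatterns": ["/blog/*"],
    "excludePatterns": ["/admin/*"],
}

def submit_crawl(payload):
    """POST the crawl job and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "SGAI-APIKEY": os.environ["SGAI_API_KEY"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# job = submit_crawl(payload)
# job["id"] is then used with GET /api/crawl/:id and the control endpoints.
```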

Example response

{
  "id": "79694e03-f2ea-43f2-93cc-7c6fc26f999a",
  "status": "running",
  "total": 3,
  "finished": 0,
  "pages": []
}
Field reference:

id: Crawl job identifier used on every follow-up endpoint.
status: Lifecycle state: "running", "completed", "failed", or "stopped".
total: Total pages the crawler expects to process so far.
finished: Pages completed.
pages: Per-page results (empty until the job makes progress).
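Since "completed", "failed", and "stopped" are all terminal states, a client can poll GET /api/crawl/:id until one of them appears. A minimal polling sketch follows; the fetch function is injected so the loop logic stands alone, and in real use it would perform the HTTP GET (for example via the submit/poll code shown earlier).

```python
import time

# Terminal lifecycle states, per the field reference above.
TERMINAL = {"completed", "failed", "stopped"}

def poll_until_done(fetch_job, interval=0.0, max_polls=100):
    """Call fetch_job() until the job reaches a terminal status.

    fetch_job is any callable returning the job dict from
    GET /api/crawl/:id; here it is injected so the loop can be
    demonstrated without network access.
    """
    for _ in range(max_polls):
        job = fetch_job()
        if job["status"] in TERMINAL:
            return job
        time.sleep(interval)
    raise TimeoutError("crawl did not finish within max_polls")

# Demonstration with canned responses instead of real HTTP calls:
responses = iter([
    {"status": "running", "total": 3, "finished": 1, "pages": []},
    {"status": "running", "total": 3, "finished": 2, "pages": []},
    {"status": "completed", "total": 3, "finished": 3, "pages": []},
])
final = poll_until_done(lambda: next(responses))
print(final["status"])  # prints "completed"
```

In production, add a backoff between polls (interval well above zero) rather than hammering the endpoint.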