POST https://v2-api.scrapegraphai.com/api/crawl
Starts an asynchronous crawl job. The response returns a job id immediately; poll GET /api/crawl/:id for progress and results, or manage the job via the control endpoints.

Request body

url
string
required
Starting URL to crawl.
formats
array
Output formats captured for each crawled page. Same shape as the Scrape formats array.
maxPages
integer
Maximum number of pages to crawl.
maxDepth
integer
How many levels of links to follow from the starting URL.
includePatterns
array
Glob-style URL patterns to include, e.g. ["/blog/*"].
excludePatterns
array
Glob-style URL patterns to exclude, e.g. ["/admin/*"].
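The pattern matching itself happens server-side, but a rough client-side model of glob-style include/exclude filtering can help when choosing patterns. This sketch uses Python's `fnmatch` and assumes patterns match against the URL path, excludes take precedence, and an empty include list means "include everything"; the API's actual semantics may differ.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def path_allowed(url, include_patterns, exclude_patterns):
    """Rough client-side model of includePatterns/excludePatterns.

    Assumptions (not confirmed by the docs): patterns match the URL
    path, excludes win over includes, and an empty include list
    means every path is eligible.
    """
    path = urlparse(url).path
    # Excludes are checked first and always win.
    if any(fnmatch(path, pat) for pat in exclude_patterns):
        return False
    # No includes given: everything not excluded is allowed.
    if not include_patterns:
        return True
    return any(fnmatch(path, pat) for pat in include_patterns)

print(path_allowed("https://scrapegraphai.com/blog/post-1",
                   ["/blog/*"], ["/admin/*"]))   # True
print(path_allowed("https://scrapegraphai.com/admin/login",
                   ["/blog/*"], ["/admin/*"]))   # False
```

Testing candidate URLs against your patterns this way before launching a large crawl can save pages against the maxPages budget.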
fetchConfig
object
Fetch-time options applied to every page. See the Scrape endpoint.

Example request

curl -X POST https://v2-api.scrapegraphai.com/api/crawl \
  -H "SGAI-APIKEY: $SGAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://scrapegraphai.com/",
    "formats": [{ "type": "markdown" }],
    "maxPages": 5,
    "maxDepth": 2,
    "includePatterns": ["/blog/*"],
    "excludePatterns": ["/admin/*"]
  }'
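The same request can be issued from Python with only the standard library. This is a sketch mirroring the curl example above; it reads the API key from the same SGAI_API_KEY environment variable.

```python
import json
import os
import urllib.request

API_URL = "https://v2-api.scrapegraphai.com/api/crawl"

# Payload mirrors the curl example above.
payload = {
    "url": "https://scrapegraphai.com/",
    "formats": [{"type": "markdown"}],
    "maxPages": 5,
    "maxDepth": 2,
    "includePatterns": ["/blog/*"],
    "excludePatterns": ["/admin/*"],
}

def submit_crawl(payload):
    """POST the crawl job and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "SGAI-APIKEY": os.environ["SGAI_API_KEY"],
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# job = submit_crawl(payload)
# job["id"] is then used with GET /api/crawl/:id and the control endpoints.
```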

Example response

{
  "id": "79694e03-f2ea-43f2-93cc-7c6fc26f999a",
  "status": "running",
  "total": 3,
  "finished": 0,
  "pages": []
}
Field reference:

id: Crawl job identifier used on every follow-up endpoint.
status: Lifecycle state: "running", "completed", "failed", or "stopped".
total: Total pages the crawler expects to process so far.
finished: Pages completed.
pages: Per-page results (empty until the job makes progress).
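Since "completed", "failed", and "stopped" are all terminal states, a client can poll GET /api/crawl/:id until one of them appears. A minimal polling sketch follows; the fetch function is injected so the loop logic stands alone, and in real use it would perform the HTTP GET (for example via the submit/poll code shown earlier).

```python
import time

# Terminal lifecycle states, per the field reference above.
TERMINAL = {"completed", "failed", "stopped"}

def poll_until_done(fetch_job, interval=0.0, max_polls=100):
    """Call fetch_job() until the job reaches a terminal status.

    fetch_job is any callable returning the job dict from
    GET /api/crawl/:id; here it is injected so the loop can be
    demonstrated without network access.
    """
    for _ in range(max_polls):
        job = fetch_job()
        if job["status"] in TERMINAL:
            return job
        time.sleep(interval)
    raise TimeoutError("crawl did not finish within max_polls")

# Demonstration with canned responses instead of real HTTP calls:
responses = iter([
    {"status": "running", "total": 3, "finished": 1, "pages": []},
    {"status": "running", "total": 3, "finished": 2, "pages": []},
    {"status": "completed", "total": 3, "finished": 3, "pages": []},
])
final = poll_until_done(lambda: next(responses))
print(final["status"])  # prints "completed"
```

In production, add a backoff between polls (interval well above zero) rather than hammering the endpoint.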