Get crawl pages - ScrapeGraphAI

GET https://v2-api.scrapegraphai.com/api/crawl/:id/pages

Returns a cursor-paginated slice of the pages crawled by a job started with POST /api/crawl. Each returned page includes its lightweight crawl metadata and, when available, the resolved Scrape result for that page. Use this endpoint to retrieve page content, and use GET /api/crawl/:id for lightweight status polling.

Path parameters

id

string

required

The crawl job UUID returned by POST /api/crawl.

Query parameters

limit

integer

default:"50"

Number of crawl pages to return in this response. Minimum 1, maximum 100.

cursor

integer

default:"0"

Zero-based index cursor. 0 starts at the first crawl page. Use the pagination.nextCursor value from the previous response to fetch the next slice.
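A minimal Python sketch of assembling the request URL from the path and query parameters above. The function name crawl_pages_url is hypothetical. Its client-side checks only mirror the documented bounds (a UUID job ID, limit between 1 and 100, a non-negative zero-based cursor) so that mistakes surface before a request is sent. The API itself remains the authority on validation.

```python
from urllib.parse import urlencode
from uuid import UUID

BASE_URL = "https://v2-api.scrapegraphai.com/api/crawl"


def crawl_pages_url(job_id: str, limit: int = 50, cursor: int = 0) -> str:
    """Build a GET /api/crawl/:id/pages URL (hypothetical helper)."""
    # Raises ValueError if job_id is not a well-formed UUID string.
    UUID(job_id)
    # Mirror the documented limit range of 1..100.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    # The cursor is a zero-based index, so negative values are meaningless.
    if cursor < 0:
        raise ValueError("cursor must be >= 0")
    query = urlencode({"limit": limit, "cursor": cursor})
    return f"{BASE_URL}/{job_id}/pages?{query}"


print(crawl_pages_url("79694e03-f2ea-43f2-93cc-7c6fc26f999a", limit=50, cursor=50))
# prints https://v2-api.scrapegraphai.com/api/crawl/79694e03-f2ea-43f2-93cc-7c6fc26f999a/pages?limit=50&cursor=50
```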

Pagination behavior

limit controls the page size. If you omit it, the API returns up to 50 crawl pages. cursor is an index into the ordered crawl page list, not an opaque token. For example:

# First 50 crawl pages
curl -X GET "https://v2-api.scrapegraphai.com/api/crawl/:id/pages?limit=50&cursor=0" \
  -H "SGAI-APIKEY: $SGAI_API_KEY"

# If the response returns "nextCursor": "50", fetch the next 50
curl -X GET "https://v2-api.scrapegraphai.com/api/crawl/:id/pages?limit=50&cursor=50" \
  -H "SGAI-APIKEY: $SGAI_API_KEY"

When pagination.nextCursor is null, there are no more crawl pages to fetch.
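The loop below sketches how to walk an entire crawl by following pagination.nextCursor until it is null. It assumes an injected fetch(cursor, limit) callable (hypothetical) that performs the GET request and returns the decoded JSON body. Injecting the callable keeps the paging logic separate from the HTTP client and makes it testable offline. The curl comment above shows nextCursor as a quoted value while the cursor parameter is documented as an integer, so the sketch accepts either form.

```python
from typing import Callable, Iterator

# Hypothetical: takes (cursor, limit) and returns the decoded JSON body of
# GET /api/crawl/:id/pages for that slice.
FetchPage = Callable[[int, int], dict]


def iter_crawl_pages(fetch: FetchPage, limit: int = 50) -> Iterator[dict]:
    """Yield every crawl page, following pagination.nextCursor until it is null."""
    cursor = 0  # 0 starts at the first crawl page
    while True:
        body = fetch(cursor, limit)
        yield from body.get("data", [])
        next_cursor = body.get("pagination", {}).get("nextCursor")
        if next_cursor is None:
            return  # null nextCursor: no more crawl pages
        # int() accepts both 50 and "50".
        new_cursor = int(next_cursor)
        # Guard against a cursor that does not advance, which would loop forever.
        if new_cursor <= cursor:
            raise RuntimeError(f"nextCursor did not advance: {cursor} -> {new_cursor}")
        cursor = new_cursor
```

Assuming nextCursor advances by the number of pages returned, a crawl with 120 pages read with limit=50 takes three requests, at cursors 0, 50, and 100.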

Example request

curl -X GET "https://v2-api.scrapegraphai.com/api/crawl/79694e03-f2ea-43f2-93cc-7c6fc26f999a/pages?limit=50&cursor=0" \
  -H "SGAI-APIKEY: $SGAI_API_KEY"
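The same request can be sent with Python's standard library. This is a sketch: build_pages_request and fetch_pages are hypothetical helper names. The API key is read from the SGAI_API_KEY environment variable, as in the curl examples. Error handling is left to the caller; urlopen raises urllib.error.HTTPError for error responses.

```python
import json
import os
import urllib.request
from typing import Optional

API_ROOT = "https://v2-api.scrapegraphai.com/api"


def build_pages_request(job_id: str, limit: int = 50, cursor: int = 0,
                        api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) GET /api/crawl/:id/pages with the SGAI-APIKEY header."""
    key = api_key if api_key is not None else os.environ["SGAI_API_KEY"]
    url = f"{API_ROOT}/crawl/{job_id}/pages?limit={limit}&cursor={cursor}"
    # urllib stores the header name as "Sgai-apikey"; HTTP header names are
    # case-insensitive, so the server receives the same header.
    return urllib.request.Request(url, headers={"SGAI-APIKEY": key}, method="GET")


def fetch_pages(job_id: str, limit: int = 50, cursor: int = 0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_pages_request(job_id, limit, cursor)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling fetch_pages("79694e03-f2ea-43f2-93cc-7c6fc26f999a") corresponds to the curl request above. A wrapper such as lambda c, n: fetch_pages(job_id, n, c) could serve as the fetch callable for a paging loop.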

Example response

{
  "data": [
    {
      "url": "https://example.com",
      "depth": 0,
      "title": "",
      "status": "completed",
      "parentUrl": null,
      "contentType": "text/html",
      "links": ["https://iana.org/domains/example"],
      "scrapeRefId": "83a911ed-c0bc-4a8c-ad62-8efeeb93f33a",
      "scrape": {
        "results": {
          "markdown": {
            "data": ["# Example Domain\n\nThis domain is for use in illustrative examples..."]
          }
        },
        "metadata": {
          "contentType": "text/html"
        }
      }
    }
  ],
  "pagination": {
    "limit": 50,
    "nextCursor": null
  }
}

Field	Description
`data[]`	Ordered crawl pages for this slice.
`data[].scrapeRefId`	UUID of the underlying Scrape request.
`data[].scrape`	Resolved Scrape response for the page, when the page has a `scrapeRefId` and the result is available.
`pagination.limit`	Echo of the requested page size.
`pagination.nextCursor`	Cursor for the next request, or `null` when there are no more pages.
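The sketch below pulls the Markdown text out of each page using the shape shown in the example response (scrape.results.markdown.data). The helper name page_markdown is hypothetical. It returns None when a page has no resolved scrape. Joining multiple data entries with newlines is an assumption; the example shows only a single entry.

```python
from typing import Optional


def page_markdown(page: dict) -> Optional[str]:
    """Return a crawl page's Markdown, or None if no resolved scrape is attached."""
    scrape = page.get("scrape")
    if not scrape:
        return None  # no scrapeRefId, or the result is not available yet
    chunks = scrape.get("results", {}).get("markdown", {}).get("data", [])
    # Assumption: multiple chunks, if present, can be joined with newlines.
    return "\n".join(chunks) if chunks else None


# Abridged copy of the example response above.
example = {
    "data": [{
        "url": "https://example.com",
        "status": "completed",
        "scrapeRefId": "83a911ed-c0bc-4a8c-ad62-8efeeb93f33a",
        "scrape": {"results": {"markdown": {
            "data": ["# Example Domain\n\nThis domain is for use in illustrative examples..."]
        }}},
    }],
    "pagination": {"limit": 50, "nextCursor": None},
}

for page in example["data"]:
    md = page_markdown(page)
    print(page["url"], "->", md.splitlines()[0] if md else "(no scrape)")
# prints https://example.com -> # Example Domain
```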

The scrape field is resolved by default; there is no expand or populate query parameter. If you only need the underlying Scrape request for a single page, you can instead pass its data[].scrapeRefId as the :id in GET /api/history/:id.
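As a small sketch of that lookup, the hypothetical helper history_url below builds the GET /api/history/:id URL from a crawl page's scrapeRefId. It assumes the history endpoint is served from the same host as the crawl endpoints. It returns None for pages without a scrapeRefId.

```python
from typing import Optional
from uuid import UUID

# Assumption: /api/history lives under the same API root as /api/crawl.
API_ROOT = "https://v2-api.scrapegraphai.com/api"


def history_url(page: dict) -> Optional[str]:
    """URL for GET /api/history/:id from a crawl page's scrapeRefId, or None."""
    ref = page.get("scrapeRefId")
    if not ref:
        return None  # this crawl page has no underlying Scrape request
    UUID(ref)  # raises ValueError if scrapeRefId is not a well-formed UUID
    return f"{API_ROOT}/history/{ref}"
```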

Related

Start a job: POST /api/crawl
Poll status: GET /api/crawl/:id
Fetch one underlying scrape: GET /api/history/:id
Stop / resume / delete: Manage crawl jobs
