Overview

History keeps a record of every API call your account makes (scrape, extract, search, monitor ticks, crawl jobs, schema generations) and lets you fetch the full result back later by ID. The most common use case is retrieving the formatted content of a crawled page — the Crawl service returns each page as a scrapeRefId, and History is what you call with that ID to get the markdown, HTML, JSON extraction, or screenshot the underlying scrape produced.

Getting Started

Quick Start

from scrapegraph_py import ScrapeGraphAI

# reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI(api_key="...")
sgai = ScrapeGraphAI()

# List recent scrape calls
page = sgai.history.list(service="scrape", limit=5)
for entry in page.data.data:
    print(entry.id, entry.service, entry.status, entry.elapsed_ms)

# Fetch one entry, including the full result
one = sgai.history.get("9701fc04-23de-4684-a48f-7e8fa287550b")
if one.status == "success":
    print(one.data.result)

Parameters

List (GET /api/history)
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| page | integer | No | Page number (1-indexed). Default: 1. |
| limit | integer | No | Entries per page. Default: 20. |
| service | string | No | Filter by service: scrape, extract, search, monitor, crawl, schema. |
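
For example, here is a minimal sketch that walks every page of scrape history with these parameters. It stops at the first short page, since a total count is not part of the response shape shown in this doc:

from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

entries = []
page_num = 1
while True:
    page = sgai.history.list(service="scrape", page=page_num, limit=100)
    batch = page.data.data
    entries.extend(batch)
    if len(batch) < 100:  # a short page means we've reached the end
        break
    page_num += 1

print(f"{len(entries)} scrape entries total")
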
Get (GET /api/history/:id)
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | UUID of the request. Same UUID returned by the originating endpoint, or any scrapeRefId from a crawl. |
Get your API key from the dashboard.
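
The same endpoints can also be called directly over HTTP. Below is a sketch using requests; the base URL and the SGAI-APIKEY header name are assumptions here, so confirm both against the API Reference:

import os

import requests

resp = requests.get(
    "https://api.scrapegraphai.com/api/history",  # assumed base URL
    headers={"SGAI-APIKEY": os.environ["SGAI_API_KEY"]},  # assumed header name
    params={"service": "scrape", "limit": 5},
)
resp.raise_for_status()
print(resp.json())  # the same entry shape the SDK models, as raw JSON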

Fetching crawled page content

This is the canonical pattern: start a crawl, poll until done, then call History for each page.

import time
from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig

sgai = ScrapeGraphAI()

start = sgai.crawl.start(
    "https://scrapegraphai.com/",
    formats=[MarkdownFormatConfig()],
    max_pages=5,
    max_depth=2,
)
crawl_id = start.data.id

while True:
    time.sleep(2)
    status = sgai.crawl.get(crawl_id)
    if status.data.status in ("completed", "failed"):
        break

# Pull the formatted content for every completed page
for page in status.data.pages:
    if page.status != "completed":
        continue
    entry = sgai.history.get(page.scrape_ref_id)
    # results is keyed by format name; guard against a missing or empty data list
    md = (entry.data.result.results.get("markdown", {}).get("data") or [None])[0]
    print(page.url, "->", md[:80] if md else "(empty)")
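
When a crawl requests more than one format, each format lands under its own key in result.results, so the lookup above generalizes. A small helper, sketched against the result shape used in this example:

def first_format_result(entry, fmt):
    """Return the first data item for one format key, or None.

    Sketch: assumes result.results maps each format name to a
    {"data": [...]} bucket, as the markdown lookup above does.
    """
    bucket = entry.data.result.results.get(fmt, {})
    items = bucket.get("data") or []
    return items[0] if items else None

md = first_format_result(entry, "markdown")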

Linking children to a parent crawl

Every child scrape entry produced by a crawl has requestParentId set to the parent crawl's id, so you can also list all pages from a single crawl by filtering client-side:

page = sgai.history.list(service="scrape", limit=100)
children = [e for e in page.data.data if e.request_parent_id == crawl_id]
# crawls with more than 100 pages need the pagination walk shown under Parameters

Entry shape

| Field | Description |
| --- | --- |
| id | Entry UUID; the same UUID the originating endpoint returned. |
| service | scrape \| extract \| search \| monitor \| crawl \| schema. |
| status | running \| completed \| failed. |
| params | The request body that produced this entry. |
| result | The full response payload (shaped per the originating service). null while running. |
| error | Error object when status is "failed"; otherwise null. |
| elapsedMs | How long the request took, in milliseconds. |
| requestParentId | Parent UUID if this entry was created by another request (e.g. a scrape spawned by a crawl). null for top-level requests. |
| createdAt | ISO-8601 timestamp. |
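
Because result stays null while status is running, code that consumes an arbitrary entry should poll before reading it. A sketch using only the fields above (the UUID is a placeholder for any past request's id):

import time

from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()
entry_id = "9701fc04-23de-4684-a48f-7e8fa287550b"  # placeholder UUID

entry = sgai.history.get(entry_id).data
while entry.status == "running":
    time.sleep(2)
    entry = sgai.history.get(entry_id).data

if entry.status == "failed":
    print("failed:", entry.error)
else:
    print(f"finished in {entry.elapsed_ms} ms")
    print(entry.result)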

Async Support (Python)

import asyncio
from scrapegraph_py import AsyncScrapeGraphAI

async def main():
    async with AsyncScrapeGraphAI() as sgai:
        page = await sgai.history.list(service="scrape", limit=10)
        if page.status == "success":
            for entry in page.data.data:
                print(entry.id, entry.created_at)

asyncio.run(main())
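
The async client also makes it cheap to resolve many entries at once, e.g. every scrapeRefId from a crawl. A sketch, assuming history.get mirrors the sync signature:

import asyncio

from scrapegraph_py import AsyncScrapeGraphAI

async def fetch_entries(ids):
    async with AsyncScrapeGraphAI() as sgai:
        # one coroutine per ID, resolved concurrently
        return await asyncio.gather(*(sgai.history.get(i) for i in ids))

ids = ["9701fc04-23de-4684-a48f-7e8fa287550b"]  # any past request UUIDs
for resp in asyncio.run(fetch_entries(ids)):
    print(resp.data.id, resp.data.status)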

Key Features

Crawl Page Content

Resolve scrapeRefIds from crawl results to fetch each page’s formatted content.

Replay Past Requests

Fetch the full result of any past call without re-running it (no extra credits).

Service Filtering

Narrow by scrape, extract, search, monitor, crawl, or schema.

Parent Linking

requestParentId ties child requests back to the crawl or workflow that spawned them.

Integration Options

Official SDKs

Support & Resources

API Reference

Detailed endpoint documentation

Crawl Service

The most common source of scrapeRefIds

Community

Join our Discord community

GitHub

Check out our open-source projects