Why switch?

ScrapeGraph v2 offers AI-powered scraping, extraction, search, crawling, and first-class scheduled monitoring through a unified API. If you’re coming from Firecrawl, this page maps every endpoint, SDK method, parameter, and response shape to its ScrapeGraph equivalent so you can migrate quickly and confidently. The migration is mechanical for most workloads: change a header, swap an import, and adjust one or two argument names. The places that need genuine rethinking are change tracking (now a first-class monitor resource) and browser actions (replaced by a simpler fetchConfig model).

Feature comparison at a glance

| Capability | Firecrawl | ScrapeGraph v2 |
| --- | --- | --- |
| Single-page scrape (markdown, html, screenshot…) | POST /v2/scrape | POST /api/scrape |
| Structured extraction (prompt + schema) | POST /v2/extract | POST /api/extract |
| Web search with optional extraction | POST /v2/search | POST /api/search |
| Async multi-page crawl | POST /v2/crawl, GET /v2/crawl/{id} | POST /api/crawl, GET /api/crawl/{id} |
| URL discovery (sitemap + links) | POST /v2/map | Use crawl.start with patterns (no one-shot map) |
| Batch scrape a list of URLs | POST /v2/batch/scrape | Loop concurrent scrape calls, or crawl.start with a URL list |
| Change tracking | changeTracking format on scrape/crawl | First-class monitor resource with cron scheduling (POST /api/monitor) |
| Browser interactions before scrape | actions array on /v2/scrape (click/scroll/type/wait) | fetchConfig (mode="js", stealth, wait, scrolls) on scrape/extract/search/crawl |
| Webhooks | Crawl webhooks | Monitor + crawl webhooks (webhookUrl) |
| Async SDK | AsyncFirecrawl | AsyncScrapeGraphAI |
| Response shape | Direct values (raises on error) | ApiResult envelope (status + data + error) |

Authentication

| | Firecrawl | ScrapeGraph v2 |
| --- | --- | --- |
| Header | Authorization: Bearer fc-... | SGAI-APIKEY: sgai-... |
| Env var | FIRECRAWL_API_KEY | SGAI_API_KEY |
| Base URL | https://api.firecrawl.dev/v2 | https://v2-api.scrapegraphai.com/api |
| Key format | fc- prefix, 32-char hex | sgai- prefix, UUID-style |
The header name is the most common source of migration bugs — SGAI-APIKEY is not a Bearer token.
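
For callers that talk to the HTTP API directly, the header switch can be captured in one small helper. This is a minimal sketch: `build_sgai_headers` is a hypothetical name, not part of either SDK, and the `fc-` check is only a convenience guard against a leftover Firecrawl key.

```python
import os
from typing import Dict, Optional


def build_sgai_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for direct calls to the ScrapeGraph v2 HTTP API."""
    key = api_key or os.environ.get("SGAI_API_KEY", "")
    if not key:
        raise ValueError("SGAI_API_KEY is not set")
    if key.startswith("fc-"):
        # A leftover Firecrawl key is an easy copy-paste mistake during migration.
        raise ValueError("this looks like a Firecrawl key (fc- prefix)")
    # The key goes in its own header; there is no "Bearer " prefix.
    return {"SGAI-APIKEY": key, "Content-Type": "application/json"}
```

Calling `build_sgai_headers("sgai-...")` yields a dict with no `Authorization` entry, which is the shape the cURL example on this page uses.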

SDK installation

| | Firecrawl | ScrapeGraph v2 |
| --- | --- | --- |
| Python | pip install firecrawl-py | pip install scrapegraph-py (≥ 2.1.0, Python ≥ 3.12) |
| Node.js | npm i @mendable/firecrawl-js | npm i scrapegraph-js (≥ 2.1.0, Node ≥ 22) |
| CLI | npm i -g firecrawl | npm i -g just-scrape |
| MCP server | Available | pip install scrapegraph-mcp |

Migration checklist

Update dependencies
# Remove Firecrawl
pip uninstall firecrawl-py            # Python
npm uninstall @mendable/firecrawl-js  # Node.js

# Install ScrapeGraph
pip install -U "scrapegraph-py>=2.1.0"   # Python (3.12+)
npm install scrapegraph-js@latest        # Node.js (22+)
Update environment variables
# Replace
# FIRECRAWL_API_KEY=fc-...

# With
SGAI_API_KEY=sgai-...
Get your API key from the dashboard.
Update imports and client initialization
Python
# Before (Firecrawl)
from firecrawl import Firecrawl
fc = Firecrawl(api_key="fc-...")

# After (ScrapeGraph v2)
from scrapegraph_py import ScrapeGraphAI
# reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI(api_key="...")
sgai = ScrapeGraphAI()
JavaScript
// Before (Firecrawl)
import Firecrawl from "@mendable/firecrawl-js";
const fc = new Firecrawl({ apiKey: "fc-..." });

// After (ScrapeGraph v2)
import { ScrapeGraphAI } from "scrapegraph-js";
// reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI({ apiKey: "..." })
const sgai = new ScrapeGraphAI();
Scrape → scrape
Firecrawl’s scrape fetches a page in one or more formats. ScrapeGraph’s scrape mirrors that, with typed format configs in Python and plain objects in JS.
Format coverage
| Firecrawl format | ScrapeGraph format | Notes |
| --- | --- | --- |
| "markdown" | MarkdownFormatConfig(mode="normal" / "reader" / "prune") | reader strips chrome, prune is aggressive |
| "html" | HtmlFormatConfig(mode=...) | Same mode options as markdown |
| "rawHtml" | HtmlFormatConfig(mode="normal") | No separate raw variant; normal mode is the unprocessed page |
| "links" | LinksFormatConfig() | Returns every outbound link |
| "screenshot" / "screenshot@fullPage" | ScreenshotFormatConfig(full_page=True, width=..., height=..., quality=...) | Width 320–3840, height 200–2160, quality 1–100 |
| {"type": "json", ...} | JsonFormatConfig(prompt="...", schema={...}) | Inline LLM extraction during scrape |
| (n/a) | ImagesFormatConfig() | Every image URL on the page |
| (n/a) | SummaryFormatConfig() | LLM-generated TL;DR |
| (n/a) | BrandingFormatConfig() | Logo, palette, fonts |
| {"type": "changeTracking"} | Use monitor.create instead | See Change tracking below |
You can request several formats in a single call — they share the page fetch, so it costs one navigation.
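
If a codebase has many Firecrawl format lists, the format table can be applied mechanically. The sketch below is an illustration only: `FORMAT_MAP` and `translate_formats` are hypothetical helpers, and the output uses the plain-object format shape shown in the JavaScript and cURL examples on this page. Formats that need a decision (json with a prompt, changeTracking) are returned separately for manual handling.

```python
# Hypothetical lookup table derived from the format-coverage table above.
FORMAT_MAP = {
    "markdown": {"type": "markdown"},
    "html": {"type": "html"},
    "rawHtml": {"type": "html", "mode": "normal"},
    "links": {"type": "links"},
    "screenshot": {"type": "screenshot"},
    "screenshot@fullPage": {"type": "screenshot", "fullPage": True},
}


def translate_formats(firecrawl_formats):
    """Split Firecrawl formats into translated objects and names needing manual work."""
    translated, manual = [], []
    for fmt in firecrawl_formats:
        # Firecrawl formats are either strings or dicts with a "type" key.
        name = fmt if isinstance(fmt, str) else fmt.get("type")
        if name in FORMAT_MAP:
            translated.append(dict(FORMAT_MAP[name]))
        else:
            manual.append(name)
    return translated, manual
```

For example, `translate_formats(["markdown", {"type": "changeTracking"}])` puts `changeTracking` in the manual list, which is the cue to create a monitor instead.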
Basic scrape
Python
# Before (Firecrawl)
doc = fc.scrape("https://example.com", formats=["markdown"])
print(doc.markdown)

# After (ScrapeGraph v2 — scrapegraph-py ≥ 2.1.0)
from scrapegraph_py import MarkdownFormatConfig

res = sgai.scrape(
    "https://example.com",
    formats=[MarkdownFormatConfig()],
)
if res.status == "success":
    print(res.data.results["markdown"]["data"][0])
JavaScript
// Before (Firecrawl)
const doc = await fc.scrape("https://example.com", { formats: ["markdown"] });
console.log(doc.markdown);

// After (ScrapeGraph v2)
const res = await sgai.scrape({
  url: "https://example.com",
  formats: [{ type: "markdown" }],
});
if (res.status === "success") {
  console.log(res.data?.results.markdown?.data?.[0]);
}
Multiple formats in one call
Python
# Before (Firecrawl)
doc = fc.scrape("https://example.com", formats=["markdown", "html", "links", "screenshot"])
print(doc.markdown, doc.html, doc.links, doc.screenshot)

# After (ScrapeGraph v2)
from scrapegraph_py import (
    MarkdownFormatConfig, HtmlFormatConfig,
    LinksFormatConfig, ScreenshotFormatConfig,
)

res = sgai.scrape(
    "https://example.com",
    formats=[
        MarkdownFormatConfig(),
        HtmlFormatConfig(mode="reader"),
        LinksFormatConfig(),
        ScreenshotFormatConfig(full_page=True, width=1440, height=900),
    ],
)
results = res.data.results
print(results["markdown"]["data"][0])
print(results["html"]["data"][0])
print(results["links"]["data"])
print(results["screenshot"]["data"][0])  # base64 PNG
JavaScript
// After (ScrapeGraph v2)
const res = await sgai.scrape({
  url: "https://example.com",
  formats: [
    { type: "markdown" },
    { type: "html", mode: "reader" },
    { type: "links" },
    { type: "screenshot", fullPage: true, width: 1440, height: 900 },
  ],
});
const r = res.data?.results;
console.log(r?.markdown?.data?.[0]);
console.log(r?.screenshot?.data?.[0]); // base64 PNG
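
The Python examples index straight into `results[...]["data"][0]`, which raises if a format is missing or its array is empty. A small accessor can mirror the optional chaining used in the JavaScript version. This is a sketch that assumes `results` behaves like a plain dict of `{"data": [...]}` entries, as in the raw HTTP response shown later on this page; `first_result` is a hypothetical helper name.

```python
def first_result(results, fmt, default=None):
    """Return the first item of results[fmt]["data"], or `default` if it is absent."""
    entry = (results or {}).get(fmt) or {}
    data = entry.get("data") or []
    return data[0] if data else default


# Sample shaped like the raw HTTP response body documented on this page.
sample = {"markdown": {"data": ["# Example Domain\n\n..."]}}
markdown = first_result(sample, "markdown")
screenshot = first_result(sample, "screenshot")  # format not requested, so None
```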
Browser interactions: actions → fetchConfig
Firecrawl exposes an actions array (click, scroll, wait, type, press, screenshot) executed before the page is captured. ScrapeGraph replaces this with a declarative fetchConfig:
| Firecrawl action | ScrapeGraph equivalent |
| --- | --- |
| {"type": "wait", "milliseconds": 2000} | fetch_config=FetchConfig(wait=2000) |
| {"type": "scroll", ...} (repeated) | fetch_config=FetchConfig(scrolls=5) |
| {"type": "click", "selector": "..."} | Not supported; split into two scrapes, or use a webhook-driven workflow |
| {"type": "screenshot"} | Add ScreenshotFormatConfig() to formats |
| Mobile / desktop UA toggle | headers={"User-Agent": "..."} |
| Geolocation / proxy region | country="US" (ISO 3166-1 alpha-2) |
fetchConfig accepts: mode ("auto" / "fast" / "js"), stealth (bool, residential proxy + anti-bot headers), headers, cookies, scrolls (0–100), wait (0–30000 ms), timeout (1000–60000 ms), country (2-letter ISO code).
Python
# Before (Firecrawl — actions array)
doc = fc.scrape(
    "https://example.com",
    formats=["markdown"],
    actions=[
        {"type": "wait", "milliseconds": 2000},
        {"type": "scroll", "direction": "down"},
        {"type": "scroll", "direction": "down"},
    ],
)

# After (ScrapeGraph v2 — declarative fetchConfig)
from scrapegraph_py import MarkdownFormatConfig, FetchConfig

res = sgai.scrape(
    "https://example.com",
    formats=[MarkdownFormatConfig()],
    fetch_config=FetchConfig(
        mode="js",         # render JavaScript
        stealth=True,      # rotate residential proxy
        wait=2000,         # ms after navigation
        scrolls=2,         # programmatic scroll ticks
        country="US",
    ),
)
JavaScript
// After (ScrapeGraph v2)
const res = await sgai.scrape({
  url: "https://example.com",
  formats: [{ type: "markdown" }],
  fetchConfig: {
    mode: "js",
    stealth: true,
    wait: 2000,
    scrolls: 2,
    country: "US",
  },
});
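
An existing actions array can be folded into fetchConfig values with a small converter. The following is a rough sketch, not an official tool: `actions_to_fetch_config` is a hypothetical helper, it sums all wait actions into one `wait` value (an approximation, since Firecrawl can interleave waits with other steps), and it clamps to the ranges listed above.

```python
UNSUPPORTED_ACTIONS = {"click", "type", "press"}


def actions_to_fetch_config(actions):
    """Fold a Firecrawl actions list into fetchConfig-style values."""
    total_wait = sum(a.get("milliseconds", 0) for a in actions if a.get("type") == "wait")
    scroll_count = sum(1 for a in actions if a.get("type") == "scroll")
    unsupported = [a["type"] for a in actions if a.get("type") in UNSUPPORTED_ACTIONS]

    config = {"mode": "js"}  # actions imply a rendered page
    if total_wait:
        config["wait"] = min(total_wait, 30000)  # documented range: 0-30000 ms
    if scroll_count:
        config["scrolls"] = min(scroll_count, 100)  # documented range: 0-100
    return config, unsupported


config, unsupported = actions_to_fetch_config([
    {"type": "wait", "milliseconds": 2000},
    {"type": "scroll", "direction": "down"},
    {"type": "scroll", "direction": "down"},
])
```

Any entry in `unsupported` marks a flow that needs the two-scrape workaround rather than a direct translation.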
Extract → extract
Same shape: URL + natural-language prompt + optional JSON schema. ScrapeGraph also accepts inline html or markdown instead of a URL — useful when you already have the content.
Basic extract
Python
# Before (Firecrawl)
result = fc.extract(
    urls=["https://example.com"],
    prompt="Extract the main heading",
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
)

# After (ScrapeGraph v2 — scrapegraph-py ≥ 2.1.0)
res = sgai.extract(
    "Extract the main heading",
    url="https://example.com",
    schema={"type": "object", "properties": {"title": {"type": "string"}}},
)
if res.status == "success":
    print(res.data.json_data)
JavaScript
// Before (Firecrawl)
const result = await fc.extract({
  urls: ["https://example.com"],
  prompt: "Extract the main heading",
  schema: { type: "object", properties: { title: { type: "string" } } },
});

// After (ScrapeGraph v2)
const res = await sgai.extract({
  url: "https://example.com",
  prompt: "Extract the main heading",
  schema: { type: "object", properties: { title: { type: "string" } } },
});
if (res.status === "success") {
  console.log(res.data?.json);
}
Pydantic schemas (Python)
scrapegraph-py accepts any dict that conforms to JSON Schema, so Pydantic models work via model_json_schema():
from pydantic import BaseModel, Field
from scrapegraph_py import ScrapeGraphAI

class Product(BaseModel):
    name: str
    price_usd: float = Field(description="Price in US dollars")
    in_stock: bool

sgai = ScrapeGraphAI()
res = sgai.extract(
    "Extract product details",
    url="https://example.com/product/42",
    schema=Product.model_json_schema(),
)

if res.status == "success":
    product = Product.model_validate(res.data.json_data)
    print(product.name, product.price_usd)
Extract from existing HTML or markdown
Skip the fetch when you already have the content (e.g., a cached page, an internal CMS document):
res = sgai.extract(
    "Extract the author and publication date",
    html="<html>...</html>",   # or markdown="# Article\n..."
    schema={"type": "object", "properties": {
        "author": {"type": "string"},
        "published_at": {"type": "string", "format": "date-time"},
    }},
)
Bulk URLs
Firecrawl accepts a list of URLs or wildcards in one call. On ScrapeGraph, call extract once per URL (run them concurrently) or use crawl.start to discover pages first and then extract from each.
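
Running one extract per URL concurrently is easier to keep polite with a concurrency cap. Here is a minimal sketch using only asyncio: `extract_many` is a hypothetical helper, and `fake_extract` stands in for a real awaitable such as a call on the async client (its exact signature is assumed to match the sync `extract` shown above).

```python
import asyncio


async def extract_many(urls, extract_one, max_concurrency=5):
    """Run extract_one(url) for every URL, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            return url, await extract_one(url)

    pairs = await asyncio.gather(*(guarded(u) for u in urls))
    return dict(pairs)


async def fake_extract(url):
    # Stand-in for the real call; returns the last path segment as a "title".
    await asyncio.sleep(0)
    return {"title": url.rsplit("/", 1)[-1]}


results = asyncio.run(extract_many(
    ["https://example.com/a", "https://example.com/b"],
    fake_extract,
))
```

Because each result is an ApiResult in the real SDK, a failed URL does not cancel the others; check `status` per entry afterwards.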
Search → search
ScrapeGraph’s search supports the same query-and-limit pattern, plus optional LLM extraction in the same call (the job Firecrawl’s scrapeOptions parameter does).
Python
# Before (Firecrawl)
hits = fc.search(query="best programming languages 2026", limit=5)

# After (ScrapeGraph v2 — scrapegraph-py ≥ 2.1.0)
res = sgai.search(
    "best programming languages 2026",
    num_results=5,
)
if res.status == "success":
    for r in res.data.results:
        print(r.title, "-", r.url)
JavaScript
// Before (Firecrawl)
const hits = await fc.search({ query: "best programming languages 2026", limit: 5 });

// After (ScrapeGraph v2)
const res = await sgai.search({
  query: "best programming languages 2026",
  numResults: 5,
});
if (res.status === "success") {
  for (const r of res.data?.results ?? []) console.log(r.title, "-", r.url);
}
Search + extract in one call
Firecrawl exposes scrapeOptions to scrape each result; ScrapeGraph fuses search and structured extraction with a prompt + schema:
res = sgai.search(
    "open-source vector databases",
    num_results=10,
    prompt="Extract the project name, GitHub URL, and primary license",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "github_url": {"type": "string"},
            "license": {"type": "string"},
        },
    },
)
Parameter map
| Firecrawl | ScrapeGraph (Python) | ScrapeGraph (JS) |
| --- | --- | --- |
| query | query (positional) | query |
| limit | num_results (1–20) | numResults |
| tbs (time filter) | time_range="past_hour" / "past_24_hours" / "past_week" / "past_month" / "past_year" | timeRange |
| location | location_geo_code (ISO country code) | locationGeoCode |
| scrapeOptions.formats | format="markdown" / "html" + mode | format + mode |
| scrapeOptions (full page scrape) | prompt + schema for inline extraction | same |
| sources=["web","news","images"] | Web only (use time_range for recency) | same |
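
The search parameter map can be expressed as a small translation function. This is a sketch under assumptions: `translate_search_kwargs` is a hypothetical helper, and the `qdr:*` keys assume Firecrawl's `tbs` uses Google-style time filters, so verify them against your existing calls. Firecrawl `location` strings are not passed through, because the target expects an ISO country code that the caller must supply.

```python
# Assumed mapping from Google-style tbs values to ScrapeGraph time_range values.
TBS_TO_TIME_RANGE = {
    "qdr:h": "past_hour",
    "qdr:d": "past_24_hours",
    "qdr:w": "past_week",
    "qdr:m": "past_month",
    "qdr:y": "past_year",
}


def translate_search_kwargs(query, limit=5, tbs=None, country_code=None):
    """Return (query, kwargs) in the shape of the Python search call above."""
    kwargs = {"num_results": max(1, min(limit, 20))}  # documented range: 1-20
    if tbs is not None:
        if tbs not in TBS_TO_TIME_RANGE:
            raise ValueError(f"no time_range equivalent for tbs={tbs!r}")
        kwargs["time_range"] = TBS_TO_TIME_RANGE[tbs]
    if country_code is not None:
        kwargs["location_geo_code"] = country_code
    return query, kwargs
```

The result can be splatted into the call, for example `sgai.search(query, **kwargs)`.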
Crawl → crawl.start + crawl.get
Firecrawl’s crawl() blocks until completion; start_crawl() returns a job id. ScrapeGraph’s crawl is always async — start, then poll (or stop, resume, delete).
Start + poll
Python
# Before (Firecrawl — blocking)
job = fc.crawl("https://example.com", limit=50)

# Or non-blocking:
started = fc.start_crawl("https://example.com", limit=50)
status = fc.get_crawl_status(started.id)

# After (ScrapeGraph v2 — scrapegraph-py ≥ 2.1.0)
start = sgai.crawl.start(
    "https://example.com",
    max_depth=2,
    max_pages=50,
    include_patterns=["/blog/*"],
    exclude_patterns=["/admin/*"],
)
status = sgai.crawl.get(start.data.id)
print(status.data.status, status.data.finished, "/", status.data.total)
JavaScript
// Before (Firecrawl)
const job = await fc.crawl("https://example.com", { limit: 50 });
// Or non-blocking:
const started = await fc.startCrawl("https://example.com", { limit: 50 });
const status = await fc.getCrawlStatus(started.id);

// After (ScrapeGraph v2)
const start = await sgai.crawl.start({
  url: "https://example.com",
  maxDepth: 2,
  maxPages: 50,
  includePatterns: ["/blog/*"],
  excludePatterns: ["/admin/*"],
});
const status = await sgai.crawl.get(start.data.id);
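
Since there is no blocking crawl call, code that relied on Firecrawl's `crawl()` needs a polling loop. The sketch below shows one way to write it. `wait_for_crawl` is a hypothetical helper, and the terminal status names are assumptions to be checked against the API reference. The status lookup is passed in as a callable, for instance `lambda job_id: sgai.crawl.get(job_id).data.status`.

```python
import time

# Assumed terminal status names; confirm them against the API reference.
TERMINAL_STATUSES = {"completed", "failed", "stopped"}


def wait_for_crawl(get_status, crawl_id, interval_s=2.0, timeout_s=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(crawl_id) until a terminal status or the timeout."""
    deadline = clock() + timeout_s
    while True:
        status = get_status(crawl_id)
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"crawl {crawl_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.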
Crawl with structured extraction
Attach a JsonFormatConfig to every crawled page so each result already has structured fields:
from scrapegraph_py import MarkdownFormatConfig, JsonFormatConfig

start = sgai.crawl.start(
    "https://docs.example.com",
    max_depth=3,
    max_pages=200,
    formats=[
        MarkdownFormatConfig(mode="reader"),
        JsonFormatConfig(
            prompt="Extract the page title and the list of code samples",
            schema={
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "code_samples": {"type": "array", "items": {"type": "string"}},
                },
            },
        ),
    ],
)
Parameter map
| Firecrawl | ScrapeGraph (Python) | ScrapeGraph (JS) |
| --- | --- | --- |
| limit | max_pages (1–1000, default 50) | maxPages |
| maxDepth | max_depth (default 2) | maxDepth |
| maxDiscoveryDepth | n/a; use max_depth | n/a |
| includePaths | include_patterns (glob) | includePatterns |
| excludePaths | exclude_patterns (glob) | excludePatterns |
| allowExternalLinks | allow_external (default false) | allowExternal |
| allowBackwardLinks | always allowed inside max_depth | same |
| webhook | Not on crawl; use a monitor for delivery | n/a |
| scrapeOptions.formats | formats=[...] | formats |
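
When converting includePaths and excludePaths to glob patterns, it can help to preview locally which known URLs a pattern set would keep. The sketch below assumes the service matches globs against the URL path with fnmatch-like semantics. That is an assumption, so treat the output as a rough check rather than a guarantee; `preview_selection` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def preview_selection(urls, include_patterns=(), exclude_patterns=()):
    """Return the URLs whose path matches an include glob and no exclude glob."""
    selected = []
    for url in urls:
        path = urlparse(url).path or "/"
        # With no include patterns, every path is a candidate.
        included = not include_patterns or any(fnmatchcase(path, p) for p in include_patterns)
        excluded = any(fnmatchcase(path, p) for p in exclude_patterns)
        if included and not excluded:
            selected.append(url)
    return selected


kept = preview_selection(
    ["https://example.com/blog/a", "https://example.com/admin/x", "https://example.com/about"],
    include_patterns=["/blog/*"],
    exclude_patterns=["/admin/*"],
)
```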
Lifecycle: stop, resume, delete
Python
# Pause an in-flight crawl
sgai.crawl.stop(start.data.id)

# Resume it later
sgai.crawl.resume(start.data.id)

# Drop a finished crawl and free retained pages
sgai.crawl.delete(start.data.id)
JavaScript
await sgai.crawl.stop(id);
await sgai.crawl.resume(id);
await sgai.crawl.delete(id);
Map / batch scrape
Firecrawl’s /map returns a list of URLs quickly. ScrapeGraph doesn’t have a one-shot map; use crawl.start with pattern filters and a shallow max_depth to discover URLs cheaply:
from scrapegraph_py import LinksFormatConfig

start = sgai.crawl.start(
    "https://example.com",
    max_depth=1,
    max_pages=500,
    max_links_per_page=50,
    include_patterns=["/docs/*", "/blog/*"],
    formats=[LinksFormatConfig()],   # cheapest format — just URL discovery
)
status = sgai.crawl.get(start.data.id)
# The crawl runs asynchronously: poll crawl.get until it has finished before reading pages.
urls = [p.url for p in status.data.pages]
For batch scraping a fixed list of URLs, fan out concurrent scrape calls — the SDK’s AsyncScrapeGraphAI is the easiest path (see Async / concurrency below).
Change tracking → monitor
Firecrawl ships change tracking as a changeTracking format bolted onto scrape/crawl. ScrapeGraph makes monitoring a first-class resource with cron scheduling, webhook delivery, and a queryable activity log.
Create a monitor
Python
# Before (Firecrawl — add changeTracking to formats)
doc = fc.scrape(
    "https://example.com",
    formats=["markdown", {"type": "changeTracking", "modes": ["git-diff"], "tag": "hourly"}],
)

# After (ScrapeGraph v2 — scheduled monitor, scrapegraph-py ≥ 2.1.0)
from scrapegraph_py import MarkdownFormatConfig

res = sgai.monitor.create(
    "https://example.com",
    "*/30 * * * *",                 # cron expression (positional)
    name="Homepage watch",
    formats=[MarkdownFormatConfig()],
    webhook_url="https://your-app.example.com/hooks/sgai",
)
cron_id = res.data.cron_id
JavaScript
// Before (Firecrawl)
const doc = await fc.scrape("https://example.com", {
  formats: ["markdown", { type: "changeTracking", modes: ["git-diff"], tag: "hourly" }],
});

// After (ScrapeGraph v2)
const res = await sgai.monitor.create({
  url: "https://example.com",
  name: "Homepage watch",
  interval: "*/30 * * * *",
  formats: [{ type: "markdown" }],
  webhookUrl: "https://your-app.example.com/hooks/sgai",
});
const cronId = res.data?.cronId;
Full monitor lifecycle
| Operation | Python | JavaScript |
| --- | --- | --- |
| List all monitors | sgai.monitor.list() | sgai.monitor.list() |
| Get one | sgai.monitor.get(cron_id) | sgai.monitor.get(cronId) |
| Update | sgai.monitor.update(cron_id, interval="0 * * * *") | sgai.monitor.update(cronId, { interval: ... }) |
| Pause / resume | sgai.monitor.pause(cron_id) / .resume(cron_id) | same |
| Recent ticks | sgai.monitor.activity(cron_id) | same |
| Delete | sgai.monitor.delete(cron_id) | same |
Each tick in monitor.activity returns status, created_at, elapsed_ms, plus a changed flag and a diffs field when the content has changed since the previous run. This does the same job as Firecrawl’s git-diff mode, with the history persisted by ScrapeGraph for you.
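
Consumers of the activity log usually care only about the runs where something changed. The filter below is a minimal sketch that assumes each tick can be read as a dict with the fields named above; the sample ticks are made up for illustration and `changed_ticks` is a hypothetical helper.

```python
def changed_ticks(ticks):
    """Keep only the ticks whose `changed` flag is set."""
    return [tick for tick in ticks if tick.get("changed")]


# Made-up sample ticks using the documented field names.
sample_ticks = [
    {"status": "success", "created_at": "2026-01-01T00:00:00Z", "elapsed_ms": 812, "changed": False},
    {"status": "success", "created_at": "2026-01-01T00:30:00Z", "elapsed_ms": 790, "changed": True,
     "diffs": ["- Old headline", "+ New headline"]},
]

for tick in changed_ticks(sample_ticks):
    print(tick["created_at"], len(tick.get("diffs", [])), "diff lines")
```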
Async / concurrency
Both Firecrawl and ScrapeGraph ship an async Python client (AsyncFirecrawl and AsyncScrapeGraphAI). The shape is identical to the sync client; just await every call.
Python
import asyncio
from scrapegraph_py import AsyncScrapeGraphAI, MarkdownFormatConfig

async def fetch_many(urls):
    async with AsyncScrapeGraphAI() as sgai:
        return await asyncio.gather(*[
            sgai.scrape(u, formats=[MarkdownFormatConfig()]) for u in urls
        ])

results = asyncio.run(fetch_many([
    "https://example.com",
    "https://example.org",
]))
JavaScript
// The default `ScrapeGraphAI` client is already promise-based.
const urls = ["https://example.com", "https://example.org"];
const results = await Promise.all(urls.map((url) =>
  sgai.scrape({ url, formats: [{ type: "markdown" }] })
));
Handle the ApiResult wrapper
The ScrapeGraph Python and JS SDKs wrap every response in an ApiResult — no exceptions to catch on HTTP errors. Check status before reading data:
Python
result = sgai.extract("...", url="https://example.com")
if result.status == "success":
    data = result.data.json_data
else:
    print(f"Error: {result.error}")
JavaScript
const result = await sgai.extract({ url: "https://example.com", prompt: "..." });
if (result.status === "success") {
  console.log(result.data?.json);
} else {
  console.error(result.error);
}
Direct HTTP callers (curl, fetch) receive the unwrapped response body — the envelope is applied client-side by the SDKs.
Envelope fields
| Field | Type | Notes |
| --- | --- | --- |
| status | "success" or "error" | Always set |
| data | T or None | The endpoint’s normal response body when status == "success" |
| error | str or None | Present when status == "error" |
| elapsed_ms (Py) / elapsedMs (JS) | int | Client-measured round-trip time |
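
Code that was written around Firecrawl's raise-on-error behaviour can keep that style with a thin unwrap helper. The sketch below uses a stand-in `ApiResult` dataclass that mirrors the documented fields; it is not the SDK's own class, and `ScrapeGraphError` and `unwrap` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ApiResult:
    """Stand-in mirroring the documented envelope fields (not the SDK class)."""
    status: str
    data: Any = None
    error: Optional[str] = None
    elapsed_ms: int = 0


class ScrapeGraphError(RuntimeError):
    """Raised by unwrap() when the envelope reports an error."""


def unwrap(result):
    """Return result.data on success; raise ScrapeGraphError otherwise."""
    if result.status != "success":
        raise ScrapeGraphError(result.error or "unknown error")
    return result.data


payload = unwrap(ApiResult(status="success", data={"title": "Example"}))
```

The helper only reads `status`, `data`, and `error`, so it should work on any object exposing those attributes.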
Error handling
Firecrawl raises exceptions on HTTP errors; ScrapeGraph returns a non-success ApiResult. The HTTP status codes map cleanly:
| HTTP | ScrapeGraph error type | Retryable? | Typical cause |
| --- | --- | --- | --- |
| 400 | validation (with details[]) | No | Bad request body |
| 401 | auth_missing_key | No | SGAI-APIKEY header missing |
| 402 | insufficient_credits | No | Top up at the dashboard |
| 403 | auth_invalid_key | No | Key revoked or malformed |
| 404 | not_found | No | Wrong endpoint or unknown job id |
| 429 | rate_limited | Yes (backoff) | SDKs already retry with backoff |
| 5xx | internal_error | Yes (backoff) | Transient; SDKs retry |
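
Direct HTTP callers do not get the SDKs' built-in retries, so the table can be encoded as a lookup plus a backoff schedule. This is a sketch only: `classify` and `backoff_delays` are hypothetical helpers, and the base delay and cap are arbitrary illustrative values rather than what the SDKs use.

```python
# (error type, retryable) per HTTP status, copied from the table above.
ERROR_TABLE = {
    400: ("validation", False),
    401: ("auth_missing_key", False),
    402: ("insufficient_credits", False),
    403: ("auth_invalid_key", False),
    404: ("not_found", False),
    429: ("rate_limited", True),
}


def classify(http_status):
    """Return (error_type, retryable) for an HTTP status code."""
    if 500 <= http_status <= 599:
        return ("internal_error", True)
    return ERROR_TABLE.get(http_status, ("unknown", False))


def backoff_delays(retries, base_s=0.5, cap_s=8.0):
    """Exponential backoff schedule in seconds, capped at cap_s."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(retries)]
```

A caller would retry only when `classify(status)[1]` is true, sleeping for each delay in turn.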
A defensive wrapper looks the same as the one you wrote around Firecrawl, with one fewer except branch:
from scrapegraph_py import MarkdownFormatConfig

res = sgai.scrape("https://example.com", formats=[MarkdownFormatConfig()])
if res.status != "success":
    raise RuntimeError(f"scrape failed: {res.error}")
markdown = res.data.results["markdown"]["data"][0]
Test and verify
Run your existing test suite and compare outputs. ScrapeGraph returns equivalent data structures — the main differences are:
  • The ApiResult envelope in the SDKs (no exceptions on error)
  • The split crawl.start / crawl.get flow (always async)
  • The dedicated monitor resource in place of change-tracking formats
  • fetchConfig (declarative) in place of actions (imperative)

A quick equivalence script for a single URL:

from firecrawl import Firecrawl
from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig

fc = Firecrawl()
sgai = ScrapeGraphAI()

URL = "https://example.com"
fc_md   = fc.scrape(URL, formats=["markdown"]).markdown
sgai_md = sgai.scrape(URL, formats=[MarkdownFormatConfig()]).data.results["markdown"]["data"][0]
print("len(firecrawl)=", len(fc_md), "len(scrapegraph)=", len(sgai_md))

Quick cURL sanity check

curl -X POST https://v2-api.scrapegraphai.com/api/scrape \
  -H "SGAI-APIKEY: $SGAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com","formats":[{"type":"markdown"}]}'

Response (note: no ApiResult envelope on the raw HTTP endpoint; the SDKs add that client-side):
{
  "id": "3b1c81d9-3f3b-42b0-9cf7-6926d9ebc7f5",
  "results": {
    "markdown": { "data": ["# Example Domain\n\n..."] }
  },
  "metadata": { "contentType": "text/html" }
}

Common gotchas

  • Header name. It’s SGAI-APIKEY: sgai-..., not Authorization: Bearer .... Header casing is not a concern (the API tolerates any case), but some HTTP libraries strip non-standard headers when following redirects.
  • schema field name in Python. JsonFormatConfig and extract use schema=. The field is internally aliased from schema_ to avoid shadowing the BaseModel.schema method, so pass schema= from your code and it works.
  • No actions array. If you relied on click/type/press actions, you’ll need to either split the flow into two scrapes (one to trigger a navigation that produces a stable URL, one to scrape the result) or contact support about the upcoming interactions API.
  • Crawl is always async. There is no blocking sgai.crawl(...). Call crawl.start and poll, or pass a webhookUrl via a monitor instead.
  • changeTracking is gone as a format. Use monitor.create, which gives you cron scheduling, persistent history, and webhook delivery in one resource.
  • Response shape per format. Each requested format lives under results[<format>].data (always an array). For most formats the array has one element; for links and images it’s the full list.
  • numResults caps at 20 for search. Firecrawl’s limit accepts higher values, so split the query (e.g., by timeRange) if you need more.
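
The last workaround, splitting a query by time range to get past the 20-result cap, can be sketched as follows. `search_across_ranges` is a hypothetical helper, and `search_fn` is a stand-in for whatever function performs one search call and returns a list of hits with a `url` key; adapt the hit access to the real result objects.

```python
def search_across_ranges(query, search_fn, time_ranges, per_call=20):
    """Run one search per time range and merge the hits, de-duplicated by URL."""
    seen, merged = set(), []
    for time_range in time_ranges:
        for hit in search_fn(query, num_results=per_call, time_range=time_range):
            if hit["url"] not in seen:
                seen.add(hit["url"])
                merged.append(hit)
    return merged


def fake_search(query, num_results, time_range):
    # Stand-in that returns overlapping hits so the de-duplication is visible.
    canned = {
        "past_week": [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}],
        "past_month": [{"url": "https://example.com/b"}, {"url": "https://example.com/c"}],
    }
    return canned[time_range][:num_results]


hits = search_across_ranges("vector databases", fake_search, ["past_week", "past_month"])
```

Wider ranges include the narrower ones, so expect overlap between calls; the URL check keeps the merged list unique.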

    Full SDK documentation