Why switch?
ScrapeGraph v2 offers AI-powered scraping, extraction, search, crawling, and first-class scheduled monitoring through a unified API. If you’re coming from Firecrawl, this page maps every endpoint, SDK method, parameter, and response shape to its ScrapeGraph equivalent so you can migrate quickly and confidently. The migration is mechanical for most workloads: change a header, swap an import, and adjust one or two argument names. The places that need genuine rethinking are change tracking (now a first-class monitor resource) and browser actions (replaced by a simpler fetchConfig model).
Feature comparison at a glance
| Capability | Firecrawl | ScrapeGraph v2 |
|---|---|---|
| Single-page scrape (markdown, html, screenshot…) | POST /v2/scrape | POST /api/scrape |
| Structured extraction (prompt + schema) | POST /v2/extract | POST /api/extract |
| Web search with optional extraction | POST /v2/search | POST /api/search |
| Async multi-page crawl | POST /v2/crawl → GET /v2/crawl/{id} | POST /api/crawl → GET /api/crawl/{id} |
| URL discovery (sitemap + links) | POST /v2/map | Use crawl.start with patterns (no one-shot map) |
| Batch scrape a list of URLs | POST /v2/batch/scrape | Loop concurrent scrape calls, or crawl.start with a URL list |
| Change tracking | changeTracking format on scrape/crawl | First-class monitor resource with cron scheduling (POST /api/monitor) |
| Browser interactions before scrape | actions array on /v2/scrape (click/scroll/type/wait) | fetchConfig (mode="js", stealth, wait, scrolls) on scrape/extract/search/crawl |
| Webhooks | Crawl webhooks | Monitor + crawl webhooks (webhookUrl) |
| Async SDK | AsyncFirecrawl | AsyncScrapeGraphAI |
| Response shape | Direct values (raises on error) | ApiResult envelope (status + data + error) |
Authentication
| | Firecrawl | ScrapeGraph v2 |
|---|---|---|
| Header | Authorization: Bearer fc-... | SGAI-APIKEY: sgai-... |
| Env var | FIRECRAWL_API_KEY | SGAI_API_KEY |
| Base URL | https://api.firecrawl.dev/v2 | https://v2-api.scrapegraphai.com/api |
| Key format | fc- prefix, 32-char hex | sgai- prefix, UUID-style |
SGAI-APIKEY is not a Bearer token.
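As a minimal sketch of the header swap, the helper below takes the headers an existing Firecrawl integration sends and returns the ScrapeGraph equivalent. The name migrate_headers is hypothetical (it is not part of either SDK), and reading the key from SGAI_API_KEY assumes the environment variable from the table above is set.

```python
import os
from typing import Mapping, Optional


def migrate_headers(old: Mapping[str, str], api_key: Optional[str] = None) -> dict:
    """Return a copy of old headers with Firecrawl auth swapped for ScrapeGraph auth."""
    key = api_key or os.environ.get("SGAI_API_KEY")
    if not key:
        raise ValueError("set SGAI_API_KEY or pass api_key")
    # Drop the Bearer header (header names are case-insensitive).
    headers = {k: v for k, v in old.items() if k.lower() != "authorization"}
    # The key is sent as-is, with no "Bearer " prefix.
    headers["SGAI-APIKEY"] = key
    return headers
```

All other headers pass through unchanged, so the helper can sit in front of whatever HTTP client the project already uses.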
SDK installation
| | Firecrawl | ScrapeGraph v2 |
|---|---|---|
| Python | pip install firecrawl-py | pip install scrapegraph-py (≥ 2.1.0, Python ≥ 3.12) |
| Node.js | npm i @mendable/firecrawl-js | npm i scrapegraph-js (≥ 2.1.0, Node ≥ 22) |
| CLI | npm i -g firecrawl | npm i -g just-scrape |
| MCP server | Available | pip install scrapegraph-mcp |
Migration checklist
# Remove Firecrawl
pip uninstall firecrawl-py # Python
npm uninstall @mendable/firecrawl-js # Node.js
# Install ScrapeGraph
pip install -U "scrapegraph-py>=2.1.0" # Python (3.12+)
npm install scrapegraph-js@latest # Node.js (22+)
Get your API key from the dashboard.
Firecrawl’s scrape fetches a page in one or more formats. ScrapeGraph’s scrape mirrors that, with typed format configs in Python and plain objects in JS.

| Firecrawl format | ScrapeGraph format config | Notes |
|---|---|---|
| "markdown" | MarkdownFormatConfig(mode="normal" \| "reader" \| "prune") | reader strips chrome, prune is aggressive |
| "html" | HtmlFormatConfig(mode=...) | Same mode options as markdown |
| "rawHtml" | HtmlFormatConfig(mode="normal") | normal mode is the unprocessed page |
| "links" | LinksFormatConfig() | |
| "screenshot" / "screenshot@fullPage" | ScreenshotFormatConfig(full_page=True, width=..., height=..., quality=...) | |
| {"type": "json", ...} | JsonFormatConfig(prompt="...", schema={...}) | |
| | ImagesFormatConfig() | |
| | SummaryFormatConfig() | |
| | BrandingFormatConfig() | |
| {"type": "changeTracking"} | Use monitor.create instead | |

You can request several formats in a single call. They share the page fetch, so it costs one navigation.
Firecrawl exposes an actions array (click, scroll, wait, type, press, screenshot) executed before the page is captured. ScrapeGraph replaces this with a declarative fetchConfig:

| Firecrawl | ScrapeGraph v2 |
|---|---|
| {"type": "wait", "milliseconds": 2000} | fetch_config=FetchConfig(wait=2000) |
| {"type": "scroll", ...} (repeated) | fetch_config=FetchConfig(scrolls=5) |
| {"type": "click", "selector": "..."} | |
| {"type": "screenshot"} | Add ScreenshotFormatConfig() to formats |
| | headers={"User-Agent": "..."} |
| | country="US" (ISO 3166-1 alpha-2) |

fetchConfig accepts: mode ("auto" / "fast" / "js"), stealth (bool, residential proxy + anti-bot headers), headers, cookies, scrolls (0–100), wait (0–30000 ms), timeout (1000–60000 ms), country (2-letter ISO code).

Extract keeps the same shape: URL + natural-language prompt + optional JSON schema. ScrapeGraph also accepts inline html or markdown instead of a URL, which is useful when you already have the content.

scrapegraph-py accepts any dict that conforms to JSON Schema, so Pydantic models work via model_json_schema():

from pydantic import BaseModel, Field
from scrapegraph_py import ScrapeGraphAI

class Product(BaseModel):
    name: str
    price_usd: float = Field(description="Price in US dollars")
    in_stock: bool

sgai = ScrapeGraphAI()
res = sgai.extract(
    "Extract product details",
    url="https://example.com/product/42",
    schema=Product.model_json_schema(),
)
if res.status == "success":
    product = Product.model_validate(res.data.json_data)
    print(product.name, product.price_usd)
# Extract from inline content instead of a URL
res = sgai.extract(
    "Extract the author and publication date",
    html="<html>...</html>",  # or markdown="# Article\n..."
    schema={"type": "object", "properties": {
        "author": {"type": "string"},
        "published_at": {"type": "string", "format": "date-time"},
    }},
)
Firecrawl accepts a list of URLs or wildcards in one call. On ScrapeGraph, call extract once per URL (run them concurrently) or use crawl.start to discover pages first and then extract from each.

ScrapeGraph’s search supports the same query-and-limit pattern, plus optional LLM extraction in a single call (Firecrawl’s scrapeOptions parameter).

Firecrawl exposes scrapeOptions to scrape each result; ScrapeGraph fuses search and structured extraction with a prompt + schema:

res = sgai.search(
    "open-source vector databases",
    num_results=10,
    prompt="Extract the project name, GitHub URL, and primary license",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "github_url": {"type": "string"},
            "license": {"type": "string"},
        },
    },
)
| Firecrawl | ScrapeGraph (Python) | ScrapeGraph (JS) |
|---|---|---|
| query | query (positional) | query |
| limit | num_results (1–20) | numResults |
| tbs (time filter) | time_range="past_hour" \| "past_24_hours" \| "past_week" \| "past_month" \| "past_year" | timeRange |
| location | location_geo_code (ISO country code) | locationGeoCode |
| scrapeOptions.formats | format="markdown" \| "html" + mode | format + mode |
| scrapeOptions (full page scrape) | prompt + schema for inline extraction | |
| sources=["web","news","images"] | time_range for recency) | |

Firecrawl’s crawl() blocks until completion; start_crawl() returns a job id. ScrapeGraph’s crawl is always async: start, then poll (or stop, resume, delete).

from scrapegraph_py import MarkdownFormatConfig, JsonFormatConfig
start = sgai.crawl.start(
    "https://docs.example.com",
    max_depth=3,
    max_pages=200,
    formats=[
        MarkdownFormatConfig(mode="reader"),
        JsonFormatConfig(
            prompt="Extract the page title and the list of code samples",
            schema={
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "code_samples": {"type": "array", "items": {"type": "string"}},
                },
            },
        ),
    ],
)
| Firecrawl | ScrapeGraph (Python) | ScrapeGraph (JS) |
|---|---|---|
| limit | max_pages (1–1000, default 50) | maxPages |
| maxDepth | max_depth (default 2) | maxDepth |
| maxDiscoveryDepth | max_depth | |
| includePaths | include_patterns (glob) | includePatterns |
| excludePaths | exclude_patterns (glob) | excludePatterns |
| allowExternalLinks | allow_external (default false) | allowExternal |
| allowBackwardLinks | max_depth | |
| webhook | monitor for delivery | |
| scrapeOptions.formats | formats=[...] | formats |

# Pause an in-flight crawl
sgai.crawl.stop(start.data.id)
# Resume it later
sgai.crawl.resume(start.data.id)
# Drop a finished crawl and free retained pages
sgai.crawl.delete(start.data.id)
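Because the crawl is start-then-poll, code that used Firecrawl's blocking crawl() needs a small wait loop. The following is a generic sketch, not an SDK feature: poll_until, its parameters, and the default interval and timeout are all hypothetical choices, and the caller supplies both the fetch function and the completion check.

```python
import time
from typing import Any, Callable


def poll_until(
    fetch: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() repeatedly until is_done(result) is true or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_s)
```

With the SDK, fetch would wrap a call such as sgai.crawl.get(start.data.id). The terminal status values are not listed in this guide, so the is_done check is deliberately left to the caller.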
Firecrawl’s /map returns a list of URLs quickly. ScrapeGraph doesn’t have a one-shot map; use crawl.start with pattern filters and a shallow max_depth to discover URLs cheaply:

from scrapegraph_py import LinksFormatConfig
start = sgai.crawl.start(
    "https://example.com",
    max_depth=1,
    max_pages=500,
    max_links_per_page=50,
    include_patterns=["/docs/*", "/blog/*"],
    formats=[LinksFormatConfig()],  # cheapest format, just URL discovery
)
status = sgai.crawl.get(start.data.id)
urls = [p.url for p in status.data.pages]
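When the URLs are already known, the same result can come from concurrent single-page calls. Below is a minimal sketch of bounded fan-out with asyncio. The scrape_one argument is a stand-in for whatever coroutine wraps the real scrape call, and the fan_out name and the default concurrency of 5 are assumptions, not SDK settings.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def fan_out(
    urls: Iterable[str],
    scrape_one: Callable[[str], Awaitable[T]],
    max_concurrency: int = 5,
) -> list[T]:
    """Run scrape_one(url) for every URL, at most max_concurrency at a time."""
    gate = asyncio.Semaphore(max_concurrency)

    async def guarded(url: str) -> T:
        async with gate:
            return await scrape_one(url)

    # gather preserves input order, so results line up with urls.
    return await asyncio.gather(*(guarded(u) for u in urls))
```

Keeping the concurrency cap explicit makes it easier to stay under rate limits while still avoiding a sequential loop.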
For batch scraping a fixed list of URLs, fan out concurrent scrape calls. The SDK’s AsyncScrapeGraphAI is the easiest path (see Async / concurrency below).

Firecrawl ships change tracking as a changeTracking format bolted onto scrape/crawl. ScrapeGraph makes monitoring a first-class resource with cron scheduling, webhook delivery, and a queryable activity log.

| Python | JS |
|---|---|
| sgai.monitor.list() | sgai.monitor.list() |
| sgai.monitor.get(cron_id) | sgai.monitor.get(cronId) |
| sgai.monitor.update(cron_id, interval="0 * * * *") | sgai.monitor.update(cronId, { interval: ... }) |
| sgai.monitor.pause(cron_id) / .resume(cron_id) | |
| sgai.monitor.activity(cron_id) | |
| sgai.monitor.delete(cron_id) | |

Each tick in monitor.activity returns status, created_at, elapsed_ms, plus a changed flag and a diffs field when content has moved since the previous run. That is the same job as Firecrawl’s git-diff mode, persisted by ScrapeGraph for you.

The ScrapeGraph Python and JS SDKs wrap every response in an ApiResult, so there are no exceptions to catch on HTTP errors. Check status before reading data:

result = sgai.extract("...", url="https://example.com")
if result.status == "success":
    data = result.data.json_data
else:
    print(f"Error: {result.error}")

const result = await sgai.extract({ url: "https://example.com", prompt: "..." });
if (result.status === "success") {
  console.log(result.data?.json);
} else {
  console.error(result.error);
}
Direct HTTP callers (curl, fetch) receive the unwrapped response body — the envelope is applied client-side by the SDKs.
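If existing code depends on Firecrawl raising on failure, a small shim can restore that behavior while the call sites are migrated. This is a sketch only: unwrap and ScrapeGraphError are hypothetical names, and FakeResult is a stand-in that carries just the status, data, and error fields used in the snippets above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeResult:
    """Stand-in for an SDK result; only the fields the shim reads."""
    status: str
    data: Any = None
    error: Optional[str] = None


class ScrapeGraphError(RuntimeError):
    """Hypothetical exception type for non-success results."""


def unwrap(result: Any) -> Any:
    """Return result.data on success, raise ScrapeGraphError otherwise."""
    if result.status != "success":
        raise ScrapeGraphError(result.error or "unknown error")
    return result.data
```

Wrapping a call as unwrap(sgai.extract(...)) would then keep existing try/except blocks working, at the cost of hiding the envelope from callers that might want to inspect it.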
| Field | Type | Set when |
|---|---|---|
| status | "success" \| "error" | |
| data | T \| None | status == "success" |
| error | str \| None | status == "error" |
| elapsed_ms (Py) / elapsedMs (JS) | int | |

Firecrawl raises exceptions on HTTP errors; ScrapeGraph returns a non-success ApiResult. The HTTP status codes map cleanly:

- validation (with details[])
- auth_missing_key (SGAI-APIKEY header missing)
- insufficient_credits
- auth_invalid_key
- not_found
- rate_limited
- internal_error

A defensive wrapper looks the same as the one you wrote around Firecrawl, with one fewer except branch:

res = sgai.scrape("https://example.com", formats=[MarkdownFormatConfig()])
if res.status != "success":
    raise RuntimeError(f"scrape failed: {res.error}")
markdown = res.data.results["markdown"]["data"][0]
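The results["markdown"]["data"][0] access above follows a pattern this guide describes for every format: the payload sits under results[<format>].data, always as an array, with a single element for most formats and the full list for links and images. A small accessor can hide that detail. The sketch below works on plain dict-like objects, and format_payload is a hypothetical name rather than an SDK function.

```python
from typing import Any, Mapping

# Formats whose data array is the full list rather than a single element.
LIST_FORMATS = {"links", "images"}


def format_payload(results: Mapping[str, Any], fmt: str) -> Any:
    """Return the payload for one requested format from a results mapping."""
    entry = results.get(fmt)
    if entry is None:
        raise KeyError(f"format {fmt!r} was not returned")
    data = entry["data"]
    if fmt in LIST_FORMATS:
        return list(data)
    if not data:
        raise ValueError(f"format {fmt!r} returned no data")
    return data[0]
```

Raising on a missing or empty format surfaces a mismatch between requested and returned formats early, instead of failing later with an IndexError.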
Run your existing test suite and compare outputs. ScrapeGraph returns equivalent data structures; the main differences are:

- ApiResult envelope in the SDKs (no exceptions on error)
- crawl.start / crawl.get flow (always async)
- monitor resource in place of change-tracking formats
- fetchConfig (declarative) in place of actions (imperative)

from firecrawl import Firecrawl
from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig
fc = Firecrawl()
sgai = ScrapeGraphAI()
URL = "https://example.com"
fc_md = fc.scrape(URL, formats=["markdown"]).markdown
sgai_md = sgai.scrape(URL, formats=[MarkdownFormatConfig()]).data.results["markdown"]["data"][0]
print("len(firecrawl)=", len(fc_md), "len(scrapegraph)=", len(sgai_md))
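Comparing lengths catches gross failures but not content drift. One possible refinement, sketched here with the standard library only, is a line-level similarity score between the two markdown outputs. The function names and the normalization (trailing whitespace stripped, blank lines dropped) are assumptions, and any threshold for "close enough" depends on the site being scraped.

```python
import difflib


def _normalize(md: str) -> list[str]:
    """Strip trailing whitespace and drop blank lines before comparing."""
    return [line.rstrip() for line in md.splitlines() if line.strip()]


def markdown_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of how closely two markdown documents match line by line."""
    return difflib.SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()
```

A call such as markdown_similarity(fc_md, sgai_md) could then be asserted against a project-specific threshold in the migration test suite.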
Quick cURL sanity check
The raw HTTP endpoint returns no ApiResult envelope; the SDKs add that client-side.
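For a check without either SDK, the request can also be assembled with the Python standard library. This sketch only builds the request and does not send it. The base URL, path, and header come from the tables in this guide, while the JSON body with a single "url" field is an assumption about the endpoint's input, and build_scrape_request is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://v2-api.scrapegraphai.com/api"


def build_scrape_request(url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/scrape request."""
    # Assumption: the endpoint accepts a JSON body with a "url" field.
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/scrape",
        data=body,
        method="POST",
        # urllib stores header names capitalized ("Sgai-apikey").
        headers={"SGAI-APIKEY": api_key, "Content-Type": "application/json"},
    )
```

Passing the result to urllib.request.urlopen would send it. The response body is then the unwrapped payload, so there is no status field to check beyond the HTTP status code.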
Common gotchas
- Header name. It’s SGAI-APIKEY: sgai-..., not Authorization: Bearer .... Watch for proxies that normalize header casing: the API tolerates any case, but some HTTP libraries strip non-standard headers in redirects.
- schema field name in Python. JsonFormatConfig and extract use schema= (the field is internally aliased from schema_ to avoid shadowing the BaseModel.schema method; pass schema= from your code and it works).
- No actions array. If you relied on click / type / press actions, you’ll need to either split the flow into two scrapes (one to trigger a navigation that produces a stable URL, one to scrape the result) or contact support about the upcoming interactions API.
- Crawl is always async. There is no blocking sgai.crawl(...); call crawl.start and poll, or pass a webhookUrl via a monitor instead.
- changeTracking is gone as a format. Use monitor.create: it gets you cron scheduling, persistent history, and webhook delivery in one resource.
- Response shape per format. Each requested format lives under results[<format>].data (always an array). For most formats the array has one element; for links and images it’s the full list.
- numResults caps at 20 for search. Firecrawl’s limit accepts higher values; split the query (e.g., by timeRange) if you need more.
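For the actions gotcha, a first pass over an existing Firecrawl actions list can be automated. The sketch below folds wait and scroll actions into fetchConfig-style values and returns everything else for manual review. It is only an illustration: actions_to_fetch_config is a hypothetical helper, summing multiple waits into one value is a design assumption, and the caps reuse the ranges this guide gives for wait (0–30000 ms) and scrolls (0–100).

```python
from typing import Any


def actions_to_fetch_config(
    actions: list[dict[str, Any]],
) -> tuple[dict[str, int], list[dict[str, Any]]]:
    """Fold wait/scroll actions into fetchConfig-style values.

    Returns (config, leftovers); leftovers are actions with no
    declarative counterpart here and need manual attention.
    """
    config: dict[str, int] = {}
    leftovers: list[dict[str, Any]] = []
    for action in actions:
        kind = action.get("type")
        if kind == "wait":
            # Assumption: several waits collapse into one total, capped at 30000 ms.
            total = config.get("wait", 0) + int(action.get("milliseconds", 0))
            config["wait"] = min(total, 30000)
        elif kind == "scroll":
            # Each scroll action counts once, capped at 100.
            config["scrolls"] = min(config.get("scrolls", 0) + 1, 100)
        else:
            leftovers.append(action)
    return config, leftovers
```

The config keys match the wait and scrolls arguments shown for FetchConfig earlier. Anything in leftovers (click, type, press, screenshot) still needs one of the approaches described in the list above.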

