Overview

The official n8n-nodes-scrapegraphai community node exposes the full v2 API as a single node with seven resources: Scrape, Extract, Search, Crawl, Monitor, History, and Credit. Drop it into any n8n workflow, point it at a URL, and you get markdown, structured JSON, screenshots, or a recurring monitor — wired into the rest of your stack via the 400+ nodes n8n already ships with.

  • Package on npm: n8n-nodes-scrapegraphai
  • Source on GitHub: issues, PRs, and the changelog

Installation

Inside your n8n instance, open Settings → Community Nodes → Install and enter:
n8n-nodes-scrapegraphai
Acknowledge the risks prompt and install. The node appears as ScrapeGraphAI in the node panel.
Self-hosted n8n only — n8n Cloud does not yet allow community nodes. If you don’t have a host, follow the self-hosting guide.

Credentials

Add a new ScrapeGraphAI API credential and paste your API key. n8n will hit GET /api/credits to verify the key — a green banner confirms it works.
Get your API key from the ScrapeGraphAI dashboard.

What’s in the node

| Resource | Operations | What it does |
|---|---|---|
| Scrape | scrape | Fetch a page as markdown, HTML, JSON (AI-extracted), screenshot, links, summary, branding, or any combination |
| Extract | extract | Run a natural-language prompt over a URL, raw HTML, or markdown, with an optional JSON schema |
| Search | search | AI web search with inline content and an optional rollup prompt across results |
| Crawl | start, getStatus, stop, resume, delete | Async multi-page crawls with URL patterns, depth, per-page formats, MIME-type filters, and an external-link toggle |
| Monitor | create, list, get, update, pause, resume, delete, activity | Cron-scheduled fetches with diff detection and webhooks |
| History | get, list | Look up past results by scrapeRefId; used to fetch full content for crawled pages |
| Credit | get | Check remaining credits and plan |
Every content-producing operation (Scrape / Extract / Search) exposes an Output parameter with three modes — Simplified, Raw, or Selected Fields — so the response shape stays predictable when chained into AI Agent tools or downstream nodes.

Tour the modules

Drop a ScrapeGraphAI node onto the canvas, pick a credential, and the Resource dropdown gives you everything the v2 API exposes:
[Screenshot: ScrapeGraphAI node with the Resource dropdown open, showing all seven resources]
The rest of this section walks through each resource with its key fields visible.

Scrape

Fetch a page in one or more formats — markdown, HTML, JSON (AI extraction), screenshot, links, summary, or branding. Add as many Format rows as you need; each one carries its own per-format options.
[Screenshot: Scrape node with the URL filled in and a Markdown format row added]
| Field | Notes |
|---|---|
| URL | The page to fetch |
| Formats | One row per output format. Each format exposes its own sub-options: Mode for markdown/HTML, Prompt + Schema for JSON, Full Page / Width / Height / Quality for screenshots. |
| Content Type | Optional MIME-type hint for the fetcher |
| Fetch Config | See Fetch Config below |
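To make the "one row per format, each with its own sub-options" model concrete, here is a minimal Python sketch that validates Format rows before they are sent. It assumes a request body shaped like the formats=[{type:"markdown"}] pattern used later in this guide; the sub-option key names (mode, prompt, schema, fullPage, width, height, quality) are hypothetical camelCase spellings of the UI labels, not confirmed v2 API fields.

```python
from typing import Any

# Sub-options each Format row accepts, per the Scrape field table.
# Key names are hypothetical spellings of the UI labels.
FORMAT_OPTIONS: dict[str, set[str]] = {
    "markdown": {"mode"},
    "html": {"mode"},
    "json": {"prompt", "schema"},
    "screenshot": {"fullPage", "width", "height", "quality"},
    "links": set(),
    "summary": set(),
    "branding": set(),
}


def build_scrape_payload(url: str, formats: list[dict[str, Any]]) -> dict[str, Any]:
    """Validate one dict per Format row and wrap them in a request body."""
    if not formats:
        raise ValueError("add at least one format row")
    for fmt in formats:
        kind = fmt.get("type")
        if kind not in FORMAT_OPTIONS:
            raise ValueError(f"unknown format type: {kind!r}")
        # Reject options that belong to another format (e.g. width on markdown).
        extra = set(fmt) - {"type"} - FORMAT_OPTIONS[kind]
        if extra:
            raise ValueError(f"{kind} does not accept {sorted(extra)}")
    return {"url": url, "formats": formats}
```

For example, build_scrape_payload("https://example.com", [{"type": "markdown", "mode": "normal"}, {"type": "screenshot", "fullPage": True}]) requests two formats in one call, which mirrors adding two Format rows in the node.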

Extract

Run a natural-language prompt over a URL, raw HTML, or markdown. Toggle Use JSON Schema to constrain the output shape.
[Screenshot: Extract node with Source = URL, an Amazon URL, and a prompt]
| Field | Notes |
|---|---|
| Source | URL, HTML, or Markdown; picks the input mode |
| Prompt | What you want extracted, in plain English |
| Use JSON Schema | Toggle on to paste a JSON schema and lock the output shape |
| HTML Mode | Normal, Reader, or Prune; controls how the page HTML is preprocessed before extraction |
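The Extract fields map naturally onto a small request builder. The Python sketch below is illustrative only: the key names (url/html/markdown, prompt, htmlMode, schema) and the "normal" default for HTML Mode are assumptions, while the three source modes and three HTML modes come from the table above. It shows the one behavior worth checking before a call: with Use JSON Schema on, the pasted text must parse to a JSON object.

```python
import json
from typing import Any, Optional

SOURCES = ("url", "html", "markdown")        # Source field options
HTML_MODES = ("normal", "reader", "prune")   # HTML Mode options


def build_extract_payload(
    source: str,
    value: str,
    prompt: str,
    schema_text: Optional[str] = None,
    html_mode: str = "normal",  # assumed default
) -> dict[str, Any]:
    """Build a hypothetical Extract request body from the node's fields."""
    if source not in SOURCES:
        raise ValueError(f"source must be one of {SOURCES}")
    if not prompt.strip():
        raise ValueError("prompt is required")
    if html_mode not in HTML_MODES:
        raise ValueError(f"html_mode must be one of {HTML_MODES}")
    # Exactly one input key is set, chosen by the Source field.
    payload: dict[str, Any] = {source: value, "prompt": prompt, "htmlMode": html_mode}
    if schema_text is not None:
        # "Use JSON Schema" is on: the pasted text must be a JSON object.
        schema = json.loads(schema_text)
        if not isinstance(schema, dict):
            raise ValueError("schema must be a JSON object")
        payload["schema"] = schema
    return payload
```

Leaving schema_text as None corresponds to the toggle being off, in which case the output shape is left to the model.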

Search

Run an AI-powered web search and get the top results with their content already fetched. Toggle Use AI Rollup to summarize across all results in one call.
[Screenshot: Search node with a query, three results, and Markdown format]
| Field | Notes |
|---|---|
| Query | The search query |
| Number of Results | 1–20 |
| Result Format | Markdown or HTML for each result's inline content |
| Use AI Rollup | Toggle on to add a Prompt (and optional schema) that runs across the fetched results |
| Time Range | Filter to the past hour, day, week, month, or year |
| Location (Country Code) | 52 curated ISO country codes for geo-targeted results |
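The Search limits above (1–20 results, Markdown or HTML, five time ranges, a country code) can be checked before a call. This Python sketch uses hypothetical parameter names (numResults, format, timeRange, country, prompt); only the ranges and allowed values come from the table. It checks the country code's shape only, since the node itself restricts you to its curated list.

```python
from typing import Any, Optional

TIME_RANGES = ("hour", "day", "week", "month", "year")
RESULT_FORMATS = ("markdown", "html")


def build_search_params(
    query: str,
    num_results: int,
    result_format: str = "markdown",
    time_range: Optional[str] = None,
    country: Optional[str] = None,
    rollup_prompt: Optional[str] = None,
) -> dict[str, Any]:
    """Validate Search node inputs and return hypothetical request params."""
    if not query.strip():
        raise ValueError("query is required")
    if not 1 <= num_results <= 20:
        raise ValueError("Number of Results must be between 1 and 20")
    if result_format not in RESULT_FORMATS:
        raise ValueError(f"result_format must be one of {RESULT_FORMATS}")
    if time_range is not None and time_range not in TIME_RANGES:
        raise ValueError(f"time_range must be one of {TIME_RANGES}")
    params: dict[str, Any] = {"query": query, "numResults": num_results, "format": result_format}
    if time_range is not None:
        params["timeRange"] = time_range
    if country is not None:
        # Shape check only; the node offers a curated list of 52 codes.
        if len(country) != 2 or not country.isalpha():
            raise ValueError("country must be a two-letter ISO code")
        params["country"] = country.lower()
    if rollup_prompt:
        # Use AI Rollup: one prompt applied across all fetched results.
        params["prompt"] = rollup_prompt
    return params
```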

Crawl

Asynchronous multi-page crawl with five operations:
[Screenshot: Operation dropdown on the Crawl resource showing Start, Get Status, Stop, Resume, Delete]
Start kicks off a crawl and returns a job ID; the other operations drive the rest of the lifecycle (poll, halt, resume, clean up).
[Screenshot: Crawl Start node with URL, Markdown format, Max Pages 50, Max Depth 2]
| Field | Notes |
|---|---|
| URL | Starting URL |
| Formats | Same multi-format model as Scrape; every crawled page is captured in each format you add |
| Max Pages | Default 50, maximum 1000 |
| Max Depth | Default 2 |
| Max Links per Page | Default 10 |
| Allow External Links | Off by default, which keeps the crawl on the starting domain |
| Include / Exclude Patterns | Glob-style URL filters |
| Content Types | Optional MIME-type filter (HTML, PDF, DOCX, images, …) |

Monitor

Cron-scheduled fetches with diff detection and webhook delivery. Eight operations cover the whole monitor lifecycle:
[Screenshot: Operation dropdown on the Monitor resource showing Create, Delete, Get, Get Activity, Get Many, Pause, Resume, Update]
Create schedules a recurring fetch; Get Activity returns recent ticks with diff flags so you can react to changes.
[Screenshot: Monitor Create node with URL, Name, cron interval, and a Markdown format]
| Field | Notes |
|---|---|
| URL | Page to monitor |
| Name | Human-readable label for the monitor |
| Interval (Cron) | Standard 5-field cron expression, e.g. */30 * * * * for every 30 minutes |
| Formats | Same multi-format model; each tick captures all configured formats |
| Webhook URL | Optional. Wire it to an n8n Webhook node for instant change notifications. |
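To sanity-check an Interval (Cron) value before creating a monitor, you can expand it locally. This minimal Python sketch handles the common field forms (*, */n, ranges, lists) and tests whether a given minute would fire. It simplifies one rule: it requires every field to match, whereas standard cron ORs day-of-month and day-of-week when both are restricted. How the monitor service evaluates the expression is not shown here.

```python
from datetime import datetime

# Allowed (min, max): minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one cron field ('*', '*/30', '1-5', '0,15,45') into its values."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"bad step in {field!r}")
        if base == "*":
            start, end = lo, hi
        elif "-" in base:
            first, last = base.split("-", 1)
            start, end = int(first), int(last)
        elif step_text:
            raise ValueError(f"a step needs '*' or a range in {field!r}")
        else:
            start = end = int(base)
        if not lo <= start <= end <= hi:
            raise ValueError(f"{field!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return values


def fires_at(expr: str, when: datetime) -> bool:
    """True if a 5-field cron expression matches the given minute."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 cron fields")
    allowed = [expand_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)]
    # isoweekday(): Monday=1 .. Sunday=7; % 7 maps Sunday to cron's 0.
    actual = [when.minute, when.hour, when.day, when.month, when.isoweekday() % 7]
    # Simplification: all fields must match (no day-of-month/day-of-week OR rule).
    return all(v in s for v, s in zip(actual, allowed))
```

For the example in the table, expand_field("*/30", 0, 59) yields the minutes 0 and 30, so the monitor ticks twice an hour.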

History

Look up past results by scrapeRefId. Used to retrieve full content for crawled pages (Crawl returns pointers, History fetches the bytes).
[Screenshot: History Get node with the Entry Resource Locator set to By ID]
| Field | Notes |
|---|---|
| Operation | Get (single entry by ID) or Get Many (paginated list) |
| Entry | Resource Locator: paste an ID directly, or use an expression like ={{ $json.scrapeRefId }} |
| Simplify | Toggle off to get the full v2 response payload |

Credit

Quick check on remaining credits and current plan. Zero-config — pick the resource, hit Test step.
[Screenshot: Credit Get node with only the Resource and Operation selectors]

Example workflow: crawl a site, save every page to Airtable

End-to-end walkthrough that chains Crawl → Wait → Crawl Status → Split Out → History → Airtable. The same pattern works for Notion, Google Sheets, Postgres, S3, or anywhere else n8n can write.
[Screenshot: full n8n workflow canvas: Manual Trigger → Crawl Start → Wait → Crawl Status → Split Out → History Get → Airtable]

1. Crawl → Start

Kick off the crawl. The node returns a cronId (the crawl job ID) that the remaining steps use to track the job.
| Field | Value |
|---|---|
| Resource | Crawl |
| Operation | Start |
| URL | https://scrapegraphai.com/ |
| Formats | One entry: Markdown (mode Normal) |
| Max Pages | 6 |
| Max Depth | 2 |

2. Wait

Add a Wait node (~60 seconds). Crawls are asynchronous — give the worker time to fetch a few pages before polling.

3. Crawl → Get Status

Pull the job state. When status is completed (or partial), the response includes a pages array with one entry per crawled page, each carrying the page URL, depth, title, and a scrapeRefId pointer to the stored result.
[Screenshot: Crawl Get Status node parameters, with the Resource Locator filled by an expression]
| Field | Value |
|---|---|
| Resource | Crawl |
| Operation | Get Status |
| Crawl ID | ={{ $('ScrapegraphAI').item.json.id }} (Resource Locator, expression) |
The name inside $('ScrapegraphAI') refers to the Crawl Start node by its name on the canvas; if you renamed that node, update the expression to match.
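A fixed Wait can fire before the crawl finishes. The logic of "poll until done" looks like the Python sketch below. fetch_status is a hypothetical stand-in for a Crawl → Get Status call, and "running" in the example is an assumed in-progress status; only completed and partial come from this guide. In n8n itself, the equivalent is usually an If node that routes back to the Wait node until the status check passes.

```python
import time
from typing import Any, Callable

DONE_STATES = {"completed", "partial"}  # statuses that carry a pages array


def wait_for_crawl(
    fetch_status: Callable[[str], dict[str, Any]],
    crawl_id: str,
    interval_s: float = 60.0,
    max_attempts: int = 10,
) -> dict[str, Any]:
    """Poll a crawl until it reports completed/partial, then return the response."""
    for attempt in range(max_attempts):
        status = fetch_status(crawl_id)  # hypothetical Get Status call
        if status.get("status") in DONE_STATES:
            return status
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # mirrors the Wait node between polls
    raise TimeoutError(f"crawl {crawl_id} not finished after {max_attempts} polls")
```

Capping the number of polls keeps a stuck job from looping forever, which is also worth doing in the n8n version of the loop.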

4. Split Out

Split the pages array into one item per page so the next node runs once per crawled URL.
[Screenshot: Split Out node configured to fan out the pages array]
| Field | Value |
|---|---|
| Field To Split Out | pages |
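Conceptually, Split Out turns one item holding a list into many items. This Python sketch models n8n items as {"json": {...}} dicts and assumes every element of pages is an object; it is a mental model of the fan-out, not the node's implementation.

```python
from typing import Any


def split_out(items: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Emit one n8n-style item per element of item["json"][field]."""
    out: list[dict[str, Any]] = []
    for item in items:
        for element in item["json"].get(field, []):
            # Each page object becomes the json payload of its own item.
            out.append({"json": element})
    return out
```

After the split, $json.scrapeRefId in the next node refers to one page's pointer, which is why History → Get runs once per crawled URL.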

5. History → Get

For each page, fetch the full content (markdown, HTML, JSON — whatever formats the crawl captured) using the scrapeRefId from Split Out.
| Field | Value |
|---|---|
| Resource | History |
| Operation | Get |
| Entry | ={{ $json.scrapeRefId }} (Resource Locator, expression) |
| Simplify | Off |

6. Airtable → Create

Map the page metadata and content into a row. Switch the Base and Table dropdowns to By ID mode, paste your IDs, then map each column with an expression:
[Screenshot: Airtable node parameters with five mapped column expressions]
| Column | Expression |
|---|---|
| URL | ={{ $('Split Out').item.json.url }} |
| Title | ={{ $('Split Out').item.json.title }} |
| Depth | ={{ $('Split Out').item.json.depth }} |
| ContentType | ={{ $json.metadata.contentType }} |
| Markdown | ={{ $json.result.results.markdown.data[0] }} |
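The five expressions read from two places: the Split Out item (url, title, depth) and the History Get response (metadata, result). This Python sketch reproduces that mapping so you can check the paths against a sample payload. The response paths are copied from the expressions above; the sample values in the example are illustrative, not real API output.

```python
from typing import Any


def to_airtable_row(page: dict[str, Any], history: dict[str, Any]) -> dict[str, Any]:
    """Build one Airtable row the same way the five column expressions do."""
    markdown_data = history["result"]["results"]["markdown"]["data"]
    return {
        "URL": page["url"],            # $('Split Out').item.json.url
        "Title": page.get("title"),    # $('Split Out').item.json.title
        "Depth": page["depth"],        # $('Split Out').item.json.depth
        "ContentType": history["metadata"]["contentType"],
        # data is a list; the expression takes its first element.
        "Markdown": markdown_data[0] if markdown_data else None,
    }
```

If a column comes out empty in Airtable, comparing the History Get output (with Simplify off) against these paths is usually the fastest way to find the mismatch.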

7. Run it

Hit Test workflow. The Airtable node runs once per crawled page and writes a row each time.
[Screenshot: Airtable base populated with one row per crawled page]

Output modes for AI Agent tools

When you attach the node as a tool to an n8n AI Agent, the Output parameter on Scrape / Extract / Search matters most, because it decides what the agent sees:
  • Simplified — flattened response with the most useful top-level fields (id, json, results, usage, …). Easiest for an LLM to reason over.
  • Raw — the full v2 API response, untouched.
  • Selected Fields — comma-separated allowlist of top-level keys.
Pick the mode that matches what your agent needs to see.
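The three Output modes can be pictured as a function over the top-level response keys. This Python sketch is an approximation: Raw returns the response untouched and Selected Fields applies a comma-separated allowlist, as described above, while Simplified is modeled as keeping a hypothetical fixed set of keys (id, json, results, usage). The node's real Simplified mode also flattens the response and may keep different keys.

```python
from typing import Any, Optional

# Hypothetical Simplified key set, taken from the examples in the list above.
SIMPLIFIED_KEYS = ("id", "json", "results", "usage")


def shape_output(response: dict[str, Any], mode: str, fields: Optional[str] = None) -> dict[str, Any]:
    """Approximate the Simplified / Raw / Selected Fields output modes."""
    if mode == "raw":
        return response  # the full v2 response, untouched
    if mode == "simplified":
        return {k: response[k] for k in SIMPLIFIED_KEYS if k in response}
    if mode == "selected":
        # Comma-separated allowlist of top-level keys; unknown keys are skipped.
        wanted = [f.strip() for f in (fields or "").split(",") if f.strip()]
        return {k: response[k] for k in wanted if k in response}
    raise ValueError(f"unknown output mode: {mode!r}")
```

Fewer keys mean fewer tokens in the agent's context, which is the main reason to prefer Simplified or Selected Fields over Raw for tool calls.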

Patterns that carry over

| Pattern | Resource(s) | Notes |
|---|---|---|
| One-shot fetch | Scrape | Use formats=[{type:"markdown"}] for the cheapest pass |
| Structured extraction | Extract, or Scrape with a JSON format | A JSON schema is optional but locks the shape |
| Multi-page archive | Crawl + History (this guide) | History → Get is how you retrieve the bytes a crawl captured |
| Recurring fetch with diff | Monitor | Wire the webhookUrl field to an n8n Webhook node for instant deltas |
| AI search rollup | Search with prompt | Single-call alternative to "search → scrape each result → summarize" |

Fetch Config

Five resources — Scrape, Extract, Search, Crawl, and Monitor — expose an optional Fetch Config collection that controls how each page is fetched. Open the dropdown on any of those operations to surface the eight knobs:
[Screenshot: Fetch Config dropdown on the Search node showing the eight available options]
| Field | Description |
|---|---|
| Mode | Fetch mode: Auto (default), Fast (skips JS rendering), or JS (executes scripts) |
| Stealth | Residential proxy plus anti-bot headers. Adds 5 credits per call |
| Country | Two-letter ISO country code for a geo-targeted proxy (e.g. us, de, jp) |
| Wait (Ms) | Milliseconds to wait after page load (0–30000) |
| Timeout (Ms) | Request timeout in milliseconds (1000–60000) |
| Scrolls | Number of page scrolls to trigger lazy-loaded content (0–100) |
| Headers (JSON) | Custom HTTP headers as a JSON object string |
| Cookies (JSON) | Cookies as a JSON object string |
Reach for Stealth, Mode = JS, and a Wait of 2000–5000 ms when a site blocks bots or only renders content after JavaScript runs. Combine with Country for region-locked pages.
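Because several Fetch Config fields have hard ranges and two take JSON strings, it can help to validate a config (for example in a Code node or a test script) before running a large crawl. This Python sketch applies the limits from the table and estimates the Stealth surcharge of 5 credits per call. The dictionary keys (mode, stealth, waitMs, timeoutMs, scrolls, country, headers, cookies) are hypothetical names for the UI fields, and the base cost of each call is not modeled.

```python
import json
from typing import Any

# Limits from the Fetch Config table; key names are hypothetical.
NUMERIC_LIMITS = {"waitMs": (0, 30000), "timeoutMs": (1000, 60000), "scrolls": (0, 100)}
MODES = ("auto", "fast", "js")
STEALTH_SURCHARGE = 5  # extra credits per call when Stealth is on


def validate_fetch_config(cfg: dict[str, Any]) -> dict[str, Any]:
    """Check a Fetch Config dict and parse its JSON-string fields."""
    if cfg.get("mode", "auto") not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    for key, (lo, hi) in NUMERIC_LIMITS.items():
        if key in cfg and not lo <= cfg[key] <= hi:
            raise ValueError(f"{key} must be between {lo} and {hi}")
    country = cfg.get("country")
    if country is not None and (len(country) != 2 or not country.isalpha()):
        raise ValueError("country must be a two-letter ISO code")
    checked = dict(cfg)
    for key in ("headers", "cookies"):
        if key in cfg:
            # Headers (JSON) and Cookies (JSON) must be JSON object strings.
            parsed = json.loads(cfg[key])
            if not isinstance(parsed, dict):
                raise ValueError(f"{key} must be a JSON object")
            checked[key] = parsed
    return checked


def stealth_credits(cfg: dict[str, Any], calls: int) -> int:
    """Extra credits Stealth adds on top of each call's base cost."""
    return STEALTH_SURCHARGE * calls if cfg.get("stealth") else 0
```

For a 50-page crawl with Stealth on, this estimates 250 extra credits, assuming each crawled page counts as one call.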

Troubleshooting

  • Unknown field name: "id" from Airtable — your column names don’t match. Switch the Airtable node’s mapping to Map Each Column Manually and only fill the columns that exist in your table.
  • Crawl Get Status returns pages: [] — the crawl is still running. Increase the Wait duration or poll until status === "completed".
  • History Get returns an old result: scrapeRefId always points to the latest result stored for that pointer, so trigger a fresh crawl to refresh it.
  • Credentials test fails — confirm the key is from the v2 dashboard. The node calls https://v2-api.scrapegraphai.com/api/credits; v1 keys won’t validate.

Resources

  • GitHub repo: source code, issue tracker, and release notes
  • n8n Community Nodes: how to install and trust community nodes in n8n
  • API Reference: full v2 endpoint reference, listing every parameter the node sends
  • Dashboard: get an API key and check usage