

Overview

The official n8n-nodes-scrapegraphai community node exposes the full v2 API as a single node with seven resources: Scrape, Extract, Search, Crawl, Monitor, History, and Credit. Drop it into any n8n workflow, point it at a URL, and you get markdown, structured JSON, screenshots, or a recurring monitor — wired into the rest of your stack via the 400+ nodes n8n already ships with.

- Package on npm: n8n-nodes-scrapegraphai
- Source on GitHub: issues, PRs, and the changelog

Installation

Inside your n8n instance, open Settings → Community Nodes → Install and enter:
n8n-nodes-scrapegraphai
Acknowledge the risks prompt and install. The node appears as ScrapeGraphAI in the node panel.
Self-hosted n8n only — n8n Cloud does not yet allow community nodes. If you don’t have a host, follow the self-hosting guide.

Credentials

Add a new ScrapeGraphAI API credential and paste your API key. n8n will hit GET /api/credits to verify the key — a green banner confirms it works.
Get your API key from the ScrapeGraphAI dashboard.
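
Outside n8n, you can sanity-check a key with the same call. A minimal sketch, assuming the SGAI-APIKEY header name (the endpoint is the one listed under Troubleshooting below; the header name and response handling are assumptions):

```ts
// Hedged sketch: verify a ScrapeGraphAI key the way the credential test does.
// The endpoint comes from this page; the SGAI-APIKEY header name is assumed.
const res = await fetch("https://v2-api.scrapegraphai.com/api/credits", {
  headers: { "SGAI-APIKEY": process.env.SGAI_API_KEY ?? "" },
});
if (!res.ok) throw new Error(`Key check failed: HTTP ${res.status}`);
console.log(await res.json()); // remaining credits / plan info on success
```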

What’s in the node

| Resource | Operations | What it does |
| --- | --- | --- |
| Scrape | scrape | Fetch a page in markdown, HTML, JSON (AI-extracted), screenshot, links, summary, branding, or any combination |
| Extract | extract | Run a natural-language prompt over a URL, raw HTML, or markdown — optional JSON schema |
| Search | search | AI web search with inline content; optional rollup prompt across results |
| Crawl | start, getStatus, stop, resume, delete | Async multi-page crawls with patterns, depth, and per-page formats |
| Monitor | create, list, get, update, pause, resume, delete, activity | Cron-scheduled fetches with diff detection and webhooks |
| History | get, list | Look up past results by scrapeRefId — used to fetch full content for crawled pages |
| Credit | get | Check remaining credits and plan |
Every content-producing operation (Scrape / Extract / Search) exposes an Output parameter with three modes — Simplified, Raw, or Selected Fields — so the response shape stays predictable when chained into AI Agent tools or downstream nodes.

Example: crawl a site, save every page to Airtable

This walkthrough uses Crawl to discover pages, History to fetch each page’s full content, and an Airtable node to land the rows. The same pattern works for Notion, Google Sheets, Postgres, S3 — anywhere n8n can write.

[Workflow canvas: Manual Trigger → Crawl Start → Wait → Crawl Status → Split Out → History Get → Airtable]

1. Crawl → Start

Kick off the crawl. The node returns the crawl job ID (the id field that step 3 reads back), which the rest of the workflow chases.

[Crawl Start node parameters: URL, formats, max pages, max depth]
| Field | Value |
| --- | --- |
| Resource | Crawl |
| Operation | Start |
| URL | https://scrapegraphai.com/ |
| Formats | one entry, Markdown (mode Normal) |
| Max Pages | 6 |
| Max Depth | 2 |
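
For orientation, a hedged sketch of the item this step emits. Only the id field is confirmed by the expression step 3 uses; everything else is an assumption:

```ts
// Assumed shape of the Crawl → Start output item.
interface CrawlStartItem {
  id: string;      // crawl job ID; step 3 reads it as $('ScrapegraphAI').item.json.id
  status?: string; // hypothetical extra field; the live response may differ
}
```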

2. Wait

Add a Wait node (~60 seconds). Crawls are asynchronous — give the worker time to fetch a few pages before polling.

3. Crawl → Get Status

Pull the job state. When status is completed (or partial), the response includes a pages array with one entry per crawled page — each carrying the page URL, depth, title, and a scrapeRefId pointer to the stored result.

[Crawl Get Status node parameters with the Resource Locator filled by an expression]
| Field | Value |
| --- | --- |
| Resource | Crawl |
| Operation | Get Status |
| Crawl ID | ={{ $('ScrapegraphAI').item.json.id }} (Resource Locator, expression) |
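
A hedged sketch of the status item once pages start landing. The pages[] field names (url, depth, title, scrapeRefId) come from this guide; status values other than completed and partial are assumptions:

```ts
// Assumed shape of the Crawl → Get Status item.
interface CrawlStatusItem {
  status: "queued" | "running" | "partial" | "completed"; // first two assumed
  pages: Array<{
    url: string;          // page URL
    depth: number;        // crawl depth at which the page was found
    title: string;        // page title
    scrapeRefId: string;  // pointer consumed by History → Get in step 5
  }>;
}
```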

4. Split Out

Split the pages array into one item per page so the next node runs once per crawled URL.

[Split Out node configured to fan out the pages array]
| Field | Value |
| --- | --- |
| Field To Split Out | pages |
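
Conceptually, the fan-out looks like this (a sketch of the transformation with illustrative values, not n8n internals):

```ts
// One input item carrying pages[] becomes one output item per page.
const input = {
  status: "completed",
  pages: [
    { url: "https://scrapegraphai.com/", depth: 0, scrapeRefId: "ref-1" },
    { url: "https://scrapegraphai.com/pricing", depth: 1, scrapeRefId: "ref-2" },
  ],
};
const items = input.pages.map((page) => ({ json: page }));
// items[0].json.scrapeRefId is what step 5 reads as {{ $json.scrapeRefId }}
```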

5. History → Get

For each page, fetch the full content (markdown, HTML, JSON — whatever formats the crawl captured) using the scrapeRefId from Split Out.

[History Get node with the scrapeRefId expression in the Entry Resource Locator]
| Field | Value |
| --- | --- |
| Resource | History |
| Operation | Get |
| Entry | ={{ $json.scrapeRefId }} (Resource Locator, expression) |
| Simplify | off |
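
With Simplify off, the item carries the full history entry. A hedged sketch trimmed to the two paths step 6 reads (the paths come from the Airtable expressions below; the surrounding structure is an assumption):

```ts
// Assumed shape of the History → Get item, reduced to the fields step 6 maps.
interface HistoryGetItem {
  metadata: { contentType: string }; // read as $json.metadata.contentType
  result: {
    results: {
      markdown: { data: string[] };  // $json.result.results.markdown.data[0]
    };
  };
}
```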

6. Airtable → Create

Map the page metadata + content into a row. Switch the Base and Table dropdowns to By ID mode and paste your IDs, then map fields with expressions:

[Airtable node parameters with five mapped column expressions]
| Column | Expression |
| --- | --- |
| URL | ={{ $('Split Out').item.json.url }} |
| Title | ={{ $('Split Out').item.json.title }} |
| Depth | ={{ $('Split Out').item.json.depth }} |
| ContentType | ={{ $json.metadata.contentType }} |
| Markdown | ={{ $json.result.results.markdown.data[0] }} |

7. Run it

Hit Test workflow. The node fires once per crawled page and writes a row each time:

[Airtable base populated with one row per crawled page]

Output modes for AI Agent tools

When you attach the node as a tool to an n8n AI Agent, the Output parameter on Scrape / Extract / Search becomes load-bearing:
  • Simplified — flattened response with the most useful top-level fields (id, json, results, usage, …). Easiest for an LLM to reason over.
  • Raw — the full v2 API response, untouched.
  • Selected Fields — comma-separated allowlist of top-level keys.
Pick the mode that matches what your agent needs to see.
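
Illustratively, how the three modes relate, assuming a raw response with the top-level keys this page names (the filtering logic shown is an assumption, not the node's source):

```ts
// Raw: the v2 response untouched. Simplified: a flattened subset of useful
// top-level fields. Selected Fields: an explicit allowlist, sketched here.
const raw: Record<string, unknown> = { id: "…", json: {}, results: {}, usage: {}, debug: {} }; // `debug` is hypothetical

const selectFields = (keys: string[]) =>
  Object.fromEntries(Object.entries(raw).filter(([key]) => keys.includes(key)));

console.log(selectFields(["id", "results", "usage"])); // drops everything else
```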

Patterns that carry over

| Pattern | Resource(s) | Notes |
| --- | --- | --- |
| One-shot fetch | Scrape | Use formats=[{type:"markdown"}] for the cheapest pass (sketch below) |
| Structured extraction | Extract or Scrape with JSON format | JSON schema is optional but locks the shape |
| Multi-page archive | Crawl + History (this guide) | History → Get is how you retrieve the bytes a crawl captured |
| Recurring fetch with diff | Monitor | Wire the webhookUrl field to an n8n Webhook node for instant deltas |
| AI search rollup | Search with prompt | Single-call alternative to “search → scrape each result → summarize” |
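
For the one-shot fetch row, a hedged sketch of the configuration behind the cheapest pass; the formats spelling matches the table, the rest of the body is an assumption:

```ts
// Minimal one-shot Scrape configuration: a single markdown format entry.
const scrapeBody = {
  url: "https://scrapegraphai.com/", // any target page
  formats: [{ type: "markdown" }],   // one format = the cheapest pass
};
```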

Troubleshooting

  • Unknown field name: "id" from Airtable — your column names don’t match. Switch the Airtable node’s mapping to Map Each Column Manually and only fill the columns that exist in your table.
  • Crawl Get Status returns pages: [] — the crawl is still running. Increase the Wait duration or poll until status === "completed" (a minimal polling sketch follows this list).
  • History Get returns an old result — scrapeRefId always points to the latest result for that pointer. Trigger a fresh crawl to refresh.
  • Credentials test fails — confirm the key is from the v2 dashboard. The node calls https://v2-api.scrapegraphai.com/api/credits; v1 keys won’t validate.
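
For the second bullet, a minimal polling sketch outside n8n (inside a workflow you would loop Wait → Get Status → IF instead; the getStatus callback is a hypothetical stand-in for the node's operation):

```ts
// Poll until the crawl reports completed (or partial), backing off between calls.
async function waitForCrawl(
  getStatus: () => Promise<{ status: string }>,
): Promise<string> {
  for (;;) {
    const { status } = await getStatus();
    if (status === "completed" || status === "partial") return status;
    await new Promise((resolve) => setTimeout(resolve, 15_000)); // wait 15s
  }
}
```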

Resources

- GitHub repo: source code, issue tracker, and release notes
- n8n Community Nodes: how to install and trust community nodes in n8n
- API Reference: full v2 endpoint reference — every parameter the node sends
- Dashboard: get an API key and check usage