
ScrapeGraph MCP Server

License: MIT · Python 3.13+

A production‑ready Model Context Protocol (MCP) server that connects LLMs to the ScrapeGraph AI API for AI‑powered web scraping, research, and crawling.

⭐ Star us on GitHub

If this server is helpful, a star goes a long way. Thanks!

Key Features

  • Full v2 API coverage: scrape, extract, search, crawl (+ stop/resume), monitor lifecycle (+ activity polling), credits, history, and schema generation
  • Uses the v2 API base URL (https://api.scrapegraphai.com/api/v2) with the SGAI-APIKEY header — wire format matches scrapegraph-py v2
  • Remote HTTP MCP endpoint and local Python server support
  • Works with Cursor, Claude Desktop, and any MCP‑compatible client
  • Robust error handling, timeouts, and production‑tested reliability

The MCP server is now on v2 (scrapegraph-mcp@2.0.0). The v1 tools `sitemap`, `agentic_scrapper`, `markdownify_status`, and `smartscraper_status` have been removed. See scrapegraph-mcp#16 for the migration details.

Get Your API Key

Create an account and copy your API key from the ScrapeGraph Dashboard.
Endpoint: `https://mcp.scrapegraphai.com/mcp`

Follow the instructions below.

Cursor (HTTP MCP)

Add this to your Cursor MCP settings (`~/.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "scrapegraph-mcp": {
      "url": "https://mcp.scrapegraphai.com/mcp",
      "headers": {
        "X-API-Key": "YOUR_API_KEY"
      }
    }
  }
}
```

Claude Desktop (via mcp-remote)

Claude Desktop connects to HTTP MCP via a lightweight proxy. Add the following to `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS (adjust the path on Windows):

```json
{
  "mcpServers": {
    "scrapegraph-mcp": {
      "command": "npx",
      "args": [
        "mcp-remote@0.1.25",
        "https://mcp.scrapegraphai.com/mcp",
        "--header",
        "X-API-Key:YOUR_API_KEY"
      ]
    }
  }
}
```

Smithery (optional)

```shell
npx -y @smithery/cli install @ScrapeGraphAI/scrapegraph-mcp --client claude
```

Local Usage (Python)

Prefer running locally? Install and wire the server via stdio.

Install

```shell
pip install -e .
# or
uv pip install -e .
```

Set your key:

```shell
# macOS/Linux
export SGAI_API_KEY=your-api-key-here

# Windows (PowerShell)
$env:SGAI_API_KEY="your-api-key-here"
```

Run the server

```shell
scrapegraph-mcp
# or
python -m scrapegraph_mcp.server
```


Configuration

The server reads the ScrapeGraph API key from SGAI_API_KEY (local) or the X-API-Key header (remote). Environment variables align 1:1 with the Python SDK:
| Variable | Description | Default |
| --- | --- | --- |
| `SGAI_API_KEY` | ScrapeGraph API key | — |
| `SGAI_API_URL` | Override the v2 API base URL | `https://api.scrapegraphai.com/api/v2` |
| `SGAI_TIMEOUT` | Request timeout in seconds | `120` |
| `SCRAPEGRAPH_API_BASE_URL` | Legacy alias for `SGAI_API_URL` (still honored) | — |
| `SGAI_TIMEOUT_S` | Legacy alias for `SGAI_TIMEOUT` (still honored) | — |
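The table above implies a resolution order: the primary variable wins, then the legacy alias, then the default. A minimal sketch of that precedence (not the server's actual implementation):

```python
def resolve_config(env: dict) -> dict:
    """Resolve API settings from environment variables with legacy aliases."""
    base_url = (
        env.get("SGAI_API_URL")
        or env.get("SCRAPEGRAPH_API_BASE_URL")  # legacy alias, still honored
        or "https://api.scrapegraphai.com/api/v2"
    )
    timeout = int(env.get("SGAI_TIMEOUT") or env.get("SGAI_TIMEOUT_S") or 120)
    return {"api_key": env.get("SGAI_API_KEY"), "base_url": base_url, "timeout": timeout}

cfg = resolve_config({"SGAI_API_KEY": "sk-test", "SGAI_TIMEOUT_S": "60"})
```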

Available Tools

The server exposes the full v2 API surface.

Content tools

All content tools accept the same FetchConfig passthrough parameters: mode (auto | fast | js), stealth, timeout, wait, scrolls, country, headers, cookies, mock.

markdownify

Convert a webpage to clean markdown (wraps v2 POST /scrape with a markdown format entry).
```python
markdownify(website_url: str, **fetch_config)
```
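As a rough sketch of what this wraps: the tool posts to the v2 `/scrape` endpoint with a markdown format entry. The base URL and `SGAI-APIKEY` header come from this doc; the payload field names (`website_url`, `formats`) are assumptions for illustration only.

```python
API_BASE = "https://api.scrapegraphai.com/api/v2"

def markdownify_request(website_url: str, api_key: str, **fetch_config) -> dict:
    """Build the request the tool would send (no I/O performed here)."""
    return {
        "url": f"{API_BASE}/scrape",
        "headers": {"SGAI-APIKEY": api_key, "Content-Type": "application/json"},
        "body": {"website_url": website_url, "formats": [{"type": "markdown"}], **fetch_config},
    }

req = markdownify_request("https://example.com", "YOUR_API_KEY", stealth=True)
```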

scrape

Fetch a URL via v2 POST /scrape with a single format entry.
```python
scrape(
  website_url: str,
  output_format: str = "markdown",   # markdown | html | screenshot | branding | links | images | summary
  screenshot_full_page: bool = False,
  content_type: str | None = None,
  **fetch_config,
)
```

smartscraper

AI‑powered structured extraction (v2 POST /extract).
```python
smartscraper(
  user_prompt: str,
  website_url: str,
  output_schema: dict | str | None = None,
  **fetch_config,
)
```
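A sketch of how these arguments could assemble into an `/extract` request body, with the optional schema only included when provided. The field names mirror the signature above but are assumptions about the wire format:

```python
def extract_payload(user_prompt: str, website_url: str, output_schema=None, **fetch_config) -> dict:
    """Assemble an /extract body; optional schema is only sent when set."""
    body = {"user_prompt": user_prompt, "website_url": website_url, **fetch_config}
    if output_schema is not None:
        body["output_schema"] = output_schema
    return body

body = extract_payload(
    "List each product name and price",
    "https://example.com/shop",
    output_schema={"type": "object", "properties": {"products": {"type": "array"}}},
    mode="js",
)
```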

searchscraper

Search the web and optionally extract structured results (v2 POST /search).
```python
searchscraper(
  user_prompt: str,                   # maps to the v2 `query` field
  num_results: int | None = None,     # 1–20
  output_schema: dict | str | None = None,
  prompt: str | None = None,          # required when output_schema is set
  country_search: str | None = None,  # locationGeoCode (e.g. "us", "it")
  time_range: str | None = None,      # past_hour | past_24_hours | past_week | past_month | past_year
  search_format: str = "markdown",    # markdown | html
  search_mode: str = "prune",         # prune | normal
  **fetch_config,
)
```
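The comments above encode two constraints worth making explicit: `user_prompt` maps onto the v2 `query` field, and `prompt` is required whenever `output_schema` is set. A small validation sketch (the error handling is illustrative, not the server's actual behavior):

```python
def search_args(user_prompt, num_results=None, output_schema=None, prompt=None):
    """Validate and map searchscraper arguments onto v2 field names."""
    if output_schema is not None and prompt is None:
        raise ValueError("prompt is required when output_schema is set")
    if num_results is not None and not 1 <= num_results <= 20:
        raise ValueError("num_results must be between 1 and 20")
    args = {"query": user_prompt}  # user_prompt -> v2 `query`
    if num_results is not None:
        args["num_results"] = num_results
    if output_schema is not None:
        args.update(output_schema=output_schema, prompt=prompt)
    return args

args = search_args("latest MCP servers", num_results=5)
```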

generate_schema

Generate or augment a JSON Schema from a prompt (v2 POST /schema).
```python
generate_schema(
  prompt: str,
  existing_schema: dict | str | None = None,
  model: str | None = None,
)
```

Crawl tools

smartcrawler_initiate

Start a multi‑page crawl. extraction_mode defaults to markdown (also: html, links, images, summary, branding, screenshot).
```python
smartcrawler_initiate(
  url: str,
  extraction_mode: str = "markdown",   # markdown | html | links | images | summary | branding | screenshot
  depth: int | None = None,            # v2 maxDepth
  max_pages: int | None = None,
  max_links_per_page: int | None = None,
  allow_external: bool = False,
  include_patterns: list[str] | None = None,
  exclude_patterns: list[str] | None = None,
  content_types: list[str] | None = None,
  # FetchConfig passthrough
  mode: str | None = None,             # auto | fast | js
  stealth: bool | None = None,
  timeout: int | None = None,
  wait: int | None = None,
  scrolls: int | None = None,
  country: str | None = None,
  headers: dict | None = None,
  cookies: dict | None = None,
)
```
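To picture how `include_patterns` and `exclude_patterns` might gate which links a crawl follows, here is a glob-style sketch. The actual matching runs server-side and its semantics may differ; `fnmatch`-style globbing is an assumption for illustration.

```python
from fnmatch import fnmatch

def link_allowed(url: str, include_patterns=None, exclude_patterns=None) -> bool:
    """Exclude patterns win over includes; no include patterns means all pass."""
    if exclude_patterns and any(fnmatch(url, p) for p in exclude_patterns):
        return False
    if include_patterns:
        return any(fnmatch(url, p) for p in include_patterns)
    return True
```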

smartcrawler_fetch_results

Poll status / results for a crawl.
```python
smartcrawler_fetch_results(request_id: str)
```
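Crawls are asynchronous, so clients typically poll this tool until the job reaches a terminal state. A minimal polling loop, where `fetch` stands in for the MCP call and the terminal status names are assumptions for this sketch:

```python
import time

def wait_for_crawl(request_id: str, fetch, interval: float = 2.0, max_polls: int = 60) -> dict:
    """Poll `fetch` until the crawl reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        result = fetch(request_id)
        if result.get("status") in ("completed", "failed", "stopped"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"crawl {request_id} still running after {max_polls} polls")

# Stubbed fetcher standing in for the real smartcrawler_fetch_results call:
responses = iter([{"status": "running"}, {"status": "completed", "pages": 3}])
final = wait_for_crawl("req-123", lambda rid: next(responses), interval=0.0)
```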

crawl_stop

Stop a running crawl.
```python
crawl_stop(request_id: str)
```

crawl_resume

Resume a paused / stopped crawl.
```python
crawl_resume(request_id: str)
```

Monitor tools

These tools replace the v1 “scheduled jobs”. monitor_create wraps the supplied prompt (+ optional output_schema) into a v2 {type: "json", ...} format entry for you.
```python
monitor_create(
  url: str,
  prompt: str,                      # what to extract on each run
  interval: str,                    # 5-field cron expression
  name: str | None = None,
  webhook_url: str | None = None,
  output_schema: dict | str | None = None,
  **fetch_config,
)
monitor_list()
monitor_get(monitor_id: str)
monitor_pause(monitor_id: str)
monitor_resume(monitor_id: str)
monitor_delete(monitor_id: str)
monitor_activity(
  monitor_id: str,
  limit: int | None = None,         # 1–100, default 20
  cursor: str | None = None,        # pagination cursor
)
```
monitor_activity returns the tick history (id, createdAt, status, changed, elapsedMs, diffs) plus a nextCursor when more results are available — mirrors sgai.monitor.activity() in the SDKs.
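Collecting the full tick history means following nextCursor until it is absent. A pagination sketch where `fetch` stands in for the MCP call and `ticks` as the list field name is an assumption:

```python
def all_activity(monitor_id: str, fetch, limit: int = 20) -> list:
    """Page through monitor activity until nextCursor is no longer returned."""
    ticks, cursor = [], None
    while True:
        page = fetch(monitor_id, limit=limit, cursor=cursor)
        ticks.extend(page.get("ticks", []))
        cursor = page.get("nextCursor")
        if not cursor:
            return ticks

# Stubbed two-page response standing in for the real monitor_activity call:
pages = {
    None: {"ticks": [{"id": "t1"}], "nextCursor": "c1"},
    "c1": {"ticks": [{"id": "t2"}]},
}
history = all_activity("mon-1", lambda mid, limit, cursor: pages[cursor])
```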

Account tools

credits

Get the remaining credit balance.
```python
credits()
```

sgai_history

Browse paginated request history, optionally filtered by service.
```python
sgai_history(
  service: str | None = None,   # scrape | extract | search | monitor | crawl
  page: int | None = None,
  limit: int | None = None,
)
```

Troubleshooting

  • Verify your key is present in config (X-API-Key for remote, SGAI_API_KEY for local).
  • Claude Desktop logs:
    • macOS: ~/Library/Logs/Claude/
    • Windows: %APPDATA%\Claude\Logs\
  • If a long crawl is “still running”, keep polling smartcrawler_fetch_results.

License

MIT License – see LICENSE file for details.