
ScrapeGraph MCP Server

License: MIT · Python 3.13+

A production‑ready Model Context Protocol (MCP) server that connects LLMs to the ScrapeGraph AI API for AI‑powered web scraping, research, and crawling.

⭐ Star us on GitHub

If this server is helpful, a star goes a long way. Thanks!

Key Features

  • Full v2 API coverage: scrape, extract, search, crawl (+ stop/resume), monitor lifecycle (+ activity polling), credits, history, and schema generation
  • Uses the v2 API base URL (https://api.scrapegraphai.com/api/v2) with the SGAI-APIKEY header — wire format matches scrapegraph-py v2
  • Remote HTTP MCP endpoint and local Python server support
  • Works with Cursor, Claude Desktop, and any MCP‑compatible client
  • Robust error handling, timeouts, and production‑tested reliability

The MCP server is now on v2 (scrapegraph-mcp@2.0.0). The v1 tools `sitemap`, `agentic_scrapper`, `markdownify_status`, and `smartscraper_status` have been removed. See scrapegraph-mcp#16 for the migration details.

Get Your API Key

Create an account and copy your API key from the ScrapeGraph Dashboard.
Endpoint: `https://mcp.scrapegraphai.com/mcp`

Follow the instructions below.

Cursor (HTTP MCP)

Add this to your Cursor MCP settings (`~/.cursor/mcp.json`):

```json
{
  "mcpServers": {
    "scrapegraph-mcp": {
      "url": "https://mcp.scrapegraphai.com/mcp",
      "headers": {
        "X-API-Key": "YOUR_API_KEY"
      }
    }
  }
}
```

Claude Desktop (via mcp-remote)

Claude Desktop connects to HTTP MCP via a lightweight proxy. Add the following to `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS (adjust the path on Windows):

```json
{
  "mcpServers": {
    "scrapegraph-mcp": {
      "command": "npx",
      "args": [
        "mcp-remote@0.1.25",
        "https://mcp.scrapegraphai.com/mcp",
        "--header",
        "X-API-Key:YOUR_API_KEY"
      ]
    }
  }
}
```

Smithery (optional)

```shell
npx -y @smithery/cli install @ScrapeGraphAI/scrapegraph-mcp --client claude
```

Local Usage (Python)

Prefer running locally? Install and wire the server via stdio.

Install

```shell
pip install -e .
# or
uv pip install -e .
```

Set your key:

```shell
# macOS/Linux
export SGAI_API_KEY=your-api-key-here

# Windows (PowerShell)
$env:SGAI_API_KEY="your-api-key-here"
```

Run the server

```shell
scrapegraph-mcp
# or
python -m scrapegraph_mcp.server
```


Configuration

The server reads the ScrapeGraph API key from SGAI_API_KEY (local) or the X-API-Key header (remote). Environment variables align 1:1 with the Python SDK:
| Variable | Description | Default |
| --- | --- | --- |
| `SGAI_API_KEY` | ScrapeGraph API key | — |
| `SGAI_API_URL` | Override the v2 API base URL | `https://api.scrapegraphai.com/api/v2` |
| `SGAI_TIMEOUT` | Request timeout in seconds | `120` |
| `SCRAPEGRAPH_API_BASE_URL` | Legacy alias for `SGAI_API_URL` (still honored) | — |
| `SGAI_TIMEOUT_S` | Legacy alias for `SGAI_TIMEOUT` (still honored) | — |
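The table above implies a resolution order: the primary variable wins, then the legacy alias, then the default. A minimal sketch of that precedence (not the server's actual implementation):

```python
def resolve_config(env: dict) -> dict:
    """Resolve API settings from environment variables with legacy aliases."""
    base_url = (
        env.get("SGAI_API_URL")
        or env.get("SCRAPEGRAPH_API_BASE_URL")  # legacy alias, still honored
        or "https://api.scrapegraphai.com/api/v2"
    )
    timeout = int(env.get("SGAI_TIMEOUT") or env.get("SGAI_TIMEOUT_S") or 120)
    return {"api_key": env.get("SGAI_API_KEY"), "base_url": base_url, "timeout": timeout}

cfg = resolve_config({"SGAI_API_KEY": "sk-test", "SGAI_TIMEOUT_S": "60"})
```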

Available Tools

The server exposes the full v2 API surface.

Content tools

All content tools accept the same FetchConfig passthrough parameters: mode (auto | fast | js), stealth, timeout, wait, scrolls, country, headers, cookies, mock.

markdownify

Convert a webpage to clean markdown (wraps v2 POST /scrape with a markdown format entry).
```python
markdownify(website_url: str, **fetch_config)
```
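As a rough sketch of what this wraps: the tool posts to the v2 `/scrape` endpoint with a markdown format entry. The base URL and `SGAI-APIKEY` header come from this doc; the payload field names (`website_url`, `formats`) are assumptions for illustration only.

```python
API_BASE = "https://api.scrapegraphai.com/api/v2"

def markdownify_request(website_url: str, api_key: str, **fetch_config) -> dict:
    """Build the request the tool would send (no I/O performed here)."""
    return {
        "url": f"{API_BASE}/scrape",
        "headers": {"SGAI-APIKEY": api_key, "Content-Type": "application/json"},
        "body": {"website_url": website_url, "formats": [{"type": "markdown"}], **fetch_config},
    }

req = markdownify_request("https://example.com", "YOUR_API_KEY", stealth=True)
```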

scrape

Fetch a URL via v2 POST /scrape with a single format entry.
```python
scrape(
  website_url: str,
  output_format: str = "markdown",   # markdown | html | screenshot | branding | links | images | summary
  screenshot_full_page: bool = False,
  content_type: str | None = None,
  **fetch_config,
)
```

smartscraper

AI‑powered structured extraction (v2 POST /extract).
```python
smartscraper(
  user_prompt: str,
  website_url: str,
  output_schema: dict | str | None = None,
  **fetch_config,
)
```
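A sketch of how these arguments could assemble into an `/extract` request body, with the optional schema only included when provided. The field names mirror the signature above but are assumptions about the wire format:

```python
def extract_payload(user_prompt: str, website_url: str, output_schema=None, **fetch_config) -> dict:
    """Assemble an /extract body; optional schema is only sent when set."""
    body = {"user_prompt": user_prompt, "website_url": website_url, **fetch_config}
    if output_schema is not None:
        body["output_schema"] = output_schema
    return body

body = extract_payload(
    "List each product name and price",
    "https://example.com/shop",
    output_schema={"type": "object", "properties": {"products": {"type": "array"}}},
    mode="js",
)
```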

searchscraper

Search the web and optionally extract structured results (v2 POST /search).
```python
searchscraper(
  user_prompt: str,                   # maps to the v2 `query` field
  num_results: int | None = None,     # 1–20
  output_schema: dict | str | None = None,
  prompt: str | None = None,          # required when output_schema is set
  country_search: str | None = None,  # locationGeoCode (e.g. "us", "it")
  time_range: str | None = None,      # past_hour | past_24_hours | past_week | past_month | past_year
  search_format: str = "markdown",    # markdown | html
  search_mode: str = "prune",         # prune | normal
  **fetch_config,
)
```
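The comments above encode two constraints worth making explicit: `user_prompt` maps onto the v2 `query` field, and `prompt` is required whenever `output_schema` is set. A small validation sketch (the error handling is illustrative, not the server's actual behavior):

```python
def search_args(user_prompt, num_results=None, output_schema=None, prompt=None):
    """Validate and map searchscraper arguments onto v2 field names."""
    if output_schema is not None and prompt is None:
        raise ValueError("prompt is required when output_schema is set")
    if num_results is not None and not 1 <= num_results <= 20:
        raise ValueError("num_results must be between 1 and 20")
    args = {"query": user_prompt}  # user_prompt -> v2 `query`
    if num_results is not None:
        args["num_results"] = num_results
    if output_schema is not None:
        args.update(output_schema=output_schema, prompt=prompt)
    return args

args = search_args("latest MCP servers", num_results=5)
```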

generate_schema

Generate or augment a JSON Schema from a prompt (v2 POST /schema).
```python
generate_schema(
  prompt: str,
  existing_schema: dict | str | None = None,
  model: str | None = None,
)
```

Crawl tools

smartcrawler_initiate

Start a multi‑page crawl. extraction_mode defaults to markdown (also: html, links, images, summary, branding, screenshot).
```python
smartcrawler_initiate(
  url: str,
  extraction_mode: str = "markdown",   # markdown | html | links | images | summary | branding | screenshot
  depth: int | None = None,            # v2 maxDepth
  max_pages: int | None = None,
  max_links_per_page: int | None = None,
  allow_external: bool = False,
  include_patterns: list[str] | None = None,
  exclude_patterns: list[str] | None = None,
  content_types: list[str] | None = None,
  # FetchConfig passthrough
  mode: str | None = None,             # auto | fast | js
  stealth: bool | None = None,
  timeout: int | None = None,
  wait: int | None = None,
  scrolls: int | None = None,
  country: str | None = None,
  headers: dict | None = None,
  cookies: dict | None = None,
)
```
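To picture how `include_patterns` and `exclude_patterns` might gate which links a crawl follows, here is a glob-style sketch. The actual matching runs server-side and its semantics may differ; `fnmatch`-style globbing is an assumption for illustration.

```python
from fnmatch import fnmatch

def link_allowed(url: str, include_patterns=None, exclude_patterns=None) -> bool:
    """Exclude patterns win over includes; no include patterns means all pass."""
    if exclude_patterns and any(fnmatch(url, p) for p in exclude_patterns):
        return False
    if include_patterns:
        return any(fnmatch(url, p) for p in include_patterns)
    return True
```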

smartcrawler_fetch_results

Poll status / results for a crawl.
```python
smartcrawler_fetch_results(request_id: str)
```
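Crawls are asynchronous, so clients typically poll this tool until the job reaches a terminal state. A minimal polling loop, where `fetch` stands in for the MCP call and the terminal status names are assumptions for this sketch:

```python
import time

def wait_for_crawl(request_id: str, fetch, interval: float = 2.0, max_polls: int = 60) -> dict:
    """Poll `fetch` until the crawl reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        result = fetch(request_id)
        if result.get("status") in ("completed", "failed", "stopped"):
            return result
        time.sleep(interval)
    raise TimeoutError(f"crawl {request_id} still running after {max_polls} polls")

# Stubbed fetcher standing in for the real smartcrawler_fetch_results call:
responses = iter([{"status": "running"}, {"status": "completed", "pages": 3}])
final = wait_for_crawl("req-123", lambda rid: next(responses), interval=0.0)
```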

crawl_stop

Stop a running crawl.
```python
crawl_stop(request_id: str)
```

crawl_resume

Resume a paused / stopped crawl.
```python
crawl_resume(request_id: str)
```

Monitor tools

These tools replace the v1 “scheduled jobs”. monitor_create wraps the supplied prompt (+ optional output_schema) into a v2 {type: "json", ...} format entry for you.
```python
monitor_create(
  url: str,
  prompt: str,                      # what to extract on each run
  interval: str,                    # 5-field cron expression
  name: str | None = None,
  webhook_url: str | None = None,
  output_schema: dict | str | None = None,
  **fetch_config,
)
monitor_list()
monitor_get(monitor_id: str)
monitor_pause(monitor_id: str)
monitor_resume(monitor_id: str)
monitor_delete(monitor_id: str)
monitor_activity(
  monitor_id: str,
  limit: int | None = None,         # 1–100, default 20
  cursor: str | None = None,        # pagination cursor
)
```
monitor_activity returns the tick history (id, createdAt, status, changed, elapsedMs, diffs) plus a nextCursor when more results are available — mirrors sgai.monitor.activity() in the SDKs.
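Collecting the full tick history means following nextCursor until it is absent. A pagination sketch where `fetch` stands in for the MCP call and `ticks` as the list field name is an assumption:

```python
def all_activity(monitor_id: str, fetch, limit: int = 20) -> list:
    """Page through monitor activity until nextCursor is no longer returned."""
    ticks, cursor = [], None
    while True:
        page = fetch(monitor_id, limit=limit, cursor=cursor)
        ticks.extend(page.get("ticks", []))
        cursor = page.get("nextCursor")
        if not cursor:
            return ticks

# Stubbed two-page response standing in for the real monitor_activity call:
pages = {
    None: {"ticks": [{"id": "t1"}], "nextCursor": "c1"},
    "c1": {"ticks": [{"id": "t2"}]},
}
history = all_activity("mon-1", lambda mid, limit, cursor: pages[cursor])
```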

Account tools

credits

Get the remaining credit balance.
```python
credits()
```

sgai_history

Browse paginated request history, optionally filtered by service.
```python
sgai_history(
  service: str | None = None,   # scrape | extract | search | monitor | crawl
  page: int | None = None,
  limit: int | None = None,
)
```

Troubleshooting

  • Verify your key is present in config (X-API-Key for remote, SGAI_API_KEY for local).
  • Claude Desktop logs:
    • macOS: ~/Library/Logs/Claude/
    • Windows: %APPDATA%\Claude\Logs\
  • If a long crawl is “still running”, keep polling smartcrawler_fetch_results.

License

MIT License – see LICENSE file for details.