Overview

The Scrape service fetches a web page and returns content in one or more formats at the same time: markdown, HTML, links, images, summary, JSON extraction, branding, or screenshots. It replaces the previous Markdownify service and uses a flexible formats array so a single call can return any combination you need.
Try the Scrape service instantly in our interactive playground.

Pricing

Format      Credits
markdown    1
html        1
links       1
images      1
summary     1
json        5
screenshot  2
branding    25
When a request includes multiple formats, the per-format costs are summed. Enabling stealth in fetchConfig adds 5 credits; render mode (auto/fast/js) does not affect the cost. See the pricing page for the full breakdown.
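
The summing rule above is plain arithmetic and can be sketched as a small helper. This is an illustration only, not part of the SDK: `FORMAT_CREDITS`, `STEALTH_SURCHARGE`, and `estimate_credits` are hypothetical names, and the numbers are copied from the pricing table above.

```python
# Per-format credit costs, copied from the pricing table above.
FORMAT_CREDITS = {
    "markdown": 1,
    "html": 1,
    "links": 1,
    "images": 1,
    "summary": 1,
    "json": 5,
    "screenshot": 2,
    "branding": 25,
}
STEALTH_SURCHARGE = 5  # added when stealth is enabled in fetchConfig


def estimate_credits(formats, stealth=False):
    """Hypothetical helper: sum per-format costs, plus the stealth surcharge."""
    unknown = [f for f in formats if f not in FORMAT_CREDITS]
    if unknown:
        raise ValueError(f"unknown format(s): {unknown}")
    total = sum(FORMAT_CREDITS[f] for f in formats)
    return total + (STEALTH_SURCHARGE if stealth else 0)


print(estimate_credits(["markdown", "links", "screenshot"]))  # 1 + 1 + 2 = 4
print(estimate_credits(["json"], stealth=True))               # 5 + 5 = 10
```

Render mode is deliberately absent from the helper, since auto, fast, and js do not change the cost.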

Getting Started

Quick Start

from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig

# reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI(api_key="...")
sgai = ScrapeGraphAI()

res = sgai.scrape(
    "https://example.com",
    formats=[MarkdownFormatConfig()],
)

if res.status == "success":
    md = res.data.results.get("markdown", {}).get("data", [])
    print(md[0] if md else None)
else:
    print("Failed:", res.error)

Parameters

Parameter                   Type    Required  Description
url                         string  Yes       The URL of the webpage to scrape.
formats                     array   Yes       One or more output formats (see Output Formats).
contentType                 string  No        Override the auto-detected content type (e.g. "text/html", "application/pdf").
fetchConfig / fetch_config  object  No        Fetch options: mode, stealth, headers, cookies, scrolls, wait, timeout, country.
Get your API key from the dashboard.

Example response:
{
  "id": "03907b00-3c10-4b73-a6b5-e3b399a850b1",
  "results": {
    "markdown": {
      "data": [
        "# Example Domain\n\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n\n[Learn more](https://iana.org/domains/example)\n"
      ]
    }
  },
  "metadata": {
    "contentType": "text/html"
  }
}
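
When working with the raw JSON rather than the SDK's response object, the sample response above can be unpacked with a few defensive lookups. A minimal sketch, assuming the payload has exactly the shape shown; `first_markdown` is a hypothetical helper, not an SDK function, and `SAMPLE` is a trimmed copy of the response above.

```python
# Trimmed copy of the sample response shown above.
SAMPLE = {
    "id": "03907b00-3c10-4b73-a6b5-e3b399a850b1",
    "results": {
        "markdown": {
            "data": ["# Example Domain\n\nThis domain is for use in documentation examples."]
        }
    },
    "metadata": {"contentType": "text/html"},
}


def first_markdown(payload):
    """Return the first markdown string in a response payload, or None."""
    # `or []` also covers a key that is present but null or empty.
    data = payload.get("results", {}).get("markdown", {}).get("data") or []
    return data[0] if data else None


md = first_markdown(SAMPLE)
print(md.splitlines()[0])
print(first_markdown({"results": {}}))
```

Returning None for a missing or empty result mirrors the `md[0] if md else None` pattern used in the Quick Start.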

Output Formats

Pass an array of format objects. Each entry has a type and optional per-format options.
Format      Options                               Description
markdown    mode: "normal" | "reader" | "prune"   Clean markdown conversion of the page.
html        mode: "normal" | "reader" | "prune"   Raw or processed HTML.
links       -                                     All outgoing links on the page.
images      -                                     All image URLs on the page.
summary     -                                     AI-generated short summary.
json        prompt, schema                        Structured JSON extraction (AI).
branding    -                                     Brand colors, typography, and logos.
screenshot  fullPage, width, height, quality      Screenshot image URL.

Multi-format example

from scrapegraph_py import (
    ScrapeGraphAI,
    MarkdownFormatConfig,
    LinksFormatConfig,
    ScreenshotFormatConfig,
)

sgai = ScrapeGraphAI()

res = sgai.scrape(
    "https://example.com",
    formats=[
        MarkdownFormatConfig(mode="reader"),
        LinksFormatConfig(),
        ScreenshotFormatConfig(width=1280, height=720),
    ],
)

if res.status == "success":
    results = res.data.results
    # `or [""]` guards against an empty data list as well as a missing key
    print("Markdown preview:", (results.get("markdown", {}).get("data") or [""])[0][:200])
    print("Links count:", len(results.get("links", {}).get("data", [])))
    print("Screenshot URL:", results.get("screenshot", {}).get("data", {}).get("url"))

Screenshot

Capture a screenshot of the page. Use fullPage (full_page in the Python SDK) to capture the entire scrollable area, or set width and height for a fixed viewport. quality (1–100) controls JPEG compression.
from scrapegraph_py import ScrapeGraphAI, ScreenshotFormatConfig

sgai = ScrapeGraphAI()

res = sgai.scrape(
    "https://scrapegraphai.com",
    formats=[
        ScreenshotFormatConfig(
            full_page=True,
            width=1440,
            height=900,
            quality=90,
        ),
    ],
)

if res.status == "success":
    shot = res.data.results.get("screenshot", {}).get("data", {})
    print("URL:", shot.get("url"))
    print("Size:", f"{shot.get('width')}x{shot.get('height')}")
Option    Type  Default  Range     Description
fullPage  bool  false    -         Capture the whole scrollable page instead of just the viewport.
width     int   1440     320–3840  Viewport width in pixels.
height    int   900      200–2160  Viewport height in pixels.
quality   int   80       1–100     JPEG quality.
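
Out-of-range values can be caught locally before a request is sent. The sketch below is a hypothetical pre-flight check, not part of the SDK; it reads the ranges in the options table above as 320 to 3840 for width, 200 to 2160 for height, and 1 to 100 for quality, and `check_screenshot_options` is an invented name.

```python
# Documented limits from the options table above: (min, max, default).
SCREENSHOT_LIMITS = {
    "width": (320, 3840, 1440),
    "height": (200, 2160, 900),
    "quality": (1, 100, 80),
}


def check_screenshot_options(**options):
    """Hypothetical helper: fill in defaults and reject out-of-range values."""
    checked = {"full_page": bool(options.pop("full_page", False))}
    for name, (low, high, default) in SCREENSHOT_LIMITS.items():
        value = options.pop(name, default)
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        checked[name] = value
    if options:
        # Anything left over is not a documented screenshot option.
        raise ValueError(f"unknown option(s): {sorted(options)}")
    return checked


print(check_screenshot_options(width=1280, height=720))
```

The returned dict uses the Python-style names, so it could be passed on as `ScreenshotFormatConfig(**checked)`, assuming the constructor accepts those keyword arguments as in the example above.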

Branding

Extract a page’s brand identity — colors, typography, and logos — in a single call.
from scrapegraph_py import ScrapeGraphAI, BrandingFormatConfig

sgai = ScrapeGraphAI()

res = sgai.scrape(
    "https://scrapegraphai.com",
    formats=[BrandingFormatConfig()],
)

if res.status == "success":
    branding = res.data.results.get("branding", {}).get("data")
    print(branding)
Branding costs 25 credits per call — significantly more than other formats because it runs additional vision and typography analysis on top of the page fetch.

Structured JSON extraction

Use the json format to extract structured data during the scrape.
from scrapegraph_py import ScrapeGraphAI, JsonFormatConfig

sgai = ScrapeGraphAI()

res = sgai.scrape(
    "https://scrapegraphai.com",
    formats=[
        JsonFormatConfig(
            prompt="Extract the company name and tagline",
            schema={
                "type": "object",
                "properties": {
                    "companyName": {"type": "string"},
                    "tagline": {"type": "string"},
                },
                "required": ["companyName"],
            },
        ),
    ],
)

if res.status == "success":
    print(res.data.results.get("json", {}).get("data"))

Using a Pydantic schema (Python)

JsonFormatConfig.schema accepts any JSON Schema dict, so a Pydantic BaseModel works via model_json_schema():
from pydantic import BaseModel
from scrapegraph_py import ScrapeGraphAI, JsonFormatConfig

class Company(BaseModel):
    company_name: str
    tagline: str | None = None

sgai = ScrapeGraphAI()

res = sgai.scrape(
    "https://scrapegraphai.com",
    formats=[
        JsonFormatConfig(
            prompt="Extract the company name and tagline",
            schema=Company.model_json_schema(),
        ),
    ],
)

if res.status == "success":
    parsed = Company.model_validate(res.data.results["json"]["data"])
    print(parsed.company_name, parsed.tagline)

FetchConfig

Control how pages are fetched — JS rendering, stealth, custom headers, etc.
from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig, FetchConfig

sgai = ScrapeGraphAI()

res = sgai.scrape(
    "https://example.com",
    formats=[MarkdownFormatConfig()],
    fetch_config=FetchConfig(
        mode="js",
        stealth=True,
        wait=2000,
        scrolls=3,
        cookies={"session": "abc123"},
        country="us",
    ),
)
Parameter  Type    Description
mode       string  Fetch mode: "auto" (default), "fast", or "js".
stealth    bool    Enable stealth mode with residential proxy and anti-bot headers.
headers    object  Custom HTTP headers.
cookies    object  Cookies to include in the request.
scrolls    int     Number of page scrolls (0–100).
wait       int     Milliseconds to wait after page load (0–30000).
timeout    int     Request timeout in milliseconds (1000–60000).
country    string  Two-letter ISO country code for geo-targeted proxy routing.
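
The documented limits above lend themselves to a quick sanity check before a request is made. A minimal sketch, assuming the options are held in a plain dict; `fetch_option_problems` is a hypothetical helper rather than an SDK function, and it only checks the shape of the country code, not whether the code is a real ISO entry.

```python
VALID_MODES = {"auto", "fast", "js"}
# Inclusive integer ranges from the table above.
INT_RANGES = {"scrolls": (0, 100), "wait": (0, 30000), "timeout": (1000, 60000)}


def fetch_option_problems(options):
    """Hypothetical pre-flight check; returns a list of human-readable problems."""
    problems = []
    mode = options.get("mode", "auto")
    if mode not in VALID_MODES:
        problems.append(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    for name, (low, high) in INT_RANGES.items():
        if name in options and not low <= options[name] <= high:
            problems.append(f"{name} must be between {low} and {high}, got {options[name]}")
    country = options.get("country")
    if country is not None and not (len(country) == 2 and country.isalpha()):
        problems.append(f"country must be a two-letter code, got {country!r}")
    return problems


print(fetch_option_problems({"mode": "js", "wait": 2000, "scrolls": 3, "country": "us"}))  # []
print(fetch_option_problems({"mode": "turbo", "timeout": 500}))
```

Collecting problems in a list instead of raising on the first one lets a caller report every invalid option at once.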

Async Support (Python)

import asyncio
from scrapegraph_py import AsyncScrapeGraphAI, MarkdownFormatConfig

async def main():
    async with AsyncScrapeGraphAI() as sgai:
        res = await sgai.scrape(
            "https://example.com",
            formats=[MarkdownFormatConfig()],
        )
        if res.status == "success":
            md = res.data.results.get("markdown", {}).get("data", [])
            print(md[0] if md else None)

asyncio.run(main())
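
The async client is most useful when several pages are scraped at once. The sketch below shows one common pattern, a semaphore that caps the number of in-flight calls. It is an assumption-laden illustration: `scrape_many` is a hypothetical helper, and `fake_scrape` stands in for a real `sgai.scrape(...)` call so the example runs offline.

```python
import asyncio


async def scrape_many(scrape_one, urls, limit=5):
    """Run scrape_one(url) for every URL, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:
            return await scrape_one(url)

    # gather preserves input order; with return_exceptions=True a failed
    # call shows up as an exception object in its slot instead of raising.
    return await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)


async def fake_scrape(url):
    # Stand-in for the real client call so the sketch needs no API key.
    await asyncio.sleep(0.01)
    return f"scraped {url}"


urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
results = asyncio.run(scrape_many(fake_scrape, urls, limit=2))
print(results)
```

With the real client, `scrape_one` could be something like `lambda url: sgai.scrape(url, formats=[MarkdownFormatConfig()])`, called inside the `async with AsyncScrapeGraphAI() as sgai:` block shown above.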

Key Features

Multiple Formats

Request any combination of markdown, HTML, links, images, summary, JSON, branding, or screenshots in a single call.

JavaScript Rendering

Handle JavaScript-heavy sites with mode: "js" on fetchConfig.

Structured Output

Use the json format with a JSON schema to get typed data back.

Reliable Output

Stealth mode and country-targeted proxies for difficult sources.

Integration Options

Official SDKs

  • Python SDK — perfect for automation and data processing
  • JavaScript SDK — ideal for web applications and Node.js (scrapegraph-js ≥ 2.1.0, Node ≥ 22)

Support & Resources

Documentation

Comprehensive guides and tutorials

API Reference

Detailed API documentation

Community

Join our Discord community

GitHub

Check out our open-source projects

Ready to Start?

Sign up now and get your API key to begin scraping web content!