# Scrape

> Scrape web pages in markdown, HTML, screenshot, JSON, and more

## Overview

The Scrape service fetches a web page and returns its content in one or more formats from a single request: markdown, HTML, links, images, summary, JSON extraction, branding, or screenshots. It replaces the previous Markdownify service. The `formats` array lets one call return any combination of these formats.

<Note>
  Try the Scrape service instantly in our [interactive playground](https://scrapegraphai.com/dashboard).
</Note>

## Pricing

| Format       | Credits |
| ------------ | ------- |
| `markdown`   | 1       |
| `html`       | 1       |
| `links`      | 1       |
| `images`     | 1       |
| `summary`    | 1       |
| `json`       | 5       |
| `screenshot` | 2       |
| `branding`   | 25      |

When a request includes multiple formats, the per-format costs are summed. Enabling `stealth` in `fetchConfig` adds 5 credits; render mode (`auto`/`fast`/`js`) does not affect the cost. See the [pricing page](https://scrapegraphai.com/pricing) for the full breakdown.
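
As a worked example of the summing rule, the sketch below estimates the credit cost of a request. `estimate_credits` is a hypothetical helper, not part of the SDK. The prices are copied from the table on this page, and the stealth surcharge is assumed to apply once per request, so check the pricing page before relying on the numbers.

```python
# Per-format credit costs, copied from the pricing table on this page.
FORMAT_CREDITS = {
    "markdown": 1,
    "html": 1,
    "links": 1,
    "images": 1,
    "summary": 1,
    "json": 5,
    "screenshot": 2,
    "branding": 25,
}
STEALTH_SURCHARGE = 5  # assumption: charged once per request when stealth is on


def estimate_credits(formats, stealth=False):
    """Hypothetical helper: sum per-format costs, plus the stealth surcharge."""
    unknown = [f for f in formats if f not in FORMAT_CREDITS]
    if unknown:
        raise ValueError(f"unknown format(s): {unknown}")
    total = sum(FORMAT_CREDITS[f] for f in formats)
    return total + (STEALTH_SURCHARGE if stealth else 0)


print(estimate_credits(["markdown", "links", "screenshot"]))  # 1 + 1 + 2 = 4
print(estimate_credits(["markdown", "json"], stealth=True))   # 1 + 5 + 5 = 11
```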

## Getting Started

### Quick Start

<CodeGroup>
  ```python Python theme={null}
  from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig

  # reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI(api_key="...")
  sgai = ScrapeGraphAI()

  res = sgai.scrape(
      "https://example.com",
      formats=[MarkdownFormatConfig()],
  )

  if res.status == "success":
      md = res.data.results.get("markdown", {}).get("data", [])
      print(md[0] if md else None)
  else:
      print("Failed:", res.error)
  ```

  ```javascript JavaScript theme={null}
  import { ScrapeGraphAI } from "scrapegraph-js";

  // reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI({ apiKey: "..." })
  const sgai = ScrapeGraphAI();

  const res = await sgai.scrape({
    url: "https://example.com",
    formats: [{ type: "markdown" }],
  });

  if (res.status === "success") {
    console.log(res.data?.results.markdown?.data?.[0]);
  } else {
    console.error(res.error);
  }
  ```

  ```bash cURL theme={null}
  curl -X POST https://v2-api.scrapegraphai.com/api/scrape \
    -H "SGAI-APIKEY: $SGAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url": "https://example.com",
      "formats": [{ "type": "markdown" }]
    }'
  ```
</CodeGroup>

#### Parameters

| Parameter                      | Type   | Required | Description                                                                                       |
| ------------------------------ | ------ | -------- | ------------------------------------------------------------------------------------------------- |
| `url`                          | string | Yes      | The URL of the webpage to scrape.                                                                 |
| `formats`                      | array  | Yes      | One or more output formats (see [Formats](#output-formats)).                                      |
| `contentType`                  | string | No       | Override auto-detected content type (e.g. `"text/html"`, `"application/pdf"`).                    |
| `fetchConfig` / `fetch_config` | object | No       | Fetch options — `mode`, `stealth`, `headers`, `cookies`, `scrolls`, `wait`, `timeout`, `country`. |

<Note>
  Get your API key from the [dashboard](https://scrapegraphai.com/dashboard).
</Note>

<Accordion title="Example Response" icon="terminal">
  ```json theme={null}
  {
    "id": "03907b00-3c10-4b73-a6b5-e3b399a850b1",
    "results": {
      "markdown": {
        "data": [
          "# Example Domain\n\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n\n[Learn more](https://iana.org/domains/example)\n"
        ]
      }
    },
    "metadata": {
      "contentType": "text/html"
    }
  }
  ```
</Accordion>
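
For callers of the raw HTTP endpoint, a response like the one above can be read with plain dictionary access. The sketch below assumes the body has exactly the shape shown in the example response. `first_markdown` is a hypothetical helper name, and the markdown text is shortened for brevity.

```python
import json
from typing import Optional

# A trimmed copy of the example response (markdown text shortened).
raw = """
{
  "id": "03907b00-3c10-4b73-a6b5-e3b399a850b1",
  "results": {
    "markdown": {
      "data": ["# Example Domain\\n\\nThis domain is for use in documentation examples."]
    }
  },
  "metadata": {"contentType": "text/html"}
}
"""


def first_markdown(body: dict) -> Optional[str]:
    """Return the first markdown document, or None if the format is absent or empty."""
    data = body.get("results", {}).get("markdown", {}).get("data") or []
    return data[0] if data else None


body = json.loads(raw)
print(first_markdown(body).splitlines()[0])  # "# Example Domain"
print(body["metadata"]["contentType"])       # "text/html"
```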

## Output Formats

Pass an array of format objects. Each entry has a `type` and optional per-format options.

| Format       | Options                                       | Description                            |
| ------------ | --------------------------------------------- | -------------------------------------- |
| `markdown`   | `mode`: `"normal"` \| `"reader"` \| `"prune"` | Clean markdown conversion of the page. |
| `html`       | `mode`: `"normal"` \| `"reader"` \| `"prune"` | Raw or processed HTML.                 |
| `links`      | —                                             | All outgoing links on the page.        |
| `images`     | —                                             | All image URLs on the page.            |
| `summary`    | —                                             | AI-generated short summary.            |
| `json`       | `prompt`, `schema`                            | Structured JSON extraction (AI).       |
| `branding`   | —                                             | Brand colors, typography, and logos.   |
| `screenshot` | `fullPage`, `width`, `height`, `quality`      | Screenshot image URL.                  |

### Multi-format example

<CodeGroup>
  ```python Python theme={null}
  from scrapegraph_py import (
      ScrapeGraphAI,
      MarkdownFormatConfig,
      LinksFormatConfig,
      ScreenshotFormatConfig,
  )

  sgai = ScrapeGraphAI()

  res = sgai.scrape(
      "https://example.com",
      formats=[
          MarkdownFormatConfig(mode="reader"),
          LinksFormatConfig(),
          ScreenshotFormatConfig(width=1280, height=720),
      ],
  )

  if res.status == "success":
      results = res.data.results
      md = results.get("markdown", {}).get("data") or [""]  # guard against a missing or empty list
      print("Markdown preview:", md[0][:200])
      print("Links count:", len(results.get("links", {}).get("data", [])))
      print("Screenshot URL:", results.get("screenshot", {}).get("data", {}).get("url"))
  ```

  ```javascript JavaScript theme={null}
  import { ScrapeGraphAI } from "scrapegraph-js";

  const sgai = ScrapeGraphAI();

  const res = await sgai.scrape({
    url: "https://example.com",
    formats: [
      { type: "markdown", mode: "reader" },
      { type: "links" },
      { type: "screenshot", fullPage: false, width: 1280, height: 720 },
    ],
  });

  if (res.status === "success") {
    const r = res.data?.results;
    console.log("md:", r?.markdown?.data?.[0]?.slice(0, 200));
    console.log("links:", r?.links?.metadata?.count);
    console.log("screenshot:", r?.screenshot?.data?.url);
  }
  ```

  ```bash cURL theme={null}
  curl -X POST https://v2-api.scrapegraphai.com/api/scrape \
    -H "SGAI-APIKEY: $SGAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url": "https://example.com",
      "formats": [
        { "type": "markdown", "mode": "reader" },
        { "type": "links" },
        { "type": "screenshot", "width": 1280, "height": 720 }
      ]
    }'
  ```
</CodeGroup>

### Screenshot

Capture a screenshot of the page. Set `fullPage` (`full_page` in the Python SDK) to grab the entire scrollable area, or set `width`/`height` for a fixed viewport. `quality` (1–100) controls JPEG compression.

<CodeGroup>
  ```python Python theme={null}
  from scrapegraph_py import ScrapeGraphAI, ScreenshotFormatConfig

  sgai = ScrapeGraphAI()

  res = sgai.scrape(
      "https://scrapegraphai.com",
      formats=[
          ScreenshotFormatConfig(
              full_page=True,
              width=1440,
              height=900,
              quality=90,
          ),
      ],
  )

  if res.status == "success":
      shot = res.data.results.get("screenshot", {}).get("data", {})
      print("URL:", shot.get("url"))
      print("Size:", f"{shot.get('width')}x{shot.get('height')}")
  ```

  ```javascript JavaScript theme={null}
  import { ScrapeGraphAI } from "scrapegraph-js";

  const sgai = ScrapeGraphAI();

  const res = await sgai.scrape({
    url: "https://scrapegraphai.com",
    formats: [
      { type: "screenshot", fullPage: true, width: 1440, height: 900, quality: 90 },
    ],
  });

  if (res.status === "success") {
    const shot = res.data?.results.screenshot?.data;
    console.log("URL:", shot?.url);
    console.log("Size:", `${shot?.width}x${shot?.height}`);
  }
  ```

  ```bash cURL theme={null}
  curl -X POST https://v2-api.scrapegraphai.com/api/scrape \
    -H "SGAI-APIKEY: $SGAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url": "https://scrapegraphai.com",
      "formats": [
        { "type": "screenshot", "fullPage": true, "width": 1440, "height": 900, "quality": 90 }
      ]
    }'
  ```
</CodeGroup>

| Option     | Type | Default | Range        | Description                                                     |
| ---------- | ---- | ------- | ------------ | --------------------------------------------------------------- |
| `fullPage` | bool | `false` | —            | Capture the whole scrollable page instead of just the viewport. |
| `width`    | int  | `1440`  | `320`–`3840` | Viewport width in pixels.                                       |
| `height`   | int  | `900`   | `200`–`2160` | Viewport height in pixels.                                      |
| `quality`  | int  | `80`    | `1`–`100`    | JPEG quality.                                                   |
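
The ranges in the table can be checked client-side before a request is sent. This is a minimal sketch: `screenshot_format` is a hypothetical helper that builds the raw-API format object. Whether the server rejects or clamps out-of-range values is not stated here, so the helper simply raises.

```python
# Allowed ranges, copied from the options table above.
LIMITS = {"width": (320, 3840), "height": (200, 2160), "quality": (1, 100)}


def screenshot_format(full_page=False, width=1440, height=900, quality=80):
    """Hypothetical helper: validate options and build a screenshot format object."""
    values = {"width": width, "height": height, "quality": quality}
    for name, value in values.items():
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
    return {"type": "screenshot", "fullPage": full_page, **values}


print(screenshot_format(full_page=True, quality=90))
# {'type': 'screenshot', 'fullPage': True, 'width': 1440, 'height': 900, 'quality': 90}
```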

### Branding

Extract a page's brand identity — colors, typography, and logos — in a single call.

<CodeGroup>
  ```python Python theme={null}
  from scrapegraph_py import ScrapeGraphAI, BrandingFormatConfig

  sgai = ScrapeGraphAI()

  res = sgai.scrape(
      "https://scrapegraphai.com",
      formats=[BrandingFormatConfig()],
  )

  if res.status == "success":
      branding = res.data.results.get("branding", {}).get("data")
      print(branding)
  ```

  ```javascript JavaScript theme={null}
  import { ScrapeGraphAI } from "scrapegraph-js";

  const sgai = ScrapeGraphAI();

  const res = await sgai.scrape({
    url: "https://scrapegraphai.com",
    formats: [{ type: "branding" }],
  });

  if (res.status === "success") {
    console.log(res.data?.results.branding?.data);
  }
  ```

  ```bash cURL theme={null}
  curl -X POST https://v2-api.scrapegraphai.com/api/scrape \
    -H "SGAI-APIKEY: $SGAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url": "https://scrapegraphai.com",
      "formats": [{ "type": "branding" }]
    }'
  ```
</CodeGroup>

<Note>
  Branding costs **25 credits** per call — significantly more than other formats because it runs additional vision and typography analysis on top of the page fetch.
</Note>

### Structured JSON extraction

Use the `json` format to extract structured data during the scrape. The `prompt` describes what to extract, and `schema` is a JSON Schema that shapes the result.

<CodeGroup>
  ```python Python theme={null}
  from scrapegraph_py import ScrapeGraphAI, JsonFormatConfig

  sgai = ScrapeGraphAI()

  res = sgai.scrape(
      "https://scrapegraphai.com",
      formats=[
          JsonFormatConfig(
              prompt="Extract the company name and tagline",
              schema={
                  "type": "object",
                  "properties": {
                      "companyName": {"type": "string"},
                      "tagline": {"type": "string"},
                  },
                  "required": ["companyName"],
              },
          ),
      ],
  )

  if res.status == "success":
      print(res.data.results.get("json", {}).get("data"))
  ```

  ```javascript JavaScript theme={null}
  import { ScrapeGraphAI } from "scrapegraph-js";

  const sgai = ScrapeGraphAI();

  const res = await sgai.scrape({
    url: "https://scrapegraphai.com",
    formats: [
      {
        type: "json",
        prompt: "Extract the company name and tagline",
        schema: {
          type: "object",
          properties: {
            companyName: { type: "string" },
            tagline: { type: "string" },
          },
          required: ["companyName"],
        },
      },
    ],
  });

  if (res.status === "success") {
    console.log(res.data?.results.json?.data);
  }
  ```
</CodeGroup>

#### Using a Pydantic schema (Python)

`JsonFormatConfig.schema` accepts any JSON Schema dict, so a Pydantic v2 `BaseModel` can supply one via `model_json_schema()`, and `model_validate()` can parse the extracted data back into the model:

```python theme={null}
from pydantic import BaseModel
from scrapegraph_py import ScrapeGraphAI, JsonFormatConfig

class Company(BaseModel):
    company_name: str
    tagline: str | None = None

sgai = ScrapeGraphAI()

res = sgai.scrape(
    "https://scrapegraphai.com",
    formats=[
        JsonFormatConfig(
            prompt="Extract the company name and tagline",
            schema=Company.model_json_schema(),
        ),
    ],
)

if res.status == "success":
    parsed = Company.model_validate(res.data.results["json"]["data"])
    print(parsed.company_name, parsed.tagline)
```

## FetchConfig

Control how pages are fetched — JS rendering, stealth, custom headers, etc.

<CodeGroup>
  ```python Python theme={null}
  from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig, FetchConfig

  sgai = ScrapeGraphAI()

  res = sgai.scrape(
      "https://example.com",
      formats=[MarkdownFormatConfig()],
      fetch_config=FetchConfig(
          mode="js",
          stealth=True,
          wait=2000,
          scrolls=3,
          cookies={"session": "abc123"},
          country="us",
      ),
  )
  ```

  ```javascript JavaScript theme={null}
  import { ScrapeGraphAI } from "scrapegraph-js";

  const sgai = ScrapeGraphAI();

  const res = await sgai.scrape({
    url: "https://example.com",
    formats: [{ type: "markdown" }],
    fetchConfig: {
      mode: "js",
      stealth: true,
      wait: 2000,
      scrolls: 3,
      cookies: { session: "abc123" },
      country: "us",
    },
  });
  ```
</CodeGroup>

| Parameter | Type   | Description                                                      |
| --------- | ------ | ---------------------------------------------------------------- |
| `mode`    | string | Fetch mode: `"auto"` (default), `"fast"`, or `"js"`.             |
| `stealth` | bool   | Enable stealth mode with residential proxy and anti-bot headers. |
| `headers` | object | Custom HTTP headers.                                             |
| `cookies` | object | Cookies to include in the request.                               |
| `scrolls` | int    | Number of page scrolls (0–100).                                  |
| `wait`    | int    | Milliseconds to wait after page load (0–30000).                  |
| `timeout` | int    | Request timeout in milliseconds (1000–60000).                    |
| `country` | string | Two-letter ISO country code for geo-targeted proxy routing.      |
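
The examples above cover the two SDKs only, so here is a sketch of the equivalent raw request body. It assumes the REST endpoint accepts the camelCase `fetchConfig` key listed in the Parameters table, with the same field names as the JavaScript example. `build_scrape_body` is a hypothetical helper, and its range checks mirror the table rather than any documented server behaviour.

```python
import json

# Numeric ranges and modes copied from the FetchConfig table above.
RANGES = {"scrolls": (0, 100), "wait": (0, 30000), "timeout": (1000, 60000)}
MODES = {"auto", "fast", "js"}


def build_scrape_body(url, formats, **fetch_config):
    """Hypothetical helper: validate fetch options and build the JSON request body."""
    mode = fetch_config.get("mode", "auto")
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode!r}")
    for name, (low, high) in RANGES.items():
        if name in fetch_config and not low <= fetch_config[name] <= high:
            raise ValueError(f"{name}={fetch_config[name]} is outside {low}-{high}")
    body = {"url": url, "formats": formats}
    if fetch_config:  # omit the key entirely when no options are given
        body["fetchConfig"] = fetch_config
    return body


body = build_scrape_body(
    "https://example.com",
    [{"type": "markdown"}],
    mode="js", stealth=True, wait=2000, scrolls=3,
    cookies={"session": "abc123"}, country="us",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to `curl -d` with the same headers as the Quick Start request.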

## Async Support (Python)

```python theme={null}
import asyncio
from scrapegraph_py import AsyncScrapeGraphAI, MarkdownFormatConfig

async def main():
    async with AsyncScrapeGraphAI() as sgai:
        res = await sgai.scrape(
            "https://example.com",
            formats=[MarkdownFormatConfig()],
        )
        if res.status == "success":
            md = res.data.results.get("markdown", {}).get("data", [])
            print(md[0] if md else None)

asyncio.run(main())
```
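
The async client is most useful when several pages are fetched at once. The sketch below shows one way to bound concurrency with `asyncio.Semaphore`. `scrape_many` is a hypothetical helper, and `fake_scrape` stands in for a call such as `sgai.scrape(url, formats=[MarkdownFormatConfig()])` so that the example runs without network access or an API key.

```python
import asyncio


async def scrape_many(scrape, urls, limit=3):
    """Hypothetical helper: run `scrape(url)` for each URL, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def one(url):
        async with gate:
            return await scrape(url)

    # gather() returns results in the same order as `urls`.
    return await asyncio.gather(*(one(u) for u in urls))


async def fake_scrape(url):
    # Stand-in for: await sgai.scrape(url, formats=[MarkdownFormatConfig()])
    await asyncio.sleep(0)
    return f"scraped {url}"


urls = ["https://example.com/a", "https://example.com/b"]
print(asyncio.run(scrape_many(fake_scrape, urls)))
# ['scraped https://example.com/a', 'scraped https://example.com/b']
```

With the real client, `scrape` would be a small coroutine that closes over an open `AsyncScrapeGraphAI` instance and forwards the URL to it.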

## Key Features

<CardGroup cols={2}>
  <Card title="Multiple Formats" icon="file-lines">
    Request any combination of markdown, HTML, links, images, summary, JSON, branding, or screenshots in a single call.
  </Card>

  <Card title="JavaScript Rendering" icon="code">
    Handle JavaScript-heavy sites with `mode: "js"` on `fetchConfig`.
  </Card>

  <Card title="Structured Output" icon="table">
    Use the `json` format with a JSON schema to get typed data back.
  </Card>

  <Card title="Reliable Output" icon="shield-check">
    Stealth mode and country-targeted proxies for difficult sources.
  </Card>
</CardGroup>

## Integration Options

### Official SDKs

* [Python SDK](/sdks/python) — perfect for automation and data processing
* [JavaScript SDK](/sdks/javascript) — ideal for web applications and Node.js (`scrapegraph-js` ≥ 2.1.0, Node ≥ 22)

### AI Framework Integrations

* [LangChain Integration](/integrations/langchain) — use Scrape in your content pipelines
* [LlamaIndex Integration](/integrations/llamaindex) — create searchable knowledge bases

## Support & Resources

<CardGroup cols={2}>
  <Card title="Documentation" icon="book" href="/introduction">
    Comprehensive guides and tutorials
  </Card>

  <Card title="API Reference" icon="code" href="/api-reference/introduction">
    Detailed API documentation
  </Card>

  <Card title="Community" icon="discord" href="https://discord.gg/uJN7TYcpNa">
    Join our Discord community
  </Card>

  <Card title="GitHub" icon="github" href="https://github.com/ScrapeGraphAI">
    Check out our open-source projects
  </Card>
</CardGroup>

<Card title="Ready to Start?" icon="rocket" href="https://scrapegraphai.com/dashboard">
  Sign up now and get your API key to begin scraping web content!
</Card>
