# Python SDK

> Official Python SDK for ScrapeGraphAI v2

<CardGroup cols={3}>
  <Card title="PyPI Package" icon="box" href="https://pypi.org/project/scrapegraph-py/">
    [![PyPI version](https://badge.fury.io/py/scrapegraph-py.svg)](https://badge.fury.io/py/scrapegraph-py)
  </Card>

  <Card title="Python Support" icon="python" href="https://pypi.org/project/scrapegraph-py/">
    [![Python Support](https://img.shields.io/pypi/pyversions/scrapegraph-py.svg)](https://pypi.org/project/scrapegraph-py/)
  </Card>

  <Card title="Source on GitHub" icon="github" href="https://github.com/ScrapeGraphAI/scrapegraph-py">
    Issues, PRs, and the changelog
  </Card>
</CardGroup>

<Note>
  These docs cover **`scrapegraph-py` ≥ 2.1.0** and require **Python ≥ 3.12**. Earlier `1.x` releases expose the deprecated v1 API and point to a different backend — none of the snippets on this page work there. The `2.0.x` series used typed request wrappers (`ScrapeRequest`, `ExtractRequest`, …); **2.1.0 removed those wrappers** in favour of direct positional/keyword arguments, so upgrade if you are pinned to `2.0.x`.
</Note>

## Installation

```bash theme={null}
pip install "scrapegraph-py>=2.1.0"
# or
uv add "scrapegraph-py>=2.1.0"
```

## What's New in v2

* **Complete rewrite** built on [Pydantic v2](https://docs.pydantic.dev) + [httpx](https://www.python-httpx.org).
* **Client rename**: `Client` → `ScrapeGraphAI`, `AsyncClient` → `AsyncScrapeGraphAI`.
* **Direct arguments** (v2.1.0): every method accepts positional/keyword args — no more `ScrapeRequest`/`ExtractRequest`/… wrappers.
* **`ApiResult[T]` wrapper**: no exceptions on API errors — every call returns `status: "success" | "error"`, `data`, `error`, and `elapsed_ms`.
* **Nested resources**: `sgai.crawl.*`, `sgai.monitor.*`, `sgai.history.*`.
* **camelCase on the wire, snake\_case in Python**: automatic via Pydantic's `alias_generator`.
* **Removed**: `markdownify()`, `agenticscraper()`, `sitemap()`, `feedback()` — use `scrape()` with the appropriate format entry instead.

<Warning>
  v2 is a breaking release. See the [Migration Guide](/transition-from-v1-to-v2) if you're upgrading from v1.
</Warning>
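
As a rough illustration of the camelCase/snake\_case mapping listed above, the standard-library sketch below renames keyword arguments the way a camelCase alias generator would. It is not the SDK's code, `to_camel` and `to_wire` are hypothetical helpers, and the resulting names (for example `maxLinksPerPage`) are only what the conversion produces, not confirmed API field names.

```python
def to_camel(snake: str) -> str:
    """Convert a snake_case name to camelCase."""
    head, *tail = snake.split("_")
    return head + "".join(part.capitalize() for part in tail)


def to_wire(params: dict) -> dict:
    """Rename the top-level keys of a keyword-argument dict to camelCase."""
    return {to_camel(key): value for key, value in params.items()}


# Python-side names in, wire-style names out (illustrative only).
print(to_wire({"url": "https://example.com", "max_pages": 50, "max_links_per_page": 10}))
```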

## Quick Start

```python theme={null}
from scrapegraph_py import ScrapeGraphAI

# reads SGAI_API_KEY from env, or pass it explicitly:
# sgai = ScrapeGraphAI(api_key="sgai-...")
sgai = ScrapeGraphAI()

result = sgai.scrape("https://example.com")

if result.status == "success":
    print(result.data.results["markdown"]["data"])
else:
    print(result.error)
```

### ApiResult

Every method returns `ApiResult[T]` — no try/except needed for API errors:

```python theme={null}
from typing import Generic, Literal, TypeVar
from pydantic import BaseModel

T = TypeVar("T")

class ApiResult(BaseModel, Generic[T]):
    status: Literal["success", "error"]
    data: T | None
    error: str | None = None
    elapsed_ms: int
```
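
If you prefer exceptions in your own code, a small helper can turn an error result into one. The sketch below is a hypothetical convenience, not part of `scrapegraph-py`: `ApiCallError` and `unwrap` are made-up names, and the `SimpleNamespace` objects only stand in for real `ApiResult` values.

```python
from types import SimpleNamespace


class ApiCallError(RuntimeError):
    """Hypothetical exception type; not provided by scrapegraph-py."""


def unwrap(result):
    """Return result.data on success, raise ApiCallError otherwise."""
    if result.status == "success" and result.data is not None:
        return result.data
    raise ApiCallError(result.error or "unknown API error")


# Stand-ins with the same four fields as ApiResult, for a dry run.
ok = SimpleNamespace(status="success", data={"id": "abc"}, error=None, elapsed_ms=12)
failed = SimpleNamespace(status="error", data=None, error="invalid API key", elapsed_ms=3)

print(unwrap(ok))
try:
    unwrap(failed)
except ApiCallError as exc:
    print(f"failed: {exc}")
```

With a real client this would read `data = unwrap(sgai.scrape("https://example.com"))`.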

### Environment Variables

| Variable       | Description                         | Default                                |
| -------------- | ----------------------------------- | -------------------------------------- |
| `SGAI_API_KEY` | Your ScrapeGraphAI API key          | —                                      |
| `SGAI_API_URL` | Override API base URL               | `https://v2-api.scrapegraphai.com/api` |
| `SGAI_TIMEOUT` | Request timeout in seconds          | `120`                                  |
| `SGAI_DEBUG`   | Enable debug logging (set to `"1"`) | off                                    |

The client also works as a context manager, which cleans up the underlying session automatically on exit:

```python theme={null}
with ScrapeGraphAI() as sgai:
    result = sgai.scrape("https://example.com")
```
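
For reference, the variables in the table above could be resolved as follows. This is a sketch of the documented names and defaults, not the SDK's own loading code. `load_sgai_settings` is a hypothetical helper, and treating only `"1"` as enabling debug is an assumption based on the table.

```python
import os


def load_sgai_settings(env=None) -> dict:
    """Resolve the four documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "api_key": env.get("SGAI_API_KEY"),  # no default; None when unset
        "api_url": env.get("SGAI_API_URL", "https://v2-api.scrapegraphai.com/api"),
        "timeout_s": float(env.get("SGAI_TIMEOUT", "120")),
        "debug": env.get("SGAI_DEBUG") == "1",  # assumption: only "1" enables it
    }


# Passing a plain dict instead of os.environ keeps the example side-effect free.
settings = load_sgai_settings({"SGAI_API_KEY": "sgai-example", "SGAI_TIMEOUT": "30"})
print(settings["api_url"], settings["timeout_s"], settings["debug"])
```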

## Services

### Scrape

Fetch a page in one or more formats (markdown, html, screenshot, json, links, images, summary, branding).

```python theme={null}
from scrapegraph_py import (
    ScrapeGraphAI, FetchConfig,
    MarkdownFormatConfig, ScreenshotFormatConfig, JsonFormatConfig,
)

sgai = ScrapeGraphAI()

res = sgai.scrape(
    "https://example.com",
    formats=[
        MarkdownFormatConfig(mode="reader"),
        ScreenshotFormatConfig(full_page=True, width=1440, height=900),
        JsonFormatConfig(prompt="Extract product info"),
    ],
    content_type="text/html",  # optional, auto-detected
    fetch_config=FetchConfig(
        mode="js",
        stealth=True,
        timeout=30000,
        wait=2000,
        scrolls=3,
    ),
)

if res.status == "success":
    markdown = res.data.results["markdown"]["data"]
```
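
A small accessor can make reads like the one above tolerant of formats that are missing from the response. The sketch assumes `results` behaves like a plain `dict` whose entries carry a `"data"` key, as the example suggests. `format_data` is a hypothetical helper and the sample dict is made up.

```python
def format_data(results: dict, name: str, default=None):
    """Return results[name]["data"], or default when the format is absent."""
    entry = results.get(name)
    if not isinstance(entry, dict):
        return default
    return entry.get("data", default)


# Stand-in for res.data.results; only the "data" key is taken from the docs.
sample = {"markdown": {"data": "# Example Domain"}}
print(format_data(sample, "markdown"))
print(format_data(sample, "screenshot", default="<not requested>"))
```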

#### `scrape()` parameters

| Parameter      | Type                 | Required | Description                                                              |
| -------------- | -------------------- | -------- | ------------------------------------------------------------------------ |
| `url`          | `str`                | Yes      | URL to scrape (positional)                                               |
| `formats`      | `list[FormatConfig]` | No       | Defaults to `[MarkdownFormatConfig()]`                                   |
| `content_type` | `str`                | No       | Override detected content type (e.g. `"application/pdf"`, `"text/html"`) |
| `fetch_config` | `FetchConfig`        | No       | Fetch configuration (mode, stealth, timeout, cookies, country, …)        |

#### Format entries

| Class                    | Fields                                                                                                                             |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------- |
| `MarkdownFormatConfig`   | `mode`: `"normal" \| "reader" \| "prune"`                                                                                          |
| `HtmlFormatConfig`       | `mode`: same as above                                                                                                              |
| `ScreenshotFormatConfig` | `full_page`, `width` (320–3840), `height` (200–2160), `quality`                                                                    |
| `JsonFormatConfig`       | `prompt` (1–10k chars), `schema` (JSON Schema dict — pass a Pydantic model's `model_json_schema()` to reuse a `BaseModel`), `mode` |
| `LinksFormatConfig`      | —                                                                                                                                  |
| `ImagesFormatConfig`     | —                                                                                                                                  |
| `SummaryFormatConfig`    | —                                                                                                                                  |
| `BrandingFormatConfig`   | —                                                                                                                                  |

<Note>
  Duplicate `type` entries in `formats` are rejected by a Pydantic validator.
</Note>

### Extract

Run structured extraction against a URL, HTML, or markdown using AI.

```python theme={null}
from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

res = sgai.extract(
    "Extract product names and prices",
    url="https://example.com",
    schema={
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name":  {"type": "string"},
                        "price": {"type": "string"},
                    },
                },
            },
        },
    },
)

if res.status == "success":
    print(res.data.json_data)
    print(f"Tokens: {res.data.usage.prompt_tokens} / {res.data.usage.completion_tokens}")
```

#### Using a Pydantic model as the schema

`schema=` is a JSON Schema `dict`. Any Pydantic `BaseModel` produces one via `model_json_schema()`, so you can define the desired shape once and reuse it to validate the response client-side.

```python theme={null}
from pydantic import BaseModel, Field
from scrapegraph_py import ScrapeGraphAI

class Product(BaseModel):
    name: str
    price: str | None = None

class Products(BaseModel):
    products: list[Product] = Field(default_factory=list)

sgai = ScrapeGraphAI()

res = sgai.extract(
    "Extract product names and prices",
    url="https://example.com",
    schema=Products.model_json_schema(),
)

if res.status == "success":
    parsed = Products.model_validate(res.data.json_data)
    for p in parsed.products:
        print(p.name, p.price)
```

The same pattern works for `JsonFormatConfig(schema=...)` in `scrape()` and for `search(schema=...)`.

#### `extract()` parameters

| Parameter      | Type          | Required | Description                                                                                                  |
| -------------- | ------------- | -------- | ------------------------------------------------------------------------------------------------------------ |
| `prompt`       | `str`         | Yes      | 1–10,000 chars (positional)                                                                                  |
| `url`          | `str`         | Yes\*    | Page URL                                                                                                     |
| `html`         | `str`         | Yes\*    | Raw HTML (alternative to `url`)                                                                              |
| `markdown`     | `str`         | Yes\*    | Raw markdown (alternative to `url`)                                                                          |
| `schema`       | `dict`        | No       | JSON Schema for the structured output. Pass a Pydantic model's `model_json_schema()` to reuse a `BaseModel`. |
| `mode`         | `str`         | No       | `"normal"` (default), `"reader"`, `"prune"`                                                                  |
| `content_type` | `str`         | No       | Override detected content type                                                                               |
| `fetch_config` | `FetchConfig` | No       | Fetch configuration                                                                                          |

<Note>
  \*At least one of `url`, `html`, or `markdown` is required.
</Note>
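
Because `html` is accepted in place of `url`, extraction can also run on a page you have already saved to disk. The helper below is a hypothetical sketch: `extract_from_html_file` is not an SDK function, and it relies only on the `extract(prompt, html=..., schema=...)` call shape documented above. `sgai` is any client instance.

```python
from pathlib import Path


def extract_from_html_file(sgai, path, prompt, schema=None):
    """Read a saved HTML file and pass its contents to sgai.extract()."""
    html = Path(path).read_text(encoding="utf-8")
    kwargs = {"html": html}
    if schema is not None:
        kwargs["schema"] = schema  # only sent when the caller provides one
    return sgai.extract(prompt, **kwargs)
```

The return value is whatever `extract()` returns, so the usual `res.status` check still applies.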

### Search

Run a web search and optionally extract structured data from the results.

```python theme={null}
from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

res = sgai.search(
    "best programming languages 2024",
    num_results=5,
    prompt="Summarize the top languages and reasons",
    time_range="past_week",
    location_geo_code="us",
)

if res.status == "success":
    for hit in res.data.results:
        print(hit.title, hit.url)
    print(res.data.json_data)  # when prompt/schema are set
```
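
The limits listed in the `search()` parameters table can be checked before a request is sent, which may give clearer messages than a failed call. A minimal sketch, assuming the table is complete. `check_search_args` is a hypothetical helper, and the SDK may well perform its own validation.

```python
TIME_RANGES = {"past_hour", "past_24_hours", "past_week", "past_month", "past_year"}


def check_search_args(query, num_results=3, prompt=None, schema=None, time_range=None):
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    if not 1 <= len(query) <= 500:
        problems.append("query must be 1-500 characters")
    if not 1 <= num_results <= 20:
        problems.append("num_results must be between 1 and 20")
    if schema is not None and not prompt:
        problems.append("prompt is required when schema is set")
    if time_range is not None and time_range not in TIME_RANGES:
        problems.append(f"unknown time_range: {time_range!r}")
    return problems


print(check_search_args("best programming languages 2024", num_results=5, time_range="past_week"))
print(check_search_args("", num_results=50, schema={"type": "object"}))
```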

#### `search()` parameters

| Parameter           | Type          | Required | Description                                                                                              |
| ------------------- | ------------- | -------- | -------------------------------------------------------------------------------------------------------- |
| `query`             | `str`         | Yes      | 1–500 chars (positional)                                                                                 |
| `num_results`       | `int`         | No       | 1–20, default `3`                                                                                        |
| `format`            | `str`         | No       | `"markdown"` (default) or `"html"`                                                                       |
| `mode`              | `str`         | No       | HTML processing: `"prune"` (default), `"normal"`, `"reader"`                                             |
| `prompt`            | `str`         | No       | Required when `schema` is set                                                                            |
| `schema`            | `dict`        | No       | JSON Schema for structured output. Pass a Pydantic model's `model_json_schema()` to reuse a `BaseModel`. |
| `location_geo_code` | `str`         | No       | Two-letter country code (e.g. `"us"`, `"it"`)                                                            |
| `time_range`        | `str`         | No       | `"past_hour"`, `"past_24_hours"`, `"past_week"`, `"past_month"`, `"past_year"`                           |
| `fetch_config`      | `FetchConfig` | No       | Fetch configuration                                                                                      |

### Crawl

Crawl a site and its linked pages asynchronously. Access via the `sgai.crawl` resource.

```python theme={null}
from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig

sgai = ScrapeGraphAI()

# Start
start = sgai.crawl.start(
    "https://example.com",
    formats=[MarkdownFormatConfig()],
    max_depth=2,
    max_pages=50,
    max_links_per_page=10,
    include_patterns=["/blog/*"],
    exclude_patterns=["/admin/*"],
)

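# start.data is None when the call fails, so check the status first
if start.status != "success":
    raise SystemExit(start.error)

crawl_id = start.data.id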

# Poll
status = sgai.crawl.get(crawl_id)
print(f"{status.data.finished}/{status.data.total} - {status.data.status}")

# Control
sgai.crawl.stop(crawl_id)
sgai.crawl.resume(crawl_id)
sgai.crawl.delete(crawl_id)
```
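
The example above polls once. In practice you would usually poll until the crawl reaches a final state. The sketch below is one way to do that. `wait_for_crawl` is a hypothetical helper, and the state names in `done_states` are assumptions, so check the values your crawls actually report in `status.data.status`.

```python
import time


def wait_for_crawl(sgai, crawl_id, poll_seconds=5.0, timeout_seconds=600.0,
                   done_states=("completed", "failed", "stopped")):
    """Poll sgai.crawl.get() until the crawl reports a final state.

    Returns the last result. An API error result is returned immediately so
    the caller can inspect its `error` field.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        res = sgai.crawl.get(crawl_id)
        if res.status != "success":
            return res
        if res.data.status in done_states:  # assumed state names
            return res
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawl {crawl_id} still {res.data.status!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

A call such as `final = wait_for_crawl(sgai, crawl_id)` would then replace the single `sgai.crawl.get(crawl_id)` in the example.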

#### `crawl.start()` parameters

| Parameter            | Type                 | Required | Description                            |
| -------------------- | -------------------- | -------- | -------------------------------------- |
| `url`                | `str`                | Yes      | Starting URL (positional)              |
| `formats`            | `list[FormatConfig]` | No       | Defaults to `[MarkdownFormatConfig()]` |
| `max_depth`          | `int`                | No       | `≥ 0`, default `2`                     |
| `max_pages`          | `int`                | No       | `1–1000`, default `50`                 |
| `max_links_per_page` | `int`                | No       | `≥ 1`, default `10`                    |
| `allow_external`     | `bool`               | No       | Default `False`                        |
| `include_patterns`   | `list[str]`          | No       | URL glob patterns to include           |
| `exclude_patterns`   | `list[str]`          | No       | URL glob patterns to exclude           |
| `content_types`      | `list[str]`          | No       | Allowed response content types         |
| `fetch_config`       | `FetchConfig`        | No       | Fetch configuration                    |

### Monitor

Schedule recurring scrapes of a URL with a cron expression, with an optional webhook that fires when a change is detected. Access via the `sgai.monitor` resource.

```python theme={null}
from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig

sgai = ScrapeGraphAI()

mon = sgai.monitor.create(
    "https://example.com",
    "0 * * * *",                 # cron expression (positional)
    name="Price Monitor",
    formats=[MarkdownFormatConfig()],
    webhook_url="https://example.com/webhook",
)

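# mon.data is None when the call fails, so check the status first
if mon.status != "success":
    raise SystemExit(mon.error)

cron_id = mon.data.cron_id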

sgai.monitor.list()
sgai.monitor.get(cron_id)
sgai.monitor.update(cron_id, interval="0 */6 * * *")
sgai.monitor.pause(cron_id)
sgai.monitor.resume(cron_id)
sgai.monitor.delete(cron_id)
```

#### `monitor.activity()` — poll tick history

Paginate through the per-run ticks a monitor has produced (what changed on each scheduled run).

```python theme={null}
act = sgai.monitor.activity(cron_id, limit=20)

if act.status == "success":
    for tick in act.data.ticks:
        # named `change` so it is not confused with tick.status
        change = "CHANGED" if tick.changed else "no change"
        print(f"[{tick.created_at}] {tick.status} - {change} ({tick.elapsed_ms}ms)")

    if act.data.next_cursor:
        more = sgai.monitor.activity(cron_id, limit=20, cursor=act.data.next_cursor)
```

`monitor.activity()` accepts `limit` (1–100, default `20`) and optional `cursor` for pagination. Each `MonitorTickEntry` exposes `id`, `created_at`, `status`, `changed`, `elapsed_ms`, and a `diffs` model with per-format deltas.
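
Building on the `limit` and `cursor` fields described above, a generator can walk every page of ticks. This is a sketch, not an SDK feature: `iter_ticks` is a hypothetical name, and `max_pages` is a safety cap added for illustration.

```python
def iter_ticks(sgai, cron_id, page_size=20, max_pages=50):
    """Yield ticks for one monitor, following next_cursor until it runs out."""
    cursor = None
    for _ in range(max_pages):
        kwargs = {"limit": page_size}
        if cursor:
            kwargs["cursor"] = cursor  # only sent after the first page
        act = sgai.monitor.activity(cron_id, **kwargs)
        if act.status != "success":
            raise RuntimeError(act.error or "monitor.activity failed")
        yield from act.data.ticks
        cursor = act.data.next_cursor
        if not cursor:
            return
```

For example, `changed = [t for t in iter_ticks(sgai, cron_id) if t.changed]` would collect only the runs that detected a change.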

#### `monitor.create()` parameters

| Parameter      | Type                 | Required | Description                               |
| -------------- | -------------------- | -------- | ----------------------------------------- |
| `url`          | `str`                | Yes      | URL to monitor (positional)               |
| `interval`     | `str`                | Yes      | Cron expression, 1–100 chars (positional) |
| `name`         | `str`                | No       | ≤ 200 chars                               |
| `formats`      | `list[FormatConfig]` | No       | Defaults to `[MarkdownFormatConfig()]`    |
| `webhook_url`  | `str`                | No       | Webhook invoked on change detection       |
| `fetch_config` | `FetchConfig`        | No       | Fetch configuration                       |

### History

Fetch recent request history. Access via the `sgai.history` resource.

```python theme={null}
from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

page = sgai.history.list(service="scrape", page=1, limit=20)
if page.status == "success":
    for entry in page.data.data:
        print(entry.id, entry.service, entry.status, entry.elapsed_ms)

one = sgai.history.get("request-id")
```

### Credits / Health

```python theme={null}
credits = sgai.credits()
# ApiResult[CreditsResponse]; credits.data exposes .remaining, .used, .plan, .jobs.crawl, .jobs.monitor

health = sgai.health()
# ApiResult[HealthResponse]; health.data exposes .status, .uptime, .services
```

## Configuration Objects

### FetchConfig

Controls how pages are fetched. See the [proxy configuration guide](/services/additional-parameters/proxy) for details on modes and geotargeting.

```python theme={null}
from scrapegraph_py import FetchConfig

config = FetchConfig(
    mode="js",            # "auto" (default), "fast", "js"
    stealth=True,         # Residential proxies / anti-bot headers (+5 credits)
    timeout=30000,        # 1,000–60,000 ms
    wait=2000,            # 0–30,000 ms
    scrolls=3,            # 0–100
    country="us",         # ISO 3166-1 alpha-2
    headers={"X-Custom": "header"},
    cookies={"session": "abc"},
    mock=False,           # Or a MockConfig object for testing
)
```
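
One way to use `FetchConfig` is to start with the cheapest settings and escalate only when a fetch fails. The helper below is a hypothetical sketch of that idea: `scrape_with_fallback` is not part of the SDK, and it assumes only that `scrape(url, fetch_config=...)` returns an object with a `status` field, as documented above.

```python
def scrape_with_fallback(scrape, url, fetch_configs):
    """Try each fetch config in order and return the first successful result.

    If every attempt fails, the last error result is returned so the caller
    can inspect its `error` field.
    """
    if not fetch_configs:
        raise ValueError("fetch_configs must not be empty")
    result = None
    for config in fetch_configs:
        result = scrape(url, fetch_config=config)
        if result.status == "success":
            break
    return result
```

With a real client this could be called as `scrape_with_fallback(sgai.scrape, "https://example.com", [FetchConfig(mode="fast"), FetchConfig(mode="js", stealth=True)])`, keeping in mind that `stealth=True` costs extra credits.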

## Async Support

Every sync method has an async equivalent on `AsyncScrapeGraphAI`:

```python theme={null}
import asyncio
from scrapegraph_py import AsyncScrapeGraphAI

async def main():
    async with AsyncScrapeGraphAI() as sgai:
        res = await sgai.scrape("https://example.com")
        if res.status == "success":
            print(res.data.results["markdown"]["data"])

        start = await sgai.crawl.start("https://example.com", max_pages=25)
        status = await sgai.crawl.get(start.data.id)
        print(status.data.status)

        credits = await sgai.credits()
        print(credits.data.remaining)

asyncio.run(main())
```
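
Because the async client's methods are coroutines, several pages can be fetched concurrently with `asyncio.gather`. The sketch below caps concurrency with a semaphore. `scrape_many` and its `max_concurrency` default are hypothetical, and any rate limits on your plan are not covered here.

```python
import asyncio


async def scrape_many(scrape, urls, max_concurrency=5):
    """Await scrape(url) for every URL, running at most max_concurrency at once.

    Results come back in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(url):
        async with semaphore:
            return await scrape(url)

    return await asyncio.gather(*(one(url) for url in urls))
```

Inside `async with AsyncScrapeGraphAI() as sgai:` this would be used as `results = await scrape_many(sgai.scrape, urls)`, followed by the usual `status` check on each result.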

## Support

<CardGroup cols={2}>
  <Card title="GitHub" icon="github" href="https://github.com/ScrapeGraphAI/scrapegraph-py">
    Report issues and contribute to the SDK
  </Card>

  <Card title="Email Support" icon="envelope" href="mailto:support@scrapegraphai.com">
    Get help from our development team
  </Card>
</CardGroup>
