# Extract

> AI-powered structured data extraction from any webpage

## Overview

Extract uses an LLM to pull structured data from a URL, raw HTML, or markdown content. Provide a prompt (and optionally a JSON schema) and it returns typed JSON, with no selectors or post-processing required.

<Note>
  Try Extract instantly in our [interactive playground](https://scrapegraphai.com/dashboard).
</Note>

## Pricing

Each Extract call costs **5 credits**. Enabling `stealth` in `fetchConfig` adds 5 credits; render mode (`auto` / `fast` / `js`) does not affect the cost. See the [pricing page](https://scrapegraphai.com/pricing) for the full breakdown.
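
The pricing rule above reduces to simple arithmetic. The following is a minimal sketch for budgeting a batch of calls; `extract_credits` is a hypothetical helper (not part of the SDK), and its constants are taken only from the figures quoted in this section.

```python
BASE_CREDITS = 5      # cost of one Extract call, per this section
STEALTH_CREDITS = 5   # surcharge when stealth is enabled in fetchConfig


def extract_credits(calls: int = 1, stealth: bool = False) -> int:
    """Estimate credits for `calls` Extract requests (hypothetical helper)."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    per_call = BASE_CREDITS + (STEALTH_CREDITS if stealth else 0)
    return calls * per_call


print(extract_credits())                        # 5
print(extract_credits(stealth=True))            # 10
print(extract_credits(calls=20, stealth=True))  # 200
```

Render mode is deliberately absent from the signature, since the section states it does not change the cost.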

## Getting Started

### Quick Start

<CodeGroup>
  ```python Python theme={null}
  from scrapegraph_py import ScrapeGraphAI

  # reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI(api_key="...")
  sgai = ScrapeGraphAI()

  res = sgai.extract(
      "What does the company do? Extract name and description.",
      url="https://scrapegraphai.com",
  )

  if res.status == "success":
      print(res.data.json_data)
  else:
      print("Failed:", res.error)
  ```

  ```javascript JavaScript theme={null}
  import { ScrapeGraphAI } from "scrapegraph-js";

  const sgai = ScrapeGraphAI();

  const res = await sgai.extract({
    url: "https://scrapegraphai.com",
    prompt: "What does the company do? Extract name and description.",
  });

  if (res.status === "success") {
    console.log(res.data?.json);
    console.log("Tokens used:", res.data?.usage);
  } else {
    console.error(res.error);
  }
  ```

  ```bash cURL theme={null}
  curl -X POST https://v2-api.scrapegraphai.com/api/extract \
    -H "SGAI-APIKEY: $SGAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url": "https://scrapegraphai.com",
      "prompt": "What does the company do? Extract name and description."
    }'
  ```
</CodeGroup>

#### Parameters

| Parameter                      | Type   | Required | Description                                                                                                                 |
| ------------------------------ | ------ | -------- | --------------------------------------------------------------------------------------------------------------------------- |
| `url`                          | string | Cond.    | URL of the page to extract from. One of `url`, `html`, or `markdown` is required.                                           |
| `html`                         | string | Cond.    | Raw HTML to extract from.                                                                                                   |
| `markdown`                     | string | Cond.    | Markdown content to extract from.                                                                                           |
| `prompt`                       | string | Yes      | Natural-language description of what to extract.                                                                            |
| `schema`                       | object | No       | JSON schema describing the desired output shape. In Python you can pass a Pydantic model via `MyModel.model_json_schema()`. |
| `mode`                         | string | No       | HTML processing mode: `"normal"`, `"reader"`, `"prune"`.                                                                    |
| `fetchConfig` / `fetch_config` | object | No       | Fetch options (see [Scrape · FetchConfig](/services/scrape#fetchconfig)).                                                   |

<Note>
  Get your API key from the [dashboard](https://scrapegraphai.com/dashboard).
</Note>
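
The "Required" column above can be expressed as a small client-side check before a request is sent. This is a sketch under assumptions: `build_extract_payload` is a hypothetical helper, and it treats the sources as mutually exclusive, whereas the table only states that one of `url`, `html`, or `markdown` is required.

```python
import json


def build_extract_payload(prompt, *, url=None, html=None, markdown=None, schema=None):
    """Assemble a JSON body for the Extract endpoint (hypothetical helper)."""
    if not prompt:
        raise ValueError("prompt is required")
    sources = {"url": url, "html": html, "markdown": markdown}
    provided = {k: v for k, v in sources.items() if v is not None}
    # Assumption: exactly one content source per request.
    if len(provided) != 1:
        raise ValueError("provide exactly one of url, html, or markdown")
    payload = {"prompt": prompt, **provided}
    if schema is not None:
        payload["schema"] = schema
    return payload


body = build_extract_payload(
    "What does the company do? Extract name and description.",
    url="https://scrapegraphai.com",
)
print(json.dumps(body, indent=2))
```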

<Accordion title="Example Response" icon="terminal">
  ```json theme={null}
  {
    "id": "9a2178b6-2525-4f98-85e6-9f8c7da17541",
    "raw": null,
    "json": {
      "name": "ScrapeGraphAI",
      "description": "ScrapeGraphAI is an AI-powered web scraping platform that uses natural language prompts to turn any webpage into structured data via a simple API."
    },
    "usage": {
      "promptTokens": 10002,
      "completionTokens": 509
    },
    "metadata": {
      "chunker": { "chunks": [{ "size": 5000 }, { "size": 2535 }] },
      "fetch": {}
    }
  }
  ```
</Accordion>
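
If you call the HTTP endpoint directly, the response body can be read with the standard library. The sketch below parses a trimmed copy of the example response above; the field names come from that example, and nothing here depends on the SDK.

```python
import json

# The example response above, trimmed to the fields used below.
raw = """
{
  "id": "9a2178b6-2525-4f98-85e6-9f8c7da17541",
  "json": {"name": "ScrapeGraphAI"},
  "usage": {"promptTokens": 10002, "completionTokens": 509},
  "metadata": {"chunker": {"chunks": [{"size": 5000}, {"size": 2535}]}, "fetch": {}}
}
"""

response = json.loads(raw)
usage = response["usage"]
total_tokens = usage["promptTokens"] + usage["completionTokens"]
chunk_sizes = [c["size"] for c in response["metadata"]["chunker"]["chunks"]]

print(response["json"]["name"])  # ScrapeGraphAI
print(total_tokens)              # 10511
print(chunk_sizes)               # [5000, 2535]
```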

## With a JSON Schema

Pass a JSON schema to pin down the exact output shape.

<CodeGroup>
  ```python Python theme={null}
  from scrapegraph_py import ScrapeGraphAI

  sgai = ScrapeGraphAI()

  res = sgai.extract(
      "Extract structured information about this page",
      url="https://example.com",
      schema={
          "type": "object",
          "properties": {
              "title": {"type": "string"},
              "description": {"type": "string"},
              "links": {"type": "array", "items": {"type": "string"}},
          },
          "required": ["title"],
      },
  )

  if res.status == "success":
      print(res.data.json_data)
  ```

  ```javascript JavaScript theme={null}
  import { ScrapeGraphAI } from "scrapegraph-js";

  const sgai = ScrapeGraphAI();

  const res = await sgai.extract({
    url: "https://example.com",
    prompt: "Extract the page title and description",
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        description: { type: "string" },
      },
      required: ["title"],
    },
  });

  if (res.status === "success") {
    console.log(res.data?.json);
  }
  ```

  ```bash cURL theme={null}
  curl -X POST https://v2-api.scrapegraphai.com/api/extract \
    -H "SGAI-APIKEY: $SGAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url": "https://example.com",
      "prompt": "Extract the page title and description",
      "schema": {
        "type": "object",
        "properties": {
          "title": {"type": "string"},
          "description": {"type": "string"}
        },
        "required": ["title"]
      }
    }'
  ```
</CodeGroup>
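
If you want a defensive check on the client side, the returned object can be compared against the schema you sent. The sketch below is a shallow check only (required keys and top-level types), not full JSON Schema validation; `check_result` and `PY_TYPES` are hypothetical names introduced for illustration.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
    },
    "required": ["title"],
}

# Maps a few JSON Schema type names to Python types (not exhaustive).
PY_TYPES = {"string": str, "array": list, "object": dict, "boolean": bool}


def check_result(data: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the shallow check passed."""
    problems = [
        f"missing required key: {key}"
        for key in schema.get("required", [])
        if key not in data
    ]
    for key, spec in schema.get("properties", {}).items():
        expected = PY_TYPES.get(spec.get("type"))
        if key in data and expected and not isinstance(data[key], expected):
            problems.append(f"{key}: expected {spec['type']}")
    return problems


print(check_result({"title": "Example Domain"}, SCHEMA))  # []
print(check_result({"description": 42}, SCHEMA))
```

For nested schemas or stricter guarantees, a dedicated validator or the Pydantic approach in the next section is likely a better fit.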

## With a Pydantic Schema (Python)

If you already model your data with [Pydantic](https://docs.pydantic.dev), use the same `BaseModel` to drive the extraction. `model_json_schema()` produces the JSON Schema dict the API expects, and `model_validate()` parses the response back into typed objects.

```python theme={null}
from pydantic import BaseModel, Field
from scrapegraph_py import ScrapeGraphAI

class Product(BaseModel):
    name: str = Field(description="Product name")
    price: str | None = Field(default=None, description="Listed price, if any")

class Products(BaseModel):
    products: list[Product] = Field(default_factory=list)

sgai = ScrapeGraphAI()

res = sgai.extract(
    "Extract product names and prices",
    url="https://example.com",
    schema=Products.model_json_schema(),
)

if res.status == "success":
    parsed = Products.model_validate(res.data.json_data)
    for p in parsed.products:
        print(p.name, p.price)
```

<Note>
  The wire format is JSON Schema either way — `model_json_schema()` is just the standard Pydantic v2 helper that produces it. Field descriptions are forwarded to the LLM and improve extraction quality on ambiguous fields.
</Note>

## Extract from HTML or Markdown

Pass `html` or `markdown` instead of `url` to skip the fetch and extract from content you already have.

```python theme={null}
from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

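res = sgai.extract(
    "Extract product name and price",
    # No URL is fetched; the HTML string is the input.
    html="<html><body><h1>Widget</h1><p>$9.99</p></body></html>",
)

if res.status == "success":
    print(res.data.json_data)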
```
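
The same idea applies to markdown input over raw HTTP. This sketch builds the request with the standard library, reusing the endpoint and `SGAI-APIKEY` header from the cURL example earlier on this page; the markdown string is made-up sample content, and the send step is left commented out because it needs a valid key and network access.

```python
import json
import os
import urllib.request

# Endpoint and header name are copied from the cURL example on this page.
API_URL = "https://v2-api.scrapegraphai.com/api/extract"

payload = {
    "markdown": "# Widget\n\nPrice: $9.99",  # sample content for illustration
    "prompt": "Extract product name and price",
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "SGAI-APIKEY": os.environ.get("SGAI_API_KEY", ""),
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment to send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
print(req.get_method(), req.full_url)
```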

## FetchConfig

Control how the page is fetched before extraction (JS rendering, stealth, headers, etc.). See the full options in [Scrape · FetchConfig](/services/scrape#fetchconfig).

```python theme={null}
from scrapegraph_py import ScrapeGraphAI, FetchConfig

sgai = ScrapeGraphAI()

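res = sgai.extract(
    "Extract the main content",
    url="https://example.com",
    # Render with JS and enable stealth (stealth adds 5 credits, see Pricing).
    fetch_config=FetchConfig(mode="js", stealth=True, wait=2000),
)

if res.status == "success":
    print(res.data.json_data)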
```

## Async Support (Python)

For `asyncio` code, use `AsyncScrapeGraphAI` as an async context manager and `await` the call.

```python theme={null}
import asyncio
from scrapegraph_py import AsyncScrapeGraphAI

async def main():
    async with AsyncScrapeGraphAI() as sgai:
        res = await sgai.extract(
            "Summarize what this product does",
            url="https://scrapegraphai.com",
        )
        if res.status == "success":
            print(res.data.json_data)

asyncio.run(main())
```

## Key Features

<CardGroup cols={2}>
  <Card title="Universal Compatibility" icon="globe">
    Works with any URL, raw HTML, or markdown input.
  </Card>

  <Card title="AI Understanding" icon="brain">
    Contextual extraction — no XPath or brittle selectors.
  </Card>

  <Card title="Structured Output" icon="table">
    JSON schema support for typed, predictable results.
  </Card>

  <Card title="Token Accounting" icon="coins">
    Response includes prompt/completion token usage.
  </Card>
</CardGroup>

## Integration Options

### Official SDKs

* [Python SDK](/sdks/python)
* [JavaScript SDK](/sdks/javascript) (`scrapegraph-js` ≥ 2.1.0, Node ≥ 22)

### AI Framework Integrations

* [LangChain Integration](/integrations/langchain)
* [LlamaIndex Integration](/integrations/llamaindex)

## Support & Resources

<CardGroup cols={2}>
  <Card title="Documentation" icon="book" href="/introduction">
    Guides and tutorials
  </Card>

  <Card title="API Reference" icon="code" href="/api-reference/introduction">
    Detailed API documentation
  </Card>

  <Card title="Community" icon="discord" href="https://discord.gg/uJN7TYcpNa">
    Join our Discord community
  </Card>

  <Card title="GitHub" icon="github" href="https://github.com/ScrapeGraphAI">
    Check out our open-source projects
  </Card>
</CardGroup>
