# Extract

> AI-powered structured data extraction from any webpage

## Overview

Extract uses an LLM to pull structured data from a URL, HTML, or markdown. Provide a prompt (and optionally a JSON schema) and it returns typed JSON, with no selectors or post-processing required.

Try Extract instantly in our [interactive playground](https://scrapegraphai.com/dashboard).

## Pricing

Each Extract call costs **5 credits**. Enabling `stealth` in `fetchConfig` adds 5 credits; render mode (`auto` / `fast` / `js`) does not affect the cost. See the [pricing page](https://scrapegraphai.com/pricing) for the full breakdown.

## Getting Started

### Quick Start

```python Python
from scrapegraph_py import ScrapeGraphAI

# reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI(api_key="...")
sgai = ScrapeGraphAI()

res = sgai.extract(
    "What does the company do? Extract name and description.",
    url="https://scrapegraphai.com",
)

if res.status == "success":
    print(res.data.json_data)
else:
    print("Failed:", res.error)
```

```javascript JavaScript
import { ScrapeGraphAI } from "scrapegraph-js";

const sgai = ScrapeGraphAI();

const res = await sgai.extract({
  url: "https://scrapegraphai.com",
  prompt: "What does the company do? Extract name and description.",
});

if (res.status === "success") {
  console.log(res.data?.json);
  console.log("Tokens used:", res.data?.usage);
} else {
  console.error(res.error);
}
```

```bash cURL
curl -X POST https://v2-api.scrapegraphai.com/api/extract \
  -H "SGAI-APIKEY: $SGAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://scrapegraphai.com",
    "prompt": "What does the company do? Extract name and description."
  }'
```

#### Parameters

| Parameter                      | Type   | Required | Description                                                                                                                 |
| ------------------------------ | ------ | -------- | --------------------------------------------------------------------------------------------------------------------------- |
| `url`                          | string | Cond.    | URL of the page to extract from. One of `url`, `html`, or `markdown` is required.                                           |
| `html`                         | string | Cond.    | Raw HTML to extract from.                                                                                                   |
| `markdown`                     | string | Cond.    | Markdown content to extract from.                                                                                           |
| `prompt`                       | string | Yes      | Natural-language description of what to extract.                                                                            |
| `schema`                       | object | No       | JSON schema describing the desired output shape. In Python you can pass a Pydantic model via `MyModel.model_json_schema()`. |
| `mode`                         | string | No       | HTML processing mode: `"normal"`, `"reader"`, `"prune"`.                                                                    |
| `fetchConfig` / `fetch_config` | object | No       | Fetch options (see [Scrape · FetchConfig](/services/scrape#fetchconfig)).                                                   |

Get your API key from the [dashboard](https://scrapegraphai.com/dashboard).

Example response:

```json
{
  "id": "9a2178b6-2525-4f98-85e6-9f8c7da17541",
  "raw": null,
  "json": {
    "name": "ScrapeGraphAI",
    "description": "ScrapeGraphAI is an AI-powered web scraping platform that uses natural language prompts to turn any webpage into structured data via a simple API."
  },
  "usage": {
    "promptTokens": 10002,
    "completionTokens": 509
  },
  "metadata": {
    "chunker": {
      "chunks": [{ "size": 5000 }, { "size": 2535 }]
    },
    "fetch": {}
  }
}
```

## With a JSON Schema

Pass a JSON schema to pin down the exact output shape.
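A schema also gives you something to check a result against before you use it. The following is a minimal sketch of such a check, assuming you only care about top-level `required` keys and basic `type` values. The helper `check_against_schema` and the sample data are hypothetical illustrations, not part of `scrapegraph_py` and not real API output, and the check is far from a full JSON Schema validator.

```python
# Hypothetical helper (not part of scrapegraph_py): a shallow, top-level check only.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_against_schema(data: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the shallow check passed."""
    problems = []

    # Every key listed under "required" must be present.
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")

    # Fields that are present must match the declared JSON type.
    for key, spec in schema.get("properties", {}).items():
        if key not in data:
            continue
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown or missing type: nothing to check
        value = data[key]
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if is_bool_as_number or not isinstance(value, expected):
            problems.append(f"{key}: expected {type_name}")

    return problems


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "links": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title"],
}

# Made-up sample data for illustration.
sample = {"title": "Example Domain", "links": "not-a-list"}
print(check_against_schema(sample, schema))  # ['links: expected array']
```

For anything beyond a quick sanity check, a dedicated JSON Schema validator or a Pydantic model is the more robust choice.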
```python Python
from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

res = sgai.extract(
    "Extract structured information about this page",
    url="https://example.com",
    schema={
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "description": {"type": "string"},
            "links": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title"],
    },
)

if res.status == "success":
    print(res.data.json_data)
```

```javascript JavaScript
import { ScrapeGraphAI } from "scrapegraph-js";

const sgai = ScrapeGraphAI();

const res = await sgai.extract({
  url: "https://example.com",
  prompt: "Extract the page title and description",
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      description: { type: "string" },
    },
    required: ["title"],
  },
});

if (res.status === "success") {
  console.log(res.data?.json);
}
```

```bash cURL
curl -X POST https://v2-api.scrapegraphai.com/api/extract \
  -H "SGAI-APIKEY: $SGAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "prompt": "Extract the page title and description",
    "schema": {
      "type": "object",
      "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"}
      },
      "required": ["title"]
    }
  }'
```

## With a Pydantic Schema (Python)

If you already model your data with [Pydantic](https://docs.pydantic.dev), use the same `BaseModel` to drive the extraction. `model_json_schema()` produces the JSON Schema dict the API expects, and `model_validate()` parses the response back into typed objects.
```python
from pydantic import BaseModel, Field
from scrapegraph_py import ScrapeGraphAI


class Product(BaseModel):
    name: str = Field(description="Product name")
    price: str | None = Field(default=None, description="Listed price, if any")


class Products(BaseModel):
    products: list[Product] = Field(default_factory=list)


sgai = ScrapeGraphAI()

res = sgai.extract(
    "Extract product names and prices",
    url="https://example.com",
    schema=Products.model_json_schema(),
)

if res.status == "success":
    parsed = Products.model_validate(res.data.json_data)
    for p in parsed.products:
        print(p.name, p.price)
```

The wire format is JSON Schema either way; `model_json_schema()` is just the standard Pydantic v2 helper that produces it. Field descriptions are forwarded to the LLM and improve extraction quality on ambiguous fields.

## Extract from HTML or Markdown

Skip the fetch and extract from content you already have.

```python
from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

res = sgai.extract(
    "Extract product name and price",
    # Inline HTML is passed directly; no page is fetched.
    html="<div><h2>Widget</h2><p>$9.99</p></div>",
)
```

## FetchConfig

Control how the page is fetched before extraction (JS rendering, stealth, headers, etc.). See the full options in [Scrape · FetchConfig](/services/scrape#fetchconfig).

```python
from scrapegraph_py import ScrapeGraphAI, FetchConfig

sgai = ScrapeGraphAI()

res = sgai.extract(
    "Extract the main content",
    url="https://example.com",
    fetch_config=FetchConfig(mode="js", stealth=True, wait=2000),
)
```

## Async Support (Python)

```python
import asyncio
from scrapegraph_py import AsyncScrapeGraphAI


async def main():
    async with AsyncScrapeGraphAI() as sgai:
        res = await sgai.extract(
            "Summarize what this product does",
            url="https://scrapegraphai.com",
        )
        if res.status == "success":
            print(res.data.json_data)


asyncio.run(main())
```

## Key Features

* Works with any URL, raw HTML, or markdown input.
* Contextual extraction, with no XPath or brittle selectors.
* JSON schema support for typed, predictable results.
* Response includes prompt/completion token usage.

## Integration Options

### Official SDKs

* [Python SDK](/sdks/python)
* [JavaScript SDK](/sdks/javascript) (`scrapegraph-js` ≥ 2.1.0, Node ≥ 22)

### AI Framework Integrations

* [LangChain Integration](/integrations/langchain)
* [LlamaIndex Integration](/integrations/llamaindex)

## Support & Resources

* Guides and tutorials
* Detailed API documentation
* Join our Discord community
* Check out our open-source projects