Overview
Extract uses an LLM to pull structured data from a URL, HTML, or markdown. Provide a prompt (and optionally a JSON schema) and it returns typed JSON, with no selectors or post-processing required.
Try Extract instantly in our interactive playground.
Pricing
Each Extract call costs 5 credits. Enabling stealth in fetchConfig adds 5 credits; render mode (auto / fast / js) does not affect the cost. See the pricing page for the full breakdown.
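As a quick check on the arithmetic above, here is a minimal Python sketch of the credit calculation. The function and constant names are hypothetical and not part of any SDK; only the two figures (5 credits per call, 5 more with stealth) come from this page.

```python
# Figures taken from the Pricing section; names are illustrative only.
BASE_CREDITS_PER_CALL = 5
STEALTH_SURCHARGE = 5


def extract_cost(calls: int, stealth: bool = False) -> int:
    """Return the total credits for `calls` Extract requests.

    Render mode (auto / fast / js) is deliberately not a parameter,
    because it does not affect the cost.
    """
    if calls < 0:
        raise ValueError("calls must be non-negative")
    per_call = BASE_CREDITS_PER_CALL + (STEALTH_SURCHARGE if stealth else 0)
    return calls * per_call


print(extract_cost(100))                # 500
print(extract_cost(100, stealth=True))  # 1000
```

The pricing page remains the authoritative source if these figures change.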
Getting Started
Quick Start
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Cond. | URL of the page to extract from. One of url, html, or markdown is required. |
| html | string | Cond. | Raw HTML to extract from. |
| markdown | string | Cond. | Markdown content to extract from. |
| prompt | string | Yes | Natural-language description of what to extract. |
| schema | object | No | JSON schema describing the desired output shape. In Python you can pass a Pydantic model via MyModel.model_json_schema(). |
| mode | string | No | HTML processing mode: "normal", "reader", "prune". |
| fetchConfig / fetch_config | object | No | Fetch options (see Scrape · FetchConfig). |
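The parameter rules can be sketched as a small payload builder. This is an illustration under stated assumptions, not the SDK's API: build_extract_payload is a hypothetical helper, the dictionary keys simply mirror the parameter names in the table, and treating url, html, and markdown as mutually exclusive is an assumption drawn from the "one of" wording.

```python
from typing import Any, Dict, Optional

# Mode values listed in the parameter table.
ALLOWED_MODES = ("normal", "reader", "prune")


def build_extract_payload(
    prompt: str,
    *,
    url: Optional[str] = None,
    html: Optional[str] = None,
    markdown: Optional[str] = None,
    schema: Optional[Dict[str, Any]] = None,
    mode: Optional[str] = None,
    fetch_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble and validate an Extract request body."""
    if not prompt:
        raise ValueError("prompt is required")

    # Assumption: exactly one input source may be supplied.
    sources = {"url": url, "html": html, "markdown": markdown}
    provided = {key: value for key, value in sources.items() if value is not None}
    if len(provided) != 1:
        raise ValueError("provide exactly one of url, html, or markdown")

    if mode is not None and mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {ALLOWED_MODES}")

    payload: Dict[str, Any] = {"prompt": prompt, **provided}
    # Optional parameters are included only when set.
    if schema is not None:
        payload["schema"] = schema
    if mode is not None:
        payload["mode"] = mode
    if fetch_config is not None:
        payload["fetchConfig"] = fetch_config  # key name assumed from the table
    return payload


payload = build_extract_payload(
    "Extract the article title and author",
    url="https://example.com/post",
    schema={
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "author": {"type": "string"},
        },
        "required": ["title"],
    },
    mode="reader",
)
print(sorted(payload))  # ['mode', 'prompt', 'schema', 'url']
```

Validating locally like this catches a missing or duplicated input source before a request is sent and credits are spent.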
Get your API key from the dashboard.
Example Response
With a JSON Schema
Pass a JSON schema to pin down the exact output shape.
With a Pydantic Schema (Python)
If you already model your data with Pydantic, use the same BaseModel to drive the extraction. model_json_schema() produces the JSON Schema dict the API expects, and model_validate() parses the response back into typed objects.
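The round trip can be sketched with Pydantic v2 alone. The Product model and the sample dictionary below are made-up placeholders, and the request to the API is left out on purpose; only model_json_schema() and model_validate() are taken from the text above.

```python
from pydantic import BaseModel, Field


class Product(BaseModel):
    # Hypothetical model; the descriptions end up in the generated schema.
    name: str = Field(description="Product name as shown on the page")
    price: float = Field(description="Numeric price without the currency symbol")


# JSON Schema dict to send as the `schema` parameter.
schema = Product.model_json_schema()
print(schema["required"])  # ['name', 'price']
print(schema["properties"]["price"]["description"])

# Placeholder standing in for the JSON the API would return.
sample_response = {"name": "Example Widget", "price": 19.99}

# Parse the response back into a typed object; raises ValidationError on mismatch.
product = Product.model_validate(sample_response)
print(product.name, product.price)
```

Because validation happens on the way back in, a response that drifts from the schema fails loudly instead of propagating malformed data.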
The wire format is JSON Schema either way: model_json_schema() is just the standard Pydantic v2 helper that produces it. Field descriptions are forwarded to the LLM and improve extraction quality on ambiguous fields.
Extract from HTML or Markdown
Skip the fetch and extract from content you already have.
FetchConfig
Control how the page is fetched before extraction (JS rendering, stealth, headers, etc.). See the full options in Scrape · FetchConfig.
Async Support (Python)
Key Features
Universal Compatibility
Works with any URL, raw HTML, or markdown input.
AI Understanding
Contextual extraction — no XPath or brittle selectors.
Structured Output
JSON schema support for typed, predictable results.
Token Accounting
Response includes prompt/completion token usage.
Integration Options
Official SDKs
- Python SDK
- JavaScript SDK (scrapegraph-js ≥ 2.1.0, Node ≥ 22)
AI Framework Integrations
Support & Resources
Documentation
Guides and tutorials
API Reference
Detailed API documentation
Community
Join our Discord community
GitHub
Check out our open-source projects

