Extract

POST https://v2-api.scrapegraphai.com/api/extract

Replaces the v1 smartscraper endpoint. Provide a prompt (and optionally a JSON schema) and get typed JSON back — no selectors or post-processing needed.

Request body

Exactly one of url, html, or markdown must be supplied as the source.

string

URL of the page to extract from.

string

Raw HTML content to extract from (max 2 MB).

string

Markdown content to extract from (max 2 MB).

string

required

Natural-language description of what to extract.

object

JSON schema describing the desired output shape. When provided, the LLM is constrained to match it.

string

HTML pre-processing mode: "normal", "reader", or "prune".

object

Fetch-time options. See the Scrape endpoint for the full field list (mode, stealth, headers, cookies, scrolls, wait, timeout, country). Ignored when html or markdown is supplied.

Example request

curl -X POST https://v2-api.scrapegraphai.com/api/extract \
  -H "SGAI-APIKEY: $SGAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "prompt": "What is the title of this page?"
  }'

Example response

{
  "id": "8c34fc03-17be-4fcc-a7ce-6ebcab23ad43",
  "raw": null,
  "json": {
    "title": "Example Domain"
  },
  "usage": {
    "promptTokens": 361,
    "completionTokens": 92
  },
  "metadata": {
    "chunker": { "chunks": [{ "size": 33 }] },
    "fetch": {}
  }
}

Field	Description
`id`	UUID for this extract call.
`json`	Structured output matching the schema (or free-form JSON when no schema is supplied).
`raw`	Raw model output before JSON parsing, when available.
`usage.promptTokens` / `usage.completionTokens`	LLM token accounting.
`metadata.chunker`	How the source content was split before extraction.
`metadata.fetch`	Fetch diagnostics (populated when the page was fetched by the API).

With a schema

curl -X POST https://v2-api.scrapegraphai.com/api/extract \
  -H "SGAI-APIKEY: $SGAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "prompt": "Extract the page title and description",
    "schema": {
      "type": "object",
      "properties": {
        "title":       { "type": "string" },
        "description": { "type": "string" }
      },
      "required": ["title"]
    }
  }'

Extract from HTML or markdown

curl -X POST https://v2-api.scrapegraphai.com/api/extract \
  -H "SGAI-APIKEY: $SGAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "html": "<html><body><h1>Widget</h1><p>$9.99</p></body></html>",
    "prompt": "Extract product name and price"
  }'

Service overview: Extract
SDK wrappers: Python · JavaScript

API Documentation

Scrape

Search

Crawl

Monitor

History

Account

Extract

Request body

Example request

Example response

With a schema

Extract from HTML or markdown

​Request body

​Example request

​Example response

​With a schema

​Extract from HTML or markdown

​Related

Request body

Example request

Example response

With a schema

Extract from HTML or markdown

Related