Overview

Schema turns a plain-English description of the data you want into a valid JSON Schema you can pass to Extract, Search, or Monitor as output_schema. Optionally seed it with an existing_schema to extend rather than start from scratch. Use it when you want strongly-typed output but don't want to hand-write the schema.

Pricing

Each Schema call costs 1 credit. See the pricing page for the full breakdown.

Getting Started

Quick Start

from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

res = sgai.schema(
    prompt="A product listing on an e-commerce site. Include name, price (number), currency, in_stock (boolean), rating (0-5), and a list of review excerpts."
)

print(res.data.schema)

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | Natural-language description of the schema to generate. |
| existing_schema | object or string | No | Existing JSON Schema (object or JSON string) to extend with the new fields described in prompt. |
| model | string | No | Optional LLM model override. |
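
Because existing_schema accepts either an object or a JSON string, it can help to assemble the arguments in one place and catch a malformed string before spending a credit. The following is a minimal sketch: build_schema_kwargs is a hypothetical helper, not part of the SDK, and the keyword names simply mirror the parameter table (passing model as a keyword argument is an assumption).

```python
import json


def build_schema_kwargs(prompt, existing_schema=None, model=None):
    """Hypothetical helper: collect keyword arguments for sgai.schema().

    Only `prompt` is required; optional values are left out when unset.
    """
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")

    kwargs = {"prompt": prompt}
    if existing_schema is not None:
        if isinstance(existing_schema, str):
            # Fail early if the string form is not valid JSON.
            json.loads(existing_schema)
        kwargs["existing_schema"] = existing_schema
    if model is not None:
        # Assumption: the client takes the override as a `model` keyword.
        kwargs["model"] = model
    return kwargs


# The same parameter supplied as an object and as a JSON string.
as_object = build_schema_kwargs(
    "Add brand and sku.",
    existing_schema={"type": "object", "properties": {"name": {"type": "string"}}},
)
as_string = build_schema_kwargs(
    "Add brand and sku.",
    existing_schema=json.dumps({"type": "object", "properties": {}}),
)
```

The resulting dictionary could then be unpacked into the call, for example sgai.schema(**as_object), assuming the client accepts these keywords as in the examples on this page.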

Response

{
  "refinedPrompt": "Extract all product listings with their name, price, currency, stock status, rating, and review excerpts from the e-commerce site",
  "schema": {
    "$defs": {
      "ItemSchema": {
        "title": "ItemSchema",
        "type": "object",
        "properties": {
          "name": { "title": "Name", "description": "Name of the product", "type": "string" },
          "price": { "title": "Price", "description": "Price of the product as a number", "type": "number" },
          "currency": { "title": "Currency", "description": "Currency code for the price (e.g., USD, EUR)", "type": "string" },
          "in_stock": { "title": "In Stock", "description": "Whether the product is currently in stock", "type": "boolean" },
          "rating": { "title": "Rating", "description": "Product rating on a scale from 0 to 5", "type": "number", "minimum": 0, "maximum": 5 },
          "review_excerpts": { "title": "Review Excerpts", "description": "List of short review excerpts for the product", "type": "array", "items": { "type": "string" } }
        },
        "required": ["name", "price", "currency", "in_stock", "rating", "review_excerpts"]
      }
    },
    "title": "MainSchema",
    "type": "object",
    "properties": {
      "items": {
        "title": "Items",
        "description": "Array of product listings",
        "type": "array",
        "items": { "$ref": "#/$defs/ItemSchema" }
      }
    },
    "required": ["items"]
  },
  "usage": { "promptTokens": 1160, "completionTokens": 743 }
}
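
The item definition in this response sits under $defs and is reached through a $ref, so reading the field list takes one extra lookup. The sketch below shows one way to follow a local reference using only the standard library. The helper names are illustrative, the sample schema is a trimmed copy of the response above, and JSON Pointer escape sequences (~0, ~1) are not handled.

```python
def resolve_ref(root, node):
    """Follow a local '#/...' reference, if the node has one."""
    ref = node.get("$ref")
    if ref is None:
        return node
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled: {ref}")
    target = root
    for part in ref[2:].split("/"):
        target = target[part]
    return target


def list_item_fields(schema, array_property="items"):
    """Return {field_name: json_type} for the objects inside an array property."""
    array_node = schema["properties"][array_property]
    item_schema = resolve_ref(schema, array_node["items"])
    return {
        name: spec.get("type")
        for name, spec in item_schema["properties"].items()
    }


# Trimmed copy of the response schema above (two of the six fields).
schema = {
    "$defs": {
        "ItemSchema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
            },
            "required": ["name", "price"],
        }
    },
    "type": "object",
    "properties": {
        "items": {"type": "array", "items": {"$ref": "#/$defs/ItemSchema"}}
    },
    "required": ["items"],
}

fields = list_item_fields(schema)
print(fields)  # {'name': 'string', 'price': 'number'}
```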

Extending an existing schema

Pass existing_schema to grow a schema you already have rather than regenerating from scratch:
existing = {
    "title": "Product",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"}
    },
    "required": ["name", "price"]
}

res = sgai.schema(
    prompt="Add brand, sku, and a list of category tags.",
    existing_schema=existing,
)
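
After extending a schema, it may be worth confirming that the original fields survived. The sketch below compares top-level properties and required entries. dropped_fields is a hypothetical helper, and the extended dictionary is a hand-written stand-in for illustration, not real API output; a generated schema may nest fields under $defs, in which case the comparison would need to target that nested definition instead.

```python
def dropped_fields(original, extended):
    """Return property names and required entries that `extended` lost."""
    original_props = set(original.get("properties", {}))
    extended_props = set(extended.get("properties", {}))
    original_required = set(original.get("required", []))
    extended_required = set(extended.get("required", []))
    return {
        "properties": sorted(original_props - extended_props),
        "required": sorted(original_required - extended_required),
    }


existing = {
    "title": "Product",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name", "price"],
}

# Hand-written stand-in for an extended schema (not real API output).
extended = {
    "title": "Product",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "brand": {"type": "string"},
        "sku": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "price"],
}

report = dropped_fields(existing, extended)
print(report)  # {'properties': [], 'required': []}
```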

Using the generated schema

Pipe the returned schema directly into Extract, Search, or Monitor as output_schema:
schema_res = sgai.schema(prompt="A blog post with title, author, published_at (ISO date), and tags[].")
generated_schema = schema_res.data.schema

extract_res = sgai.extract(
    "Extract the post details.",
    url="https://example.com/blog/post-slug",
    output_schema=generated_schema,
)
print(extract_res.data.json_data)
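
Since each Schema call costs 1 credit, one option is to generate a schema once, store it on disk, and reuse it for later Extract, Search, or Monitor calls. The sketch below assumes the returned schema is a JSON-serializable dictionary. load_or_generate_schema and fake_generate are illustrative names, and the stand-in generator exists only so the example runs without an API key.

```python
import json
import tempfile
from pathlib import Path


def load_or_generate_schema(path, generate):
    """Return the schema stored at `path`, calling `generate()` only if missing."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    schema = generate()
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return schema


# Stand-in generator so the sketch runs offline. In practice this could be
# something like: lambda: sgai.schema(prompt="...").data.schema
calls = []


def fake_generate():
    calls.append(1)
    return {"type": "object", "properties": {"title": {"type": "string"}}}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "blog_post.schema.json"
    first = load_or_generate_schema(cache_file, fake_generate)   # generates and saves
    second = load_or_generate_schema(cache_file, fake_generate)  # reads the saved file
```

Keeping the saved file under version control also makes later hand edits to the schema easy to review.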

When to use Schema

  • ✅ You want structured output but don't have a hand-written schema yet
  • ✅ You're prototyping and want a quick starting point you'll refine
  • ✅ You have a partial schema and want to grow it
  • ❌ You already have a finalized JSON Schema: pass it directly to Extract/Search and skip Schema

See also

  • Extract: Use output_schema for typed extraction
  • Search: Use output_schema for typed search results
  • Monitor: Use output_schema on scheduled jobs