Documentation Index

Fetch the complete documentation index at: https://docs.scrapegraphai.com/llms.txt

Use this file to discover all available pages before exploring further.

Overview

Agno ships a first-party ScrapeGraphTools toolkit at agno.tools.scrapegraph. One import, pass it to Agent(tools=[...]), and every ScrapeGraph endpoint is available to the model, no wrappers required.

Agno docs

Official Agno documentation

ScrapeGraphTools source

The toolkit on GitHub

Installation

pip install -U "agno @ git+https://github.com/agno-agi/agno.git#subdirectory=libs/agno" openai scrapegraph-py
Set your keys:
export SGAI_API_KEY="your-scrapegraph-key"
export OPENAI_API_KEY="your-openai-key"
Until the next Agno release ships the ScrapeGraph v2 rewrite, install Agno from main (as shown above). Agno is model-agnostic: swap OpenAIChat for Claude, Gemini, or any other supported model.
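Since both keys are read from the environment, a small preflight check surfaces a missing key before the agent makes its first call. This is a hypothetical helper, not part of Agno or scrapegraph-py:

```python
import os

def missing_keys(required=("SGAI_API_KEY", "OPENAI_API_KEY")):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]

absent = missing_keys()
if absent:
    # Warn early instead of failing mid-conversation with an auth error.
    print("Missing environment variables:", ", ".join(absent))
```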

Quickstart

Enable every tool with all=True and let the model pick the right one per turn:
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.scrapegraph import ScrapeGraphTools

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[ScrapeGraphTools(all=True)],
    markdown=True,
)

agent.print_response(
    "Use smartscraper on https://example.com to extract the page title and main heading. Return them as JSON.",
    stream=True,
)

Tools exposed

| Signature | ScrapeGraph endpoint |
| --- | --- |
| `smartscraper(url, prompt) -> str` | `POST /extract` |
| `markdownify(url) -> str` | `POST /scrape` (markdown format) |
| `searchscraper(query) -> str` | `POST /search` |
| `crawl(url, prompt, schema, max_depth=2, max_pages=2) -> str` | `POST /crawl` (polls until complete) |
| `scrape(url) -> str` | `POST /scrape` (HTML format) |
Each method returns a JSON string (or plain markdown for markdownify), which is what Agno hands back to the model.
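Because each tool hands back a JSON string rather than a dict, any code that post-processes tool output should parse it defensively. A minimal sketch, using a simulated smartscraper response (the payload shape is illustrative, not the guaranteed API format):

```python
import json

def parse_tool_output(raw: str):
    """Parse a tool's JSON string, falling back to the raw text on failure."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"raw": raw}

# Simulated smartscraper output: a JSON *string*, not a dict.
raw = '{"title": "Example Domain", "heading": "Example Domain"}'
data = parse_tool_output(raw)
print(data["title"])  # Example Domain
```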

Configuration

All knobs live on ScrapeGraphTools.__init__:
| Argument | Default | Purpose |
| --- | --- | --- |
| `api_key` | `$SGAI_API_KEY` | Your ScrapeGraph API key |
| `enable_smartscraper` | `True` | Register smartscraper |
| `enable_markdownify` | `False` | Register markdownify |
| `enable_searchscraper` | `False` | Register searchscraper |
| `enable_crawl` | `False` | Register crawl |
| `enable_scrape` | `False` | Register scrape |
| `all` | `False` | Shortcut: enable every tool |
| `render_heavy_js` | `False` | Request JavaScript rendering on every call |
| `headers` | `None` | Custom HTTP headers (User-Agent, Cookie, Authorization, …) applied to every fetch |
| `crawl_poll_interval` | `3` | Seconds between crawl status polls |
| `crawl_max_wait` | `180` | Max seconds to wait for a crawl to complete |
Only enable what the agent needs: a tighter tool surface gives the model a smaller decision space and usually better routing.
tools = ScrapeGraphTools(
    enable_smartscraper=True,
    enable_markdownify=True,
    enable_scrape=True,
    render_heavy_js=True,
    headers={"User-Agent": "MyBot/1.0"},
)

Examples

Structured extraction with smartscraper

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[ScrapeGraphTools(enable_smartscraper=True)],
    markdown=True,
)

agent.print_response(
    "Extract the product name and price from "
    "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/ as JSON.",
    stream=True,
)

Markdown conversion with markdownify

agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[ScrapeGraphTools(enable_markdownify=True)],
    markdown=True,
)

agent.print_response(
    "Fetch https://scrapegraphai.com and summarize the top three product features from the markdown.",
    stream=True,
)

Multi-page extraction with crawl

crawl requires a JSON schema so every page contributes to the same shape. The toolkit polls until completion (bounded by crawl_max_wait).
agent = Agent(
    model=OpenAIChat(id="gpt-4o"),
    tools=[ScrapeGraphTools(enable_crawl=True, crawl_max_wait=600)],
    markdown=True,
)

agent.print_response(
    """Crawl https://books.toscrape.com with max_depth=2, max_pages=5.
    Use this schema: {"type": "object", "properties": {"books": {"type": "array", "items": {"type": "object", "properties": {"title": {"type": "string"}, "price": {"type": "string"}}}}}}.
    Prompt: 'Extract every book title and price on the page'. Return the merged JSON.""",
    stream=True,
)
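The polling behavior described above (driven by crawl_poll_interval and crawl_max_wait) can be sketched generically. Here get_status is a stand-in callable, not the real ScrapeGraph SDK call:

```python
import time

def poll_until_complete(get_status, poll_interval=3, max_wait=180):
    """Poll get_status() until it reports success, or raise on failure/timeout.

    Mirrors the toolkit's crawl_poll_interval / crawl_max_wait knobs;
    get_status is an assumed stand-in returning a dict with a "status" key.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("status") == "success":
            return status
        if status.get("status") == "failed":
            raise RuntimeError(f"crawl failed: {status}")
        time.sleep(poll_interval)
    raise TimeoutError(f"crawl did not complete within {max_wait}s")
```

Raising crawl_max_wait (as in the example above, which uses 600) simply pushes the deadline out for deeper crawls.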

Support

Python SDK

Source and issues for scrapegraph-py

Discord

Get help from our community