Many modern websites (single-page apps, React or Vue frontends, pages with lazy-loaded content) do not include their data in the initial HTML. The content only becomes visible after JavaScript runs in the browser.
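To see the difference for yourself, here is a minimal standard-library sketch that collects the text a reader would see in an HTML string, ignoring script and style contents. It is not part of ScrapeGraphAI, and the name visible_text is illustrative. Run on a typical SPA shell, it finds nothing, because the data only arrives once the JavaScript bundle executes.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects non-empty text nodes, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text outside script/style that is not pure whitespace
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> list[str]:
    """Return the visible text nodes of an HTML document, in order."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Feeding it the raw HTML of a page (for example, saved from View Source) is a quick way to tell whether the data is present before any JavaScript runs.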

How ScrapeGraphAI handles JS pages

ScrapeGraphAI can render a page's JavaScript in a headless browser before extracting content. To force this, pass FetchConfig(mode="js"); the default auto mode also switches to the browser when needed.

Use wait for delayed content

If content appears after a short delay (lazy loading, carousels, infinite scroll), set wait to the number of milliseconds to pause before extraction starts:
```python
from scrapegraph_py import ScrapeGraphAI, FetchConfig

sgai = ScrapeGraphAI()

res = sgai.extract(
    "Extract all product names and prices",
    url="https://example.com/products",
    # Render JavaScript, then wait 2000 ms (2 s) before extracting
    fetch_config=FetchConfig(mode="js", wait=2000),
)

if res.status == "success":
    print(res.data.json_data)
```

Tips for specific scenarios

Infinite scroll / lazy loading

Use the scrolls option in FetchConfig to scroll the page a given number of times before extracting, so that lazy-loaded items are triggered and rendered:
```python
res = sgai.extract(
    "Extract all product cards",
    url="https://example.com/feed",
    # Scroll 5 times to trigger lazy loading; wait is in milliseconds
    fetch_config=FetchConfig(mode="js", scrolls=5, wait=1000),
)
```
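Why does the number of scrolls matter? The toy model below is plain Python and not how ScrapeGraphAI drives the browser. It assumes a hypothetical feed that renders one batch of items on load and appends one more batch each time the page is scrolled to the bottom. Under that assumption, anything past batch_size * (scrolls + 1) items never reaches the extractor.

```python
class LazyFeed:
    """Toy infinite-scroll feed: one batch on load, one more per scroll (hypothetical)."""

    def __init__(self, items: list[str], batch_size: int) -> None:
        self._items = items
        self._batch_size = batch_size
        # Only the first batch is in the DOM when the page loads
        self.rendered = items[:batch_size]

    def scroll_to_bottom(self) -> None:
        # Stands in for the site's own scroll handler fetching the next batch
        self.rendered = self._items[: len(self.rendered) + self._batch_size]


def items_seen_by_extractor(feed: LazyFeed, scrolls: int) -> list[str]:
    """Scroll `scrolls` times, then return what is rendered at extraction time."""
    for _ in range(scrolls):
        feed.scroll_to_bottom()
    return feed.rendered
```

Real sites use different batch sizes, so if the extracted list stops short, raising scrolls (and possibly wait) is the first thing to try.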
If a site splits its content across multiple URLs (classic pagination) rather than loading it on scroll, use crawl.start to follow the paginated links automatically.
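crawl.start handles this for you. Purely to illustrate what following paginated links involves, here is a standard-library sketch that walks rel="next" links from page to page. It is not ScrapeGraphAI's crawler: it assumes the site marks its next-page link with rel="next" (many sites do, not all), and fetch is a hypothetical callable that returns a page's HTML.

```python
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urljoin


class _NextLinkFinder(HTMLParser):
    """Records the href of the first <a> or <link> carrying rel="next"."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag in ("a", "link") and "next" in rel and self.href is None:
            self.href = attrs.get("href")


def next_page_url(html: str, base_url: str) -> Optional[str]:
    finder = _NextLinkFinder()
    finder.feed(html)
    finder.close()
    # Resolve relative links such as "?page=2" against the current page URL
    return urljoin(base_url, finder.href) if finder.href else None


def crawl_pages(fetch: Callable[[str], str], start_url: str, max_pages: int = 10) -> list[str]:
    """Follow rel="next" links, stopping at the last page, a cycle, or max_pages."""
    visited: list[str] = []
    url: Optional[str] = start_url
    while url and len(visited) < max_pages and url not in visited:
        visited.append(url)
        url = next_page_url(fetch(url), url)
    return visited
```

The max_pages limit and the visited check are the two safeguards any pagination follower needs: one against very long listings, the other against pages that link back to each other.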

Login-gated content

If the data requires authentication, pass the required cookies or session tokens via FetchConfig:
```python
res = sgai.extract(
    "Extract my account balance",
    url="https://example.com/dashboard",
    fetch_config=FetchConfig(
        mode="js",
        # Placeholder values: use the cookies from a logged-in browser session
        cookies={"session": "abc123", "auth_token": "xyz"},
    ),
)
```
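A convenient way to obtain these values is to copy the Cookie request header from your browser's DevTools (Network tab) while logged in. The sketch below turns such a header string into the dict shape used above; parse_cookie_header is a hypothetical helper, not part of scrapegraph_py.

```python
def parse_cookie_header(header: str) -> dict[str, str]:
    """Turn 'a=1; b=2' (a copied Cookie header) into {'a': '1', 'b': '2'}."""
    cookies: dict[str, str] = {}
    for part in header.split(";"):
        # partition splits on the first '=' only, so '=' inside values survives
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies
```

Reading the header from an environment variable (for example, a hypothetical SITE_COOKIES) and passing cookies=parse_cookie_header(os.environ["SITE_COOKIES"]) keeps session secrets out of your source code.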

Single Page Applications (SPAs)

SPAs render content client-side after the initial load. Increasing wait usually resolves extraction issues. If it does not, check whether the data comes from the site’s own API (look at the Network tab in your browser's DevTools); calling that API directly may be easier.
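If you are not sure how long a page needs, one option is to retry with progressively longer waits until the result looks complete. This is a generic pattern, not a built-in ScrapeGraphAI feature: extract and is_complete are hypothetical callables you supply, and the wait schedule is arbitrary.

```python
from typing import Any, Callable, Iterable, Optional


def extract_with_increasing_wait(
    extract: Callable[[int], Any],
    is_complete: Callable[[Any], bool],
    waits_ms: Iterable[int] = (1000, 2000, 4000, 8000),
) -> tuple[Optional[int], Any]:
    """Call extract(wait) with longer waits until is_complete(result) is True.

    Returns (wait_that_worked, result), or (None, last_result) if none did.
    """
    result = None
    for wait in waits_ms:
        result = extract(wait)
        if is_complete(result):
            return wait, result
    # Every wait was tried; hand back the last result for inspection
    return None, result
```

With the client from the first example, extract could be lambda w: sgai.extract(prompt, url=url, fetch_config=FetchConfig(mode="js", wait=w)), and is_complete could check that res.status == "success" and res.data.json_data is non-empty. Each attempt is a separate request, so keep the schedule short.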

Verifying the rendered HTML

To debug, call sgai.scrape() with HtmlFormatConfig (Python) or { type: "html" } (JS) to see the exact HTML produced after rendering, then compare it with the raw HTML the server sends before any JavaScript runs.
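For the comparison step, Python's standard difflib module is enough. This sketch assumes you already have both HTML strings: the raw one (for example, saved from View Source, which shows the HTML as delivered) and the rendered one returned by sgai.scrape(). The names html_diff and added_lines are illustrative.

```python
import difflib


def html_diff(raw_html: str, rendered_html: str) -> list[str]:
    """Unified diff of raw (pre-JavaScript) HTML against rendered HTML."""
    return list(
        difflib.unified_diff(
            raw_html.splitlines(),
            rendered_html.splitlines(),
            fromfile="raw.html",
            tofile="rendered.html",
            lineterm="",  # splitlines() already removed the newlines
        )
    )


def added_lines(diff: list[str]) -> list[str]:
    """Lines that exist only after rendering (the '+' side, minus the header)."""
    return [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]
```

If either document is minified onto a single line, a line-based diff is hard to read; pretty-printing both first makes the added content stand out.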