What is a timeout error?

A timeout error occurs when the ScrapeGraphAI API takes longer than the allowed time to fetch and process your request. This can happen when the target website is slow, the page is very complex, or you are crawling many pages at once. In the v2 SDK, a timeout surfaces as res.status == "error" with a timeout-related message in res.error. For jobs started with crawl.start, the job record returned by crawl.get shows status == "failed" along with the timeout reason.
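
A minimal sketch of the check described above. The `Response` dataclass is a hypothetical stand-in for an SDK result and models only the `status` and `error` fields. Matching on the words "timeout" or "timed out" is an assumption about how the error message is worded, not a documented contract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Hypothetical stand-in for an SDK result; only status and error are modeled."""
    status: str
    error: Optional[str] = None


def is_timeout(res) -> bool:
    """Return True when a result reports an error whose message mentions a timeout."""
    if res.status != "error":
        return False
    # Assumption: timeout errors contain "timeout" or "timed out" in the message.
    message = (res.error or "").lower()
    return "timeout" in message or "timed out" in message


print(is_timeout(Response("error", "Request timeout after 30000 ms")))  # True
print(is_timeout(Response("error", "Invalid API key")))                 # False
print(is_timeout(Response("success")))                                  # False
```

Keeping the check in one helper means the matching rule can be adjusted in a single place if the actual error text differs.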

Common causes

1. The target website is slow

Some websites have very slow response times, especially under load or in certain geographic regions. Fix: Raise the limit with FetchConfig(timeout=...), which takes a value in milliseconds, or rerun with a different country to route the request through a faster region.
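
One way to raise the timeout without guessing a single value is to step through a short schedule of increasing limits. The sketch below assumes a hypothetical caller-supplied `fetch(timeout_ms)` callable that returns an object with `status` and `error`; the schedule values are illustrative, not recommended defaults.

```python
def escalate_timeouts(fetch, timeouts_ms=(30000, 60000, 120000)):
    """Call fetch(timeout_ms) with growing timeouts until an attempt does not time out.

    fetch is a hypothetical callable supplied by the caller. It is expected to
    return an object exposing .status and .error.
    """
    res = None
    for timeout_ms in timeouts_ms:
        res = fetch(timeout_ms)
        timed_out = res.status == "error" and "timeout" in (res.error or "").lower()
        if not timed_out:
            return res  # success, or an error that a longer timeout will not fix
    return res  # every attempt timed out; hand back the last result
```

With the SDK, `fetch` could be something like `lambda t: sgai.extract(prompt, url=url, fetch_config=FetchConfig(timeout=t))`, where `prompt` and `url` are your own values.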

2. The page has too much content

Very large pages (e.g., pages with thousands of products or articles) take longer to process. Fix: Narrow your prompt to target a specific section of the page, or use crawl.start with max_depth and max_pages limits to split the work.

3. JavaScript rendering takes too long

Pages that rely heavily on JavaScript, lazy loading, or infinite scroll may time out while waiting for content to appear. Fix: Use FetchConfig(mode="js", wait=3000) to give the page time to load, and tune scrolls for lazy-loaded content.
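
from scrapegraph_py import ScrapeGraphAI, FetchConfig

sgai = ScrapeGraphAI()

# Render with JavaScript, wait for content to appear, scroll twice to
# trigger lazy loading, and allow up to 60000 ms before timing out.
res = sgai.extract(
    "Extract all product listings",
    url="https://example.com",
    fetch_config=FetchConfig(mode="js", wait=3000, scrolls=2, timeout=60000),
)

See the JavaScript rendering guide for more.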

Long-running work: use Crawl

For jobs that may take minutes (multi-page extraction, large sites), prefer crawl.start over a single extract call. Crawl is explicitly asynchronous: you start a job, then poll crawl.get until it reaches a terminal status.
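
import time

from scrapegraph_py import ScrapeGraphAI, JsonFormatConfig

sgai = ScrapeGraphAI()

# Start the crawl job with depth and page limits to bound the work.
start = sgai.crawl.start(
    "https://slow-website.com",
    formats=[JsonFormatConfig(prompt="Extract the main article content")],
    max_depth=1,
    max_pages=5,
)

# Poll until the job reaches a terminal status, pausing between checks
# so the loop does not send requests back to back.
while True:
    status = sgai.crawl.get(start.data.id)
    if status.data.status in ("completed", "failed"):
        break
    time.sleep(5)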
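
The loop above runs until the job reaches a terminal status. Where an upper bound on the total wait is wanted, the polling could be wrapped in a helper with a deadline. This is a sketch only: `get_status` is a hypothetical zero-argument callable that returns the current status string, and the interval and deadline defaults are illustrative.

```python
import time


def poll_until_done(get_status, interval_s=5.0, deadline_s=600.0,
                    terminal=("completed", "failed")):
    """Poll get_status() until it returns a terminal status or the deadline passes.

    get_status is a hypothetical zero-argument callable returning a status string.
    Returns the terminal status, or None if the deadline would be exceeded first.
    """
    stop_at = time.monotonic() + deadline_s
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() + interval_s > stop_at:
            return None  # give up rather than poll forever
        time.sleep(interval_s)
```

With the crawl job from the example, the callable could be `lambda: sgai.crawl.get(start.data.id).data.status`. A `None` result means the caller stopped waiting; it does not mean the job itself failed.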

Retry strategy

For transient timeouts, retry with a small delay. Check res.status instead of wrapping the call in try/except, because v2 does not raise on API errors.
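
import time

from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

def extract_with_timeout_retry(prompt, url, max_attempts=3):
    res = None
    for attempt in range(max_attempts):
        res = sgai.extract(prompt, url=url)
        if res.status == "success":
            return res
        # Only timeouts are worth retrying; return any other error right away.
        if "timeout" not in (res.error or "").lower():
            return res
        # Pause before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            time.sleep(5)
    return res  # still timing out after max_attempts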
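
The function above waits a fixed 5 seconds between attempts. A common variation, not taken from the ScrapeGraphAI docs, is exponential backoff with jitter, which spreads retries out when many clients hit the same slow target. In this sketch, `call` is a hypothetical zero-argument callable returning an object with `status` and `error`, and the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import random
import time


def backoff_delays(max_attempts=3, base_s=2.0, cap_s=30.0):
    """Yield one delay per retry: the ceiling doubles each time, capped at cap_s."""
    for attempt in range(max_attempts - 1):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        # Full jitter: pick a random delay between 0 and the current ceiling.
        yield random.uniform(0, ceiling)


def retry_on_timeout(call, max_attempts=3, base_s=2.0, cap_s=30.0, sleep=time.sleep):
    """Run call() up to max_attempts times, retrying only timeout errors."""
    res = call()
    for delay in backoff_delays(max_attempts, base_s, cap_s):
        # Stop on success or on any error that is not a timeout.
        if res.status == "success" or "timeout" not in (res.error or "").lower():
            break
        sleep(delay)
        res = call()
    return res
```

With the SDK, `call` could be `lambda: sgai.extract(prompt, url=url)`, where `prompt` and `url` are your own values.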