When ScrapeGraphAI returns an empty result or a response with null fields, the cause is usually one of the following.

1. The page requires JavaScript rendering

Many modern websites load their content dynamically via JavaScript after the initial HTML is delivered. If the content is not in the raw HTML, the default fetch mode may miss it. Fix: Set mode="js" in FetchConfig and optionally add a wait time. See the JavaScript rendering guide.

from scrapegraph_py import ScrapeGraphAI, FetchConfig

sgai = ScrapeGraphAI()
res = sgai.extract(
    "Extract all product cards",
    url="https://example.com/products",
    # Render the page with JavaScript and wait before extracting
    fetch_config=FetchConfig(mode="js", wait=2000),
)
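One way to tell whether JavaScript rendering is the culprit is to check if the text you expect is already present in the unrendered HTML. The sketch below uses only the Python standard library and is not part of the ScrapeGraphAI SDK; the helper names (visible_text, appears_in_raw_html, fetch_raw_html) are hypothetical, and the tag stripping is deliberately crude.

```python
import re
import urllib.request
from html import unescape


def visible_text(html: str) -> str:
    """Crudely strip <script>/<style> blocks and tags to approximate visible text."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def appears_in_raw_html(html: str, expected: str) -> bool:
    """Return True if `expected` occurs in the text of the unrendered HTML."""
    return expected.casefold() in visible_text(html).casefold()


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Plain HTTP GET with no JavaScript execution."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

If appears_in_raw_html(fetch_raw_html(url), "some text you can see in the browser") returns False, the content is probably injected by JavaScript and mode="js" is worth trying.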

2. Your prompt is too vague

A prompt like "get the data" gives the LLM no guidance on what to look for. Fix: Be specific and descriptive.

# Too vague
prompt = "get the data"

# Better
prompt = "Extract the product name, current price, and stock availability from the product page"

3. The target element does not exist on that URL

Double-check that the data you want to extract actually appears on the URL you are passing. Some pages require login, cookies, or a specific session to show content. Fix: Open the URL in an incognito browser window and verify the content is visible without authentication. If it requires a session, pass cookies via FetchConfig.

4. The website blocks scrapers

Some websites detect and block automated requests, returning a captcha page or empty HTML. Fix: Enable stealth mode and custom headers with FetchConfig(mode="js", stealth=True, headers={...}). See the proxy & fetch configuration guide.
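Before changing the fetch configuration, it can help to confirm that the HTML you received really is a block page. The following is a rough heuristic, not an SDK feature: the marker phrases and the length threshold are assumptions chosen for illustration and will need tuning per site.

```python
# Phrases that commonly appear on challenge or block pages (illustrative, not exhaustive).
BLOCK_MARKERS = (
    "captcha",
    "access denied",
    "verify you are human",
    "unusual traffic",
)


def block_signals(html: str, min_length: int = 500) -> list[str]:
    """Return the reasons this HTML looks like a block page (empty list if none)."""
    reasons = []
    if len(html.strip()) < min_length:
        reasons.append(f"body shorter than {min_length} characters")
    lowered = html.casefold()
    reasons.extend(f"contains '{marker}'" for marker in BLOCK_MARKERS if marker in lowered)
    return reasons
```

An empty list does not prove the page was served normally; it only means none of these particular signals fired.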

5. The output schema is too strict

If you pass a schema with required fields, the LLM will return null for fields it cannot find on the page. Fix: Make fields optional in your schema, or broaden the prompt to describe fallback behaviour.
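To decide which fields to make optional, it helps to see exactly which ones came back null. The helper below is a hypothetical standard-library sketch that walks an extracted result, assumed here to be plain nested dicts and lists, and reports the path of every None value.

```python
from typing import Any


def null_fields(data: Any, prefix: str = "") -> list[str]:
    """Return the dotted path of every None value in nested dicts and lists."""
    if data is None:
        return [prefix or "<root>"]
    paths: list[str] = []
    if isinstance(data, dict):
        for key, value in data.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            paths += null_fields(value, child)
    elif isinstance(data, list):
        for index, item in enumerate(data):
            paths += null_fields(item, f"{prefix}[{index}]")
    return paths


# Example with made-up data:
result = {"name": "Widget", "price": None, "variants": [{"sku": "A1", "stock": None}]}
missing = null_fields(result)  # ["price", "variants[0].stock"]
```

Fields that are consistently missing across pages are good candidates for being optional in the schema or for a more explicit description in the prompt.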

6. Rate limiting or quota exceeded

If you have exhausted your credits or are being rate-limited, the API may return an error instead of data, such as a 429 response. Fix: Check your dashboard for remaining credits and current usage. See the rate limiting guide for how to handle 429 responses.
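A common way to handle 429 responses is to retry with exponential backoff. The sketch below is generic Python and makes no claim about how the SDK reports rate limiting: RateLimited is a hypothetical exception that your own wrapper would raise when it detects a 429, and the attempt count and delays are arbitrary defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical marker: raise this from `call` when the API answers with 429."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimited with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Backoff only helps with rate limiting. If the credits are exhausted, retrying will keep failing, so check the dashboard first.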

Debugging tips

  • Check res.status — on failure, res.error contains the reason and res.data is None.
  • Log res.data.json_data (Python) / res.data.json (JS) to see exactly what the LLM produced.
  • Test the URL with a simple prompt like "What is the main heading of this page?" to verify that extraction works at all.
  • Call sgai.scrape(url, formats=[HtmlFormatConfig()]) to see the raw HTML the extractor received — that often reveals blocking pages or missing content.
  • Use the interactive playground to test your URL and prompt before integrating.
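The first two tips can be folded into a small logging helper. This is a sketch that relies only on the attributes named above (res.status, res.error, res.data, and res.data.json_data); describe_result is a hypothetical name, and the SimpleNamespace objects stand in for real responses, whose exact shape may differ.

```python
from types import SimpleNamespace
from typing import Any


def describe_result(res: Any) -> str:
    """Summarize a response: failed, succeeded but empty, or succeeded with data."""
    data = getattr(res, "data", None)
    if data is None:
        status = getattr(res, "status", None)
        error = getattr(res, "error", None)
        return f"failed: status={status!r}, error={error!r}"
    payload = getattr(data, "json_data", None)
    if not payload:
        return "succeeded, but the extracted payload is empty"
    return f"ok: {payload!r}"


# Stand-in objects for illustration only; the status values are made up.
failed = SimpleNamespace(status="error", error="quota exceeded", data=None)
empty = SimpleNamespace(status="success", error=None, data=SimpleNamespace(json_data={}))
print(describe_result(failed))
print(describe_result(empty))
```

Distinguishing a failed request from a successful but empty extraction narrows the search: the former points to causes 4 and 6, the latter to causes 1, 2, 3, and 5.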