Some websites return content only when the request carries specific HTTP headers: authentication tokens, cookies, a custom user agent, or API keys embedded in headers.

How to pass headers

In v2, all fetch behaviour, including custom headers and cookies, is configured through FetchConfig. FetchConfig is accepted by sgai.extract(), sgai.scrape(), sgai.search(), and sgai.crawl.start():
from scrapegraph_py import ScrapeGraphAI, FetchConfig

sgai = ScrapeGraphAI()

res = sgai.extract(
    "Extract the main content",
    url="https://example.com/protected-page",
    fetch_config=FetchConfig(
        headers={
            "Authorization": "Bearer your-token-here",
            "User-Agent": "Mozilla/5.0 (compatible; MyBot/1.0)",
        },
        cookies={"session": "abc123"},
    ),
)
See the proxy & fetch configuration guide for all FetchConfig options.

Common use cases

Session cookies

Export cookies from your browser (for example, with a browser extension such as EditThisCookie) and pass them via cookies:
fetch_config=FetchConfig(cookies={"user_session": "abc123", "_ga": "GA1.2.xyz"})
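Extensions such as EditThisCookie can export cookies as a JSON array rather than as plain name/value pairs, so the export has to be flattened before it goes into cookies. The sketch below assumes each entry carries at least name, value and domain keys (check your own export file, since formats vary between extensions); the helper cookies_from_export is hypothetical and not part of scrapegraph_py. It keeps only cookies whose domain matches the target URL and ignores path, expiry and the secure flag.

```python
import json
from urllib.parse import urlparse


def cookies_from_export(export_json: str, url: str) -> dict[str, str]:
    """Turn a browser cookie export into a {name: value} dict for one URL.

    Assumes the export is a JSON array of objects with "name", "value"
    and "domain" keys; every other field (path, expiry, secure) is ignored.
    """
    host = urlparse(url).hostname or ""
    cookies: dict[str, str] = {}
    for entry in json.loads(export_json):
        # A domain of ".example.com" also covers subdomains like "www.example.com".
        domain = entry.get("domain", "").lstrip(".")
        if domain and (host == domain or host.endswith("." + domain)):
            cookies[entry["name"]] = entry["value"]
    return cookies


export = (
    '[{"name": "user_session", "value": "abc123", "domain": ".example.com"},'
    ' {"name": "other", "value": "x", "domain": "other.org"}]'
)
print(cookies_from_export(export, "https://www.example.com/dashboard"))
# {'user_session': 'abc123'}
```

The resulting dict is what you would pass as FetchConfig(cookies=...).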

Mimicking a real browser

Some sites block requests without a browser-like User-Agent:
fetch_config=FetchConfig(headers={
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
})
For sites with stronger anti-bot protection, combine these headers with stealth=True and mode="js"; see the proxy guide for details.
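HTTP header names are case-insensitive, so merging a browser-like default set with per-request extras through a plain dict update can leave both "Accept-Language" and "accept-language" in the same dict. Below is a minimal sketch of a case-insensitive merge; the helper merge_headers and the BROWSER_HEADERS constant are hypothetical names, and the merged dict is assumed to go into FetchConfig(headers=...) as in the snippet above.

```python
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def merge_headers(base: dict[str, str], extra: dict[str, str]) -> dict[str, str]:
    """Overlay extra on base, comparing header names case-insensitively.

    When a name appears in both, the spelling and value from extra win.
    """
    merged = {name.lower(): (name, value) for name, value in base.items()}
    for name, value in extra.items():
        merged[name.lower()] = (name, value)
    # Each value is a (name, value) pair, so dict() rebuilds a normal mapping.
    return dict(merged.values())


headers = merge_headers(BROWSER_HEADERS, {"accept-language": "de-DE,de;q=0.9"})
print(headers["accept-language"])  # de-DE,de;q=0.9
print(len(headers))  # 2
```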

Bearer token authentication

For APIs or protected dashboards:
fetch_config=FetchConfig(headers={
    "Authorization": "Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..."
})
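In line with the tip about keeping tokens out of source code, here is a minimal sketch that reads the token from an environment variable and fails early if it is missing. The variable name TARGET_API_TOKEN and the helper bearer_headers are hypothetical; use whatever name your deployment defines.

```python
import os


def bearer_headers(env_var: str = "TARGET_API_TOKEN") -> dict[str, str]:
    """Build an Authorization header from a token stored in an env var."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Raise instead of silently sending an empty "Bearer " header.
        raise RuntimeError(f"Set {env_var} to the target site's bearer token")
    return {"Authorization": f"Bearer {token}"}
```

You would then pass fetch_config=FetchConfig(headers=bearer_headers()).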

Tips

  • Headers and cookies are sent to the target website, not to the ScrapeGraphAI API.
  • Keep sensitive tokens out of your source code — load them from environment variables.
  • If you are unsure which headers to pass, open the target URL in your browser, go to DevTools → Network, and inspect the request headers of a successful page load.
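Headers copied from the DevTools raw view still need to be split into the headers and cookies dicts. The sketch below assumes one "Name: value" pair per line, as the raw view of the Network panel shows them; the helper split_raw_headers is hypothetical. It skips the request line and HTTP/2 pseudo-headers (names starting with ":"), moves the Cookie header into its own dict, and drops Host, Content-Length and Connection on the assumption that the fetching side sets those itself.

```python
SKIP = {"host", "content-length", "connection"}


def split_raw_headers(raw: str) -> tuple[dict[str, str], dict[str, str]]:
    """Split raw 'Name: value' request headers into (headers, cookies)."""
    headers: dict[str, str] = {}
    cookies: dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines, pseudo-headers like ":authority", and lines without a colon.
        if not line or line.startswith(":") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if " " in name:
            continue  # a request line such as "GET /a:b HTTP/1.1", not a header
        if name.lower() == "cookie":
            # "a=1; b=2" becomes {"a": "1", "b": "2"}; values may contain "=".
            for pair in value.split(";"):
                key, sep, val = pair.strip().partition("=")
                if sep:
                    cookies[key] = val
        elif name.lower() not in SKIP:
            headers[name] = value
    return headers, cookies


raw = """GET /dashboard HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (compatible; MyBot/1.0)
Authorization: Bearer your-token-here
Cookie: session=abc123; theme=dark
"""
headers, cookies = split_raw_headers(raw)
print(headers)
# {'User-Agent': 'Mozilla/5.0 (compatible; MyBot/1.0)', 'Authorization': 'Bearer your-token-here'}
print(cookies)
# {'session': 'abc123', 'theme': 'dark'}
```

The two dicts map directly onto FetchConfig(headers=headers, cookies=cookies).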