Many websites spread their content (product listings, search results, articles) across multiple pages. In v2 there are two approaches: let crawl.start follow links for you, or iterate over page URLs manually with extract.

Using Crawl for multi-page extraction

crawl.start is the recommended service when you want to follow links automatically. It runs asynchronously: you start a job, then poll for its result.
from scrapegraph_py import ScrapeGraphAI, JsonFormatConfig

sgai = ScrapeGraphAI()

start = sgai.crawl.start(
    "https://example.com/products",
    formats=[JsonFormatConfig(prompt="Extract product names and prices")],
    max_depth=2,
    max_pages=50,
    include_patterns=["/products*"],
)

# crawl.get returns a snapshot of the job's progress; call it again later to poll.
status = sgai.crawl.get(start.data.id)
print(f"{status.data.finished}/{status.data.total} - {status.data.status}")
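
The snippet above checks the job once. Because the crawl runs asynchronously, you will usually want to repeat that check until the job finishes. Below is a minimal, generic polling sketch. The poll_until helper and its fetch, is_done, interval, and timeout parameters are hypothetical names introduced for illustration and are not part of the SDK. The terminal status values are not listed on this page, so the completion check is left to the caller.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> T:
    """Call fetch() repeatedly until is_done(result) is true or the timeout elapses.

    Hypothetical helper, not part of scrapegraph_py.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Give up if the next sleep would take us past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError("job did not finish before the timeout")
        time.sleep(interval)
```

With the client from the example above, fetch could be lambda: sgai.crawl.get(start.data.id), and is_done a function that inspects status.data.status. Check the API reference for the status values that mean the job has finished.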

Iterating page URLs with Extract

If you know the URL pattern for each page, call extract on each URL and aggregate results:
from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

all_products = []
for page in range(1, 6):  # pages 1-5
    url = f"https://example.com/products?page={page}"
    res = sgai.extract(
        "Extract all product names and prices on this page",
        url=url,
    )
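    if res.status == "success":
        # "products" is the key this prompt is expected to produce; adjust it to match your output.
        all_products.extend((res.data.json_data or {}).get("products", []))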

print(f"Total products extracted: {len(all_products)}")

Tips

  • Prefer crawl.start when the number or pattern of pages is unknown, since it handles link discovery for you.
  • Use manual iteration when URLs follow a predictable pattern (?page=N) and you want tight control.
  • Add delays between pages in manual mode to avoid triggering rate limits on the target website.
  • Stop early when the extracted list is empty or a “no more results” marker appears.
  • For infinite-scroll pages, use FetchConfig(scrolls=N) instead of pagination.
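
The tips on delays and stopping early can be combined in one loop. The following is a minimal sketch, not an SDK feature: collect_pages and its fetch_page, max_pages, and delay parameters are hypothetical names. fetch_page stands in for whatever function you write to call extract on one page and return the list of items it found.

```python
import time
from typing import Callable, List


def collect_pages(
    fetch_page: Callable[[int], List[dict]],
    max_pages: int = 5,
    delay: float = 1.0,
) -> List[dict]:
    """Fetch pages 1..max_pages, pausing between requests.

    Stops at the first page that returns no items.
    Hypothetical helper; fetch_page is supplied by the caller.
    """
    items: List[dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # empty page: assume there are no more results
        items.extend(batch)
        if page < max_pages:
            time.sleep(delay)  # be polite to the target website
    return items
```

In practice, fetch_page would build the ?page=N URL, call sgai.extract as in the manual example, and return the extracted list, or an empty list when the call does not succeed. A fixed delay is the simplest choice. Adding random jitter or backing off after errors are common refinements.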