Handling pagination - ScrapeGraphAI

Many websites spread their content across multiple pages — product listings, search results, articles. In v2 there are two approaches: let crawl.start follow links for you, or iterate page URLs manually with extract.

Using Crawl for multi-page extraction

crawl.start is the recommended service when you want links followed automatically. It runs asynchronously: the call returns a job ID right away, and you pass that ID to crawl.get to check progress until the job finishes.

from scrapegraph_py import ScrapeGraphAI, JsonFormatConfig

sgai = ScrapeGraphAI()

start = sgai.crawl.start(
    "https://example.com/products",
    formats=[JsonFormatConfig(prompt="Extract product names and prices")],
    max_depth=2,
    max_pages=50,
    include_patterns=["/products*"],
)

status = sgai.crawl.get(start.data.id)
print(f"{status.data.finished}/{status.data.total} - {status.data.status}")

import { ScrapeGraphAI } from "scrapegraph-js";

const sgai = ScrapeGraphAI();

const start = await sgai.crawl.start({
  url: "https://example.com/products",
  formats: [{ type: "json", prompt: "Extract product names and prices" }],
  maxDepth: 2,
  maxPages: 50,
  includePatterns: ["/products*"],
});

const status = await sgai.crawl.get(start.data.id);
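
The snippets above check the job status once. Since the crawl runs asynchronously, a real script would usually repeat that check until the job ends. The following is a minimal sketch of such a polling loop, not the SDK's own helper: wait_for_crawl, its parameters, and the terminal status strings "completed" and "failed" are assumptions made for illustration. Only the status.data.status field comes from the example above.

```python
import time
from typing import Any, Callable

# Assumed terminal status strings; check the values your SDK version returns.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_crawl(
    get_status: Callable[[], Any],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call get_status() until the job reports a terminal status.

    get_status is any zero-argument callable returning an object with a
    .data.status attribute, e.g. lambda: sgai.crawl.get(start.data.id).
    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.data.status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"crawl still '{status.data.status}' after {timeout}s"
            )
        sleep(interval)  # wait before asking again


# Hypothetical usage with the objects from the example above:
# final = wait_for_crawl(lambda: sgai.crawl.get(start.data.id))
# print(f"{final.data.finished}/{final.data.total} - {final.data.status}")
```

Passing the status call in as a function keeps the loop independent of the SDK, so the interval and timeout can be tuned or tested without making network requests.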

Iterating page URLs with Extract

If you know the URL pattern for each page, call extract on each URL and aggregate results:

from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()

all_products = []
for page in range(1, 6):  # pages 1-5
    url = f"https://example.com/products?page={page}"
    res = sgai.extract(
        "Extract all product names and prices on this page",
        url=url,
    )
    if res.status == "success":
        all_products.extend(res.data.json_data.get("products", []))

print(f"Total products extracted: {len(all_products)}")

import { ScrapeGraphAI } from "scrapegraph-js";

const sgai = ScrapeGraphAI();
const allProducts = [];

for (let page = 1; page <= 5; page++) {
  const url = `https://example.com/products?page=${page}`;
  const res = await sgai.extract({
    url,
    prompt: "Extract all product names and prices on this page",
  });
  if (res.status === "success") {
    allProducts.push(...(res.data?.json?.products ?? []));
  }
}

console.log(`Total products extracted: ${allProducts.length}`);

Tips

Prefer crawl.start when the number or pattern of pages is unknown — it handles link discovery for you.
Use manual iteration when URLs follow a predictable pattern (?page=N) and you want tight control.
Add delays between pages in manual mode to avoid triggering rate limits on the target website.
Stop early when the extracted list is empty or a “no more results” marker appears.
For infinite-scroll pages, use FetchConfig(scrolls=N) instead of pagination.
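
The delay and early-stop tips can be combined in one loop for manual mode. The sketch below is one possible shape under stated assumptions: paginate, extract_page, and url_template are hypothetical names, and the one-second default delay is an arbitrary starting point rather than a documented limit.

```python
import time
from typing import Callable


def paginate(
    extract_page: Callable[[str], list],
    url_template: str,
    max_pages: int = 50,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Fetch pages 1..max_pages, pausing between requests.

    Stops at the first page that yields no items, which is treated as
    the end of the listing.
    """
    items = []
    for page in range(1, max_pages + 1):
        if page > 1:
            sleep(delay)  # pause between requests, not before the first one
        batch = extract_page(url_template.format(page=page))
        if not batch:
            break  # empty page: assume there are no more results
        items.extend(batch)
    return items


# Hypothetical usage, wrapping the extract call from the example above:
# def extract_products(url):
#     res = sgai.extract("Extract all product names and prices on this page", url=url)
#     return res.data.json_data.get("products", []) if res.status == "success" else []
#
# all_products = paginate(extract_products, "https://example.com/products?page={page}")
```

In this sketch a failed request also returns an empty list and therefore ends the loop. If transient failures are likely, retry the page before treating it as the end of the listing.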
