These docs cover scrapegraph-js ≥ 2.1.0. The v2 SDK is ESM-only and requires Node ≥ 22. Earlier 0.x/1.x releases expose a different, deprecated API.
Breaking in 2.1.0 (types only): all exported TypeScript types and Zod schemas dropped the Api prefix and now match scrapegraph-py 1:1 (ApiScrapeRequest → ScrapeRequest, ApiFetchConfig → FetchConfig, apiScrapeRequestSchema → scrapeRequestSchema, etc.). Monitor input types are also renamed: ApiMonitorCreateInput → MonitorCreateRequest, ApiMonitorUpdateInput → MonitorUpdateRequest, ApiMonitorActivityParams → MonitorActivityRequest. ApiResult<T> is the only type that keeps the prefix. Runtime JS code is unchanged; only TypeScript consumers need to rename imports.

Installation

# npm
npm i scrapegraph-js@latest     # installs the newest release (>= 2.1.0)

# pnpm
pnpm add scrapegraph-js@latest

# yarn
yarn add scrapegraph-js@latest

# bun
bun add scrapegraph-js@latest

What’s new in v2

  • New entry point: import { ScrapeGraphAI } from "scrapegraph-js" and instantiate once; there is no need to pass the API key to every call.
  • Nested resources: sgai.crawl.*, sgai.monitor.*, sgai.history.*.
  • ApiResult<T> wrapper: no throws. Every call returns { status, data, error, elapsedMs }.
  • Auto-picks the API key from SGAI_API_KEY (or pass { apiKey } to the factory).
  • Removed: markdownify, agenticScraper, sitemap, feedback. Use sgai.scrape() with the right format entry instead.
v2 is a breaking change. See the Migration Guide if you’re upgrading from v1.

Quick Start

import { ScrapeGraphAI } from "scrapegraph-js";

// reads SGAI_API_KEY from env, or pass explicitly: ScrapeGraphAI({ apiKey: "..." })
const sgai = ScrapeGraphAI();

const res = await sgai.scrape({
  url: "https://example.com",
  formats: [{ type: "markdown" }],
});

if (res.status === "success") {
  console.log(res.data?.results.markdown?.data?.[0]);
} else {
  console.error(res.error);
}
Store your API keys securely in environment variables. Use .env files and libraries like dotenv to load them into your app.

Return Type

Every method returns ApiResult<T>:
type ApiResult<T> = {
  status: "success" | "error";
  data: T | null;
  error?: string;
  elapsedMs: number;
};
Check res.status before accessing res.data.
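
For callers who prefer exceptions over checking status at every call site, the result shape above can be wrapped in a small helper. This is a minimal sketch: `unwrap` is a hypothetical name and not part of scrapegraph-js, and the `ApiResult<T>` type is copied locally so the snippet stands alone.

```typescript
// Local copy of the documented result shape, so the sketch is self-contained.
type ApiResult<T> = {
  status: "success" | "error";
  data: T | null;
  error?: string;
  elapsedMs: number;
};

// Hypothetical helper (not an SDK export): returns the payload on success
// and throws otherwise.
function unwrap<T>(res: ApiResult<T>): T {
  if (res.status !== "success" || res.data === null) {
    throw new Error(res.error ?? "request failed without an error message");
  }
  return res.data;
}
```

With such a helper, a call could be written as `const page = unwrap(await sgai.scrape({ url }))`, trading the no-throw contract for shorter call sites.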

Services

sgai.scrape()

Fetch a page in one or more formats (markdown, html, screenshot, json, links, images, summary, branding).
const res = await sgai.scrape({
  url: "https://example.com",
  formats: [
    { type: "markdown", mode: "reader" },
    { type: "screenshot", fullPage: true, width: 1440, height: 900 },
    { type: "json", prompt: "Extract product info" },
  ],
  contentType: "text/html",      // optional, auto-detected
  fetchConfig: {                 // optional
    mode: "js",
    stealth: true,
    timeout: 30000,
    wait: 2000,
    scrolls: 3,
  },
});

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | URL to scrape |
| formats | FormatConfig[] | No | Defaults to [{ type: "markdown" }] |
| contentType | string | No | Override detected content type (e.g. "application/pdf") |
| fetchConfig | FetchConfig | No | Fetch configuration |

Formats:
  • markdown: Clean markdown (modes: normal, reader, prune)
  • html: Raw HTML (modes: normal, reader, prune)
  • links: All links on the page
  • images: All image URLs
  • summary: AI-generated summary
  • json: Structured extraction with prompt/schema
  • branding: Brand colors, typography, logos
  • screenshot: Page screenshot (fullPage, width, height, quality)
const res = await sgai.scrape({
  url: "https://example.com",
  formats: [
    { type: "markdown", mode: "reader" },
    { type: "links" },
    { type: "images" },
    { type: "screenshot", fullPage: false, width: 1440, height: 900, quality: 90 },
  ],
});

if (res.status === "success") {
  const r = res.data?.results;
  console.log("Markdown:", r?.markdown?.data?.[0]?.slice(0, 200));
  console.log("Links:", r?.links?.metadata?.count);
  console.log("Screenshot URL:", r?.screenshot?.data?.url);
}

sgai.extract()

Extract structured data from a URL, HTML, or markdown.
const res = await sgai.extract({
  url: "https://example.com",
  prompt: "Extract the main heading and description",
});

if (res.status === "success") {
  console.log(res.data?.json);
  console.log("Tokens:", res.data?.usage);
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes* | URL of the page |
| html | string | Yes* | Raw HTML (alternative to url) |
| markdown | string | Yes* | Raw markdown (alternative to url) |
| prompt | string | Yes | What to extract |
| schema | object | No | JSON schema for structured output |
| mode | string | No | HTML processing mode: "normal", "reader", "prune" |
| contentType | string | No | Override the detected content type |
| fetchConfig | FetchConfig | No | Fetch configuration |

*One of url, html, or markdown is required.
const res = await sgai.extract({
  url: "https://example.com/article",
  prompt: "Extract the article information",
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      author: { type: "string" },
      publishDate: { type: "string" },
      content: { type: "string" },
    },
    required: ["title"],
  },
});

if (res.status === "success") {
  console.log(res.data?.json);
}
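
Because extract() needs one of url, html, or markdown, a small pre-flight check can catch a missing source before a request is sent. The sketch below is illustrative only: `ExtractSource`, `providedSources`, and `assertHasSource` are hypothetical names, and treating an empty string as "not provided" is an assumption of this example.

```typescript
// Hypothetical input shape mirroring the three content fields of extract().
type ExtractSource = { url?: string; html?: string; markdown?: string };

// Returns the names of the content fields set to a non-empty string.
function providedSources(input: ExtractSource): string[] {
  const keys: (keyof ExtractSource)[] = ["url", "html", "markdown"];
  return keys.filter((k) => typeof input[k] === "string" && input[k] !== "");
}

// Throws when no content field is present.
function assertHasSource(input: ExtractSource): void {
  if (providedSources(input).length === 0) {
    throw new Error("extract needs one of: url, html, markdown");
  }
}
```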

sgai.search()

Web search with optional AI extraction.
const res = await sgai.search({
  query: "best programming languages 2024",
  numResults: 5,
});

if (res.status === "success") {
  for (const r of res.data?.results ?? []) {
    console.log(`${r.title} - ${r.url}`);
  }
}

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query (1–500 chars) |
| numResults | number | No | Number of results (1–20). Default: 3 |
| prompt | string | No | Prompt for AI extraction from the fetched results |
| schema | object | No | JSON schema (requires prompt) |
| format | string | No | "markdown" (default) or "html" |
| timeRange | string | No | "past_hour", "past_24_hours", "past_week", "past_month", "past_year" |
| locationGeoCode | string | No | Two-letter country code (e.g. "us") |
| fetchConfig | FetchConfig | No | Fetch configuration |

const res = await sgai.search({
  query: "typescript best practices",
  numResults: 5,
  prompt: "Extract the main tips and recommendations",
  schema: {
    type: "object",
    properties: {
      tips: { type: "array", items: { type: "string" } },
    },
  },
});

if (res.status === "success") {
  console.log("Results:", res.data?.results.length);
  console.log("Extracted:", res.data?.json);
}
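
The limits in the parameter table (query length, numResults range, schema requiring prompt) can be checked client-side before calling sgai.search(). This is a rough sketch, not SDK behavior: `SearchParams` and `checkSearchParams` are hypothetical names, and requiring numResults to be an integer is an assumption.

```typescript
// Hypothetical subset of the documented search parameters.
type SearchParams = {
  query: string;
  numResults?: number;
  prompt?: string;
  schema?: object;
};

// Returns human-readable problems; an empty list means the documented
// limits are satisfied.
function checkSearchParams(p: SearchParams): string[] {
  const problems: string[] = [];
  if (p.query.length < 1 || p.query.length > 500) {
    problems.push("query must be 1-500 characters");
  }
  const n = p.numResults ?? 3; // documented default
  if (!Number.isInteger(n) || n < 1 || n > 20) {
    problems.push("numResults must be an integer from 1 to 20");
  }
  if (p.schema !== undefined && p.prompt === undefined) {
    problems.push("schema requires prompt");
  }
  return problems;
}
```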

sgai.crawl.*

Crawl a site. Access the resource via sgai.crawl.
const start = await sgai.crawl.start({
  url: "https://example.com",
  formats: [{ type: "markdown" }],
  maxPages: 50,
  maxDepth: 2,
  maxLinksPerPage: 10,
  includePatterns: ["/blog/*"],
  excludePatterns: ["/admin/*"],
});

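// Stop early if the crawl did not start, so crawlId is never undefined below.
if (start.status !== "success" || !start.data) {
  throw new Error(`Crawl did not start: ${start.error}`);
}
const crawlId = start.data.id;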

// Status
await sgai.crawl.get(crawlId);

// Control
await sgai.crawl.stop(crawlId);
await sgai.crawl.resume(crawlId);
await sgai.crawl.delete(crawlId);

crawl.start() parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Starting URL |
| formats | FormatConfig[] | No | Defaults to [{ type: "markdown" }] |
| maxDepth | number | No | Maximum crawl depth. Default: 2 |
| maxPages | number | No | Maximum pages (1–1000). Default: 50 |
| maxLinksPerPage | number | No | Links followed per page. Default: 10 |
| allowExternal | boolean | No | Allow crossing domains. Default: false |
| includePatterns | string[] | No | URL patterns to include |
| excludePatterns | string[] | No | URL patterns to exclude |
| contentTypes | string[] | No | Allowed content types |
| fetchConfig | FetchConfig | No | Fetch configuration |
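
A crawl runs asynchronously, so callers typically poll sgai.crawl.get() until the job finishes. The sketch below shows one generic way to do that. `pollUntil` is a hypothetical helper, and because this page does not list the crawl status values, the completion check is left to a caller-supplied predicate rather than hard-coded.

```typescript
// Generic polling sketch (not an SDK export). `fetchStatus` and `isDone`
// come from the caller; the interval and attempt cap are assumed defaults.
async function pollUntil<T>(
  fetchStatus: () => Promise<T>,
  isDone: (value: T) => boolean,
  opts: { intervalMs?: number; maxAttempts?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<T> {
  const intervalMs = opts.intervalMs ?? 2000;
  const maxAttempts = opts.maxAttempts ?? 30;
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const value = await fetchStatus();
    if (isDone(value)) return value;
    // Wait between attempts, but not after the final one.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}
```

In practice `fetchStatus` would be `() => sgai.crawl.get(crawlId)`, and `isDone` would inspect whichever field of the response signals completion or failure.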

sgai.monitor.*

Scheduled monitoring jobs.
// Create
const res = await sgai.monitor.create({
  url: "https://example.com",
  name: "Price Monitor",
  interval: "0 * * * *",       // cron expression
  formats: [{ type: "markdown" }],
  webhookUrl: "https://...",   // optional
});

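// Stop early if creation failed, so cronId is never undefined below.
if (res.status !== "success" || !res.data) {
  throw new Error(`Monitor was not created: ${res.error}`);
}
const cronId = res.data.cronId;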

// Manage
await sgai.monitor.list();
await sgai.monitor.get(cronId);
await sgai.monitor.update(cronId, { interval: "0 */6 * * *" });
await sgai.monitor.pause(cronId);
await sgai.monitor.resume(cronId);
await sgai.monitor.delete(cronId);

monitor.activity(): poll tick history

Paginate through per-run ticks.
const activity = await sgai.monitor.activity(cronId, { limit: 20 });

if (activity.status === "success") {
  for (const tick of activity.data?.ticks ?? []) {
    const changed = tick.changed ? "CHANGED" : "no change";
    console.log(`[${tick.createdAt}] ${tick.status} - ${changed} (${tick.elapsedMs}ms)`);
  }

  if (activity.data?.nextCursor) {
    const next = await sgai.monitor.activity(cronId, {
      limit: 20,
      cursor: activity.data.nextCursor,
    });
  }
}
Params: limit (1–100, default 20) and cursor for pagination. Each tick exposes id, createdAt, status, changed, elapsedMs, and diffs.
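
To walk the full history rather than a single follow-up page, the cursor can be followed in a loop until nextCursor is absent. This is a minimal sketch under stated assumptions: `Page` and `collectTicks` are hypothetical names, the page fetcher is injected so the loop does not depend on the SDK, and the `maxPages` safety cap is an invention of this example.

```typescript
// Hypothetical page shape based on the fields used above.
type Page<T> = { ticks: T[]; nextCursor?: string | null };

// Follows nextCursor until it is absent, collecting every tick.
async function collectTicks<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
  maxPages = 50, // safety cap (assumed, not a documented limit)
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(cursor);
    all.push(...page.ticks);
    if (!page.nextCursor) break;
    cursor = page.nextCursor;
  }
  return all;
}
```

The `fetchPage` callback would call sgai.monitor.activity(cronId, { limit, cursor }), check res.status, and return res.data.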

sgai.history.*

const list = await sgai.history.list({
  service: "scrape",   // optional filter
  page: 1,
  limit: 20,
});

const entry = await sgai.history.get("request-id");

sgai.credits() / sgai.healthy()

const credits = await sgai.credits();
// { remaining: 1000, used: 500, plan: "pro", jobs: { crawl: {...}, monitor: {...} } }

const health = await sgai.healthy();
// { status: "ok", uptime: 12345 }

Configuration Objects

FetchConfig

Controls how pages are fetched. See the proxy configuration guide for details.
{
  mode: "js",          // "auto" (default) | "fast" | "js"
  stealth: true,        // Residential proxies / anti-bot headers
  timeout: 15000,       // ms (1000–60000)
  wait: 2000,           // ms after page load (0–30000)
  scrolls: 3,           // 0–100
  country: "us",        // ISO 3166-1 alpha-2
  headers: { "X-Custom": "header" },
  cookies: { key: "value" },
  mock: false,          // Enable mock mode for testing
}
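
The numeric ranges in the comments above (timeout, wait, scrolls) lend themselves to a quick client-side range check. The following is an illustrative sketch only: `FETCH_LIMITS` and `outOfRangeFields` are hypothetical names, and the SDK may perform its own validation in a different way.

```typescript
// Documented numeric ranges from the FetchConfig example.
const FETCH_LIMITS = {
  timeout: [1000, 60000], // ms
  wait: [0, 30000],       // ms after page load
  scrolls: [0, 100],
} as const;

type NumericFetchFields = Partial<Record<keyof typeof FETCH_LIMITS, number>>;

// Returns the names of fields whose values fall outside the documented range.
function outOfRangeFields(cfg: NumericFetchFields): string[] {
  const bad: string[] = [];
  for (const key of Object.keys(FETCH_LIMITS) as (keyof typeof FETCH_LIMITS)[]) {
    const value = cfg[key];
    if (value === undefined) continue; // unset fields use server defaults
    const [min, max] = FETCH_LIMITS[key];
    if (value < min || value > max) bad.push(key);
  }
  return bad;
}
```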

Error Handling

const res = await sgai.extract({
  url: "https://example.com",
  prompt: "Extract the title",
});

if (res.status === "success") {
  console.log(res.data);
} else {
  console.error(`Request failed: ${res.error}`);
}

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| SGAI_API_KEY | Your ScrapeGraphAI API key | (none) |
| SGAI_API_URL | Override API base URL | https://v2-api.scrapegraphai.com/api |
| SGAI_DEBUG | Enable debug logging ("1") | off |
| SGAI_TIMEOUT | Request timeout in seconds | 120 |
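
To make the defaults in the table concrete, here is a sketch of how the variables could be resolved from an environment map. It illustrates the table and is not the SDK's own loader: `SgaiEnv` and `readSgaiEnv` are hypothetical names, and falling back to 120 seconds on unparsable input is an assumption.

```typescript
type SgaiEnv = {
  apiKey: string | undefined;
  apiUrl: string;
  debug: boolean;
  timeoutMs: number;
};

// Resolves the documented variables against their documented defaults.
function readSgaiEnv(env: Record<string, string | undefined>): SgaiEnv {
  const seconds = Number(env["SGAI_TIMEOUT"] ?? "120");
  return {
    apiKey: env["SGAI_API_KEY"],
    apiUrl: env["SGAI_API_URL"] ?? "https://v2-api.scrapegraphai.com/api",
    debug: env["SGAI_DEBUG"] === "1",
    // SGAI_TIMEOUT is in seconds; convert to ms, falling back to 120 s
    // when the value is not a positive number.
    timeoutMs: (Number.isFinite(seconds) && seconds > 0 ? seconds : 120) * 1000,
  };
}
```

In Node, the function would be called as `readSgaiEnv(process.env)`.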

Support

  • GitHub: Report issues and contribute to the SDK
  • Email Support: Get help from our development team