These docs cover scrapegraph-js ≥ 2.1.0. The v2 SDK is ESM-only and requires Node ≥ 22. Earlier 0.x/1.x releases expose a different, deprecated API.
Installation
What's new in v2
- New entry point: import { ScrapeGraphAI } from "scrapegraph-js" and instantiate once. You no longer pass the API key to every call.
- Nested resources: sgai.crawl.*, sgai.monitor.*, sgai.history.*.
- ApiResult<T> wrapper: no throws. Every call returns { status, data, error, elapsedMs }.
- Auto-picks the API key from SGAI_API_KEY (or pass { apiKey } to the factory).
- Removed: markdownify, agenticScraper, sitemap, feedback. Use sgai.scrape() with the right format entry instead.
Quick Start
Store your API keys securely in environment variables. Use .env files and libraries like dotenv to load them into your app.
Return Type
Every method returns ApiResult<T>. Check res.status before accessing res.data.
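As a rough illustration of the check described above, the sketch below models the result wrapper locally instead of importing the SDK. The field names come from this page; the status values ("success" and "error"), the nullability of data, and the unwrap helper are assumptions made for the example, not confirmed SDK types.

```typescript
// Assumed shape: field names are from the docs above; the exact status
// values and the nullability of `data` are assumptions.
type ApiResult<T> = {
  status: "success" | "error";
  data: T | null;
  error?: string;
  elapsedMs: number;
};

// Hypothetical helper (not part of the SDK): return the payload on success,
// or throw for callers that prefer exceptions over status checks.
function unwrap<T>(res: ApiResult<T>): T {
  if (res.status !== "success" || res.data === null) {
    throw new Error(res.error ?? "request failed");
  }
  return res.data;
}

// Stand-in values; a real result would come from an SDK call.
const ok: ApiResult<{ title: string }> = {
  status: "success",
  data: { title: "Example" },
  elapsedMs: 42,
};
const failed: ApiResult<{ title: string }> = {
  status: "error",
  data: null,
  error: "invalid url",
  elapsedMs: 7,
};
```

Because calls do not throw, a helper like this is optional; checking res.status inline before reading res.data works just as well.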
Services
sgai.scrape()
Fetch a page in one or more formats (markdown, html, screenshot, json, links, images, summary, branding).
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | URL to scrape |
| formats | FormatConfig[] | No | Defaults to [{ type: "markdown" }] |
| contentType | string | No | Override detected content type (e.g. "application/pdf") |
| fetchConfig | FetchConfig | No | Fetch configuration |

Formats:
- markdown: Clean markdown (modes: normal, reader, prune)
- html: Raw HTML (modes: normal, reader, prune)
- links: All links on the page
- images: All image URLs
- summary: AI-generated summary
- json: Structured extraction with prompt/schema
- branding: Brand colors, typography, logos
- screenshot: Page screenshot (fullPage, width, height, quality)
Multi-format example
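The original code for this example did not survive extraction, so the following is only a sketch of how a multi-format request might be assembled. The FormatConfig union is inferred from the format list above; field names such as mode, prompt, and schema, and the buildScrapeRequest helper, are assumptions rather than confirmed SDK types.

```typescript
// Assumed shapes, inferred from the format list above. Field names such as
// `mode`, `prompt`, and `schema` are guesses, not confirmed SDK types.
type HtmlMode = "normal" | "reader" | "prune";
type FormatConfig =
  | { type: "markdown" | "html"; mode?: HtmlMode }
  | { type: "links" | "images" | "summary" | "branding" }
  | { type: "json"; prompt: string; schema?: object }
  | { type: "screenshot"; fullPage?: boolean; width?: number; height?: number; quality?: number };

interface ScrapeRequest {
  url: string;
  formats: FormatConfig[];
}

// Hypothetical helper: fall back to the documented default format
// when the caller supplies none.
function buildScrapeRequest(url: string, formats?: FormatConfig[]): ScrapeRequest {
  return {
    url,
    formats: formats && formats.length > 0 ? formats : [{ type: "markdown" }],
  };
}

// Three formats requested in a single call.
const multi = buildScrapeRequest("https://example.com", [
  { type: "markdown", mode: "reader" },
  { type: "links" },
  { type: "screenshot", fullPage: true, width: 1280 },
]);

// No formats given, so the default applies.
const fallback = buildScrapeRequest("https://example.com");
```

An object of this shape would presumably be what sgai.scrape() receives; check the SDK's exported types before relying on the exact field names.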
sgai.extract()
Extract structured data from a URL, HTML, or markdown.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes* | URL of the page |
| html | string | Yes* | Raw HTML (alternative to url) |
| markdown | string | Yes* | Raw markdown (alternative to url) |
| prompt | string | Yes | What to extract |
| schema | object | No | JSON schema for structured output |
| mode | string | No | HTML processing mode: "normal", "reader", "prune" |
| contentType | string | No | Override the detected content type |
| fetchConfig | FetchConfig | No | Fetch configuration |

*One of url, html, or markdown is required.
With a JSON schema
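The original snippet is missing here, so this sketch only shows what a schema-bearing extract input could look like and how the "one of url, html, or markdown" rule might be checked before sending it. ExtractInput and checkExtractInput are hypothetical names, and the check treats "one of" as "at least one", which is an assumption.

```typescript
// Hypothetical input type mirroring the parameter table above.
interface ExtractInput {
  url?: string;
  html?: string;
  markdown?: string;
  prompt: string;
  schema?: object;
}

// Returns a list of problems; an empty list means the input looks valid.
function checkExtractInput(input: ExtractInput): string[] {
  const problems: string[] = [];
  const sources = [input.url, input.html, input.markdown].filter(
    (s) => typeof s === "string" && s.length > 0,
  );
  if (sources.length === 0) {
    problems.push("one of url, html, or markdown is required");
  }
  if (input.prompt.trim().length === 0) {
    problems.push("prompt is required");
  }
  return problems;
}

// A plain JSON Schema object describing the desired output.
const productSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    price: { type: "number" },
  },
  required: ["name"],
};

const valid = checkExtractInput({
  url: "https://example.com/item",
  prompt: "Extract the product name and price",
  schema: productSchema,
});
const invalid = checkExtractInput({ prompt: "Extract the product name and price" });
```

The schema is ordinary JSON Schema; how strictly the service enforces it is not stated on this page.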
sgai.search()
Web search with optional AI extraction.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Search query (1–500 chars) |
| numResults | number | No | Number of results (1–20). Default: 3 |
| prompt | string | No | Prompt for AI extraction from the fetched results |
| schema | object | No | JSON schema (requires prompt) |
| format | string | No | "markdown" (default) or "html" |
| timeRange | string | No | "past_hour", "past_24_hours", "past_week", "past_month", "past_year" |
| locationGeoCode | string | No | Two-letter country code (e.g. "us") |
| fetchConfig | FetchConfig | No | Fetch configuration |
Search + extraction
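The constraints in the table above (query length, result count, and schema requiring prompt) can be checked client-side before a request is made. The following is a minimal sketch under that reading; SearchInput and normalizeSearchInput are hypothetical names, and whether the service counts characters the same way as JavaScript's string length is an assumption.

```typescript
// Hypothetical input type covering the constrained parameters from the table.
interface SearchInput {
  query: string;
  numResults?: number;
  prompt?: string;
  schema?: object;
}

// Applies the documented default for numResults and enforces the documented limits.
function normalizeSearchInput(input: SearchInput): SearchInput & { numResults: number } {
  const numResults = input.numResults ?? 3; // documented default
  if (input.query.length < 1 || input.query.length > 500) {
    throw new RangeError("query must be 1 to 500 characters");
  }
  if (!Number.isInteger(numResults) || numResults < 1 || numResults > 20) {
    throw new RangeError("numResults must be an integer from 1 to 20");
  }
  if (input.schema !== undefined && input.prompt === undefined) {
    throw new Error("schema requires prompt");
  }
  return { ...input, numResults };
}

// Search plus extraction: a prompt and a schema travel together.
const searchWithExtraction = normalizeSearchInput({
  query: "latest TypeScript release",
  prompt: "Return the version number and release date",
  schema: {
    type: "object",
    properties: { version: { type: "string" }, releasedAt: { type: "string" } },
  },
});
```

The service validates these limits as well; a local check simply fails faster and without spending a request.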
sgai.crawl.*
Crawl a site. Access the resource via sgai.crawl.
crawl.start() parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Starting URL |
| formats | FormatConfig[] | No | Defaults to [{ type: "markdown" }] |
| maxDepth | number | No | Maximum crawl depth. Default: 2 |
| maxPages | number | No | Maximum pages (1–1000). Default: 50 |
| maxLinksPerPage | number | No | Links followed per page. Default: 10 |
| allowExternal | boolean | No | Allow crossing domains. Default: false |
| includePatterns | string[] | No | URL patterns to include |
| excludePatterns | string[] | No | URL patterns to exclude |
| contentTypes | string[] | No | Allowed content types |
| fetchConfig | FetchConfig | No | Fetch configuration |
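To make the effective crawl settings explicit (for logging or review, for instance), the documented defaults can be resolved locally. This is a sketch only: CrawlStartInput and withCrawlDefaults are hypothetical names, and the service presumably applies the same defaults on its own when a field is omitted.

```typescript
// Hypothetical input type covering a subset of the crawl.start() parameters.
interface CrawlStartInput {
  url: string;
  maxDepth?: number;
  maxPages?: number;
  maxLinksPerPage?: number;
  allowExternal?: boolean;
  includePatterns?: string[];
  excludePatterns?: string[];
}

// Fills in the defaults from the table above and checks the maxPages range.
function withCrawlDefaults(input: CrawlStartInput) {
  const resolved = {
    ...input,
    maxDepth: input.maxDepth ?? 2,
    maxPages: input.maxPages ?? 50,
    maxLinksPerPage: input.maxLinksPerPage ?? 10,
    allowExternal: input.allowExternal ?? false,
  };
  if (resolved.maxPages < 1 || resolved.maxPages > 1000) {
    throw new RangeError("maxPages must be between 1 and 1000");
  }
  return resolved;
}

// Only the URL is given, so every limit falls back to its default.
const defaultCrawl = withCrawlDefaults({ url: "https://example.com" });

// Explicit values override the defaults.
const deepCrawl = withCrawlDefaults({
  url: "https://example.com/docs",
  maxDepth: 4,
  maxPages: 200,
});
```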
sgai.monitor.*
Scheduled monitoring jobs.
monitor.activity(): poll tick history
Paginate through per-run ticks.
Pass limit (1–100, default 20) and cursor for pagination. Each tick exposes id, createdAt, status, changed, elapsedMs, and diffs.
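Cursor pagination of this kind usually means repeating the call with the cursor from the previous page until none is returned. The sketch below shows that loop against an in-memory stand-in instead of the real endpoint. The tick fields are from this page; the page envelope (items, nextCursor), the FetchPage signature, and the sample status value are assumptions, so adapt them to the actual response shape.

```typescript
// Tick fields come from the docs above; the page envelope below is an assumption.
interface Tick {
  id: string;
  createdAt: string;
  status: string;
  changed: boolean;
  elapsedMs: number;
  diffs: unknown;
}
interface TickPage {
  items: Tick[];
  nextCursor: string | null;
}
type FetchPage = (opts: { limit: number; cursor?: string }) => Promise<TickPage>;

// Follows cursors until the last page, with a page cap as a safety stop.
async function collectTicks(fetchPage: FetchPage, limit = 20, maxPages = 50): Promise<Tick[]> {
  const all: Tick[] = [];
  let cursor: string | undefined;
  for (let page = 0; page < maxPages; page++) {
    const res = await fetchPage(cursor === undefined ? { limit } : { limit, cursor });
    all.push(...res.items);
    if (res.nextCursor === null) break;
    cursor = res.nextCursor;
  }
  return all;
}

// In-memory stand-in for the real call, so the sketch runs without network access.
// The status string is a placeholder value.
const fakeTicks: Tick[] = [1, 2, 3, 4, 5].map((n) => ({
  id: `tick-${n}`,
  createdAt: "2025-01-01T00:00:00Z",
  status: "completed",
  changed: n % 2 === 0,
  elapsedMs: 100 * n,
  diffs: null,
}));

const fakeFetch: FetchPage = async ({ limit, cursor }) => {
  const start = cursor === undefined ? 0 : Number(cursor);
  const end = start + limit;
  return {
    items: fakeTicks.slice(start, end),
    nextCursor: end < fakeTicks.length ? String(end) : null,
  };
};
```

In real use, fetchPage would wrap the monitor.activity() call and map its result into the envelope used here.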
sgai.history.*
sgai.credits() / sgai.healthy()
Configuration Objects
FetchConfig
Controls how pages are fetched. See the proxy configuration guide for details.
Error Handling
Environment Variables
| Variable | Description | Default |
|---|---|---|
| SGAI_API_KEY | Your ScrapeGraphAI API key | (none) |
| SGAI_API_URL | Override API base URL | https://v2-api.scrapegraphai.com/api |
| SGAI_DEBUG | Enable debug logging ("1") | off |
| SGAI_TIMEOUT | Request timeout in seconds | 120 |
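The table above can be read as a small configuration resolver. The sketch below mirrors the documented variables and defaults; readSgaiEnv and SgaiEnvConfig are hypothetical names, and the handling of malformed values (falling back to 120 seconds) is an assumption about reasonable behavior, not a description of what the SDK does internally.

```typescript
// Hypothetical resolved-config shape for the variables in the table above.
interface SgaiEnvConfig {
  apiKey: string | undefined;
  apiUrl: string;
  debug: boolean;
  timeoutSeconds: number;
}

// Takes the environment as a plain record so it can be exercised without Node globals.
function readSgaiEnv(env: Record<string, string | undefined>): SgaiEnvConfig {
  const timeout = Number(env["SGAI_TIMEOUT"]);
  return {
    apiKey: env["SGAI_API_KEY"],
    apiUrl: env["SGAI_API_URL"] ?? "https://v2-api.scrapegraphai.com/api",
    // Debug logging is documented as enabled by the value "1".
    debug: env["SGAI_DEBUG"] === "1",
    // Missing, non-numeric, or non-positive values fall back to the documented default.
    timeoutSeconds: Number.isFinite(timeout) && timeout > 0 ? timeout : 120,
  };
}

const defaults = readSgaiEnv({});
const custom = readSgaiEnv({
  SGAI_API_KEY: "sgai-placeholder-key",
  SGAI_DEBUG: "1",
  SGAI_TIMEOUT: "30",
});
```

In a Node application the argument would typically be process.env.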
Support
GitHub
Report issues and contribute to the SDK
Email Support
Get help from our development team