Installation
Install the package using npm, pnpm, yarn or bun:
```shell
# Using npm
npm i scrapegraph-js

# Using pnpm
pnpm i scrapegraph-js

# Using yarn
yarn add scrapegraph-js

# Using bun
bun add scrapegraph-js
```
Features
- **AI-Powered Extraction**: Smart web scraping with artificial intelligence
- **Async by Design**: Fully asynchronous architecture
- **Type Safety**: Built-in TypeScript support with Zod schemas
- **Zero Exceptions**: All errors are wrapped in `ApiResult`, so no try/catch is needed
- **Developer Friendly**: Comprehensive error handling and debug logging
Quick Start
Basic example
Store your API keys securely in environment variables. Use `.env` files and libraries like dotenv to load them into your app.

```javascript
import { smartScraper } from "scrapegraph-js";
import "dotenv/config";

const apiKey = process.env.SGAI_APIKEY;

const response = await smartScraper(apiKey, {
  website_url: "https://example.com",
  user_prompt: "What does the company do?",
});

if (response.status === "error") {
  console.error("Error:", response.error);
} else {
  console.log(response.data.result);
}
```
Services
SmartScraper
Extract specific information from any webpage using AI:
```javascript
const response = await smartScraper(apiKey, {
  website_url: "https://example.com",
  user_prompt: "Extract the main content",
});
```
All functions return an `ApiResult<T>` object:

```typescript
type ApiResult<T> = {
  status: "success" | "error";
  data: T | null;
  error?: string;
  elapsedMs: number;
};
```
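Because errors are returned rather than thrown, callers that prefer exceptions can convert an `ApiResult` themselves. A minimal sketch (this `unwrap` helper is an illustration, not part of the SDK):

```javascript
// Hypothetical helper, not part of scrapegraph-js: turn an ApiResult
// back into a thrown error for codebases that prefer try/catch.
function unwrap(result) {
  if (result.status === "error") {
    throw new Error(result.error ?? "Unknown API error");
  }
  return result.data;
}

// Usage: const data = unwrap(await smartScraper(apiKey, { ... }));
```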
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| apiKey | string | Yes | The ScrapeGraph API Key (first argument). |
| user_prompt | string | Yes | A textual description of what you want to extract. |
| website_url | string | No* | The URL of the webpage to scrape. |
| output_schema | object | No | A Zod schema (converted to JSON) that describes the structure of the response. |
| number_of_scrolls | number | No | Number of scrolls for infinite-scroll pages (0-50). |
| stealth | boolean | No | Enable anti-detection mode (+4 credits). |
| headers | object | No | Custom HTTP headers. |
| mock | boolean | No | Enable mock mode for testing. |
| wait_ms | number | No | Page load wait time in ms (default: 3000). |
| country_code | string | No | Proxy routing country code (e.g., "us"). |

*One of website_url, website_html, or website_markdown is required.
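Since exactly one of `website_url`, `website_html`, or `website_markdown` must be provided, a local pre-flight check can catch mistakes before a request is sent. A sketch (this helper is an illustration, not part of the SDK):

```javascript
// Hypothetical pre-flight check, not part of scrapegraph-js: verify that
// exactly one input source is set before calling smartScraper.
function pickSource(options) {
  const sources = ["website_url", "website_html", "website_markdown"]
    .filter((key) => options[key] != null);
  if (sources.length !== 1) {
    throw new Error(
      "Provide exactly one of website_url, website_html, website_markdown"
    );
  }
  return sources[0];
}
```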
Define a simple schema using Zod:

```javascript
import { z } from "zod";

const ArticleSchema = z.object({
  title: z.string().describe("The article title"),
  author: z.string().describe("The author's name"),
  publishDate: z.string().describe("Article publication date"),
  content: z.string().describe("Main article content"),
  category: z.string().describe("Article category"),
});

const ArticlesArraySchema = z
  .array(ArticleSchema)
  .describe("Array of articles");

const response = await smartScraper(apiKey, {
  website_url: "https://example.com/blog/article",
  user_prompt: "Extract the article information",
  output_schema: ArticlesArraySchema,
});

// The result follows ArticlesArraySchema, so it is an array of articles.
response.data.result.forEach((article) => {
  console.log(`Title: ${article.title}`);
  console.log(`Author: ${article.author}`);
  console.log(`Published: ${article.publishDate}`);
});
```
Define a complex schema for nested data structures:

```javascript
import { z } from "zod";

const EmployeeSchema = z.object({
  name: z.string().describe("Employee's full name"),
  position: z.string().describe("Job title"),
  department: z.string().describe("Department name"),
  email: z.string().describe("Email address"),
});

const OfficeSchema = z.object({
  location: z.string().describe("Office location/city"),
  address: z.string().describe("Full address"),
  phone: z.string().describe("Contact number"),
});

const CompanySchema = z.object({
  name: z.string().describe("Company name"),
  description: z.string().describe("Company description"),
  industry: z.string().describe("Industry sector"),
  foundedYear: z.number().describe("Year company was founded"),
  employees: z.array(EmployeeSchema).describe("List of key employees"),
  offices: z.array(OfficeSchema).describe("Company office locations"),
  website: z.string().url().describe("Company website URL"),
});

const response = await smartScraper(apiKey, {
  website_url: "https://example.com/about",
  user_prompt: "Extract detailed company information including employees and offices",
  output_schema: CompanySchema,
});

console.log(`Company: ${response.data.result.name}`);
console.log("\nKey Employees:");
response.data.result.employees.forEach((employee) => {
  console.log(`- ${employee.name} (${employee.position})`);
});
console.log("\nOffice Locations:");
response.data.result.offices.forEach((office) => {
  console.log(`- ${office.location}: ${office.address}`);
});
```
Enhanced JavaScript Rendering Example
For modern web applications built with React, Vue, Angular, or other JavaScript frameworks:

```javascript
import { smartScraper } from 'scrapegraph-js';
import { z } from 'zod';

const apiKey = 'your-api-key';

const ProductSchema = z.object({
  name: z.string().describe('Product name'),
  price: z.string().describe('Product price'),
  description: z.string().describe('Product description'),
  availability: z.string().describe('Product availability status'),
});

const response = await smartScraper(apiKey, {
  website_url: 'https://example-react-store.com/products/123',
  user_prompt: 'Extract product details including name, price, description, and availability',
  output_schema: ProductSchema,
});

if (response.status === 'error') {
  console.error('Error:', response.error);
} else {
  console.log('Product:', response.data.result.name);
  console.log('Price:', response.data.result.price);
  console.log('Available:', response.data.result.availability);
}
```
SearchScraper
Search and extract information from multiple web sources using AI:
```javascript
const response = await searchScraper(apiKey, {
  user_prompt: "Find the best restaurants in San Francisco",
  location_geo_code: "us",
  time_range: "past_week",
});
```
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| apiKey | string | Yes | The ScrapeGraph API Key (first argument). |
| user_prompt | string | Yes | A textual description of what you want to achieve. |
| num_results | number | No | Number of websites to search (3-20). Default: 3. |
| extraction_mode | boolean | No | true = AI extraction mode (10 credits/page), false = markdown mode (2 credits/page). |
| output_schema | object | No | Zod schema for structured response format (AI extraction mode only). |
| location_geo_code | string | No | Geo code for location-based search (e.g., "us"). |
| time_range | string | No | Time range filter. Options: "past_hour", "past_24_hours", "past_week", "past_month", "past_year". |
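The per-page credit costs in the table make it easy to budget a search up front. A back-of-the-envelope estimator, assuming cost scales linearly with `num_results` (this helper is an illustration, not part of the SDK):

```javascript
// Hypothetical estimator based on the documented per-page costs:
// AI extraction mode = 10 credits/page, markdown mode = 2 credits/page.
function estimateSearchCredits({ num_results = 3, extraction_mode = true } = {}) {
  const creditsPerPage = extraction_mode ? 10 : 2;
  return num_results * creditsPerPage;
}
```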
Define a simple schema using Zod:

```javascript
import { z } from "zod";

const ArticleSchema = z.object({
  title: z.string().describe("The article title"),
  author: z.string().describe("The author's name"),
  publishDate: z.string().describe("Article publication date"),
  content: z.string().describe("Main article content"),
  category: z.string().describe("Article category"),
});

const response = await searchScraper(apiKey, {
  user_prompt: "Find news about the latest trends in AI",
  output_schema: ArticleSchema,
  location_geo_code: "us",
  time_range: "past_week",
});

console.log(`Title: ${response.data.result.title}`);
console.log(`Author: ${response.data.result.author}`);
console.log(`Published: ${response.data.result.publishDate}`);
```
Define a schema for structured search results. The prompt asks for multiple restaurants, so wrap the item schema in an array:

```javascript
import { z } from "zod";

const RestaurantSchema = z.object({
  name: z.string().describe("Restaurant name"),
  address: z.string().describe("Restaurant address"),
  rating: z.number().describe("Restaurant rating"),
  website: z.string().url().describe("Restaurant website URL"),
});

const RestaurantListSchema = z
  .array(RestaurantSchema)
  .describe("List of recommended restaurants");

const response = await searchScraper(apiKey, {
  user_prompt: "Find the best restaurants in San Francisco",
  output_schema: RestaurantListSchema,
  location_geo_code: "us",
  time_range: "past_month",
});
```
Use markdown mode for cost-effective content gathering:

```javascript
import { searchScraper } from 'scrapegraph-js';

const apiKey = 'your-api-key';

const response = await searchScraper(apiKey, {
  user_prompt: 'Latest developments in artificial intelligence',
  num_results: 3,
  extraction_mode: false,
  location_geo_code: 'us',
  time_range: 'past_week',
});

if (response.status === 'error') {
  console.error('Error:', response.error);
} else {
  const markdownContent = response.data.markdown_content;
  console.log('Markdown content length:', markdownContent.length);
  console.log('Reference URLs:', response.data.reference_urls);
  console.log('Content preview:', markdownContent.substring(0, 500) + '...');
}
```
Markdown Mode Benefits:

- **Cost-effective**: Only 2 credits per page (vs 10 credits for AI extraction)
- **Full content**: Get complete page content in markdown format
- **Faster**: No AI processing overhead
- **Perfect for**: Content analysis, bulk data collection, building datasets
Time Range Filter Example
Filter search results by date range to get only recent information:

```javascript
import { searchScraper } from 'scrapegraph-js';

const apiKey = 'your-api-key';

const response = await searchScraper(apiKey, {
  user_prompt: 'Latest news about AI developments',
  num_results: 5,
  time_range: 'past_week', // Options: 'past_hour', 'past_24_hours', 'past_week', 'past_month', 'past_year'
});

if (response.status === 'error') {
  console.error('Error:', response.error);
} else {
  console.log('Recent AI news:', response.data.result);
  console.log('Reference URLs:', response.data.reference_urls);
}
```
Time Range Options:
- `past_hour` - Results from the past hour
- `past_24_hours` - Results from the past 24 hours
- `past_week` - Results from the past week
- `past_month` - Results from the past month
- `past_year` - Results from the past year
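Invalid `time_range` strings are easy to mistype, so a local check against the documented options can fail fast before any credits are spent. A sketch (this validator is an illustration, not part of the SDK):

```javascript
// Hypothetical validator over the documented time_range options.
const TIME_RANGES = new Set([
  "past_hour",
  "past_24_hours",
  "past_week",
  "past_month",
  "past_year",
]);

function assertTimeRange(value) {
  if (!TIME_RANGES.has(value)) {
    throw new Error(`Invalid time_range: ${value}`);
  }
  return value;
}
```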
Use Cases:
- Finding recent news and updates
- Tracking time-sensitive information
- Getting latest product releases
- Monitoring recent market changes
Markdownify
Convert any webpage into clean, formatted markdown:
```javascript
const response = await markdownify(apiKey, {
  website_url: "https://example.com",
});
```
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| apiKey | string | Yes | The ScrapeGraph API Key (first argument). |
| website_url | string | Yes | The URL of the webpage to convert to markdown. |
| wait_ms | number | No | Page load wait time in ms (default: 3000). |
| stealth | boolean | No | Enable anti-detection mode (+4 credits). |
| country_code | string | No | Proxy routing country code (e.g., "us"). |
API Credits
Check your available API credits:
```javascript
import { getCredits } from "scrapegraph-js";

const credits = await getCredits(apiKey);

if (credits.status === "error") {
  console.error("Error fetching credits:", credits.error);
} else {
  console.log("Remaining credits:", credits.data.remaining_credits);
  console.log("Total used:", credits.data.total_credits_used);
}
```
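The returned balance can gate expensive operations, for example skipping a stealth-mode scrape when credits run low. A sketch using the fields shown above (this guard is an illustration, not part of the SDK):

```javascript
// Hypothetical guard: true only when the credits call succeeded and the
// remaining balance covers the estimated cost of the next request.
function hasEnoughCredits(credits, needed) {
  return (
    credits.status === "success" &&
    credits.data.remaining_credits >= needed
  );
}
```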
License

This project is licensed under the MIT License. See the LICENSE file for details.