Integrate ScrapeGraphAI with OpenAI to build AI applications powered by web data.
Setup
npm install scrapegraph-js openai zod
Create a .env file:
SGAI_APIKEY=your_scrapegraph_key
OPENAI_API_KEY=your_openai_key
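Node does not read .env on its own. On Node 20.6 or later, run your script with node --env-file=.env. On older versions, install dotenv and add import 'dotenv/config' to your code.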
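The examples on this page read the keys with a non-null assertion (`process.env.SGAI_APIKEY!`), which hides a missing key until the first API call fails. As a minimal sketch, a small guard can fail fast instead. The helper name `requireEnv` and the `Env` type are hypothetical and not part of scrapegraph-js or the OpenAI SDK.

```typescript
// Hypothetical helper (not provided by scrapegraph-js or openai):
// return the variable's value, or throw a clear error if it is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage in a Node script (assumes the .env file has already been loaded):
//   const sgaiKey = requireEnv(process.env, 'SGAI_APIKEY');
//   const openaiKey = requireEnv(process.env, 'OPENAI_API_KEY');
```

Passing the environment object as a parameter keeps the helper easy to exercise with a plain object in tests.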
Scrape + Summarize
This example demonstrates a simple workflow: scrape a website and summarize the content using OpenAI.
import { extract } from 'scrapegraph-js';
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Scrape the page. result.data may be missing, hence the optional chaining below.
const result = await extract(process.env.SGAI_APIKEY!, {
  url: 'https://scrapegraphai.com',
  prompt: 'Extract all content from this page',
});
const data = result.data?.json;
// JSON.stringify returns undefined for undefined input, so guard the length read.
console.log('Scraped content length:', JSON.stringify(data)?.length ?? 0);
// Ask the model for a short summary of the scraped JSON.
const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'user', content: `Summarize in 100 words: ${JSON.stringify(data)}` }
  ]
});
console.log('Response:', completion.choices[0].message.content);
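The summarize step embeds the entire scraped JSON in the prompt, which can grow large for content-heavy pages. The following sketch caps the serialized size before building the prompt. `buildSummaryPrompt` and its `maxChars` default are illustrative assumptions: the budget is a character count chosen for the example, not a model token limit.

```typescript
// Hypothetical helper: serialize scraped data and cap its size before prompting.
// maxChars is an illustrative character budget, not a token limit of any model.
function buildSummaryPrompt(data: unknown, maxChars = 20000): string {
  // JSON.stringify returns undefined for undefined input, so fall back to ''.
  const serialized = JSON.stringify(data) ?? '';
  const clipped = serialized.length > maxChars
    ? serialized.slice(0, maxChars) + '...[truncated]'
    : serialized;
  return `Summarize in 100 words: ${clipped}`;
}
```

The returned string can be used as the `content` of the user message in the example above.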
This example shows how to use OpenAI’s function calling to let the model decide when to scrape websites based on user requests.
import { extract } from 'scrapegraph-js';
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
console.log("Sending user message to OpenAI and requesting tool use if necessary...");
// Describe the scrape_website tool so the model can request it when needed.
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{
    role: 'user',
    content: 'What is ScrapeGraphAI? Check scrapegraphai.com'
  }],
  tools: [{
    type: 'function',
    function: {
      name: 'scrape_website',
      description: 'Scrape and extract structured data from a website URL',
      parameters: {
        type: 'object',
        properties: {
          url: { type: 'string', description: 'The URL to scrape' }
        },
        required: ['url']
      }
    }
  }]
});
// The model only returns a tool call when it decides scraping is needed.
const toolCall = response.choices[0].message.tool_calls?.[0];
if (toolCall) {
  // Tool arguments arrive as a JSON string.
  const { url } = JSON.parse(toolCall.function.arguments);
  console.log(`Calling tool: ${toolCall.function.name} | URL: ${url}`);
  const result = await extract(process.env.SGAI_APIKEY!, {
    url,
    prompt: 'Extract all content from this page',
  });
  const data = result.data?.json;
  console.log(`Scraped content preview: ${JSON.stringify(data)?.substring(0, 300)}...`);
  // Continue with the conversation or process the scraped content as needed
}
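The example stops once the page has been scraped. To let the model answer the original question, the Chat Completions API expects the assistant message containing the tool call, followed by a `tool` message whose `tool_call_id` matches that call. The sketch below builds that message list and validates the model-supplied URL first. The local `ToolCall` and `ChatMessage` types and the helpers `parseScrapeUrl` and `buildToolFollowUp` are hypothetical and only approximate the SDK's own types.

```typescript
// Minimal local types approximating the Chat Completions message shapes used here.
interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

type ChatMessage =
  | { role: 'user'; content: string }
  | { role: 'assistant'; content: string | null; tool_calls: ToolCall[] }
  | { role: 'tool'; tool_call_id: string; content: string };

// Hypothetical helper: parse model-supplied arguments and accept only http(s) URLs.
function parseScrapeUrl(rawArguments: string): string {
  const parsed: unknown = JSON.parse(rawArguments);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Tool arguments must be a JSON object');
  }
  const url = (parsed as { url?: unknown }).url;
  if (typeof url !== 'string' || !/^https?:\/\//i.test(url)) {
    throw new Error('Tool arguments must include an http(s) url');
  }
  return url;
}

// Hypothetical helper: append the assistant tool call and the tool result to the history.
function buildToolFollowUp(
  history: ChatMessage[],
  toolCall: ToolCall,
  scraped: unknown
): ChatMessage[] {
  return [
    ...history,
    { role: 'assistant', content: null, tool_calls: [toolCall] },
    // JSON.stringify(undefined) is undefined, so send 'null' when nothing was scraped.
    { role: 'tool', tool_call_id: toolCall.id, content: JSON.stringify(scraped) ?? 'null' },
  ];
}
```

Under these assumptions, the returned array would be passed as `messages` in a second `openai.chat.completions.create` call, and the model's reply would then be based on the scraped content.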
This example demonstrates how to use OpenAI’s JSON mode to extract structured data from scraped website content.
import { extract } from 'scrapegraph-js';
import OpenAI from 'openai';
import { z } from 'zod';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Shape the model's JSON reply must satisfy; only name is required.
const CompanyInfoSchema = z.object({
  name: z.string(),
  industry: z.string().optional(),
  description: z.string().optional()
});
const result = await extract(process.env.SGAI_APIKEY!, {
  url: 'https://stripe.com',
  prompt: 'Extract all content from this page',
});
const data = result.data?.json;
// JSON mode requires the word "JSON" in the messages and guarantees syntactically valid JSON.
const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  response_format: { type: 'json_object' },
  messages: [
    {
      role: 'system',
      content: `Extract company information from the website content. Respond ONLY with valid JSON in this exact format:
{ "name": "Company Name", "industry": "Industry", "description": "One sentence description" }`
    },
    { role: 'user', content: JSON.stringify(data) }
  ]
});
// parse throws if the reply does not match the schema.
const companyInfo = CompanyInfoSchema.parse(
  JSON.parse(completion.choices[0].message.content!)
);
console.log(companyInfo);
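JSON mode yields syntactically valid JSON, but the reply can still miss fields or use the wrong types, in which case `CompanyInfoSchema.parse` throws. The sketch below shows one way to turn that into a result value instead of an exception. It is a dependency-free stand-in written in plain TypeScript, not the zod schema itself (zod's `safeParse` serves a similar purpose); `parseCompanyInfo`, `CompanyInfo`, and `ParseResult` are hypothetical names.

```typescript
// Plain TypeScript stand-in for the zod schema: same three fields, name required.
interface CompanyInfo {
  name: string;
  industry?: string;
  description?: string;
}

type ParseResult =
  | { ok: true; value: CompanyInfo }
  | { ok: false; error: string };

function isOptionalString(v: unknown): v is string | undefined {
  return v === undefined || typeof v === 'string';
}

// Hypothetical helper: validate the model's raw reply without throwing.
function parseCompanyInfo(raw: string | null): ParseResult {
  if (raw === null) return { ok: false, error: 'Model returned no content' };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: 'Response is not valid JSON' };
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return { ok: false, error: 'Response is not a JSON object' };
  }
  const { name, industry, description } = parsed as Record<string, unknown>;
  if (typeof name !== 'string') {
    return { ok: false, error: '"name" must be a string' };
  }
  if (!isOptionalString(industry) || !isOptionalString(description)) {
    return { ok: false, error: '"industry" and "description" must be strings when present' };
  }
  return { ok: true, value: { name, industry, description } };
}
```

A caller could pass `completion.choices[0].message.content` directly and branch on `ok`, for example to retry the request or log the error.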
For more examples, check the OpenAI documentation.