POST /v1/smartscraper
cURL
curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
  -H 'SGAI-APIKEY: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "user_prompt": "Extract info about the company",
    "website_url": "https://scrapegraphai.com/"
  }'
{
  "request_id": "<string>",
  "status": "queued",
  "website_url": "<string>",
  "user_prompt": "<string>",
  "result": {},
  "error": ""
}
SmartScraper allows you to extract specific information from any webpage using AI. Simply provide a URL and describe what information you want to extract in natural language.

Use Cases

  • Extract company information from websites
  • Gather product details from e-commerce pages
  • Collect contact information from business pages
  • Extract structured data from articles or blog posts

Request Body

website_url
string
required (unless website_html is provided)
The URL of the webpage you want to extract information from.
user_prompt
string
required
Natural language description of what information you want to extract from the webpage.
output_schema
object
Optional schema to structure the output. If provided, the AI will attempt to format the results according to this schema.
headers
object
Optional custom HTTP headers to send with the request. Useful for setting User-Agent, cookies, authentication tokens, and other request metadata.
Example: {"User-Agent": "Mozilla/5.0...", "Cookie": "session=abc123"}
total_pages
integer
Optional parameter to enable pagination and scrape multiple pages. Specify the number of pages to extract data from.
Default: 1. Range: 1-100.
number_of_scrolls
integer
Optional parameter for infinite-scroll pages. Specify how many times to scroll down to load more content before extraction.
Default: 0. Range: 0-50.
render_heavy_js
boolean
Optional parameter to enable enhanced JavaScript rendering for JavaScript-heavy websites (React, Vue, Angular, SPAs). Use when standard rendering doesn’t capture all content.
Default: false
mock
boolean
Optional parameter to enable mock mode. When set to true, the request returns mock data instead of performing an actual extraction. Useful for testing and development.
Default: false
plain_text
boolean
Optional parameter to return plain text instead of JSON. When set to true, the result is returned as plain text rather than structured JSON data.
Default: false
stealth
boolean
Optional parameter to enable stealth mode. When set to true, the scraper uses advanced anti-detection techniques to bypass bot protection and access protected websites. Adds 4 credits to the request cost.
Default: false
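
The parameters above can be assembled and range-checked on the client before a request is sent. The following is a minimal sketch, not an official client: the function name build_payload and the choice to omit values equal to their documented defaults are assumptions made for illustration. The field names, defaults, and ranges are the ones listed above.

```python
from typing import Any


def build_payload(
    website_url: str,
    user_prompt: str,
    total_pages: int = 1,
    number_of_scrolls: int = 0,
    **flags: Any,
) -> dict:
    """Assemble a SmartScraper request body (hypothetical helper)."""
    # Ranges taken from the parameter descriptions above.
    if not 1 <= total_pages <= 100:
        raise ValueError("total_pages must be between 1 and 100")
    if not 0 <= number_of_scrolls <= 50:
        raise ValueError("number_of_scrolls must be between 0 and 50")

    # Boolean options documented above; anything else is rejected.
    allowed_flags = {"render_heavy_js", "mock", "plain_text", "stealth"}
    unknown = set(flags) - allowed_flags
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")

    payload: dict = {"website_url": website_url, "user_prompt": user_prompt}
    # Assumption: values equal to the documented defaults are left out.
    if total_pages != 1:
        payload["total_pages"] = total_pages
    if number_of_scrolls != 0:
        payload["number_of_scrolls"] = number_of_scrolls
    payload.update({name: True for name, value in flags.items() if value})
    return payload
```

For example, build_payload("https://example.com/news", "Extract headlines", total_pages=2, stealth=True) yields a body with four keys, and the result can be serialized with json.dumps before sending.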

Example Requests

Basic Request

curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_url": "https://scrapegraphai.com/",
  "user_prompt": "Extract company information and features",
  "output_schema": {
    "properties": {
      "company_name": {"type": "string"},
      "description": {"type": "string"},
      "features": {"type": "array", "items": {"type": "string"}},
      "contact_email": {"type": "string"}
    }
  }
}'

Advanced Request with Pagination and Stealth Mode

curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_url": "https://example.com/news",
  "user_prompt": "Extract all the headlines from this section into a table with the date and URL of the news",
  "total_pages": 2,
  "stealth": true,
  "render_heavy_js": true,
  "headers": {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Cookie": "cookie1=value1; cookie2=value2"
  }
}'

Example Response

{
  "request_id": "<request-id>",
  "status": "completed",
  "website_url": "https://scrapegraphai.com/",
  "user_prompt": "Extract info about the company",
  "result": {
    "company_name": "ScrapeGraphAI",
    "description": "ScrapeGraphAI is a powerful AI scraping API designed for efficient web data extraction to power LLM applications and AI agents...",
    "features": [
      "Effortless, cost-effective, and AI-powered data extraction",
      "Handles proxy rotation and rate limits",
      "Supports a wide variety of websites"
    ],
    "contact_email": "contact@scrapegraphai.com",
    "social_links": {
      "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
      "linkedin": "https://www.linkedin.com/company/101881123",
      "twitter": "https://x.com/scrapegraphai"
    },
    "..."
  },
  "error": ""
}

Authorizations

SGAI-APIKEY
string
header
required
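
As a rough illustration of how the SGAI-APIKEY header travels with the request, the sketch below builds (but does not send) the POST request using only Python's standard library. The helper name build_request is hypothetical; the URL and header names are the ones used in the cURL examples.

```python
import json
import urllib.request

API_URL = "https://api.scrapegraphai.com/v1/smartscraper"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Create a POST request carrying the API key header (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "SGAI-APIKEY": api_key,  # authentication header, as in the cURL examples
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to perform the call. Nothing is sent until that point, which keeps the construction step easy to inspect.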

Body

application/json

Either website_url or website_html must be provided

user_prompt
string
required
Example:

"Extract info about the company"

website_url
string
Example:

"https://scrapegraphai.com/"

website_html
string

HTML content, maximum size 2MB

Example:

"<html><body><h1>Title</h1><p>Content</p></body></html>"
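
The rule that either website_url or website_html must be present, together with the 2MB limit on HTML content, could be checked before sending. This is only a sketch: the helper name page_source is hypothetical, and reading "2MB" as 2 * 1024 * 1024 bytes of UTF-8 is an assumption, since the page does not say how the size is measured. The page also does not say what happens when both fields are supplied, so the sketch passes both through.

```python
from typing import Optional

# Assumption: "2MB" is interpreted as 2 MiB of UTF-8 encoded bytes.
MAX_HTML_BYTES = 2 * 1024 * 1024


def page_source(
    website_url: Optional[str] = None,
    website_html: Optional[str] = None,
) -> dict:
    """Return the source fields for the request body (hypothetical helper)."""
    if not website_url and not website_html:
        raise ValueError("provide website_url or website_html")

    fields = {}
    if website_url:
        fields["website_url"] = website_url
    if website_html:
        if len(website_html.encode("utf-8")) > MAX_HTML_BYTES:
            raise ValueError("website_html exceeds the 2MB limit")
        fields["website_html"] = website_html
    return fields
```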

headers
object

Optional headers to send with the request, including cookies and user agent

Example:
{
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
  "Cookie": "cookie1=value1; cookie2=value2"
}
output_schema
object | null
stealth
boolean
default:false

Enable stealth mode to bypass bot protection using advanced anti-detection techniques. Adds 4 credits to the request cost.

Response

Successful Response

request_id
string
required
status
enum<string>
required
Available options: queued, processing, completed, failed
website_url
string
required
user_prompt
string
required
result
object | null
error
string
default:""
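
One way a client might act on the four status values is sketched below. The function name extract_result and the decision to return None for a pending request are assumptions for illustration. The field names and status values come from the response schema above.

```python
import json
from typing import Any, Optional

PENDING_STATUSES = {"queued", "processing"}


def extract_result(response_text: str) -> Optional[Any]:
    """Return the result of a completed request, or None while it is pending."""
    response = json.loads(response_text)
    status = response["status"]

    if status == "completed":
        return response.get("result")
    if status == "failed":
        # The error field defaults to an empty string, so fall back to a generic message.
        raise RuntimeError(response.get("error") or "request failed")
    if status in PENDING_STATUSES:
        return None
    raise ValueError(f"unexpected status: {status!r}")
```

Raising on failed keeps the error message from being silently dropped, while returning None for queued and processing leaves the retry policy to the caller.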