Sitemap allows you to extract all URLs from a website's sitemap.xml file automatically. The API discovers the sitemap from robots.txt, common locations such as /sitemap.xml, or sitemap index files.
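The discovery itself happens server-side, but the lookup order it describes — Sitemap: directives in robots.txt first, then well-known fallback paths — can be sketched in a few lines. The helper below is illustrative only, not part of the API or SDK:

```python
from urllib.parse import urljoin

def candidate_sitemap_urls(website_url: str, robots_txt: str) -> list[str]:
    """Sketch of sitemap discovery order: prefer Sitemap: directives
    found in robots.txt, then fall back to common locations."""
    candidates = []
    for line in robots_txt.splitlines():
        # robots.txt may declare one or more "Sitemap: <url>" lines
        if line.lower().startswith("sitemap:"):
            candidates.append(line.split(":", 1)[1].strip())
    # Well-known fallback paths checked when robots.txt lists none
    for path in ("/sitemap.xml", "/sitemap_index.xml"):
        candidates.append(urljoin(website_url, path))
    return candidates

robots = "User-agent: *\nSitemap: https://example.com/custom-sitemap.xml"
print(candidate_sitemap_urls("https://example.com", robots))
# ['https://example.com/custom-sitemap.xml',
#  'https://example.com/sitemap.xml',
#  'https://example.com/sitemap_index.xml']
```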

Use Cases

  • Discover all pages on a website for bulk scraping
  • Build content inventory from a website
  • Monitor website structure changes
  • Combine with other endpoints to scrape multiple pages
  • Create site maps for SEO analysis

Request Body

website_url
string
required
The URL of the website you want to extract the sitemap from. The API will automatically locate the sitemap.xml file.
headers
object
Optional headers to customize the request behavior. This can include user agent, cookies, or other HTTP headers.
mock
boolean
Optional parameter to enable mock mode. When set to true, the request will return mock data instead of performing an actual extraction. Useful for testing and development.
Default: false
stealth
boolean
Optional parameter to enable stealth mode. When set to true, the scraper will use advanced anti-detection techniques to bypass bot protection and access protected websites. Adds +4 credits to the request cost.
Default: false
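Putting all four body fields together, a full request payload might look like the sketch below. The header values are illustrative, not required by the API:

```python
import json

# Complete request body using the optional fields; only website_url is
# required. The User-Agent value here is an example, not an API default.
payload = {
    "website_url": "https://scrapegraphai.com",
    "headers": {"User-Agent": "Mozilla/5.0 (compatible; MyCrawler/1.0)"},
    "mock": False,     # set True to receive mock data while developing
    "stealth": False,  # set True for anti-detection (+4 credits)
}
print(json.dumps(payload, indent=2))
```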

Example Request

curl -X POST https://api.scrapegraphai.com/v1/sitemap \
  -H "Content-Type: application/json" \
  -H "SGAI-APIKEY: YOUR_API_KEY" \
  -d '{
  "website_url": "https://scrapegraphai.com"
}'

Example Response

{
  "request_id": "65401e0d-8cd6-4d6a-88f6-e21255d1c06a",
  "status": "completed",
  "website_url": "https://scrapegraphai.com",
  "urls": [
    "https://scrapegraphai.com/",
    "https://scrapegraphai.com/about",
    "https://scrapegraphai.com/blog",
    "https://scrapegraphai.com/blog/how-to-scrape-websites",
    "https://scrapegraphai.com/blog/web-scraping-best-practices",
    "https://scrapegraphai.com/docs",
    "https://scrapegraphai.com/pricing",
    "https://scrapegraphai.com/contact"
  ],
  "error": ""
}
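The urls array in the response is plain data you can post-process locally. For instance, counting pages per top-level path section of the sample response above takes only the standard library:

```python
from collections import Counter
from urllib.parse import urlparse

# URLs from the sample response above
urls = [
    "https://scrapegraphai.com/",
    "https://scrapegraphai.com/about",
    "https://scrapegraphai.com/blog",
    "https://scrapegraphai.com/blog/how-to-scrape-websites",
    "https://scrapegraphai.com/blog/web-scraping-best-practices",
    "https://scrapegraphai.com/docs",
    "https://scrapegraphai.com/pricing",
    "https://scrapegraphai.com/contact",
]

# Group by the first path segment; the bare homepage counts as "(root)"
sections = Counter(urlparse(u).path.split("/")[1] or "(root)" for u in urls)
print(sections.most_common())
# [('blog', 3), ('(root)', 1), ('about', 1), ('docs', 1),
#  ('pricing', 1), ('contact', 1)]
```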

Python Example

from scrapegraph_py import Client

# Initialize the client
client = Client(api_key="YOUR_API_KEY")

# Sitemap request
response = client.sitemap(
    website_url="https://scrapegraphai.com"
)

print("Result:", response)

JavaScript Example

import { sitemap } from 'scrapegraph-js';

const apiKey = 'YOUR_API_KEY';

const response = await sitemap(apiKey, {
  website_url: 'https://scrapegraphai.com',
});

if (response.status === 'error') {
  console.error('Error:', response.error);
} else {
  console.log('Result:', response.data);
}

Combining with SmartScraper

You can combine the Sitemap endpoint with SmartScraper to scrape multiple pages from a website:

from scrapegraph_py import Client

client = Client(api_key="YOUR_API_KEY")

# Step 1: Get all URLs from sitemap
sitemap_response = client.sitemap(
    website_url="https://scrapegraphai.com"
)

# Step 2: Filter for specific pages (e.g., blog posts)
blog_urls = [url for url in sitemap_response.urls if '/blog/' in url]

# Step 3: Scrape each blog post with SmartScraper
for url in blog_urls[:5]:  # limit to the first 5 blog posts
    result = client.smartscraper(
        website_url=url,
        user_prompt="Extract the title, author, and main content"
    )
    print(f"Scraped: {url}")
    print(result)

Features

  • Automatic Discovery: Finds sitemap from robots.txt or common locations
  • Sitemap Index Support: Handles sitemap index files with multiple sitemaps
  • Fast Extraction: Quickly retrieves all URLs without scraping each page
  • No Rate Limits: Extract thousands of URLs in a single request
  • Integration Ready: Combine with other endpoints for bulk operations