Overview

All our services (SmartScraper, SearchScraper, and Markdownify) support custom headers and cookies to help you:

  • Bypass basic anti-bot protections
  • Access authenticated content
  • Maintain sessions
  • Customize request behavior

Headers

Common Headers

You can set any of the following headers in your requests:

{
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",  // Browser identification
    "Accept": "*/*",                                                                // Accepted content types
    "Accept-Encoding": "gzip, deflate, br",                                        // Supported encodings
    "Accept-Language": "en-US,en;q=0.9",                                           // Preferred languages
    "Cache-Control": "no-cache",                                                   // Caching behavior
    "Sec-Ch-Ua": "\"Google Chrome\";v=\"107\", \"Chromium\";v=\"107\"",           // Browser details
    "Sec-Ch-Ua-Mobile": "?0",                                                      // Mobile browser flag
    "Sec-Ch-Ua-Platform": "\"Windows\"",                                          // Operating system (matches the User-Agent above)
    "Sec-Fetch-Dest": "document",                                                  // Request destination
    "Sec-Fetch-Mode": "navigate",                                                  // Request mode
    "Sec-Fetch-Site": "none",                                                      // Request origin
    "Sec-Fetch-User": "?1",                                                        // User-initiated flag
    "Upgrade-Insecure-Requests": "1"                                              // HTTPS upgrade
}

Usage Examples

from scrapegraph_py import Client

client = Client(api_key="your-api-key")

# Define custom headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Sec-Ch-Ua-Platform": "\"Windows\""
}

# Use with SmartScraper
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main content",
    headers=headers
)

# Use with SearchScraper
response = client.searchscraper(
    user_prompt="Find information about...",
    headers=headers
)

# Use with Markdownify
response = client.markdownify(
    website_url="https://example.com",
    headers=headers
)

Cookies

Overview

Cookies are essential for:

  • Accessing authenticated content
  • Maintaining user sessions
  • Handling website preferences
  • Bypassing certain security measures

Setting Cookies

Cookies are sent in the Cookie header as a single string of name=value pairs, each pair separated by a semicolon and a space:

headers = {
    "Cookie": "session_id=abc123; user_id=12345; theme=dark"
}
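
If your cookies are stored in a dictionary rather than a ready-made string, a small helper can assemble the header value. The sketch below is illustrative only: build_cookie_header is a hypothetical name, not part of scrapegraph_py, and it assumes the cookie names and values need no escaping.

```python
def build_cookie_header(cookies):
    """Join name/value pairs into a single Cookie header value.

    Hypothetical helper for illustration; assumes names and values
    contain no semicolons or other characters that need escaping.
    """
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


cookies = {"session_id": "abc123", "user_id": "12345", "theme": "dark"}

# Produces the same string as the hand-written example above
headers = {"Cookie": build_cookie_header(cookies)}
```

The resulting headers dictionary can be passed to any of the services in the same way as the hand-written version.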

Examples

# Example with session cookies
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Cookie": "session_id=abc123; user_id=12345; theme=dark"
}

response = client.smartscraper(
    website_url="https://example.com/dashboard",
    user_prompt="Extract user information",
    headers=headers
)

Common Use Cases

  1. Authentication
headers = {
    "Cookie": "auth_token=xyz789; session_id=abc123"
}
  2. Regional Settings
headers = {
    "Cookie": "country=US; language=en; currency=USD"
}
  3. User Preferences
headers = {
    "Cookie": "theme=dark; notifications=enabled"
}

Best Practices

  1. User Agent Best Practices

    • Use recent browser versions
    • Match User-Agent with Sec-Ch-Ua headers
    • Consider region-specific variations
  2. Cookie Management

    • Keep cookies up to date
    • Include all required session cookies
    • Remove unnecessary cookies
    • Handle cookie expiration
  3. Security Considerations

    • Don’t share or commit cookies that carry credentials, such as session or auth tokens
    • Rotate User-Agents when appropriate
    • Use HTTPS when sending sensitive data
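
One way to act on the advice to match User-Agent with the Sec-Ch-Ua headers is to check a header set before sending it. The following is a minimal sketch under stated assumptions: platform_matches and the UA_PLATFORM_TOKENS mapping are hypothetical, cover only three desktop platforms, and are not part of scrapegraph_py.

```python
# Assumed mapping from a User-Agent platform token to the expected
# Sec-Ch-Ua-Platform value (illustrative, desktop platforms only)
UA_PLATFORM_TOKENS = {
    "Windows NT": '"Windows"',
    "Macintosh": '"macOS"',
    "X11; Linux": '"Linux"',
}


def platform_matches(headers):
    """Return False only when Sec-Ch-Ua-Platform contradicts the User-Agent."""
    user_agent = headers.get("User-Agent", "")
    platform_hint = headers.get("Sec-Ch-Ua-Platform")
    if platform_hint is None:
        return True  # nothing to contradict
    for token, expected in UA_PLATFORM_TOKENS.items():
        if token in user_agent:
            return platform_hint == expected
    return True  # unknown platform: this sketch does not judge it


consistent = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Sec-Ch-Ua-Platform": '"Windows"',
}
mismatched = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Sec-Ch-Ua-Platform": '"macOS"',
}

consistent_ok = platform_matches(consistent)   # headers agree
mismatched_ok = platform_matches(mismatched)   # headers contradict each other
```

A check like this is cheap to run when rotating User-Agents, since each rotated value needs a matching platform hint.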

Common Issues

Support

Need Help?

Contact our support team for assistance with headers, cookies, or any other questions!