Scrape Service

Overview

The Scrape service provides direct access to raw HTML content from web pages, with optional JavaScript rendering support. It is ideal for applications that need the complete HTML structure of a webpage, including dynamically generated content.
Try the Scrape service instantly in our interactive playground

Getting Started

Quick Start

from scrapegraph_py import Client
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

# Initialize the client
sgai_client = Client(api_key="your-api-key")

# Scrape request
response = sgai_client.htmlify(
    website_url="https://example.com",
    branding=True  # Set to True to extract brand design and metadata
)

print("HTML Content:", response.html)
print("Request ID:", response.scrape_request_id)
print("Status:", response.status)
# Optional branding result
if response.branding:
    print("Branding extracted")

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| apiKey | string | Yes | The ScrapeGraph API Key. |
| website_url | string | Yes | The URL of the webpage to scrape. |
| branding | boolean | No | Return extracted brand design and metadata. Default: false |
| stealth | boolean | No | Enable stealth mode for anti-bot protection. Adds additional credits. Default: false |
| wait_ms | integer | No | Milliseconds to wait before capturing page content. Default: 3000 |
| country_code | string | No | Two-letter ISO country code for geo-targeted proxy routing (e.g., "us", "gb", "de"). |
Get your API key from the dashboard
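The optional parameters above can be combined in a single htmlify() call. A minimal sketch of assembling and validating them before making the request (build_scrape_params is a hypothetical helper for illustration, not part of the scrapegraph_py SDK):

```python
# Hypothetical helper: validate and assemble the optional scrape parameters
# described in the table above. Not part of the SDK; a sketch only.

def build_scrape_params(website_url, branding=False, stealth=False,
                        wait_ms=3000, country_code=None):
    """Return a kwargs dict suitable for passing to htmlify()."""
    if not website_url.startswith(("http://", "https://")):
        raise ValueError("website_url must include a scheme")
    if wait_ms < 0:
        raise ValueError("wait_ms must be non-negative")
    if country_code is not None and (
        len(country_code) != 2 or not country_code.isalpha()
    ):
        raise ValueError("country_code must be a two-letter ISO code, e.g. 'us'")

    params = {"website_url": website_url}
    if branding:
        params["branding"] = True
    if stealth:
        params["stealth"] = True
    if wait_ms != 3000:  # 3000 ms is the documented default
        params["wait_ms"] = wait_ms
    if country_code:
        params["country_code"] = country_code.lower()
    return params

params = build_scrape_params("https://example.com", stealth=True,
                             wait_ms=5000, country_code="US")
print(params)
```

The helper only includes non-default values, which keeps request payloads minimal and makes logging easier to read.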
Response

{
  "scrape_request_id": "2f0f7a7e-7eb3-4bd2-8f8d-ae8a7f2d9c1a",
  "status": "completed",
  "html": "<!DOCTYPE html><html><head><title>Example Page</title></head><body><h1>Welcome to Example.com</h1><p>This is the raw HTML content...</p></body></html>",
  "error": ""
}
The response includes:
  • scrape_request_id: Unique identifier for tracking your request
  • status: Current status of the scraping operation
  • html: Raw HTML content of the webpage
  • error: Error message (if any occurred during scraping)
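Before using the HTML, it is worth checking the status field. A minimal sketch, treating the response as the JSON payload shown above (ensure_completed is a hypothetical helper, not part of the SDK):

```python
# Hypothetical guard: inspect the response payload (as the JSON shown above)
# and surface failures instead of silently working with empty HTML.

def ensure_completed(payload: dict) -> str:
    """Return the HTML if the scrape completed, else raise with the error."""
    status = payload.get("status")
    if status != "completed":
        raise RuntimeError(
            f"scrape {payload.get('scrape_request_id')} is {status!r}: "
            f"{payload.get('error') or 'no error message'}"
        )
    return payload["html"]

payload = {
    "scrape_request_id": "2f0f7a7e-7eb3-4bd2-8f8d-ae8a7f2d9c1a",
    "status": "completed",
    "html": "<!DOCTYPE html><html>...</html>",
    "error": "",
}
print(len(ensure_completed(payload)))
```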
Example response with branding=true:
{
  "scrape_request_id": "2f0f7a7e-7eb3-4bd2-8f8d-ae8a7f2d9c1a",
  "status": "completed",
  "html": "<!DOCTYPE html><html>...</html>",
  "error": "",
  "branding": {
    "branding": {
      "colorScheme": "light",
      "colors": {
        "primary": "#0B5FFF",
        "accent": "#FF8A00",
        "background": "#FFFFFF",
        "textPrimary": "#111827",
        "link": "#0B5FFF"
      },
      "fonts": [
        { "family": "Inter", "role": "body" }
      ],
      "typography": {
        "fontFamilies": { "primary": "Inter", "heading": "Inter" },
        "fontStacks": { "heading": ["Inter"], "body": ["Inter"] },
        "fontSizes": { "h1": "32px", "h2": "24px", "body": "16px" }
      },
      "spacing": { "baseUnit": 4, "borderRadius": "6px" },
      "components": {
        "input": { "borderColor": "#E5E7EB", "borderRadius": "6px" },
        "buttonPrimary": {
          "background": "#0B5FFF",
          "textColor": "#FFFFFF",
          "borderRadius": "6px",
          "shadow": "..."
        }
      },
      "images": {
        "logo": "https://example.com/logo.svg",
        "favicon": "https://example.com/favicon.ico",
        "ogImage": "https://example.com/og.png"
      },
      "designSystem": { "framework": "tailwind", "componentLibrary": null },
      "confidence": { "overall": 0.86 }
    },
    "metadata": {
      "title": "Example",
      "language": "en",
      "favicon": "https://example.com/favicon.ico"
    }
  }
}
When branding=true is passed, the response includes a branding object with brand design data and page metadata.

Key Features

Raw HTML Access

Get complete HTML structure including all elements

Branding Extraction

Optionally extract brand colors, fonts, typography, UI components, images, and metadata

Fast Processing

Quick extraction for simple HTML content

Reliable Output

Consistent results across different websites

Use Cases

Web Development

  • Extract HTML templates
  • Analyze page structure
  • Test website rendering
  • Debug HTML issues

Data Analysis

  • Parse HTML content
  • Extract specific elements
  • Monitor website changes
  • Build web scrapers
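Because the service returns raw HTML, any standard parser can handle the element-extraction step. A sketch using Python's built-in html.parser to collect link targets (BeautifulSoup or lxml would work equally well):

```python
# Collect href values from <a> tags in returned HTML using only the stdlib.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Accumulate href attributes from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

html = (
    '<html><body>'
    '<a href="https://example.com/a">A</a>'
    '<a href="/relative">B</a>'
    '</body></html>'
)
extractor = LinkExtractor()
extractor.feed(html)
print(extractor.links)  # → ['https://example.com/a', '/relative']
```

In practice you would feed `response.html` from the Scrape service into the extractor instead of the inline sample.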

Content Processing

  • Process dynamic content
  • Handle JavaScript-heavy sites
  • Extract embedded data
  • Analyze page performance
Want to learn more about our AI-powered scraping technology? Visit our main website to discover how we're revolutionizing web data extraction.

Advanced Usage

Async Support

For applications requiring asynchronous execution, the Scrape service provides async support:
from scrapegraph_py import AsyncClient
import asyncio

async def main():
    async with AsyncClient(api_key="your-api-key") as client:
        response = await client.htmlify(
            website_url="https://example.com"
        )
        print(response)

# Run the async function
asyncio.run(main())

Concurrent Processing

Process multiple URLs concurrently for better performance:
import asyncio
from scrapegraph_py import AsyncClient
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

async def main():
    # Initialize async client
    sgai_client = AsyncClient(api_key="your-api-key")

    # URLs to scrape
    urls = [
        "https://example.com",
        "https://scrapegraphai.com/",
        "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
    ]

    tasks = [sgai_client.htmlify(website_url=url) for url in urls]

    # Execute requests concurrently
    responses = await asyncio.gather(*tasks, return_exceptions=True)

    # Process results
    for i, response in enumerate(responses):
        if isinstance(response, Exception):
            print(f"\nError for {urls[i]}: {response}")
        else:
            print(f"\nPage {i+1} HTML:")
            print(f"URL: {urls[i]}")
            print(f"HTML Length: {len(response.html)} characters")

    await sgai_client.close()

if __name__ == "__main__":
    asyncio.run(main())
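When scraping large URL lists, it can also help to cap the number of in-flight requests rather than launching all tasks at once. A sketch using asyncio.Semaphore, with fetch_stub standing in for the real sgai_client.htmlify call:

```python
import asyncio

async def fetch_stub(url: str) -> str:
    # Stand-in for sgai_client.htmlify(website_url=url); returns fake HTML.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"

async def fetch_all(urls, max_in_flight=5):
    # Cap concurrent requests so large URL lists don't overwhelm the API.
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(url):
        async with sem:
            return await fetch_stub(url)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(bounded(u) for u in urls))

results = asyncio.run(fetch_all([f"https://example.com/{i}" for i in range(10)]))
print(len(results))  # → 10
```

Replacing fetch_stub with the real client call gives bounded concurrency without changing the rest of the pipeline.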

Integration Options

Official SDKs

AI Framework Integrations

Best Practices

Performance Optimization

  1. Process multiple URLs concurrently
  2. Cache results when possible
  3. Monitor API usage and costs
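Point 2 can be sketched with a minimal in-memory cache keyed by URL; fake_scrape stands in for the real htmlify call, and a production cache would also want TTLs and size limits:

```python
# Minimal memoization for scrape results, keyed by URL. A sketch only:
# production use would add TTL expiry and bounded size.
def make_cached_scraper(scrape_fn):
    cache = {}
    def cached(url):
        if url not in cache:
            cache[url] = scrape_fn(url)
        return cache[url]
    return cached

calls = []
def fake_scrape(url):
    # Stand-in for a real htmlify call; records each network hit.
    calls.append(url)
    return f"<html>{url}</html>"

scrape = make_cached_scraper(fake_scrape)
scrape("https://example.com")
scrape("https://example.com")  # served from cache, no second request
print(len(calls))  # → 1
```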

Error Handling

  • Always check the status field
  • Handle network timeouts gracefully
  • Implement retry logic for failed requests
  • Log errors for debugging
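Retry logic with exponential backoff can be sketched as follows; flaky_scrape is a stand-in for an htmlify call that times out twice before succeeding:

```python
import time

def retry(fn, attempts=3, base_delay=0.01):
    # Retry a flaky call with exponential backoff; re-raise after the last try.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

tries = []
def flaky_scrape():
    # Stand-in for an htmlify call that fails twice, then succeeds.
    tries.append(1)
    if len(tries) < 3:
        raise TimeoutError("network timeout")
    return "<html>ok</html>"

print(retry(flaky_scrape))  # → <html>ok</html>
```

In real code you would narrow the except clause to transient errors (timeouts, 5xx responses) so permanent failures surface immediately.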

Content Processing

  • Validate HTML structure before parsing
  • Handle different character encodings
  • Extract only needed content sections
  • Clean up HTML for further processing
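The clean-up step can be sketched with Python's built-in html.parser, dropping script and style content and keeping only visible text:

```python
# Strip <script>/<style> content and collect visible text from raw HTML.
from html.parser import HTMLParser

class TextCleaner(HTMLParser):
    """Extract visible text, skipping <script> and <style> content."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

cleaner = TextCleaner()
cleaner.feed(
    "<html><head><style>body{color:red}</style></head>"
    "<body><h1>Title</h1><script>alert(1)</script><p>Hello</p></body></html>"
)
print(" ".join(cleaner.chunks))  # → Title Hello
```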

Example Projects

Check out our cookbook for real-world examples:
  • Web scraping automation tools
  • Content monitoring systems
  • HTML analysis applications
  • Dynamic content extractors

API Reference

For detailed API documentation, see the API Reference.

Support & Resources

Ready to Start?

Sign up now and get your API key to begin scraping web content!