Overview

LocalScraper brings the same powerful AI extraction capabilities as SmartScraper, but works on HTML content you supply directly. This makes it ideal when you already have the markup in hand, such as cached pages, internal documents, or dynamically generated content.

Try LocalScraper instantly in our interactive playground - no coding required!

Key Features

  • Local Processing: Process HTML content directly, without making external requests
  • AI Understanding: The same powerful AI extraction as SmartScraper
  • Faster Processing: No network latency or website loading delays
  • Full Control: Complete control over your HTML input and processing

Use Cases

Internal Systems

  • Process internally cached pages
  • Extract from intranet content
  • Handle dynamically rendered JavaScript content
  • Process email templates

Batch Processing

  • Archive data extraction
  • Historical content analysis
  • Bulk document processing
  • Offline content processing

Development & Testing

  • Test extraction logic locally
  • Debug content processing
  • Prototype without API calls
  • Validate schemas offline

Want to learn more about our AI-powered scraping technology? Visit our main website to discover how we’re revolutionizing web data extraction.

Getting Started

Quick Start

from scrapegraph_py import Client

# Initialize the client with your API key
client = Client(api_key="your-api-key")

# The local HTML you want to extract data from
html_content = """
<html>
    <body>
        <h1>ScrapeGraphAI</h1>
        <div class="description">
            <p>AI-powered web scraping for modern applications.</p>
        </div>
        <div class="features">
            <ul>
                <li>Smart Extraction</li>
                <li>Local Processing</li>
                <li>Schema Support</li>
            </ul>
        </div>
    </body>
</html>
"""

# Ask LocalScraper to extract structured data from the HTML
response = client.localscraper(
    website_html=html_content,
    user_prompt="Extract the company information and features"
)
print(response)

Get your API key from the dashboard

Advanced Usage

Custom Schema Example

Define exactly what data you want to extract:
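Below is a minimal sketch of schema-based extraction with a Pydantic model. It assumes the Python SDK's localscraper method accepts an output_schema parameter, mirroring SmartScraper's schema support; the CompanyInfo model and its fields are illustrative:

from pydantic import BaseModel, Field
from scrapegraph_py import Client

# Illustrative schema describing exactly what we want back
class CompanyInfo(BaseModel):
    company_name: str = Field(description="Name of the company")
    description: str = Field(description="Short company description")
    features: list[str] = Field(description="List of product features")

client = Client(api_key="your-api-key")

html_content = "<html><body><h1>ScrapeGraphAI</h1><p>AI-powered web scraping.</p></body></html>"

response = client.localscraper(
    website_html=html_content,
    user_prompt="Extract the company information and features",
    output_schema=CompanyInfo,  # assumed parameter, mirroring SmartScraper
)
print(response)

With a schema in place, the extracted data should come back organized under the declared fields rather than as a free-form structure.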

Async Support

For applications requiring asynchronous execution, LocalScraper provides async support through the AsyncClient:

from scrapegraph_py import AsyncClient
import asyncio

async def main():
    html_content = """
    <html>
        <body>
            <h1>Product: Gaming Laptop</h1>
            <div class="price">$999.99</div>
            <div class="description">
                High-performance gaming laptop with RTX 3080.
            </div>
        </body>
    </html>
    """
    
    async with AsyncClient(api_key="your-api-key") as client:
        response = await client.localscraper(
            website_html=html_content,
            user_prompt="Extract the product information"
        )
        print(response)

# Run the async function
asyncio.run(main())
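
Because each call is awaitable, you can also process several local HTML documents concurrently. A minimal sketch using asyncio.gather; the html_pages list is illustrative:

from scrapegraph_py import AsyncClient
import asyncio

async def extract_all(html_pages):
    async with AsyncClient(api_key="your-api-key") as client:
        # Start one extraction per document and wait for all of them
        tasks = [
            client.localscraper(
                website_html=html,
                user_prompt="Extract the product information"
            )
            for html in html_pages
        ]
        return await asyncio.gather(*tasks)

# Illustrative batch of locally stored HTML snippets
html_pages = [
    "<html><body><h1>Product A</h1><div class='price'>$10</div></body></html>",
    "<html><body><h1>Product B</h1><div class='price'>$20</div></body></html>",
]

results = asyncio.run(extract_all(html_pages))
print(results)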

Integration Options

Official SDKs

  • Python SDK - Perfect for data science and backend applications
  • JavaScript SDK - Ideal for web applications and Node.js

AI Framework Integrations

Best Practices

HTML Preparation

  1. Ensure HTML is well-formed
  2. Include relevant content only
  3. Clean up unnecessary markup
  4. Handle character encoding properly

Optimization Tips

  • Remove unnecessary scripts and styles (see the sketch after this list)
  • Clean up dynamic content placeholders
  • Preserve important semantic structure
  • Include relevant metadata
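
As a concrete example of the preparation and optimization steps above, the following sketch strips scripts and styles before calling LocalScraper. It assumes BeautifulSoup (the bs4 package) is installed; the tags removed are illustrative:

from bs4 import BeautifulSoup
from scrapegraph_py import Client

def clean_html(raw_html: str) -> str:
    soup = BeautifulSoup(raw_html, "html.parser")
    # Drop scripts and styles that add noise without useful content
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    return str(soup)

raw_html = """
<html>
    <head><style>body { color: red; }</style></head>
    <body>
        <script>console.log("tracking");</script>
        <h1>ScrapeGraphAI</h1>
        <p>AI-powered web scraping for modern applications.</p>
    </body>
</html>
"""

client = Client(api_key="your-api-key")
response = client.localscraper(
    website_html=clean_html(raw_html),
    user_prompt="Extract the company information"
)
print(response)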

Example Projects

Check out our cookbook for real-world examples:

  • Dynamic content extraction
  • Email template processing
  • Cached content analysis (see the sketch after this list)
  • Batch HTML processing
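
For instance, cached or offline content can be loaded straight from disk and processed like any other local HTML. A minimal sketch; the file path and prompt are illustrative:

from pathlib import Path
from scrapegraph_py import Client

# Hypothetical path to a previously saved page
cached_page = Path("cache/product_page.html")
html_content = cached_page.read_text(encoding="utf-8")

client = Client(api_key="your-api-key")
response = client.localscraper(
    website_html=html_content,
    user_prompt="Extract the product name, price, and description"
)
print(response)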

API Reference

For detailed API documentation, see:

Support & Resources

Ready to Start?

Sign up now and get your API key to begin processing your HTML content with LocalScraper!