Overview

LocalScraper brings the same powerful AI extraction capabilities as SmartScraper, but works on HTML content you supply directly. This makes it ideal when you already have the markup in hand, such as cached pages, internal documents, or dynamically generated content.

Try LocalScraper instantly in our interactive playground - no coding required!

Key Features

  • Local Processing: Process HTML content directly, without making external requests
  • AI Understanding: The same powerful AI extraction as SmartScraper
  • Faster Processing: No network latency or website loading delays
  • Full Control: Complete control over your HTML input and processing

Use Cases

Internal Systems

  • Process internally cached pages
  • Extract from intranet content
  • Handle dynamically rendered JavaScript content
  • Process email templates

Batch Processing

  • Archive data extraction
  • Historical content analysis
  • Bulk document processing
  • Offline content processing

Development & Testing

  • Test extraction logic locally
  • Debug content processing
  • Prototype without API calls
  • Validate schemas offline

Want to learn more about our AI-powered scraping technology? Visit our main website to discover how we’re revolutionizing web data extraction.

Getting Started

Quick Start

from scrapegraph_py import Client

# Initialize the client with your API key
client = Client(api_key="your-api-key")

# The local HTML you want to extract data from
html_content = """
<html>
    <body>
        <h1>ScrapeGraphAI</h1>
        <div class="description">
            <p>AI-powered web scraping for modern applications.</p>
        </div>
        <div class="features">
            <ul>
                <li>Smart Extraction</li>
                <li>Local Processing</li>
                <li>Schema Support</li>
            </ul>
        </div>
    </body>
</html>
"""

# Ask LocalScraper to extract structured data from the HTML
response = client.localscraper(
    website_html=html_content,
    user_prompt="Extract the company information and features"
)
print(response)

Get your API key from the dashboard

Advanced Usage

Custom Schema Example

Define exactly what data you want to extract:
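Below is a minimal sketch of schema-based extraction with a Pydantic model. It assumes the Python SDK's localscraper method accepts an output_schema parameter, mirroring SmartScraper's schema support; the CompanyInfo model and its fields are illustrative:

from pydantic import BaseModel, Field
from scrapegraph_py import Client

# Illustrative schema describing exactly what we want back
class CompanyInfo(BaseModel):
    company_name: str = Field(description="Name of the company")
    description: str = Field(description="Short company description")
    features: list[str] = Field(description="List of product features")

client = Client(api_key="your-api-key")

html_content = "<html><body><h1>ScrapeGraphAI</h1><p>AI-powered web scraping.</p></body></html>"

response = client.localscraper(
    website_html=html_content,
    user_prompt="Extract the company information and features",
    output_schema=CompanyInfo,  # assumed parameter, mirroring SmartScraper
)
print(response)

With a schema in place, the extracted data should come back organized under the declared fields rather than as a free-form structure.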

Async Support

For applications requiring asynchronous execution, LocalScraper provides async support through the AsyncClient:

from scrapegraph_py import AsyncClient
import asyncio

async def main():
    html_content = """
    <html>
        <body>
            <h1>Product: Gaming Laptop</h1>
            <div class="price">$999.99</div>
            <div class="description">
                High-performance gaming laptop with RTX 3080.
            </div>
        </body>
    </html>
    """
    
    async with AsyncClient(api_key="your-api-key") as client:
        response = await client.localscraper(
            website_html=html_content,
            user_prompt="Extract the product information"
        )
        print(response)

# Run the async function
asyncio.run(main())
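
Because each call is awaitable, you can also process several local HTML documents concurrently. A minimal sketch using asyncio.gather; the html_pages list is illustrative:

from scrapegraph_py import AsyncClient
import asyncio

async def extract_all(html_pages):
    async with AsyncClient(api_key="your-api-key") as client:
        # Start one extraction per document and wait for all of them
        tasks = [
            client.localscraper(
                website_html=html,
                user_prompt="Extract the product information"
            )
            for html in html_pages
        ]
        return await asyncio.gather(*tasks)

# Illustrative batch of locally stored HTML snippets
html_pages = [
    "<html><body><h1>Product A</h1><div class='price'>$10</div></body></html>",
    "<html><body><h1>Product B</h1><div class='price'>$20</div></body></html>",
]

results = asyncio.run(extract_all(html_pages))
print(results)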

Integration Options

Official SDKs

  • Python SDK - Perfect for data science and backend applications
  • JavaScript SDK - Ideal for web applications and Node.js

AI Framework Integrations

Best Practices

HTML Preparation

  1. Ensure HTML is well-formed
  2. Include relevant content only
  3. Clean up unnecessary markup
  4. Handle character encoding properly

Optimization Tips

  • Remove unnecessary scripts and styles (see the sketch after this list)
  • Clean up dynamic content placeholders
  • Preserve important semantic structure
  • Include relevant metadata
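
As a concrete example of the preparation and optimization steps above, the following sketch strips scripts and styles before calling LocalScraper. It assumes BeautifulSoup (the bs4 package) is installed; the tags removed are illustrative:

from bs4 import BeautifulSoup
from scrapegraph_py import Client

def clean_html(raw_html: str) -> str:
    soup = BeautifulSoup(raw_html, "html.parser")
    # Drop scripts and styles that add noise without useful content
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    return str(soup)

raw_html = """
<html>
    <head><style>body { color: red; }</style></head>
    <body>
        <script>console.log("tracking");</script>
        <h1>ScrapeGraphAI</h1>
        <p>AI-powered web scraping for modern applications.</p>
    </body>
</html>
"""

client = Client(api_key="your-api-key")
response = client.localscraper(
    website_html=clean_html(raw_html),
    user_prompt="Extract the company information"
)
print(response)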

Example Projects

Check out our cookbook for real-world examples:

  • Dynamic content extraction
  • Email template processing
  • Cached content analysis (see the sketch after this list)
  • Batch HTML processing
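
For instance, cached or offline content can be loaded straight from disk and processed like any other local HTML. A minimal sketch; the file path and prompt are illustrative:

from pathlib import Path
from scrapegraph_py import Client

# Hypothetical path to a previously saved page
cached_page = Path("cache/product_page.html")
html_content = cached_page.read_text(encoding="utf-8")

client = Client(api_key="your-api-key")
response = client.localscraper(
    website_html=html_content,
    user_prompt="Extract the product name, price, and description"
)
print(response)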

API Reference

For detailed API documentation, see:

Support & Resources

Ready to Start?

Sign up now and get your API key to begin processing your HTML content with LocalScraper!