SmartScraper
AI-powered web scraping for any website
Overview
SmartScraper is our flagship LLM-powered web scraping service for extracting structured data from any website. Its language models interpret context and content much as a human reader would, which makes web data extraction more reliable and efficient.
Try SmartScraper instantly in our interactive playground - no coding required!
Getting Started
Quick Start
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
apiKey | string | Yes | The ScrapeGraph API Key. |
websiteUrl | string | Yes | The URL of the webpage that needs to be scraped. |
prompt | string | Yes | A natural-language description of the data you want to extract. |
schema | object | No | The Pydantic or Zod object that describes the structure and format of the response. |
Get your API key from the dashboard
Key Features
Universal Compatibility
Works with any website structure, including JavaScript-rendered content
AI Understanding
Contextual understanding of content for accurate extraction
Structured Output
Returns clean, structured data in your preferred format
Schema Support
Define custom output schemas using Pydantic or Zod
Use Cases
Content Aggregation
- News article extraction
- Blog post summarization
- Product information gathering
- Research data collection
Data Analysis
- Market research
- Competitor analysis
- Price monitoring
- Trend tracking
AI Training
- Dataset creation
- Training data collection
- Content classification
- Knowledge base building
Want to learn more about our AI-powered scraping technology? Visit our main website to discover how we’re revolutionizing web data extraction.
Other Functionality
Retrieve a previous request
If you know the request ID of a previous request you made, you can retrieve all of its information.
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
apiKey | string | Yes | The ScrapeGraph API Key. |
requestId | string | Yes | The request ID associated with the output of a previous smartScraper request. |
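A minimal sketch of such a lookup, using only the Python standard library. The base URL and the path layout (request ID appended to the endpoint) are assumptions not stated on this page, and `build_get_request` is a hypothetical helper name; only the `SGAI-APIKEY` header comes from the endpoint details further down. The request is built but not sent.

```python
import urllib.parse
import urllib.request

# ASSUMPTION: base URL and path layout are illustrative, not taken from this page.
BASE_URL = "https://api.scrapegraphai.com/v1/smartscraper"


def build_get_request(api_key, request_id):
    """Build a GET request for a previous SmartScraper request (hypothetical helper)."""
    if not request_id:
        raise ValueError("request_id is required")
    # Quote the ID so unexpected characters cannot alter the URL path.
    url = f"{BASE_URL}/{urllib.parse.quote(request_id, safe='')}"
    return urllib.request.Request(url, headers={"SGAI-APIKEY": api_key}, method="GET")


req = build_get_request("your-api-key", "example-request-id")
print(req.full_url)
```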
Custom Schema Example
Define exactly what data you want to extract:
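One way to express such a schema is sketched here as a plain JSON-Schema-style dictionary, so that it needs no third-party packages. The field names are hypothetical, and whether the service accepts this exact shape is an assumption; a Pydantic model can be exported to a comparable dictionary with `model_json_schema()`. The small helper only illustrates checking a result against the schema's required fields.

```python
# Hypothetical schema for a product page; the field names are illustrative.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Product name as shown on the page"},
        "price": {"type": "number", "description": "Current price, without currency symbol"},
        "in_stock": {"type": "boolean", "description": "Whether the product can be ordered"},
    },
    "required": ["name", "price"],
}


def missing_required(result, schema):
    """Return the required fields that are absent from an extracted result."""
    return [field for field in schema["required"] if field not in result]


# A made-up result, used only to show the check.
sample_result = {"name": "Example Widget", "in_stock": True}
print(missing_required(sample_result, product_schema))  # -> ['price']
```

Descriptive field names and per-field descriptions, as in the dictionary above, follow the schema advice in the best practices on this page.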
Async Support
For applications requiring asynchronous execution, SmartScraper provides comprehensive async support through the AsyncClient.
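The concurrency pattern itself can be sketched with `asyncio` alone. This does not use the SDK: `scrape_one` is a hypothetical placeholder standing in for a real async SmartScraper call, and `max_concurrency` is an illustrative parameter. The sketch shows bounded concurrency with per-URL error capture.

```python
import asyncio


async def scrape_one(url):
    """Placeholder for a real async SmartScraper call (hypothetical stand-in)."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {"url": url, "status": "done"}


async def scrape_many(urls, max_concurrency=3):
    """Run scrape_one over all URLs, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            try:
                return await scrape_one(url)
            except Exception as exc:  # keep one failure from cancelling the rest
                return {"url": url, "status": "failed", "error": str(exc)}

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(guarded(u) for u in urls))


urls = [f"https://example.com/page/{i}" for i in range(5)]
results = asyncio.run(scrape_many(urls))
print([r["status"] for r in results])
```

The semaphore caps the number of in-flight requests, which helps stay within rate limits when the URL list is long.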
SmartScraper Endpoint
The SmartScraper endpoint is our core service for extracting structured data from any webpage using advanced language models. It automatically adapts to different website layouts and content types, enabling quick and reliable data extraction.
Key Capabilities
- Universal Compatibility: Works with any website structure, including JavaScript-rendered content
- Schema Validation: Supports both Pydantic (Python) and Zod (JavaScript) schemas
- Concurrent Processing: Efficient handling of multiple URLs through async support
- Custom Extraction: Flexible user prompts for targeted data extraction
Endpoint Details
Required Headers
Header | Description |
---|---|
SGAI-APIKEY | Your API authentication key |
Content-Type | application/json |
Request Body
Field | Type | Required | Description |
---|---|---|---|
website_url | string | Yes* | URL to scrape (*required unless website_html is provided) |
website_html | string | No | Raw HTML content to process, as an alternative to website_url |
user_prompt | string | Yes | Instructions for data extraction |
output_schema | object | No | Pydantic or Zod schema for response validation |
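As a rough sketch of how the headers and body fields above fit together, the following builds (but does not send) a request with the Python standard library. The endpoint URL is an assumption not stated on this page, and `build_smartscraper_request` is a hypothetical helper name; check the API reference for the actual URL before sending anything.

```python
import json
import urllib.request

# ASSUMPTION: the endpoint URL is not given on this page; replace it with the
# value from the API reference.
SMARTSCRAPER_URL = "https://api.scrapegraphai.com/v1/smartscraper"


def build_smartscraper_request(api_key, user_prompt, website_url=None,
                               website_html=None, output_schema=None):
    """Build a POST request from the documented headers and body fields."""
    if not website_url and not website_html:
        raise ValueError("either website_url or website_html is required")
    if not user_prompt:
        raise ValueError("user_prompt is required")

    # Only include optional fields that were actually provided.
    body = {"user_prompt": user_prompt}
    if website_url:
        body["website_url"] = website_url
    if website_html:
        body["website_html"] = website_html
    if output_schema is not None:
        body["output_schema"] = output_schema

    return urllib.request.Request(
        SMARTSCRAPER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"SGAI-APIKEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_smartscraper_request(
    api_key="your-api-key",
    website_url="https://example.com",
    user_prompt="Extract the page title and main heading",
)
# To send it: urllib.request.urlopen(request), wrapped in error handling.
```

Validating the "either `website_url` or `website_html`" rule on the client side surfaces a malformed call before it consumes a request.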
Response Format
Best Practices
- Schema Definition:
  - Define schemas to ensure consistent data structure
  - Use descriptive field names and types
  - Include field descriptions for better extraction accuracy
- Async Processing:
  - Use async clients for concurrent requests
  - Implement proper error handling
  - Monitor rate limits and implement backoff strategies
- Error Handling:
  - Always wrap requests in try-catch blocks
  - Check response status before processing
  - Implement retry logic for failed requests
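The retry and backoff advice above might look like the following in Python. This is a generic sketch, not part of any SDK: `call_with_retries` and `flaky_request` are hypothetical names, and the delay values are illustrative. The `sleep` parameter is injectable so the demo runs without real waiting.

```python
import random
import time


def call_with_retries(func, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func(); on exception, retry with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))


# Demo with a stand-in function that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"status": "ok"}


result = call_with_retries(flaky_request, sleep=lambda seconds: None)
print(result, attempts["count"])  # -> {'status': 'ok'} 3
```

In real use it is usually better to catch only errors that are likely transient (timeouts, connection failures, rate-limit responses) rather than every exception.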
Integration Options
Official SDKs
- Python SDK - Perfect for data science and backend applications
- JavaScript SDK - Ideal for web applications and Node.js
AI Framework Integrations
- LangChain Integration - Use SmartScraper in your LLM workflows
- LlamaIndex Integration - Build powerful search and QA systems
Best Practices
Optimizing Extraction
- Be specific in your prompts
- Use schemas for structured data
- Handle pagination for multi-page content
- Implement error handling and retries
Rate Limiting
- Implement reasonable delays between requests
- Use async clients for better performance
- Monitor your API usage
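A simple way to enforce a delay between requests is a minimum-interval throttle. The class below is a hypothetical helper, not an SDK feature, and the 0.05-second interval is chosen only to keep the demo fast; pick a value that matches your plan's actual limits.

```python
import time


class MinIntervalThrottle:
    """Ensure at least min_interval seconds pass between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = MinIntervalThrottle(min_interval=0.05)
start = time.monotonic()
for url in ["https://example.com/a", "https://example.com/b", "https://example.com/c"]:
    throttle.wait()  # a real request for `url` would go here
elapsed = time.monotonic() - start
print(f"3 calls took {elapsed:.2f}s")
```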
Example Projects
Check out our cookbook for real-world examples:
- E-commerce product scraping
- News aggregation
- Research data collection
- Content monitoring
API Reference
For detailed API documentation, see:
Support & Resources
Documentation
Comprehensive guides and tutorials
API Reference
Detailed API documentation
Community
Join our Discord community
GitHub
Check out our open-source projects
Ready to Start?
Sign up now and get your API key to begin extracting data with SmartScraper!