Overview

SearchScraper is our advanced LLM-powered search service that intelligently searches and aggregates information from multiple web sources. Using state-of-the-art language models, it understands your queries and extracts relevant information across the web, providing comprehensive answers with full source attribution.

Try SearchScraper instantly in our interactive playground - no coding required!

Getting Started

Quick Start

from scrapegraph_py import Client

client = Client(api_key="your-api-key")

response = client.searchscraper(
    user_prompt="What are the key features and pricing of ChatGPT Plus?"
)

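The response includes the extracted answer together with the source URLs it was built from. Exact field names can vary between SDK versions; the dictionary-style access below mirrors the async example later on this page.

# Print the extracted answer and the sources it came from
print(response["result"])
for url in response["reference_urls"]:
    print(f"- {url}")
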
Parameters

Parameter | Type   | Required | Description
----------|--------|----------|------------
apiKey    | string | Yes      | The ScrapeGraph API Key.
prompt    | string | Yes      | A textual description of what you want to achieve.
schema    | object | No       | The Pydantic or Zod object that describes the structure and format of the response.

Get your API key from the dashboard

Key Features

  • Multi-Source Search: Intelligent search across multiple reliable web sources
  • AI Understanding: Advanced LLM models for accurate information extraction
  • Structured Output: Clean, structured data in your preferred format
  • Source Attribution: Full transparency with reference URLs

Use Cases

Research & Analysis

  • Academic research and fact-finding
  • Market research and competitive analysis
  • Technology trend analysis
  • Industry insights gathering

Data Aggregation

  • Product research and comparison
  • Company information compilation
  • Price monitoring across sources
  • Technology stack analysis

Content Creation

  • Fact verification and citation
  • Content research and inspiration
  • Data-driven article writing
  • Knowledge base building

Want to learn more about our AI-powered search technology? Visit our main website to discover how we’re revolutionizing web research.

Other Functionality

Retrieve a previous request

If you have the request ID from a previous request, you can retrieve all of its information.

import { getSearchScraperRequest } from 'scrapegraph-js';

const apiKey = 'your_api_key';
const requestId = 'ID_of_previous_request';

try {
  const requestInfo = await getSearchScraperRequest(apiKey, requestId);
  console.log(requestInfo);
} catch (error) {
  console.error(error);
}

Parameters

Parameter | Type   | Required | Description
----------|--------|----------|------------
apiKey    | string | Yes      | The ScrapeGraph API Key.
requestId | string | Yes      | The request ID associated with the output of a previous searchScraper request.

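If you are working in Python rather than JavaScript, the Python SDK offers an equivalent lookup. The method name below (get_searchscraper) is an assumption based on the SDK's naming conventions; verify it against the Python SDK reference.

from scrapegraph_py import Client

client = Client(api_key="your-api-key")

# Retrieve the stored output of a previous request
# (method name assumed; check the Python SDK reference)
request_info = client.get_searchscraper("ID_of_previous_request")
print(request_info)
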
Custom Schema Example

Define exactly what data you want to extract using Pydantic or Zod:

from pydantic import BaseModel, Field
from typing import List

class CompanyProfile(BaseModel):
    name: str = Field(description="Company name")
    description: str = Field(description="Brief company description")
    founded_year: str = Field(description="Year the company was founded")
    headquarters: str = Field(description="Company headquarters location")
    employees: str = Field(description="Number of employees")
    industry: str = Field(description="Primary industry")
    products: List[str] = Field(description="Main products or services")
    competitors: List[str] = Field(description="Major competitors")
    market_share: str = Field(description="Company's market share")
    revenue: str = Field(description="Annual revenue")
    tech_stack: List[str] = Field(description="Technologies used by the company")

response = client.searchscraper(
    user_prompt="Find comprehensive information about OpenAI",
    output_schema=CompanyProfile
)
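
Because the response follows your schema, you can load it back into the Pydantic model for validated, attribute-style access. Whether the payload is exposed as response["result"] or response.result depends on the SDK version, so treat this as a sketch:

# Rehydrate the structured result into the schema model
profile = CompanyProfile(**response["result"])
print(profile.name, profile.founded_year)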

Advanced Schema Usage

The schema system in SearchScraper is a powerful way to ensure you get exactly the data structure you need. Here are some advanced techniques for using schemas effectively:

Nested Schemas

You can create complex nested structures to capture hierarchical data:

from pydantic import BaseModel, Field
from typing import List, Optional

class Author(BaseModel):
    name: str = Field(description="Author's full name")
    bio: Optional[str] = Field(description="Author's biography")
    expertise: List[str] = Field(description="Areas of expertise")

class Article(BaseModel):
    title: str = Field(description="Article title")
    content: str = Field(description="Main article content")
    author: Author = Field(description="Article author information")
    publication_date: str = Field(description="Date of publication")
    tags: List[str] = Field(description="Article tags or categories")

response = client.searchscraper(
    user_prompt="Find the latest AI research articles",
    output_schema=Article
)

Schema Validation Rules

Enhance data quality by adding validation rules to your schema:

from pydantic import BaseModel, Field, validator
from typing import List
from datetime import datetime

class ProductInfo(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(description="Product price", gt=0)
    currency: str = Field(description="Currency code", max_length=3)
    release_date: str = Field(description="Product release date")
    
    @validator('currency')
    def validate_currency(cls, v):
        if len(v) != 3 or not v.isupper():
            raise ValueError('Currency must be a 3-letter uppercase code')
        return v
        
    @validator('release_date')
    def validate_date(cls, v):
        try:
            datetime.strptime(v, '%Y-%m-%d')
            return v
        except ValueError:
            raise ValueError('Date must be in YYYY-MM-DD format')

Quality Improvement Tips

To get the highest quality results from SearchScraper, follow these best practices:

1. Detailed Field Descriptions

Always provide clear, detailed descriptions for each field in your schema:

class CompanyInfo(BaseModel):
    revenue: str = Field(
        description="Annual revenue in USD, including the year of reporting"
        # Good: "Annual revenue in USD, including the year of reporting"
        # Bad: "Revenue"
    )
    market_position: str = Field(
        description="Company's market position including market share percentage and rank among competitors"
        # Good: "Company's market position including market share percentage and rank among competitors"
        # Bad: "Position"
    )

2. Structured Prompts

Combine schemas with well-structured prompts for better results:

from pydantic import BaseModel, Field
from typing import List

class TeslaVehicleInfo(BaseModel):
    # Illustrative schema for this prompt; adjust fields to your needs
    models: List[str] = Field(description="Vehicle models covered, e.g. Model 3 and Model Y")
    pricing: str = Field(description="Current pricing structure")
    customization_options: List[str] = Field(description="Available customization options")
    delivery_timeframes: str = Field(description="Estimated delivery timeframes")

response = client.searchscraper(
    user_prompt="""
    Find information about Tesla's electric vehicles with specific focus on:
    - Latest Model 3 and Model Y specifications
    - Current pricing structure
    - Available customization options
    - Delivery timeframes
    Please include only verified information from official sources.
    """,
    output_schema=TeslaVehicleInfo
)

3. Data Validation

Implement comprehensive validation to ensure data quality:

from pydantic import BaseModel, Field, validator
from typing import List, Optional
from datetime import datetime

class MarketData(BaseModel):
    timestamp: str = Field(description="Data timestamp in ISO format")
    value: float = Field(description="Market value")
    confidence_score: float = Field(description="Confidence score between 0 and 1")
    
    @validator('timestamp')
    def validate_timestamp(cls, v):
        try:
            datetime.fromisoformat(v)
            return v
        except ValueError:
            raise ValueError('Invalid ISO timestamp format')
    
    @validator('confidence_score')
    def validate_confidence(cls, v):
        if not 0 <= v <= 1:
            raise ValueError('Confidence score must be between 0 and 1')
        return v

4. Error Handling

Implement robust error handling for schema validation:

from pydantic import ValidationError

try:
    response = client.searchscraper(
        user_prompt="Find market data for NASDAQ:AAPL",
        output_schema=MarketData
    )
    validated_data = MarketData(**response.result)
except ValidationError as e:
    print(f"Data validation failed: {e.json()}")
    # Implement fallback logic or error reporting
except Exception as e:
    print(f"An error occurred: {str(e)}")

Async Support

Example of using the async searchscraper functionality to search for information concurrently:

import asyncio
from scrapegraph_py import AsyncClient
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

async def main():
    # Initialize async client
    sgai_client = AsyncClient(api_key="your-api-key-here")

    # List of search queries
    queries = [
        "What is the latest version of Python and what are its main features?",
        "What are the key differences between Python 2 and Python 3?",
        "What is Python's GIL and how does it work?",
    ]

    # Create tasks for concurrent execution
    tasks = [sgai_client.searchscraper(user_prompt=query) for query in queries]

    # Execute requests concurrently
    responses = await asyncio.gather(*tasks, return_exceptions=True)

    # Process results
    for i, response in enumerate(responses):
        if isinstance(response, Exception):
            print(f"\nError for query {i+1}: {response}")
        else:
            print(f"\nSearch {i+1}:")
            print(f"Query: {queries[i]}")
            print(f"Result: {response['result']}")
            print("Reference URLs:")
            for url in response["reference_urls"]:
                print(f"- {url}")

    await sgai_client.close()

if __name__ == "__main__":
    asyncio.run(main())

Integration Options

Official SDKs

  • Python SDK - Perfect for data science and backend applications
  • JavaScript SDK - Ideal for web applications and Node.js

AI Framework Integrations

Best Practices

Query Optimization

  1. Be specific in your prompts
  2. Use descriptive queries
  3. Include relevant context
  4. Specify time-sensitive requirements

Schema Design

  • Start with essential fields
  • Use appropriate data types
  • Add field descriptions
  • Make optional fields nullable
  • Group related information

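A minimal sketch applying these guidelines: only essential fields, explicit types, field descriptions, nullable optional fields, and related information grouped into a nested model. The model names are illustrative, not part of the API.

from pydantic import BaseModel, Field
from typing import List, Optional

class Pricing(BaseModel):
    # Related pricing fields grouped into their own model
    plan: str = Field(description="Name of the pricing plan")
    monthly_price: Optional[float] = Field(default=None, description="Monthly price in USD, if published")

class ProductSummary(BaseModel):
    name: str = Field(description="Official product name")
    features: List[str] = Field(description="Key features, one per entry")
    pricing: Optional[Pricing] = Field(default=None, description="Pricing details, if available")
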
Rate Limiting

  • Implement reasonable delays between requests
  • Use async clients for better performance
  • Monitor your API usage

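One way to respect rate limits while still using the async client is to cap concurrency with a semaphore. The limit of 3 below is only an example; tune it to your plan's quota.

import asyncio
from scrapegraph_py import AsyncClient

async def limited_search(client, semaphore, query):
    # Only a few requests run at once; the rest wait for a free slot
    async with semaphore:
        return await client.searchscraper(user_prompt=query)

async def main():
    client = AsyncClient(api_key="your-api-key-here")
    semaphore = asyncio.Semaphore(3)  # at most 3 concurrent requests (example value)
    queries = ["Query one", "Query two", "Query three", "Query four"]
    responses = await asyncio.gather(
        *(limited_search(client, semaphore, q) for q in queries),
        return_exceptions=True,
    )
    for query, response in zip(queries, responses):
        print(query, "->", response if isinstance(response, Exception) else response["result"])
    await client.close()

asyncio.run(main())
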
Example Projects

Check out our cookbook for real-world examples.

API Reference

For detailed API documentation, see the API reference in our docs.

Support & Resources

Ready to Start?

Sign up now and get your API key to begin searching and extracting data with SearchScraper!