SearchScraper is our advanced LLM-powered search service that intelligently searches and aggregates information from multiple web sources. Using state-of-the-art language models, it understands your queries and extracts relevant information across the web, providing comprehensive answers with full source attribution.
```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key")

response = client.searchscraper(
    user_prompt="What are the key features and pricing of ChatGPT Plus?"
)
```
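The returned response is a dictionary; the fields used throughout these examples are `result` (the synthesized answer) and `reference_urls` (the source attribution). A quick sketch of reading those fields, with a stub dictionary standing in for a live API response (the answer text here is illustrative, not real output):

```python
# Stub with the same shape as a searchscraper response: `result` and
# `reference_urls` are the fields used throughout these examples.
response = {
    "result": "ChatGPT Plus costs $20/month and includes access to GPT-4 ...",
    "reference_urls": ["https://openai.com/chatgpt/pricing"],
}

print(response["result"])
for url in response["reference_urls"]:
    print(f"- {url}")
```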
The schema system in SearchScraper is a powerful way to ensure you get exactly the data structure you need. Here are some advanced techniques for using schemas effectively:
Always provide clear, detailed descriptions for each field in your schema:
```python
from pydantic import BaseModel, Field

class CompanyInfo(BaseModel):
    revenue: str = Field(
        description="Annual revenue in USD, including the year of reporting"
        # Good: "Annual revenue in USD, including the year of reporting"
        # Bad: "Revenue"
    )
    market_position: str = Field(
        description="Company's market position including market share percentage and rank among competitors"
        # Good: "Company's market position including market share percentage and rank among competitors"
        # Bad: "Position"
    )
```
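A schema defined this way can also be used to re-validate the structured result on your side. A minimal sketch, with a hypothetical payload dict standing in for the structured output the API would return for this schema (the figures are placeholders):

```python
from pydantic import BaseModel, Field

class CompanyInfo(BaseModel):
    revenue: str = Field(description="Annual revenue in USD, including the year of reporting")
    market_position: str = Field(description="Company's market position including market share percentage and rank among competitors")

# Hypothetical payload shaped like the structured result for this schema.
# In real use it would come from:
#   client.searchscraper(user_prompt=..., output_schema=CompanyInfo)
payload = {
    "revenue": "USD 1.2B (2023)",
    "market_position": "Roughly 20% market share, #2 among competitors",
}

info = CompanyInfo(**payload)
print(info.revenue)
```

Loading the payload back through the model means any missing or mistyped field fails loudly instead of propagating silently.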
Combine schemas with well-structured prompts for better results:
```python
response = client.searchscraper(
    user_prompt="""
    Find information about Tesla's electric vehicles with specific focus on:
    - Latest Model 3 and Model Y specifications
    - Current pricing structure
    - Available customization options
    - Delivery timeframes
    Please include only verified information from official sources.
    """,
    output_schema=TeslaVehicleInfo,
)
```
Implement comprehensive validation to ensure data quality:
```python
from datetime import datetime
from pydantic import BaseModel, Field, validator

class MarketData(BaseModel):
    timestamp: str = Field(description="Data timestamp in ISO format")
    value: float = Field(description="Market value")
    confidence_score: float = Field(description="Confidence score between 0 and 1")

    @validator('timestamp')
    def validate_timestamp(cls, v):
        try:
            datetime.fromisoformat(v)
            return v
        except ValueError:
            raise ValueError('Invalid ISO timestamp format')

    @validator('confidence_score')
    def validate_confidence(cls, v):
        if not 0 <= v <= 1:
            raise ValueError('Confidence score must be between 0 and 1')
        return v
```
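The same checks can be exercised on their own to see what passes and what gets rejected. A stdlib-only sketch mirroring the two validators as a plain function (`validate_market_data` is a hypothetical helper, not part of the SDK):

```python
from datetime import datetime

def validate_market_data(record: dict) -> dict:
    # Mirrors the MarketData validators as a plain function
    try:
        datetime.fromisoformat(record["timestamp"])
    except ValueError:
        raise ValueError("Invalid ISO timestamp format")
    if not 0 <= record["confidence_score"] <= 1:
        raise ValueError("Confidence score must be between 0 and 1")
    return record

# A well-formed record passes through unchanged
ok = validate_market_data(
    {"timestamp": "2024-01-15T09:30:00", "value": 101.5, "confidence_score": 0.92}
)

# An out-of-range confidence score is rejected
try:
    validate_market_data(
        {"timestamp": "2024-01-15T09:30:00", "value": 1.0, "confidence_score": 1.5}
    )
except ValueError as e:
    print(e)  # prints "Confidence score must be between 0 and 1"
```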
Example of using the async client to run multiple SearchScraper queries concurrently:
```python
import asyncio

from scrapegraph_py import AsyncClient
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

async def main():
    # Initialize async client
    sgai_client = AsyncClient(api_key="your-api-key-here")

    # List of search queries
    queries = [
        "What is the latest version of Python and what are its main features?",
        "What are the key differences between Python 2 and Python 3?",
        "What is Python's GIL and how does it work?",
    ]

    # Create tasks for concurrent execution
    tasks = [sgai_client.searchscraper(user_prompt=query) for query in queries]

    # Execute requests concurrently
    responses = await asyncio.gather(*tasks, return_exceptions=True)

    # Process results
    for i, response in enumerate(responses):
        if isinstance(response, Exception):
            print(f"\nError for query {i+1}: {response}")
        else:
            print(f"\nSearch {i+1}:")
            print(f"Query: {queries[i]}")
            print(f"Result: {response['result']}")
            print("Reference URLs:")
            for url in response["reference_urls"]:
                print(f"- {url}")

    await sgai_client.close()

if __name__ == "__main__":
    asyncio.run(main())
```
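When fanning out many queries at once, you may want to cap how many run concurrently. A minimal stdlib sketch of that pattern using `asyncio.Semaphore`, with a stub coroutine (`fetch`) standing in for the real `sgai_client.searchscraper` call:

```python
import asyncio

async def fetch(query: str) -> str:
    # Stand-in for `await sgai_client.searchscraper(user_prompt=query)`
    await asyncio.sleep(0.01)
    return f"result for: {query}"

async def bounded_gather(queries, limit=2):
    # The semaphore caps how many requests are in flight at any moment
    sem = asyncio.Semaphore(limit)

    async def run(query):
        async with sem:
            return await fetch(query)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(run(q) for q in queries))

results = asyncio.run(bounded_gather(["a", "b", "c"]))
print(results)
```

The same wrapper drops into the example above by replacing `fetch` with the real client call, keeping concurrency bounded while still collecting results in query order.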