Automating Lead Discovery & Enrichment
Automate lead generation with web scraping: extract contact information and business details from company websites, business directories, and professional profiles.
Common Use Cases
- Contact Discovery: Extract contact information from company websites
- Business Directory Scraping: Gather leads from business directories
- LinkedIn Profile Scraping: Extract professional profiles and company information
- Email Discovery: Find and verify business email addresses
- Lead Enrichment: Add additional data points to existing leads
Integration Examples
from scrapegraph_py import Client
from pydantic import BaseModel, Field
from typing import List, Optional


class ContactInfo(BaseModel):
    """Schema for a single contact"""
    name: str = Field(description="Contact person's full name")
    # default=None keeps optional fields from being required under Pydantic v2
    email: Optional[str] = Field(default=None, description="Email address if available")
    role: Optional[str] = Field(default=None, description="Job role or position")
    phone: Optional[str] = Field(default=None, description="Phone number if available")
    department: Optional[str] = Field(default=None, description="Department or team")


class CompanyContacts(BaseModel):
    """Schema for all contacts found for one company"""
    contacts: List[ContactInfo] = Field(description="List of contact information")
    company_name: str = Field(description="Company name")


# Initialize the client
client = Client(api_key="your-api-key")

# Scrape company website
response = client.extract(
    url="https://company.com/about",
    prompt="Extract all contact information for decision makers and leadership team",
    output_schema=CompanyContacts,
)

# Process and store leads: keep only contacts that have an email and a manager role
for contact in response.contacts:
    if contact.email and contact.role and "manager" in contact.role.lower():
        leads_db.add(contact)  # leads_db is a placeholder for your own storage layer
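The example above calls `leads_db.add(...)` without defining `leads_db`; it stands for whatever storage you use. As a minimal sketch, assuming the only requirement is an `add` method, an in-memory stand-in could look like the following. The class name `InMemoryLeadStore` and its `leads` attribute are hypothetical and not part of `scrapegraph_py`.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class InMemoryLeadStore:
    """Hypothetical stand-in for `leads_db`; swap in a real database or CRM client."""

    leads: List[Any] = field(default_factory=list)

    def add(self, contact: Any) -> None:
        # Keep the contact in memory only; persistence is left to the caller.
        self.leads.append(contact)

    def __len__(self) -> int:
        return len(self.leads)


leads_db = InMemoryLeadStore()
```

Because the store only relies on `add`, it can later be replaced by a database-backed class with the same method without touching the scraping loop.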
Business Directory Scraping
from scrapegraph_py import Client
from pydantic import BaseModel, Field
from typing import List, Optional


class BusinessInfo(BaseModel):
    """Schema for business information"""
    name: str = Field(description="Business name")
    website: str = Field(description="Company website URL")
    # default=None keeps optional fields from being required under Pydantic v2
    description: Optional[str] = Field(default=None, description="Business description")
    location: Optional[str] = Field(default=None, description="Business location")
    industry: Optional[str] = Field(default=None, description="Industry or category")
    size: Optional[str] = Field(default=None, description="Company size if available")
    contact_email: Optional[str] = Field(default=None, description="Primary contact email")
    phone: Optional[str] = Field(default=None, description="Business phone number")


class BusinessSearchResults(BaseModel):
    """Schema for search results"""
    businesses: List[BusinessInfo] = Field(description="List of found businesses")
    total_results: Optional[int] = Field(default=None, description="Total number of businesses found")


# Initialize the client
client = Client(api_key="your-api-key")

try:
    # Search for businesses in a specific category
    search_results = client.search(
        query="Find software companies in San Francisco with their contact details",
        output_schema=BusinessSearchResults,
        num_results=10,
    )

    # Extract and validate leads
    valid_leads = []
    for business in search_results.businesses:
        # Skip entries without a website to visit
        if not business.website:
            continue
        try:
            # Get more detailed information from the company website
            details = client.extract(
                url=business.website,
                prompt="Extract detailed company information including team size, tech stack, and all contact methods",
                output_schema=CompanyContacts,  # Defined in the previous example
            )
            if validate_lead(details):  # validate_lead is a placeholder for your own validation logic
                valid_leads.append(details)
        except Exception as e:
            # A failure on one business should not abort the whole run
            print(f"Error processing {business.name}: {e}")

    print(f"Found {len(valid_leads)} valid leads out of {len(search_results.businesses)} businesses")
except Exception as e:
    print(f"Error during search: {e}")
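The example above leaves `validate_lead` to the reader. One possible sketch, assuming a lead counts as valid when it has a company name and at least one contact with a plausibly formatted email, is shown below. The rule itself, the `is_plausible_email` helper, and the regular expression are illustrative assumptions; the pattern checks shape only and says nothing about whether the address is deliverable.

```python
import re
from types import SimpleNamespace

# Deliberately loose pattern: checks the shape "something@domain.tld" only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_plausible_email(value) -> bool:
    """Return True if value is a string that looks like an email address."""
    return isinstance(value, str) and EMAIL_RE.match(value.strip()) is not None


def validate_lead(details) -> bool:
    """Hypothetical rule: a company name plus at least one contact with a plausible email."""
    company_name = getattr(details, "company_name", None)
    if not company_name or not company_name.strip():
        return False
    contacts = getattr(details, "contacts", None) or []
    return any(is_plausible_email(getattr(c, "email", None)) for c in contacts)


# Usage with stand-in objects shaped like CompanyContacts
sample = SimpleNamespace(
    company_name="Example Co",
    contacts=[SimpleNamespace(name="Ada", email="ada@example.com")],
)
print(validate_lead(sample))  # True
```

The function reads attributes with `getattr`, so it works with Pydantic model instances as well as the plain stand-in objects used here.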
Best Practices
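- Data Validation: Validate extracted contact information (required fields, email and phone format) before storing it
- Privacy Compliance: Collect and process personal data only in ways that comply with the privacy regulations that apply to you, such as GDPR or CCPA
- Rate Limiting: Add delays between requests so you do not overload target sites or exceed API limits
- Data Deduplication: Remove duplicate leads before storage, for example by matching on a normalized email address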
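The rate-limiting and deduplication practices above can be sketched with the standard library alone. The helper names `dedupe_by_email` and `throttled_map` are hypothetical, leads are assumed to be plain dictionaries with an `email` key, and dropping leads that have no email is a design choice of this sketch rather than a requirement.

```python
import time
from typing import Callable, Iterable, List


def dedupe_by_email(leads: Iterable[dict]) -> List[dict]:
    """Keep the first lead seen for each normalized email; drop leads without one."""
    seen = set()
    unique = []
    for lead in leads:
        # Normalize so "Ada@Example.com" and "ada@example.com " count as the same address.
        email = (lead.get("email") or "").strip().lower()
        if not email or email in seen:
            continue
        seen.add(email)
        unique.append(lead)
    return unique


def throttled_map(
    func: Callable,
    items: Iterable,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Call func on each item, pausing delay_seconds between consecutive calls."""
    results = []
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)  # no pause before the first call
        results.append(func(item))
    return results


leads = [
    {"email": "Ada@Example.com"},
    {"email": "ada@example.com "},
    {"email": None},
]
unique_leads = dedupe_by_email(leads)
```

In a scraping loop, `throttled_map` could wrap the per-site extraction call so that consecutive requests are spaced out; the `sleep` parameter exists so the delay can be replaced in tests.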