Overview

The LangChain integration lets your agents extract structured data from websites using natural language prompts. It exposes the scraping services as LangChain tools, so an agent can fetch, convert, and query web content as part of a larger workflow.

Official LangChain Documentation

View the integration in LangChain’s official documentation

Installation

Install the package using pip:
pip install langchain-scrapegraph

Available Tools

SmartScraperTool

Extract structured data from any webpage using natural language prompts:
from langchain_scrapegraph.tools import SmartScraperTool

# Initialize the tool (uses SGAI_API_KEY from environment)
tool = SmartScraperTool()

# Extract information using natural language
result = tool.invoke({
    "website_url": "https://www.example.com",
    "user_prompt": "Extract the main heading and first paragraph"
})
Define the structure of the output using Pydantic models:
from typing import List
from pydantic import BaseModel, Field
from langchain_scrapegraph.tools import SmartScraperTool

class WebsiteInfo(BaseModel):
    title: str = Field(description="The main title of the webpage")
    description: str = Field(description="The main description or first paragraph")
    urls: List[str] = Field(description="The URLs inside the webpage")

# Initialize with schema
tool = SmartScraperTool(llm_output_schema=WebsiteInfo)

result = tool.invoke({
    "website_url": "https://www.example.com",
    "user_prompt": "Extract the website information"
})
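
Once a schema is in place, it can be useful to check that a result actually carries the fields you asked for before passing it on. The sketch below uses only the standard library and assumes, without confirmation from this page, that the tool returns a plain dict keyed by the schema's field names. The `missing_fields` helper and the sample data are hypothetical and exist only for illustration.

```python
from typing import Any, Iterable, List

# Field names copied from the WebsiteInfo model above.
EXPECTED_FIELDS = ("title", "description", "urls")


def missing_fields(result: Any, expected: Iterable[str] = EXPECTED_FIELDS) -> List[str]:
    """Return the expected field names that are absent from `result`.

    Assumption: the tool returns a dict. Anything else is treated as
    missing every field, so the caller can decide how to react.
    """
    expected = list(expected)
    if not isinstance(result, dict):
        return expected
    return [name for name in expected if name not in result]


# Made-up sample standing in for a real tool result.
sample = {
    "title": "Example Domain",
    "description": "This domain is for use in examples.",
}
print(missing_fields(sample))  # prints ['urls']
```

If the tool returns a different shape in your version of the package, adapt the check accordingly, or validate with the Pydantic model directly.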

SearchScraperTool

Search the web and extract information with a natural language prompt, without supplying a URL:
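from langchain_scrapegraph.tools import SearchScraperTool

# Initialize the tool (uses SGAI_API_KEY from environment)
tool = SearchScraperTool()

# Only a prompt is needed; no website_url is passed
result = tool.invoke({
    "user_prompt": "Find the best restaurants in San Francisco"
})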

Define the structure of the output using Pydantic models:
from pydantic import BaseModel, Field
from langchain_scrapegraph.tools import SearchScraperTool

class RestaurantInfo(BaseModel):
    name: str = Field(description="The restaurant name")
    address: str = Field(description="The restaurant address")
    rating: float = Field(description="The restaurant rating")

# Initialize with schema
tool = SearchScraperTool(llm_output_schema=RestaurantInfo)

result = tool.invoke({
    "user_prompt": "Find the best restaurants in San Francisco"
})

MarkdownifyTool

Convert any webpage into clean, formatted markdown:
from langchain_scrapegraph.tools import MarkdownifyTool

tool = MarkdownifyTool()
markdown = tool.invoke({"website_url": "https://example.com"})
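
A common next step is to write the converted page to disk for later processing. The following is a minimal sketch, assuming the tool returns the markdown as a plain string (not confirmed on this page). The `save_markdown` helper and the output path are hypothetical and not part of the package.

```python
from pathlib import Path


def save_markdown(markdown: str, path: str) -> Path:
    """Write markdown text to `path` as UTF-8, creating parent folders.

    Assumption: `markdown` is the string returned by MarkdownifyTool.
    """
    if not isinstance(markdown, str):
        raise TypeError("expected the markdown result to be a string")
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target


# Usage (hypothetical path), after calling the tool as shown above:
# save_markdown(markdown, "output/example.md")
```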

Example Agent

Create a research agent that can gather and analyze web data:
from langchain.agents import initialize_agent, AgentType
from langchain_scrapegraph.tools import SmartScraperTool
from langchain_openai import ChatOpenAI

# Initialize tools
tools = [
    SmartScraperTool(),
]

# Create an agent
agent = initialize_agent(
    tools=tools,
    llm=ChatOpenAI(temperature=0),
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

# Use the agent
response = agent.run(
    "Visit example.com, summarize the content, and extract the main heading and first paragraph"
)

Configuration

Set your ScrapeGraph API key in your environment:
export SGAI_API_KEY="your-api-key-here"
Or set it programmatically:
import os
os.environ["SGAI_API_KEY"] = "your-api-key-here"
Get your API key from the dashboard
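
Because the tools read the key from the environment, a missing variable may only surface once a tool is created or called. The sketch below fails early with a clear message instead. It uses only the standard library; `require_api_key` is a hypothetical helper, not part of the package.

```python
import os


def require_api_key(var_name: str = "SGAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset or blank."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell or assign "
            f"os.environ['{var_name}'] before creating a tool."
        )
    return key


# Usage: call once at startup, before constructing any tools.
# require_api_key()
```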

Use Cases

Research Agents

Create agents that gather and analyze web data

Data Collection

Automate structured data extraction from websites

Content Processing

Convert web content into markdown for further processing

Information Extraction

Extract specific data points using natural language

Support

Need help with the integration?