Overview

This tool integrates ScrapeGraph with LlamaIndex so that LlamaIndex applications can scrape web pages and extract structured data from them.

The integration is also listed on LlamaHub, which hosts its official documentation.

Installation

Install the package using pip:

pip install llama-index-tools-scrapegraphai

Usage

First, import and initialize the ScrapegraphToolSpec:

from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec

scrapegraph_tool = ScrapegraphToolSpec()
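The examples on this page pass `api_key` as a literal string. One common alternative is to read the key from the environment so it stays out of source control. The sketch below is a minimal illustration: the variable name `SGAI_API_KEY` and the helper `load_api_key` are assumptions made for this example, not part of the integration.

```python
import os

# Hypothetical environment variable name, chosen for this sketch.
API_KEY_ENV_VAR = "SGAI_API_KEY"


def load_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Return the API key stored in the environment, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

The returned value can then be passed wherever the examples use `api_key="..."`.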

Available Functions

Smart Scraping (Sync)

Extract structured data using a schema:

from pydantic import BaseModel, Field

# Schema for a single founder entry
class FounderSchema(BaseModel):
    name: str = Field(description="Name of the founder")
    role: str = Field(description="Role of the founder")
    social_media: str = Field(description="Social media URL of the founder")

# Top-level schema: the scraper returns a list of founders
class ListFoundersSchema(BaseModel):
    founders: list[FounderSchema] = Field(description="List of founders")

response = scrapegraph_tool.scrapegraph_smartscraper(
    prompt="Extract information about the founders",
    url="https://scrapegraphai.com/",
    api_key="sgai-***",
    schema=ListFoundersSchema,
)

# The extracted data is stored under the "result" key
result = response["result"]

for founder in result["founders"]:
    print(founder)
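The example above indexes `response["result"]["founders"]` directly, which raises an error if any level is missing. A more defensive way to read the same structure might look like the sketch below. The helper `extract_founders` and the sample data are made up for illustration, and the response shape is assumed to match the one used in the example above.

```python
from typing import Any


def extract_founders(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the list of founder records, or an empty list if any level is missing."""
    result = response.get("result")
    if not isinstance(result, dict):
        return []
    founders = result.get("founders")
    if not isinstance(founders, list):
        return []
    # Keep only entries that are dictionaries, skipping malformed items.
    return [item for item in founders if isinstance(item, dict)]


# Made-up sample shaped like the response in the example above.
sample_response = {
    "result": {
        "founders": [
            {"name": "Jane Doe", "role": "CEO", "social_media": "https://example.com/jane"},
        ]
    }
}

for founder in extract_founders(sample_response):
    print(founder["name"], "-", founder["role"])
```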

Smart Scraping (Async)

Asynchronous version of the smart scraper:

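import asyncio

async def main():
    # `schema` is a Pydantic model class describing the fields to extract
    return await scrapegraph_tool.scrapegraph_smartscraper_async(
        prompt="Extract product information",
        url="https://example.com/product",
        api_key="your-api-key",
        schema=schema,
    )

result = asyncio.run(main())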
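The main benefit of the async variant is that several pages can be scraped concurrently. The sketch below shows one way to do that with `asyncio.gather`. To stay runnable offline it uses a stand-in coroutine, `fake_scrape`; both that stub and the helper `scrape_many` are hypothetical names introduced for this example.

```python
import asyncio
from typing import Any, Awaitable, Callable

ScrapeFn = Callable[..., Awaitable[Any]]


async def scrape_many(scrape: ScrapeFn, urls: list[str], **kwargs: Any) -> list[Any]:
    """Run one scrape call per URL concurrently and return results in input order."""
    tasks = [scrape(url=url, **kwargs) for url in urls]
    # return_exceptions=True returns failures as exception objects in the
    # result list instead of raising the first one.
    return await asyncio.gather(*tasks, return_exceptions=True)


async def fake_scrape(url: str, prompt: str) -> dict[str, str]:
    """Stand-in for the real async scraper so the sketch runs offline."""
    await asyncio.sleep(0)
    return {"url": url, "prompt": prompt}


results = asyncio.run(
    scrape_many(
        fake_scrape,
        ["https://example.com/a", "https://example.com/b"],
        prompt="Extract product information",
    )
)
print(results)
```

In real use, `scrapegraph_tool.scrapegraph_smartscraper_async` would presumably be passed in place of `fake_scrape`, with `api_key` and `schema` supplied as additional keyword arguments.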

Submit Feedback

Provide feedback on extraction results:

response = scrapegraph_tool.scrapegraph_feedback(
    request_id="request-id",
    api_key="your-api-key",
    rating=5,
    feedback_text="Great results!",
)

Check Credits

Monitor your API credit usage:

credits = scrapegraph_tool.scrapegraph_get_credits(api_key="your-api-key")
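A credit check is most useful before starting a batch of requests. The sketch below assumes the credits response is a dictionary with a numeric `remaining_credits` field. That key name is an assumption, so inspect the real response before relying on it. The helper `has_enough_credits` and the sample data are hypothetical.

```python
from typing import Any

# Assumed key name; inspect the real response before relying on it.
REMAINING_KEY = "remaining_credits"


def has_enough_credits(credits: dict[str, Any], needed: int, key: str = REMAINING_KEY) -> bool:
    """Return True if the credits response reports at least `needed` credits left."""
    remaining = credits.get(key)
    if not isinstance(remaining, (int, float)):
        raise ValueError(f"Credits response has no numeric {key!r} field")
    return remaining >= needed


# Made-up sample response for illustration.
sample_credits = {"remaining_credits": 42}
print(has_enough_credits(sample_credits, needed=10))
```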

Use Cases

RAG Applications

Feed scraped, structured web data into retrieval-augmented generation systems

Knowledge Bases

Create and maintain up-to-date knowledge bases

Web Research

Automate web research and data collection

Content Indexing

Index and structure web content for search
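For the indexing and RAG use cases, scraped records usually need to be flattened into text with some source metadata before they are indexed. The sketch below shows one possible shape for that step using plain dictionaries. The helper `record_to_chunk`, the output layout, and the sample record are all assumptions made for illustration.

```python
from typing import Any


def record_to_chunk(record: dict[str, Any], source_url: str) -> dict[str, Any]:
    """Flatten one scraped record into a text chunk plus metadata for indexing."""
    # Sort keys so the same record always produces the same text.
    text = "\n".join(f"{key}: {record[key]}" for key in sorted(record))
    return {"text": text, "metadata": {"source": source_url}}


# Made-up record for illustration.
chunk = record_to_chunk({"name": "Jane Doe", "role": "CEO"}, "https://example.com/about")
print(chunk["text"])
```

The resulting text and metadata could then be wrapped in whatever document type the chosen index expects.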
