Overview

This tool integrates ScrapeGraph with LlamaIndex, providing intelligent web scraping capabilities with structured data extraction.

Official LlamaHub Documentation

View the integration on LlamaHub

Installation

Install the package using pip:

pip install llama-index-tools-scrapegraphai

Usage

First, import and initialize the ScrapegraphToolSpec:

from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec

scrapegraph_tool = ScrapegraphToolSpec()
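
Like other LlamaIndex tool specs, the spec can be converted into standard FunctionTool objects and handed to a LlamaIndex agent. A minimal sketch that just lists the generated tools, assuming the spec follows the usual BaseToolSpec interface with to_tool_list():

# Convert the tool spec into individual LlamaIndex FunctionTool objects
tools = scrapegraph_tool.to_tool_list()

# Each tool exposes a name and description that an agent can use
for tool in tools:
    print(tool.metadata.name)
    print(tool.metadata.description)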

Available Functions

Smart Scraping (Sync)

Extract structured data using a schema:

from pydantic import BaseModel, Field

class FounderSchema(BaseModel):
    name: str = Field(description="Name of the founder")
    role: str = Field(description="Role of the founder")
    social_media: str = Field(description="Social media URL of the founder")

class ListFoundersSchema(BaseModel):
    founders: list[FounderSchema] = Field(description="List of founders")

response = scrapegraph_tool.scrapegraph_smartscraper(
    prompt="Extract information about the founders",
    url="https://scrapegraphai.com/",
    api_key="sgai-***",
    schema=ListFoundersSchema,
)

# The structured output is returned under the "result" key
result = response["result"]

for founder in result["founders"]:
    print(founder)

Smart Scraping (Async)

Asynchronous version of the smart scraper:

result = await scrapegraph_tool.scrapegraph_smartscraper_async(
    prompt="Extract information about the founders",
    url="https://scrapegraphai.com/",
    api_key="sgai-***",
    schema=ListFoundersSchema,
)
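
Because this call is a coroutine, it has to be awaited inside a running event loop (for example in a notebook). Outside one, it can be driven with asyncio; a minimal sketch, reusing the tool spec and ListFoundersSchema defined above:

import asyncio

async def main():
    result = await scrapegraph_tool.scrapegraph_smartscraper_async(
        prompt="Extract information about the founders",
        url="https://scrapegraphai.com/",
        api_key="sgai-***",
        schema=ListFoundersSchema,  # Pydantic schema from the sync example
    )
    print(result)

asyncio.run(main())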

Submit Feedback

Provide feedback on extraction results:

response = scrapegraph_tool.scrapegraph_feedback(
    request_id="request-id",
    api_key="your-api-key",
    rating=5,
    feedback_text="Great results!",
)

Check Credits

Monitor your API credit usage:

credits = scrapegraph_tool.scrapegraph_get_credits(api_key="your-api-key")

Use Cases

RAG Applications

Build powerful retrieval-augmented generation systems
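
For example, scraped results can be wrapped as LlamaIndex Documents and indexed for retrieval-augmented querying. A minimal sketch, assuming the default LlamaIndex settings (an OpenAI API key configured for the embedding model and LLM) and reusing the tool spec and ListFoundersSchema from the sections above:

from llama_index.core import Document, VectorStoreIndex

# Scrape structured founder data (see "Smart Scraping (Sync)" above)
response = scrapegraph_tool.scrapegraph_smartscraper(
    prompt="Extract information about the founders",
    url="https://scrapegraphai.com/",
    api_key="sgai-***",
    schema=ListFoundersSchema,
)

# Wrap each extracted record as a LlamaIndex Document
documents = [
    Document(text=str(founder), metadata={"source": "https://scrapegraphai.com/"})
    for founder in response["result"]["founders"]
]

# Build an in-memory vector index and query it
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("Who founded ScrapeGraphAI?"))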

Knowledge Bases

Create and maintain up-to-date knowledge bases

Web Research

Automate web research and data collection

Content Indexing

Index and structure web content for search

Support

Need help with the integration?
