LlamaIndex is a data framework for building LLM-powered agents and RAG applications. This page shows how to wire scrapegraph-py ≥ 2.0.1 into LlamaIndex as a set of FunctionTools so your agents can scrape pages, extract structured data, search the web, run asynchronous crawls, and manage scheduled monitors.
Learn more about building agents and RAG pipelines in the official LlamaIndex documentation: https://docs.llamaindex.ai/
Which package? LlamaIndex also ships a pre-built tool spec at llama-index-tools-scrapegraphai, but it currently depends on scrapegraph-py<2 and targets the legacy v1 backend. New v2 API keys are rejected by that path. The recipes below use the v2 SDK directly — they work with the current dashboard and every v2 endpoint (scrape, extract, search, crawl, monitor).
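If you are starting fresh, install the v2 SDK alongside LlamaIndex and point it at your dashboard key. A minimal setup sketch, assuming the v2 client reads the key from the same `SGAI_API_KEY` environment variable the v1 SDK used:

```python
# pip install "scrapegraph-py>=2.0.1" llama-index llama-index-llms-openai
import os

# Assumption: the v2 client picks up the key from SGAI_API_KEY, as the v1 SDK did.
os.environ["SGAI_API_KEY"] = "sgai-..."  # your v2 dashboard key

from scrapegraph_py import ScrapeGraphAI

sgai = ScrapeGraphAI()  # every recipe below relies on this implicit setup
```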
The following recipes are ported from the official scrapegraph-py cookbook notebooks, swapped to call the v2 extract endpoint so they run against the current dashboard API key.
Pull founders, pricing plans, and social links off a company homepage. Based on cookbook/company-info/.
```python
from pydantic import BaseModel, Field
from typing import List

from scrapegraph_py import ScrapeGraphAI


class FounderSchema(BaseModel):
    name: str = Field(description="Name of the founder")
    role: str = Field(description="Role of the founder in the company")
    linkedin: str = Field(description="LinkedIn profile of the founder")


class PricingPlanSchema(BaseModel):
    tier: str = Field(description="Name of the pricing tier")
    price: str = Field(description="Price of the plan")
    credits: int = Field(description="Number of credits included in the plan")


class SocialLinksSchema(BaseModel):
    linkedin: str
    twitter: str
    github: str


class CompanyInfoSchema(BaseModel):
    company_name: str
    description: str
    founders: List[FounderSchema] = Field(default_factory=list)
    pricing_plans: List[PricingPlanSchema] = Field(default_factory=list)
    social_links: SocialLinksSchema


sgai = ScrapeGraphAI()

res = sgai.extract(
    "Extract info about the company",
    url="https://scrapegraphai.com/",
    schema=CompanyInfoSchema.model_json_schema(),
)

if res.status == "success":
    print(res.data.json_data)
```
Pull headlines from a news section. Based on cookbook/wired-news/.
```python
from pydantic import BaseModel, Field
from typing import List

from scrapegraph_py import ScrapeGraphAI


class NewsItemSchema(BaseModel):
    category: str = Field(description="Category of the news (e.g. 'Health', 'Environment')")
    title: str = Field(description="Title of the news article")
    link: str = Field(description="URL to the news article")
    author: str = Field(description="Author of the news article")


class ListNewsSchema(BaseModel):
    news: List[NewsItemSchema]


sgai = ScrapeGraphAI()

res = sgai.extract(
    "Extract the first 10 news articles on the page",
    url="https://www.wired.com/category/science/",
    schema=ListNewsSchema.model_json_schema(),
)

if res.status == "success":
    for item in res.data.json_data["news"]:
        print(f"[{item['category']}] {item['title']}")
```
Pull house listings with price, address, and tags. Based on cookbook/homes-forsale/.
```python
from pydantic import BaseModel, Field
from typing import List

from scrapegraph_py import ScrapeGraphAI, FetchConfig


class HouseListingSchema(BaseModel):
    price: int = Field(description="Price of the house in USD")
    bedrooms: int
    bathrooms: int
    square_feet: int = Field(description="Total square footage of the house")
    address: str
    city: str
    state: str
    zip_code: str
    tags: List[str] = Field(description="Tags like 'New construction' or 'Large garage'")
    agent_name: str
    agency: str


class HousesListingsSchema(BaseModel):
    houses: List[HouseListingSchema]


sgai = ScrapeGraphAI()

# Anti-bot heavy sites need stealth + JS rendering
res = sgai.extract(
    "Extract information about houses for sale",
    url="https://www.zillow.com/san-francisco-ca/",
    schema=HousesListingsSchema.model_json_schema(),
    fetch_config=FetchConfig(mode="js", stealth=True, wait=2000),
)
```
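The call above returns normally even when the fetch fails, so mirror the success check from the other recipes before reading the payload. A short follow-up, assuming `json_data` carries a `houses` key shaped like the schema:

```python
if res.status == "success":
    for house in res.data.json_data["houses"]:
        print(f"{house['address']}, {house['city']}: ${house['price']:,}")
else:
    print(f"Extraction failed: {res.error}")
```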
Combine scrape + extract into a LlamaIndex ReActAgent so the LLM decides which tool to call per step. Based on cookbook/research-agent/.
```python
from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

sgai = ScrapeGraphAI()


def scrape(url: str) -> str:
    """Fetch a page and return its markdown content."""
    res = sgai.scrape(url, formats=[MarkdownFormatConfig()])
    if res.status != "success":
        return res.error or ""
    return res.data.results.get("markdown", {}).get("data", [""])[0]


def extract(url: str, prompt: str) -> dict:
    """Extract structured data from a URL using the given prompt."""
    res = sgai.extract(prompt, url=url)
    return res.data.json_data if res.status == "success" else {"error": res.error}


tools = [FunctionTool.from_defaults(fn=f) for f in (scrape, extract)]

agent = ReActAgent.from_tools(
    tools,
    llm=OpenAI(model="gpt-4o"),
    verbose=True,
)

response = agent.chat(
    "Extract all the keyboard names and prices from "
    "https://www.ebay.com/sch/i.html?_nkw=keyboards"
)
print(response)
```
Hand the full tool list to an agent and let it pick the right tool per step:
```python
from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig
from llama_index.core.tools import FunctionTool
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

sgai = ScrapeGraphAI()


def scrape(url: str) -> str:
    """Fetch a page and return its markdown content."""
    res = sgai.scrape(url, formats=[MarkdownFormatConfig()])
    if res.status != "success":
        return res.error or ""
    return res.data.results.get("markdown", {}).get("data", [""])[0]


def extract(url: str, prompt: str) -> dict:
    """Extract structured data from a URL using the given prompt."""
    res = sgai.extract(prompt, url=url)
    return res.data.json_data if res.status == "success" else {"error": res.error}


def search(query: str, num_results: int = 5) -> list[dict]:
    """Search the web and return result titles and URLs."""
    res = sgai.search(query, num_results=num_results)
    if res.status == "error":
        return [{"error": res.error}]
    return [{"title": r.title, "url": r.url} for r in res.data.results]


def crawl(url: str, max_pages: int = 20) -> dict:
    """Start an asynchronous multi-page crawl and return its id."""
    res = sgai.crawl.start(url, formats=[MarkdownFormatConfig()], max_pages=max_pages)
    return {"crawl_id": res.data.id} if res.status == "success" else {"error": res.error}


def create_monitor(url: str, name: str, interval: str) -> dict:
    """Create a scheduled monitor that re-fetches the URL on the given interval."""
    res = sgai.monitor.create(
        url,
        interval,
        name=name,
        formats=[MarkdownFormatConfig()],
    )
    return {"cron_id": res.data.cron_id} if res.status == "success" else {"error": res.error}


tools = [FunctionTool.from_defaults(fn=f) for f in (
    scrape,
    extract,
    search,
    crawl,
    create_monitor,
)]

agent = FunctionAgent(
    tools=tools,
    llm=OpenAI(model="gpt-4o"),
    system_prompt=(
        "You are a web research assistant powered by ScrapeGraphAI v2. "
        "Pick the most specific tool for the job: scrape for a single page, "
        "extract for structured data, search for open-web questions, "
        "crawl for multi-page jobs, and create_monitor for recurring jobs."
    ),
)

response = await agent.run(
    "Research the latest blog posts on scrapegraphai.com and summarize them."
)
print(response)
```
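`FunctionAgent.run` is a coroutine, so the top-level `await` above only works in a notebook or another async REPL. In a plain script, wrap it in `asyncio.run`:

```python
import asyncio


async def main() -> None:
    response = await agent.run(
        "Research the latest blog posts on scrapegraphai.com and summarize them."
    )
    print(response)


asyncio.run(main())
```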
- **Tool selection** — pass only the tools the agent actually needs; a shorter tool list keeps prompts tighter and routing more accurate.
- **Schema design** — when calling `extract` or `search`, pass a concrete JSON schema (`YourSchema.model_json_schema()`) so the extractor has a clear target.
- **Format entries** — `scrape` accepts a list of format entries; combine `MarkdownFormatConfig`, `ScreenshotFormatConfig`, and `JsonFormatConfig` in one call to avoid multiple round-trips (see the multi-format sketch after this list).
- **Async crawls** — `sgai.crawl.start` returns immediately; always poll `sgai.crawl.get(id)` until `status in ("completed", "failed", "stopped")` (see the polling sketch after this list).
- **ApiResult** — branch on `result.status` instead of wrapping calls in try/except; the SDK never raises on API-level errors.
- **Hard pages** — stealth mode plus a `mode="js"` fetch config handles most anti-bot sites (see the Zillow recipe above).
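For the multi-format point above, a sketch of one `scrape` call that requests markdown, a screenshot, and structured JSON together; it assumes `ScreenshotFormatConfig` and `JsonFormatConfig` take no required arguments, matching how `MarkdownFormatConfig` is used in the recipes:

```python
from scrapegraph_py import (
    ScrapeGraphAI,
    JsonFormatConfig,
    MarkdownFormatConfig,
    ScreenshotFormatConfig,
)

sgai = ScrapeGraphAI()

# One round-trip returns all three representations of the page.
res = sgai.scrape(
    "https://scrapegraphai.com/",
    formats=[MarkdownFormatConfig(), ScreenshotFormatConfig(), JsonFormatConfig()],
)

if res.status == "success":
    print(res.data.results.keys())
```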
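And for the crawl-polling point, a minimal loop that starts a crawl and branches on `result.status` instead of catching exceptions. Where the crawl record exposes its status (`job.data.status` here) and the one-second sleep are illustrative assumptions:

```python
import time

from scrapegraph_py import ScrapeGraphAI, MarkdownFormatConfig

sgai = ScrapeGraphAI()

res = sgai.crawl.start(
    "https://scrapegraphai.com/",
    formats=[MarkdownFormatConfig()],
    max_pages=20,
)

if res.status != "success":  # API-level errors never raise; branch on status
    raise SystemExit(res.error)

crawl_id = res.data.id

# Poll until the crawl reaches a terminal state.
while True:
    job = sgai.crawl.get(crawl_id)
    if job.data.status in ("completed", "failed", "stopped"):
        break
    time.sleep(1)  # assumed backoff; tune for your workload

print(job.data.status)
```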