CrewAI orchestrates role-playing agents around tasks. Every ScrapeGraph v2 endpoint is one method on the official scrapegraph-py SDK: wrap each one with CrewAI's @tool decorator and you get a full ScrapeGraph toolkit for your crew, no extra dependency required.
The legacy ScrapegraphScrapeTool in crewai-tools still targets ScrapeGraph v1 (smartscraper / website_url / user_prompt), and its repository was archived on 2025-11-10. The wrappers below hit v2 directly through scrapegraph-py and cover every endpoint: scrape, extract, search, crawl, monitor, history, and credits.
CrewAI tools are callable outside an agent via .run(**kwargs), which is useful for scripts, tests, or as a building block inside a custom task.
```python
from sgai_tools import scrape, extract, search, credits, crawl_start, crawl_get

print(credits.run())
print(scrape.run(url="https://example.com"))
print(extract.run(
    url="https://scrapegraphai.com",
    prompt="Extract the company name and a short description",
))
print(search.run(query="best AI scraping tools 2026", num_results=3))

job = crawl_start.run(url="https://scrapegraphai.com", max_depth=1, max_pages=5)
print(crawl_get.run(crawl_id=job["id"]))
```
Give an agent the whole toolkit and let it pick the right tool per task. CrewAI drives execution through Crew.kickoff().
```python
from crewai import Agent, Crew, Task
from sgai_tools import ALL_TOOLS

researcher = Agent(
    role="Web Researcher",
    goal="Gather and extract accurate information from websites",
    backstory="You are an expert web researcher with deep experience in "
              "extracting structured data from the open web.",
    tools=ALL_TOOLS,
    verbose=True,
)

task = Task(
    description=(
        "Visit https://scrapegraphai.com and extract the company name, "
        "tagline, and the top three product features. Return the result as JSON."
    ),
    expected_output="A JSON object with keys: name, tagline, features (list of 3 strings).",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task], verbose=True)
result = crew.kickoff()
print(result)
```
Prefer a focused toolset: if the agent only needs extract and search, pass tools=[extract, search] instead of ALL_TOOLS. A tighter surface gives the model a smaller decision space and therefore better tool routing.
extract already returns structured JSON under the json_data key. Ask CrewAI to validate the task output against a Pydantic model with output_pydantic.
```python
from pydantic import BaseModel, Field
from crewai import Agent, Crew, Task
from sgai_tools import extract


class Company(BaseModel):
    name: str = Field(description="Company name")
    tagline: str = Field(description="One-line description of what they do")


agent = Agent(
    role="Web Researcher",
    goal="Extract company facts from homepages",
    backstory="You extract clean, structured company info.",
    tools=[extract],
)

task = Task(
    description=(
        "Call the extract tool on https://scrapegraphai.com with prompt "
        "'Return an object with name and tagline describing the company'. "
        "Return the final answer as a JSON object with `name` and `tagline`."
    ),
    expected_output="JSON object matching the Company schema.",
    agent=agent,
    output_pydantic=Company,
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
company: Company = result.pydantic
print(company)
```
A classic CrewAI pattern: one agent searches, a second extracts structured data from the top hit. Tasks run sequentially, and the second task receives the first's output as context.
```python
from crewai import Agent, Crew, Task, Process
from sgai_tools import search, extract

finder = Agent(
    role="Search Specialist",
    goal="Find the single most relevant URL for a query",
    backstory="You triage search results and return only the best URL.",
    tools=[search],
)

analyst = Agent(
    role="Data Analyst",
    goal="Extract concise summaries from a given URL",
    backstory="You turn raw pages into 3-bullet summaries.",
    tools=[extract],
)

find_task = Task(
    description="Search for 'scrapegraphai documentation' and return only the top URL.",
    expected_output="A single URL string.",
    agent=finder,
)

summarise_task = Task(
    description="Extract a 3-bullet summary of the page at the URL from the previous task.",
    expected_output="Three bullet points summarising the page.",
    agent=analyst,
    context=[find_task],
)

crew = Crew(
    agents=[finder, analyst],
    tasks=[find_task, summarise_task],
    process=Process.sequential,
)
print(crew.kickoff())
```