Learn how to extract article information from Wired.com using ScrapeGraphAI’s SmartScraper. This example demonstrates how to gather article details, categories, and author information.
The Goal
We’ll extract the following article information:
| Field | Description |
|---|---|
| Category | Article category (e.g., ‘Health’, ‘Environment’) |
| Title | Article headline |
| Link | URL to the full article |
| Author | Writer’s name |
Code Example
from typing import List

from pydantic import BaseModel, Field
from scrapegraph_py import Client


# Schema for a single news item
class NewsItemSchema(BaseModel):
    category: str = Field(description="Category of the news (e.g., 'Health', 'Environment')")
    title: str = Field(description="Title of the news article")
    link: str = Field(description="URL to the news article")
    author: str = Field(description="Author of the news article")


# Schema that contains a list of news items
class ListNewsSchema(BaseModel):
    news: List[NewsItemSchema] = Field(description="List of news articles with their details")


# Replace the placeholder with your own API key
client = Client(api_key="your-api-key")

# Ask SmartScraper to return data that matches ListNewsSchema
response = client.smartscraper(
    website_url="https://www.wired.com/",
    user_prompt="Extract latest news articles",
    output_schema=ListNewsSchema,
)
Example Output
{
  "news": [
    {
      "category": "Artificial Intelligence",
      "title": "The Race to Build Better Large Language Models",
      "link": "https://www.wired.com/story/the-race-to-build-better-llms",
      "author": "Will Knight"
    },
    {
      "category": "Security",
      "title": "The Latest Cybersecurity Threats You Need to Know About",
      "link": "https://www.wired.com/story/latest-cybersecurity-threats",
      "author": "Lily Hay Newman"
    },
    {
      "category": "Science",
      "title": "New Discoveries in Quantum Computing",
      "link": "https://www.wired.com/story/quantum-computing-discoveries",
      "author": "Steven Levy"
    }
  ]
}
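Once you have the extracted data, you can treat it as ordinary Python data. The following is a minimal sketch using only the standard library, and it assumes the payload is already available as a dict shaped like the example output. How you obtain that dict from `response` is not shown here. The helper names `group_by_category` and `to_csv` are hypothetical and not part of the ScrapeGraphAI SDK.

```python
import csv
import io
from collections import defaultdict

# Sample payload copied from the example output (first two items only).
payload = {
    "news": [
        {
            "category": "Artificial Intelligence",
            "title": "The Race to Build Better Large Language Models",
            "link": "https://www.wired.com/story/the-race-to-build-better-llms",
            "author": "Will Knight",
        },
        {
            "category": "Security",
            "title": "The Latest Cybersecurity Threats You Need to Know About",
            "link": "https://www.wired.com/story/latest-cybersecurity-threats",
            "author": "Lily Hay Newman",
        },
    ]
}

# Column order matches the fields defined in NewsItemSchema.
FIELDS = ["category", "title", "link", "author"]


def group_by_category(items):
    """Map each category to the list of article titles filed under it."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["category"]].append(item["title"])
    return dict(grouped)


def to_csv(items):
    """Serialize the articles to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


by_category = group_by_category(payload["news"])
csv_text = to_csv(payload["news"])
print(by_category)
print(csv_text)
```

Grouping by category gives a quick overview of what the scrape returned, and the CSV text can be written to a file or loaded into a spreadsheet.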
Have a suggestion for a new example? Contact us with your use case or contribute directly on GitHub.