Markdownify

Overview

Markdownify is our specialized service that transforms web content into clean, well-formatted markdown. It preserves the page's structure (headings, lists, links, and images) while removing ads, navigation, and other irrelevant elements, which makes it well suited to content migration, documentation creation, and knowledge base building.

Try Markdownify instantly in our interactive playground, no coding required.

Getting Started

Quick Start

from scrapegraph_py import Client

client = Client(api_key="your-api-key")

response = client.markdownify(
    website_url="https://example.com/article"
)

# The converted markdown is returned in the "result" field
print(response["result"])

Parameters

Parameter	Type	Required	Description
apiKey	string	Yes	The ScrapeGraph API key (api_key in the Python SDK).
websiteUrl	string	Yes	The URL of the webpage to convert to markdown (website_url in the Python SDK).

Get your API key from the dashboard

Example Response

{
  "request_id": "sg-req-md456",
  "status": "completed",
  "website_url": "https://example.com/article",
  "result": "# Understanding AI-Powered Web Scraping\n\nWeb scraping has evolved significantly with the advent of AI technologies...\n\n## Key Benefits\n\n- Improved accuracy\n- Intelligent extraction\n- Structured output\n\n![AI Scraping Process](https://example.com/images/ai-scraping.png)\n\n> AI-powered scraping represents the future of web data extraction.\n\n### Getting Started\n\n1. Choose your target website\n2. Define extraction goals\n3. Select appropriate tools\n",
  "error": ""
}

The response includes:

request_id: Unique identifier for tracking your request
status: Current status of the conversion (for example, completed)
website_url: The URL that was converted
result: The converted markdown content, returned as a single string
error: Error message if the conversion failed; empty in a successful response
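Because result is a plain string and failures are reported through status and error, it can help to check those fields before using the markdown. The following is a minimal sketch, assuming the response is a dict with the field names shown in the example response; extract_markdown and MarkdownifyError are hypothetical helpers, not part of the SDK, and treating every status other than "completed" as unusable is an assumption.

```python
from typing import Any, Mapping


class MarkdownifyError(RuntimeError):
    """Raised when a Markdownify response did not yield usable markdown (hypothetical helper)."""


def extract_markdown(response: Mapping[str, Any]) -> str:
    """Return the markdown string from a Markdownify response dict.

    Assumes the field names from the example response (request_id, status,
    result, error). Rejecting any status other than "completed" is an assumption.
    """
    request_id = response.get("request_id", "<unknown>")

    # A non-empty error field means the conversion failed.
    error = response.get("error") or ""
    if error:
        raise MarkdownifyError(f"{request_id}: {error}")

    # Only "completed" appears in the documented example.
    status = response.get("status")
    if status != "completed":
        raise MarkdownifyError(f"{request_id}: status is {status!r}, not 'completed'")

    # In the example response, result is the markdown itself, as a string.
    result = response.get("result")
    if not isinstance(result, str):
        raise MarkdownifyError(f"{request_id}: expected 'result' to be a string")
    return result
```

For example, markdown = extract_markdown(response) right after the Quick Start call gives you the markdown, or a single exception that carries the request_id for support.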

Key Features

Smart Conversion

Intelligent content structure preservation

Clean Output

Removes ads, navigation, and irrelevant content

Format Retention

Maintains headings, lists, and text formatting

Asset Handling

Preserves images and handles external links

Use Cases

Content Migration

Convert blog posts to markdown
Transform documentation
Migrate knowledge bases
Archive web content

Documentation

Create technical documentation
Build wikis and guides
Generate README files
Maintain developer docs

Content Management

Prepare content for CMS import
Create portable content
Build learning resources
Format articles
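For archiving or CMS import, the markdown in the result field usually needs to end up in files. A minimal sketch, assuming one .md file per source URL; slug_from_url and save_markdown are hypothetical helpers, and the naming scheme is an illustration rather than anything Markdownify does itself.

```python
from __future__ import annotations

import re
from pathlib import Path
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Turn a URL into a filesystem-safe file stem, e.g. 'example-com-article'."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return slug or "index"


def save_markdown(url: str, markdown: str, out_dir: str | Path) -> Path:
    """Write the markdown to <out_dir>/<slug>.md and return the file path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{slug_from_url(url)}.md"
    path.write_text(markdown, encoding="utf-8")
    return path
```

Query strings and fragments are ignored when naming files, so two URLs that differ only there would overwrite each other; append a hash of the full URL to the slug if that matters for your archive.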

Want to learn more about our AI-powered scraping technology? Visit our main website to discover how we’re revolutionizing web data extraction.

Other Functionality

Retrieve a previous request

If you know the request_id of a previous request, you can retrieve its full details:

import { getMarkdownifyRequest } from 'scrapegraph-js';

const apiKey = 'your_api_key';
const requestId = 'ID_of_previous_request';

try {
  const requestInfo = await getMarkdownifyRequest(apiKey, requestId);
  console.log(requestInfo);
} catch (error) {
  console.error(error);
}

Parameters

Parameter	Type	Required	Description
apiKey	string	Yes	The ScrapeGraph API key.
requestId	string	Yes	The request ID returned by a previous Markdownify request.
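A request looked up by ID may not have finished yet. The sketch below polls until the status reaches a terminal value or a timeout expires. It assumes fetch is a callable you supply that returns the request details as a dict (for example, a thin wrapper around your SDK's retrieval call); wait_for_request, TERMINAL_STATUSES, and the "failed" status are hypothetical, since only "completed" appears in the example response.

```python
import time
from typing import Any, Callable, Mapping

# "completed" comes from the example response; "failed" is a hypothetical
# terminal status included only for illustration.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_request(
    fetch: Callable[[str], Mapping[str, Any]],
    request_id: str,
    interval: float = 2.0,
    timeout: float = 60.0,
) -> Mapping[str, Any]:
    """Call fetch(request_id) until the status is terminal, an error is set, or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        info = fetch(request_id)
        if info.get("status") in TERMINAL_STATUSES or info.get("error"):
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"request {request_id} still {info.get('status')!r} after {timeout}s"
            )
        time.sleep(interval)  # back off before asking again
```

A fixed interval keeps the sketch short; for long-running jobs, an increasing delay between polls is gentler on the API.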

Async Support

For applications requiring asynchronous execution, Markdownify provides async support through the AsyncClient. Here’s a basic example:

from scrapegraph_py import AsyncClient
import asyncio

async def main():
    async with AsyncClient(api_key="your-api-key") as client:
        response = await client.markdownify(
            website_url="https://example.com/article"
        )
        print(response)

# Run the async function
asyncio.run(main())

For more advanced concurrent processing, you can use the following example:

import asyncio
from scrapegraph_py import AsyncClient
from scrapegraph_py.logger import sgai_logger

sgai_logger.set_logging(level="INFO")

async def main():
    # Initialize async client
    sgai_client = AsyncClient(api_key="your-api-key-here")

    # URLs to convert concurrently
    urls = [
        "https://scrapegraphai.com/",
        "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
    ]

    try:
        tasks = [sgai_client.markdownify(website_url=url) for url in urls]

        # Execute requests concurrently; failures are returned as exceptions instead of raised
        responses = await asyncio.gather(*tasks, return_exceptions=True)

        # Process results in the same order as urls
        for i, (url, response) in enumerate(zip(urls, responses), start=1):
            if isinstance(response, Exception):
                print(f"\nError for {url}: {response}")
            else:
                print(f"\nPage {i} Markdown:")
                print(f"URL: {url}")
                print(f"Result: {response['result']}")
    finally:
        # Always release the client's connections, even if processing fails
        await sgai_client.close()

if __name__ == "__main__":
    asyncio.run(main())

This advanced example demonstrates:

Concurrent processing of multiple URLs
Error handling for failed requests
Proper client cleanup
Logging configuration
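asyncio.gather in the advanced example starts every request at once. For long URL lists you may prefer to cap the number of requests in flight, for instance to stay under a rate limit (the actual limits are not covered here). A minimal sketch using asyncio.Semaphore; gather_bounded is a hypothetical helper, and func can be any coroutine function, such as a lambda wrapping sgai_client.markdownify.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar, Union

T = TypeVar("T")
R = TypeVar("R")


async def gather_bounded(
    func: Callable[[T], Awaitable[R]],
    items: Iterable[T],
    limit: int = 5,
) -> List[Union[R, BaseException]]:
    """Run func(item) for every item with at most `limit` calls in flight.

    Results are returned in input order, and exceptions are returned in place,
    mirroring asyncio.gather(..., return_exceptions=True).
    """
    semaphore = asyncio.Semaphore(limit)

    async def run(item: T) -> R:
        async with semaphore:  # waits here while `limit` calls are already running
            return await func(item)

    return await asyncio.gather(*(run(item) for item in items), return_exceptions=True)
```

Inside main(), the tasks/gather pair could then become responses = await gather_bounded(lambda url: sgai_client.markdownify(website_url=url), urls, limit=3), and the result-processing loop stays unchanged.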

Integration Options

Official SDKs

Python SDK - Perfect for automation and content processing
JavaScript SDK - Ideal for web applications and content tools

AI Framework Integrations

LangChain Integration - Use Markdownify in your content pipelines
LlamaIndex Integration - Create searchable knowledge bases

Best Practices

Content Optimization

Verify source content quality
Check image and link preservation
Review markdown formatting
Validate output structure
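To check image and link preservation, you can list every image and link in the returned markdown and flag relative URLs, which may break once the content is moved off its original site. This is a regex-based sketch assuming inline [text](url) syntax; it does not handle reference-style links, nested brackets, or URLs containing parentheses, and collect_assets and relative_urls are hypothetical helpers.

```python
import re
from typing import Dict, List
from urllib.parse import urlparse

# Inline image: ![alt](url "optional title")
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")
# Inline link: [text](url "optional title"), not preceded by "!" (which would make it an image)
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def collect_assets(markdown: str) -> Dict[str, List[str]]:
    """Return the URLs of all inline images and links, in document order."""
    return {
        "images": [url for _, url in IMAGE_RE.findall(markdown)],
        "links": [url for _, url in LINK_RE.findall(markdown)],
    }


def relative_urls(markdown: str) -> List[str]:
    """URLs with no scheme (e.g. '/about'), excluding same-page '#anchors'."""
    assets = collect_assets(markdown)
    return [
        url
        for url in assets["images"] + assets["links"]
        if not urlparse(url).scheme and not url.startswith("#")
    ]
```

Comparing these lists against the original page, or resolving relative URLs against website_url with urllib.parse.urljoin, is a quick way to spot assets that did not survive the conversion.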

Processing Tips

Handle large content in chunks
Preserve important metadata
Maintain content hierarchy
Check for formatting consistency
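One way to handle large content in chunks while maintaining the content hierarchy is to split at headings and carry the heading trail with each chunk. A minimal sketch, assuming ATX-style (#) headings like those in the example response; chunk_by_headings and its max_chars default are hypothetical, and the size limit is approximate because a single very long line is never split.

```python
import re
from typing import List, Tuple

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def chunk_by_headings(markdown: str, max_chars: int = 2000) -> List[Tuple[str, str]]:
    """Split markdown into (heading_path, text) chunks.

    A new chunk starts at every ATX heading outside fenced code, or when adding
    the next line would push the chunk past max_chars. heading_path looks like
    "Title > Section > Subsection".
    """
    chunks: List[Tuple[str, str]] = []
    trail: List[str] = []  # heading text at each level, outermost first
    lines: List[str] = []
    size = 0
    in_fence = False

    def flush() -> None:
        nonlocal lines, size
        text = "\n".join(lines).strip()
        if text:
            chunks.append((" > ".join(trail), text))
        lines, size = [], 0

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # '#' lines inside code fences are not headings
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop deeper headings from the trail, then record this one.
            trail[:] = trail[: level - 1] + [match.group(2)]
        elif lines and size + len(line) + 1 > max_chars:
            flush()
        lines.append(line)
        size += len(line) + 1
    flush()
    return chunks
```

Keeping the heading path next to each chunk means a chunk such as "- Improved accuracy" still knows it belongs under "Key Benefits", which helps when the chunks are indexed or embedded separately.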

Example Projects

Check out our cookbook for real-world examples:

Blog migration tools
Documentation generators
Content archival systems
Knowledge base builders
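For a blog migration tool, a common step is wrapping the converted markdown in front matter for a static site generator. A minimal sketch, assuming YAML front matter with title, source, and date fields (field names vary by generator, so treat these as placeholders); to_blog_post is a hypothetical helper that takes the title from the first line that looks like a level-1 heading.

```python
import json
import re
from datetime import date
from typing import Dict, Optional

# First "# Title" line; only spaces/tabs may separate the "#" from the text.
H1_RE = re.compile(r"^#[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def to_blog_post(markdown: str, source_url: str, published: Optional[date] = None) -> str:
    """Prepend YAML front matter (title, source, optional date) to a Markdownify result."""
    match = H1_RE.search(markdown)
    title = match.group(1) if match else source_url  # fall back to the URL if no H1
    fields: Dict[str, str] = {"title": title, "source": source_url}
    if published is not None:
        fields["date"] = published.isoformat()
    # json.dumps yields a double-quoted string, which is also a valid YAML scalar,
    # so titles containing quotes or colons stay well-formed.
    header = "\n".join(f"{key}: {json.dumps(value)}" for key, value in fields.items())
    return f"---\n{header}\n---\n\n{markdown}"
```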

API Reference

For detailed API documentation, see:

Support & Resources

Documentation

Comprehensive guides and tutorials

API Reference

Detailed API documentation

Community

Join our Discord community

GitHub

Check out our open-source projects

Main Website

Visit our official website
