Overview
Crawl is an advanced web crawling service that traverses multiple pages, follows links, and returns content in your preferred format (markdown or HTML). It provides namespaced operations for starting, monitoring, stopping, and resuming crawl jobs.
Try Crawl instantly in our interactive playground
Getting Started
Quick Start
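A minimal sketch of starting a crawl job. The class and method names here (`CrawlClient`, `start`, `CrawlJob`) are illustrative assumptions, not the real SDK surface, so a tiny local stand-in stands in for the client; consult the SDK reference for the actual API.

```python
from dataclasses import dataclass, field

@dataclass
class CrawlJob:
    """Hypothetical handle returned when a crawl starts."""
    job_id: str
    status: str = "running"
    pages: list = field(default_factory=list)

class CrawlClient:
    """Tiny stand-in mimicking the namespaced crawl operations."""
    def __init__(self, api_key: str):
        self.api_key = api_key
        self._jobs: dict[str, CrawlJob] = {}

    def start(self, url: str, depth: int = 2, max_pages: int = 50,
              format: str = "markdown") -> CrawlJob:
        # Parameters mirror the table below; the stub only records the job.
        if format not in ("markdown", "html"):
            raise ValueError("format must be 'markdown' or 'html'")
        job = CrawlJob(job_id=f"job-{len(self._jobs) + 1}")
        self._jobs[job.job_id] = job
        return job

client = CrawlClient(api_key="YOUR_API_KEY")
job = client.start("https://example.com", depth=2, max_pages=100)
```

The parameters accepted by `start` are described in the table below.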
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The starting URL to crawl. |
| depth | int | No | How many levels deep to follow links. |
| max_pages | int | No | Maximum number of pages to crawl. |
| format | string | No | Output format: "markdown" or "html". Default: "markdown". |
| include_patterns | list | No | URL patterns to include (e.g., ["/blog/*"]). |
| exclude_patterns | list | No | URL patterns to exclude (e.g., ["/admin/*"]). |
| fetch_config | FetchConfig | No | Configuration for page fetching (headers, stealth, etc.). |
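The pattern parameters take glob-style URL paths. A sketch of how include/exclude filtering might behave, assuming shell-style wildcards (the actual matching rules may differ):

```python
from fnmatch import fnmatch

def url_allowed(path: str, include_patterns=None, exclude_patterns=None) -> bool:
    """Return True if a URL path passes the include/exclude filters.

    Excludes win over includes; with no include list, everything
    not excluded is allowed.
    """
    if exclude_patterns and any(fnmatch(path, p) for p in exclude_patterns):
        return False
    if include_patterns:
        return any(fnmatch(path, p) for p in include_patterns)
    return True

url_allowed("/blog/post-1", include_patterns=["/blog/*"])   # allowed
url_allowed("/admin/users", exclude_patterns=["/admin/*"])  # filtered out
```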
Get your API key from the dashboard
Managing Crawl Jobs
Check Status
Stop a Running Crawl
Resume a Stopped Crawl
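The three job-control operations above form a simple lifecycle. This sketch models it locally; the method names (`get_status`, `stop`, `resume`) and status values are assumptions for illustration, not the real client API.

```python
class CrawlJobController:
    """Minimal state machine mirroring status/stop/resume semantics."""
    def __init__(self, job_id: str):
        self.job_id = job_id
        self.status = "running"

    def get_status(self) -> str:
        return self.status

    def stop(self) -> str:
        # Stopping only makes sense for a running job.
        if self.status == "running":
            self.status = "stopped"
        return self.status

    def resume(self) -> str:
        # Resuming picks a stopped job back up where it left off.
        if self.status == "stopped":
            self.status = "running"
        return self.status

job = CrawlJobController("job-123")
job.stop()
job.resume()
```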
Advanced Usage
With FetchConfig
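A sketch of what a `FetchConfig` might carry, based on the parameter table's description ("headers, stealth, etc."). The field names beyond `headers` and `stealth` are assumptions; check the API reference for the real schema.

```python
from dataclasses import dataclass, field

@dataclass
class FetchConfig:
    """Hypothetical per-page fetch settings passed to a crawl job."""
    headers: dict = field(default_factory=dict)   # custom request headers
    stealth: bool = False                         # evade basic bot detection
    timeout_seconds: float = 30.0                 # assumed field, per-page timeout

config = FetchConfig(
    headers={"User-Agent": "my-crawler/1.0"},
    stealth=True,
)
```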
Async Support
Key Features
Multi-Page Crawling
Traverse entire websites following links automatically
Flexible Formats
Get results in markdown or HTML format
Job Control
Start, stop, resume, and monitor crawl jobs
URL Filtering
Include or exclude pages by URL patterns
Integration Options
Official SDKs
- Python SDK - Perfect for data science and backend applications
- JavaScript SDK - Ideal for web applications and Node.js
AI Framework Integrations
- LangChain Integration - Use Crawl in your LLM workflows
- LlamaIndex Integration - Build powerful search and QA systems
Support & Resources
Documentation
Comprehensive guides and tutorials
API Reference
Detailed API documentation
Community
Join our Discord community
GitHub
Check out our open-source projects