Quick Start
Get started with PageData.to in under 5 minutes
You'll need an API key to get started. Sign up for a free account to get yours.
1. Get Your API Key
Sign up for a free account and grab your API key from the dashboard.
```
pd_live_1234567890abcdef
```
2. Make Your First Request
Use our simple API to extract data from any webpage by specifying field names.
```bash
curl -X POST https://api.pagedata.to/v1/extract \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com/product",
    "fields": "product_title, price, rating"
  }'
```
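If you prefer Python over curl, the equivalent request with the requests library looks like this. This is a minimal sketch; the PAGEDATA_API_KEY environment variable name is our own convention, not something the API mandates.

```python
import os
import requests

# Same request as the curl example above; the env var name is our own convention.
API_KEY = os.environ["PAGEDATA_API_KEY"]

response = requests.post(
    "https://api.pagedata.to/v1/extract",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://example.com/product",
        "fields": "product_title, price, rating",
    },
)
print(response.json())
```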
3. Get Your Data
Receive clean, structured JSON back instantly.
```json
{
  "success": true,
  "data": {
    "product_title": "Sony WH-1000XM5 Wireless Headphones",
    "price": "$349.99",
    "rating": "4.7 out of 5 stars"
  }
}
```

Authentication
Secure your API requests with API keys
PageData.to uses API keys to authenticate requests. You can create and manage your API keys from your dashboard.
Using Your API Key
Include your API key in the Authorization header of every request using the Bearer authentication scheme.
```
Authorization: Bearer pd_live_1234567890abcdef
```
Important: Keep your API keys secure and never expose them in client-side code or public repositories. Use environment variables to store your keys.
API Key Format
API keys follow the format `pd_live_` followed by a random string; test-mode keys use the `pd_test_` prefix instead.
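If you want to sanity-check a key before using it, the documented prefixes make that a one-liner. The helper below is our own illustration, not part of the API:

```python
def key_mode(api_key: str) -> str:
    """Return 'live' or 'test' based on the documented key prefixes."""
    if api_key.startswith("pd_live_"):
        return "live"
    if api_key.startswith("pd_test_"):
        return "test"
    raise ValueError("Unrecognized PageData.to API key format")
```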
API Reference
Complete reference for all PageData.to endpoints
POST /v1/extract
Extract structured data from a webpage by specifying the field names to extract.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL of the webpage to extract data from. Must be a valid URL. |
| fields | string or object | Yes | Simple mode: a comma-separated string (e.g., "title, price, description"). Advanced mode: a schema object defining complex structures with nested objects and arrays. |
| mode | string | No | Either simple (default) or advanced. Advanced mode requires fields to be a schema object. |
| extract_multiple | boolean | No | Set to true to extract multiple items from a page (e.g., list of products). Default: false. Only applies to simple mode. |
| stealth_plus | boolean | No | Enable enhanced stealth mode for difficult-to-scrape sites. Default: false. |
| output_format | string | No | Output format: json (default) or csv. CSV format automatically downloads as a file. |
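Pulling the table together, here is a hedged end-to-end request in Python that sets each optional parameter explicitly. The values shown are the documented defaults except extract_multiple, and the PAGEDATA_API_KEY variable name is our own convention:

```python
import os
import requests

response = requests.post(
    "https://api.pagedata.to/v1/extract",
    headers={"Authorization": f"Bearer {os.environ['PAGEDATA_API_KEY']}"},
    json={
        "url": "https://example.com/products",   # required
        "fields": "product_name, price",         # simple mode: comma-separated string
        "mode": "simple",                        # default
        "extract_multiple": True,                # pull every matching item on the page
        "stealth_plus": False,                   # default; enable for hard-to-scrape sites
        "output_format": "json",                 # default; "csv" returns a file instead
    },
)
response.raise_for_status()
print(response.json())
```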
Response
```json
{
  "success": true,
  "data": {
    // Extracted data as key-value pairs.
    // Field names match what you specified in the request.
    // All values are returned as strings and may be null.
  }
}
```

Error Responses
- Invalid request parameters
- Missing or invalid API key
- Plan limit exceeded or quota exhausted
- Rate limit or concurrency limit exceeded (HTTP 429)
- Scraping failed, extraction error, or other server-side issue
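The docs pin a concrete status code only to rate limiting (429), so a client can key its handling off that code plus the success flag and the X-Request-ID header described under Best Practices. A hedged sketch, assuming a requests.Session that already carries the Authorization header:

```python
import requests

def extract(session: requests.Session, payload: dict) -> dict:
    """Call /v1/extract and surface errors; session must already carry the Bearer header."""
    response = session.post("https://api.pagedata.to/v1/extract", json=payload)
    request_id = response.headers.get("X-Request-ID")  # worth logging for debugging

    if response.status_code == 429:
        raise RuntimeError(f"Rate or concurrency limit exceeded (request {request_id})")

    body = response.json()
    if not body.get("success"):
        raise RuntimeError(f"Extraction failed (request {request_id}): {body}")
    return body["data"]
```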
Examples
Real-world examples showing different ways to use the API
Simple Field Extraction
Extract basic fields from a product page
```bash
curl -X POST https://api.pagedata.to/v1/extract \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com/product",
    "fields": "title, price, description, rating"
  }'
```

Extract Multiple Items
Extract a list of products from a category page
```json
{
  "url": "https://example.com/products",
  "fields": "product_name, price, availability",
  "extract_multiple": true
}

// Response:
{
  "success": true,
  "data": {
    "data": [
      {
        "product_name": "Product 1",
        "price": "$29.99",
        "availability": "In Stock"
      },
      {
        "product_name": "Product 2",
        "price": "$39.99",
        "availability": "Out of Stock"
      }
    ]
  }
}
```
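Note that the response nests the item list under data.data. A hedged sketch of iterating it in Python, assuming response holds the HTTP response from the request above:

```python
body = response.json()
# The item list sits under the nested "data" key when extract_multiple is set.
for item in body["data"]["data"]:
    print(item["product_name"], item["price"], item["availability"])
```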
Advanced Schema with Nested Objects
Define complex data structures for detailed extraction
```json
{
  "url": "https://example.com/product",
  "mode": "advanced",
  "fields": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": {
        "type": "object",
        "properties": {
          "amount": { "type": "string" },
          "currency": { "type": "string" },
          "discount": { "type": "string" }
        }
      },
      "reviews": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "author": { "type": "string" },
            "rating": { "type": "string" },
            "comment": { "type": "string" }
          }
        }
      }
    }
  }
}
```

Export to CSV
Get results as a CSV file for spreadsheet import
```
{
  "url": "https://example.com/products",
  "fields": "product_name, price, rating",
  "extract_multiple": true,
  "output_format": "csv"
}

// Response will be a CSV file:
// Content-Type: text/csv
// Content-Disposition: attachment; filename="extract-2025-10-29T12-00-00.csv"

product_name,price,rating
"Product 1","$29.99","4.5"
"Product 2","$39.99","4.8"Using Stealth Mode
Using Stealth Mode
Enable enhanced scraping for difficult sites
```json
{
  "url": "https://difficult-site.com/page",
  "fields": "title, content",
  "stealth_plus": true
}

// Higher success rate on protected sites
```

Best Practices
Tips and recommendations for optimal API usage
Field Naming
- Use descriptive, semantic field names (e.g., "product_title" instead of "field1")
- Avoid spaces in field names - they will be converted to underscores
- Use consistent naming conventions across your requests
- Keep field names concise but meaningful
Error Handling
- Always check the `success` field in responses
- Implement exponential backoff for rate limit errors (429); see the sketch after this list
- Log the `X-Request-ID` header for debugging
- Handle missing or null values in extracted data gracefully
- Monitor your quota usage via the dashboard to avoid hitting limits
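Here is a hedged sketch of exponential backoff around the extract endpoint. The retry count and delays are arbitrary choices, not API requirements, and the session is assumed to already carry the Bearer header:

```python
import time
import requests

def extract_with_backoff(session: requests.Session, payload: dict, max_retries: int = 5) -> dict:
    delay = 1.0
    for _ in range(max_retries):
        response = session.post("https://api.pagedata.to/v1/extract", json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Rate or concurrency limit hit: wait, then retry with a doubled delay.
        time.sleep(delay)
        delay *= 2
    raise RuntimeError("Still rate-limited after retries")
```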
Performance Optimization
- Extract only the fields you need - fewer fields = faster processing
- Use simple mode when possible - advanced schemas have more overhead
- Consider batching requests during off-peak hours if scraping large datasets
- Cache results when appropriate to reduce API calls (see the sketch after this list)
- Use concurrency limits wisely - stay within your plan's concurrent request limit
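For instance, a minimal in-process cache keyed on the request parameters. This is a hedged sketch; a real deployment would add TTLs and persistence, and the session is assumed to carry the auth header:

```python
import requests

_cache = {}  # (url, fields) -> parsed response body

def cached_extract(session: requests.Session, url: str, fields: str) -> dict:
    key = (url, fields)
    if key not in _cache:
        response = session.post(
            "https://api.pagedata.to/v1/extract",
            json={"url": url, "fields": fields},
        )
        response.raise_for_status()
        _cache[key] = response.json()
    return _cache[key]
```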
Security
- Never expose API keys in client-side code or public repositories
- Use environment variables to store API keys
- Rotate API keys periodically for enhanced security
- Create separate API keys for different environments (dev, staging, production)
- Revoke unused API keys immediately
Stealth Mode Usage
- Use `stealth_plus` only when standard scraping fails (see the fallback sketch after this list)
- Stealth mode may have slightly higher latency due to additional processing
- Test without stealth mode first to optimize for cost and speed
- Some sites may still block automated access - respect robots.txt
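The "test without stealth first" advice translates naturally into a fallback pattern. A hedged sketch follows; treating a non-success body as the failure signal is our own heuristic, and the session is assumed to carry the auth header:

```python
import requests

def extract_with_fallback(session: requests.Session, url: str, fields: str) -> dict:
    payload = {"url": url, "fields": fields}
    response = session.post("https://api.pagedata.to/v1/extract", json=payload)
    body = response.json() if response.ok else {"success": False}

    if not body.get("success"):
        # Standard scrape failed: retry once with enhanced stealth mode.
        payload["stealth_plus"] = True
        response = session.post("https://api.pagedata.to/v1/extract", json=payload)
        response.raise_for_status()
        body = response.json()
    return body["data"]
```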
Data Quality
- AI extraction works best with clear, semantic field names
- More specific field names yield more accurate results (e.g., "product_price_usd" vs "price")
- Validate and sanitize extracted data before using in production
- All extracted values are strings - convert to appropriate types as needed (see the sketch after this list)
- Fields may be null if content isn't found on the page
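As an illustration of the last two points, here is a hedged sketch converting the string values from the Quick Start example into typed Python values; the parsing rules assume the exact formats shown there:

```python
from decimal import Decimal

item = {
    "product_title": "Sony WH-1000XM5 Wireless Headphones",
    "price": "$349.99",
    "rating": "4.7 out of 5 stars",
}

# Strings in, typed values out; guard against missing or null fields.
price = Decimal(item["price"].lstrip("$")) if item.get("price") else None
rating = float(item["rating"].split()[0]) if item.get("rating") else None
print(price, rating)  # 349.99 4.7
```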