Quick Start
Get started with PageData.to in under 5 minutes
You'll need an API key to get started. Sign up for a free account to get yours.
1. Get Your API Key
Sign up for a free account and grab your API key from the dashboard.
```
pd_live_1234567890abcdef
```
2. Make Your First Request
Use our simple API to extract data from any webpage by specifying field names.
```bash
curl -X POST https://api.pagedata.to/v1/extract \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com/product",
    "fields": "product_title, price, rating"
  }'
```
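If you prefer Python over curl, the equivalent request with the requests library looks like this. This is a minimal sketch; the PAGEDATA_API_KEY environment variable name is our own convention, not something the API mandates.

```python
import os
import requests

# Same request as the curl example above; the env var name is our own convention.
API_KEY = os.environ["PAGEDATA_API_KEY"]

response = requests.post(
    "https://api.pagedata.to/v1/extract",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "url": "https://example.com/product",
        "fields": "product_title, price, rating",
    },
)
print(response.json())
```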
3. Get Your Data
Receive clean, structured JSON back instantly.
```json
{
  "success": true,
  "data": {
    "product_title": "Sony WH-1000XM5 Wireless Headphones",
    "price": "$349.99",
    "rating": "4.7 out of 5 stars"
  }
}
```

Authentication
Secure your API requests with API keys
PageData.to uses API keys to authenticate requests. You can create and manage your API keys from your dashboard.
Using Your API Key
Include your API key in the Authorization header of every request using the Bearer authentication scheme.
```
Authorization: Bearer pd_live_1234567890abcdef
```
Important: Keep your API keys secure and never expose them in client-side code or public repositories. Use environment variables to store your keys.
API Key Format
API keys follow the format `pd_live_` followed by a random string; test-mode keys use the `pd_test_` prefix instead.
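If you want to sanity-check a key before using it, the documented prefixes make that a one-liner. The helper below is our own illustration, not part of the API:

```python
def key_mode(api_key: str) -> str:
    """Return 'live' or 'test' based on the documented key prefixes."""
    if api_key.startswith("pd_live_"):
        return "live"
    if api_key.startswith("pd_test_"):
        return "test"
    raise ValueError("Unrecognized PageData.to API key format")
```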
API Reference
Complete reference for all PageData.to endpoints
POST /v1/extract
Extract structured data from a webpage by specifying the field names to extract.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL of the webpage to extract data from. Must be a valid URL. |
| fields | string or object | Yes | Simple mode: a comma-separated string (e.g., "title, price, description"). Advanced mode: a schema object defining complex structures with nested objects and arrays. |
| mode | string | No | Either simple (default) or advanced. Advanced mode requires fields to be a schema object. |
| extract_multiple | boolean | No | Set to true to extract multiple items from a page (e.g., list of products). Default: false. Only applies to simple mode. |
| stealth_plus | boolean | No | Enable enhanced stealth mode for difficult-to-scrape sites. Default: false. |
| output_format | string | No | Output format: json (default) or csv. CSV format automatically downloads as a file. |
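Pulling the table together, here is a hedged end-to-end request in Python that sets each optional parameter explicitly. The values shown are the documented defaults except extract_multiple, and the PAGEDATA_API_KEY variable name is our own convention:

```python
import os
import requests

response = requests.post(
    "https://api.pagedata.to/v1/extract",
    headers={"Authorization": f"Bearer {os.environ['PAGEDATA_API_KEY']}"},
    json={
        "url": "https://example.com/products",   # required
        "fields": "product_name, price",         # simple mode: comma-separated string
        "mode": "simple",                        # default
        "extract_multiple": True,                # pull every matching item on the page
        "stealth_plus": False,                   # default; enable for hard-to-scrape sites
        "output_format": "json",                 # default; "csv" returns a file instead
    },
)
response.raise_for_status()
print(response.json())
```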
Response
```json
{
  "success": true,
  "data": {
    // Extracted data as key-value pairs.
    // Field names match what you specified in the request.
    // All values are returned as strings and may be null.
  }
}
```

Error Responses
- Invalid request parameters
- Missing or invalid API key
- Plan limit exceeded or quota exhausted
- Rate limit or concurrency limit exceeded (HTTP 429)
- Scraping failed, extraction error, or other server-side issue
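The docs pin a concrete status code only to rate limiting (429), so a client can key its handling off that code plus the success flag and the X-Request-ID header described under Best Practices. A hedged sketch, assuming a requests.Session that already carries the Authorization header:

```python
import requests

def extract(session: requests.Session, payload: dict) -> dict:
    """Call /v1/extract and surface errors; session must already carry the Bearer header."""
    response = session.post("https://api.pagedata.to/v1/extract", json=payload)
    request_id = response.headers.get("X-Request-ID")  # worth logging for debugging

    if response.status_code == 429:
        raise RuntimeError(f"Rate or concurrency limit exceeded (request {request_id})")

    body = response.json()
    if not body.get("success"):
        raise RuntimeError(f"Extraction failed (request {request_id}): {body}")
    return body["data"]
```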
Examples
Real-world examples showing different ways to use the API
Simple Field Extraction
Extract basic fields from a product page
```bash
curl -X POST https://api.pagedata.to/v1/extract \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com/product",
    "fields": "title, price, description, rating"
  }'
```

Extract Multiple Items
Extract a list of products from a category page
```json
{
  "url": "https://example.com/products",
  "fields": "product_name, price, availability",
  "extract_multiple": true
}

// Response:
{
  "success": true,
  "data": {
    "data": [
      {
        "product_name": "Product 1",
        "price": "$29.99",
        "availability": "In Stock"
      },
      {
        "product_name": "Product 2",
        "price": "$39.99",
        "availability": "Out of Stock"
      }
    ]
  }
}
```
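Note that the response nests the item list under data.data. A hedged sketch of iterating it in Python, assuming response holds the HTTP response from the request above:

```python
body = response.json()
# The item list sits under the nested "data" key when extract_multiple is set.
for item in body["data"]["data"]:
    print(item["product_name"], item["price"], item["availability"])
```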
Advanced Schema with Nested Objects
Define complex data structures for detailed extraction
```json
{
  "url": "https://example.com/product",
  "mode": "advanced",
  "fields": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": {
        "type": "object",
        "properties": {
          "amount": { "type": "string" },
          "currency": { "type": "string" },
          "discount": { "type": "string" }
        }
      },
      "reviews": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "author": { "type": "string" },
            "rating": { "type": "string" },
            "comment": { "type": "string" }
          }
        }
      }
    }
  }
}
```

Export to CSV
Get results as a CSV file for spreadsheet import
```
{
  "url": "https://example.com/products",
  "fields": "product_name, price, rating",
  "extract_multiple": true,
  "output_format": "csv"
}

// Response will be a CSV file:
// Content-Type: text/csv
// Content-Disposition: attachment; filename="extract-2025-10-29T12-00-00.csv"

product_name,price,rating
"Product 1","$29.99","4.5"
"Product 2","$39.99","4.8"Using Stealth Mode
Using Stealth Mode
Enable enhanced scraping for difficult sites
```json
{
  "url": "https://difficult-site.com/page",
  "fields": "title, content",
  "stealth_plus": true
}

// Higher success rate on protected sites
```

Best Practices
Tips and recommendations for optimal API usage
Field Naming
- Use descriptive, semantic field names (e.g., "product_title" instead of "field1")
- Avoid spaces in field names - they will be converted to underscores
- Use consistent naming conventions across your requests
- Keep field names concise but meaningful
Error Handling
- Always check the `success` field in responses
- Implement exponential backoff for rate limit errors (429); see the sketch after this list
- Log the `X-Request-ID` header for debugging
- Handle missing or null values in extracted data gracefully
- Monitor your quota usage via the dashboard to avoid hitting limits
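Here is a hedged sketch of exponential backoff around the extract endpoint. The retry count and delays are arbitrary choices, not API requirements, and the session is assumed to already carry the Bearer header:

```python
import time
import requests

def extract_with_backoff(session: requests.Session, payload: dict, max_retries: int = 5) -> dict:
    delay = 1.0
    for _ in range(max_retries):
        response = session.post("https://api.pagedata.to/v1/extract", json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Rate or concurrency limit hit: wait, then retry with a doubled delay.
        time.sleep(delay)
        delay *= 2
    raise RuntimeError("Still rate-limited after retries")
```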
Performance Optimization
- Extract only the fields you need - fewer fields = faster processing
- Use simple mode when possible - advanced schemas have more overhead
- Consider batching requests during off-peak hours if scraping large datasets
- Cache results when appropriate to reduce API calls (see the sketch after this list)
- Use concurrency limits wisely - stay within your plan's concurrent request limit
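For instance, a minimal in-process cache keyed on the request parameters. This is a hedged sketch; a real deployment would add TTLs and persistence, and the session is assumed to carry the auth header:

```python
import requests

_cache = {}  # (url, fields) -> parsed response body

def cached_extract(session: requests.Session, url: str, fields: str) -> dict:
    key = (url, fields)
    if key not in _cache:
        response = session.post(
            "https://api.pagedata.to/v1/extract",
            json={"url": url, "fields": fields},
        )
        response.raise_for_status()
        _cache[key] = response.json()
    return _cache[key]
```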
Security
- Never expose API keys in client-side code or public repositories
- Use environment variables to store API keys
- Rotate API keys periodically for enhanced security
- Create separate API keys for different environments (dev, staging, production)
- Revoke unused API keys immediately
Stealth Mode Usage
- Use `stealth_plus` only when standard scraping fails (see the fallback sketch after this list)
- Stealth mode may have slightly higher latency due to additional processing
- Test without stealth mode first to optimize for cost and speed
- Some sites may still block automated access - respect robots.txt
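The "test without stealth first" advice translates naturally into a fallback pattern. A hedged sketch follows; treating a non-success body as the failure signal is our own heuristic, and the session is assumed to carry the auth header:

```python
import requests

def extract_with_fallback(session: requests.Session, url: str, fields: str) -> dict:
    payload = {"url": url, "fields": fields}
    response = session.post("https://api.pagedata.to/v1/extract", json=payload)
    body = response.json() if response.ok else {"success": False}

    if not body.get("success"):
        # Standard scrape failed: retry once with enhanced stealth mode.
        payload["stealth_plus"] = True
        response = session.post("https://api.pagedata.to/v1/extract", json=payload)
        response.raise_for_status()
        body = response.json()
    return body["data"]
```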
Data Quality
- AI extraction works best with clear, semantic field names
- More specific field names yield more accurate results (e.g., "product_price_usd" vs "price")
- Validate and sanitize extracted data before using in production
- All extracted values are strings - convert to appropriate types as needed (see the sketch after this list)
- Fields may be null if content isn't found on the page
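As an illustration of the last two points, here is a hedged sketch converting the string values from the Quick Start example into typed Python values; the parsing rules assume the exact formats shown there:

```python
from decimal import Decimal

item = {
    "product_title": "Sony WH-1000XM5 Wireless Headphones",
    "price": "$349.99",
    "rating": "4.7 out of 5 stars",
}

# Strings in, typed values out; guard against missing or null fields.
price = Decimal(item["price"].lstrip("$")) if item.get("price") else None
rating = float(item["rating"].split()[0]) if item.get("rating") else None
print(price, rating)  # 349.99 4.7
```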