Quick Start

Get started with PageData.to in under 5 minutes

You'll need an API key to get started. Sign up for a free account to get yours.

1. Get Your API Key

Sign up for a free account and grab your API key from the dashboard.

# Your API key will look like this:
pd_live_1234567890abcdef

2. Make Your First Request

Use our simple API to extract data from any webpage by specifying field names.

curl -X POST https://api.pagedata.to/v1/extract \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com/product",
    "fields": "product_title, price, rating"
  }'
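The same request in Python, using only the standard library (a sketch; replace the placeholder key with your own):

```python
import json
import urllib.request

API_URL = "https://api.pagedata.to/v1/extract"

def build_extract_request(url, fields, api_key):
    """Build the POST request with the JSON body and auth header."""
    payload = json.dumps({"url": url, "fields": fields}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def extract(url, fields, api_key):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_extract_request(url, fields, api_key)) as resp:
        return json.load(resp)

# result = extract("https://example.com/product",
#                  "product_title, price, rating", "YOUR_API_KEY")
```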

3. Get Your Data

Receive clean, structured JSON back instantly.

{
  "success": true,
  "data": {
    "product_title": "Sony WH-1000XM5 Wireless Headphones",
    "price": "$349.99",
    "rating": "4.7 out of 5 stars"
  }
}

Authentication

Secure your API requests with API keys

PageData.to uses API keys to authenticate requests. You can create and manage your API keys from your dashboard.

Using Your API Key

Include your API key in the Authorization header of every request using the Bearer authentication scheme.

Authorization: Bearer pd_live_1234567890abcdef

Important: Keep your API keys secure and never expose them in client-side code or public repositories. Use environment variables to store your keys.
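One way to follow that advice in Python (PAGEDATA_API_KEY is an illustrative variable name, not one the service mandates):

```python
import os

def load_api_key(var_name="PAGEDATA_API_KEY"):
    """Fetch the API key from the environment; fail loudly if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```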

API Key Format

API keys follow the format pd_live_ followed by a random string. Test mode keys use pd_test_ prefix.
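A quick client-side sanity check for that format (a sketch; the "random string" part is assumed to be alphanumeric, which the docs don't guarantee):

```python
import re

# Matches the documented pd_live_ / pd_test_ prefixes; the suffix character
# set is an assumption -- adjust if your keys contain other characters.
KEY_PATTERN = re.compile(r"^pd_(live|test)_[A-Za-z0-9]+$")

def looks_like_api_key(key):
    """Return True if the string resembles a PageData.to API key."""
    return bool(KEY_PATTERN.match(key))
```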

API Reference

Complete reference for all PageData.to endpoints

POST /v1/extract

Extract structured data from a webpage by specifying the field names you want.

Request Body

  • url (string, required): The URL of the webpage to extract data from. Must be a valid URL.
  • fields (string | object, required): Simple mode: a comma-separated string (e.g., "title, price, description"). Advanced mode: a schema object defining complex structures with nested objects and arrays.
  • mode (string, optional): Either simple (default) or advanced. Advanced mode requires fields to be a schema object.
  • extract_multiple (boolean, optional): Set to true to extract multiple items from a page (e.g., a list of products). Default: false. Only applies to simple mode.
  • stealth_plus (boolean, optional): Enable enhanced stealth mode for difficult-to-scrape sites. Default: false.
  • output_format (string, optional): Output format: json (default) or csv. CSV format automatically downloads as a file.

Response

{
  "success": true,
  "data": {
    // Your extracted data as key-value pairs
    // Field names match what you specified in the request
    // All values are returned as strings (optional/nullable)
  }
}

Error Responses

400 Bad Request

Invalid request parameters

{ "error": "Missing required fields: url, fields" }
401 Unauthorized

Missing or invalid API key

{ "error": "Unauthorized", "code": "UNAUTHORIZED" }
403 Forbidden

Plan limit exceeded or quota exhausted

{
  "error": "Monthly quota exceeded",
  "code": "PLAN_LIMIT_EXCEEDED",
  "details": {
    "currentUsage": 100,
    "limit": 100,
    "remaining": 0,
    "periodEnd": "2025-11-01T00:00:00.000Z"
  }
}
429 Too Many Requests

Rate limit or concurrency limit exceeded

{ "error": "Rate limit exceeded", "code": "RATE_LIMIT_EXCEEDED" }
500 Internal Server Error

Scraping failed, extraction error, or other server-side issue

{ "error": "Extraction failed", "code": "INTERNAL_ERROR" }
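One way to turn the status codes above into client behavior (a sketch; the action labels are illustrative, not part of the API):

```python
def classify_error(status_code):
    """Map documented HTTP status codes to a suggested client action."""
    if status_code == 400:
        return "fix-request"         # invalid parameters; do not retry as-is
    if status_code == 401:
        return "check-api-key"       # missing or invalid key
    if status_code == 403:
        return "check-quota"         # plan limit exceeded or quota exhausted
    if status_code == 429:
        return "retry-with-backoff"  # rate or concurrency limit
    if status_code >= 500:
        return "retry-later"         # server-side failure
    return "ok" if status_code == 200 else "unknown"
```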

Examples

Real-world examples showing different ways to use the API

Simple Field Extraction

Extract basic fields from a product page

curl -X POST https://api.pagedata.to/v1/extract \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "url": "https://example.com/product",
    "fields": "title, price, description, rating"
  }'

Extract Multiple Items

Extract a list of products from a category page

{
  "url": "https://example.com/products",
  "fields": "product_name, price, availability",
  "extract_multiple": true
}

// Response:
{
  "success": true,
  "data": {
    "data": [
      {
        "product_name": "Product 1",
        "price": "$29.99",
        "availability": "In Stock"
      },
      {
        "product_name": "Product 2",
        "price": "$39.99",
        "availability": "Out of Stock"
      }
    ]
  }
}
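Note that with extract_multiple the list of items sits under data.data. A small helper keeps that nesting out of your calling code (a sketch based on the response shape above):

```python
def items_from_response(resp):
    """Pull the list of extracted items out of an extract_multiple response."""
    if not resp.get("success"):
        raise ValueError(resp.get("error", "extraction failed"))
    return resp["data"]["data"]
```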

Advanced Schema with Nested Objects

Define complex data structures for detailed extraction

{
  "url": "https://example.com/product",
  "mode": "advanced",
  "fields": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "price": {
        "type": "object",
        "properties": {
          "amount": { "type": "string" },
          "currency": { "type": "string" },
          "discount": { "type": "string" }
        }
      },
      "reviews": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "author": { "type": "string" },
            "rating": { "type": "string" },
            "comment": { "type": "string" }
          }
        }
      }
    }
  }
}

Export to CSV

Get results as a CSV file for spreadsheet import

{
  "url": "https://example.com/products",
  "fields": "product_name, price, rating",
  "extract_multiple": true,
  "output_format": "csv"
}

// Response will be a CSV file:
// Content-Type: text/csv
// Content-Disposition: attachment; filename="extract-2025-10-29T12-00-00.csv"

product_name,price,rating
"Product 1","$29.99","4.5"
"Product 2","$39.99","4.8"
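If you'd rather work with the CSV body in code than save the download, the standard library parses it directly (a sketch over the response shown above):

```python
import csv
import io

def rows_from_csv(csv_text):
    """Parse a CSV body returned with output_format=csv into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))
```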

Using Stealth Mode

Enable enhanced scraping for difficult sites

{
  "url": "https://difficult-site.com/page",
  "fields": "title, content",
  "stealth_plus": true
}

// Higher success rate on protected sites

Best Practices

Tips and recommendations for optimal API usage

Field Naming

  • Use descriptive, semantic field names (e.g., "product_title" instead of "field1")
  • Avoid spaces in field names - they will be converted to underscores
  • Use consistent naming conventions across your requests
  • Keep field names concise but meaningful

Error Handling

  • Always check the success field in responses
  • Implement exponential backoff for rate limit errors (429)
  • Log the X-Request-ID header for debugging
  • Handle missing or null values in extracted data gracefully
  • Monitor your quota usage via the dashboard to avoid hitting limits
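The exponential-backoff advice for 429s can be sketched like this (the call is assumed to return a (status_code, result) pair; adapt to your HTTP client):

```python
import time

def with_backoff(call, max_attempts=5, base_delay=1.0):
    """Retry `call` on rate limits (429), doubling the wait each attempt."""
    for attempt in range(max_attempts):
        status, result = call()
        if status != 429:
            return result
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError(f"rate limited after {max_attempts} attempts")
```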

Performance Optimization

  • Extract only the fields you need - fewer fields = faster processing
  • Use simple mode when possible - advanced schemas have more overhead
  • Consider batching requests during off-peak hours if scraping large datasets
  • Cache results when appropriate to reduce API calls
  • Use concurrency limits wisely - stay within your plan's concurrent request limit

Security

  • Never expose API keys in client-side code or public repositories
  • Use environment variables to store API keys
  • Rotate API keys periodically for enhanced security
  • Create separate API keys for different environments (dev, staging, production)
  • Revoke unused API keys immediately

Stealth Mode Usage

  • Use stealth_plus only when standard scraping fails
  • Stealth mode may have slightly higher latency due to additional processing
  • Test without stealth mode first to optimize for cost and speed
  • Some sites may still block automated access - respect robots.txt

Data Quality

  • AI extraction works best with clear, semantic field names
  • More specific field names yield more accurate results (e.g., "product_price_usd" vs "price")
  • Validate and sanitize extracted data before using in production
  • All extracted values are strings - convert to appropriate types as needed
  • Fields may be null if content isn't found on the page
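Since every extracted value comes back as a string (or null), conversion helpers like these are worth writing once (a sketch based on the sample values earlier in this document, e.g. "$349.99" and "4.7 out of 5 stars"):

```python
def parse_price(raw):
    """Convert a price string like "$349.99" to a float; None if missing."""
    if raw is None:
        return None
    return float(raw.replace("$", "").replace(",", ""))

def parse_rating(raw):
    """Pull the leading number out of a rating like "4.7 out of 5 stars"."""
    if raw is None:
        return None
    return float(raw.split()[0])
```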