Scrape API
Extract content from a single webpage in multiple formats including Markdown, HTML, screenshots, and structured JSON.
The Scrape API allows you to extract content from any single webpage. Provide a URL and receive clean, structured content in your preferred format — including Markdown, HTML, raw HTML, screenshots, extracted links, images, summaries, and structured JSON via schema-based extraction.
Endpoint
```
POST https://api.octivas.com/api/v1/scrape
```

Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string (URL) | Yes | — | The URL to scrape content from |
| formats | string[] | No | ["markdown"] | Output formats: "markdown", "html", "rawHtml", "screenshot", "links", "json", "images", "summary" |
| schema | object | No | — | JSON Schema defining the structure for extraction (requires the "json" format) |
| prompt | string | No | — | Guidance prompt for structured extraction (requires the "json" format) |
| max_age | integer | No | 172800000 | Cache freshness window in milliseconds. Set to 0 to bypass the cache and fetch fresh content |
| store_in_cache | boolean | No | true | Whether to cache the scrape result for future requests |
| location | object | No | — | Geographic settings for the request. See Location Object |
| only_main_content | boolean | No | true | When true, extracts only the primary content area. Set to false to include navbars, footers, sidebars, etc. |
| timeout | integer | No | 30000 | Request timeout in milliseconds |
Output Formats
| Format | Returns | Description |
|---|---|---|
| markdown | string | Page content converted to clean Markdown |
| html | string | Cleaned HTML content |
| rawHtml | string | Original unprocessed HTML from the page |
| screenshot | string (URL) | URL to a full-page screenshot image stored on content.octivas.com |
| links | string[] | All hyperlinks found on the page |
| json | object | Structured data extracted using schema and/or prompt |
| images | string[] | All image URLs found on the page |
| summary | string | AI-generated summary of the page content |
Location Object
| Field | Type | Required | Description |
|---|---|---|---|
| country | string | Yes | ISO 3166-1 alpha-2 country code (e.g. "US", "DE", "JP") |
| languages | string[] | No | Preferred languages (e.g. ["en", "de"]). Defaults to ["en"] |
Example Request
```bash
curl -X POST https://api.octivas.com/api/v1/scrape \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "html", "links", "screenshot"]
  }'
```

```javascript
import Octivas from 'octivas';

const client = new Octivas('your_api_key');

const result = await client.scrape({
  url: 'https://example.com',
  formats: ['markdown', 'html', 'links', 'screenshot']
});

console.log(result.markdown);
console.log(result.links);
console.log(result.screenshot); // URL to screenshot image
```

```python
import octivas

client = octivas.Client("your_api_key")

result = client.scrape(
    url="https://example.com",
    formats=["markdown", "html", "links", "screenshot"]
)

print(result.markdown)
print(result.links)
print(result.screenshot)  # URL to screenshot image
```

Response

```json
{
  "success": true,
  "url": "https://example.com/",
  "markdown": "# Example\n\nThis is example content.",
  "html": "<h1>Example</h1><p>This is example content.</p>",
  "raw_html": null,
  "screenshot": "https://content.octivas.com/screenshots/abc123.png",
  "links": [
    "https://example.com/about",
    "https://example.com/contact"
  ],
  "json": null,
  "images": null,
  "summary": null,
  "metadata": {
    "title": "Example Domain",
    "description": "Example website",
    "url": "https://example.com/",
    "status_code": 200,
    "credits_used": 1
  }
}
```

Response Fields
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the request succeeded |
| url | string | The resolved URL that was scraped |
| markdown | string \| null | Page content as Markdown (if requested) |
| html | string \| null | Page content as cleaned HTML (if requested) |
| raw_html | string \| null | Original unprocessed HTML (if requested) |
| screenshot | string \| null | URL to the screenshot image (if requested) |
| links | string[] \| null | Hyperlinks found on the page (if requested) |
| json | object \| null | Structured extraction result (if requested with schema/prompt) |
| images | string[] \| null | Image URLs found on the page (if requested) |
| summary | string \| null | AI-generated summary (if requested) |
| metadata.title | string | Page title |
| metadata.description | string | Page meta description |
| metadata.url | string | Final URL after redirects |
| metadata.status_code | number | HTTP status code |
| metadata.credits_used | number | Credits consumed by this request |
Structured Extraction
Use the json format with a schema and optional prompt to extract structured data from any page.
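Because the result in json is shaped by the schema you supply, it can be useful to sanity-check it client-side before trusting it downstream. The sketch below is a minimal, stdlib-only check of "required" keys and "type" declarations; it is illustrative only and not part of the octivas client or the API's own validation:

```python
# Map JSON Schema type names to the Python types they correspond to.
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}

def matches_schema(data, schema):
    """Return True if `data` has every required key and declared types match."""
    if not isinstance(data, dict):
        return False
    props = schema.get("properties", {})
    # Every key listed in "required" must be present.
    for key in schema.get("required", []):
        if key not in data:
            return False
    # Any key with a declared type must hold a value of that type.
    for key, value in data.items():
        declared = props.get(key, {}).get("type")
        if declared and not isinstance(value, TYPE_MAP[declared]):
            return False
    return True

product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}

print(matches_schema({"name": "Widget Pro", "price": 29.99}, product_schema))  # True
print(matches_schema({"name": "Widget Pro"}, product_schema))  # False: missing "price"
```

A full JSON Schema validator (such as the third-party jsonschema package) covers far more of the spec; this helper only catches missing fields and obvious type mismatches.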
```bash
curl -X POST https://api.octivas.com/api/v1/scrape \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/123",
    "formats": ["json"],
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "currency": { "type": "string" },
        "in_stock": { "type": "boolean" }
      },
      "required": ["name", "price"]
    },
    "prompt": "Extract the product details from this page."
  }'
```

```javascript
const result = await client.scrape({
  url: 'https://example.com/product/123',
  formats: ['json'],
  schema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      price: { type: 'number' },
      currency: { type: 'string' },
      in_stock: { type: 'boolean' }
    },
    required: ['name', 'price']
  },
  prompt: 'Extract the product details from this page.'
});

console.log(result.json);
// { name: "Widget Pro", price: 29.99, currency: "USD", in_stock: true }
```

```python
result = client.scrape(
    url="https://example.com/product/123",
    formats=["json"],
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "currency": {"type": "string"},
            "in_stock": {"type": "boolean"},
        },
        "required": ["name", "price"],
    },
    prompt="Extract the product details from this page.",
)

print(result.json)
# {"name": "Widget Pro", "price": 29.99, "currency": "USD", "in_stock": True}
```

Caching
By default, scrape results are cached for 2 days (172,800,000 ms). You can control caching behavior:
- Set max_age to 0 to always fetch fresh content, bypassing the cache.
- Set store_in_cache to false to fetch normally but skip storing the result in the cache.
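Since max_age is expressed in milliseconds, it is easy to mistype the value; a small conversion helper (a hypothetical name, not part of the client) makes the unit explicit:

```python
def days_to_max_age(days: float) -> int:
    """Convert a cache freshness window in days to the millisecond value max_age expects."""
    return int(days * 24 * 60 * 60 * 1000)

print(days_to_max_age(2))  # 172800000, the documented default (2 days)
```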
```json
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "max_age": 0,
  "store_in_cache": false
}
```

Geo / Locale
Use the location parameter to scrape pages as if from a specific country and language:
```json
{
  "url": "https://example.com",
  "formats": ["markdown"],
  "location": {
    "country": "DE",
    "languages": ["de", "en"]
  }
}
```

Batch Scrape
Scrape multiple URLs (up to 10) in a single request. Batch jobs run asynchronously — submit the job, then poll for results.
Submit a Batch Job
```
POST https://api.octivas.com/api/v1/batch/scrape
```

Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | string[] | Yes | — | URLs to scrape (1–10 max) |
| formats | string[] | No | ["markdown"] | Output formats (same options as single scrape) |
| schema | object | No | — | JSON Schema for structured extraction |
| prompt | string | No | — | Guidance prompt for extraction |
| max_age | integer | No | 172800000 | Cache freshness in ms |
| store_in_cache | boolean | No | true | Whether to cache results |
| location | object | No | — | Geographic settings |
| only_main_content | boolean | No | true | Extract primary content only |
| timeout | integer | No | 30000 | Per-URL timeout in ms |
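Because a batch accepts at most 10 URLs, longer lists have to be split into several jobs client-side. A minimal sketch of that split (chunk_urls is a hypothetical helper, not part of the client):

```python
def chunk_urls(urls, batch_size=10):
    """Split a list of URLs into batches no larger than the API's 10-URL limit."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]

urls = [f"https://example.com/page{i}" for i in range(25)]
batches = chunk_urls(urls)
print([len(b) for b in batches])  # [10, 10, 5]
```

Each batch would then be submitted as its own job, yielding one job_id per chunk to poll.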
Example
```bash
curl -X POST https://api.octivas.com/api/v1/batch/scrape \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "formats": ["markdown"]
  }'
```

```javascript
const job = await client.batchScrape({
  urls: [
    'https://example.com/page1',
    'https://example.com/page2',
    'https://example.com/page3'
  ],
  formats: ['markdown']
});

console.log(job.job_id); // "507f1f77bcf86cd799439011"
```

```python
job = client.batch_scrape(
    urls=[
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3",
    ],
    formats=["markdown"],
)

print(job.job_id)  # "507f1f77bcf86cd799439011"
```

Response

```json
{
  "success": true,
  "job_id": "507f1f77bcf86cd799439011",
  "status": "processing",
  "total_urls": 3
}
```

Poll for Results
```
GET https://api.octivas.com/api/v1/batch/scrape/{job_id}
```

```bash
curl https://api.octivas.com/api/v1/batch/scrape/507f1f77bcf86cd799439011 \
  -H "Authorization: Bearer your_api_key"
```

```javascript
const status = await client.getBatchScrapeStatus('507f1f77bcf86cd799439011');

console.log(status.status);    // "completed"
console.log(status.completed); // 3

status.results.forEach(r => console.log(r.url, r.markdown));
```

```python
status = client.get_batch_scrape_status("507f1f77bcf86cd799439011")

print(status.status)     # "completed"
print(status.completed)  # 3

for r in status.results:
    print(r.url, r.markdown)
```

Response

```json
{
  "success": true,
  "job_id": "507f1f77bcf86cd799439011",
  "status": "completed",
  "completed": 3,
  "total": 3,
  "credits_used": 3,
  "results": [
    {
      "success": true,
      "url": "https://example.com/page1",
      "markdown": "# Page 1\n\nContent of page 1.",
      "metadata": {
        "title": "Page 1",
        "url": "https://example.com/page1",
        "status_code": 200,
        "credits_used": 1
      }
    },
    {
      "success": true,
      "url": "https://example.com/page2",
      "markdown": "# Page 2\n\nContent of page 2.",
      "metadata": {
        "title": "Page 2",
        "url": "https://example.com/page2",
        "status_code": 200,
        "credits_used": 1
      }
    }
  ]
}
```

Status Values
| Status | Description |
|---|---|
| processing | Job is still running. Poll again to check progress. |
| completed | All URLs have been scraped. Results are ready. |
| failed | The job encountered a fatal error. |
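A typical consumer polls until the job leaves the "processing" state. The sketch below is one way to structure that loop; it takes any status-fetching callable (e.g. get_batch_scrape_status from the examples above), so here it is demonstrated with a stub rather than a live API call. The poll_until_done name, interval, and timeout are illustrative choices, not part of the client:

```python
import time

def poll_until_done(get_status, job_id, interval=2.0, timeout=120.0):
    """Poll a batch job until its status is "completed" or "failed".

    `get_status` is any callable taking a job_id and returning an object
    with a `status` attribute.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status.status in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"batch job {job_id} still processing after {timeout}s")

# Demo with a stub that reports "processing" twice, then "completed".
class _StubClient:
    def __init__(self):
        self.calls = 0

    def get_batch_scrape_status(self, job_id):
        self.calls += 1
        class _Status:
            pass
        s = _Status()
        s.status = "processing" if self.calls < 3 else "completed"
        return s

stub = _StubClient()
result = poll_until_done(stub.get_batch_scrape_status,
                         "507f1f77bcf86cd799439011", interval=0.01)
print(result.status)  # "completed"
```

In production you would likely widen the interval (or back off exponentially) to avoid burning requests on long-running jobs.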