Scrape API

Extract content from a single webpage in multiple formats including Markdown, HTML, screenshots, and structured JSON.

The Scrape API extracts content from a single webpage. Provide a URL and receive clean content in the formats you request: Markdown, cleaned HTML, raw HTML, a screenshot, extracted links, image URLs, an AI-generated summary, or structured JSON via schema-based extraction.

Endpoint

POST https://api.octivas.com/api/v1/scrape

Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string (URL) | Yes | | The URL to scrape content from |
| formats | string[] | No | ["markdown"] | Output formats: "markdown", "html", "rawHtml", "screenshot", "links", "json", "images", "summary" |
| schema | object | No | | JSON Schema defining the structure for extraction (requires "json" format) |
| prompt | string | No | | Guidance prompt for structured extraction (requires "json" format) |
| max_age | integer | No | 172800000 | Cache freshness window in milliseconds. Set to 0 to bypass the cache and get fresh content |
| store_in_cache | boolean | No | true | Whether to cache the scrape result for future requests |
| location | object | No | | Geographic settings for the request. See Location Object |
| only_main_content | boolean | No | true | When true, extracts only the primary content area. Set to false to include navbars, footers, sidebars, etc. |
| timeout | integer | No | 30000 | Request timeout in milliseconds |
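
The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Octivas SDK: the helper name `build_scrape_payload` and its validation rules are assumptions derived only from the table above.

```python
from typing import Any, Optional

# Format names copied from the "formats" row of the parameter table.
VALID_FORMATS = {
    "markdown", "html", "rawHtml", "screenshot",
    "links", "json", "images", "summary",
}


def build_scrape_payload(
    url: str,
    formats: Optional[list[str]] = None,
    schema: Optional[dict[str, Any]] = None,
    prompt: Optional[str] = None,
    **options: Any,
) -> dict[str, Any]:
    """Build a request body for POST /api/v1/scrape (hypothetical helper)."""
    # Mirror the documented default so the "json" check below has a list to inspect.
    formats = list(formats) if formats is not None else ["markdown"]

    unknown = [f for f in formats if f not in VALID_FORMATS]
    if unknown:
        raise ValueError(f"unknown formats: {unknown}")

    # The table says schema and prompt both require the "json" format.
    if (schema is not None or prompt is not None) and "json" not in formats:
        raise ValueError('schema and prompt require the "json" format')

    payload: dict[str, Any] = {"url": url, "formats": formats}
    if schema is not None:
        payload["schema"] = schema
    if prompt is not None:
        payload["prompt"] = prompt
    # Remaining documented options (max_age, timeout, ...) pass through unchanged.
    payload.update(options)
    return payload
```

For instance, `build_scrape_payload("https://example.com", formats=["json"], prompt="Extract the title.", timeout=10000)` returns a dictionary that can be serialized as the request body, while passing `schema` without `"json"` in `formats` raises `ValueError` locally.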

Output Formats

| Format | Returns | Description |
| --- | --- | --- |
| markdown | string | Page content converted to clean Markdown |
| html | string | Cleaned HTML content |
| rawHtml | string | Original unprocessed HTML from the page |
| screenshot | string (URL) | URL to a full-page screenshot image stored on content.octivas.com |
| links | string[] | All hyperlinks found on the page |
| json | object | Structured data extracted using schema and/or prompt |
| images | string[] | All image URLs found on the page |
| summary | string | AI-generated summary of the page content |

Location Object

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| country | string | Yes | ISO 3166-1 alpha-2 country code (e.g. "US", "DE", "JP") |
| languages | string[] | No | Preferred languages (e.g. ["en", "de"]). Defaults to ["en"] |

Example Request

cURL:

curl -X POST https://api.octivas.com/api/v1/scrape \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown", "html", "links", "screenshot"]
  }'

JavaScript:

import Octivas from 'octivas';

const client = new Octivas('your_api_key');

const result = await client.scrape({
  url: 'https://example.com',
  formats: ['markdown', 'html', 'links', 'screenshot']
});

console.log(result.markdown);
console.log(result.links);
console.log(result.screenshot); // URL to screenshot image

Python:

import octivas

client = octivas.Client("your_api_key")

result = client.scrape(
    url="https://example.com",
    formats=["markdown", "html", "links", "screenshot"]
)

print(result.markdown)
print(result.links)
print(result.screenshot)  # URL to screenshot image
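
When the SDK is not available, the same call can be made with the Python standard library. This is a sketch under assumptions: the endpoint, the Bearer header, and the JSON content type are taken from the cURL example above, while the function names `make_scrape_request` and `scrape` are hypothetical.

```python
import json
import urllib.request

SCRAPE_URL = "https://api.octivas.com/api/v1/scrape"


def make_scrape_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    return urllib.request.Request(
        SCRAPE_URL,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def scrape(api_key: str, body: dict, timeout_s: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = make_scrape_request(api_key, body)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the headers and body easy to inspect or unit test without network access.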

Response

{
  "success": true,
  "url": "https://example.com/",
  "markdown": "# Example\n\nThis is example content.",
  "html": "<h1>Example</h1><p>This is example content.</p>",
  "raw_html": null,
  "screenshot": "https://content.octivas.com/screenshots/abc123.png",
  "links": [
    "https://example.com/about",
    "https://example.com/contact"
  ],
  "json": null,
  "images": null,
  "summary": null,
  "metadata": {
    "title": "Example Domain",
    "description": "Example website",
    "url": "https://example.com/",
    "status_code": 200,
    "credits_used": 1
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| success | boolean | Whether the request succeeded |
| url | string | The resolved URL that was scraped |
| markdown | string or null | Page content as Markdown (if requested) |
| html | string or null | Page content as cleaned HTML (if requested) |
| raw_html | string or null | Original unprocessed HTML (if requested) |
| screenshot | string or null | URL to the screenshot image (if requested) |
| links | string[] or null | Hyperlinks found on the page (if requested) |
| json | object or null | Structured extraction result (if requested with schema/prompt) |
| images | string[] or null | Image URLs found on the page (if requested) |
| summary | string or null | AI-generated summary (if requested) |
| metadata.title | string | Page title |
| metadata.description | string | Page meta description |
| metadata.url | string | Final URL after redirects |
| metadata.status_code | number | HTTP status code |
| metadata.credits_used | number | Credits consumed by this request |
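
Because every format field is null unless it was requested, client code usually filters for the fields that are actually populated. A minimal sketch, assuming the response has already been parsed into a dictionary; the helper name `returned_formats` is hypothetical, and the shape of an error response is not documented here, so the failure branch only checks `success`.

```python
# Top-level fields that carry a requested format, per the response fields table.
FORMAT_FIELDS = ("markdown", "html", "raw_html", "screenshot",
                 "links", "json", "images", "summary")


def returned_formats(response: dict) -> dict:
    """Return only the format fields that are present and not null."""
    if not response.get("success"):
        # The error body is not specified here, so surface the whole response.
        raise RuntimeError(f"scrape failed: {response!r}")
    return {name: response[name] for name in FORMAT_FIELDS
            if response.get(name) is not None}


# Abbreviated copy of the example response, as a parsed dictionary.
sample = {
    "success": True,
    "url": "https://example.com/",
    "markdown": "# Example\n\nThis is example content.",
    "raw_html": None,
    "links": ["https://example.com/about", "https://example.com/contact"],
    "json": None,
    "metadata": {"title": "Example Domain", "status_code": 200, "credits_used": 1},
}

content = returned_formats(sample)
print(sorted(content))                     # ['links', 'markdown']
print(sample["metadata"]["credits_used"])  # 1
```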

Structured Extraction

Request the json format and supply a schema, a prompt, or both to extract structured data from a page.

cURL:

curl -X POST https://api.octivas.com/api/v1/scrape \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product/123",
    "formats": ["json"],
    "schema": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "currency": { "type": "string" },
        "in_stock": { "type": "boolean" }
      },
      "required": ["name", "price"]
    },
    "prompt": "Extract the product details from this page."
  }'

JavaScript:

const result = await client.scrape({
  url: 'https://example.com/product/123',
  formats: ['json'],
  schema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      price: { type: 'number' },
      currency: { type: 'string' },
      in_stock: { type: 'boolean' }
    },
    required: ['name', 'price']
  },
  prompt: 'Extract the product details from this page.'
});

console.log(result.json);
// { name: "Widget Pro", price: 29.99, currency: "USD", in_stock: true }

Python:

result = client.scrape(
    url="https://example.com/product/123",
    formats=["json"],
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
            "currency": {"type": "string"},
            "in_stock": {"type": "boolean"},
        },
        "required": ["name", "price"],
    },
    prompt="Extract the product details from this page.",
)

print(result.json)
# {"name": "Widget Pro", "price": 29.99, "currency": "USD", "in_stock": True}
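
Extraction output is model-generated, so it can be worth checking the result against the schema before using it. The sketch below is an assumption-laden illustration, not a full JSON Schema validator: the hypothetical `check_extraction` helper only handles flat object schemas like the product example, checking `required` names and primitive `type` values.

```python
# Map JSON Schema primitive type names to Python types.
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_extraction(data: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the flat checks passed."""
    problems = []
    for name in schema.get("required", []):
        if data.get(name) is None:
            problems.append(f"missing required field: {name}")

    for name, spec in schema.get("properties", {}).items():
        value = data.get(name)
        expected = TYPE_MAP.get(spec.get("type"))
        if value is None or expected is None:
            continue  # absent optional field, or a type this sketch does not cover
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_numeric = spec["type"] in ("number", "integer")
        if (is_numeric and isinstance(value, bool)) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {spec['type']}, got {type(value).__name__}"
            )
    return problems


product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["name", "price"],
}

good = {"name": "Widget Pro", "price": 29.99, "currency": "USD", "in_stock": True}
bad = {"name": "Widget Pro", "price": "29.99"}

print(check_extraction(good, product_schema))  # []
print(check_extraction(bad, product_schema))   # ['price: expected number, got str']
```

For nested schemas or keywords such as `enum` and `format`, a dedicated JSON Schema validation library is the better choice.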

Caching

By default, scrape results are cached for 2 days (172,800,000 ms). You can control caching behavior:

  • Set max_age to 0 to always fetch fresh content, bypassing the cache.
  • Set store_in_cache to false to fetch normally without storing the result in the cache.

The following request body does both:

{
  "url": "https://example.com",
  "formats": ["markdown"],
  "max_age": 0,
  "store_in_cache": false
}
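
Since max_age is expressed in milliseconds, it is easy to be off by a factor of 1,000 when writing the value by hand. A small sketch that derives it from a `timedelta`; the helper name `max_age_ms` is hypothetical.

```python
from datetime import timedelta


def max_age_ms(window: timedelta) -> int:
    """Convert a freshness window to whole milliseconds for the max_age parameter."""
    # Dividing one timedelta by another with // yields an int.
    ms = window // timedelta(milliseconds=1)
    if ms < 0:
        raise ValueError("freshness window must not be negative")
    return ms


print(max_age_ms(timedelta(days=2)))   # 172800000, the documented default
print(max_age_ms(timedelta(hours=1)))  # 3600000
print(max_age_ms(timedelta(0)))        # 0, which bypasses the cache
```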

Geo / Locale

Use the location parameter to scrape a page as if the request came from a specific country, with a list of preferred languages:

{
  "url": "https://example.com",
  "formats": ["markdown"],
  "location": {
    "country": "DE",
    "languages": ["de", "en"]
  }
}
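
A location object can be assembled with a light sanity check on the country code. This sketch assumes the API expects an uppercase two-letter code, as in the examples; the helper name `make_location` is hypothetical, and it checks only the shape of the code, not whether the code is actually assigned in ISO 3166-1.

```python
from typing import Optional, Sequence


def make_location(country: str, languages: Optional[Sequence[str]] = None) -> dict:
    """Build the "location" object for a scrape request."""
    code = country.strip().upper()
    # Shape check only: exactly two ASCII letters.
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"not a two-letter country code: {country!r}")

    location: dict = {"country": code}
    if languages:
        # Omit the field otherwise, so the documented default (["en"]) applies.
        location["languages"] = list(languages)
    return location


print(make_location("de", ["de", "en"]))  # {'country': 'DE', 'languages': ['de', 'en']}
print(make_location("JP"))                # {'country': 'JP'}
```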

Batch Scrape

Scrape up to 10 URLs in a single request. Batch jobs run asynchronously: submit the job, then poll for results.

Submit a Batch Job

POST https://api.octivas.com/api/v1/batch/scrape

Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | string[] | Yes | | URLs to scrape (minimum 1, maximum 10) |
| formats | string[] | No | ["markdown"] | Output formats (same options as single scrape) |
| schema | object | No | | JSON Schema for structured extraction |
| prompt | string | No | | Guidance prompt for extraction |
| max_age | integer | No | 172800000 | Cache freshness window in milliseconds |
| store_in_cache | boolean | No | true | Whether to cache results |
| location | object | No | | Geographic settings |
| only_main_content | boolean | No | true | Extract primary content only |
| timeout | integer | No | 30000 | Per-URL timeout in milliseconds |
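
Because a batch accepts at most 10 URLs, a longer list has to be split into several jobs. A minimal sketch of that splitting step; the helper name `chunk_urls` is hypothetical, and submitting each chunk is left to the caller.

```python
from typing import Sequence

MAX_BATCH_URLS = 10  # documented limit per batch request


def chunk_urls(urls: Sequence[str], size: int = MAX_BATCH_URLS) -> list[list[str]]:
    """Split a URL list into consecutive chunks of at most `size` entries."""
    if not 1 <= size <= MAX_BATCH_URLS:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_URLS}")
    return [list(urls[start:start + size]) for start in range(0, len(urls), size)]


urls = [f"https://example.com/page{i}" for i in range(1, 24)]  # 23 URLs
batches = chunk_urls(urls)
print([len(batch) for batch in batches])  # [10, 10, 3]
```

Each chunk can then be passed as the `urls` value of its own batch request.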

Example

cURL:

curl -X POST https://api.octivas.com/api/v1/batch/scrape \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example.com/page1",
      "https://example.com/page2",
      "https://example.com/page3"
    ],
    "formats": ["markdown"]
  }'

JavaScript:

const job = await client.batchScrape({
  urls: [
    'https://example.com/page1',
    'https://example.com/page2',
    'https://example.com/page3'
  ],
  formats: ['markdown']
});

console.log(job.job_id); // "507f1f77bcf86cd799439011"

Python:

job = client.batch_scrape(
    urls=[
        "https://example.com/page1",
        "https://example.com/page2",
        "https://example.com/page3",
    ],
    formats=["markdown"],
)

print(job.job_id)  # "507f1f77bcf86cd799439011"

Response

{
  "success": true,
  "job_id": "507f1f77bcf86cd799439011",
  "status": "processing",
  "total_urls": 3
}

Poll for Results

GET https://api.octivas.com/api/v1/batch/scrape/{job_id}

cURL:

curl https://api.octivas.com/api/v1/batch/scrape/507f1f77bcf86cd799439011 \
  -H "Authorization: Bearer your_api_key"

JavaScript:

const status = await client.getBatchScrapeStatus('507f1f77bcf86cd799439011');

console.log(status.status);    // "completed"
console.log(status.completed); // 3
status.results.forEach(r => console.log(r.url, r.markdown));

Python:

status = client.get_batch_scrape_status("507f1f77bcf86cd799439011")

print(status.status)     # "completed"
print(status.completed)  # 3
for r in status.results:
    print(r.url, r.markdown)

Response

{
  "success": true,
  "job_id": "507f1f77bcf86cd799439011",
  "status": "completed",
  "completed": 3,
  "total": 3,
  "credits_used": 3,
  "results": [
    {
      "success": true,
      "url": "https://example.com/page1",
      "markdown": "# Page 1\n\nContent of page 1.",
      "metadata": {
        "title": "Page 1",
        "url": "https://example.com/page1",
        "status_code": 200,
        "credits_used": 1
      }
    },
    {
      "success": true,
      "url": "https://example.com/page2",
      "markdown": "# Page 2\n\nContent of page 2.",
      "metadata": {
        "title": "Page 2",
        "url": "https://example.com/page2",
        "status_code": 200,
        "credits_used": 1
      }
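    },
    {
      "success": true,
      "url": "https://example.com/page3",
      "markdown": "# Page 3\n\nContent of page 3.",
      "metadata": {
        "title": "Page 3",
        "url": "https://example.com/page3",
        "status_code": 200,
        "credits_used": 1
      }
    }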
  ]
}
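
Each entry in results carries its own success flag, so a completed job can still contain individual failures. The sketch below separates the two cases, assuming the poll response has been parsed into a dictionary; the helper name `split_results` is hypothetical, and the fields present on a failed entry are not documented here, so only `url` is read from it.

```python
def split_results(batch: dict) -> tuple[dict, list]:
    """Return ({url: markdown} for successful entries, [urls of failed entries])."""
    succeeded: dict = {}
    failed: list = []
    for item in batch.get("results", []):
        if item.get("success"):
            succeeded[item["url"]] = item.get("markdown")
        else:
            failed.append(item.get("url"))
    return succeeded, failed


# Illustrative data: one entry shaped like the example response, one failure.
batch = {
    "success": True,
    "status": "completed",
    "results": [
        {"success": True, "url": "https://example.com/page1",
         "markdown": "# Page 1\n\nContent of page 1."},
        {"success": False, "url": "https://example.com/page2"},
    ],
}

pages, failures = split_results(batch)
print(list(pages))  # ['https://example.com/page1']
print(failures)     # ['https://example.com/page2']
```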

Status Values

| Status | Description |
| --- | --- |
| processing | Job is still running. Poll again to check progress. |
| completed | All URLs have been scraped. Results are ready. |
| failed | The job encountered a fatal error. |
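
The three status values map naturally onto a polling loop: keep waiting on processing, return on completed, and raise on failed. A minimal sketch, with several assumptions: `wait_for_batch` is a hypothetical helper, `fetch_status` is any zero-argument callable that returns the parsed JSON body of the poll endpoint, and the interval and timeout defaults are arbitrary rather than recommended values.

```python
import time
from typing import Callable


def wait_for_batch(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job leaves the "processing" state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError(f"batch job {status.get('job_id')} failed")
        if state != "processing":
            raise RuntimeError(f"unexpected job status: {state!r}")
        # Give up if the next poll would land past the deadline.
        if time.monotonic() + interval_s > deadline:
            raise TimeoutError("batch job did not finish in time")
        sleep(interval_s)


# Simulated poll responses stand in for real HTTP calls.
responses = iter([
    {"job_id": "507f1f77bcf86cd799439011", "status": "processing"},
    {"job_id": "507f1f77bcf86cd799439011", "status": "processing"},
    {"job_id": "507f1f77bcf86cd799439011", "status": "completed", "completed": 3},
])
final = wait_for_batch(lambda: next(responses), interval_s=0.01)
print(final["status"], final["completed"])  # completed 3
```

Passing the fetcher and the sleep function as arguments keeps the loop independent of any particular HTTP client and makes it testable without network access.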
