
Job Parameters

Every scraping job is a single JSON payload sent to POST /api/v1/jobs (or passed as keyword arguments to the Python SDK). This page is the complete reference for that payload, grouped by area, with types, defaults, and which worker tier honors each option.

Two required fields, everything else optional

The only required parameters are job_type and target_url. Start minimal, then add options as you need them.

Core (all tiers)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| job_type | string | required | Worker tier: light, standard, or stealth. (http and browser are deprecated aliases for light and stealth.) |
| target_url | string (URI) | required | The absolute URL to scrape. Must be http/https and pass SSRF safety checks. |
| timeout_ms | integer | 30000 | Overall job timeout in milliseconds. Range 1000–300000. |
| tags | string[] | [] | Free-form labels for filtering and grouping (e.g. ["env:prod", "batch:nightly"]). |
| idempotency_token | string | | Re-submitting with the same token returns the original job instead of creating a duplicate. See Idempotency. |

Python SDK:

client.scrape_sync(
    job_type="light",
    target_url="https://example.com",
    timeout_ms=45000,
    tags=["env:prod"],
)

cURL:

curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{"job_type": "light", "target_url": "https://example.com", "timeout_ms": 45000, "tags": ["env:prod"]}'
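The core fields can also be assembled and checked before submission. The sketch below is one possible approach, not part of the ScrapeNest SDK: it maps the deprecated aliases from the table onto the current tier names and derives a stable idempotency_token from the job's content, so that a retried submission carries the same token. The helper names and the SHA-256 derivation are assumptions made for illustration.

```python
import hashlib
import json

# Deprecated aliases named in the Core table, mapped to current tier names.
DEPRECATED_ALIASES = {"http": "light", "browser": "stealth"}
VALID_TIERS = {"light", "standard", "stealth"}


def normalize_job_type(job_type):
    """Translate a deprecated alias to its tier name and reject unknown values."""
    tier = DEPRECATED_ALIASES.get(job_type, job_type)
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown job_type: {job_type!r}")
    return tier


def derive_idempotency_token(payload):
    """Hypothetical scheme: SHA-256 over the canonical JSON form of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_core_payload(job_type, target_url, timeout_ms=30000, tags=None):
    """Assemble the core fields and attach a content-derived token (an assumption)."""
    payload = {
        "job_type": normalize_job_type(job_type),
        "target_url": target_url,
        "timeout_ms": timeout_ms,
        "tags": list(tags or []),
    }
    payload["idempotency_token"] = derive_idempotency_token(payload)
    return payload


job = build_core_payload("http", "https://example.com", timeout_ms=45000, tags=["env:prod"])
print(json.dumps(job, indent=2))
```

A content-derived token means two intentionally identical jobs would resolve to the same job. If repeats are wanted, include a distinguishing value (a batch ID or date, for instance) in the hashed content.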

HTTP options (Light tier)

These apply to job_type: "light", which uses a high-performance HTTP engine with TLS impersonation.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| method | string | GET | HTTP method: GET, POST, PUT, PATCH, DELETE, HEAD. |
| headers | object | | Request headers to send to the target, e.g. {"Accept": "text/html"}. |
| body | string or null | | Request body for POST/PUT/PATCH. |
| follow_redirects | boolean | true | Whether to follow 3xx redirects. |
| user_agent | string | engine default | Override the User-Agent header. |
| retry_policy | object | engine default | Per-job retry rules; see the retry_policy shape below. |

retry_policy shape:

| Field | Type | Description |
| --- | --- | --- |
| max_attempts | integer | Maximum attempts (including the first). |
| retry_on_status | integer[] | HTTP status codes that trigger a retry, e.g. [429, 500, 502]. |
| retry_methods | string[] | Methods eligible for retry, e.g. ["GET"]. |

Python SDK:

client.scrape_sync(
    job_type="light",
    target_url="https://api.example.com/products",
    method="POST",
    headers={"Accept": "application/json"},
    body='{"query": "widgets"}',
    retry_policy={"max_attempts": 3, "retry_on_status": [429, 500, 502], "retry_methods": ["GET", "POST"]},
)

cURL:

curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "light",
    "target_url": "https://api.example.com/products",
    "method": "POST",
    "headers": {"Accept": "application/json"},
    "body": "{\"query\": \"widgets\"}",
    "retry_policy": {"max_attempts": 3, "retry_on_status": [429, 500, 502], "retry_methods": ["GET", "POST"]}
  }'
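To make the three retry_policy fields concrete, the sketch below shows one plausible reading of how they combine: a request is retried only while attempts remain, the method is listed in retry_methods, and the response status is listed in retry_on_status. This illustrates the semantics described in the table and is not ScrapeNest's engine code. The function name should_retry and the treatment of missing fields are assumptions.

```python
def should_retry(policy, method, status, attempts_made):
    """Decide whether another attempt is allowed under a retry_policy dict.

    attempts_made counts attempts already performed, including the first,
    matching the table's note that max_attempts includes the first attempt.
    """
    # Assumption: a missing max_attempts means a single attempt, no retries.
    if attempts_made >= policy.get("max_attempts", 1):
        return False
    # Assumption: method names are compared case-insensitively.
    eligible = {m.upper() for m in policy.get("retry_methods", [])}
    if method.upper() not in eligible:
        return False
    return status in policy.get("retry_on_status", [])


policy = {"max_attempts": 3, "retry_on_status": [429, 500, 502], "retry_methods": ["GET", "POST"]}
print(should_retry(policy, "POST", 429, attempts_made=1))  # True
print(should_retry(policy, "POST", 429, attempts_made=3))  # False
```

Listing POST in retry_methods means the target may receive the same request body more than once, so it is worth confirming the target endpoint tolerates repeats before enabling it.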

Browser options (Standard & Stealth tiers)

These apply to job_type: "standard" and job_type: "stealth", which render the page in a real browser.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| headless | boolean | true | Run the browser headless. |
| browser | string | chromium | Engine: chromium, camoufox, or firefox. Stealth jobs default to the hardened engine. |
| wait_until | string | load | When navigation is considered complete: load, domcontentloaded, networkidle, commit. |
| navigation_timeout_ms | integer | 30000 | Per-navigation timeout in milliseconds. Range 1000–300000. |
| viewport | object | engine default | {"width": <320–7680>, "height": <240–4320>}. |
| locale | string | engine default | BCP-47 locale, e.g. en-US, fr-FR. |
| timezone_id | string | engine default | IANA timezone, e.g. UTC, Europe/Paris. |
| user_agent | string | engine default | Override the browser User-Agent. |
| headers | object | | Extra HTTP headers for the navigation. |
| actions | object[] | [] | Scripted interactions performed after load; see Actions below. |

Python SDK:

client.scrape_sync(
    job_type="standard",
    target_url="https://example.com/dashboard",
    wait_until="networkidle",
    viewport={"width": 1920, "height": 1080},
    locale="fr-FR",
    timezone_id="Europe/Paris",
)

cURL:

curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "standard",
    "target_url": "https://example.com/dashboard",
    "wait_until": "networkidle",
    "viewport": {"width": 1920, "height": 1080},
    "locale": "fr-FR",
    "timezone_id": "Europe/Paris"
  }'

Actions

actions is an ordered list of interactions executed after the page loads — clicks, typing, scrolling, and waits. Use them to dismiss banners, expand content, or drive a multi-step flow before capture.

"actions": [
  {"type": "click", "selector": "#accept-cookies"},
  {"type": "fill", "selector": "input[name=q]", "value": "widgets"},
  {"type": "scroll", "direction": "down"},
  {"type": "wait", "timeout_ms": 1000}
]

See Scrape a JavaScript SPA for worked examples.
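When the action list is built programmatically, small constructor functions help keep the order and field names consistent. The sketch below covers only the four action shapes shown above. The helper names are hypothetical, and other action types or fields may exist that this page does not list.

```python
import json


def click(selector):
    """Click the first element matching a CSS selector."""
    return {"type": "click", "selector": selector}


def fill(selector, value):
    """Type a value into the input matching a CSS selector."""
    return {"type": "fill", "selector": selector, "value": value}


def scroll(direction="down"):
    """Scroll the page in the given direction."""
    return {"type": "scroll", "direction": direction}


def wait(timeout_ms):
    """Pause for a fixed number of milliseconds."""
    return {"type": "wait", "timeout_ms": timeout_ms}


# Actions run in list order, so dismiss the banner before interacting with the page.
actions = [
    click("#accept-cookies"),
    fill("input[name=q]", "widgets"),
    scroll("down"),
    wait(1000),
]

payload = {
    "job_type": "standard",
    "target_url": "https://example.com/search",  # placeholder URL
    "actions": actions,
}
print(json.dumps(payload, indent=2))
```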

Stealth & anti-blocking options (Stealth tier)

These apply to job_type: "stealth", the hardened browser tier for protected targets. See Anti-Blocking and Work with protected targets.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| os_name | string | engine default | Spoof the OS profile: windows, macos, linux. |
| browser_extensions | string[] | [] | Pre-installed helper extensions by slug, e.g. ublock (ad/tracker blocking), isdcac (consent handling). |
| proxy | object | managed pool | Custom egress: {"server": "http://host:port", "bypass": "*.internal"}. Omit to use ScrapeNest's managed proxy pool. |

Python SDK:

client.scrape_sync(
    job_type="stealth",
    target_url="https://protected.example.com",
    os_name="macos",
    browser_extensions=["ublock", "isdcac"],
)

cURL:

curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "stealth",
    "target_url": "https://protected.example.com",
    "os_name": "macos",
    "browser_extensions": ["ublock", "isdcac"]
  }'
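The examples above rely on the managed proxy pool. As a hedged sketch of the custom-egress case, the helper below adds a proxy object only when a server is supplied, so that omitting it leaves the key out of the payload entirely. The function name, the proxy host, and the assumption that bypass is optional are illustrative and not confirmed by this reference.

```python
import json

OS_PROFILES = {"windows", "macos", "linux"}


def build_stealth_payload(target_url, os_name=None, browser_extensions=None,
                          proxy_server=None, proxy_bypass=None):
    """Build a stealth-tier payload, including only the options that were set."""
    payload = {"job_type": "stealth", "target_url": target_url}
    if os_name is not None:
        if os_name not in OS_PROFILES:
            raise ValueError(f"os_name must be one of {sorted(OS_PROFILES)}")
        payload["os_name"] = os_name
    if browser_extensions:
        payload["browser_extensions"] = list(browser_extensions)
    if proxy_server is not None:
        # Shape taken from the table: {"server": ..., "bypass": ...}.
        proxy = {"server": proxy_server}
        if proxy_bypass is not None:  # assumption: bypass may be left out
            proxy["bypass"] = proxy_bypass
        payload["proxy"] = proxy
    return payload


managed = build_stealth_payload("https://protected.example.com", os_name="macos")
custom = build_stealth_payload(
    "https://protected.example.com",
    os_name="macos",
    proxy_server="http://proxy.example.net:8080",  # placeholder host
    proxy_bypass="*.internal",
)
print(json.dumps(custom, indent=2))
```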

Artifact options

artifact_options controls which outputs a job produces and how the browser loads the page. Set only what you need: fewer artifacts mean faster, cheaper jobs. See Artifacts & Extraction.

| Option | Type | Default | Tiers | Description |
| --- | --- | --- | --- | --- |
| include_html | boolean | true | all | Save the final rendered HTML. |
| include_response_body | boolean | false | all | Save the raw HTTP response body. |
| include_extraction | boolean | false | all | Run extraction hooks and save results as JSON. |
| include_screenshot | boolean | false | browser | Capture a PNG screenshot. |
| include_har | boolean | false | browser | Capture a full network HAR trace. |
| include_console | boolean | false | browser | Capture browser console logs. |
| block_images | boolean | false | browser | Block image loading (faster, cheaper). |
| block_stylesheets | boolean | false | browser | Block CSS. |
| block_media | boolean | false | browser | Block audio/video. |
| block_fonts | boolean | false | browser | Block web fonts. |
| no_scroll | boolean | false | browser | Disable auto-scroll before screenshot/capture. |

Response metadata (HTTP status, response headers, timing) is always recorded in the job manifest — there is no separate metadata option or artifact.

Python SDK:

client.scrape_sync(
    job_type="stealth",
    target_url="https://example.com",
    artifact_options={
        "include_html": True,
        "include_screenshot": True,
        "block_images": True,
        "block_fonts": True,
    },
)

cURL:

curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "stealth",
    "target_url": "https://example.com",
    "artifact_options": {"include_html": true, "include_screenshot": true, "block_images": true, "block_fonts": true}
  }'
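The Tiers column matters when one set of artifact options is reused across job types. This page does not say whether browser-only options sent with a light job are ignored or rejected, so the sketch below takes the cautious route and drops them client-side. The function name is hypothetical; the option names and tier assignments are copied from the table above.

```python
# Option names and tier scopes as listed in the artifact options table.
ALL_TIER_OPTIONS = {"include_html", "include_response_body", "include_extraction"}
BROWSER_ONLY_OPTIONS = {
    "include_screenshot", "include_har", "include_console",
    "block_images", "block_stylesheets", "block_media", "block_fonts",
    "no_scroll",
}
BROWSER_TIERS = {"standard", "stealth"}


def artifact_options_for_tier(job_type, options):
    """Return a copy of options limited to those the given tier is documented to honor."""
    unknown = set(options) - ALL_TIER_OPTIONS - BROWSER_ONLY_OPTIONS
    if unknown:
        raise ValueError(f"unknown artifact options: {sorted(unknown)}")
    if job_type in BROWSER_TIERS:
        return dict(options)
    # Light tier: keep only the options marked "all" in the table.
    return {k: v for k, v in options.items() if k in ALL_TIER_OPTIONS}


shared = {"include_html": True, "include_screenshot": True, "block_images": True}
print(artifact_options_for_tier("light", shared))    # {'include_html': True}
print(artifact_options_for_tier("stealth", shared))  # all three options kept
```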

Extraction

extraction.hooks pulls structured data out of the page using CSS selectors, regular expressions, or JSONPath — no post-processing on your side. This is documented in full, with limits and the result format, in Artifacts & Extraction → Extraction hooks.

"extraction": {
  "hooks": [
    {"hook_id": "title", "type": "css", "selector": "h1"},
    {"hook_id": "price", "type": "css", "selector": ".price", "attribute": "data-amount"},
    {"hook_id": "sku", "type": "regex", "pattern": "SKU-(\\d+)", "all_matches": true}
  ]
}
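The doubled backslash in the regex hook above is JSON string escaping: the pattern the service receives is SKU-(\d+). In Python the same hooks can be written with a raw string and serialized with json.dumps, which adds the escaping automatically. The sketch below also tries the pattern locally with the standard re module. That local check only exercises the pattern itself; the service's regex flavor and result format are documented elsewhere and may differ. The sample text and the hook_id uniqueness check are assumptions.

```python
import json
import re

hooks = [
    {"hook_id": "title", "type": "css", "selector": "h1"},
    {"hook_id": "price", "type": "css", "selector": ".price", "attribute": "data-amount"},
    # Raw string: a single backslash here becomes "\\d" in the JSON text.
    {"hook_id": "sku", "type": "regex", "pattern": r"SKU-(\d+)", "all_matches": True},
]

# Assumption: hook_id values should be unique so results can be told apart.
hook_ids = [h["hook_id"] for h in hooks]
if len(hook_ids) != len(set(hook_ids)):
    raise ValueError("duplicate hook_id")

body = json.dumps({"extraction": {"hooks": hooks}})
print(body)

# Local sanity check of the pattern against made-up sample text.
sample = "In stock: SKU-1042, SKU-77"
local_matches = re.findall(hooks[2]["pattern"], sample)
print(local_matches)  # ['1042', '77']
```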

Validation & errors

The API validates the payload before accepting a job. Invalid parameters return 422 with a machine-readable error code naming the offending field — for example invalid_wait_until, invalid_viewport_width, invalid_method, invalid_timeout_ms. Fix the parameter and resubmit; these are not retryable. See Errors & Retries.

Enforced constraints worth remembering:

  • timeout_ms and navigation_timeout_ms: 1000–300000.
  • viewport.width: 320–7680; viewport.height: 240–4320.
  • wait_until ∈ {load, domcontentloaded, networkidle, commit}.
  • method ∈ {GET, POST, PUT, PATCH, DELETE, HEAD}.
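Because these errors are not retryable, it can save a round trip to check the same constraints before submitting. The sketch below is a client-side pre-flight check built only from the limits listed above. It is not the API's validator, and the server remains the source of truth. The codes invalid_navigation_timeout_ms and invalid_viewport_height are assumed by analogy with the four codes this page names, and the function name preflight is hypothetical.

```python
WAIT_UNTIL = {"load", "domcontentloaded", "networkidle", "commit"}
METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE", "HEAD"}


def _in_range(value, low, high):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def preflight(payload):
    """Return a list of error codes for constraint violations found locally."""
    errors = []
    if "timeout_ms" in payload and not _in_range(payload["timeout_ms"], 1000, 300000):
        errors.append("invalid_timeout_ms")
    if "navigation_timeout_ms" in payload and not _in_range(
        payload["navigation_timeout_ms"], 1000, 300000
    ):
        errors.append("invalid_navigation_timeout_ms")  # assumed code name
    viewport = payload.get("viewport")
    if isinstance(viewport, dict):
        if "width" in viewport and not _in_range(viewport["width"], 320, 7680):
            errors.append("invalid_viewport_width")
        if "height" in viewport and not _in_range(viewport["height"], 240, 4320):
            errors.append("invalid_viewport_height")  # assumed code name
    if "wait_until" in payload and payload["wait_until"] not in WAIT_UNTIL:
        errors.append("invalid_wait_until")
    # Assumption: method names are matched exactly as listed (uppercase).
    if "method" in payload and payload["method"] not in METHODS:
        errors.append("invalid_method")
    return errors


bad = {
    "job_type": "standard",
    "target_url": "https://example.com",
    "timeout_ms": 500,
    "wait_until": "idle",
    "viewport": {"width": 100, "height": 1080},
}
print(preflight(bad))
```

An empty list means none of the locally known constraints were violated. It does not guarantee acceptance, since the API also applies checks (such as the SSRF safety checks on target_url) that this sketch does not attempt.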

Next steps