Job Parameters

Every scraping job is a single JSON payload sent to POST /v1/jobs (or passed as keyword arguments to the Python SDK). This page is the complete reference for that payload, grouped by area, with types, defaults, and which worker tier honors each option.

Two required fields, everything else optional

The only required parameters are job_type and target_url. Start minimal, then add options as you need them.

Core (all tiers)

Parameter	Type	Default	Description
`job_type`	string	- required	Worker tier: `light`, `standard`, or `stealth`. These are the only accepted values.
`target_url`	string (URI)	- required	The absolute URL to scrape. Must be `http`/`https` and pass SSRF safety checks.
`timeout_ms`	integer	`30000`	Overall job timeout in milliseconds. Range 1000–300000.
`tags`	string[]	`[]`	Free-form labels for filtering and grouping (e.g. `["env:prod", "batch:nightly"]`).
`idempotency_token`	string	-	Re-submitting with the same token returns the original job instead of creating a duplicate. See Idempotency.

Python:

client.scrape(
    job_type="light",
    target_url="https://example.com",
    timeout_ms=45000,
    tags=["env:prod"],
)

curl:

curl -X POST "https://api.scrapenest.com/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{"job_type": "light", "target_url": "https://example.com", "timeout_ms": 45000, "tags": ["env:prod"]}'
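The idempotency_token row above only says the token is a string. As a minimal sketch, assuming you want retried submissions of the same payload to map to the same job, a caller could derive the token from the payload itself. The `make_idempotency_token` helper and the choice of SHA-256 over canonical JSON are illustrative assumptions, not part of the API.

```python
import hashlib
import json


def make_idempotency_token(payload: dict) -> str:
    """Derive a deterministic token from a job payload (hypothetical helper)."""
    # Sort keys so that logically equal payloads produce the same token.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


job = {
    "job_type": "light",
    "target_url": "https://example.com",
    "timeout_ms": 45000,
    "tags": ["env:prod"],
}

# Attach the token before submitting; a retry that rebuilds the same
# payload will then carry the same token.
job_with_token = {**job, "idempotency_token": make_idempotency_token(job)}
```

If identical payloads should sometimes create separate jobs (for example a nightly batch), mix a batch or date label into the hashed data so each run gets its own token.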

HTTP options (Light tier)

These apply to job_type: "light", which uses a high-performance HTTP engine with TLS impersonation.

Parameter	Type	Default	Description
`method`	string	`GET`	HTTP method: `GET`, `POST`, `PUT`, `PATCH`, `DELETE`, `HEAD`.
`headers`	object	-	Request headers to send to the target, e.g. `{"Accept": "text/html"}`.
`body`	string | null	-	Request body for `POST`/`PUT`/`PATCH`.
`follow_redirects`	boolean	`true`	Whether to follow 3xx redirects.
`max_redirects`	integer	engine default	Maximum redirects to follow when `follow_redirects` is true. Range 0–20.
`user_agent`	string	engine default	Override the User-Agent header.
`retry_policy`	object	engine default	Per-job retry rules - see below.

retry_policy shape:

Field	Type	Description
`max_attempts`	integer	Maximum attempts (including the first).
`retry_on_status`	integer[]	HTTP status codes that trigger a retry, e.g. `[429, 500, 502]`.
`retry_methods`	string[]	Methods eligible for retry, e.g. `["GET"]`.

Python:

client.scrape(
    job_type="light",
    target_url="https://api.example.com/products",
    method="POST",
    headers={"Accept": "application/json"},
    body='{"query": "widgets"}',
    retry_policy={"max_attempts": 3, "retry_on_status": [429, 500, 502], "retry_methods": ["GET", "POST"]},
)

curl:

curl -X POST "https://api.scrapenest.com/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "light",
    "target_url": "https://api.example.com/products",
    "method": "POST",
    "headers": {"Accept": "application/json"},
    "body": "{\"query\": \"widgets\"}",
    "retry_policy": {"max_attempts": 3, "retry_on_status": [429, 500, 502], "retry_methods": ["GET", "POST"]}
  }'
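To make the three retry_policy fields concrete, here is one plausible reading of how they combine into a retry decision. This is a sketch under stated assumptions, not the engine's implementation: the `should_retry` function is hypothetical, the fallback values for missing fields are assumptions (the engine defaults are not given on this page), and backoff between attempts is not modeled.

```python
def should_retry(policy: dict, method: str, status: int, attempt: int) -> bool:
    """Decide whether another attempt is allowed under `policy`.

    `attempt` is the 1-based number of the attempt that just finished.
    """
    # max_attempts counts the first attempt, so attempt 3 of 3 is the last.
    # Assumption: a missing max_attempts means "no retries".
    if attempt >= policy.get("max_attempts", 1):
        return False
    # Only methods listed in retry_methods are eligible.
    if method.upper() not in policy.get("retry_methods", []):
        return False
    # Only the listed status codes trigger a retry.
    return status in policy.get("retry_on_status", [])


policy = {
    "max_attempts": 3,
    "retry_on_status": [429, 500, 502],
    "retry_methods": ["GET", "POST"],
}
```

Listing `POST` in `retry_methods`, as the example above does, means a request with side effects may be sent more than once; include it only when the target tolerates repeated submissions.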

Browser options (Standard & Stealth tiers)

These apply to job_type: "standard" and job_type: "stealth", which render the page in a real browser.

Parameter	Type	Default	Description
`headless`	boolean	`true`	Run the browser headless.
`browser`	string	`chromium`	Engine, determined by the job tier: `chromium` on the Standard tier, the hardened `stealth` engine on the Stealth tier.
`wait_until`	string	`load`	When navigation is considered complete: `load`, `domcontentloaded`, `networkidle`, `commit`.
`navigation_timeout_ms`	integer	`30000`	Per-navigation timeout. Range 1000–300000.
`viewport`	object	engine default	Standard only. `{"width": <320–7680>, "height": <240–4320>}`. Rejected on Stealth with `invalid_viewport_tier` - see the note below.
`locale`	string	engine default	Standard only. BCP-47 locale, e.g. `en-US`, `fr-FR`. Rejected on Stealth with `invalid_locale_tier`.
`timezone_id`	string	engine default	Standard only. IANA timezone, e.g. `UTC`, `Europe/Paris`. Rejected on Stealth with `invalid_timezone_id_tier`.
`user_agent`	string	engine default	Standard only. Override the browser User-Agent. Rejected on Stealth with `invalid_user_agent_tier`.
`headers`	object	-	Extra HTTP headers for the navigation.
`actions`	object[]	`[]`	Scripted interactions performed after load - see below.

The browser profile is a Standard-tier control

viewport, locale, timezone_id and user_agent all describe the browser you appear to be. The Stealth engine generates them together as one self-consistent fingerprint per browser process. Overriding one facet while the others stay unchanged makes the profile internally inconsistent, and that mismatch is exactly what anti-bot systems look for, so Stealth accepts none of these overrides by design.

Sending any of the four with job_type: "stealth" returns 422 (invalid_viewport_tier, invalid_locale_tier, invalid_timezone_id_tier, invalid_user_agent_tier) rather than silently dropping a value you set.

On Stealth, use os_name (windows, macos, linux) to choose which profile to present. Use the Standard tier when you need to control the dimensions, language, clock or User-Agent directly - for example for consistent screenshot framing or a specific locale variant.
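The rule in this note can be checked on the client before a job is submitted. The sketch below maps each Standard-only field to the error code listed above; the `stealth_profile_errors` function is a hypothetical helper, and whether the API reports one code or several for a single request is not stated on this page.

```python
# Field -> 422 error code, as listed in the note above.
STANDARD_ONLY = {
    "viewport": "invalid_viewport_tier",
    "locale": "invalid_locale_tier",
    "timezone_id": "invalid_timezone_id_tier",
    "user_agent": "invalid_user_agent_tier",
}


def stealth_profile_errors(payload: dict) -> list[str]:
    """Return the error codes a Stealth payload would trigger for profile overrides."""
    if payload.get("job_type") != "stealth":
        return []  # Standard accepts all four fields.
    return [code for field, code in STANDARD_ONLY.items() if field in payload]


bad_job = {
    "job_type": "stealth",
    "target_url": "https://protected.example.com",
    "viewport": {"width": 1920, "height": 1080},
    "locale": "fr-FR",
}
problems = stealth_profile_errors(bad_job)
```

For `bad_job`, the fix is either to drop the two fields and pick a profile with `os_name`, or to switch the job to the Standard tier.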

Pythoncurl

client.scrape(
    job_type="standard",
    target_url="https://example.com/dashboard",
    wait_until="networkidle",
    viewport={"width": 1920, "height": 1080},
    locale="fr-FR",
    timezone_id="Europe/Paris",
)

curl -X POST "https://api.scrapenest.com/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "standard",
    "target_url": "https://example.com/dashboard",
    "wait_until": "networkidle",
    "viewport": {"width": 1920, "height": 1080},
    "locale": "fr-FR",
    "timezone_id": "Europe/Paris"
  }'

Actions

actions is an ordered list of interactions executed after the page loads - clicks, typing, scrolling, and waits. Use them to dismiss banners, expand content, or drive a multi-step flow before capture.

"actions": [
  {"type": "click", "selector": "#accept-cookies"},
  {"type": "fill", "selector": "input[name=q]", "value": "widgets"},
  {"type": "scroll", "direction": "down"},
  {"type": "wait", "timeout_ms": 1000}
]

See Scrape a JavaScript SPA for worked examples.

Stealth & anti-blocking options (Stealth tier)

These apply to job_type: "stealth", the hardened browser tier for protected targets. See Anti-Blocking and Work with protected targets.

Parameter	Type	Default	Description
`os_name`	string	engine default	Spoof the OS profile: `windows`, `macos`, `linux`.
`browser_extensions`	string[]	`[]`	Pre-installed helper extensions by slug, e.g. `ublock` (ad/tracker blocking), `isdcac` (consent handling).
`proxy`	object	managed pool	Stealth tier only. Route egress through your own proxy. ScrapeNest chains traffic through it: requests still pass through ScrapeNest first, so the proxy is not a bypass. Shape: `{"server": "http://host:port", "username": "...", "password": "...", "bypass": "*.example.com"}`. `server` must use `http` or `https`, include an explicit port, and resolve to a public address. `username`/`password` and `bypass` are optional. Omit `proxy` to use ScrapeNest's managed pool.

The job manifest records the effective egress under egress (mode: managed or mode: custom with the proxy host, never credentials), so you can confirm which exit a job used.
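The three requirements on proxy.server (http or https scheme, explicit port, public address) can be approximated locally before submitting. This is a rough sketch, not the API's validation: `check_proxy_server` is a hypothetical helper, its messages are made up for illustration, and it only checks the "public address" rule when the host is a literal IP, since resolving hostnames is left to the API.

```python
import ipaddress
from urllib.parse import urlsplit


def check_proxy_server(server: str) -> list[str]:
    """Return a list of problems found in a proxy `server` URL (empty if none)."""
    problems = []
    parts = urlsplit(server)

    if parts.scheme not in ("http", "https"):
        problems.append("scheme must be http or https")

    try:
        port = parts.port  # Raises ValueError for a malformed or out-of-range port.
    except ValueError:
        port = None
    if port is None:
        problems.append("an explicit port is required")

    host = parts.hostname or ""
    if not host:
        problems.append("host is missing")
    else:
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            ip = None  # A hostname, not an IP literal; not resolved here.
        if ip is not None and not ip.is_global:
            problems.append("address is not public")

    return problems
```

For example, `check_proxy_server("http://proxy.example.com:8080")` returns an empty list, while a private address such as `http://10.0.0.5:8080` is flagged as not public.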

Python:

client.scrape(
    job_type="stealth",
    target_url="https://protected.example.com",
    os_name="macos",
    browser_extensions=["ublock", "isdcac"],
)

Python (custom proxy):

client.scrape(
    job_type="stealth",
    target_url="https://protected.example.com",
    proxy={
        "server": "http://proxy.example.com:8080",
        "username": "user",
        "password": "secret",
    },
)

Shell (custom proxy):

curl -X POST "https://api.scrapenest.com/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "stealth",
    "target_url": "https://protected.example.com",
    "proxy": {"server": "http://proxy.example.com:8080", "username": "user", "password": "secret"}
  }'

curl:

curl -X POST "https://api.scrapenest.com/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "stealth",
    "target_url": "https://protected.example.com",
    "os_name": "macos",
    "browser_extensions": ["ublock", "isdcac"]
  }'

Artifact options

artifact_options controls which outputs a job produces and how the browser loads the page. Set only what you need - fewer artifacts means faster, cheaper jobs. See Artifacts & Extraction.

Option	Type	Default	Tiers	Description
`include_html`	boolean	`true`	all	Save the final rendered HTML.
`include_response_body`	boolean	`false`	all	Save the raw HTTP response body.
`include_extraction`	boolean	`false`	all	Run `extraction` hooks and save results as JSON.
`include_screenshot`	boolean	`false`	browser	Capture a PNG screenshot.
`include_har`	boolean	`false`	browser	Capture a full network HAR trace (no response bodies - see Artifacts).
`include_reader`	boolean	`false`	browser	Save article content as Markdown. Empty on pages with no article-like content.
`include_links`	boolean	`false`	browser	Save every `<a href>` as JSON.
`include_text`	boolean	`false`	browser	Save the page's visible text.
`include_page_metadata`	boolean	`false`	browser	Save title/description/canonical/Open Graph/Twitter/JSON-LD as JSON.
`include_console`	boolean	`false`	browser	Capture browser console logs.
`block_images`	boolean	`false`	browser	Block image loading (faster, cheaper).
`block_styles`	boolean	`false`	browser	Block CSS. Also accepted as `block_stylesheets`.
`block_media`	boolean	`false`	browser	Block audio/video.
`block_fonts`	boolean	`false`	browser	Block web fonts.
`no_scroll`	boolean	`false`	browser	Disable auto-scroll before screenshot/capture.

Response metadata (HTTP status, response headers, timing) is always recorded in the job manifest - there is no separate metadata option or artifact.

Python:

client.scrape(
    job_type="stealth",
    target_url="https://example.com",
    artifact_options={
        "include_html": True,
        "include_screenshot": True,
        "block_images": True,
        "block_fonts": True,
    },
)

curl:

curl -X POST "https://api.scrapenest.com/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "stealth",
    "target_url": "https://example.com",
    "artifact_options": {"include_html": true, "include_screenshot": true, "block_images": true, "block_fonts": true}
  }'
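The Tiers column in the table above marks most options as browser-only. This page does not say how the API treats a browser-only option sent with job_type "light", so a caller may want to spot that combination before submitting. A minimal sketch, with `browser_only_enabled` as a hypothetical helper and the option names taken from the table:

```python
# Options marked "browser" in the Tiers column, plus the block_stylesheets alias.
BROWSER_ONLY = frozenset({
    "include_screenshot", "include_har", "include_reader", "include_links",
    "include_text", "include_page_metadata", "include_console",
    "block_images", "block_styles", "block_stylesheets",
    "block_media", "block_fonts", "no_scroll",
})


def browser_only_enabled(job_type: str, artifact_options: dict) -> list[str]:
    """List enabled browser-only options on a non-browser job, sorted by name."""
    if job_type in ("standard", "stealth"):
        return []  # Browser tiers honor every option in the table.
    return sorted(
        name for name, enabled in artifact_options.items()
        if enabled and name in BROWSER_ONLY
    )


flagged = browser_only_enabled(
    "light",
    {"include_html": True, "include_screenshot": True, "block_images": True},
)
```

Here `flagged` names the two options that only apply on the Standard and Stealth tiers; `include_html` is available on all tiers and is not reported.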

Extraction

extraction.hooks pulls structured data out of the page using CSS selectors, regular expressions, or JSONPath - no post-processing on your side. This is documented in full, with limits and the result format, in Artifacts & Extraction → Extraction hooks.

"extraction": {
  "hooks": [
    {"hook_id": "title", "type": "css", "selector": "h1"},
    {"hook_id": "price", "type": "css", "selector": ".price", "attribute": "data-amount"},
    {"hook_id": "sku", "type": "regex", "pattern": "SKU-(\\d+)", "all_matches": true}
  ]
}
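The `sku` hook's pattern is written as `SKU-(\\d+)` because backslashes must be escaped inside a JSON string; the regular expression itself is `SKU-(\d+)`. The sketch below shows the decoding and what that pattern matches using Python's `re` module on made-up page text. It does not reproduce the service's result format, and whether a hook returns the capture group or the whole match is defined in Artifacts & Extraction, not here.

```python
import json
import re

# The JSON text contains two backslashes; decoding yields one.
hook = json.loads(r'{"hook_id": "sku", "type": "regex", "pattern": "SKU-(\\d+)", "all_matches": true}')
pattern = re.compile(hook["pattern"])

# Hypothetical page text for illustration.
page_text = "In stock: SKU-1001, SKU-2002. Discontinued: SKU-3003."

first_match = pattern.search(page_text).group(1)  # first occurrence only
all_matches = pattern.findall(page_text)          # every occurrence, capture group only
```

With this sample text the pattern matches three SKUs, which illustrates the difference between a single-match and an all-matches lookup in Python.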

Validation & errors

The API validates the payload before accepting a job. Invalid parameters return 422 with a machine-readable error code naming the offending field - for example invalid_wait_until, invalid_viewport_width, invalid_method, invalid_timeout_ms. Fix the parameter and resubmit; these are not retryable. See Errors & Retries.

Enforced constraints worth remembering:

timeout_ms and navigation_timeout_ms: 1000–300000.
max_redirects: 0–20.
viewport.width: 320–7680; viewport.height: 240–4320. Standard tier only - viewport on Stealth returns invalid_viewport_tier.
wait_until ∈ {load, domcontentloaded, networkidle, commit}.
method ∈ {GET, POST, PUT, PATCH, DELETE, HEAD}.
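Because these errors are not retryable, it can be worth catching them before the request is sent. The following pre-flight check is a sketch that covers only the constraints whose error codes are named on this page (invalid_timeout_ms, invalid_wait_until, invalid_method, invalid_viewport_width); `preflight` is a hypothetical helper, and the server remains the source of truth.

```python
WAIT_UNTIL = {"load", "domcontentloaded", "networkidle", "commit"}
METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE", "HEAD"}


def _in_range(value, low: int, high: int) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def preflight(payload: dict) -> list[str]:
    """Return the error codes this payload would likely trigger (empty if none)."""
    errors = []
    if "timeout_ms" in payload and not _in_range(payload["timeout_ms"], 1000, 300000):
        errors.append("invalid_timeout_ms")
    if "wait_until" in payload and payload["wait_until"] not in WAIT_UNTIL:
        errors.append("invalid_wait_until")
    if "method" in payload and payload["method"] not in METHODS:
        errors.append("invalid_method")
    viewport = payload.get("viewport")
    if isinstance(viewport, dict) and "width" in viewport:
        if not _in_range(viewport["width"], 320, 7680):
            errors.append("invalid_viewport_width")
    return errors
```

The checks for navigation_timeout_ms, max_redirects and viewport.height are left out because their error codes are not named here; they would follow the same range pattern.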

Next steps

Artifacts & Extraction - what comes back and how to read it.
Worker Tiers - pick the right engine.
Guides - these parameters in real recipes.