Job Parameters
Every scraping job is a single JSON payload sent to POST /api/v1/jobs (or passed as keyword arguments to the Python SDK). This page is the complete reference for that payload, grouped by area, with types, defaults, and which worker tier honors each option.
Two required fields, everything else optional
The only required parameters are job_type and target_url. Start minimal, then add options as you need them.
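As a minimal sketch, the smallest valid payload can be assembled and serialized with nothing but the Python standard library. The helper names `build_minimal_job` and `build_request` are hypothetical (they are not part of the SDK), and the request is only constructed here, not sent. The endpoint and the `X-API-Key` header follow the curl examples on this page.

```python
import json
import urllib.request

# Endpoint taken from the curl examples on this page.
API_URL = "https://api.scrapenest.com/api/v1/jobs"


def build_minimal_job(job_type: str, target_url: str) -> dict:
    """Return a payload containing only the two required fields (hypothetical helper)."""
    return {"job_type": job_type, "target_url": target_url}


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Wrap the payload in a POST request object; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


payload = build_minimal_job("light", "https://example.com")
request = build_request(payload, api_key="sn_live_...")  # placeholder key
print(request.get_method(), request.full_url)
print(json.dumps(payload))
```

Optional parameters can then be added to the same dictionary before it is serialized.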
Core (all tiers)
| Parameter | Type | Default | Description |
|---|---|---|---|
| job_type | string | — required | Worker tier: light, standard, or stealth. (http and browser are deprecated aliases for light and stealth.) |
| target_url | string (URI) | — required | The absolute URL to scrape. Must be http/https and pass SSRF safety checks. |
| timeout_ms | integer | 30000 | Overall job timeout in milliseconds. Range 1000–300000. |
| tags | string[] | [] | Free-form labels for filtering and grouping (e.g. ["env:prod", "batch:nightly"]). |
| idempotency_token | string | — | Re-submitting with the same token returns the original job instead of creating a duplicate. See Idempotency. |
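One way to get a stable idempotency_token is to derive it from the job's own content, so that a retried submission of the same job carries the same token. The sketch below is an assumption about client-side usage, not something the API requires: the helper name, the `namespace` argument, the SHA-256 choice, and the 32-character token length are all illustrative.

```python
import hashlib
import json


def derive_idempotency_token(payload: dict, namespace: str = "nightly") -> str:
    """Hash the canonical JSON form of the payload (hypothetical client-side scheme)."""
    # sort_keys and fixed separators make the serialization independent of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{namespace}:{canonical}".encode("utf-8")).hexdigest()
    return digest[:32]


job = {"job_type": "light", "target_url": "https://example.com", "tags": ["env:prod"]}
# The token is computed from the payload before the token field itself is added.
job["idempotency_token"] = derive_idempotency_token(job)
print(job["idempotency_token"])
```

A random UUID stored alongside the job on the client side would work equally well; the content hash simply avoids having to persist the token between retries.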
HTTP options (Light tier)
These apply to job_type: "light", which uses a high-performance HTTP engine with TLS impersonation.
| Parameter | Type | Default | Description |
|---|---|---|---|
| method | string | GET | HTTP method: GET, POST, PUT, PATCH, DELETE, HEAD. |
| headers | object | — | Request headers to send to the target, e.g. {"Accept": "text/html"}. |
| body | string or null | — | Request body for POST/PUT/PATCH. |
| follow_redirects | boolean | true | Whether to follow 3xx redirects. |
| user_agent | string | engine default | Override the User-Agent header. |
| retry_policy | object | engine default | Per-job retry rules; see below. |
retry_policy shape:
| Field | Type | Description |
|---|---|---|
| max_attempts | integer | Maximum attempts (including the first). |
| retry_on_status | integer[] | HTTP status codes that trigger a retry, e.g. [429, 500, 502]. |
| retry_methods | string[] | Methods eligible for retry, e.g. ["GET"]. |
curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "light",
    "target_url": "https://api.example.com/products",
    "method": "POST",
    "headers": {"Accept": "application/json"},
    "body": "{\"query\": \"widgets\"}",
    "retry_policy": {"max_attempts": 3, "retry_on_status": [429, 500, 502], "retry_methods": ["GET", "POST"]}
  }'
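To make the three retry_policy fields concrete, here is one plausible reading of how they combine into a retry decision. This is an illustrative sketch only: the engine's actual retry logic (including any backoff) is not described on this page, and the function name `should_retry` is hypothetical.

```python
def should_retry(policy: dict, method: str, status: int, attempts_made: int) -> bool:
    """Return True if another attempt is allowed under this reading of retry_policy."""
    # max_attempts counts the first attempt, so 3 means at most two retries.
    if attempts_made >= policy.get("max_attempts", 1):
        return False
    # Only methods listed in retry_methods are eligible.
    if method.upper() not in policy.get("retry_methods", []):
        return False
    # Only the listed status codes trigger a retry.
    return status in policy.get("retry_on_status", [])


policy = {"max_attempts": 3, "retry_on_status": [429, 500, 502], "retry_methods": ["GET", "POST"]}
print(should_retry(policy, "POST", 429, attempts_made=1))  # True
print(should_retry(policy, "POST", 429, attempts_made=3))  # False
```

Listing POST in retry_methods, as the curl example does, is only safe when the target endpoint tolerates repeated submissions.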
Browser options (Standard & Stealth tiers)
These apply to job_type: "standard" and job_type: "stealth", which render the page in a real browser.
| Parameter | Type | Default | Description |
|---|---|---|---|
| headless | boolean | true | Run the browser headless. |
| browser | string | chromium | Engine: chromium, camoufox, or firefox. Stealth jobs default to the hardened engine. |
| wait_until | string | load | When navigation is considered complete: load, domcontentloaded, networkidle, commit. |
| navigation_timeout_ms | integer | 30000 | Per-navigation timeout in milliseconds. Range 1000–300000. |
| viewport | object | engine default | {"width": <320–7680>, "height": <240–4320>}. |
| locale | string | engine default | BCP-47 locale, e.g. en-US, fr-FR. |
| timezone_id | string | engine default | IANA timezone, e.g. UTC, Europe/Paris. |
| user_agent | string | engine default | Override the browser User-Agent. |
| headers | object | — | Extra HTTP headers for the navigation. |
| actions | object[] | [] | Scripted interactions performed after load; see below. |
curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "standard",
    "target_url": "https://example.com/dashboard",
    "wait_until": "networkidle",
    "viewport": {"width": 1920, "height": 1080},
    "locale": "fr-FR",
    "timezone_id": "Europe/Paris"
  }'
Actions
actions is an ordered list of interactions executed after the page loads — clicks, typing, scrolling, and waits. Use them to dismiss banners, expand content, or drive a multi-step flow before capture.
"actions": [
  {"type": "click", "selector": "#accept-cookies"},
  {"type": "fill", "selector": "input[name=q]", "value": "widgets"},
  {"type": "scroll", "direction": "down"},
  {"type": "wait", "timeout_ms": 1000}
]
See Scrape a JavaScript SPA for worked examples.
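Because actions run in order and a malformed step is easy to miss in a long list, a quick local check before submission can help. The sketch below is hypothetical: the field sets in `EXAMPLE_FIELDS` are inferred only from the four action types shown in the example above and are not an official schema, so other action types or optional fields may exist.

```python
# Fields inferred from the example on this page; not an official schema.
EXAMPLE_FIELDS = {
    "click": {"selector"},
    "fill": {"selector", "value"},
    "scroll": {"direction"},
    "wait": {"timeout_ms"},
}


def check_actions(actions: list) -> list:
    """Return human-readable problems; an empty list means every step looks complete."""
    problems = []
    for index, action in enumerate(actions):
        kind = action.get("type")
        if kind not in EXAMPLE_FIELDS:
            problems.append(f"actions[{index}]: unknown type {kind!r}")
            continue
        # Report each expected field that the step does not provide.
        for field in sorted(EXAMPLE_FIELDS[kind] - set(action)):
            problems.append(f"actions[{index}]: {kind} is missing {field!r}")
    return problems


actions = [
    {"type": "click", "selector": "#accept-cookies"},
    {"type": "fill", "selector": "input[name=q]", "value": "widgets"},
    {"type": "scroll", "direction": "down"},
    {"type": "wait", "timeout_ms": 1000},
]
print(check_actions(actions))
```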
Stealth & anti-blocking options (Stealth tier)
These apply to job_type: "stealth", the hardened browser tier for protected targets. See Anti-Blocking and Work with protected targets.
| Parameter | Type | Default | Description |
|---|---|---|---|
| os_name | string | engine default | Spoof the OS profile: windows, macos, linux. |
| browser_extensions | string[] | [] | Pre-installed helper extensions by slug, e.g. ublock (ad/tracker blocking), isdcac (consent handling). |
| proxy | object | managed pool | Custom egress: {"server": "http://host:port", "bypass": "*.internal"}. Omit to use ScrapeNest's managed proxy pool. |
Artifact options
artifact_options controls which outputs a job produces and how the browser loads the page. Set only what you need — fewer artifacts means faster, cheaper jobs. See Artifacts & Extraction.
| Option | Type | Default | Tiers | Description |
|---|---|---|---|---|
| include_html | boolean | true | all | Save the final rendered HTML. |
| include_response_body | boolean | false | all | Save the raw HTTP response body. |
| include_extraction | boolean | false | all | Run extraction hooks and save results as JSON. |
| include_screenshot | boolean | false | browser | Capture a PNG screenshot. |
| include_har | boolean | false | browser | Capture a full network HAR trace. |
| include_console | boolean | false | browser | Capture browser console logs. |
| block_images | boolean | false | browser | Block image loading (faster, cheaper). |
| block_stylesheets | boolean | false | browser | Block CSS. |
| block_media | boolean | false | browser | Block audio/video. |
| block_fonts | boolean | false | browser | Block web fonts. |
| no_scroll | boolean | false | browser | Disable auto-scroll before screenshot/capture. |
Response metadata (HTTP status, response headers, timing) is always recorded in the job manifest — there is no separate metadata option or artifact.
curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "stealth",
    "target_url": "https://example.com",
    "artifact_options": {"include_html": true, "include_screenshot": true, "block_images": true, "block_fonts": true}
  }'
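Since several artifact options are marked as browser-only in the Tiers column, it can be useful to flag them locally when they are enabled on a light job. The sketch below assumes that "browser" in that column means the standard and stealth tiers. How the API itself treats such options on a light job (ignoring or rejecting them) is not stated on this page, and the function name is hypothetical.

```python
# Options whose Tiers column reads "browser" in the table above.
BROWSER_ONLY_OPTIONS = {
    "include_screenshot", "include_har", "include_console",
    "block_images", "block_stylesheets", "block_media", "block_fonts", "no_scroll",
}
# Assumption: "browser" refers to the two tiers that render in a real browser.
BROWSER_TIERS = {"standard", "stealth"}


def browser_only_options_in_use(job_type: str, artifact_options: dict) -> list:
    """List enabled browser-only options when the job does not run on a browser tier."""
    if job_type in BROWSER_TIERS:
        return []
    return sorted(
        name for name, enabled in artifact_options.items()
        if enabled and name in BROWSER_ONLY_OPTIONS
    )


options = {"include_html": True, "include_screenshot": True, "block_images": True}
print(browser_only_options_in_use("light", options))    # ['block_images', 'include_screenshot']
print(browser_only_options_in_use("stealth", options))  # []
```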
Extraction
extraction.hooks pulls structured data out of the page using CSS selectors, regular expressions, or JSONPath — no post-processing on your side. This is documented in full, with limits and the result format, in Artifacts & Extraction → Extraction hooks.
"extraction": {
  "hooks": [
    {"hook_id": "title", "type": "css", "selector": "h1"},
    {"hook_id": "price", "type": "css", "selector": ".price", "attribute": "data-amount"},
    {"hook_id": "sku", "type": "regex", "pattern": "SKU-(\\d+)", "all_matches": true}
  ]
}
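Before submitting a regex hook, it can help to try the pattern locally against a saved copy of the page. The sketch below runs the sku pattern from the example above through Python's `re` module on made-up sample HTML. It is only a local approximation: this page does not say which regex engine the service uses, nor whether a hook with a capture group returns the full match or the captured group, so both are printed.

```python
import re

# The JSON string "SKU-(\\d+)" decodes to the regex SKU-(\d+).
PATTERN = "SKU-(\\d+)"

# Made-up sample markup standing in for a fetched page.
sample_html = "<li>SKU-1001</li><li>SKU-2040</li>"

matches = list(re.finditer(PATTERN, sample_html))
full_matches = [m.group(0) for m in matches]  # the whole matched text
captured = [m.group(1) for m in matches]      # only the first capture group

print(full_matches)  # ['SKU-1001', 'SKU-2040']
print(captured)      # ['1001', '2040']
```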
Validation & errors
The API validates the payload before accepting a job. Invalid parameters return 422 with a machine-readable error code naming the offending field — for example invalid_wait_until, invalid_viewport_width, invalid_method, invalid_timeout_ms. Fix the parameter and resubmit; these are not retryable. See Errors & Retries.
Enforced constraints worth remembering:
- timeout_ms and navigation_timeout_ms: 1000–300000.
- viewport.width: 320–7680; viewport.height: 240–4320.
- wait_until ∈ {load, domcontentloaded, networkidle, commit}.
- method ∈ {GET, POST, PUT, PATCH, DELETE, HEAD}.
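These constraints are simple enough to mirror on the client, which catches a 422 before the request is sent. The following is a rough sketch rather than the API's own validator. The codes invalid_timeout_ms, invalid_viewport_width, invalid_wait_until, and invalid_method come from this page; invalid_navigation_timeout_ms and invalid_viewport_height are guessed by analogy. Requiring both width and height whenever viewport is given is also an assumption.

```python
ALLOWED_WAIT_UNTIL = {"load", "domcontentloaded", "networkidle", "commit"}
ALLOWED_METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE", "HEAD"}


def in_range(value, low: int, high: int) -> bool:
    """True for a real integer (not a bool) within the inclusive range."""
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def validate_job(payload: dict) -> list:
    """Return error codes for the constraints listed above; an empty list means none apply."""
    errors = []
    if not in_range(payload.get("timeout_ms", 30000), 1000, 300000):
        errors.append("invalid_timeout_ms")
    if not in_range(payload.get("navigation_timeout_ms", 30000), 1000, 300000):
        errors.append("invalid_navigation_timeout_ms")  # assumed code name
    viewport = payload.get("viewport")
    if viewport is not None:
        if not in_range(viewport.get("width"), 320, 7680):
            errors.append("invalid_viewport_width")
        if not in_range(viewport.get("height"), 240, 4320):
            errors.append("invalid_viewport_height")  # assumed code name
    if payload.get("wait_until", "load") not in ALLOWED_WAIT_UNTIL:
        errors.append("invalid_wait_until")
    if payload.get("method", "GET") not in ALLOWED_METHODS:
        errors.append("invalid_method")
    return errors


job = {
    "job_type": "standard",
    "target_url": "https://example.com",
    "wait_until": "idle",
    "viewport": {"width": 100, "height": 1080},
}
print(validate_job(job))  # ['invalid_viewport_width', 'invalid_wait_until']
```

The server remains the source of truth; a local check like this only shortens the feedback loop for the most common mistakes.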
Next steps
- Artifacts & Extraction — what comes back and how to read it.
- Worker Tiers — pick the right engine.
- Guides — these parameters in real recipes.