# Python SDK
The official scrapenest Python SDK wraps the ScrapeNest API so you can submit jobs, wait for results, inspect artifacts, and download artifact bytes without writing HTTP plumbing. If you can write five lines of Python, you can scrape a page.
!!! note "Prefer raw HTTP?"
Everything the SDK does maps 1:1 to the REST API. Where an example on this page includes a curl tab, it shows the equivalent request so you can port it to any language. See the API Reference.
## Install
Requires Python 3.10+. The only runtime dependency is httpx.
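If you want to confirm that an environment meets these requirements before installing, a minimal sketch using only the standard library could look like this. The helper names are illustrative and are not part of the SDK.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated above


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the interpreter version is at least 3.10."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def httpx_is_installed() -> bool:
    """Return True if the httpx package can be located, without importing it."""
    return importlib.util.find_spec("httpx") is not None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("httpx found:", httpx_is_installed())
```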
## Your first scrape
Create a client with your API key, then call `scrape_sync` to submit a job and block until it finishes:
=== "Python"
```python
from scrapenest import ScrapeNestClient

client = ScrapeNestClient(
    api_key="sn_live_...",
    base_url="https://api.scrapenest.com",
)

result = client.scrape_sync(
    job_type="light",
    target_url="https://example.com",
)

print(result.status)          # "succeeded"
print(result.artifact_count)  # 2
```
=== "curl"
```bash
# 1. Submit the job
curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
-H "X-API-Key: sn_live_..." \
-H "Content-Type: application/json" \
-d '{"job_type": "light", "target_url": "https://example.com"}'
# 2. Poll until status is "succeeded" or "failed"
curl "https://api.scrapenest.com/api/v1/jobs/JOB_ID?include_download_urls=true" \
-H "X-API-Key: sn_live_..."
```
That's it. `scrape_sync` handles submission and polling for you and raises if the job fails.
!!! note "`base_url` is the API host"
Pass the host only — `https://api.scrapenest.com`. The SDK appends `/api/v1/...` itself. If you omit `base_url`, it defaults to production.
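To make the host-only rule concrete, here is a small sketch that strips any path from a pasted URL and shows how a versioned path would then be joined on. It illustrates the convention described in the note and does not reproduce the SDK's internals; `normalize_base_url` and `endpoint_url` are hypothetical helper names.

```python
from urllib.parse import urlsplit

API_PREFIX = "/api/v1"  # the path the SDK is described as appending itself


def normalize_base_url(raw: str) -> str:
    """Reduce a URL to scheme://host[:port], dropping any path such as /api/v1."""
    parts = urlsplit(raw.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {raw!r}")
    return f"{parts.scheme}://{parts.netloc}"


def endpoint_url(base_url: str, path: str) -> str:
    """Illustrate how a host-only base_url combines with a versioned path."""
    return f"{normalize_base_url(base_url)}{API_PREFIX}/{path.lstrip('/')}"


print(endpoint_url("https://api.scrapenest.com/api/v1/", "jobs"))
# https://api.scrapenest.com/api/v1/jobs
```

Normalizing first means a value copied from a curl example, path included, still yields a clean host to pass as `base_url`.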
## Configuring the client
```python
client = ScrapeNestClient(
    api_key="sn_live_...",                  # required; from Console → Developer → API Keys
    base_url="https://api.scrapenest.com",  # optional; this is the default
    timeout=30.0,                           # per-request HTTP timeout in seconds
    verify=True,                            # TLS verification; keep enabled outside local dev
)
```

The client holds a connection pool. Reuse one instance for the lifetime of your process, and close it when done:

```python
client.close()

# …or use it as a context manager:
with ScrapeNestClient(api_key="sn_live_...", base_url="https://api.scrapenest.com") as client:
    result = client.scrape_sync(job_type="light", target_url="https://example.com")
```
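In practice you will probably not want the API key hard-coded. One possible approach, sketched below, is to build the constructor arguments from environment variables. The variable names `SCRAPENEST_API_KEY` and `SCRAPENEST_BASE_URL` are assumptions made for this example; nothing on this page says the SDK reads them itself.

```python
import os


def client_settings_from_env(environ=os.environ) -> dict:
    """Build keyword arguments for ScrapeNestClient from environment variables.

    SCRAPENEST_API_KEY and SCRAPENEST_BASE_URL are hypothetical names chosen
    for this sketch; the SDK is not known to read them on its own.
    """
    api_key = environ.get("SCRAPENEST_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("SCRAPENEST_API_KEY is not set")
    settings = {"api_key": api_key, "timeout": 30.0, "verify": True}
    base_url = environ.get("SCRAPENEST_BASE_URL", "").strip()
    if base_url:
        # Only override the default host when one is given explicitly.
        settings["base_url"] = base_url
    return settings
```

The result can be unpacked into the constructor, for example `ScrapeNestClient(**client_settings_from_env())`.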
## Two ways to run a job
### `scrape_sync`: submit and wait
Best for scripts and request/response workflows where you want the result inline. It submits the job, polls until completion, and returns a `ScrapeResult`.
```python
result = client.scrape_sync(
    job_type="stealth",
    target_url="https://protected.example.com",
    timeout=60,             # max seconds to wait (1–120, default 30)
    raise_on_failure=True,  # raise ScrapeJobFailed if the job fails (default)
)
```
`ScrapeResult` fields:

| Field | Type | Description |
|---|---|---|
| `job_id` | `str` | The job identifier; use it to fetch artifacts or correlate logs. |
| `status` | `"succeeded" \| "failed"` | Terminal status. |
| `failure_reason` | `str \| None` | Populated when `status == "failed"` (e.g. `timeout`, `stealth_blocked`). |
| `artifact_count` | `int` | Number of artifacts produced. |
| `completed_at` | `datetime` | When the job finished. |
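
As a small illustration of how these fields fit together, the sketch below turns a result into a one-line log message. It relies only on the attribute names listed above; `summarize` is a hypothetical helper, and the `SimpleNamespace` object stands in for a real `ScrapeResult`.

```python
from types import SimpleNamespace


def summarize(result) -> str:
    """Return a one-line summary built only from the documented fields."""
    if result.status == "succeeded":
        return f"job {result.job_id} succeeded with {result.artifact_count} artifact(s)"
    # failure_reason may be None, so fall back to a generic label.
    reason = result.failure_reason or "unknown reason"
    return f"job {result.job_id} failed: {reason}"


# Stand-in object with the same attribute names as ScrapeResult.
demo = SimpleNamespace(job_id="job_123", status="failed",
                       failure_reason="timeout", artifact_count=0)
print(summarize(demo))  # job job_123 failed: timeout
```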
!!! note "Sync waits cap at 120 seconds"
`scrape_sync(timeout=...)` accepts up to 120 seconds. For long-running stealth jobs or high throughput, use `scrape_async` plus webhooks instead of holding a connection open.
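
If you estimate how long a job will take, the choice between the two modes can be made in code. The following is a rough sketch under that assumption; `plan_wait` is a hypothetical helper, and only the 1 to 120 second range comes from this page.

```python
import math

SYNC_MIN_SECONDS = 1
SYNC_MAX_SECONDS = 120  # documented upper bound for scrape_sync(timeout=...)


def plan_wait(expected_seconds: float):
    """Pick a submission mode and, for sync, a timeout inside the allowed range.

    Returns ("sync", timeout_seconds) or ("async", None).
    """
    if expected_seconds > SYNC_MAX_SECONDS:
        # Too long to hold a connection open: submit now and collect later.
        return ("async", None)
    timeout = max(SYNC_MIN_SECONDS, math.ceil(expected_seconds))
    return ("sync", timeout)


print(plan_wait(45))   # ('sync', 45)
print(plan_wait(300))  # ('async', None)
```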
### `create_job` / `scrape_async`: submit and move on
Best for high throughput, long jobs, or fan-out. `create_job` returns immediately with a `CreateJobResponse` carrying the `job_id`; you collect the result later by polling `jobs.get` or by handling a `job.completed` webhook. `scrape_async` is an alias for the same behavior; it is not an `async`/`await` coroutine.
=== "Python"
```python
created = client.create_job(
    job_type="light",
    target_url="https://example.com",
    tags=["batch:nightly"],
)
print(created.job_id, created.status)  # "...", "queued"

# Later: fetch the full job and its artifacts.
job = client.jobs.get(created.job_id)
print(job.status)  # "queued" | "running" | "succeeded" | "failed"
for artifact in job.artifacts:
    print(artifact.artifact_type, artifact.artifact_id)
```
=== "curl"
```bash
curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
-H "X-API-Key: sn_live_..." \
-H "Content-Type: application/json" \
-d '{"job_type": "light", "target_url": "https://example.com", "tags": ["batch:nightly"]}'
curl "https://api.scrapenest.com/api/v1/jobs/JOB_ID" \
-H "X-API-Key: sn_live_..."
```
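
When webhooks are not available, polling `jobs.get` can be wrapped in a small loop. The sketch below is one way to do that, using only `client.jobs.get` and the four status values shown above. The helper name `wait_for_job` and the `interval` and `max_wait` defaults are assumptions for illustration, not SDK settings.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed"}


def wait_for_job(client, job_id, interval=2.0, max_wait=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll client.jobs.get(job_id) until the job reaches a terminal status.

    Raises TimeoutError if the job is still queued or running once another
    sleep would pass the max_wait deadline.
    """
    deadline = clock() + max_wait
    while True:
        job = client.jobs.get(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {job.status} after {max_wait}s")
        sleep(interval)
```

Calling `wait_for_job(client, created.job_id)` returns the same job object that `client.jobs.get` would. The `sleep` and `clock` parameters exist only so the loop can be tested without real delays.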
## Passing job options
Any keyword argument you pass to `scrape_sync`, `scrape_async`, or `jobs.create` is sent straight through as a job parameter. This is how you reach the full power of the platform — rendering controls, screenshots, proxies, and built-in extraction:
```python
result = client.scrape_sync(
    job_type="stealth",
    target_url="https://example.com/listings",
    os_name="macos",
    wait_until="networkidle",
    viewport={"width": 1920, "height": 1080},
    artifact_options={
        "include_html": True,
        "include_screenshot": True,
    },
    extraction={
        "hooks": [
            {"hook_id": "title", "type": "css", "selector": "h1"},
            {"hook_id": "prices", "type": "css", "selector": ".price", "all_matches": True},
        ]
    },
)
```
See the Job Parameters reference for every accepted option and which worker tier supports it.
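
When a job needs many extraction hooks, building the `extraction` dictionary with small helpers can reduce typos. The sketch below only reproduces the hook shape used in the example above. `css_hook` and `extraction_options` are hypothetical helpers, and the duplicate `hook_id` check is a local sanity check assumed for this example rather than a documented platform rule.

```python
def css_hook(hook_id: str, selector: str, all_matches: bool = False) -> dict:
    """Build one CSS extraction hook in the shape used in the example above."""
    hook = {"hook_id": hook_id, "type": "css", "selector": selector}
    if all_matches:
        hook["all_matches"] = True
    return hook


def extraction_options(*hooks: dict) -> dict:
    """Wrap hooks in an extraction payload, rejecting duplicate hook_id values."""
    ids = [h["hook_id"] for h in hooks]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate hook_id values: {duplicates}")
    return {"hooks": list(hooks)}


extraction = extraction_options(
    css_hook("title", "h1"),
    css_hook("prices", ".price", all_matches=True),
)
```

The resulting dictionary can be passed as the `extraction=` keyword argument in place of the literal shown earlier.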
## Reading artifacts
A finished job produces one or more artifacts (HTML, screenshot, extracted JSON, metadata). Fetch the job to list them, then use the artifact helper to download bytes or text:
=== "Python"
```python
job = client.jobs.get(result.job_id)
html_artifact = next(a for a in job.artifacts if a.artifact_type == "html")
html = client.artifacts.download_text(html_artifact.artifact_id)
```
=== "curl"
```bash
# 1. Get the presigned URL
curl "https://api.scrapenest.com/api/v1/artifacts/ARTIFACT_ID/download" \
-H "X-API-Key: sn_live_..."
# → {"download_url": "https://...", "expires_at": "..."}
# 2. Download the bytes
curl -L "PRESIGNED_DOWNLOAD_URL" -o result.html
```
!!! note "Presigned URLs are short-lived"
Download URLs expire (default ~15 minutes). Request a fresh one each time you need the bytes; never cache the URL itself. You can also receive a ready-to-use URL on the [`artifact.ready` webhook](../webhooks/events.md).
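
If you hold on to a download response for a while before using it, you can check its `expires_at` value first. The sketch below assumes `expires_at` is an ISO 8601 timestamp, which this page does not confirm. `parse_expiry` and `needs_refresh` are hypothetical helpers, and the 30-second safety margin is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone


def parse_expiry(expires_at: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    parsed = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption for this sketch: timestamps without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def needs_refresh(expires_at: str, now=None, margin=timedelta(seconds=30)) -> bool:
    """Return True if the URL has expired or will expire within the margin."""
    current = now or datetime.now(timezone.utc)
    return current + margin >= parse_expiry(expires_at)
```

When `needs_refresh` returns `True`, request a new URL rather than retrying the old one.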
### Artifact helper methods
```python
download = client.artifacts.get_download_url("ARTIFACT_ID", ttl_seconds=600)
content = client.artifacts.download_bytes("ARTIFACT_ID")
text = client.artifacts.download_text("ARTIFACT_ID")
```

`download_bytes` and `download_text` first request a presigned URL from the API, then fetch the artifact from object storage.
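
Putting the helpers together, one way to save every artifact of a job to disk is sketched below. It uses only `client.jobs.get`, `job.artifacts`, and `client.artifacts.download_bytes` as shown on this page. The function name `save_artifacts` and the file naming scheme are assumptions for illustration.

```python
from pathlib import Path


def save_artifacts(client, job_id: str, out_dir) -> list:
    """Download every artifact of a job into out_dir and return the written paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    job = client.jobs.get(job_id)
    written = []
    for artifact in job.artifacts:
        # A generic .bin suffix is used because content types are not assumed here.
        path = target / f"{artifact.artifact_type}-{artifact.artifact_id}.bin"
        path.write_bytes(client.artifacts.download_bytes(artifact.artifact_id))
        written.append(path)
    return written
```

Each call to `download_bytes` requests its own presigned URL, so no URL is reused between artifacts.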
## Error handling
The SDK raises typed exceptions you can catch precisely:
```python
from scrapenest import (
    ScrapeJobFailed,
    ScrapeJobTimeout,
    ScrapeNestAPIError,
)

try:
    result = client.scrape_sync(job_type="stealth", target_url="https://example.com", timeout=60)
except ScrapeJobFailed as e:
    print("Target blocked or job failed:", e.failure_reason)
except ScrapeJobTimeout as e:
    print("Still running; collect later via webhook. job_id:", e.job_id)
except ScrapeNestAPIError as e:
    print("API rejected the request:", e.status_code, e.body)
```
See Errors & Retries for the full exception hierarchy, status-code mapping, idempotency, and retry guidance.
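
For transient failures, a generic retry wrapper with exponential backoff is a common pattern. The sketch below is independent of the SDK: `call_with_retries` is a hypothetical helper, and you decide which exceptions are retryable through the `should_retry` callback. This page does not state which status codes are safe to retry, so treat any rule you plug in as your own assumption.

```python
import random
import time


def call_with_retries(fn, should_retry, attempts=3, base_delay=1.0,
                      sleep=time.sleep):
    """Call fn(); on exceptions accepted by should_retry, back off and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or when the error is not retryable.
            if attempt == attempts or not should_retry(exc):
                raise
            # Exponential backoff (base_delay, 2x, 4x, ...) plus up to 50% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

A call might wrap `client.scrape_sync` in a `lambda` and pass a `should_retry` that inspects `status_code` on a `ScrapeNestAPIError`.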
## Next steps
- Job Parameters: every option you can pass.
- Guides: copy-paste recipes for common tasks.
- Webhooks: the recommended pattern for `scrape_async` at scale.
- Errors & Retries: handle failures and rate limits cleanly.