# Python SDK

The official `scrapenest` Python SDK wraps the ScrapeNest API so you can submit jobs, wait for results, inspect artifacts, and download artifact bytes without writing HTTP plumbing. If you can write five lines of Python, you can scrape a page.

!!! note "Prefer raw HTTP?"
    Everything the SDK does maps 1:1 to the REST API. Every example on this page shows the equivalent `curl` so you can port it to any language. See the API Reference.

## Install

```bash
pip install scrapenest
```

Requires Python 3.10+. The only runtime dependency is `httpx`.

## Your first scrape

Create a client with your API key, then call `scrape_sync` to submit a job and block until it finishes:

=== "Python"

    ```python
    from scrapenest import ScrapeNestClient

    client = ScrapeNestClient(
        api_key="sn_live_...",
        base_url="https://api.scrapenest.com",
    )

    result = client.scrape_sync(
        job_type="light",
        target_url="https://example.com",
    )

    print(result.status)          # "succeeded"
    print(result.artifact_count)  # 2
    ```

=== "curl"

    ```bash
    # 1. Submit the job
    curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
      -H "X-API-Key: sn_live_..." \
      -H "Content-Type: application/json" \
      -d '{"job_type": "light", "target_url": "https://example.com"}'

    # 2. Poll until status is "succeeded" or "failed"
    curl "https://api.scrapenest.com/api/v1/jobs/JOB_ID?include_download_urls=true" \
      -H "X-API-Key: sn_live_..."
    ```

That's it. `scrape_sync` handles submission and polling for you and raises if the job fails.

!!! note "`base_url` is the API host"
    Pass the host only: `https://api.scrapenest.com`. The SDK appends `/api/v1/...` itself. If you omit `base_url`, it defaults to production.
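
To make the host-only rule concrete, here is a minimal sketch of how a host and a versioned path can be joined. `build_url` is a hypothetical helper written for this illustration, not part of the SDK, and the SDK's real joining logic may differ.

```python
def build_url(base_url: str, path: str) -> str:
    """Join an API host with a versioned path (illustrative helper, not SDK code)."""
    # Strip a trailing slash from the host and a leading slash from the path
    # so exactly one slash separates them.
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


# Host only: the versioned prefix appears once.
print(build_url("https://api.scrapenest.com", "/api/v1/jobs"))

# In this sketch, a base_url that already ends in /api/v1 duplicates the prefix.
print(build_url("https://api.scrapenest.com/api/v1", "/api/v1/jobs"))
```

The second call shows why a `base_url` that already includes the versioned prefix is likely to produce a wrong path.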

## Configuring the client

```python
client = ScrapeNestClient(
    api_key="sn_live_...",                  # required; from Console → Developer → API Keys
    base_url="https://api.scrapenest.com",  # optional; this is the default
    timeout=30.0,                           # per-request HTTP timeout in seconds
    verify=True,                            # TLS verification; keep enabled outside local dev
)
```

The client holds a connection pool. Reuse one instance for the lifetime of your process, and close it when done:

```python
client.close()

# ...or use it as a context manager:
with ScrapeNestClient(api_key="sn_live_...", base_url="https://api.scrapenest.com") as client:
    result = client.scrape_sync(job_type="light", target_url="https://example.com")
```

## Two ways to run a job

### `scrape_sync`: submit and wait

Best for scripts and request/response workflows where you want the result inline. It submits the job, polls until completion, and returns a `ScrapeResult`.

```python
result = client.scrape_sync(
    job_type="stealth",
    target_url="https://protected.example.com",
    timeout=60,                 # max seconds to wait (1–120, default 30)
    raise_on_failure=True,      # raise ScrapeJobFailed if the job fails (default)
)
```

`ScrapeResult` fields:

| Field | Type | Description |
| --- | --- | --- |
| `job_id` | `str` | The job identifier. Use it to fetch artifacts or correlate logs. |
| `status` | `"succeeded" \| "failed"` | Terminal status. |
| `failure_reason` | `str \| None` | Populated when `status == "failed"` (e.g. `timeout`, `stealth_blocked`). |
| `artifact_count` | `int` | Number of artifacts produced. |
| `completed_at` | `datetime` | When the job finished. |
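
If you would rather branch on the result than catch an exception, the fields above are enough to do so. The sketch below assumes that `raise_on_failure=False` returns a failed `ScrapeResult` instead of raising; that is inferred from the parameter's comment and not confirmed here. `describe_result` is a hypothetical helper, and the stand-in objects and their values are invented so the example runs without the SDK.

```python
from types import SimpleNamespace


def describe_result(result) -> str:
    """Summarize a ScrapeResult-like object using its documented fields."""
    if result.status == "succeeded":
        return f"{result.job_id}: succeeded with {result.artifact_count} artifact(s)"
    # failure_reason is only populated for failed jobs.
    return f"{result.job_id}: failed ({result.failure_reason or 'unknown reason'})"


# Stand-in objects (made-up values) in place of real SDK results.
ok = SimpleNamespace(job_id="job_1", status="succeeded", failure_reason=None, artifact_count=2)
bad = SimpleNamespace(job_id="job_2", status="failed", failure_reason="timeout", artifact_count=0)

print(describe_result(ok))
print(describe_result(bad))
```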

!!! note "Sync waits cap at 120 seconds"
    `scrape_sync(timeout=...)` accepts up to 120 seconds. For long-running stealth jobs or high throughput, use `scrape_async` plus webhooks instead of holding a connection open.

### `create_job` / `scrape_async`: submit and move on

Best for high throughput, long jobs, or fan-out. `create_job` returns immediately with a `CreateJobResponse` carrying the `job_id`; you collect the result later by polling `jobs.get` or by handling a `job.completed` webhook. `scrape_async` is an alias for the same behavior; it is not an `async`/`await` coroutine.

=== "Python"

    ```python
    created = client.create_job(
        job_type="light",
        target_url="https://example.com",
        tags=["batch:nightly"],
    )
    print(created.job_id, created.status)  # "...", "queued"

    # Later: fetch the full job and its artifacts
    job = client.jobs.get(created.job_id)
    print(job.status)  # "queued" | "running" | "succeeded" | "failed"
    for artifact in job.artifacts:
        print(artifact.artifact_type, artifact.artifact_id)
    ```

=== "curl"

    ```bash
    curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
      -H "X-API-Key: sn_live_..." \
      -H "Content-Type: application/json" \
      -d '{"job_type": "light", "target_url": "https://example.com", "tags": ["batch:nightly"]}'

    curl "https://api.scrapenest.com/api/v1/jobs/JOB_ID" \
      -H "X-API-Key: sn_live_..."
    ```
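
When you poll instead of using webhooks, the loop usually looks like the sketch below. `wait_for_job` is a hypothetical helper, not an SDK function; it relies only on `client.jobs.get(job_id)` and the `status` values shown above. The interval and maximum wait are arbitrary defaults, and `FakeJobs` is a stand-in so the example runs without the SDK or network access.

```python
import time
from types import SimpleNamespace

TERMINAL_STATUSES = {"succeeded", "failed"}


def wait_for_job(client, job_id, poll_interval=2.0, max_wait=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll client.jobs.get(job_id) until the job reaches a terminal status."""
    deadline = clock() + max_wait
    while True:
        job = client.jobs.get(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.status!r} after {max_wait}s")
        sleep(poll_interval)


class FakeJobs:
    """Stand-in for client.jobs that replays a fixed sequence of statuses."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def get(self, job_id):
        return SimpleNamespace(job_id=job_id, status=next(self._statuses))


fake_client = SimpleNamespace(jobs=FakeJobs(["queued", "running", "succeeded"]))
job = wait_for_job(fake_client, "job_123", sleep=lambda _seconds: None)
print(job.status)
```

With the real client you would pass `client` and `created.job_id`. For large batches, webhooks avoid the repeated requests that polling incurs.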

## Passing job options

Any keyword argument you pass to `scrape_sync`, `scrape_async`, or `jobs.create` is sent straight through as a job parameter. This is how you reach the full power of the platform: rendering controls, screenshots, proxies, and built-in extraction.

```python
result = client.scrape_sync(
    job_type="stealth",
    target_url="https://example.com/listings",
    os_name="macos",
    wait_until="networkidle",
    viewport={"width": 1920, "height": 1080},
    artifact_options={
        "include_html": True,
        "include_screenshot": True,
    },
    extraction={
        "hooks": [
            {"hook_id": "title", "type": "css", "selector": "h1"},
            {"hook_id": "prices", "type": "css", "selector": ".price", "all_matches": True},
        ]
    },
)
```

See the Job Parameters reference for every accepted option and which worker tier supports it.
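
To picture what "sent straight through" means, the sketch below merges keyword options into a request body. It assumes options become top-level keys of the JSON payload, as the `tags` field does in the curl example earlier on this page. `build_job_payload` is a hypothetical helper for illustration, not the SDK's implementation.

```python
import json


def build_job_payload(job_type: str, target_url: str, **options) -> dict:
    """Merge the required fields and pass-through options into one JSON body."""
    return {"job_type": job_type, "target_url": target_url, **options}


payload = build_job_payload(
    "stealth",
    "https://example.com/listings",
    wait_until="networkidle",
    artifact_options={"include_html": True, "include_screenshot": True},
)

# This is the body you would send with `curl -d` when porting a call to raw HTTP.
print(json.dumps(payload, indent=2))
```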

## Reading artifacts

A finished job produces one or more artifacts (HTML, screenshot, extracted JSON, metadata). Fetch the job to list them, then use the artifact helper to download bytes or text:

=== "Python"

    ```python
    job = client.jobs.get(result.job_id)

    html_artifact = next(a for a in job.artifacts if a.artifact_type == "html")
    html = client.artifacts.download_text(html_artifact.artifact_id)
    ```

=== "curl"

    ```bash
    # 1. Get the presigned URL
    curl "https://api.scrapenest.com/api/v1/artifacts/ARTIFACT_ID/download" \
      -H "X-API-Key: sn_live_..."
    # → {"download_url": "https://...", "expires_at": "..."}

    # 2. Download the bytes
    curl -L "PRESIGNED_DOWNLOAD_URL" -o result.html
    ```
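
A bare `next(...)` raises `StopIteration` when the job produced no artifact of the requested type. A small lookup that returns `None` instead can make that case easier to handle. `find_artifact` is a hypothetical helper, not part of the SDK. The stand-in job and the `"metadata"` and `"screenshot"` type strings are placeholders for illustration; only `"html"` appears in the example above.

```python
from types import SimpleNamespace


def find_artifact(job, artifact_type: str):
    """Return the first artifact of the given type, or None if there is none."""
    return next((a for a in job.artifacts if a.artifact_type == artifact_type), None)


# Stand-in job (made-up IDs) in place of the result of client.jobs.get(...).
job = SimpleNamespace(artifacts=[
    SimpleNamespace(artifact_type="html", artifact_id="art_1"),
    SimpleNamespace(artifact_type="metadata", artifact_id="art_2"),
])

html_artifact = find_artifact(job, "html")
screenshot = find_artifact(job, "screenshot")

if html_artifact is not None:
    print("html artifact:", html_artifact.artifact_id)
if screenshot is None:
    print("no screenshot artifact on this job")
```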

!!! note "Presigned URLs are short-lived"
    Download URLs expire (default ~15 minutes). Request a fresh one each time you need the bytes; never cache the URL itself. You can also receive a ready-to-use URL on the [`artifact.ready` webhook](../webhooks/events.md).

### Artifact helper methods

```python
download = client.artifacts.get_download_url("ARTIFACT_ID", ttl_seconds=600)
content = client.artifacts.download_bytes("ARTIFACT_ID")
text = client.artifacts.download_text("ARTIFACT_ID")
```

`download_bytes` and `download_text` first request a presigned URL from the API, then fetch the artifact from object storage.

## Error handling

The SDK raises typed exceptions you can catch precisely:

```python
from scrapenest import (
    ScrapeJobFailed,
    ScrapeJobTimeout,
    ScrapeNestAPIError,
)

try:
    result = client.scrape_sync(job_type="stealth", target_url="https://example.com", timeout=60)
except ScrapeJobFailed as e:
    print("Target blocked or job failed:", e.failure_reason)
except ScrapeJobTimeout as e:
    print("Still running; collect later via webhook. job_id:", e.job_id)
except ScrapeNestAPIError as e:
    print("API rejected the request:", e.status_code, e.body)
```

See Errors & Retries for the full exception hierarchy, status-code mapping, idempotency, and retry guidance.
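
As a rough illustration of retrying around a call, here is a generic exponential-backoff wrapper. `retry_with_backoff` is a hypothetical helper, not an SDK feature. The attempt count and delays are arbitrary, and which exceptions are safe to retry is an assumption you must settle yourself (see Errors & Retries). The demo uses a stand-in function so it runs without the SDK.

```python
import time


def retry_with_backoff(fn, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a listed exception, wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a stand-in function that fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# Record the delays instead of actually sleeping.
outcome = retry_with_backoff(flaky, retry_on=(ConnectionError,), sleep=delays.append)
print(outcome, delays)
```

With the SDK, `fn` could be something like `lambda: client.scrape_sync(...)`, with `retry_on` set to the exception types you consider transient. Retrying a submission can create duplicate jobs unless the request is idempotent, so check the idempotency guidance before wrapping `create_job`.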

## Next steps

- Job Parameters: every option you can pass.
- Guides: copy-paste recipes for common tasks.
- Webhooks: the recommended pattern for `scrape_async` at scale.
- Errors & Retries: handle failures and rate limits cleanly.