Skip to content

Scrape a JavaScript SPA

Goal: scrape a single-page app where content is rendered client-side after the initial HTML loads.

A light (HTTP) job only sees the empty shell of a SPA. Use a browser tier (standard or stealth) and tell ScrapeNest when the page is ready with wait_until, then optionally drive it with actions.

Minimal example

Wait until the network goes idle (all XHR/fetch finished), then capture the rendered HTML:

from scrapenest import ScrapeNestClient

client = ScrapeNestClient(api_key="sn_live_...", base_url="https://api.scrapenest.com")

result = client.scrape_sync(
    job_type="standard",
    target_url="https://example.com/app",
    wait_until="networkidle",
)
curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "standard",
    "target_url": "https://example.com/app",
    "wait_until": "networkidle"
  }'

Choosing wait_until

Value Ready when Use for
commit The server response starts. Fastest; rarely enough for SPAs.
domcontentloaded Initial HTML parsed. Static/server-rendered pages.
load All initial resources loaded (default). Most pages.
networkidle No network activity for a short window. SPAs and lazy-loaded content.

If networkidle never settles (e.g. polling/websockets keep the network busy), fall back to load and add an explicit wait action.

Interacting before capture

Use actions to dismiss banners, expand sections, search, or click through steps before the page is captured:

client.scrape_sync(
    job_type="standard",
    target_url="https://example.com/app",
    wait_until="networkidle",
    actions=[
        {"type": "click", "selector": "#accept-cookies"},
        {"type": "fill", "selector": "input[name=q]", "value": "widgets"},
        {"type": "click", "selector": "button[type=submit]"},
        {"type": "wait", "timeout_ms": 1500},
        {"type": "scroll", "direction": "down"},
    ],
    artifact_options={"include_html": True, "include_screenshot": True},
)
curl -X POST "https://api.scrapenest.com/api/v1/jobs" \
  -H "X-API-Key: sn_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "job_type": "standard",
    "target_url": "https://example.com/app",
    "wait_until": "networkidle",
    "actions": [
      {"type": "click", "selector": "#accept-cookies"},
      {"type": "fill", "selector": "input[name=q]", "value": "widgets"},
      {"type": "click", "selector": "button[type=submit]"},
      {"type": "wait", "timeout_ms": 1500},
      {"type": "scroll", "direction": "down"}
    ],
    "artifact_options": {"include_html": true, "include_screenshot": true}
  }'

Extract directly

Combine rendering with extraction hooks to get JSON straight out of a rendered SPA:

client.scrape_sync(
    job_type="standard",
    target_url="https://example.com/app",
    wait_until="networkidle",
    artifact_options={"include_extraction": True},
    extraction={"hooks": [{"hook_id": "items", "type": "css", "selector": ".result", "all_matches": True}]},
)

Troubleshooting

  • Content still missing? The data may load only after interaction — add a click/scroll action and a short wait.
  • navigation_timeout? Raise navigation_timeout_ms, or loosen wait_until from networkidle to load.
  • Blocked or challenged? Escalate to stealth — see Work with protected targets.

See also