Getting Started with Webhooks

Webhooks let ScrapeNest notify your application when a scraping job completes or fails, so you do not have to poll the API for status changes.

How it Works

  1. You provide a URL (e.g., https://api.yoursite.com/webhooks/scrapenest).
  2. We send an HTTP POST request to that URL whenever an event occurs.
  3. You process the payload to retrieve data or update your system.
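
As a rough illustration of step 3, the sketch below shows a minimal receiver built on Python's standard library. The payload schema is not described here, so the code only parses the body as JSON and stores it. The names `process_webhook`, `WebhookHandler`, `make_server`, and `received` are hypothetical, and signature checks are deliberately left out of this first sketch.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for "update your system": parsed events are kept in memory.
received = []


def process_webhook(body: bytes) -> int:
    """Parse one delivery and return the HTTP status code to send back."""
    try:
        event = json.loads(body)
    except ValueError:
        # Malformed JSON or invalid UTF-8: reject the delivery.
        return 400
    received.append(event)
    return 200


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender.
        length = int(self.headers.get("Content-Length", 0))
        status = process_webhook(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()

    def log_message(self, *args):
        # Silence the default per-request logging.
        pass


def make_server(port: int = 8000) -> HTTPServer:
    """Build a local server; the port is an arbitrary choice for this sketch."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Calling `make_server().serve_forever()` starts the receiver locally. In practice the endpoint would sit behind HTTPS at the URL you registered.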

Setting Up

1. Create an Endpoint

You can register a webhook endpoint via the Dashboard or the API.

Dashboard:

  1. Go to Developers > Webhooks.
  2. Click Add Endpoint.
  3. Enter your URL and select the events you want to subscribe to (e.g., job.completed).

API:

curl -X POST "https://api.scrapenest.com/api/v1/webhook-endpoints" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://api.yoursite.com/webhooks",
    "enabled_events": ["job.completed", "artifact.ready"]
}'
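
For teams that register endpoints from application code rather than a shell, the same request can be sketched in Python with the standard library. The URL, header names, and body fields are taken from the curl call above. The helper name `build_create_endpoint_request` is hypothetical, and the response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request

API_URL = "https://api.scrapenest.com/api/v1/webhook-endpoints"


def build_create_endpoint_request(api_key, url, events):
    """Build (but do not send) the POST request that registers an endpoint."""
    body = json.dumps({"url": url, "enabled_events": events}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )
```

Passing the result to `urllib.request.urlopen(...)` sends it. Keeping construction separate from sending makes the request easy to inspect before it goes out.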

2. Save your Secret

When you create an endpoint, we will display a Signing Secret (starts with whsec_).

Save this immediately. You will use it to verify signatures.

Delivery Semantics

  • At-Least-Once Delivery: We guarantee at least one delivery attempt for every event. In rare cases, you may receive the same event more than once. Use the event_id to detect and skip duplicates (idempotency).
  • Retries: If your server returns a non-2xx status code or times out, we retry delivery with exponential backoff for up to 3 days.
  • Timeouts: Your server must respond within 10 seconds. If processing takes longer, verify the signature, queue the work asynchronously, and respond with 202 Accepted immediately.
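
One way to handle duplicate deliveries is to record each event_id in a table with a uniqueness constraint and process an event only when the insert succeeds. The sketch below uses SQLite for brevity. The table name and the helpers `open_store` and `claim_event` are hypothetical, and it assumes you have already extracted the event_id from the delivery.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the table that remembers processed event IDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
    )
    return conn


def claim_event(conn, event_id):
    """Return True the first time an event_id is seen, False for duplicates."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event_id,),
        )
    # rowcount is 1 when the row was inserted, 0 when the primary key already existed.
    return cur.rowcount == 1
```

Because the check and the write are a single statement, two concurrent deliveries of the same event cannot both claim it, which a separate "SELECT then INSERT" would not guarantee.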

Enterprise Reliability

ScrapeNest's webhook infrastructure is built for high availability and strict security.

  • Source IP Verification: All webhook requests originate from our static IP ranges. You can allowlist these IPs in your firewall to ensure only ScrapeNest can reach your endpoint.
  • Payload Signing: Every request includes a Svix-Signature header. Verify this signature on every request in production so that forged or replayed requests are rejected.
  • Temporal Orchestration: Webhook delivery is orchestrated via Temporal, ensuring that retries are managed deterministically even during partial system outages.
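
The sketch below shows what manual signature verification could look like, assuming ScrapeNest follows the standard Svix scheme. That is an assumption based on the header name and the whsec_ prefix, not something this page confirms. Under that scheme the signed content is the message ID, timestamp, and raw body joined by dots, the key is the base64-decoded part of the secret after whsec_, and the header carries one or more space-separated "v1,<base64 signature>" entries. The Svix-Id and Svix-Timestamp headers, the five-minute tolerance, and the function name are likewise assumptions.

```python
import base64
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 5 * 60  # assumed window for rejecting stale deliveries


def verify_signature(secret, msg_id, timestamp, body, signature_header, now=None):
    """Check a Svix-style HMAC-SHA256 signature over 'id.timestamp.body'."""
    prefix = "whsec_"
    encoded_key = secret[len(prefix):] if secret.startswith(prefix) else secret
    key = base64.b64decode(encoded_key)

    # Reject timestamps outside the tolerance window (replay protection).
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    now = time.time() if now is None else now
    if abs(now - sent_at) > TOLERANCE_SECONDS:
        return False

    signed = msg_id.encode() + b"." + timestamp.encode() + b"." + body
    expected = base64.b64encode(hmac.new(key, signed, hashlib.sha256).digest())

    # The header may list several signatures, e.g. during secret rotation.
    for candidate in signature_header.split():
        version, _, value = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(value.encode(), expected):
            return True
    return False
```

Always pass the raw request bytes as `body`. Re-serializing parsed JSON can change whitespace or key order and make a valid signature fail.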

High-Volume Processing

For organizations processing thousands of jobs per hour, we recommend the following best practices:

  1. Acknowledge Immediately: Return a 202 Accepted response as soon as you have validated the signature. Do not perform heavy processing (like downloading large artifacts) synchronously within the webhook request.
  2. Use an Internal Queue: Push the webhook payload into an internal queue (e.g., RabbitMQ, SQS, Redis) for asynchronous processing.
  3. Idempotency: Always check the event_id against your local database before processing, because at-least-once delivery means the same event can arrive more than once.
  4. Bulk Management: If you expect extreme bursts, consider using our API to poll for events in batches as a fallback.
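
Practices 1 and 2 can be sketched together: the request handler only validates and enqueues, while a background worker does the heavy lifting. The example uses Python's in-process `queue.Queue` purely as a stand-in for a durable broker such as RabbitMQ, SQS, or Redis. The names `accept_delivery` and `worker`, and the 401 status for a failed signature check, are choices made for this sketch.

```python
import queue
import threading

jobs = queue.Queue()
STOP = None  # sentinel that tells the worker to exit


def accept_delivery(body, signature_ok):
    """Return the status code to send; enqueue instead of processing inline."""
    if not signature_ok:
        return 401
    jobs.put(body)
    return 202  # acknowledge immediately; the real work happens in the worker


def worker(process):
    """Drain the queue, calling process(body) for each queued delivery."""
    while True:
        body = jobs.get()
        try:
            if body is STOP:
                return
            process(body)
        finally:
            jobs.task_done()
```

An in-memory queue loses pending items if the process crashes, so a production setup would normally swap in a persistent queue and rely on the retry behaviour described earlier for anything that was never acknowledged.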