Getting Started with Webhooks
Webhooks let ScrapeNest notify your application when a scraping job completes or fails, so you do not need to poll the API.
How it Works
- You provide a URL (e.g., https://api.yoursite.com/webhooks/scrapenest).
- We send an HTTP POST request to that URL whenever an event occurs.
- You process the payload to retrieve data or update your system.
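The flow above can be sketched as a minimal receiver built on Python's standard library. This is an illustration only: it assumes the payload is a JSON body, and the names `handle_post`, `WebhookHandler`, `make_server`, and `received_events` are hypothetical, not part of ScrapeNest. It skips signature verification, which a real endpoint should perform first.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory stand-in for "retrieve data or update your system".
received_events = []


def handle_post(raw_body: bytes) -> int:
    """Parse one webhook body and return the HTTP status to send back."""
    try:
        payload = json.loads(raw_body)  # assumption: the payload is JSON
    except ValueError:  # malformed JSON or undecodable bytes
        return 400
    received_events.append(payload)
    return 200


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender.
        length = int(self.headers.get("Content-Length", 0))
        status = handle_post(self.rfile.read(length))
        self.send_response(status)
        self.end_headers()


def make_server(port: int = 8000) -> HTTPServer:
    """Build (but do not start) a local server for the webhook URL."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Calling `make_server().serve_forever()` starts the receiver locally; in production you would put it behind HTTPS and your usual web framework.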
Setting Up
1. Create an Endpoint
You can register a webhook endpoint via the Dashboard or the API.
Dashboard:
- Go to Developers > Webhooks.
- Click Add Endpoint.
- Enter your URL and select the events you want to subscribe to (e.g., job.completed).
API:
curl -X POST "https://api.scrapenest.com/api/v1/webhook-endpoints" \
-H "X-API-Key: YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://api.yoursite.com/webhooks",
"enabled_events": ["job.completed", "artifact.ready"]
}'
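The same request can be built from Python. The sketch below mirrors the curl call using only the standard library; the helper name `build_create_endpoint_request` is hypothetical, and the shape of the API response is not covered here, so the example stops at constructing the request.

```python
import json
import urllib.request

API_URL = "https://api.scrapenest.com/api/v1/webhook-endpoints"


def build_create_endpoint_request(api_key, url, enabled_events):
    """Build the POST request that registers a webhook endpoint."""
    body = json.dumps({"url": url, "enabled_events": enabled_events}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_endpoint_request(
    "YOUR_KEY",
    "https://api.yoursite.com/webhooks",
    ["job.completed", "artifact.ready"],
)
```

Passing `request` to `urllib.request.urlopen` sends it; that call is left out so the sketch has no network side effects.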
2. Save your Secret
When you create an endpoint, we will display a Signing Secret (starts with whsec_).
Save this immediately. You will use it to verify signatures.
Delivery Semantics
- At-Least-Once Delivery: We guarantee at least one delivery attempt for every event. In rare cases, you might receive the same event more than once. Use the event_id to handle duplicates (idempotency).
- Retries: If your server returns a non-2xx status code or times out, we will retry delivery with exponential backoff (up to 3 days).
- Timeouts: Your server must respond within 10 seconds. If processing takes longer, verify the signature, queue the work asynchronously, and respond with 202 Accepted immediately.
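One way to handle duplicates is to record each event_id in a table with a uniqueness constraint and process an event only when the insert succeeds. The sketch below uses SQLite for brevity; the table name `processed_events`, the helper names, and the `evt_…` id format are all hypothetical.

```python
import sqlite3


def open_event_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a store that remembers which event ids were already seen."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)"
    )
    conn.commit()
    return conn


def claim_event(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True the first time an event_id is seen, False for duplicates."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
        (event_id,),
    )
    conn.commit()
    # rowcount is 1 when the row was inserted and 0 when it already existed.
    return cur.rowcount == 1


store = open_event_store()
first = claim_event(store, "evt_1")   # new event: process it
repeat = claim_event(store, "evt_1")  # redelivery: skip it
```

Because the check and the write are a single statement, two concurrent deliveries of the same event cannot both claim it.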
Enterprise Reliability
ScrapeNest's webhook infrastructure is built for high availability and strict security.
- Source IP Verification: All webhook requests originate from our static IP ranges. You can allowlist these IPs in your firewall to ensure only ScrapeNest can reach your endpoint.
- Payload Signing: Every request includes a Svix-Signature header. Verifying this signature is mandatory for production environments to reject forged or replayed requests.
- Temporal Orchestration: Webhook delivery is orchestrated via Temporal, ensuring that retries are managed deterministically even during partial system outages.
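Signature verification can be sketched as follows, assuming ScrapeNest follows the standard Svix scheme. That scheme is an assumption here: the signed content is `{id}.{timestamp}.{body}`, the key is the base64-decoded part of the secret after `whsec_`, and the header carries space-separated `v1,<base64 signature>` entries. The companion Svix-Id and Svix-Timestamp headers, the five-minute tolerance, and the function name are likewise assumptions rather than documented ScrapeNest behavior.

```python
import base64
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 5 * 60  # assumed replay window


def verify_signature(secret, msg_id, timestamp, body: bytes, signature_header, now=None) -> bool:
    """Check a Svix-style HMAC-SHA256 signature over id, timestamp, and body."""
    try:
        ts = int(timestamp)
    except (TypeError, ValueError):
        return False
    current = time.time() if now is None else now
    if abs(current - ts) > TOLERANCE_SECONDS:
        return False  # too old or too far in the future: possible replay

    # The signing key is the base64 payload after the "whsec_" prefix.
    encoded_key = secret[len("whsec_"):] if secret.startswith("whsec_") else secret
    key = base64.b64decode(encoded_key)

    signed_content = f"{msg_id}.{timestamp}.".encode("utf-8") + body
    expected = base64.b64encode(
        hmac.new(key, signed_content, hashlib.sha256).digest()
    )

    # The header may list several signatures; accept any matching v1 entry.
    for entry in signature_header.split():
        version, _, candidate = entry.partition(",")
        if version == "v1" and hmac.compare_digest(candidate.encode("utf-8"), expected):
            return True
    return False
```

Always verify against the raw request bytes, before any JSON parsing or re-serialization, because even a whitespace change alters the signature. If an official Svix library is available for your language, prefer it over hand-rolled verification.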
High-Volume Processing
For organizations processing thousands of jobs per hour, we recommend the following best practices:
- Acknowledge Immediately: Return a 202 Accepted response as soon as you have validated the signature. Do not perform heavy processing (like downloading large artifacts) synchronously within the webhook request.
- Use an Internal Queue: Push the webhook payload into an internal queue (e.g., RabbitMQ, SQS, Redis) for asynchronous processing.
- Idempotency: Always check the event_id against your local database before processing to handle the At-Least-Once Delivery guarantee.
- Bulk Management: If you expect extreme bursts, consider using our API to poll for events in batches as a fallback.
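The acknowledge-then-queue pattern can be sketched with an in-process queue and a worker thread. This is a stand-in for a durable broker such as RabbitMQ, SQS, or Redis; the names `accept_webhook`, `worker`, and `processed`, and the 401 response for a bad signature, are assumptions made for illustration.

```python
import json
import queue
import threading

work_queue = queue.Queue()
processed = []  # stand-in for the results of heavy processing


def accept_webhook(raw_body: bytes, signature_ok: bool) -> int:
    """Do the minimum inside the request: validate, enqueue, acknowledge."""
    if not signature_ok:
        return 401  # assumed status for a failed signature check
    work_queue.put(raw_body)
    return 202


def worker() -> None:
    """Drain the queue outside the request path; None is the stop signal."""
    while True:
        raw_body = work_queue.get()
        try:
            if raw_body is None:
                return
            event = json.loads(raw_body)
            processed.append(event)  # replace with artifact downloads, DB writes, etc.
        finally:
            work_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()
status = accept_webhook(b'{"event_id": "evt_1"}', signature_ok=True)
work_queue.join()     # wait until the worker has handled the event
work_queue.put(None)  # ask the worker to stop
thread.join()
```

An in-memory queue loses events if the process crashes after acknowledging, so a production setup would enqueue to a durable broker before returning 202.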