Data Lifecycle & Retention
Managing the lifecycle of your scraping data matters for both regulatory compliance (for example, GDPR) and storage cost. ScrapeNest lets you control how long your artifacts are stored through organization-level retention policies, with legal holds for data that must be kept longer.
Artifact Management
Every scraping job produces artifacts (HTML, screenshots, logs, and metadata), which are stored in our secure, multi-tenant object storage.
- Content Integrity: Each artifact is hashed (SHA-256) upon creation.
- Access Control: Artifacts are never public. Access is granted via short-lived, presigned URLs authorized by your API key or session.
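Because each artifact is hashed with SHA-256 when it is created, you can check a file downloaded through its presigned URL by recomputing the digest locally and comparing it with the recorded value. This is a minimal sketch, assuming you already have the expected hex digest (how ScrapeNest exposes it is not covered here); `sha256_hex` and `verify_artifact` are hypothetical helper names.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Compare downloaded artifact bytes with an expected SHA-256 hex digest.

    `expected_sha256` is assumed to be the hash recorded at creation time;
    the comparison ignores case and surrounding whitespace.
    """
    return sha256_hex(data) == expected_sha256.strip().lower()


if __name__ == "__main__":
    body = b"<html><body>example</body></html>"
    recorded = sha256_hex(body)  # stand-in for the hash stored at creation
    print(verify_artifact(body, recorded))         # True
    print(verify_artifact(body + b" ", recorded))  # False
```

A mismatch means the bytes you hold differ from what was stored, for example because a download was truncated.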
Retention Policies
By default, artifacts are retained for 30 days before being permanently purged. You can customize this policy at the organization level to align with your internal data retention standards.
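As a rough guide to when data leaves storage, an artifact becomes eligible for purge once its age reaches the retention period. The sketch below assumes the period is counted from the artifact's creation timestamp (in UTC) and that the actual purge may run some time after that instant; neither detail is stated above, and `purge_eligible_at` / `is_expired` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION_DAYS = 30  # default retention period described above


def purge_eligible_at(created_at: datetime, retention_days: int = DEFAULT_RETENTION_DAYS) -> datetime:
    """Earliest instant an artifact may be purged (assumed: creation time + retention)."""
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    return created_at + timedelta(days=retention_days)


def is_expired(created_at: datetime, now: datetime, retention_days: int = DEFAULT_RETENTION_DAYS) -> bool:
    """True once `now` has reached the purge-eligibility instant."""
    return now >= purge_eligible_at(created_at, retention_days)


if __name__ == "__main__":
    created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(purge_eligible_at(created))      # 2024-01-31 12:00:00+00:00
    print(purge_eligible_at(created, 90))  # 2024-03-31 12:00:00+00:00
```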
Configuring Retention
Set the organization's default retention period in Organization Settings → Retention, or via the API:
```bash
# View the current policy
curl "https://api.scrapenest.com/api/v1/org/retention-policy" \
  -H "X-API-Key: YOUR_KEY"

# Update retention to 90 days
curl -X PUT "https://api.scrapenest.com/api/v1/org/retention-policy" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"retention_days": 90, "legal_hold_enabled": true}'
```
The response includes max_retention_days (the ceiling allowed by your plan) and a policy_version that increments on each change. Changes are audit-logged. To preserve specific data beyond the policy, use a Legal Hold (below).
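Since the response reports both the plan ceiling and a version counter, a client can check a requested value before sending the PUT and confirm afterwards that the change took effect. A minimal sketch, assuming the GET and PUT responses are JSON objects carrying integer `retention_days`, `max_retention_days`, and `policy_version` fields (the presence of `retention_days` in the response and the positive-integer lower bound are assumptions); the helper names are hypothetical.

```python
import json


def validate_retention_days(policy: dict, requested_days: int) -> int:
    """Reject a requested retention period outside 1..max_retention_days.

    `policy` is the parsed JSON from GET /api/v1/org/retention-policy.
    """
    ceiling = int(policy["max_retention_days"])
    if not isinstance(requested_days, int) or requested_days < 1:
        raise ValueError("retention_days must be a positive integer")
    if requested_days > ceiling:
        raise ValueError(f"retention_days {requested_days} exceeds plan ceiling {ceiling}")
    return requested_days


def update_applied(before: dict, after: dict, requested_days: int) -> bool:
    """True if policy_version advanced and the new retention value is in effect."""
    return (
        int(after["policy_version"]) > int(before["policy_version"])
        and int(after["retention_days"]) == requested_days
    )


if __name__ == "__main__":
    # Illustrative payloads only; real responses may contain more fields.
    before = json.loads('{"retention_days": 30, "max_retention_days": 365, "policy_version": 4}')
    after = json.loads('{"retention_days": 90, "max_retention_days": 365, "policy_version": 5}')
    days = validate_retention_days(before, 90)
    print(update_applied(before, after, days))  # True
```

Checking the version rather than only the value also catches the case where another administrator changed the policy concurrently.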
Legal Holds
For sensitive data that must be preserved for compliance or legal reasons, you can apply a Legal Hold.
- Prevention of Purge: An artifact or job covered by an active legal hold cannot be deleted, even after its retention period has expired.
- Granular Control: Scope a hold to the whole organization (org), a single job (job), or an individual artifact (artifact).
- Audit Requirement: Creating or releasing a legal hold requires a justification that is recorded in your organization's audit log.
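Taken together, these rules mean an artifact is protected if any active hold covers it at the organization, job, or artifact level. The in-memory sketch below models that check; the `Hold` and `Artifact` structures, the `released` flag, and matching an org-scope hold by organization ID are assumptions for illustration, not ScrapeNest's internal implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hold:
    scope_type: str          # "org", "job", or "artifact"
    scope_ref: str           # ID of the org, job, or artifact the hold targets
    released: bool = False   # hypothetical flag: released holds no longer protect data


@dataclass(frozen=True)
class Artifact:
    artifact_id: str
    job_id: str
    org_id: str


def covering_holds(artifact: Artifact, holds: list[Hold]) -> list[Hold]:
    """Return every active hold whose scope includes this artifact."""
    targets = {
        "org": artifact.org_id,
        "job": artifact.job_id,
        "artifact": artifact.artifact_id,
    }
    return [
        h for h in holds
        if not h.released and targets.get(h.scope_type) == h.scope_ref
    ]


def can_purge(artifact: Artifact, holds: list[Hold], expired: bool) -> bool:
    """An expired artifact may be purged only if no active hold covers it."""
    return expired and not covering_holds(artifact, holds)


if __name__ == "__main__":
    art = Artifact("a-1", "job-7", "org-1")
    print(can_purge(art, [Hold("job", "job-7")], expired=True))  # False
```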
API Example: Create a Legal Hold
```bash
# Place a legal hold on a single job
curl -X POST "https://api.scrapenest.com/api/v1/org/retention-holds" \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "scope_type": "job",
    "scope_ref": "3d7d1e6e-2b8e-47c2-8bbd-9c2a1a3f9b10",
    "justification": "Compliance audit req #456"
  }'
```
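For scripted use, the same request can be assembled with Python's standard library. This sketch only builds the request so it can be inspected; passing it to `urllib.request.urlopen` would send it. The endpoint and body fields come from the curl example; the client-side checks (allowed scope types, non-empty justification) mirror the rules in this section, and `build_hold_request` is a hypothetical helper.

```python
import json
import urllib.request

HOLDS_URL = "https://api.scrapenest.com/api/v1/org/retention-holds"
SCOPE_TYPES = {"org", "job", "artifact"}


def build_hold_request(api_key: str, scope_type: str, scope_ref: str,
                       justification: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a legal hold."""
    if scope_type not in SCOPE_TYPES:
        raise ValueError(f"scope_type must be one of {sorted(SCOPE_TYPES)}")
    if not justification.strip():
        raise ValueError("a justification is required for the audit log")
    body = json.dumps({
        "scope_type": scope_type,
        "scope_ref": scope_ref,
        "justification": justification,
    }).encode("utf-8")
    return urllib.request.Request(
        HOLDS_URL,
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_hold_request(
        "YOUR_KEY", "job", "3d7d1e6e-2b8e-47c2-8bbd-9c2a1a3f9b10", "Compliance audit req #456"
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Note that `urllib` normalizes stored header names (for example to `Content-type`), which does not affect how the server reads them.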
Secure Data Purge
When the retention period expires, or when a data subject deletion request (DSAR) is processed:
- Metadata Soft-Delete: The job and artifact metadata are marked as deleted and removed from API results.
- Object Storage Purge: The raw files are permanently removed from our primary object storage.
- Audit Trail: The purge event is recorded in our internal system logs to prove compliance.
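The three purge steps can be modelled as an ordered pipeline. In this in-memory sketch, metadata is flagged rather than removed (the soft delete), so it disappears from listings, the stored object is deleted, and an audit record is appended last. The `Stores` and `Metadata` classes, field names, and reason strings are illustrative assumptions; the sketch does not describe how ScrapeNest implements purging internally.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Metadata:
    artifact_id: str
    deleted: bool = False  # soft-delete flag; hidden from API results once set


@dataclass
class Stores:
    metadata: dict[str, Metadata] = field(default_factory=dict)
    objects: dict[str, bytes] = field(default_factory=dict)  # stand-in for object storage
    audit_log: list[dict] = field(default_factory=list)


def visible_artifacts(stores: Stores) -> list[str]:
    """IDs an API listing would return: soft-deleted metadata is excluded."""
    return [m.artifact_id for m in stores.metadata.values() if not m.deleted]


def purge(stores: Stores, artifact_id: str, reason: str) -> None:
    """Soft-delete metadata, remove the raw object, then record the event."""
    stores.metadata[artifact_id].deleted = True   # 1. metadata soft-delete
    stores.objects.pop(artifact_id, None)         # 2. object purge (no-op if already gone)
    stores.audit_log.append({                     # 3. audit trail entry
        "event": "artifact_purged",
        "artifact_id": artifact_id,
        "reason": reason,  # e.g. "retention_expired" or "dsar" (illustrative values)
        "at": datetime.now(timezone.utc).isoformat(),
    })
```

Writing the audit entry last means it is only recorded once both deletion steps have completed without raising.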