Webhooks Guide
This document explains how DeDuplica securely delivers webhooks and how customers can verify that incoming webhook requests are authentic, untampered, and recent.
The approach described here is industry-standard and aligns with patterns used by platforms such as Stripe, GitHub, and Slack.
1. How DeDuplica Secures Webhooks
Every webhook sent by DeDuplica is protected using HMAC-SHA256 signatures.
This ensures that:
- The webhook was sent by DeDuplica
- The payload has not been modified in transit
- Old or replayed requests can be safely rejected
Each webhook subscription is associated with a unique secret, shared only between DeDuplica and your system.
2. High-Level Flow
- DeDuplica generates a webhook payload
- The payload is signed using your webhook secret
- DeDuplica sends the webhook to your endpoint
- Your endpoint recomputes the signature and validates it
- If valid, your system processes the webhook
3. Webhook Payload Structure
DeDuplica sends the business payload directly as the HTTP request body. There is no wrapper or envelope object.
The JSON body follows this schema:
public class WebhookPayload
{
    public string DuplicateId { get; set; }
    public DateTime DateIssued { get; set; }
    public DateTime DateFound { get; set; }
    public string SubscriptionExternalId { get; set; }
    public string JobExternalId { get; set; }
    public string JobExecutionExternalId { get; set; }
    public string TableName { get; set; }
    public string BaseRecordId { get; set; }
    public int ClusterSize { get; set; }
    public double MaxProbability { get; set; }
    public List<SubordinateRecord> Subordinates { get; set; }
    public string MergeOutputJson { get; set; }
}
public class SubordinateRecord
{
    public string RecordId { get; set; }
    public string RecordJson { get; set; }
    public double Probability { get; set; }
}
Field Summary
| Field | Description |
|---|---|
| DuplicateId | Unique identifier of the detected duplicate cluster |
| DateIssued | When the webhook was issued by DeDuplica (UTC) |
| DateFound | When the duplicate was detected (UTC) |
| SubscriptionExternalId | Customer-defined subscription identifier |
| JobExternalId | Customer-defined job identifier |
| JobExecutionExternalId | Customer-defined job execution identifier |
| TableName | Source table where the duplicate was detected |
| BaseRecordId | Identifier of the base (master) record in the cluster |
| ClusterSize | Total number of records in the cluster (base + all subordinates) |
| MaxProbability | The highest match probability across all edges in the cluster (0.0 – 1.0) |
| Subordinates | Array of subordinate records, sorted by probability descending. Each entry has RecordId, RecordJson, and Probability |
| Subordinates[].RecordId | ID of this subordinate record |
| Subordinates[].RecordJson | Full field data of the subordinate record as JSON string |
| Subordinates[].Probability | Path probability from this subordinate to the base record (weakest-link across the cluster graph, 0.0 – 1.0) |
| MergeOutputJson | JSON string with the computed merged field values. See MergeOutputJson below. |
Important Notes
- All timestamps are sent in UTC (ISO‑8601) format
- Subordinates[].RecordJson entries may contain large JSON documents
- The exact raw HTTP request body is used for signature verification
MergeOutputJson
MergeOutputJson is a JSON string (not an inline JSON object). You must parse it as a second JSON decode step to access individual field values. Each key corresponds to a field configured in the job’s merge rules; the value is the result of the configured merge strategy applied across all cluster members.
MergeOutputJson is null if no merge fields are configured on the job.
CLIENT_ENCRYPTION_KEY: If the Local Agent that produced this duplicate has CLIENT_ENCRYPTION_KEY configured, MergeOutputJson will arrive encrypted. You must decrypt it using the same CLIENT_ENCRYPTION_KEY before parsing. See Local Agent — Client-Controlled Encryption for key format details.
Do not change CLIENT_ENCRYPTION_KEY after duplicates have been stored; you will lose the ability to decrypt existing records.
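The two-step decode described above can be sketched in Python as follows. This is a minimal illustration that assumes MergeOutputJson is not encrypted; the function name parse_merge_output and the sample field names (email, name) are hypothetical and depend on your own merge rules.

```python
import json
from typing import Any, Dict, Optional


def parse_merge_output(raw_body: str) -> Optional[Dict[str, Any]]:
    """Return the merged field values, or None when MergeOutputJson is null."""
    payload = json.loads(raw_body)  # first decode: the webhook body itself
    merge_output_json = payload.get("MergeOutputJson")
    if merge_output_json is None:
        return None  # no merge fields configured on the job
    return json.loads(merge_output_json)  # second decode: the embedded JSON string


# Hypothetical body for illustration only; real payloads carry all schema fields.
sample_body = json.dumps({
    "DuplicateId": "dup-001",
    "MergeOutputJson": json.dumps({"email": "a@example.com", "name": "Ada"}),
})
merged = parse_merge_output(sample_body)
```

If CLIENT_ENCRYPTION_KEY is configured on the Local Agent, the second json.loads call would fail on the encrypted value, so decryption has to happen before that step.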
4. Signature Headers
Every webhook POST includes two custom headers:
| Header | Value |
|---|---|
| x-webhook-signature | sha256=<hex>, where <hex> is the HMAC-SHA256 of the signed message in lowercase hex |
| x-webhook-timestamp | Unix epoch seconds (integer as string) indicating when the webhook was sent |
Signed message format: {timestamp}.{raw_body} (UTF-8 encoded)
Key: your shared secret, base64-decoded to raw bytes
Algorithm: HMAC-SHA256, output as lowercase hex
Always verify the signature before parsing or acting on the body. Reject requests where the timestamp differs from the current time by more than 5 minutes (replay attack prevention).
5. Signature Generation (Conceptual)
DeDuplica computes the signature using the following formula:
message = "{timestamp}.{payload}"
signature = HMAC_SHA256(secret, message)
The signature is sent as a hex-encoded string, optionally prefixed with:
sha256=<signature>
Your system must independently compute the same signature and compare it.
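As a rough illustration of the formula above, the following Python sketch computes a header value in the documented format. It is not DeDuplica's actual implementation; the function name compute_signature and the example secret, timestamp, and body are assumptions made for the example.

```python
import base64
import hashlib
import hmac


def compute_signature(secret_b64: str, timestamp: str, raw_body: str) -> str:
    """Build the x-webhook-signature value for a timestamp and raw body."""
    key = base64.b64decode(secret_b64)  # the secret is stored Base64-encoded
    message = f"{timestamp}.{raw_body}".encode("utf-8")  # "{timestamp}.{payload}"
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()  # lowercase hex
    return f"sha256={digest}"


# Hypothetical values for illustration only.
example_secret = base64.b64encode(b"example-secret-bytes").decode("ascii")
header_value = compute_signature(example_secret, "1700000000", '{"DuplicateId":"dup-001"}')
```

Because the timestamp is part of the signed message, changing either the timestamp or a single byte of the body produces a different signature.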
6. Webhook Secret
- Each subscription has its own unique secret
- Secrets are Base64-encoded for safe storage
- Secrets must be kept confidential
When validating a webhook, always Base64-decode the secret before using it.
7. Validation Rules (Required)
Your webhook endpoint must accept a webhook only if all of the following checks pass:
- The computed signature matches the provided signature
- The timestamp is within an acceptable time window (recommended: ±5 minutes)
- The comparison is done using a constant-time method
If any check fails, respond with HTTP 401 Unauthorized.
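The three checks above can be combined into one framework-independent function. The sketch below is illustrative rather than a reference implementation: the name is_valid_webhook, the TOLERANCE_SECONDS constant, and the now parameter (added so the time check can be exercised deterministically) are assumptions.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # the recommended window of 5 minutes either side


def is_valid_webhook(secret_b64: str, signature_header: Optional[str],
                     timestamp_header: Optional[str], raw_body: bytes,
                     now: Optional[float] = None) -> bool:
    """Return True only if the timestamp is recent and the signature matches."""
    try:
        sent_at = int(timestamp_header)
    except (TypeError, ValueError):
        return False  # missing or non-numeric timestamp
    current = time.time() if now is None else now
    if abs(current - sent_at) > TOLERANCE_SECONDS:
        return False  # stale or future-dated request
    key = base64.b64decode(secret_b64)
    message = timestamp_header.encode("utf-8") + b"." + raw_body
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    received = (signature_header or "").removeprefix("sha256=")
    # compare_digest runs in constant time and returns False on length mismatch
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))


# Hypothetical values for illustration only.
secret = base64.b64encode(b"example-secret-bytes").decode("ascii")
body = b'{"DuplicateId":"dup-001"}'
ts = "1700000000"
sig = "sha256=" + hmac.new(base64.b64decode(secret), ts.encode() + b"." + body,
                           hashlib.sha256).hexdigest()
ok = is_valid_webhook(secret, sig, ts, body, now=1700000100)
```

A handler would typically return HTTP 401 whenever this function returns False and parse the body only when it returns True. The sketch uses str.removeprefix, which requires Python 3.9 or later.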
8. Minimal Azure Function Examples
The following examples show minimal, production-safe webhook handlers for common Azure Function runtimes.
All examples assume:
- Your webhook secret is stored in an environment variable: WEBHOOK_SECRET
- A successful validation returns HTTP 200
8.1 Azure Function – Node.js
const crypto = require("crypto");
const MAX_TIMESTAMP_AGE_SECONDS = 300; // reject webhooks more than 5 minutes old
module.exports = async function (context, req) {
    const secret = Buffer.from(process.env.WEBHOOK_SECRET, "base64");
    const signature = req.headers["x-webhook-signature"] || "";
    const timestamp = req.headers["x-webhook-timestamp"];
    // Reject stale requests (replay attack prevention)
    const webhookTime = parseInt(timestamp, 10);
    if (!webhookTime || Math.abs(Date.now() / 1000 - webhookTime) > MAX_TIMESTAMP_AGE_SECONDS) {
        context.res = { status: 401 };
        return;
    }
    // Use req.rawBody (the exact bytes received), NOT JSON.stringify(req.body).
    // Re-serialising the parsed object can change key order or whitespace and
    // will cause signature verification to fail.
    const rawBody = req.rawBody;
    const message = `${timestamp}.${rawBody}`;
    const expected = crypto
        .createHmac("sha256", secret)
        .update(message, "utf8")
        .digest("hex");
    const received = signature.replace(/^sha256=/, "");
    // timingSafeEqual throws if the buffers differ in length, so check the
    // lengths first and treat a mismatch as an invalid signature.
    const expectedBuf = Buffer.from(expected, "utf8");
    const receivedBuf = Buffer.from(received, "utf8");
    const isValid =
        expectedBuf.length === receivedBuf.length &&
        crypto.timingSafeEqual(expectedBuf, receivedBuf);
    if (!isValid) {
        context.res = { status: 401 };
        return;
    }
    // ✅ Safe to parse and use the payload after verification
    const webhook = req.body;
    context.log(`Duplicate cluster detected: ${webhook.DuplicateId}`);
    context.log(`Cluster size: ${webhook.ClusterSize}, max probability: ${webhook.MaxProbability}`);
    for (const sub of webhook.Subordinates) {
        context.log(`  Subordinate: ${sub.RecordId} (probability: ${sub.Probability})`);
    }
    context.res = { status: 200 };
};
8.2 Azure Function – .NET 8 (Isolated)
using System.Net;
using System.Security.Cryptography;
using System.Text;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;
public class DeduplicaWebhook
{
    private const long MaxTimestampAgeSeconds = 300; // reject webhooks more than 5 minutes old
    [Function("DeduplicaWebhook")]
    public static async Task<HttpResponseData> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestData req)
    {
        var rawBody = await new StreamReader(req.Body).ReadToEndAsync();
        // A missing header fails validation instead of throwing.
        var timestamp = req.Headers.TryGetValues("X-Webhook-Timestamp", out var ts) ? ts.First() : "";
        var signature = req.Headers.TryGetValues("X-Webhook-Signature", out var sig) ? sig.First() : "";
        // Reject stale requests (replay attack prevention)
        if (!long.TryParse(timestamp, out var webhookTime) ||
            Math.Abs(DateTimeOffset.UtcNow.ToUnixTimeSeconds() - webhookTime) > MaxTimestampAgeSeconds)
        {
            return req.CreateResponse(HttpStatusCode.Unauthorized);
        }
        var secret = Convert.FromBase64String(
            Environment.GetEnvironmentVariable("WEBHOOK_SECRET")!
        );
        var message = $"{timestamp}.{rawBody}";
        using var hmac = new HMACSHA256(secret);
        var expected = Convert.ToHexString(
            hmac.ComputeHash(Encoding.UTF8.GetBytes(message))
        ).ToLowerInvariant();
        var received = signature.Replace("sha256=", "");
        // FixedTimeEquals returns false when the lengths differ.
        var isValid = CryptographicOperations.FixedTimeEquals(
            Encoding.UTF8.GetBytes(expected),
            Encoding.UTF8.GetBytes(received)
        );
        if (!isValid)
        {
            return req.CreateResponse(HttpStatusCode.Unauthorized);
        }
        // ✅ Safe to deserialize after verification
        var webhook = System.Text.Json.JsonSerializer.Deserialize<WebhookPayload>(rawBody)!;
        Console.WriteLine($"Duplicate cluster detected: {webhook.DuplicateId}");
        Console.WriteLine($"Cluster size: {webhook.ClusterSize}, max probability: {webhook.MaxProbability}");
        foreach (var sub in webhook.Subordinates)
        {
            Console.WriteLine($"  Subordinate: {sub.RecordId} (probability: {sub.Probability})");
        }
        return req.CreateResponse(HttpStatusCode.OK);
    }
}
8.3 Azure Function – Python
import os
import hmac
import hashlib
import base64
import json
import logging
import time
import azure.functions as func
_MAX_TIMESTAMP_AGE_SECONDS = 300  # reject webhooks more than 5 minutes old
app = func.FunctionApp()
@app.function_name(name="WebhookListener")
@app.route(route="webhook", auth_level=func.AuthLevel.ANONYMOUS, methods=["POST"])
def webhook_listener(req: func.HttpRequest) -> func.HttpResponse:
    signature_header = req.headers.get("x-webhook-signature", "")
    timestamp_header = req.headers.get("x-webhook-timestamp", "")
    # Reject stale requests (replay attack prevention)
    try:
        webhook_time = int(timestamp_header)
    except (ValueError, TypeError):
        return func.HttpResponse(status_code=401)
    if abs(time.time() - webhook_time) > _MAX_TIMESTAMP_AGE_SECONDS:
        return func.HttpResponse(status_code=401)
    raw_body = req.get_body().decode("utf-8")
    # WEBHOOK_SECRET is Base64-encoded; decode to raw bytes before use as HMAC key.
    secret = base64.b64decode(os.environ["WEBHOOK_SECRET"])
    message = f"{timestamp_header}.{raw_body}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    received = signature_header.replace("sha256=", "")
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str input.
    if not hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8")):
        return func.HttpResponse(status_code=401)
    # ✅ Safe to parse after verification
    webhook = json.loads(raw_body)
    duplicate_id = webhook.get("DuplicateId")
    cluster_size = webhook.get("ClusterSize")
    max_probability = webhook.get("MaxProbability")
    base_record_id = webhook.get("BaseRecordId")
    subordinates = webhook.get("Subordinates") or []
    # MergeOutputJson is a JSON string; parse it as a second step.
    # If CLIENT_ENCRYPTION_KEY is set on the local agent it will be encrypted;
    # decrypt it with the same key before parsing.
    merge_output = None
    merge_output_json = webhook.get("MergeOutputJson")
    if merge_output_json:
        try:
            merge_output = json.loads(merge_output_json)
        except ValueError:
            # Not valid JSON, so presumably encrypted: handle decryption here.
            merge_output = merge_output_json
    logging.info(f"Duplicate cluster: {duplicate_id}, size={cluster_size}, maxProb={max_probability}")
    logging.info(f"Base record: {base_record_id}")
    for i, sub in enumerate(subordinates):
        logging.info(f"  Subordinate[{i}] RecordId={sub.get('RecordId')} Probability={sub.get('Probability')}")
    if merge_output:
        logging.info(f"Merge output: {json.dumps(merge_output) if isinstance(merge_output, dict) else merge_output}")
    # TODO: implement your business logic here
    return func.HttpResponse(status_code=200)
Webhook Delivery Timing and Retries
- Timeout: Each webhook call has a maximum timeout of 2 minutes. If your server does not respond within this time, the attempt is considered failed. Accept the webhook call as soon as possible and handle processing on the consumer side with appropriate error handling.
- Success Response: To confirm successful processing, your endpoint should return any HTTP status code in the 2xx range (such as 200 OK or 204 No Content).
- Retries: If a webhook delivery fails (timeout or non-2xx response), DeDuplica will automatically retry up to 5 times.
- Exponential Backoff: Retries use increasing wait times between attempts, starting at 10 seconds and growing up to 30 minutes (e.g., 10s, 20s, 1m, 5m, 30m).
This approach helps ensure your system receives important webhook notifications, even if your server is temporarily unavailable. If all retries fail, the webhook will be marked as undelivered in your logs.
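The advice to accept the call quickly and process it on the consumer side can be sketched as an acknowledge-then-process pattern. The example below is a simplified, in-process illustration using Python's standard library; the names work_queue, handle_verified_webhook, and worker are hypothetical, and it assumes the signature has already been verified before the handler is called.

```python
import logging
import queue
import threading
from typing import Callable

work_queue: "queue.Queue[str]" = queue.Queue()


def handle_verified_webhook(raw_body: str) -> int:
    """Enqueue the already-verified body and acknowledge immediately."""
    work_queue.put(raw_body)
    return 204  # any 2xx status confirms receipt


def worker(process: Callable[[str], None]) -> None:
    """Drain the queue in the background so slow processing never delays the response."""
    while True:
        raw_body = work_queue.get()
        try:
            process(raw_body)
        except Exception:
            # Processing errors are handled on the consumer side, after the 2xx was sent.
            logging.exception("Webhook processing failed")
        finally:
            work_queue.task_done()
```

A consumer would start the worker once, for example with threading.Thread(target=worker, args=(my_processing_function,), daemon=True).start(), where my_processing_function stands for your own business logic. In a serverless host such as Azure Functions, a durable queue service is usually a safer choice than an in-memory queue, because a background thread may not outlive the request.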