
Data Engineering | 8 min read | 23 April 2026

Webhook Data Validation: How to Stop Bad Data from Polluting Your CRM

Practical patterns for validating webhook payloads before they hit your CRM (required fields, format checks, deduplication, rate limiting), with real examples in Make.com and n8n.

HAROON MOHAMED

Automation, CRM, and full-stack systems

Author

Verification note: This post was re-reviewed in May 2026. Public tool pricing, compliance rules, and platform capabilities should be checked against the source list at the end before making budget, legal, or deployment decisions. Private client metrics are not published unless they are safe, public, and verifiable.

Why webhook validation matters

Webhooks are the connective tissue of modern automation stacks. A form is submitted -> a webhook fires -> data flows to your CRM. A call ends -> a webhook fires -> the outcome updates your pipeline.

Without validation, every webhook becomes a potential vector for bad data:

A form with no email address creates a ghost contact
A VAPI webhook with malformed fields breaks your workflow
A Stripe webhook received twice creates a duplicate deal
A malicious POST fills your CRM with junk

Validation is the checkpoint between "data exists" and "data enters your system."

The 5 validation layers

Layer 1: Structure validation

Does the payload match the expected shape?

Check:

Required fields are present (email, phone, or whatever your minimum contact record needs)
Fields are of expected types (string, number, boolean)
Nested objects/arrays exist if expected

Fail behavior: Reject with a 400 status code. Don't log as success.
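As a rough sketch, a structure check can be driven by a small hand-written schema. The `leadSchema` format and the `checkStructure` / `structureStatus` names are hypothetical and do not come from any library. Arrays and `null` are told apart from plain objects because `typeof` reports all three as `'object'`.

```javascript
// Hypothetical schema format (not a library API): field -> { type, required }.
// Types are typeof results, plus 'array' because typeof [] is 'object'.
const leadSchema = {
  email: { type: 'string', required: false },
  phone: { type: 'string', required: false },
  name: { type: 'string', required: true },
  tags: { type: 'array', required: false },
};

// typeof, but with arrays and null told apart from plain objects.
function typeOf(value) {
  if (Array.isArray(value)) return 'array';
  if (value === null) return 'null';
  return typeof value;
}

// Returns a list of problems; an empty list means the shape is acceptable.
function checkStructure(payload, schema) {
  if (typeOf(payload) !== 'object') return ['Payload must be a JSON object'];
  const errors = [];
  for (const [field, rule] of Object.entries(schema)) {
    const value = payload[field];
    if (value === undefined || value === null) {
      if (rule.required) errors.push(`Missing required field: ${field}`);
      continue;
    }
    if (typeOf(value) !== rule.type) {
      errors.push(`Field ${field} should be ${rule.type}, got ${typeOf(value)}`);
    }
  }
  return errors;
}

// Per the fail behavior above: any structural problem means HTTP 400.
function structureStatus(payload) {
  return checkStructure(payload, leadSchema).length > 0 ? 400 : 200;
}
```

Returning every problem, rather than stopping at the first one, makes the log entry for a rejected payload more useful when you debug it later.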

Layer 2: Format validation

Do fields have valid values?

Check:

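Email matches a pragmatic regex (/^[^\s@]+@[^\s@]+\.[^\s@]+$/); this catches obvious typos but is not a full RFC 5321 check
Phone is a valid E.164 number ("+", country code, at most 15 digits)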
Dates are valid ISO 8601
URLs are well-formed

Fail behavior: Route to an error queue. Log for human review.
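The format checks can be sketched as plain functions. The field names `date` and `website` are hypothetical. The ISO 8601 pattern covers only a common subset (calendar date, optional time and offset), and its time fields are not range-checked. URLs are parsed with the built-in `URL` class and limited to http/https.

```javascript
// Pragmatic email check from the list above (not a full RFC 5321 parser).
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
// E.164 shape: "+", a non-zero first digit, at most 15 digits in total.
const E164_RE = /^\+[1-9]\d{1,14}$/;
// Common ISO 8601 subset: YYYY-MM-DD, optional THH:MM[:SS[.fff]] and Z/offset.
// Time fields are matched by shape only, not range-checked.
const ISO_RE = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$/;

function isIsoDate(value) {
  if (typeof value !== 'string' || !ISO_RE.test(value)) return false;
  // Reject impossible calendar dates such as 2026-02-30 that match the pattern.
  const [y, m, d] = value.slice(0, 10).split('-').map(Number);
  const date = new Date(Date.UTC(y, m - 1, d));
  return date.getUTCFullYear() === y && date.getUTCMonth() === m - 1 && date.getUTCDate() === d;
}

function isHttpUrl(value) {
  try {
    const url = new URL(value);
    return url.protocol === 'http:' || url.protocol === 'https:';
  } catch {
    return false; // the URL constructor throws on malformed input
  }
}

// Returns the names of fields that are present but badly formatted.
function formatErrors({ email, phone, date, website }) {
  const errors = [];
  if (email !== undefined && !EMAIL_RE.test(email)) errors.push('email');
  if (phone !== undefined && !E164_RE.test(phone)) errors.push('phone');
  if (date !== undefined && !isIsoDate(date)) errors.push('date');
  if (website !== undefined && !isHttpUrl(website)) errors.push('website');
  return errors;
}
```

A non-empty result is what would send the payload to the error queue instead of the CRM.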

Layer 3: Business logic validation

Does the data make sense for your business?

Check:

Budget value is within realistic range
Deal amount isn't negative
Timeline values are from expected set
Source tag is from canonical list

Fail behavior: Accept but flag with a "review" tag.
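Business rules differ per company, so the lists and the budget ceiling below are placeholders. The sketch shows the "accept but flag" behavior: a broken rule never blocks the record. It only adds a `review` tag and records why.

```javascript
// Hypothetical rules: replace the lists and bounds with your own canonical values.
const TIMELINES = new Set(['asap', '1-3 months', '3-6 months', 'researching']);
const SOURCES = new Set(['website-form', 'facebook-ads', 'google-ads', 'referral']);
const MAX_BUDGET = 1000000; // assumed ceiling for a "realistic" budget

// Returns the reasons to flag; flagged records are still accepted.
function businessFlags({ budget, dealAmount, timeline, source }) {
  const flags = [];
  if (typeof budget === 'number' && (budget <= 0 || budget > MAX_BUDGET)) {
    flags.push('budget-out-of-range');
  }
  if (typeof dealAmount === 'number' && dealAmount < 0) flags.push('negative-deal-amount');
  if (timeline !== undefined && !TIMELINES.has(timeline)) flags.push('unknown-timeline');
  if (source !== undefined && !SOURCES.has(source)) flags.push('unknown-source');
  return flags;
}

// Accept the record, adding a "review" tag when any rule is broken.
function applyReviewTag(record) {
  const flags = businessFlags(record);
  const existingTags = record.tags || [];
  const tags = flags.length > 0 ? [...existingTags, 'review'] : existingTags;
  return { ...record, tags, flags };
}
```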

Layer 4: Deduplication

Is this data already in the system?

Check:

Normalized email or phone matches existing contact
Same event ID already processed (idempotency)
Same form submission within dedup window (e.g., 5 minutes)

Fail behavior: Update existing record instead of creating duplicate.

Layer 5: Security validation

Is the request actually from the expected source?

Check:

Signature header matches the expected HMAC (for providers that sign webhooks, such as Stripe, Shopify, and GitHub)
IP allowlist (if the provider publishes its sending IP ranges)
Shared secret in a header or query parameter

Fail behavior: Reject with 401 Unauthorized. Log attempt.
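One way to combine the five fail behaviors is a small decision function. It is a hypothetical sketch: the layer names and the `decide` function are illustrative, and answering 200 for format failures (then routing to an error queue) is an assumption borrowed from the "accept but log" option described under error handling. Security runs first so unauthenticated requests never reach the other checks. Business-logic failures do not block, and a failed duplicate check turns a create into an update.

```javascript
// Layers whose failure stops processing, checked in this order.
const BLOCKING = [
  ['security', { status: 401, action: 'reject-and-log' }],
  ['structure', { status: 400, action: 'reject' }],
  ['format', { status: 200, action: 'error-queue' }], // assumed: ack, then queue
];

// checks: layer name -> true (passed) or false (failed); missing = not run.
function decide(checks) {
  for (const [layer, outcome] of BLOCKING) {
    if (checks[layer] === false) return { layer, ...outcome };
  }
  // Business-logic failures do not block: accept, but tag for review.
  const reviewTag = checks.business === false;
  // A failed duplicate check means the record already exists: update it.
  const action = checks.duplicate === false ? 'update-existing' : 'create';
  return { layer: null, status: 200, action, reviewTag };
}
```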

Implementation: Make.com

Basic structure validation

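At the top of the webhook scenario, add a Router after the webhook module and put a filter on the main route that checks required fields:

email is not empty AND email contains @ AND phone is not empty

This example requires both email and phone; loosen it if either contact channel is enough. Mark the other route as the fallback route and use it as the error branch.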

Email regex validation

Use a filter with:

email matches pattern ^[^\s@]+@[^\s@]+\.[^\s@]+$

Deduplication via HubSpot/GHL lookup

Before creating a contact:

Search existing contacts by email (or normalized phone)
If found -> update instead of create
If not found -> create

Security via shared secret

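Most webhook providers let you register any URL, so you can append a secret query parameter to the webhook URL. In Make:

Add a filter: {query.secret} equals "YOUR-SECRET-HERE"
If it doesn't match, stop the scenario or return a 401 with a Webhook response module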
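If the secret check runs in your own code (for example a Code node or a small proxy in front of the CRM) instead of a Make filter, a constant-time comparison avoids leaking where two strings differ. This is a minimal Node sketch. `secretMatches` is a hypothetical helper, and it hashes both sides first because `crypto.timingSafeEqual` requires buffers of equal length.

```javascript
const crypto = require('crypto');

// Compares the received secret with the expected one in constant time.
function secretMatches(received, expected) {
  if (typeof received !== 'string' || received.length === 0) return false;
  // Hash both sides so the buffers always have equal length (32 bytes),
  // which crypto.timingSafeEqual requires.
  const a = crypto.createHash('sha256').update(received).digest();
  const b = crypto.createHash('sha256').update(expected).digest();
  return crypto.timingSafeEqual(a, b);
}
```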

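For signed webhooks (Stripe, Shopify):

Extract the signature from the request header
Compute the HMAC-SHA256 of the raw request body (not a re-serialized copy of the parsed JSON) using your signing secret; Stripe signs the timestamp and raw body together
Compare the two; if they don't match, reject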
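The same three steps, sketched in Node for GitHub's scheme: the `X-Hub-Signature-256` header carries `sha256=` followed by the hex HMAC-SHA256 of the raw body, keyed with the webhook secret. Shopify and Stripe follow the same idea with different encodings. Shopify sends a base64 HMAC in `X-Shopify-Hmac-Sha256`, and Stripe signs `timestamp.body`. The function name is hypothetical.

```javascript
const crypto = require('crypto');

// GitHub-style check: header is "sha256=" + hex HMAC-SHA256 of the raw body.
function verifyGithubSignature(rawBody, signatureHeader, secret) {
  if (typeof signatureHeader !== 'string' || !signatureHeader.startsWith('sha256=')) {
    return false;
  }
  const expected = 'sha256=' + crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(signatureHeader);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

The body must be the exact bytes received. Even a change in whitespace from re-serializing the JSON produces a different HMAC.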

Implementation: n8n

Webhook node validation

Start with a Webhook trigger node. Immediately after, add a Code node with validation logic:

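// Run this Code node in "Run Once for Each Item" mode.
// The n8n Webhook node puts the request payload under json.body.
const body = $input.item.json.body ?? {};

// Normalize first: trim, lowercase the email, strip phone separators
const email = typeof body.email === 'string' ? body.email.trim().toLowerCase() : '';
const phone = typeof body.phone === 'string' ? body.phone.replace(/[\s().-]/g, '') : '';
const name = typeof body.name === 'string' ? body.name.trim() : '';

const errors = [];

// Required fields: at least one contact channel
if (!email && !phone) {
  errors.push('Must have email or phone');
}

// Email format (pragmatic check, not full RFC 5321)
if (email && !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
  errors.push('Invalid email format');
}

// Phone format (strict E.164: "+", non-zero first digit, max 15 digits)
if (phone && !/^\+[1-9]\d{1,14}$/.test(phone)) {
  errors.push('Invalid phone format');
}

if (errors.length > 0) {
  return { json: { valid: false, errors } };
}

return { json: { valid: true, data: { email, phone, name } } };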

Follow it with an IF node that branches on valid === true.

Signature verification (Stripe example)

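// Assumes: the Webhook node's "Raw Body" option is on (the exact bytes Stripe
// signed arrive as binary property "data"), this Code node runs in
// "Run Once for All Items" mode, and self-hosted n8n allows the crypto
// module (NODE_FUNCTION_ALLOW_BUILTIN=crypto).
const crypto = require('crypto');
const secret = 'your-stripe-webhook-secret'; // whsec_...; load from env/credentials in production
const header = $input.first().json.headers['stripe-signature'] || '';
const rawBody = await this.helpers.getBinaryDataBuffer(0, 'data');

// Header format: t=<unix timestamp>,v1=<hex signature>[,v1=...]
const parts = header.split(',').map((part) => part.split('='));
const timestamp = (parts.find(([key]) => key === 't') || [])[1];
const signatures = parts.filter(([key]) => key === 'v1').map(([, value]) => value);

// Stripe signs "<timestamp>.<raw body>" with HMAC-SHA256, hex-encoded
const expected = crypto.createHmac('sha256', secret)
  .update(`${timestamp}.`, 'utf8').update(rawBody)
  .digest('hex');

// Constant-time comparison, plus a 5-minute tolerance against replays
const matches = signatures.some((sig) => sig.length === expected.length &&
  crypto.timingSafeEqual(Buffer.from(sig), Buffer.from(expected)));
const fresh = Math.abs(Date.now() / 1000 - Number(timestamp)) <= 300;

return [{ json: { valid: Boolean(timestamp) && matches && fresh } }];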

Deduplication patterns

Pattern 1: Email/phone lookup before create

For every incoming lead:

Normalize email (lowercase, trim)
Normalize phone (E.164)
Lookup contact by normalized email
If found -> update fields (merge strategy: newer wins)
If not found -> create new contact
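The steps above can be sketched against an in-memory `Map` that stands in for the CRM. `upsertLead` and the default country code `"1"` are assumptions. Real E.164 parsing is better left to a dedicated library such as libphonenumber. This sketch looks contacts up by email and falls back to phone only when there is no email. A real CRM search would usually check both.

```javascript
// Normalization from the steps above. The default country code "1" is an
// assumption; a dedicated parser handles real E.164 rules per country.
function normalizeEmail(email) {
  return typeof email === 'string' ? email.trim().toLowerCase() : '';
}

function normalizePhone(phone, defaultCountryCode = '1') {
  if (typeof phone !== 'string') return '';
  const digits = phone.replace(/\D/g, '');
  if (digits.length === 0) return '';
  return phone.trim().startsWith('+') ? `+${digits}` : `+${defaultCountryCode}${digits}`;
}

// In-memory stand-in for the CRM: normalized email (or phone) -> contact.
const contacts = new Map();

// Upsert with "newer wins": non-empty incoming fields overwrite stored ones.
function upsertLead(lead) {
  const email = normalizeEmail(lead.email);
  const phone = normalizePhone(lead.phone);
  const key = email || phone;
  if (!key) return 'rejected'; // nothing to match on
  const incoming = Object.fromEntries(
    Object.entries({ ...lead, email, phone }).filter(([, v]) => v !== '' && v != null),
  );
  const existing = contacts.get(key);
  contacts.set(key, existing ? { ...existing, ...incoming } : incoming);
  return existing ? 'updated' : 'created';
}
```

Empty incoming fields are dropped before the merge, so a later submission without a phone number does not erase the phone number already stored.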

Pattern 2: Event ID idempotency

For events that a provider may deliver more than once (Stripe and Shopify retry failed deliveries, and Stripe notes that an endpoint can occasionally receive the same event twice):

Extract event ID from payload
Check if we've processed this event ID before (store in Data Store or database)
If yes -> skip (return 200 OK to prevent retry)
If no -> process and record event ID
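A minimal sketch of event-ID idempotency, assuming an in-memory `Set` in place of a Data Store or database. `handleEvent` and its `handler` callback are hypothetical. The ID is recorded only after processing succeeds. If the handler throws, the error propagates, the caller can answer with a 5xx, and the provider's retry gets another chance. With concurrent deliveries, a separate check and insert can race. An atomic insert into a column with a unique constraint closes that gap.

```javascript
// In-memory stand-in for a durable, shared store of processed event IDs.
const processedEventIds = new Set();

// Returns the HTTP status to send back. Duplicates still get 200 so the
// provider stops retrying.
function handleEvent(event, handler) {
  if (!event || typeof event.id !== 'string') return 400;
  if (processedEventIds.has(event.id)) return 200; // already handled: skip
  handler(event); // if this throws, the ID is not recorded
  processedEventIds.add(event.id); // record only after successful processing
  return 200;
}
```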

Pattern 3: Time-window dedup

For forms where users might accidentally double-submit:

Check if same email submitted in last 5 minutes
If yes -> treat as duplicate, don't create new contact or trigger new workflow
If no -> process
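The time-window check fits in a few lines. This is a sketch assuming an in-memory `Map`. In production, a store with per-key expiry (for example Redis keys with a TTL) keeps the map from growing forever. The window is measured from the last accepted submission, so repeated clicks inside the window do not keep extending it. `shouldProcess` is a hypothetical name, and `now` is injectable only for testing.

```javascript
const DEDUP_WINDOW_MS = 5 * 60 * 1000; // the 5-minute window from the text above

// Normalized email -> timestamp (ms) of the last accepted submission.
const lastSubmission = new Map();

// True if this submission should be processed, false if it is a
// double-submit inside the window.
function shouldProcess(email, now = Date.now()) {
  const key = email.trim().toLowerCase();
  const last = lastSubmission.get(key);
  if (last !== undefined && now - last < DEDUP_WINDOW_MS) return false;
  lastSubmission.set(key, now);
  return true;
}
```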

Rate limiting

If a webhook endpoint is public, it can be abused. Rate limiting prevents flooding.

Simple IP rate limit

Track the request count per IP in a Data Store, Redis, or a database. If an IP exceeds N requests in the time window (e.g., 10 requests per minute), reject it with 429 Too Many Requests.
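A fixed-window counter is the simplest version of this. The sketch keeps counters in memory, which only works when a single process serves the endpoint. The limits are the example values from the text, and `rateLimitStatus` is a hypothetical name.

```javascript
const LIMIT = 10; // requests allowed per window (example value from the text)
const RATE_WINDOW_MS = 60 * 1000; // one-minute window

// IP -> { windowStart, count }. Use Redis or another shared store when
// more than one process serves the endpoint.
const counters = new Map();

// Fixed-window counter: returns 429 once an IP exceeds LIMIT in its window.
function rateLimitStatus(ip, now = Date.now()) {
  const entry = counters.get(ip);
  if (!entry || now - entry.windowStart >= RATE_WINDOW_MS) {
    counters.set(ip, { windowStart: now, count: 1 });
    return 200;
  }
  entry.count += 1;
  return entry.count > LIMIT ? 429 : 200;
}
```

A fixed window allows a burst of up to 2 x LIMIT requests around a window boundary. A sliding window or token bucket smooths that out at the cost of a little more state.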

In-app rate limiting (if supported)

Some destinations, such as GoHighLevel and HubSpot, enforce their own API and workflow limits. Check what the destination enforces and configure throttling there where it is supported.

Per-contact rate limit

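Prevent the same email from triggering 20 workflows in an hour. Mark processed contacts with a tag such as "processed-today" (or a last-processed timestamp) that is cleared after 24 hours, and skip the contact while the marker is present.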

Error handling

When validation fails:

Option 1: Reject with HTTP status

Return 400/401/403 to the sender. For webhooks from tools like Stripe or Shopify, this triggers automatic retry with exponential backoff.

Option 2: Accept but log to error queue

Return 200 OK but route the payload to an error-handling workflow:

Store payload in a "Review" table
Notify admin via Slack/email
Don't create bad data in CRM

Option 2 is better when retries would only make things worse: a malformed payload fails the same way on every retry, and repeated deliveries can trigger duplicate side effects.

Option 3: Partial acceptance with flags

Accept the data, create the contact, but tag it "needs-review." A human reviews and cleans up. Not ideal, but sometimes necessary for critical flows that can't miss any data.

Real webhook examples

Form submission webhook

Expected payload:

{
  "email": "jane@example.com",
  "phone": "+15551234567",
  "name": "Jane Doe",
  "business_type": "solar",
  "budget": "10k-25k"
}

Validation:

email OR phone required
email format valid
business_type in canonical list
budget from dropdown values
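Here is a sketch of these four checks for the payload above. The canonical lists are hypothetical: only "solar" and "10k-25k" come from the example, and the other values are placeholders for your own dropdown options.

```javascript
// Hypothetical canonical lists; only "solar" and "10k-25k" come from the example.
const BUSINESS_TYPES = new Set(['solar', 'roofing', 'hvac']);
const BUDGET_OPTIONS = new Set(['<10k', '10k-25k', '25k-50k', '50k+']);
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Returns a list of problems; an empty list means the submission is valid.
function validateFormSubmission(payload) {
  const errors = [];
  const email = typeof payload.email === 'string' ? payload.email.trim() : '';
  const phone = typeof payload.phone === 'string' ? payload.phone.trim() : '';
  if (!email && !phone) errors.push('email or phone required');
  if (email && !EMAIL_RE.test(email)) errors.push('invalid email');
  if (!BUSINESS_TYPES.has(payload.business_type)) errors.push('unknown business_type');
  if (!BUDGET_OPTIONS.has(payload.budget)) errors.push('unexpected budget value');
  return errors;
}
```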

VAPI call webhook

Expected payload:

{
  "call_id": "uuid",
  "status": "completed",
  "duration": 180,
  "transcript": "...",
  "outcome": "qualified",
  "contact_phone": "+15551234567"
}

Validation:

call_id present (for idempotency)
status from expected enum
duration is positive integer
contact_phone is valid E.164
Dedup by call_id to prevent double-processing
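This sketch applies these checks to the simplified payload shown above, which may not match the exact structure VAPI sends. Every status value other than "completed" is an assumption. The function only reports whether `call_id` was already processed. Recording the ID after processing succeeds is left to the caller, as in the event-ID idempotency pattern.

```javascript
// Status values other than "completed" are assumptions.
const CALL_STATUSES = new Set(['completed', 'failed', 'no-answer', 'busy']);
const E164_RE = /^\+[1-9]\d{1,14}$/;

// processedCallIds: a Set (or Set-like store) of call_ids already handled.
function validateCallWebhook(p, processedCallIds) {
  const errors = [];
  if (typeof p.call_id !== 'string' || p.call_id === '') errors.push('call_id missing');
  if (!CALL_STATUSES.has(p.status)) errors.push('unexpected status');
  if (!Number.isInteger(p.duration) || p.duration <= 0) {
    errors.push('duration must be a positive integer');
  }
  if (typeof p.contact_phone !== 'string' || !E164_RE.test(p.contact_phone)) {
    errors.push('contact_phone is not E.164');
  }
  if (errors.length > 0) return { ok: false, errors };
  // Duplicate check only; the caller records call_id after processing succeeds.
  return { ok: true, duplicate: processedCallIds.has(p.call_id) };
}
```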

Stripe payment webhook

Expected payload: a standard Stripe Event object (id, type, data.object).

Validation:

Signature verification (critical: prevents webhook spoofing)
Event type in expected list
Idempotency by event ID

What NOT to do

1. Trust all incoming data. "It came from a webhook, so it must be fine." Webhooks can be malformed, malicious, or accidentally duplicated.

2. Build for the happy path only. Your workflow works for valid data. What about invalid data? Missing fields? Duplicates? Design for the failure cases.

3. Skip logging. When something goes wrong, you need a trail. Log every webhook receipt, even invalid ones, with enough context to debug.

4. Validate too strictly. If validation rejects 30% of real submissions because of a too-tight regex, you're losing leads. Validate what matters, accept what's valid.

5. Rely only on application-layer validation. If possible, validate at the database layer too. Unique constraints prevent dupes even if application logic has bugs.

Sources

Patterns in this article are standard data validation practices, drawn from published standards (RFC 5321 for email addresses, ITU-T E.164 for phone numbers) and provider documentation (the Stripe, Shopify, and GitHub webhook docs all describe signature verification). Implementation examples were tested across Make.com and n8n deployments.

Need help designing webhook validation for a specific integration? Let's talk - I can audit your current webhook endpoints and harden them.

Sources and verification

This article was reviewed in May 2026. Vendor pricing and platform features change often, so check operational or budget decisions against the current provider documentation named in Sources before implementation.

Private client metrics, lead counts, appointment counts, cost reductions, and revenue examples are intentionally removed, softened, or framed as modeled examples unless they can be verified publicly without exposing client data.
