Building a Resilient Webhook System with Node.js and Google Cloud Tasks
Real-time notifications are the backbone of modern, interconnected applications. Whether you're alerting a Slack channel about a new sale or syncing data with a CRM, a robust notification system is essential. At Bookify, we recently implemented a webhook system to give our users real-time updates on events, starting with booking.created.
In this post, we'll break down how we designed and built this system, focusing on our key architectural goals: security, resilience, and scalability.
The 30,000-Foot View: System Architecture
Our webhook system is composed of two main parts:
- A Management API: A standard set of CRUD endpoints that allow users to create, read, update, and delete their webhook subscriptions.
- A Notification Pipeline: An asynchronous, queue-based pipeline that handles the delivery of webhook payloads when an event occurs.
Let's dive into how each part is built.
Part 1: The Management API - A Secure Foundation
First, users need a way to manage their webhook endpoints. We exposed a new set of endpoints under /api/webhooks that give them full control.
The API contract is straightforward (paths are relative to /api):
- POST /webhooks: Creates a new webhook endpoint. Requires a url and an array of events to subscribe to.
- GET /webhooks: Lists all webhook endpoints for the user's organization.
- PATCH /webhooks/{id}: Updates an endpoint's url, disabled status, or subscribed events.
- DELETE /webhooks/{id}: Deletes a webhook endpoint permanently.
The implementation follows a standard Controller-Service pattern. The WebhookEndpointsController is responsible for handling incoming HTTP requests, validating the input, and calling the appropriate service method.
// src/controllers/webhookEndpointsController.ts
// ...
export class WebhookEndpointsController {
  // ...
  async createWebhookEndpoint(req: AuthenticatedRequest, res: Response): Promise<void> {
    const orgId = req.user?.orgId;
    // ... organization check ...
    const createDto: ICreateWebhookEndpointDto = {
      orgId,
      url: req.body.url,
      events: req.body.events
    };
    // Validate input using our utility function
    const validation = validateCreateWebhookEndpointDto(createDto);
    if (!validation.isValid) {
      res.status(400).json({ message: 'Invalid input', errors: validation.errors });
      return;
    }
    const { endpoint } = await this.webhookEndpointsService.createWebhook(createDto);
    res.status(201).json({ endpoint });
  }
  // ... other methods
}
This is pretty standard, but the real magic happens in the service layer, where we enforce our security principles.
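The controller delegates input checks to validateCreateWebhookEndpointDto, which isn't shown. Below is a minimal, hypothetical sketch that returns the same { isValid, errors } shape the controller expects. The specific rules (HTTPS-only URLs, a fixed allow-list of event names) are assumptions for illustration, not Bookify's actual policy.

```typescript
// Hypothetical sketch of validateCreateWebhookEndpointDto; the real rules aren't shown in the post.

// Assumed allow-list of event names; the post only mentions booking.created.
const SUPPORTED_EVENTS = ['booking.created'] as const;

interface ICreateWebhookEndpointDto {
  orgId: string;
  url: string;
  events: string[];
}

interface ValidationResult {
  isValid: boolean;
  errors: string[];
}

function validateCreateWebhookEndpointDto(dto: ICreateWebhookEndpointDto): ValidationResult {
  const errors: string[] = [];

  // Require an absolute URL (assumed policy: HTTPS only, so payloads never travel in cleartext).
  let parsed: URL | null = null;
  try {
    parsed = new URL(dto.url);
  } catch {
    errors.push('url must be a valid absolute URL');
  }
  if (parsed && parsed.protocol !== 'https:') {
    errors.push('url must use https');
  }

  // Require a non-empty array that contains only supported event names.
  if (!Array.isArray(dto.events) || dto.events.length === 0) {
    errors.push('events must be a non-empty array');
  } else {
    for (const event of dto.events) {
      if (!(SUPPORTED_EVENTS as readonly string[]).includes(event)) {
        errors.push(`unsupported event: ${event}`);
      }
    }
  }

  return { isValid: errors.length === 0, errors };
}
```

Collecting every error instead of returning on the first one lets the 400 response list all problems at once.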
Part 2: Core Logic & Security - Secrets and Isolation
Security was our top priority. Two principles guided our implementation in WebhookEndpointsService: secure secret management and strict data isolation.
The Webhook Secret Lifecycle 🤫
Every webhook endpoint is protected by a unique secret. This secret is used by the subscriber to verify that an incoming request genuinely came from Bookify.
Here's how we handle it:
- Generation: When a user creates a new endpoint, we generate a unique, cryptographically secure secret prefixed with whsec_.
- One-Time Reveal: The plaintext secret is returned to the user only once, in the response to the POST /webhooks creation request. Our documentation makes clear that users must store this secret securely.
- Encrypted at Rest: We never store the plaintext secret in our database. Instead, an encrypt() utility encrypts the secret before it is persisted, so a leaked database dump alone does not expose usable secrets, as long as the encryption key is kept outside the database. We use reversible encryption rather than a one-way hash because the secret must be recoverable to sign outgoing payloads.
- Never Exposed Again: The secret is never included in any other API response, such as those for GET or PATCH requests.
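The post doesn't spell out how a subscriber uses the secret to verify a request. A common scheme, assumed here and not confirmed as Bookify's actual format, is for the sender to compute an HMAC-SHA256 over a timestamp plus the raw request body, and for the subscriber to recompute it and compare in constant time. The function names and the `${timestamp}.${rawBody}` layout below are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical signing scheme: HMAC-SHA256 over `${timestamp}.${rawBody}`, hex-encoded.
// Including the timestamp lets subscribers reject old, replayed requests.
function signPayload(secret: string, timestamp: number, rawBody: string): string {
  return createHmac('sha256', secret).update(`${timestamp}.${rawBody}`).digest('hex');
}

// Subscriber side: recompute the signature and compare it in constant time.
function verifySignature(
  secret: string,
  timestamp: number,
  rawBody: string,
  receivedSignature: string,
  toleranceSeconds = 300,
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  // Reject requests whose timestamp is too far from the current time (replay protection).
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  const expected = Buffer.from(signPayload(secret, timestamp, rawBody), 'hex');
  const received = Buffer.from(receivedSignature, 'hex');
  // timingSafeEqual throws when lengths differ, so check the lengths first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

Subscribers must compute the HMAC over the raw body exactly as received, not over a re-serialized JSON object, since any change in key order or whitespace changes the signature.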
// src/services/webhookEndpointsService.ts
async createWebhook(createDto: ICreateWebhookEndpointDto): Promise<{ endpoint: IWebhookEndpointResponse, plaintextSecret: string }> {
  const { orgId, url, events } = createDto;
  // 1. Generate a plaintext secret
  const plaintextSecret = generateWebhookSecret();
  // 2. Encrypt the secret for storage
  const encryptedSecret = encrypt(plaintextSecret);
  const newWebhook: IWebhookEndpoint = {
    // ... other fields
    secret: encryptedSecret, // Store the encrypted version
  };
  await this.db.write(this.collectionName, newWebhook);
  // 3. Return the plaintext secret ONLY on creation
  const endpointResponse: IWebhookEndpointResponse = {
    // ... other fields
    secret: plaintextSecret
  };
  return { endpoint: endpointResponse, plaintextSecret };
}
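The helpers generateWebhookSecret() and encrypt() aren't shown. A minimal sketch using Node's built-in crypto module, assuming AES-256-GCM with a 32-byte key; the key handling, the "iv:authTag:ciphertext" storage format, and the decrypt() counterpart are all assumptions for illustration.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// "whsec_" followed by 32 random bytes, base64url-encoded (43 characters, no padding).
function generateWebhookSecret(): string {
  return `whsec_${randomBytes(32).toString('base64url')}`;
}

// Hypothetical key source. A random per-process key is used here for demonstration only;
// a real deployment needs a stable key (e.g. from a secret manager), or stored secrets
// become undecryptable after a restart.
const ENCRYPTION_KEY = randomBytes(32);

// Encrypt with AES-256-GCM and store "iv:authTag:ciphertext", each part base64-encoded.
function encrypt(plaintext: string, key: Buffer = ENCRYPTION_KEY): string {
  const iv = randomBytes(12); // 96-bit IV, the recommended size for GCM
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const authTag = cipher.getAuthTag();
  return [iv, authTag, ciphertext].map((b) => b.toString('base64')).join(':');
}

// Reverse of encrypt(); decipher.final() throws if the data or tag was tampered with.
function decrypt(stored: string, key: Buffer = ENCRYPTION_KEY): string {
  const [iv, authTag, ciphertext] = stored.split(':').map((part) => Buffer.from(part, 'base64'));
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(authTag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}
```

GCM is authenticated, so a modified ciphertext fails loudly at decryption instead of silently yielding a wrong secret. The ':' separator is safe because standard base64 never produces it.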
Strict Organization Isolation 🛡️
In a multi-tenant system, it's critical that one organization cannot view or modify another organization's resources. We enforce this in our service layer: instead of relying on the webhook id alone, every database operation requires both the id and the user's orgId.
The pattern is simple but effective: fetch the record first, then verify ownership before performing the action.
// src/services/webhookEndpointsService.ts
async deleteWebhook(id: string, orgId: string): Promise<boolean> {
  // 1. Fetch the webhook by its unique ID
  const existingWebhooks = await this.db.read(this.collectionName, {
    field: 'id',
    operator: '==',
    value: id
  });
  if (existingWebhooks.length === 0) {
    return false; // Not found
  }
  const existingWebhook = existingWebhooks[0] as IWebhookEndpoint;
  // 2. VERIFY ownership before deleting
  if (existingWebhook.orgId !== orgId) {
    return false; // Unauthorized
  }
  // 3. Proceed with deletion
  await this.db.delete(
    this.collectionName,
    { field: 'id', operator: '==', value: id }
  );
  return true;
}
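This prevents Insecure Direct Object Reference (IDOR) vulnerabilities and keeps each tenant's data strictly sandboxed. Because the method returns false both when the webhook doesn't exist and when it belongs to another organization, a caller can't use the response to probe for other tenants' webhook IDs.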
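Since this ownership check is the entire defense, it's worth pinning down with a test. A minimal sketch, using a hypothetical in-memory store in place of this.db and a trimmed-down IWebhookEndpoint, that follows the same fetch-then-verify contract and checks that a cross-tenant delete is refused:

```typescript
// Trimmed-down record shape for the sketch; the real interface has more fields.
interface IWebhookEndpoint {
  id: string;
  orgId: string;
  url: string;
}

// Hypothetical in-memory stand-in for the real database, used only to exercise the pattern.
class InMemoryWebhookStore {
  private readonly rows = new Map<string, IWebhookEndpoint>();

  insert(row: IWebhookEndpoint): void {
    this.rows.set(row.id, row);
  }

  has(id: string): boolean {
    return this.rows.has(id);
  }

  // Same contract as deleteWebhook: false for "not found" AND for "owned by another org".
  async deleteWebhook(id: string, orgId: string): Promise<boolean> {
    const existing = this.rows.get(id); // 1. fetch by ID
    if (!existing) return false;
    if (existing.orgId !== orgId) return false; // 2. verify ownership
    this.rows.delete(id); // 3. act only after the check passes
    return true;
  }
}
```

A controller would typically map false to a 404 in both cases, which keeps the "not yours" and "doesn't exist" responses indistinguishable.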
Part 3: The Notification Pipeline - Asynchronous & Resilient Delivery
When a booking is created, we need to notify all subscribed webhooks. A naive approach would be to loop through the subscribers and send POST requests directly within the booking creation logic. This has two major flaws:
- Brittleness: If a subscriber's endpoint is slow or down, our booking creation API will hang or fail. The core user experience is held hostage by a secondary system.
- Poor Performance: The user who made the booking has to wait for all webhooks to be sent before they get a response.
To solve this, we made our delivery pipeline asynchronous using Google Cloud Tasks.
The key dependency we added was @google-cloud/tasks.
The new workflow looks like this:
- A booking is created successfully in createBookingUtil.
- We then call webhookNotificationService.notifyWebhooks. The call is awaited, but it only enqueues tasks; it never waits for subscribers to respond, so a slow or unreachable endpoint can't stall the booking request.
- The CloudTaskNotificationService finds all active webhooks subscribed to the booking.created event for that organization.
- For each webhook, it creates a task and enqueues it in Google Cloud Tasks. The task carries the payload and the destination URL.
- Cloud Tasks dispatches each task as an HTTP request to a separate, serverless Cloud Function (our "delivery worker"), which makes the final HTTP POST to the user's endpoint. If the worker doesn't return a 2xx response, Cloud Tasks retries the task automatically according to the queue's retry configuration.
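The enqueue step can be sketched as follows. This is a simplified, hypothetical version of CloudTaskNotificationService.notifyWebhooks: it takes the subscriptions as an argument instead of querying the database, and the worker URL, task body shape, and injected createTask function are assumptions. In production, createTask would wrap the real client, roughly `(req) => client.createTask(req)` with `client = new CloudTasksClient()` from @google-cloud/tasks and `parent = client.queuePath(project, location, queue)`; injecting it keeps the sketch runnable without GCP credentials.

```typescript
// Minimal shapes for the sketch; the real interfaces in the codebase aren't shown.
interface WebhookSubscription {
  id: string;
  orgId: string;
  url: string;
  events: string[];
  disabled: boolean;
}

// Roughly mirrors the request accepted by CloudTasksClient.createTask for an HTTP target task.
interface CreateTaskRequest {
  parent: string;
  task: {
    httpRequest: {
      httpMethod: 'POST';
      url: string;
      headers: Record<string, string>;
      body: string; // base64-encoded request body
    };
  };
}

type CreateTaskFn = (request: CreateTaskRequest) => Promise<unknown>;

// Enqueue one Cloud Task per active subscription for this org and event; returns the count.
async function notifyWebhooks(
  subscriptions: WebhookSubscription[],
  orgId: string,
  event: string,
  payload: unknown,
  opts: { parent: string; workerUrl: string; createTask: CreateTaskFn },
): Promise<number> {
  const targets = subscriptions.filter(
    (s) => s.orgId === orgId && !s.disabled && s.events.includes(event),
  );

  await Promise.all(
    targets.map((s) => {
      // The task targets the delivery worker; the subscriber URL travels in the body.
      const taskBody = { webhookId: s.id, destinationUrl: s.url, event, payload };
      return opts.createTask({
        parent: opts.parent,
        task: {
          httpRequest: {
            httpMethod: 'POST',
            url: opts.workerUrl,
            headers: { 'Content-Type': 'application/json' },
            body: Buffer.from(JSON.stringify(taskBody)).toString('base64'),
          },
        },
      });
    }),
  );

  return targets.length;
}
```

With Promise.all, a single failed enqueue rejects the whole call (which the caller's try...catch then logs), even though other tasks may already have been created. Promise.allSettled would let you log failures per subscription instead.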
This architecture decouples the core business logic from the notification logic. We explicitly wrap the webhook notification call in a try...catch block so that even if the entire notification system fails (e.g., we can't connect to the task queue), the booking creation still succeeds.
// src/utils/bookingUtils.ts
export async function createBookingUtil(...) {
  // ... core booking creation logic ...
  const booking = await services.bookingService.createBooking(finalBookingData);
  // Enqueue webhook notifications: this awaits task creation only, not delivery
  if (services.webhookNotificationService) {
    try {
      await services.webhookNotificationService.notifyWebhooks(
        orgId,
        WEBHOOK_EVENTS.BOOKING_CREATED,
        { booking: formattedBooking }
      );
    } catch (webhookError) {
      // Log webhook errors but DO NOT fail the booking creation
      console.error('Webhook notification failed for booking creation:', webhookError);
    }
  }
  return { success: true, booking };
}
This design makes our system far more resilient. The user gets a fast response, and once a task is enqueued, delivery and retries are handled in the background without affecting core application performance. The trade-off is that if enqueueing itself fails, the error is logged and that event's notification is not sent.
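On the other side of the queue, the delivery worker's job is small: forward the payload and report the outcome through its HTTP status, since Cloud Tasks treats a 2xx response from the handler as success and otherwise retries the task according to the queue's retry settings. Below is a hypothetical sketch of the handler logic with the HTTP client and signing injected; the task body shape, the X-Bookify-Signature header name, and the 502 status choice are assumptions, not Bookify's actual code.

```typescript
// Body of each task, matching the hypothetical enqueue sketch's task body.
interface DeliveryTaskBody {
  webhookId: string;
  destinationUrl: string;
  event: string;
  payload: unknown;
}

interface DeliveryDeps {
  // Resolves to the subscriber's HTTP status; rejects on network errors or timeouts.
  post: (url: string, body: string, headers: Record<string, string>) => Promise<number>;
  // Signs the raw body with the endpoint's secret (e.g. after decrypting it from storage).
  sign: (webhookId: string, rawBody: string) => Promise<string>;
}

// Returns the status the worker should send back to Cloud Tasks:
// 200 means the task is done; anything else makes Cloud Tasks retry it later.
async function handleDeliveryTask(task: DeliveryTaskBody, deps: DeliveryDeps): Promise<number> {
  const rawBody = JSON.stringify({ event: task.event, data: task.payload });
  try {
    const signature = await deps.sign(task.webhookId, rawBody);
    const status = await deps.post(task.destinationUrl, rawBody, {
      'Content-Type': 'application/json',
      'X-Bookify-Signature': signature, // hypothetical header name
    });
    return status >= 200 && status < 300 ? 200 : 502;
  } catch {
    // Network error, timeout, or signing failure: report failure so the task is retried.
    return 502;
  }
}
```

In Node 18+, post could be built on the global fetch, for example `(url, body, headers) => fetch(url, { method: 'POST', body, headers, signal: AbortSignal.timeout(10_000) }).then((r) => r.status)`. As written, a subscriber's 4xx is retried like a 5xx; whether permanent client errors deserve retries is a policy choice worth making explicitly.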
Conclusion
By combining a secure API for management with a resilient, queue-based delivery pipeline, we've built a webhook system that is robust and ready to scale. The key takeaways from our implementation are:
- Secure by Design: Handle secrets with extreme care—encrypt at rest and reveal only once.
- Isolate Your Tenants: Always verify ownership of a resource before acting on it.
- Embrace Asynchronicity: Use a task queue like Google Cloud Tasks to decouple non-critical operations (like sending notifications) from your core business logic. This is the secret to building resilient and performant systems.