Engineering
Designing for reliable webhook delivery
Daniel Park · January 8, 2025 · 8 min read
Reliable webhook delivery is one of the hardest problems in distributed systems. Here's the architecture we've iterated on, which has reached 99.95% successful delivery across 42 million events.
The retry architecture
Our retry engine uses an exponential back-off strategy with jitter, starting at 5 seconds and maxing out at 24 hours. Each attempt is recorded in an append-only log so customers can inspect every delivery attempt.
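To make the schedule concrete, here is a minimal sketch of capped exponential back-off with jitter plus an append-only attempt log. Only the 5-second start and the 24-hour ceiling come from the description above. The doubling factor, the ±20% jitter range, and every name (backoff_ceiling, retry_delay, Attempt, AttemptLog) are assumptions for illustration, not the production implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

BASE_DELAY_S = 5              # first retry delay (from the article)
MAX_DELAY_S = 24 * 60 * 60    # 24-hour ceiling (from the article)
GROWTH_FACTOR = 2             # assumption: delay doubles on each attempt
JITTER_FRACTION = 0.2         # assumption: spread each delay by +/- 20%


def backoff_ceiling(attempt: int) -> float:
    """Un-jittered delay in seconds before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return float(min(MAX_DELAY_S, BASE_DELAY_S * GROWTH_FACTOR ** (attempt - 1)))


def retry_delay(attempt: int, rng: Optional[random.Random] = None) -> float:
    """Jittered delay, never above the 24-hour ceiling."""
    rng = rng or random.Random()
    factor = rng.uniform(1 - JITTER_FRACTION, 1 + JITTER_FRACTION)
    return min(float(MAX_DELAY_S), backoff_ceiling(attempt) * factor)


@dataclass(frozen=True)
class Attempt:
    """One immutable record per delivery attempt (hypothetical shape)."""
    event_id: str
    attempt: int
    status_code: Optional[int]  # None when the endpoint gave no HTTP response


class AttemptLog:
    """Append-only: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: List[Attempt] = []

    def append(self, entry: Attempt) -> None:
        self._entries.append(entry)

    def for_event(self, event_id: str) -> Tuple[Attempt, ...]:
        # Return a tuple so callers cannot mutate the underlying log.
        return tuple(e for e in self._entries if e.event_id == event_id)
```

Under these assumptions the un-jittered delays run 5 s, 10 s, 20 s, and so on, reaching the 24-hour cap at the sixteenth attempt. The jitter spreads retries out so that many endpoints recovering at the same moment are not hit in lockstep.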
Dead letter queues
When all retries are exhausted, failed deliveries need a clear operational path. Forge records the final status and the endpoint's response body for each failed delivery, and provides single-event retry controls so teams can fix issues without replaying a whole queue.
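As a rough illustration of that workflow, the sketch below parks an exhausted delivery alongside its final status and last response body, then retries a single event on demand. It is not Forge's actual API: DeadLetter, DeadLetterQueue, park, retry_one, and the send callback are hypothetical names chosen for the example, and storage is an in-memory dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class DeadLetter:
    """A delivery that ran out of retries (hypothetical shape)."""
    event_id: str
    endpoint_url: str
    payload: bytes
    final_status: str                 # e.g. "retries_exhausted"
    response_status: Optional[int]    # last HTTP status, if any
    response_body: str                # last response body from the endpoint


# Assumed transport: takes (url, payload), returns (http_status, response_body).
Sender = Callable[[str, bytes], Tuple[int, str]]


class DeadLetterQueue:
    def __init__(self, send: Sender) -> None:
        self._send = send
        self._entries: Dict[str, DeadLetter] = {}

    def park(self, entry: DeadLetter) -> None:
        """Store a failed delivery once its retries are exhausted."""
        self._entries[entry.event_id] = entry

    def get(self, event_id: str) -> Optional[DeadLetter]:
        return self._entries.get(event_id)

    def retry_one(self, event_id: str) -> bool:
        """Redeliver exactly one event; other parked events are untouched."""
        entry = self._entries[event_id]
        status, body = self._send(entry.endpoint_url, entry.payload)
        if 200 <= status < 300:
            del self._entries[event_id]   # delivered, so leave the queue
            return True
        # Still failing: keep the entry and refresh the recorded response.
        entry.response_status = status
        entry.response_body = body
        return False
```

Keying the queue by event ID is what makes a targeted retry cheap: once an endpoint is fixed, an operator can redeliver the one event they care about and confirm the result before touching anything else.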