Rhino Queues feature: detecting message failures
One of the more annoying problems with async messaging is that you have no way of knowing whether the destination you are sending to is up or not. Well, that is also one of the major advantages, of course. It gets annoying, however, when you need to respond to node failure in a more proactive manner. There are ways around that, usually with message replies, heartbeats or timeouts, but they tend to be complex, and they also tend to be visible to the actual application.
I just added a very small feature to Rhino Queues that lets you know whenever it has failed to send a message to its destination. Note that a single failure to send a message doesn't really mean much: Rhino Queues will retry sending the message at increasing intervals for almost four days. In fact, here is the retry schedule for a message before it is considered dead:
Retry | Delay | Time |
0 | 0:00:00 | 4/6/2009 00:00:00 |
1 | 0:00:01 | 4/6/2009 00:00:01 |
2 | 0:00:04 | 4/6/2009 00:00:05 |
3 | 0:00:09 | 4/6/2009 00:00:14 |
4 | 0:00:16 | 4/6/2009 00:00:30 |
5 | 0:00:25 | 4/6/2009 00:00:55 |
6 | 0:00:36 | 4/6/2009 00:01:31 |
7 | 0:00:49 | 4/6/2009 00:02:20 |
8 | 0:01:04 | 4/6/2009 00:03:24 |
9 | 0:01:21 | 4/6/2009 00:04:45 |
10 | 0:01:40 | 4/6/2009 00:06:25 |
11 | 0:02:01 | 4/6/2009 00:08:26 |
12 | 0:02:24 | 4/6/2009 00:10:50 |
13 | 0:02:49 | 4/6/2009 00:13:39 |
14 | 0:03:16 | 4/6/2009 00:16:55 |
15 | 0:03:45 | 4/6/2009 00:20:40 |
16 | 0:04:16 | 4/6/2009 00:24:56 |
17 | 0:04:49 | 4/6/2009 00:29:45 |
18 | 0:05:24 | 4/6/2009 00:35:09 |
19 | 0:06:01 | 4/6/2009 00:41:10 |
20 | 0:06:40 | 4/6/2009 00:47:50 |
21 | 0:07:21 | 4/6/2009 00:55:11 |
22 | 0:08:04 | 4/6/2009 01:03:15 |
23 | 0:08:49 | 4/6/2009 01:12:04 |
24 | 0:09:36 | 4/6/2009 01:21:40 |
25 | 0:10:25 | 4/6/2009 01:32:05 |
26 | 0:11:16 | 4/6/2009 01:43:21 |
27 | 0:12:09 | 4/6/2009 01:55:30 |
28 | 0:13:04 | 4/6/2009 02:08:34 |
29 | 0:14:01 | 4/6/2009 02:22:35 |
30 | 0:15:00 | 4/6/2009 02:37:35 |
31 | 0:16:01 | 4/6/2009 02:53:36 |
32 | 0:17:04 | 4/6/2009 03:10:40 |
33 | 0:18:09 | 4/6/2009 03:28:49 |
34 | 0:19:16 | 4/6/2009 03:48:05 |
35 | 0:20:25 | 4/6/2009 04:08:30 |
36 | 0:21:36 | 4/6/2009 04:30:06 |
37 | 0:22:49 | 4/6/2009 04:52:55 |
38 | 0:24:04 | 4/6/2009 05:16:59 |
39 | 0:25:21 | 4/6/2009 05:42:20 |
40 | 0:26:40 | 4/6/2009 06:09:00 |
41 | 0:28:01 | 4/6/2009 06:37:01 |
42 | 0:29:24 | 4/6/2009 07:06:25 |
43 | 0:30:49 | 4/6/2009 07:37:14 |
44 | 0:32:16 | 4/6/2009 08:09:30 |
45 | 0:33:45 | 4/6/2009 08:43:15 |
46 | 0:35:16 | 4/6/2009 09:18:31 |
47 | 0:36:49 | 4/6/2009 09:55:20 |
48 | 0:38:24 | 4/6/2009 10:33:44 |
49 | 0:40:01 | 4/6/2009 11:13:45 |
50 | 0:41:40 | 4/6/2009 11:55:25 |
51 | 0:43:21 | 4/6/2009 12:38:46 |
52 | 0:45:04 | 4/6/2009 13:23:50 |
53 | 0:46:49 | 4/6/2009 14:10:39 |
54 | 0:48:36 | 4/6/2009 14:59:15 |
55 | 0:50:25 | 4/6/2009 15:49:40 |
56 | 0:52:16 | 4/6/2009 16:41:56 |
57 | 0:54:09 | 4/6/2009 17:36:05 |
58 | 0:56:04 | 4/6/2009 18:32:09 |
59 | 0:58:01 | 4/6/2009 19:30:10 |
60 | 1:00:00 | 4/6/2009 20:30:10 |
61 | 1:02:01 | 4/6/2009 21:32:11 |
62 | 1:04:04 | 4/6/2009 22:36:15 |
63 | 1:06:09 | 4/6/2009 23:42:24 |
64 | 1:08:16 | 4/7/2009 00:50:40 |
65 | 1:10:25 | 4/7/2009 02:01:05 |
66 | 1:12:36 | 4/7/2009 03:13:41 |
67 | 1:14:49 | 4/7/2009 04:28:30 |
68 | 1:17:04 | 4/7/2009 05:45:34 |
69 | 1:19:21 | 4/7/2009 07:04:55 |
70 | 1:21:40 | 4/7/2009 08:26:35 |
71 | 1:24:01 | 4/7/2009 09:50:36 |
72 | 1:26:24 | 4/7/2009 11:17:00 |
73 | 1:28:49 | 4/7/2009 12:45:49 |
74 | 1:31:16 | 4/7/2009 14:17:05 |
75 | 1:33:45 | 4/7/2009 15:50:50 |
76 | 1:36:16 | 4/7/2009 17:27:06 |
77 | 1:38:49 | 4/7/2009 19:05:55 |
78 | 1:41:24 | 4/7/2009 20:47:19 |
79 | 1:44:01 | 4/7/2009 22:31:20 |
80 | 1:46:40 | 4/8/2009 00:18:00 |
81 | 1:49:21 | 4/8/2009 02:07:21 |
82 | 1:52:04 | 4/8/2009 03:59:25 |
83 | 1:54:49 | 4/8/2009 05:54:14 |
84 | 1:57:36 | 4/8/2009 07:51:50 |
85 | 2:00:25 | 4/8/2009 09:52:15 |
86 | 2:03:16 | 4/8/2009 11:55:31 |
87 | 2:06:09 | 4/8/2009 14:01:40 |
88 | 2:09:04 | 4/8/2009 16:10:44 |
89 | 2:12:01 | 4/8/2009 18:22:45 |
90 | 2:15:00 | 4/8/2009 20:37:45 |
91 | 2:18:01 | 4/8/2009 22:55:46 |
92 | 2:21:04 | 4/9/2009 01:16:50 |
93 | 2:24:09 | 4/9/2009 03:40:59 |
94 | 2:27:16 | 4/9/2009 06:08:15 |
95 | 2:30:25 | 4/9/2009 08:38:40 |
96 | 2:33:36 | 4/9/2009 11:12:16 |
97 | 2:36:49 | 4/9/2009 13:49:05 |
98 | 2:40:04 | 4/9/2009 16:29:09 |
99 | 2:43:21 | 4/9/2009 19:12:30 |
100 | 2:46:40 | 4/9/2009 21:59:10 |
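The delays in the table are the square of the retry number, in seconds, with each delay added to the previous attempt time. As a rough illustration (written in Python for brevity, not taken from the Rhino Queues source; the `retry_schedule` name and its parameters are made up for this sketch), the table can be reproduced like this:

```python
from datetime import datetime, timedelta


def retry_schedule(start, max_retries=100):
    """Yield (retry, delay, attempt_time) tuples for a quadratic back-off.

    Assumption: the delay before retry N is N * N seconds, measured from
    the previous attempt, which is what the table above shows.
    """
    attempt_time = start
    for retry in range(max_retries + 1):
        delay = timedelta(seconds=retry * retry)
        attempt_time += delay
        yield retry, delay, attempt_time


# Rebuild the schedule starting at midnight on April 6, 2009.
schedule = list(retry_schedule(datetime(2009, 4, 6)))

for retry, delay, attempt_time in schedule[:5]:
    print(retry, delay, attempt_time)
```

The last entry lands on 4/9/2009 21:59:10, a little under four days after the first attempt.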
This new feature doesn't change the retry schedule; it simply tells you whenever a send failure has occurred. This lets you build more sophisticated error handling strategies around that. You will probably want to wait for several consecutive failures of the same endpoint before deciding to do something about it, of course, but the capability is there.
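Waiting for several consecutive failures could look something like the sketch below. It is in Python and deliberately independent of Rhino Queues: how the failure notification is actually exposed isn't shown in this post, so `ConsecutiveFailureTracker`, its methods and the threshold of 3 are all hypothetical names and values. It also assumes you have some way of telling that a later send to the endpoint succeeded, so the count can be reset.

```python
from collections import defaultdict


class ConsecutiveFailureTracker:
    """Counts consecutive send failures per endpoint (hypothetical helper)."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self._failures = defaultdict(int)

    def record_failure(self, endpoint):
        """Record a failed send; return True once the threshold is reached."""
        self._failures[endpoint] += 1
        return self._failures[endpoint] >= self.threshold

    def record_success(self, endpoint):
        """A successful send breaks the streak, so forget the endpoint."""
        self._failures.pop(endpoint, None)


tracker = ConsecutiveFailureTracker(threshold=3)
results = [tracker.record_failure("node-a") for _ in range(3)]
```

You would call `record_failure` from whatever handler receives the send failure notification, and only react (alert, fail over, and so on) when it returns True.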
Comments
Is there some rhyme and reason to that delay schedule that I'm missing? Log, exponential, primes, table?
Yes, add retryCount * retryCount seconds to the current time.
Oh delay + retry; are you calling that recursively or pulling from a table?
Haha - sorry - too early in the morning for me. Thanks.