Rant: SignalR, Craziness, Head Butting & Wall Crashing

Jul 20 2012

Before I get to the entire story, a few things:

- The SignalR team is amazingly helpful.
- SignalR isn't released yet; it is at version 0.5. In fact, the version I was using was the very latest build, not even the officially released 0.5 version.
- My use cases are probably far outside what SignalR sets out to support.
- A lot of the problems were actually my fault.

One of the features for RavenDB 1.2 is the changes feature, a way to subscribe to notifications from the database so you won't have to poll for them. This sounded like an obvious candidate for SignalR, so I set out to integrate SignalR into RavenDB.

Now, that ain’t as simple as it sounds.

- SignalR relies on Newtonsoft.Json, which RavenDB also used to use. Version compatibility problems meant that we ended up internalizing that dependency in RavenDB, so we had to resolve this first.
- RavenDB runs both in IIS and as its own (HttpListener-based) host. SignalR supports the same hosting options, but makes its own assumptions about how it runs.
- We need to minimize connection counts.
- We need to support logic and filtering for events on both the server side and the client side.

The first two problems we solved by brute force. We internalized the SignalR codebase and converted its Newtonsoft.Json usage to RavenDB's internalized version. Then I modified one of the SignalR hosts so that it would integrate with the way RavenDB works.

So far, that was a relatively straightforward process. Then we had to write the integration parts. I posted about the external API yesterday.

My first attempt to write it was something like this:

    public class Notifications : PersistentConnection
    {
        // Raised on disconnect; the server presumably uses it to unregister this connection
        public event EventHandler Disposed = delegate { };

        private HttpServer httpServer;
        private string theConnectionId;

        // Pushes a change notification to the single client this instance represents
        public void Send(ChangeNotification notification)
        {
            Connection.Send(theConnectionId, notification);
        }

        public override void Initialize(IDependencyResolver resolver)
        {
            // Grab RavenDB's HttpServer so connections can be registered with it
            httpServer = resolver.Resolve<HttpServer>();
            base.Initialize(resolver);
        }

        protected override System.Threading.Tasks.Task OnConnectedAsync(IRequest request, string connectionId)
        {
            this.theConnectionId = connectionId;
            // The client must say which database it wants notifications for
            var db = request.QueryString["database"];
            if (string.IsNullOrEmpty(db))
                throw new ArgumentException("The database query string element is mandatory");

            // From here on, this database's notifications are routed to this connection
            httpServer.RegisterConnection(db, this);

            return base.OnConnectedAsync(request, connectionId);
        }

        protected override System.Threading.Tasks.Task OnDisconnectAsync(string connectionId)
        {
            // Registration only lasts while connected; anything raised after this is not delivered
            Disposed(this, EventArgs.Empty);
            return base.OnDisconnectAsync(connectionId);
        }
    }

This is the very first attempt. I then added the ability to add items of interest via the connection string, but that is the basic idea.

It worked, I was able to write the feature, and aside from some issues that I had grasping things, everything was wonderful. We had passing tests, and I moved on to the next step.

Except that… sometimes… those tests failed. Only every so often, which indicates a race condition.

It took a while to figure out what was going on, but basically, SignalR sometimes uses a long polling transport to send messages. Note that in the code above, we are registered for events only while we are connected. In a long polling system (and in general with persistent connections that may come and go), it is quite common to have periods of time where you aren't actually connected.

The race condition would happen because of the following sequence of events:

1. Connected
2. Got message (long polling, causes disconnect)
3. Disconnect
4. Message raised, client is not connected, message is gone
5. Connected
6. No messages for you
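SignalR includes a message bus abstraction for exactly this kind of situation: messages are buffered instead of being handed only to whoever happens to be connected at that instant. The following is a minimal sketch of that idea, not SignalR's implementation. Every published message gets an increasing id, and a (re)connecting client asks for everything after the last id it saw. `BufferedMessageBus`, `Publish`, and `ReadAfter` are hypothetical names, and the fixed-capacity buffer is a simplifying assumption (a real bus would also expire old entries).

```csharp
using System.Collections.Generic;

// Hypothetical sketch of a buffering message bus; NOT SignalR's code.
public sealed class BufferedMessageBus
{
    private readonly object gate = new object();
    private readonly LinkedList<KeyValuePair<long, string>> buffer =
        new LinkedList<KeyValuePair<long, string>>();
    private readonly int capacity; // assumption: keep only the last N messages
    private long lastId;

    public BufferedMessageBus(int capacity)
    {
        this.capacity = capacity;
    }

    // Publishing never depends on whether any client is connected right now.
    public long Publish(string message)
    {
        lock (gate)
        {
            lastId++;
            buffer.AddLast(new KeyValuePair<long, string>(lastId, message));
            if (buffer.Count > capacity)
                buffer.RemoveFirst(); // drop the oldest message once the buffer is full
            return lastId;
        }
    }

    // A (re)connecting client passes the id of the last message it saw (0 if none)
    // and receives every still-buffered message published after it.
    public List<KeyValuePair<long, string>> ReadAfter(long cursor)
    {
        lock (gate)
        {
            var result = new List<KeyValuePair<long, string>>();
            foreach (var entry in buffer)
            {
                if (entry.Key > cursor)
                    result.Add(entry);
            }
            return result;
        }
    }
}
```

Replaying the sequence above against this sketch: the client reads up to message 1, drops off, message 2 is published while it is disconnected, and when it reconnects with cursor 1 it still receives message 2.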

I want to emphasize that this particular issue is all me. I was the one misusing SignalR, and the behavior makes perfect sense.

SignalR actually contains a message bus abstraction for exactly those reasons, and that is what I should have used. I know that now, but at the time I decided that I was probably using the API at the wrong level, and moved to hubs and groups.

In this way, you could connect to the hub, request to join the group watching a particular document, and voila, we are done. That was the theory, at least. In practice, this was very frustrating. The first major issue was that I just couldn't get this thing to work.

The relevant code is:

    return temporaryConnection.Start()
        .ContinueWith(task =>
        {
            task.AssertNotFailed();

            hubConnection = temporaryConnection;
            proxy = hubConnection.CreateProxy("Notifications");
        });

Note that I create the proxy after the connection has been established.

That turned out to be the problem: you have to create the proxy first, then call Start. If you don't, SignalR will look like it is working fine, but will ignore all hub calls. I had to trace really deep into the SignalR codebase to figure that one out.

My opinion (already communicated to the team) is that starting a hub connection without a proxy is probably an error and should throw.
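To make that suggestion concrete, here is a toy model (not SignalR's `HubConnection`) of a connection that fails fast: calling `Start()` with no proxies throws, and so does creating a proxy after the connection has started. `ToyHubConnection` is hypothetical, and it encodes the assumption that the set of hubs is fixed once the connection starts, which is consistent with the behavior described above.

```csharp
using System;
using System.Collections.Generic;

// Toy model only, NOT SignalR's HubConnection: it throws instead of
// silently ignoring hub calls when proxies are created in the wrong order.
public sealed class ToyHubConnection
{
    private readonly List<string> hubNames = new List<string>();
    private bool started;

    public void CreateProxy(string hubName)
    {
        if (started)
            throw new InvalidOperationException(
                "Proxy '" + hubName + "' was created after Start(); create all proxies before starting.");
        hubNames.Add(hubName);
    }

    public void Start()
    {
        if (hubNames.Count == 0)
            throw new InvalidOperationException(
                "Start() was called on a hub connection with no proxies; hub calls would be ignored.");
        started = true;
        // In this toy model, this is where the fixed list of hubs would be sent to the server.
    }
}
```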

Once we had that fixed, things started to work, and the tests ran.

Most of the time, that is. Once in a while, the tests would still fail. Again, the issue was a race condition, but this time I wasn't doing anything wrong; I was using SignalR's API straight out of the docs. This turned out to be a probable race condition inside InProcessMessageBus: because multiple threads are involved, registering for a group inside SignalR isn't always visible to the next request.

That was extremely hard to debug.

Next, I decided to do away with hubs. By this time I had a much better understanding of the way SignalR worked, so I went back to persistent connections and simply implemented the message dispatch in my own code, rather than relying on SignalR groups.
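A minimal sketch of what doing that dispatch yourself might look like, assuming each connection is identified by a connection id and watches a database. `DatabaseNotificationDispatcher`, `Watch`, `Unwatch`, and `Dispatch` are hypothetical names, and the `send` callback stands in for whatever actually writes to the connection. Using `ConcurrentDictionary` means a `Watch` call that has returned on one thread is visible to a `Dispatch` on another thread, which is the kind of visibility the group registration race above was lacking.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch: per-database dispatch in our own code instead of SignalR groups.
public sealed class DatabaseNotificationDispatcher
{
    // database name -> set of connection ids watching it (the bool value is unused)
    private readonly ConcurrentDictionary<string, ConcurrentDictionary<string, bool>> watchers =
        new ConcurrentDictionary<string, ConcurrentDictionary<string, bool>>();

    public void Watch(string database, string connectionId)
    {
        watchers.GetOrAdd(database, _ => new ConcurrentDictionary<string, bool>())[connectionId] = true;
    }

    public void Unwatch(string database, string connectionId)
    {
        ConcurrentDictionary<string, bool> ids;
        bool ignored;
        if (watchers.TryGetValue(database, out ids))
            ids.TryRemove(connectionId, out ignored);
    }

    // Calls send(connectionId, notification) for every connection watching the database
    // and returns how many connections were sent the notification.
    public int Dispatch(string database, string notification, Action<string, string> send)
    {
        ConcurrentDictionary<string, bool> ids;
        if (!watchers.TryGetValue(database, out ids))
            return 0;
        int count = 0;
        foreach (var id in ids.Keys) // Keys returns a snapshot of the current watchers
        {
            send(id, notification);
            count++;
        }
        return count;
    }
}
```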

That worked great. The tests even passed more or less consistently.

The problem was that they also crashed the unit testing process because of leaked exceptions. Here is one such case, in HubDispatcher.OnReceivedAsync():

    return resultTask
        .ContinueWith(_ => base.OnReceivedAsync(request, connectionId, data))
        .FastUnwrap();

Note that “_” parameter (a convention I use as well, to denote a parameter that I don't care about). The problem is that this parameter is a task, and if that task failed, you have a major problem: on .NET 4.0, a faulted task whose exception is never observed will crash your process when the task is finalized. In 4.5 the default changed so that such exceptions are ignored, but RavenDB runs on 4.0.

So I found those places and I fixed them.
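One way to write such a continuation so the failure is not dropped is to observe the antecedent and forward its outcome, running the next step only on success. This is a sketch under the assumption that the caller observes the returned task; `Then` is a hypothetical helper, not part of SignalR or the BCL, and it uses the standard `Unwrap` rather than SignalR's `FastUnwrap`.

```csharp
using System;
using System.Threading.Tasks;

public static class SafeContinuations
{
    // Hypothetical helper: runs `next` only if `antecedent` succeeded.
    // A failed or canceled antecedent is forwarded to the returned task
    // instead of being silently ignored.
    public static Task Then(this Task antecedent, Func<Task> next)
    {
        return antecedent.ContinueWith(t =>
        {
            if (t.IsFaulted)
            {
                // Reading t.Exception marks the antecedent's exception as observed;
                // the failure is then passed on to whoever waits on the returned task.
                var failed = new TaskCompletionSource<object>();
                failed.SetException(t.Exception.InnerExceptions);
                return (Task)failed.Task;
            }
            if (t.IsCanceled)
            {
                var canceled = new TaskCompletionSource<object>();
                canceled.SetCanceled();
                return (Task)canceled.Task;
            }
            return next();
        }).Unwrap();
    }
}
```

Applied to the snippet above, the call would become roughly `resultTask.Then(() => base.OnReceivedAsync(request, connectionId, data))`. The returned task still has to be observed by someone, but the failure now surfaces there instead of being thrown away along with the `_` parameter.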

And then we ran into hangs. Specifically, we had issues with disposing of connections, and sometimes with not disposing of them, and…

That was the point where I decided to cut SignalR out.

I like the SignalR model, and most of the codebase is really good. But it is just not in the right shape for what I needed. By this time I already had a pretty good idea of how SignalR operates, and it took only a few hours of work to get things working without SignalR. RavenDB now sports a streamed endpoint that you can register yourself to, plus a side channel that you can use to send commands to the server. It might not be as elegant, but it is simpler by a few orders of magnitude, and once we figured that out, we had a fully working system on our hands. All the tests pass, we have no crashes, yeah!
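The post doesn't describe the wire format of that streamed endpoint, so the following is only a sketch of what a client-side read loop for such an endpoint could look like, assuming one notification per line with blank lines used as keep-alives. `NotificationStream.Pump` is a hypothetical helper, and reconnecting when the server closes the stream is left to the caller.

```csharp
using System;
using System.IO;

// Hypothetical client-side loop for a streamed notification endpoint.
// ASSUMPTION: one notification per line, blank lines are keep-alives;
// this is not RavenDB's documented format.
public static class NotificationStream
{
    // Reads until the server closes the stream and returns how many
    // notifications were handed to the callback.
    public static int Pump(TextReader reader, Action<string> onNotification)
    {
        int handled = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
                continue; // keep-alive, nothing to dispatch
            onNotification(line);
            handled++;
        }
        // Stream ended: the caller is expected to reconnect to keep listening.
        return handled;
    }
}
```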

I will post exactly how we did it in a future post.

Tags:

raven

Comments

20 Jul 2012
12:26 PM

tobi

Sounds like you don't need what SignalR has anyway: All you need is a persistent TCP connection to each listening client (of which there are only a few).

20 Jul 2012
12:57 PM

Damian

streamed endpoint

This is over http, right? Does it handle re-connects? Is it proxy / firewall friendly? Any sort of guaranteed message (notification) delivery? Consider using websockets at all?

20 Jul 2012
16:30 PM

Slava

Thanks for sharing this. We are a few days away from jumping into SignalR, but now I would reconsider it. Did you try any other tools by chance, BOSH, WebSockets?

20 Jul 2012
16:44 PM

Daniel Lang

Tobi and Damian, since RavenDB exposes only an http endpoint, I don't think it can nor should it use WebSockets or any other TCP based protocol except http. So, long-polling is probably the only way to go and it should work with any kind of http hardware, e.g. load-balancers (although the RavenDB client can do this much better).

20 Jul 2012
16:47 PM

Daniel Lang

Slava, if you don't have such fancy use-cases as Oren has, then go for SignalR. It is an awesome piece of software and we've been using it since its early version without any serious issues.

20 Jul 2012
16:53 PM

David Fowler

The very first implementation looks like it should work just fine. We don't actually raise disconnect in the longpolling transport when messages are received. The logical connection hasn't been disconnected, just the underlying transport's connection (but that's what this abstraction is for). We buffer messages for 30 seconds, so if the transport is reconnecting it will still get those messages that it "missed" (as long as they are still there). If you didn't see that behavior I'd love to know why as it should just work.

20 Jul 2012
17:25 PM

Ayende Rahien

David, I put Console.WriteLine in the disconnected, and it was getting called.

20 Jul 2012
17:34 PM

David Fowler

Let's setup some time to go over things. I'm sure it's something that can be solved pretty easily. That's the intent, there might have been some unrelated thing going on that was causing those issues.

20 Jul 2012
20:14 PM

Damian Hickey

Daniel, websockets would be the preferred mechanism to receive notifications, falling back to long polling if needed.

20 Jul 2012
22:37 PM

Daniel Lang

Damian, yes, in case you have a web server (starting with IIS 8) and a web application. This is just a database, that uses http as the transport protocol. Using websockets for this kind of thing would mean that we need additional ports to be opened on the RavenDB server, whereas long-polling can share the same connection.

21 Jul 2012
00:11 AM

Slava

Out of curiosity, why not use BOSH? XMPP has been using it for a very long time.

21 Jul 2012
07:32 AM

Ayende Rahien

Slava, We only need streaming one way

21 Jul 2012
07:42 AM

Damian Hickey

Daniel, websockets are over the same http(s) ports. The initial connection is still http. IIS is not required. http://paulbatum.github.com/WebSocket-Samples/HttpListenerWebSocketEcho/ . Yes, this is .net 4.5, but there are other .net 4.0 websocket implementations out there.

21 Jul 2012
07:45 AM

Ayende Rahien

Damian, That requires software that is not released, and I looked at the other WebSockets implementations for 4.0. No thanks, they are scary inside.

21 Jul 2012
07:56 AM

Damian Hickey

Ayende, should be released in a few weeks. Prob not in time for 1.2 though. I may scratch that itch then, for the craic. Fair enough on the other implementations... Still wondering if you handle dis/re-connects and any sort of guaranteed message delivery in the case of a dropped connection?

21 Jul 2012
08:02 AM

Ayende Rahien

Damian, We do retries for that

26 Jul 2012
16:09 PM

Andrei Alecu

Interesting that you ran into the same InProcessMessageBus issue that I did. I have made a temporary fix available in this pull request:

https://github.com/SignalR/SignalR/pull/559

Dfowler said he has a better idea for a permanent fix, but for now, the above PR should do :)

Oren Eini

CEO of RavenDB
