Ayende @ Rahien

Refunds available at head office

Document based modeling: Auctions & Bids

In my previous post, we dealt with how to model Auctions and Products. This time, we are going to look at how to model Bids.

Before we can do that, we need to figure out how we are going to use them. As I mentioned, I am going to use Ebay as the source for “application mockups”.  So I went to Ebay and took a couple of screen shots.

Here is the actual auction page:

image

And here is the actual bids page.

image

This tells us several things:

  • Bids aren’t really accessed for the main page.
  • There is a strong likelihood that the number of bids is going to be small for most items (less than a thousand).
  • Even for items with a lot of bids, we only care about the most recent ones for the most part.

This is the Auction document as we have last seen it:

{
   "Quantity":15,
   "Product":{
      "Name":"Flying Monkey Doll",
      "Colors":[
         "Blue & Green"
      ],
      "Price":29,
      "Weight":0.23
   },
   "StartsAt":"2011-09-01",
   "EndsAt":"2011-09-15"
}

The question is where are we putting the Bids? One easy option would be to put all the bids inside the Auction document, like so:

{
   "Quantity":15,
   "Product":{
      "Name":"Flying Monkey Doll",
      "Colors":[
         "Blue & Green"
      ],
      "Price":29,
      "Weight":0.23
   },
   "StartsAt":"2011-09-01",
   "EndsAt":"2011-09-15",
   "Bids": [
     {"Bidder": "bidders/123", "Amount": 0.1, "At": "2011-09-08T12:20" }
   ]
}

The problem with such an approach is that we are now forced to load the Bids whenever we want to load the Auction, even though the main scenario is that we just need the Auction details, not all of the Bid details. In fact, we only need the count of Bids and the Winning Bid. This approach will also fail to properly handle the scenario of a High Interest Auction, one that has a lot of Bids.

That leaves us with a few options. One of them is to decide that we don’t really care about Bids and Auctions as a time sensitive matter. As long as we are accepting Bids, we don’t really need to give you immediate feedback. Indeed, this is how most Auction sites work. They give you a cached view of the data, refreshing it every 30 seconds or so. The idea is to reduce the cost of actually accepting a new Bid to the minimum necessary. Once the Auction is closed, we can figure out who actually won and notify them.

A good design for this scenario would be a separate Bid document for each Bid, and a map/reduce index to get the Winning Bid Amount and Bid Count. Something like this:

     {"Bidder": "bidders/123", "Amount": 0.1, "At": "2011-09-08T12:20", "Auction": "auctions/1234"}
     {"Bidder": "bidders/234", "Amount": 0.15, "At": "2011-09-08T12:21", "Auction": "auctions/1234" }
     {"Bidder": "bidders/123", "Amount": 0.2, "At": "2011-09-08T12:22", "Auction": "auctions/1234" }

And the index:

from bid in docs.Bids
select new { Count = 1, bid.Amount, bid.Auction }

from result in results
group result by result.Auction into g
select new 
{
   Count = g.Sum(x=>x.Count),
   Amount = g.Max(x=>x.Amount),
   Auction = g.Key
}

As you can imagine, due to the nature of RavenDB’s indexes, we can cheaply insert new Bids without having to wait for the indexing to work. And we can always display the last calculated value for the Auction, including the time for which it is accurate.
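To make that concrete, here is a rough sketch of the write and read paths for this model (the Bid and AuctionBidStats classes and the “Bids/ByAuction” index name are my own naming, not something defined in this post):

using (var session = store.OpenSession())
{
    // Accepting a bid is just a cheap document write; we don't wait for indexing.
    session.Store(new Bid
    {
        Bidder = "bidders/123",
        Amount = 0.1m,
        At = DateTime.UtcNow,
        Auction = "auctions/1234"
    });
    session.SaveChanges();
}

using (var session = store.OpenSession())
{
    // The auction page shows whatever the map/reduce index has calculated so far,
    // which may lag behind the very latest bids by a few seconds.
    var stats = session.Query<AuctionBidStats>("Bids/ByAuction")
        .Where(x => x.Auction == "auctions/1234")
        .FirstOrDefault();
}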

That is one model for an Auction site, but another one would be a much stricter scenario, one where you can’t just accept any Bid. It might be a system where you are charged per bid, so accepting a known invalid bid (one that was already outbid in the meantime) is not allowed. How would we build such a system? We could still use the previous design and just defer the actual billing to a later stage, but let us assume that this is a strong constraint on the system.

In this case, we can’t rely on the indexes, because we need immediately consistent information, and we need it to be cheap. With RavenDB, we have the document store, which is ACIDly consistent. So we can do the following: store all of the Bids for an Auction in a single document:

{
   "Auction": "auctions/1234",
   "Bids": [
     {"Bidder": "bidders/123", "Amount": 0.1, "At": "2011-09-08T12:20", "Auction": "auctions/1234"}
     {"Bidder": "bidders/234", "Amount": 0.15, "At": "2011-09-08T12:21", "Auction": "auctions/1234" }
     {"Bidder": "bidders/123", "Amount": 0.2, "At": "2011-09-08T12:22", "Auction": "auctions/1234" }
    ]
}

And we modify the Auction document to be:

{
   "Quantity":15,
   "Product":{
      "Name":"Flying Monkey Doll",
      "Colors":[
         "Blue & Green"
      ],
      "Price":29,
      "Weight":0.23
   },
   "StartsAt":"2011-09-01",
   "EndsAt":"2011-09-15",
   "WinningBidAmount": 0.2,
   "BidsCount" 3
}

Adding the BidsCount and WinningBidAmount to the Auction means that we can very cheaply show them to the users. Because RavenDB is transactional, we can actually do it like this:

using(var session = store.OpenSession())
{
  session.Advanced.UseOptimisticConcurrency = true;
  
  var auction = session.Load<Auction>("auctions/1234")
  var bids = session.Load<Bids>("auctions/1234/bids");
  
  bids.AddNewBid(bidder, amount);
  
  auction.UpdateStatsFrom(bids);
  
  session.SaveChanges();
}

We are now guaranteed that this will either succeed completely (and we have a new winning bid), or it will fail utterly, leaving no trace. Note that AddNewBid will reject a bid that isn’t the highest (by throwing an exception), and if we have two concurrent modifications, RavenDB will throw as well. Both the Auction and its Bids are treated as a single transactional unit, just the way it should be.
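For completeness, here is a minimal sketch of what AddNewBid and UpdateStatsFrom could look like (the post doesn’t show these classes, so the shapes below are my assumption):

public class Bid
{
    public string Bidder { get; set; }
    public decimal Amount { get; set; }
    public DateTime At { get; set; }
}

public class Bids
{
    public Bids()
    {
        Items = new List<Bid>();
    }

    public string Id { get; set; }
    public string Auction { get; set; }
    public List<Bid> Items { get; set; }

    public void AddNewBid(string bidder, decimal amount)
    {
        var currentMax = Items.Count == 0 ? 0m : Items.Max(x => x.Amount);
        if (amount <= currentMax)
            throw new InvalidOperationException("Bid must be higher than the current winning bid");
        Items.Add(new Bid { Bidder = bidder, Amount = amount, At = DateTime.UtcNow });
    }
}

public class Auction
{
    public decimal WinningBidAmount { get; set; }
    public int BidsCount { get; set; }

    public void UpdateStatsFrom(Bids bids)
    {
        BidsCount = bids.Items.Count;
        WinningBidAmount = bids.Items.Max(x => x.Amount);
    }
}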

The final question is how to handle a High Interest Auction, one that gathers a lot of bids. We didn’t worry about it in the previous model, because that was left for RavenDB to handle. In this case, since we are using a single document for the Bids, we need to take care of that ourselves. There are a few things that we need to consider here:

  • Bids that lost are usually of little interest.
  • We probably need to keep them around, just in case, nevertheless.

Therefore, we will implement splitting for the Bids document. What does this mean?

Whenever the number of Bids in the Bids document reaches 500, we split the document. We take the oldest 250 Bids, move them to a Historical Bids document, and then we save.

That way, we have a set of historical documents with 250 Bids each that no one is ever likely to read, but that we need to keep, and we have the main Bids document, which contains the most recent (and relevant) Bids. A High Interest Auction might end up looking like:

  • auctions/1234 <- Auction document
  • auctions/1234/bids <- Bids document
  • auctions/1234/bids/1 <- historical bids #1
  • auctions/1234/bids/2 <- historical bids #2
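A rough sketch of that splitting step (the constants, the historical documents counter and the method itself are my own additions on top of the Bids sketch above):

private const int MaxBidsPerDocument = 500;
private const int BidsToMoveOnSplit = 250;

public void SplitIfNeeded(IDocumentSession session, string auctionId, Bids bids)
{
    if (bids.Items.Count < MaxBidsPerDocument)
        return;

    // Bids are appended in order, so the first entries are the oldest ones.
    var oldest = bids.Items.Take(BidsToMoveOnSplit).ToList();

    // Assumes the Bids document carries a counter of how many historical
    // documents were already split off, so ids become auctions/1234/bids/1, /2, ...
    bids.HistoricalDocumentsCount++;
    session.Store(new Bids
    {
        Id = auctionId + "/bids/" + bids.HistoricalDocumentsCount,
        Auction = auctionId,
        Items = oldest
    });

    // The main Bids document keeps only the most recent bids.
    bids.Items = bids.Items.Skip(BidsToMoveOnSplit).ToList();

    // SaveChanges is left to the surrounding bid acceptance transaction.
}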

And that is enough for now, I think. This post went on a little longer than I intended, but hopefully I was able to explain to you both the final design decisions and the process used to reach them.

Thoughts?


Hiring Questions–The phone book

One of the things that we ask some of our interviewees is to give us a project that would answer the following:

We need a reusable library to manage phone books for users. User interface is not required, but we do need an API to create, delete and edit phone book entries. An entry contains a Name (first and last), a type (Work, Cellphone or Home) and a number. Multiple entries under the same name are allowed. The persistence format of the phone book library is a file, and text based formats such as XML or JSON have been ruled out.

In addition to creating / editing / deleting, the library also needs to support iterating over the list in alphabetical order, by either the first or the last name of each entry.
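For illustration only, one possible shape for such an API (this is my own sketch, not a model answer to the question):

public enum EntryType { Work, Cellphone, Home }

public class PhoneBookEntry
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public EntryType Type { get; set; }
    public string Number { get; set; }
}

public interface IPhoneBook
{
    void Add(PhoneBookEntry entry);
    void Update(PhoneBookEntry existing, PhoneBookEntry updated);
    void Delete(PhoneBookEntry entry);

    // Iteration in alphabetical order, by either first or last name.
    IEnumerable<PhoneBookEntry> OrderedByFirstName();
    IEnumerable<PhoneBookEntry> OrderedByLastName();

    // Persistence to a file, in some non text based format.
    void Save(string path);
    void Load(string path);
}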

The fun part with this question is that it tests so many things at the same time that it gives me a lot of details about the kind of candidate I have in front of me: their actual ability to solve a non trivial problem, the way they design and organize code, the way they understand and implement a set of requirements, etc.

The actual problem is something that I remember doing as an exercise during high school (in Pascal, IIRC).


Don’t give a chicken access to your schedule / feature set

A chicken, in this case, is the same chicken from the Pig & Chicken who wanted to open the eggs & ham place. This is a term used in agile a lot.

There are many teams who feel that being responsive to client demands is a Good Thing. In general, they are usually right, but you have to be very aware of who is asking, and what stake they have in the game. If they don’t own the budget for your team, they don’t get to ask for features and get a “sure thing” automatically.

Case in point, I was asked by another team in a totally different company what direction they should go for a decision that directly impacts my software. I am using their stuff, and as such, they sought my feedback. The problem is that my recommendation was based on what I actually needed. They had two options, one of which would take a week or two and would provide the basic set of services. The other would take several months to develop, but would allow me to create much better options for my users.

I think that you can guess what I ended up recommending, since from my point of view, there is absolutely no downside whatsoever. If they end up implementing the basic stuff, that is okay. If they implement the advanced stuff, that is great. In any case, my cost ends up being zero.

I am a chicken in this game, and I want the biggest piece of meat (but make it non pig) that I can get, since I am eating on the house.

Whenever you let customer feedback into the loop, you have to take that piece into account. Customers are going to favor whatever benefits them, and that isn’t the same as whatever benefits you.


Is OR/M an anti pattern?

This article thinks so, and I was asked to comment on that. I have to say that I agree with a lot in this article. It starts by laying out what an anti pattern is:

  1. It initially appears to be beneficial, but in the long term has more bad consequences than good ones
  2. An alternative solution exists that is proven and repeatable

And then goes on to list some of the problems with OR/M:

  • Inadequate abstraction - The most obvious problem with ORM as an abstraction is that it does not adequately abstract away the implementation details. The documentation of all the major ORM libraries is rife with references to SQL concepts.
  • Incorrect abstraction – …if your data is not relational, then you are adding a huge and unnecessary overhead by using SQL in the first place and then compounding the problem by adding a further abstraction layer on top of that.
    On the other hand, if your data is relational, then your object mapping will eventually break down. SQL is about relational algebra: the output of SQL is not an object but an answer to a question.
  • Death by a thousand queries – …when you are fetching a thousand records at a time, fetching 30 columns when you only need 3 becomes a pernicious source of inefficiency. Many ORM layers are also notably bad at deducing joins, and will fall back to dozens of individual queries for related objects.

If the article was about pointing out the problems in OR/M I would have no issues in endorsing it unreservedly. Many of the problems it points out are real. They can be mitigated quite nicely by someone who knows what they are doing, but that is beside the point.

I think that I am in a pretty unique position to answer this question. I have over 7 years of being heavily involved in the NHibernate project, and I have been living & breathing OR/M for all of that time. I have also created RavenDB, a NoSQL database, that gives me a good perspective about what it means to work with a non relational store.

And like most criticisms of OR/M that I have heard over the years, this article does only half the job. It tells you what is good & bad (mostly bad) in OR/M, but it fails to point out something quite important.

To misquote Churchill, Object Relational Mapping is the worst form of accessing a relational database, except all of the other options when used for OLTP.

When I see people railing against the problems in OR/M, they usually point out quite correctly problems that are truly painful. But they never seem to remember all of the other problems that OR/M usually shields you from.

One alternative is to move away from Relational Databases. RavenDB and the RavenDB Client API have been specifically designed by us to overcome a lot of the limitations and pitfalls inherent to OR/M. We have been able to take advantage of all of our experience in the area and create what I consider to be a truly awesome experience.

But if you can’t move away from Relational Databases, what are the alternatives? Ad hoc SQL or Stored Procedures? You want to call that better?

A better alternative might be something like Massive, which is a very thin layer over SQL. But that suffers from a whole host of other issues (no unit of work means aliasing issues, no support for eager loading means a better chance of SELECT N+1, no easy way to handle migrations, etc). There is a reason why OR/Ms have reached where they are. There are a lot of design decisions that simply cannot be made any other way without unacceptable tradeoffs.

From my perspective, that means that if you are using Relational Databases for OLTP, you are most likely best served with an OR/M. Now, if you want to move away from Relational Databases for OLTP, I would be quite happy to agree with you that this is the right move to make.

Looking for a XAML expert

We are working on the new version of RavenDB Studio, and it has become clear very quickly that while we might be good at producing software, we are most certainly not good at making it look good.

Therefore, I would like to get some help from someone who can actually take an ugly duckling and turn it into a beautiful swan.

If you are interested, I would be very happy if you can contact me.


Scaling with RavenDB video

Something that we have started to do recently is to record some of our customer interactions*, and then post them to our YouTube account.

The following is a discussion with Nick VanMatre, Solutions Architect at Archstone, about how to scale their RavenDB usage. I think you’ll find it interesting.

* Nit picker corner: Obviously, with their permission.


The tax calculation challenge

People seem to be more interested in answering the question than in the code that solved it. Actually, people seemed to be more interested in outdoing one another in creating answers to it. What I found most interesting is that a large percentage of the answers (both in the blog post and in the interviews) got a lot of it wrong.

So here is the question in full. The following table is the current tax rates in Israel:

Salary                Tax Rate
Up to 5,070           10%
5,071 up to 8,660     14%
8,661 up to 14,070    23%
14,071 up to 21,240   30%
21,241 up to 40,230   33%
Higher than 40,230    45%

Here are some example answers:

  • 5,000 –> 500
  • 5,800 –> 609.2
  • 9,000 –> 1087.8
  • 15,000 –> 2532.9
  • 50,000 –> 15,068.1

This problem is a bit tricky because the tax rate doesn’t apply to the whole sum, only to the part that is within the current rate.
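To make the bracket logic concrete, here is a minimal sketch (method and variable names are mine, not part of the original question) of a marginal tax calculation that reproduces the example answers above:

public static double CalculateTax(double salary)
{
    // Upper bound of each bracket, and the rate applied to income inside it.
    var brackets = new[]
    {
        new { UpTo = 5070d,           Rate = 0.10 },
        new { UpTo = 8660d,           Rate = 0.14 },
        new { UpTo = 14070d,          Rate = 0.23 },
        new { UpTo = 21240d,          Rate = 0.30 },
        new { UpTo = 40230d,          Rate = 0.33 },
        new { UpTo = double.MaxValue, Rate = 0.45 },
    };

    double tax = 0, previousCeiling = 0;
    foreach (var bracket in brackets)
    {
        if (salary <= previousCeiling)
            break;
        // Only the slice of the salary that falls inside this bracket is taxed at its rate.
        var taxableInThisBracket = Math.Min(salary, bracket.UpTo) - previousCeiling;
        tax += taxableInThisBracket * bracket.Rate;
        previousCeiling = bracket.UpTo;
    }
    return tax;
}

For example, CalculateTax(9000) gives 507 + 502.6 + 78.2 = 1,087.8, matching the list above.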


Negative hiring decisions, Part I

One of the things that I really hate is to be reminded anew how stupid some people are. Or maybe it is how stupid they think I am.  One of the things that we are doing during interviews is to ask candidates to do some fairly simple code tasks. Usually, I give them an hour or two to complete that (using VS and a laptop), and if they don’t complete everything, they can do that at home and send me the results.

This is a piece of code that one such candidate sent. To be clear, this is something that the candidate worked on at home and had as much time for as she wanted:

public int GetTaxs(int salary)
{
    double  net, tax;

    switch (salary)
    {
        case < 5070:
            tax = salary  * 0.1;
            net=  salary  - tax ;
            break;

        case < 8660:
        case > 5071:
            tax = (salary - 5071)*0.14;
            tax+= 5070 * 0.1;
            net = salary-tax;   
            break;
        case < 14070:
        case > 8661:
            tax=  (salary - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net=  salary - tax;
            break;
        case <21240:
        case >14071:
            tax=  (salary- 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        case <40230:
        case >21241:
            tax=  (salary- 21241)*0.33;
            tax+= (21240 - 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        case > 40230:
            tax= (salary - 40230)*0.45;
            tax+=  (40230- 21241)*0.33;
            tax+= (21240 - 14071)*0.3;
            tax+= (14070 - 8661)*0.23;
            tax+= (8661 - 5071 )*0.14;
            tax+= 5070 *0.1;
            net= salary - tax;
            break;
        default:
            break;
    }
}

Submitting code that doesn’t actually compile is a great way to pretty much ensure that I won’t hire you.


Project killing, code throwing (over the wall) and cost, oh my!

I got some requests to make RavenMQ an OSS project. And I thought that I might explain the thinking behind why I don’t want to do that.

Put simply, I have never thrown a significant amount of code over the wall for other people to deal with. Oh, I have done it with a lot of small projects ( < ~2,000 LOC ) which I assume that most people can figure out in an hour or less, but a significant, non trivial amount of software? Never done that.

It doesn’t feel right. More than that, it isn’t likely to actually work. Even mature projects with multiple contributors have a hard time doing a leadership shift if they were structured as a single person effort. To do so on a brand new codebase which no one really knows? That is a recipe for either tying me up with support or creating a bad impression if someone doesn’t get the code to work. One of the things that I learned from many years of working with Open Source software is that the maturity level of the project counts, and that just throwing code over the wall is a pretty bad way of ensuring that a project will survive and thrive.

And then there is another issue: I don’t believe that RavenMQ is as valuable now that SignalR is out there. You can do pretty much whatever you could do with RavenMQ with SignalR, and that means that as far as everyone is concerned, this is a pure win. There isn’t a need to create a separate project simply to have a separate project.


How SignalR killed RavenMQ

Close to a year ago, I posted about RavenMQ for the first time, discussing the ideas behind it and what we intended to do with it. We even set up a private beta group to start testing it, but we never had enough resources to be able to make this more than an interesting project.

What I wanted was to make this a product, but that requires a lot more effort. Recently I started looking at SignalR as a way to cut down the amount of work that would be required to build RavenMQ. And the more that I looked at SignalR, the more uncomfortable I felt. SignalR seems to be a pretty good fit for the sort of things that I was thinking about for RavenMQ.

More than that, the longer that I dug into it, the more I liked it. The problem is that I feel that with SignalR in existence, the main reason for RavenMQ is now no longer there. Oh sure, we can build it, we can even base it on SignalR for wire compatibility, but I don’t feel that we can really generate a good enough value to create a viable product. At least, not one that I would feel comfortable charging for.

Given that, I am going to discontinue development on RavenMQ. If you were part of the RavenMQ beta group, you are strongly encouraged to look at SignalR and check whether you can use it instead.


Reviewing SignalR–Part II

After reading this post from Scott Hanselman, I decided that I really want to take a look at SignalR. This is part two of my review.

I cloned the repository and started the build, which ran cleanly, so the next step was to take a look at the code. I didn’t notice any tests in the build, and I am really interested in knowing how SignalR is tested. Testing websites is notoriously hard, if only because you need to spin up IIS/WebDev to do so.

image

No tests… as I said, I can understand why, but it is worrying, because it means that if I wanted to use SignalR, I would have to come up with my own testing strategy. I had hoped that there would be something there out of the box.

With RavenDB, we paid special attention to making it as easy as possible to run in a unit test context:

using(var documentStore = new EmbeddableDocumentStore { RunInMemory = true }.Initialize())
{
    // run tests here
}

For the sake of doing something different, I decided to start by reading the JS scripts. It would be interesting to look at things from that angle first.

The first interesting thing that I ran into was this:

image

That is some funky syntax. I actually had to go to the docs to figure it out, and I got side tracked with the comma operator, but it seems like a typical three variable initialization, although I’ll admit that the initialize just showing up there screwed me up for a while. The rest of the code in jquery.signalR.js is pretty straightforward, using web sockets or long polling. Let us move on to hubs.js.

The next step for me was to figure out that I wanted to debug through this, so I got the SignalR-Chat sample app and tried it. It worked perfectly; I just couldn’t figure out why it was working.

My main frustration was actually with this piece of code:

image

SignalR doesn’t have any chat property there, and I don’t think that JS allows you to define things on the fly. I was expecting this to fail, but it worked! That was quite annoying, so I set out to figure out what was going on.

Looking at the network, it was making a call to http://chatapp.apphb.com/signalr/hubs, and that has this:

image

Where did this come from?! And how come that this thing got a response?

That really annoyed me; I could see no wiring whatsoever for /signalr/hubs in the Chat application. I checked the web.config, I checked for .ashx files, I checked the global.asax, nothing.

Finally I went back to the SignalR code and figured out that it was using a PreApplicationStart hook to inject an HTTP module, which hooks this up for you. Now I had a better understanding of what was going on, but I also needed to figure out where the chat references came from.

That is where the high level API comes in. The SignalR Hub is going to process itself and generate the appropriate proxy on the client. Thinking about this, it is really nice; it was just surprising to get there, and getting lost in web.config along the way didn’t help my state of mind.

Okay, now that I understand how the client side works, let us go and actually look at what is going on over the network. I am testing this using Chrome 13.0.782.220 against the Chat sample.

The first interesting thing is:

POST http://chatapp.apphb.com/signalr/negotiate HTTP/1.1
Host: chatapp.apphb.com
Connection: keep-alive
Referer: http://chatapp.apphb.com/
Content-Length: 0
Origin: http://chatapp.apphb.com
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.220 Safari/535.1
Accept: */*
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: userid=26c2b4a1-f5d1-439c-8ad5-f598b7bd0644

I marked the pieces of interest. The cookie indicates that we can identify a user for a long period, which is good.

HTTP/1.1 200 OK
Server: nginx/1.0.2
Date: Tue, 06 Sep 2011 07:15:49 GMT
Content-Type: application/json; charset=utf-8
Connection: keep-alive
Cache-Control: private
Content-Length: 68

{"Url":"/signalr","ClientId":"53d681cf-08d9-4afe-8357-7918f64e7a60"}

Note that the userid and client id are different. Maybe a new one is generated on every negotiation? Yep, that seems to be the case. I can see why, and why you would also want to have it persisted. Not really interesting now.

I dug into the actual implementation, and it is using long polling. In order to do that, SignalR uses a delayed messages system that I find extremely interesting, mostly because it is very similar to some of the core concepts of RavenMQ. At any rate, the code seems pretty solid, and I have gone through most of it.

It is still somewhat in a state of flux, with things like API naming and conventions needing to be settled, but so far, I think that I like what I am seeing.

One thing that bugs me about it is that I think there are actually two different things happening at the same time. There is SignalR, which is all about persistent connections, and then there are the Hubs, which are a much higher level API. I would like to see SignalR.Hubs as a separate assembly, if only to make sure that the core concepts can be used on their own, without the Hubs.


Reviewing SignalR–Part I

After reading this post from Scott Hanselman, I decided that I really want to take a look at SignalR.

There aren’t any docs beyond a few code snippets, so I created a new project, downloaded the nuget version, and started exploring the API in Object Browser. My intent was to write a small sample application to see how it works. But I ran into a roadblock very rapidly:

image

Huh?!

Why on earth would it return dynamic for the list of clients? That seems like a highly suspicious design decision, and then I looked at this:

image

Hm… combine that with code snippets like this (from Scott’s post):

image

If it was someone else, I would put that receive call down to someone who doesn’t know/care about the design guidelines and just didn’t name the method right. But I assume that Scott does, so it is probably something else.

What seems to be happening here is that the Hub is indeed a very high level API; it allows you to directly invoke the JS API on the client. This is still unconfirmed as of yet, but it seems reasonable from the code I have seen so far. I intend to go deeper into the SignalR codebase and look for further details, so look for a few more posts on the topic.


Pet Projects and Hiring Decisions

Wow! The last time I had a post that roused that sort of reaction, I was talking about politics and religion. Let us see if I can clarify my thinking and also answer many of the people who commented on the previous post.

One of the core values that I look for when getting new people is their passion for the profession. Put simply, I think about what I do as a hobby that I actually get paid for, and I am looking for people who have the same mentality as I do. There are many ways to actually check for that. We recently made an offer to a candidate simply because her eyes lit up when she spoke about how she built games to teach kids Math.

There were a lot of fairly common objections to my approach.

Indignation – how dare you expect me to put my own personal time into something that looks like work. That also came with a lot of explanation about family, kids and references to me being a slave driver bastard.

Put simply, if you can’t be bothered to improve your own skills, I don’t want you.

Hibernating Rhinos absolutely believes that you need to invest in your developers, and I strongly believe that we are doing that. It starts from basic policies like if you want a tech book, we’ll buy it for you. Sending developers to courses and conferences, sitting down with people and making sure that they are thinking about their career properly. And a whole lot of other things aside.

Personally, my goal is to keep everyone who works for us right now for at least 5 years, and hopefully longer. And I think that the best way of doing that is to ensure that developers are appreciated, have a place to grow professionally and get to deal with fun, complex and interesting problems for significant parts of their time at work. This does not remove any obligation on your part to maintain your own skills.

image

In the same sense that I would expect a football player to be playing on a regular basis even in the off season, in the same sense that I wouldn’t visit a doctor who doesn’t spend time getting updated on what is changing in his field, in the same sense that I wouldn’t trust a book critic who doesn’t read for fun – I don’t believe that you can abdicate your own responsibility to keep yourself aware of what is actually going on out there.

And I am sorry, I don’t care if you read a blog or two or ten. If you want to actually learn new stuff in development, you have to sit down and write some code. Anything that isn’t code isn’t really meaningful in our profession. And it is far too easy to talk the talk without being able to walk the walk. My company has absolutely no intention of doing anything with Node.js in the future; I still wrote some code in Node.js, just to be able to see what it feels like to actually do that. I still spend time writing code that is never going to reach production or be in any way useful, just for me to see if I can do something.

If you are a developer, your output is code, and that is what prospective employers will look for. From my perspective, it is like hiring a photographer without looking at any of their pictures. Like getting a cook without tasting anything that he made.

And yes, it is your professional responsibility to make sure that you are hirable. That means that you keep your skills up to date and that you have something to show to someone that will give them some idea about what you are doing.

Time – I can’t find any.

There are 168 hours in a week; if you can’t put 4 – 6 hours a week into honing your own skills, trying things, just exploring… well, that probably indicates something about your priorities. I would like to hire people who think about what they do as a hobby. I usually work 8 – 10 hour days, 5 – 6 days a week. I am married, we’ve got two dogs, and I usually read / watch TV for at least 1 – 3 hours a day.

I have been at the Work Work Work Work All The Time Work Work Work And Some More Work parade. I got the T Shirt, I got the scary Burn Out Episode. I fully understand that working all the time is a Bad Idea. It is Bad Idea for you, it is Bad Idea for the company. This isn’t what I am talking about here.

Think about the children – I have kids, I can’t spend any time out of work doing this.

That one showed up a lot, actually. I am thinking about the children. I think it is absolutely irresponsible for someone with kids not to make damn sure that he is hirable. I am not talking about spending 8 hours at the office, 8 hours doing pet projects and 1.5 minutes with your children (while you got some code compiling). And updating your skills and maintaining a portfolio of projects is something that I think is certainly part of that.

I read professionally, but I don’t code  - this is a variation on all of the other excuses, basically. Here is a direct quote: “I often find that well written blog entry/article will provide more education that can be picked up in a few minutes reading than several hours coding. And I can do that in my lunch break.”

That is nice, I also like to read a lot of Science Fiction, but I am a terrible writer. If you don’t actually practice, you aren’t any good. Sure, reading will teach you the high level concepts, but it doesn’t teach you how to apply them. You can read about WCF all day long, but it doesn’t teach you how to handle binding errors. Actually doing things will teach you that. You need actual practice to become good at something. In theory, there is no difference between reality and theory, and all of that.

I legally can’t - You signed a contract that said that you can’t do any pet projects, or that all of your work is owned by the company.

I sure do hope that you are well compensated for that, because it is going to make it harder for you to get hired.

You have a life – therefore you can’t spend time on pet projects.

So do I, I managed. If you can’t, I probably don’t want you.

Wrong thing to do - I shouldn’t want someone who is on Stack Overflow all the time, or who will spend work time on pet projects.

This is usually from someone who thinks that the only thing I care about is lines of code committed. If someone is on Stack Overflow a lot, or reading a lot of blogs, or writing a lot of blogs, that is awesome, as long as they manage to complete their tasks in a reasonable amount of time. I routinely write blog posts during work. It helps me think, it clarifies my thinking, and it usually gets me a lot of feedback on my ideas. That is a net benefit for me and for the company.

Some people hit a problem and may spin on it for hours, and VS will be the active window for all of that time. That isn’t a good thing! Others will go and answer totally unrelated questions on Stack Overflow while they are thinking about the problem, then come back to VS and resolve the problem in 2 minutes. As long as they manage to do the work, I don’t really care. In fact, having them on Stack Overflow probably means that questions about our products will be answered faster.

As for working on their own personal projects during work: the only thing that you need to do is somehow tie it to actual work. For example, that pet project may be converted into a sample application for our products. Or it can be a library that we will use, or any of a number of options that keep things interesting.

You should note as well that I am speaking here about our requirements of the candidate, not about what I consider to be our responsibilities toward our employees; I’ll talk about those in another post in more detail.

Then there was this guy, who actively offended me.

The author is a selfish ego maniac who only cares about himself. As an employer, you can choose to consume a resource (employee), get all you can out of it, then discard it. Doing so adds to the evil and bad in the world.

This is spoken like someone who never actually had to recruit someone, or actually pay to recruit someone. It costs a freaking boatload of money and takes a freaking huge amount of time to actually find someone that you want to hire. Treating employees as disposable resources is about as stupid as you can get, because we aren’t talking about getting someone that can flip burgers at minimum wage here.

We are talking about 3 – 6 months training period just to get to the point where you can get good results out of a new guy. We are talking about 1 – 3 months of actually looking for the right person before that. I consider employees to be a valuable resource, something that I actively need to encourage, protect and grow. Absolutely the last thing that I want is to try to have a chain of disposable same-as-the-last-one developers in my company.

I have kicked people out of the office with instructions to go home and rest, because I would like to have them available tomorrow and the next day and month and year. Doing anything else is short sighted, morally repugnant and stupid.

What is the cost of try/catch

I recently got a question about the cost of try/catch, and whether it was prohibitive enough to make you want to avoid using it.

That caused some head scratching on my part, until I got the following reply:

But, I’m still confused about the try/catch block not generating an overhead on the server.

Are you sure about it?

I learned that the try block pre-executes the code, and that’s why it causes a processing overhead.

Take a look here: http://msdn.microsoft.com/en-us/library/ms973839.aspx#dotnetperftips_topic2

Maybe there is something that I don’t know? It is always possible, so I went and checked and found this piece:

Finding and designing away exception-heavy code can result in a decent perf win. Bear in mind that this has nothing to do with try/catch blocks: you only incur the cost when the actual exception is thrown. You can use as many try/catch blocks as you want. Using exceptions gratuitously is where you lose performance. For example, you should stay away from things like using exceptions for control flow.

Note that the emphasis is in the original. There is no cost to try/catch; the only cost is when an exception is thrown, and that is regardless of whether there is a try/catch around it or not.

Here is the proof:

var startNew = Stopwatch.StartNew();
var mightBePi = Enumerable.Range(0, 100000000).Aggregate(0d, (tot, next) => tot + Math.Pow(-1d, next)/(2*next + 1)*4);
Console.WriteLine(startNew.ElapsedMilliseconds);

Which results in: 6015 ms of execution.

Wrapping the code in a try/catch resulted in:

var startNew = Stopwatch.StartNew();
double mightBePi = Double.NaN;
try
{
    mightBePi = Enumerable.Range(0, 100000000).Aggregate(0d, (tot, next) => tot + Math.Pow(-1d, next)/(2*next + 1)*4);
}
catch (Exception e)
{
    Console.WriteLine(e);
}
Console.WriteLine(startNew.ElapsedMilliseconds);

And that ran in 5999 ms.

Please note that the perf difference is pretty much meaningless (only a 0.26% difference) and is well within the range of deviations for test runs.


If you don’t have pet projects, I don’t think I want you

I am busy hiring people now, and it got me thinking a lot about the sort of things that I want from my developers. In particular, I was inundated with CVs, and I used the following standard reply to help me narrow things down.

Thank you for your CV. Do you have any projects that you wrote that I can review? Have you done any OSS work that I can look at?

The replies are fairly interesting. In particular, I had a somewhat unpleasant exchange with one respondent. In reply to my question, the answer was:

My employer doesn’t allow any sharing of code. I can find some old projects that I did a while ago and send them to you, I guess.

Obviously, I don’t want to read any code that belongs to someone without that someone’s explicit authorization. Someone sending me their current company’s code is about as bad mannered as someone setting up an invite for a job interview on their work calendar (the latter actually happened today).

After getting the projects and looking them over a bit, I replied that I don’t think this would be the appropriate position for this respondent. I got the following reply:

Wait a minute…

Can I know why? I took the trouble to send you stuff that I have done, maybe not the highest quality and caliber, but what I could send right now. You didn’t even interview me.

How exactly did you reach the unequivocal conclusion that I am not a good fit for this job?

My response to that was:

Put simply, we are looking for a .NET developer and one of the most important things that we look for is passion. In general, we have found that people that care and are interested in what they are doing tend to do other stuff rather than just their work assignments.

In other words, they have their own pet projects, it can be a personal site, a project for a friend, or just some code written to get familiar with some technology.

When you tell me that your only projects outside of work are 5+ years old, that is a bad indication for us.

There is more, but it gets into the details and isn’t really relevant for this discussion.

Let me try to preempt the nitpickers. Not having pet projects doesn’t mean that you are a bad developer, nor vice versa.

But I don’t really care about experience, and assuming that you already know the syntax and have some basic knowledge of the framework, we can use you. But the one thing that I learned you can’t give people is passion for the field. And that is critical. Not only because it means that they are either already good or going to be good (it is pretty hard to be passionate about something that you suck at), but because it means that they care.

And if they care, it means two very important things:

  • The culture of the company is about caring for doing the appropriate thing.
  • The end result is going to be as awesome as we can get.

Now, if you’ll excuse me, I am going to check out SignalR, because I don’t feel like doing any more RavenDB work today.


RavenDB: Multi Maps / Reduce indexes

If you thought that map/reduce was complex, wait until we introduce the newest feature in RavenDB:

Multi Maps / Reduce Indexes

Okay, to be frank, they aren’t complex at all; they are actually quite simple when you sit down to think about them. Again, I have to give credit to Frank Schwieterman, who came up with the idea.

Wait! Let us back track a bit and try to explain what the actual problem is that we are trying to solve. The problem with Map/Reduce is that you can only gather information from a single set of documents. Let us look at the following documents as an example:

{// users/ayende 
   "Name": "Ayende Rahien" 
} 

{ // posts/1234 
  "Title": "Why RavenDB?", 
  "Author": "users/ayende" 
} 
{ // posts/1235 
  "Title": "It is awesome!", 
  "Author": "users/ayende" 
} 

We want to get a list of users with the count of posts that they made. That is trivially easy, as shown in the following map/reduce index:

from post in docs.Posts
select new { post.Author, Count = 1 }

from result in results
group result by result.Author into g
select new
{
   Author = g.Key,
   Count = g.Sum(x=>x.Count)
}

The output of this index would be something like this:

{ Author: "users/ayende", Count: 2 }

And we can load it efficiently using Includes:

session.Query<AuthorPostStats>("Posts/ByUser/Count")
     .Include(x=>x.Author)
     .ToList();

This will load all the users statistics, and also load all of the associated users, in a single query to the database. So far, fairly simple and routine.

The problem begins when we want to be able to query this index using the user’s name. As you can deduce from the documents shown previously, the user name isn’t available on the post document, which means that we can’t index it. That, in turn, means that we can’t search on it.

We are left with several bad options:

  • De-normalize the User’s Name property to the Post document, solely for indexing purposes.
  • Don’t implement this feature.
  • Write the following scary query:
from doc in docs.WhereEntityIs("Users","Posts") 
let user = doc.IfEntityIs("Users") 
let post = doc.IfEntityIs("Posts") 
select new 
{ 
  Count = user == null ? 1 : 0, 
  Author = user.Name, 
  UserId = user.Id ?? post.Author 
} 

from result in results 
group result by result.UserId into g 
select new 
{ 
   Count = g.Sum(x=>x.Count), 
   Author = g.FirstNotNull(x=>x.Author), 
   UserId = g.Key 
} 

This is actually pretty straightforward, when you sit down and think about it. But there is a whole lot of ceremony involved, and it is actually more than a bit hard to figure out what is going on in more complex scenarios.

This is where Frank’s suggestion came in:

…if I were try to support linq-based indexes that can map multiple types, it might look like:

public class OverallOpinion : AbstractIndexCreationTask<?>
{
   public OverallOpinion()
   {
       Map<Foo>(docs => from doc in docs select new { Id = doc.Id, LastUpdated = doc.LastUpdated }
       Map<OpinionOfFoo>(docs => from doc in docs select new { Id = Doc.DocId, Rating = doc.Rating, Count = 1}
       Reduce = docs => from doc in docs
                        group doc by doc.Id into g
                        select new {
                           Id = g.Key,
                           LastUpdated = g.Values.Where(f => f.LastUpdated != null).FirstOrDefault(),
                           Rating = g.Values.Rating.Sum(),
                           Count = g.Values.Count.Sum()
                        }
   }
}

It seems like some clever code could combine the different map expressions into one.

This is part of a longer discussion, but basically, it got me thinking about how we can implement multi maps, and I came up with the following:

// Map from posts
from post in docs.Posts
select new { UserId = post.Author, Author = (string)null, Count = 1 }

// Map from users
from user in docs.Users
select new { UserId = user.Id, Author = user.Name, Count = 0 }

// Reduce takes results from both maps
from result in results
group result by result.UserId into g
select new
{
   Count = g.Sum(x=>x.Count),
   Author = g.Select(x=>x.Author).Where(x=>x!=null).First(),
   UserId = g.Key
}

The only thing to understand now is that we have multiple map functions, getting data from multiple sources. We can then take those sources and reduce them together. The only requirement that we have is that the output of all of the map functions be identical (and obviously, match the output of the reduce function). Then we can just treat this information as a normal map/reduce index, which means that all of the usual RavenDB features kick in. Let us see what this actually means, using code. We have the following classes:

public class User
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Post
{
    public string Id { get; set; }
    public string Title { get; set; }
    public string AuthorId { get; set; }
}

public class UserPostingStats
{
    public string UserName { get; set; }
    public string UserId { get; set; }
    public int PostCount { get; set; }
}

And we have the following index:

public class PostCountsByUser_WithName : AbstractMultiMapIndexCreationTask<UserPostingStats>
{
    public PostCountsByUser_WithName()
    {
        AddMap<User>(users => from user in users
                              select new
                              {
                                  UserId = user.Id,
                                  UserName = user.Name,
                                  PostCount = 0
                              });

        AddMap<Post>(posts => from post in posts
                              select new
                              {
                                  UserId = post.AuthorId,
                                  UserName = (string)null,
                                  PostCount = 1
                              });

        Reduce = results => from result in results
                            group result by result.UserId
                            into g
                            select new
                            {
                                UserId = g.Key,
                                UserName = g.Select(x => x.UserName).Where(x => x != null).First(),
                                PostCount = g.Sum(x => x.PostCount)
                            };

        Index(x=>x.UserName, FieldIndexing.Analyzed);
    }
}

As you can see, we are getting the values from two different collections. We need to make sure that they actually produce the same output shape, which is why we need the null cast for the posts and the filtering that we do in the reduce.

But that is it! It is ridiculously easy compared to the previous alternative. Moreover, it follows quite naturally from both the exposed API and the internal implementation inside RavenDB. It took me roughly half a day to make it work, and some of that was dedicated to lunch. In truth, most of that time was actually just handling the error conditions nicely, but… anyway, you get the point.

Even more interesting than the rest is the fact that for all intents and purposes, what we have done here is a join between two different collections. We were never able to really resolve the problems associated with joins before; update notifications were always too complex to figure out, but going the route of multi map makes things so easy.

Just for fun, you might have noticed that we marked the UserName property as analyzed, which means that we can issue full text queries against it. Let us assume that we want to provide users with the following UI:

image

It is now just a matter of writing the following code:

using (var session = store.OpenSession())
{
    var ups= session.Query<UserPostingStats, PostCountsByUser_WithName>()
        .Where(x => x.UserName.StartsWith("rah"))
        .ToList();

    Assert.Equal(1, ups.Count);

    Assert.Equal(5, ups[0].PostCount);
    Assert.Equal("Ayende Rahien", ups[0].UserName);
}

So you can do a cheap full text search over joins quite easily. For that matter, joins are cheap now, because they are computed in the background and queried directly from the pre-computed index.

Okay, enough blogging for now, going to implement all the proper error handling and then push an awesome new build.

Oh, and a final thought, Multi Map was shown in this blog only in the context of Multi Maps/Reduce, but we also support just the ability to use multi map on its own. This is quite useful if you want to enable search over a large number of entities that reside in different collections. I’ll just drop a bit of code here to show how it works:

public class CatsAndDogs : AbstractMultiMapIndexCreationTask
{
    public CatsAndDogs()
    {
        AddMap<Cat>(cats => from cat in cats
                         select new {cat.Name});

        AddMap<Dog>(dogs => from dog in dogs
                         select new { dog.Name });
    }
}

[Fact]
public void CanQueryUsingMultiMap()
{
    using (var store = NewDocumentStore())
    {
        new CatsAndDogs().Execute(store);

        using(var documentSession = store.OpenSession())
        {
            documentSession.Store(new Cat{Name = "Tom"});
            documentSession.Store(new Dog{Name = "Oscar"});
            documentSession.SaveChanges();
        }

        using(var session = store.OpenSession())
        {
            var haveNames = session.Query<IHaveName, CatsAndDogs>()
                .Customize(x => x.WaitForNonStaleResults(TimeSpan.FromMinutes(5)))
                .OrderBy(x => x.Name)
                .ToList();

            Assert.Equal(2, haveNames.Count);
            Assert.IsType<Dog>(haveNames[0]);
            Assert.IsType<Cat>(haveNames[1]);
        }
    }
}

All together, a great day’s work.


First RavenDB webinar is now on YouTube

We now have the first webinar posted on YouTube. All details here:

http://blog.hibernatingrhinos.com/8193/ravendb-webinar-1-now-available-on-youtube

Unfortunately YouTube doesn't allow us to upload videos longer than 15 minutes, so we have to split all webinars and talks before uploading them. That is annoying for you as our users, and takes us a lot of time in pre-processing. You can help us both by participating - commenting on and rating the videos. The more positive activity we have, the faster we can get that limitation removed. We would appreciate your help with this.

See you in future webinars.


A surprise TaskCancelledException

All of a sudden, my code started getting a lot of TaskCancelledException. It took me a while to figure out what was going on. We can imagine that the code looked like this:

var unwrap = Task.Factory.StartNew(() =>
{
    if (DateTime.Now.Month % 2 != 0)
        return null;

    return Task.Factory.StartNew(() => Console.WriteLine("Test"));
}).Unwrap();

unwrap.Wait();

The key here is that when Unwrap is given a null task, it will throw a TaskCancelledException, which was utterly confusing to me. It makes sense, because if the task is null there isn’t anything that the Unwrap method can do about it. Although I do wish it would throw something like ArgumentNullException with a better error message.

The correct way to write this code is to have:

var unwrap = Task.Factory.StartNew(() =>
{
    if (DateTime.Now.Month % 2 != 0)
    {
        var taskCompletionSource = new TaskCompletionSource<object>();
        taskCompletionSource.SetResult(null);
        return taskCompletionSource.Task;
    }

    return Task.Factory.StartNew(() => Console.WriteLine("Test"));
}).Unwrap();

unwrap.Wait();

Although I do wish that there was an easier way to create a completed task.
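If this pattern shows up in more than one place, a tiny helper (my own, not something from the TPL) keeps the intent clear:

public static class CompletedTask
{
    // Wraps a value in an already completed Task<T>, so it can be returned
    // safely from code that is going to be passed to Unwrap().
    public static Task<T> With<T>(T result)
    {
        var taskCompletionSource = new TaskCompletionSource<T>();
        taskCompletionSource.SetResult(result);
        return taskCompletionSource.Task;
    }
}

// Inside the lambda above, the early return then becomes:
//     return CompletedTask.With<object>(null);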


RavenDB Webinar #2– Q & A Free For All

Wow! The RavenDB Webinar has been a great success, and a wonderful trial run. I know that a lot of people haven’t been able to get in, and I apologize for that, we absolutely did not expect to have so many people in.

The session was recorded, and we will upload it soon so everyone can watch it.

As part of experimenting with the format, and since we want to give everyone another chance, we will do another Webinar tomorrow. You can register here: https://www2.gotomeeting.com/register/398501658

Unlike the one we just had, we will have this one as a Q & A where we open the phone lines and start chatting with you about RavenDB, demoing things on the fly.

Should be fun…

Changes to Advanced NHibernate Course - Warsaw, October 2011

The Advanced NHibernate Warsaw Course drew a lot of interest when I announced it, but a lot of people seemed to be thrown off by the price.

Therefore, I decided to do several additional things to make it easier for you to decide to go to the course:

  • I extended the early bird period until the 15th Sep (which represents a 20% discount).
  • All attendees to the course will receive a free license to NH Prof.
    • All companies sending attendees will receive a 20% discount for future purchases of NH Prof.
  • I intend to extend the course with a “support hour”, so we can not only go over the actual course material, but also go over your actual problems and resolve them during the course.
    • If necessary, I’ll do follow up on those issues after the course as well.

I hope that those changes will make it easier for you to go to the course or to convince your company to send you.


Twitter doesn’t like my identity

Recently I noticed that a very important feature of the twitter website is broken for me. I can’t follow the entire conversation any longer, which drives me crazy.

image

It didn’t take me long to figure out what the actual issue is:

image

It appears that even though I am logged in, some parts of twitter don’t like this.

I then thought that this has something to do with chrome, so I used the incognito mode to try it:

image

It works!

I logged out and in again, nada. I cleared all browser history and restarted chrome, nada.

I uninstalled chrome (selecting delete all browsing history) and re-installed, and now it is working.

No idea why, and I don’t like the heavy handed approach, but at least now I got my twitter tracking back.


RavenDB Webinar

I decided to see if a RavenDB webinar would be a useful thing to have. Ideally, we want to have something that is regularly scheduled.

For now, we are testing things, and I would like to invite you to our first webinar about RavenDB, which is scheduled for tomorrow.

You can register in the following link: https://www2.gotomeeting.com/register/992039594


Document based modeling: Auctions

We recently had a client with a similar model, and I thought that this would be a good topic for a series of blog posts. For the purpose of this blog posts series, I am going to be using EBay as the example. Just to be clear, this is merely because it is a well known site that most people are likely to be familiar with.

In our model, we have the following Concepts:

  • Categories
  • Products
  • Auctions
  • Bids
  • Sellers
  • Buyers
  • Orders

We will start from what is likely the most confusing aspect. Products and Auctions.

A Product is something that can be sold. It can be either a unique item “a mint condition Spiderman comics” or it can be something that we have lots of “a flying monkey doll”. There is a big distinction between the two, because of the way the business rules are structured.

Once an Auction has been started, it cannot be changed. This includes everything about the auction: pricing, shipping information, product information, etc. But a Product may change at any time (maybe I now have the flying monkey in red and green, instead of the original red and yellow). That leads us to conclude that we actually have two instances of the Product in our domain. We have the Product Template (mutable, can change at any time) and we have the Auctioned Product (immutable).

That realization leads me to the following model for products and auctions:

// products/1
{
   "Name":"Flying Monkey Doll",
   "Colors":[
      "Red & Yellow",
      "Blue & Green"
   ],
   "Price":29,
   "Weight":0.23
}

// auctions/1
{
   "Quantity":15,
   "Product":{
      "Name":"Flying Monkey Doll",
      "Colors":[
         "Blue & Green"
      ],
      "Price":29,
      "Weight":0.23
   },
   "StartsAt":"2011-09-01",
   "EndsAt":"2011-09-15"
}

As you can see, the Auction is going to wholly own the product. Any change made to the product will not be reflected in the Auctioned Product. This has the advantage that we need only a single document load in order to show the auction page.
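In client code, that translates to a single call (a rough sketch, assuming an Auction class that matches the document above):

using (var session = store.OpenSession())
{
    // One load brings back the auction together with its embedded copy of the product.
    var auction = session.Load<Auction>("auctions/1");
}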

Another option would be to use the versioning bundle to do that, so we would have this:

// products/1
{
   "Name":"Flying Monkey Doll",
   "Colors":[
      "Red & Yellow",
      "Blue & Green"
   ],
   "Price":29,
   "Weight":0.23
}

// products/1/versions/1
{
   "Name":"Flying Monkey Doll",
   "Colors":[
       "Blue & Green"
   ],
   "Price":29,
   "Weight":0.23
}

// auctions/1
{
   "Quantity":15,
   "Product": "products/1/versions/1",
   "StartsAt":"2011-09-01",
   "EndsAt":"2011-09-15"
}

The versioning bundle ensures that we can get immutable views of our documents, so we can safely reference the product by its id and version.

That is it for now. In the next post, we will deal with how to work with Bids.
