Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


The “average” developer field of interest

time to read 3 min | 445 words

I recently got a comment that included this:

…this "Making code faster" series is pretty useless for the average developer working on the usual application.

And I couldn’t disagree more.

Now, to be fair, the kind of challenges that we have to deal with while building a high performance database engine are quite different from the kind of challenges that a typical enterprise developer has to deal with. Well, that isn't quite true; we have the studio, which behaves very much like an application, but you'll rarely see me talking about the JavaScript aspects of building the RavenDB Studio. I'll just say that, from my perspective, this post summarizes my feelings about modern JavaScript dev.

But back to the topic: the average developer is a mythical beast, who apparently has very little time to look up from coding yet another login page that has to be delivered now. I have had several such discussions about this in the past, and I think that this post summarizes the opposing view, pretty much saying that it is offensive to expect someone to have the time to improve themselves.

My thinking is that if you value your career, you need to continuously put in the effort to actually improve and extend yourself, period. And that isn't to say that this is easy.

Here is the deal: if you are only interested in what can bring you immediate value (the hottest JS libraries, or some design pattern that you need to use tomorrow), you are doing yourself a disservice. In order to be good, you need to continuously invest in learning new stuff. And you need to do it in such a way that you aren't continuously learning the same stuff over and over again (no, learning WebForms, MVC 1, MVC 2 … MVC 5, MVC Core doesn't count).

Quite a bit of this isn't really going to be useful in the near future, but expanding your knowledge base is going to be useful in the long term. You are going to run into things and go "Ah! I know that already", or be able to provide much better solutions than the stuff that has already been tried.

Yes, that actually takes both work and effort. You need to make time to do so, and when you have a family and kids, that isn't easy. But it is worth it.

And just because I know people are going to read it as such: that does not mean that you've got to abandon the kids to raise themselves while you are hacking away at your latest interest. For most people, putting in two to four hours a week is possible. Feel free to cut down the time you spend browsing Facebook, for example.

Come to our booth at DotNext Moscow

time to read 1 min | 150 words

This Friday, our team is going to be at DotNext Moscow, showing off RavenDB 4.0 and raffling off some really cool prizes.

You can also come and learn optimization techniques that allowed us to get more than 100,000 req/sec with RavenDB 4.0.

It is going to be a lot of fun, and we are expecting some really interesting discussions on the way we are building RavenDB 4.0, so we sent three of our core developers to give you all the details about it.

This is going to be the very first time that we show off what RavenDB 4.0 can do.

Code review: The bounded queue

time to read 1 min | 64 words

The following code has just been written (never run, never tested).

Its purpose is to serve as a high speed, no locking transport between two threads, one of them producing information, the other consuming it, in a bounded, non-blocking manner.

Note that this is done because the default usage of BlockingCollection<T> here generated roughly 80% of the load, which is not ideal.
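
The code itself was shared as a separate snippet and isn't reproduced here, so below is a minimal sketch of the kind of structure described, under the usual assumptions for such a design: a fixed-size ring buffer for a single producer and a single consumer, using volatile reads and writes instead of locks. The class name and API are illustrative, not the original code.

    using System.Threading;

    // Hedged sketch of a bounded, lock-free, single-producer /
    // single-consumer queue; not the original code from the post.
    public sealed class BoundedSpscQueue<T>
    {
        private readonly T[] _items;
        private long _head; // next slot to read (owned by the consumer)
        private long _tail; // next slot to write (owned by the producer)

        public BoundedSpscQueue(int capacity)
        {
            _items = new T[capacity];
        }

        // Producer thread only. Returns false when full instead of blocking.
        public bool TryEnqueue(T item)
        {
            long tail = Volatile.Read(ref _tail);
            if (tail - Volatile.Read(ref _head) == _items.Length)
                return false; // bounded: reject rather than block
            _items[(int)(tail % _items.Length)] = item;
            Volatile.Write(ref _tail, tail + 1); // publish the item to the consumer
            return true;
        }

        // Consumer thread only. Returns false when empty instead of blocking.
        public bool TryDequeue(out T item)
        {
            long head = Volatile.Read(ref _head);
            if (head == Volatile.Read(ref _tail))
            {
                item = default(T);
                return false; // empty
            }
            item = _items[(int)(head % _items.Length)];
            Volatile.Write(ref _head, head + 1); // hand the slot back to the producer
            return true;
        }
    }

Because each side only ever writes to its own counter, no compare-and-swap is needed; the volatile write to _tail is what makes the stored item visible to the consumer.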

re: Why you can't be a good .NET developer

time to read 3 min | 526 words

This post is a reply to Rob's post; go ahead and read it, I'll wait.

My answer to Rob's post can be summarized in a single word:

In particular, this statement:

it is impossible to be a good .NET developer. To work in a development shop with a team is to continually cater for the lowest common denominator of that team and the vast majority of software shops using .NET have a whole lot of lowest common denominator to choose their bad development decisions for.

Um, nope. That only applies to places that are going for the lowest common denominator, and going from there to all .NET shops is quite misleading. I'll give our own example of building a high performance database in managed code, which has very little lowest common denominator anything anywhere, but that would actually be too easy.

Looking at the landscape, I can see quite a lot of people doing quite a lot of interesting things at the bleeding edge. Now, it may be that this blog attracts a self selecting crowd, but when you issue a statement like "you can't be a good .NET developer", that is a pretty big statement to stand behind.

Personally, I think that I'm a pretty good developer, and while I dislike the term "XYZ developer", I do 99% of my coding in C#.

Now, some shops have different metrics: they care about predictability of results, so they will be extremely conservative in their tooling and language usage, the old "they can't handle that, so we can't use it" approach. This has nothing to do with the platform you are using, and everything to do with the culture of the company you are at.

I can certainly find good reasons for that behavior, by the way. When your typical product lifespan is measured in 5 to 10 years, you have a very different mindset than if you aim at most a year or two out. Making decisions based on brand new stuff is dangerous; we lost a lot when we decided to use Silverlight, for example. And the decision to go with CoreCLR for RavenDB was made with an explicit back-off strategy in case that sank, too.

Looking at the kinds of places people leave .NET for, it traditionally has been the green, green hills of Rails, then it was Node.JS, and now I think it is Elixir, although I'm not really paying attention. That means that in the time a .NET developer (assuming they are investing in themselves and continuously learning) has spent mastering their platform and learning how to make it work properly, the person who left for greener pastures has had to learn multiple new frameworks and platforms. If you think that this doesn't have an impact on productivity, you are kidding yourself.

The reason you see backlash against certain changes (project.json coming, going and then doing disappearing acts worthy of Houdini) is that there is value in all of that experience.

Sure, sometimes change is worth it, but it needs to be measured against its costs. And sometimes those costs are non-trivial.

Proposed solution to the low level interview question

time to read 3 min | 578 words

For the actual question, see the original post.

So the first thing that we need to decide is what the data format of the trie will be. Since we have only 32KB to work with, we need to consider the actual storage.

32KB is small enough that any position fits in an unsigned short, so all the references we'll use will be ushorts. We also need to store a bit of metadata, so we'll use the first 4 bytes as a header for just that:

  • ushort SpaceUsed;
  • ushort LastAllocation;
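
As a minimal sketch, assuming a packed sequential layout, the header could look like this (the struct name is mine; only the field names come from the list above). The later sketches in this post reuse it.

    using System.Runtime.InteropServices;

    // Illustrative layout of the 4-byte header at the start of the 32KB buffer.
    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    struct BufferHeader
    {
        public ushort SpaceUsed;      // total bytes currently in use
        public ushort LastAllocation; // end of the last allocation, i.e. where the next one goes
    }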

Now that we have this, we need to decide how to store the actual data. To make things easy, we are going to define the following way to allocate memory:

This is about the simplest way that you can go about doing things, note that we use a length prefix value, and we limit allocations to a max of 127 bytes each. We use a negative size to indicate a delete marker.
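
The snippet itself was embedded and isn't shown here, so what follows is a hedged reconstruction from that description: a signed length prefix, a 127-byte cap, and negation as the delete marker. The method names and the convention of returning 0 on failure are mine.

    // Hedged reconstruction of the allocator described above, not the original
    // code. Returns the offset of the usable memory, or 0 when there is no room
    // left at the end of the buffer. (LastAllocation starts at 4, right past the header.)
    static ushort Allocate(byte[] buffer, ref BufferHeader header, byte size)
    {
        if (size > 127)
            return 0; // allocations are limited to 127 bytes each
        if (header.LastAllocation + 1 + size > buffer.Length)
            return 0; // no room at the end; scan for holes or defrag (see below)
        buffer[header.LastAllocation] = size; // length prefix, positive = live
        ushort position = (ushort)(header.LastAllocation + 1);
        header.LastAllocation = (ushort)(position + size);
        header.SpaceUsed += (ushort)(size + 1);
        return position;
    }

    static void Free(byte[] buffer, ref BufferHeader header, ushort position)
    {
        sbyte size = (sbyte)buffer[position - 1];
        buffer[position - 1] = unchecked((byte)(-size)); // negative size = delete marker
        header.SpaceUsed -= (ushort)(size + 1);
    }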

So basically, now we have a pretty trivial way to allocate memory, and we can implement the trie as we would normally do. There are a few wrinkles, however.

Deleting an entry doesn't actually make its memory eligible for reuse, and the buffer is quite likely to get fragmented easily. In order to handle that, we track the amount of space that is used, and if we reach the end of the buffer, we check the SpaceUsed value. If the free space it implies is still too little, we can abort; there is no available space here. However, if we reached the end of the buffer but free space is available, we can do the following (a sketch of the scan follows the list):

  • Scan the buffer for available spots (find deleted entries, marked by a negative size).
  • Failing that, copy the live data to a temporary buffer, then re-add everything to the buffer from scratch. In other words, defrag it.
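
Here is a sketch of the scan, continuing the snippets above and assuming entries are laid out back to back right after the 4-byte header (the helper name is mine):

    // Walk the allocations, skipping live entries (positive prefix) and looking
    // for a deleted one (negative prefix) big enough to reuse. Returns the
    // usable offset, or 0 when no suitable hole exists and we must defrag.
    static ushort ScanForFreeSpot(byte[] buffer, ref BufferHeader header, byte size)
    {
        ushort offset = 4; // the first 4 bytes are the buffer header
        while (offset < header.LastAllocation)
        {
            sbyte len = (sbyte)buffer[offset];
            if (len < 0 && -len >= size)
            {
                buffer[offset] = (byte)(-len); // flip back to live, keeping the hole's full length
                header.SpaceUsed += (ushort)(-len + 1);
                return (ushort)(offset + 1);
            }
            offset += (ushort)((len < 0 ? -len : len) + 1); // jump over this entry
        }
        return 0;
    }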

Another issue we have is that the maximum size we can allocate is 127 bytes. That is big enough that most actual strings fit into it nicely, and a trie already has the property that a large string may be broken into pieces, so we'll just cap each node in the trie at a maximum size of 127. Actually, the maximum is likely to be less than that, because there is also some information that we need to keep track of per entry:

  • byte NumberOfChildren;
  • byte Flags; // node type (internal, leaf or both)
  • ushort ChildrenPosition;

So in practice we have about 123 bytes to work with for the key. Note that we don't store the length of the node's key (we can get that from the allocation information), and that we store the actual children in an array that is kept separately. This allows us to easily add items to the trie as child nodes. If the node is a leaf node, we also need to store the actual value (which is 8 bytes); we store that at the end of the entry, leaving 115 bytes for the key in that case.
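
Continuing the sketch, the per-entry bookkeeping might look like this (the struct name is mine; the field names follow the list above):

    // Illustrative per-node metadata, stored at the start of the node's allocation,
    // with the same StructLayout/Pack conventions as the BufferHeader sketch above.
    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    struct NodeHeader
    {
        public byte NumberOfChildren;
        public byte Flags;              // node type (internal, leaf or both)
        public ushort ChildrenPosition; // offset of the separately stored child array
    }
    // A 127-byte entry minus the 4-byte NodeHeader leaves 123 bytes for the key;
    // a leaf also keeps its 8-byte value at the end, leaving 115 bytes.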

All in all, there is going to be a bit of pointer arithmetic and bit counting, but it is likely to be a pretty simple implementation.

Note that additional optimizations would be to try to align everything so it fits into a cache line, to place nodes near their children (which are more likely to be followed), etc.
