Ayende @ Rahien


Connection handling and authentication in RavenDB 4.0

time to read 3 min | 531 words

An interesting question popped up in the mailing list about the behavior of RavenDB: when will the RavenDB client send the certificate to the server for authentication? An SSL handshake typically takes multiple round trips to negotiate the connection, and a certificate can be a fairly large object, so it makes sense that understanding this aspect of RavenDB's behavior is important for users.

In the mailing list, I gave the following answer:

RavenDB doesn't send the certificate on a per-request basis; instead, it sends the certificate at the start of each connection.

I was asked for a follow-up, because my answer wasn't clear to the user. This is a problem: I was answering from my perspective, which is quite different from the way a RavenDB user on the outside will look at things. Therefore, this post, and hopefully a more complete explanation of how it all works.

RavenDB uses X.509 client certificates for authentication, using SSL to both authenticate the remote client to the server (and the server to the client, using PKI) and to ensure that the communication between client and server is private. RavenDB uses TLS 1.2 for the actual low-level wire protocol. Given that .NET Core doesn't yet implement TLS 1.3 or TCP Fast Open, that means we need to do the full negotiation on each new connection.
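
As a rough sketch of what this looks like from the client side (assuming the RavenDB 4.x C# client; the URL, database name, and certificate path are placeholders), the certificate is configured once on the document store rather than attached to individual requests:

    using System.Security.Cryptography.X509Certificates;
    using Raven.Client.Documents;

    // A minimal sketch, assuming the RavenDB 4.x C# client. The certificate is set once
    // on the singleton DocumentStore; the client presents it during the TLS handshake of
    // each new connection, not on every request.
    public static class Store
    {
        public static readonly IDocumentStore Instance = new DocumentStore
        {
            Urls = new[] { "https://your-server:443" },        // placeholder URL
            Database = "Northwind",                            // placeholder database name
            Certificate = new X509Certificate2("client.pfx")   // client certificate for authentication
        }.Initialize();
    }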

Now, what exactly is a connection in this regard? Is this going to be every call to OpenSession? The answer is emphatically no. RavenDB manages a connection pool internally (actually, we rely on HttpClient's pool to do that). This means that we will only ever have as many TCP connections to the server as you have concurrent requests. A session effectively borrows a connection from the pool whenever it needs to talk to the server.

The connections in the pool are going to be reused, potentially for a long time. This allows us to alleviate the cost of actually doing the TCP & SSL handshake and amortize it over many requests. It also means that the entire cost of authentication isn't paid on a per-request basis, but per connection. What actually happens is that at the beginning of the connection, the RavenDB server validates the client certificate and remembers what permissions are granted to it. Any and all requests on that connection can then just use the cached permissions for the lifetime of the connection. This stateful approach reduces the overall cost of authentication, because we don't need to run full validation on every request.

This also means that OpenSession, for example, is basically free. All it does is allocate a bunch of dictionaries and some other data structures for the session. There is no wire traffic when the session is created, only when you actually make a request to the server (Load, Query, SaveChanges, etc.). Most of the time, we don't need to create a new connection for that, but can use a pre-existing one from the pool. The entire system was explicitly designed to take advantage of these best practices to optimize your overall performance.
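
To see where the wire traffic actually happens, here is a minimal sketch (again assuming the RavenDB 4.x C# client and reusing the Store from the sketch above; the Order class and document id are hypothetical):

    using Raven.Client.Documents.Session;

    public class Order
    {
        public string Status { get; set; }
    }

    public static class Example
    {
        public static void ShipOrder()
        {
            // Opening the session just allocates a few dictionaries; nothing goes over the wire yet.
            using (IDocumentSession session = Store.Instance.OpenSession())
            {
                // First actual HTTP request, served over a pooled (already authenticated) connection.
                var order = session.Load<Order>("orders/1-A");
                order.Status = "Shipped";

                // Second request, typically reusing the same pooled connection.
                session.SaveChanges();
            }
        }
    }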

Spanification in RavenDB

time to read 2 min | 333 words

We are nearly done with RavenDB 4.1. There are a few minor items that we are still handling, but we are gearing up to push this to our production systems as part of our usual test matrix. Naturally, this means that we are already thinking about what we should do next.

There is a whole bunch of big-ticket items that we want to look at, but the most important of them is the one that is likely to garner very little attention from the outside. We are going to take advantage of the new Span&lt;T&gt; API throughout the product. This is something that I really want to get to, since we have a lot of places where we touch native memory and memory-mapped sections, and in general pay a lot of attention to manual memory management. There are several cases where we had to copy data from unmanaged memory to managed memory just to make some API happy (I'm looking at you, Stream).
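
To make the Stream complaint concrete, here is a rough sketch (illustration only, not RavenDB code) of the difference between copying native memory into a managed array and simply wrapping it with a span. It assumes nativeBuffer points at a memory-mapped section we already own, and it requires compiling with unsafe enabled:

    using System;
    using System.Runtime.InteropServices;

    public static unsafe class NativeAccess
    {
        // Before Span<T>: array-based APIs force an allocation and a copy on every call.
        public static byte[] CopyToManaged(byte* nativeBuffer, int length)
        {
            var managed = new byte[length];
            Marshal.Copy((IntPtr)nativeBuffer, managed, 0, length);
            return managed;
        }

        // With Span<T>: wrap the native memory directly and hand it to Span-aware APIs,
        // with no allocation and no copy.
        public static Span<byte> Wrap(byte* nativeBuffer, int length)
        {
            return new Span<byte>(nativeBuffer, length);
        }
    }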

With the Span&lt;T&gt; API, that is no longer required, which means that we can usually hand the network a pointer that is mapped directly to a file and significantly reduce the amount of work we need to do. We are also going to go over the codebase and see where else we can take advantage of this. For example, moving our code to System.IO.Pipelines opens up some really interesting scenarios for simplifying code and reducing overhead.
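
For reference, the basic System.IO.Pipelines reading loop looks roughly like this (a minimal sketch, not RavenDB code). Buffers are exposed as ReadOnlySequence&lt;byte&gt;, so data can be examined in place rather than copied into intermediate arrays:

    using System.Buffers;
    using System.IO.Pipelines;
    using System.Threading.Tasks;

    public static class PipeConsumer
    {
        public static async Task ConsumeAsync(PipeReader reader)
        {
            while (true)
            {
                ReadResult result = await reader.ReadAsync();
                ReadOnlySequence<byte> buffer = result.Buffer;

                // Parse 'buffer' in place here (details omitted).

                // Report that the whole buffer was consumed and examined.
                reader.AdvanceTo(buffer.End);

                if (result.IsCompleted)
                    break;
            }
            reader.Complete();
        }
    }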

We are also going to apply the lessons we have learned about how we actually manage memory as part of that, so just calling it Span&lt;T&gt; is a bit misleading. The underlying reasoning is that we want to simplify both I/O and memory management, which are very closely tied together. This shouldn't actually matter to users, except that the intent is to improve performance once again.

On interns and hiring people at the first stages of their career

time to read 4 min | 738 words

When looking for candidates, there is an ideal candidate: take one of the people already working for you, with all their domain knowledge and expertise, and clone them. Hopefully multiple times. If you do this right, you can probably stick the clones in a basement with a bunch of computers, slide pizza under the door every so often, and get a lot of work done for the price of pizza.

While this (dystopian) scenario is quite nice in terms of overall effort, I do believe there are some issues with it. Naturally, the biggest hurdle is the medical bills for cloning people; there is also some noise about this being inhumane. The real issue, of course, is the lack of feasible technology to accelerate the growth of the clones. I'm sure this will be solved at some point. My time machine comes back from the shop on Monday (and isn't that ironic), so I'll be investigating this further then.

Setting the clone wars option aside, there is the need to get new hires, and there are several ways to go about that. You can try getting people with some or all of the skills that you require, or you can get someone who is a blank slate and train them internally. This post is about the latter option.

The question is really what you actually define as a blank slate. For example, hiring my 3-year-old daughter as a software developer would be really nice. She is a blank slate, but given that we are currently teaching her to count to 20, I think that might be premature.

To be perfectly honest, the amount of knowledge required to be an efficient developer is staggering. If I were to start the clock from scratch, I think I would be sitting there twiddling my thumbs to this day, scared of all the things I must understand to be effective. In some ways, not knowing how much I didn't know was really helpful. It allowed me to go out and learn without being overwhelmed. Look at just C#, for example, and compare the language from 1.0 to 7.3. Each change made sense at the time and incrementally added to the language. Some of them were bigger than others (generics, LINQ), but they came in byte-size chunks (typo intended). Trying to grok it all at once is much harder.

We actually hire fairly often directly from college, either immediately after completing the degree or even beforehand. We usually look for people who have gone beyond rote learning for a good grade and are actually able to understand why things are happening, not just what API to call. Our most junior hire ever had just finished high school and had a few months free before going to the army, effectively being an intern in the company for a short while.

The approach we take for onboarding a new employee (with no practical experience) and an intern is quite different. For a full-time employee, my priority is to get them well situated and familiar with how we work and with the overall codebase. That means the typical first assignments will be things that are on the sidelines: things that are okay if they take a little longer, since their purpose is to get the new developer familiar with the landscape of the code. Examples include writing new clients, building internal applications using RavenDB, benchmarking work, and building diagnostics and debug tools for production analysis.

For an intern, however, the situation is different. Given that I'm only going to have the intern for a few short months, spending 2–3 months training them to the expected level of a full-time employee is going to be a waste. Instead, we try to give the intern experimental and research projects: things that we wish we could have done if we had the time, but typically do not. Some of them are pretty complex, but the key "feature" in this regard is that they can be approached without a deep understanding of RavenDB. For example, SQL Migration, one of the main features of RavenDB 4.1, was actually initially developed by an intern.

Living in the foundations, missing all the amenities

time to read 2 min | 377 words

We talked to a candidate recently whose CV included topics such as Assembly, SQL, and JavaScript. The list of skills was quite eclectic, and we called the candidate to hear more about them.

The candidate completed a two-year degree focused on the foundations of development, and it looked like whoever designed it was looking primarily to provide a good foundation more than anything else. In other words, the end result is someone who can write SQL queries but has never built a data-driven application, who knows (about? I'm not really clear at what level that was) assembly, but has never written a real application. It doesn't sound bad, I know, but it was like moving into a new house just after the contractor is done with the foundation. Sure, that is a really important part, but you don't even have walls yet.

In 1999, I did a year-long course focused on teaching me C and C++. I credit this course for much of my understanding of the basics of programming and how computers actually work. It was an eye-opening experience. I wouldn't hire my 1999 self; as I recall, that guy (can I deny knowing him?) wrote the following masterpieces:

  • A sparse_matrix&lt;T&gt; implemented with C++ templates that used five (5!) levels of pointer indirection!
  • The original single-page application: an entire BBS system written in a single .VBS script that used three levels of recursive switch statements and included inline HTML, JS, and VB code!

These are horrible things to inflict on an innocent computer, but they got me started in actually working on software and understanding things beyond the basics of syntax and action. I usually take the other side, that people focus far too much on the high-level stuff and do not pay attention to what is actually going on under the hood. This was an interesting reversal, because the candidate was the opposite. They had some knowledge of the basics, but nothing built upon it yet.

And until you actually build upon the foundation, it is just a hole in the ground that was covered in some cement.

Toddlers, cursing and preparing ahead of time

time to read 2 min | 275 words

My daughter is 3¼ years old now. Around the time she was born, I decided that I needed to make a small change in my language. Whenever I felt the urge to curse, I would say a food's name. For example, after being puked on, my reaction would be some variant of "PASTA", "PASTA BOLOGNESE", or other pasta favorites.

As time went by, I got better and better at expressing emotion through increasingly disturbing food references. My current favorite is: “Pasta Bolognese with pickled carrots in a bun with anchovies and raw eggs”.

A couple of days ago, I took my daughter and a friend to an ice cream shop. As expected of an ice cream shop in the middle of (very hot) summer, the place was packed. My daughter was quite excited to go there and expressed her emotions by standing up and shouting at the top of her lungs (she is three, with a voice that carries like a foghorn): "PASTA! PASTA BOLOGNESE" over and over again.

This is a crowded shop, full of small kids and parents. I got some looks for the little girl holding up a full ice cream cone and shouting about pasta, but it was infinitely preferable to the alternative.

An unforeseen side effect, however, is that because I can, I'm very free with pasta-based profanities. This has led to what is effectively a competition, with her trying to get me to go overboard with them.

And now I must go back to work, before the gluten police arrive.

Full Stack Developer, Master of None

time to read 4 min | 609 words

In my previous post, I was asked:

Is it reasonable to look for a developer who knows all the complexities of backend development (Particularly for the enterprise, at least while designing distributed applications or Micro-Services) and expect them to know React, Angular, TypeScript, and many other front-end technologies on the same level?

In my experience, it is absolutely possible to have someone who is fluent in both front-end technologies (React, Angular, etc.) and backend technologies (databases, k8s, distributed systems, design patterns, etc.). I can point at two or three such people without searching too hard. The problem, however, is that while it is possible, it is also rare. The people I have in mind who qualify for the full stack developer moniker are also people with a decade-plus of experience in the field. And make no mistake, I don't include myself in that category. I can use a user interface, but don't ask me to build one.

For the most part, I see people who exist on a spectrum: typically strong in certain areas, with passing familiarity in others. For example, you may have someone who is very strong in building client-side user interfaces and calling back to the server, with some ability to create their own server-side endpoints, but without the capability to build a full server-side solution from end to end. On the other hand, someone who is capable of building the server side may do some client-side work, but is stumped by the more complex issues on the client.


Now, I'm speaking in generalizations here, because I'm talking about large segments of developers, not individuals, but this seems to hold true for large swaths of them. It also makes sense: there is quite a bit to learn, and you can either be a butterfly and skim through a lot of subjects, or you can dive deeper and become an expert on a few topics.

Either option has its value, but it is important to remember that each also has its costs. If you have dipped your toes into many areas, you don't usually have the depth to actually handle the more complex and non-trivial stuff. For example, I would generally not expect someone who spent most of their time on the client side to be aware of everything that needs to happen for a proper server-side caching solution.

When talking about skills in an area, I’m talking about being able to develop, support, debug and maintain such a solution. Everyone can write code in most areas, but it takes effort, skill and knowledge to take a piece of code and turn that into production software.

The term full stack developer is a way to punt. It usually says, "I do a little bit here, and a little bit there." There is some meaning here, in the sense that these are the people you'll turn to when you want to build a full application from scratch. The problem is that they are usually only able to deliver an application that does OK across the board. When you need to do more than OK (and I'm willing to admit that in many cases you don't), you start to need to specialize, and that takes time and effort. I would rather use the term application developer than full stack. It seems more accurate, and it doesn't ping my spider sense about false advertising.

If your CV only contains jQuery…

time to read 3 min | 579 words

We recently got a CV at the office from a developer with about three years of experience as a Full Stack Developer. The CV was… strange, in the sense that I was intimately familiar with all the web technologies in it. This is peculiar, because about five years ago I threw in the towel and stopped even trying to pretend that I have any skills in building anything near the front end. And my skills as a front-end developer had been atrophying even before that.

I mean, &lt;table&gt; is still how you properly lay out things today, as far as I'm concerned. However, in a rare moment of self-reflection, I have to admit that I wouldn't hire myself to do anything related to the browser.

So we have a CV with the following keywords:

  • HTML
  • CSS
  • JavaScript
  • Ajax
  • jQuery

And that's a bit suspicious. Oh, certainly these are foundational topics for a front-end developer, and I get the need to sometimes pack a CV with keywords for the purpose of matching. However, not having anything else there is strange, and not usually indicative of a good outcome. Nonetheless, we gave the candidate a call and talked for a bit.

It appears that the candidate's first job after university was maintaining and extending an already existing application. The architecture and framework choices had already been made, and there wasn't any pressing need to change them. Therefore, that was what the candidate was used to and familiar with.

So far, this is a reasonable story. I can certainly see how this can happen. What I don't understand is the candidate's reaction to it. Sure, the current job may be resistant to changing things. It works, probably reasonably well, as far as the current workplace is concerned. And moving to a new technology just because a person wants to (literally, in this case) pad their resume is a bad idea.

But what about the candidate? At this point, they are actively hunting for a new job. I would expect them to take a look at the market, evaluate their current situation, and recognize that they are currently working on something that gives them no real value in the eyes of prospective employers. In fact, I would be willing to bet that this is a large part of why this candidate is looking for a new job.

I would expect the candidate at this point to actively work on improving their skills: spend some time watching Pluralsight videos, build sample applications, go over tutorials, etc. Coming to a job interview and saying something like, "My current job only uses jQuery, but I have been studying React on the side using Pluralsight, and here is a sample project on my GitHub showing my progress so far," is amazing. It says a lot about the candidate, including their ability to learn and develop on their own.

We aren't going to go forward with this candidate, but I'm certain that they will be able to find another position in a company where their jQuery skills will be very valuable. However, I don't expect that they'll learn anything new in that place, and in three years or so, when they are looking for a new job again, they will be in the exact same place.

On an unrelated note, I have another CV which listed both WinForms and VB6 as core skills.

Modeling Milk: A discussion on domain modeling

time to read 2 min | 342 words

I recently had a discussion at work about the complexity of modeling data in real-world systems. I used the example of a bottle of milk in the discussion, and I really like it, so I thought it would make for a good blog post.

Consider a supermarket that sells milk. In most scenarios, this is not exactly a controversial statement. How would you expect the system to model the concept of milk? The answer turns out to be quite complex, in practice.

To start with, there is no one system here. A supermarket is composed of many different departments that work together to achieve the end goal. Let’s try to list some of the most prominent ones:

  • Cashier
  • Stock
  • Warehouse
  • Product catalog
  • Online

Let’s see how each of these think about milk, shall we?

The cashier rings up a specific bottle of milk, but aside from that, they don't actually care. Milk is fungible (assuming the same expiry date). The cashier doesn't care which particular milk carton was sold, only that the milk was sold.

The stock clerks care somewhat about the specific milk cartons, but mostly because they need to make sure that the store doesn't sell any expired milk. They might also need to remove milk cartons that don't look nice (crumpled, etc.).

The warehouse cares about the number of milk cartons in stock on the shelves and in the warehouse, as well as predicting how much should be ordered.

The product catalog cares about milk as a concept: its nutritional values, its product picture, etc.

The online team cares about presenting the data to the user, mostly similar to the product catalog, until it hits the shopping cart / actual order. The online team also does prediction, based on past orders, and may suggest shopping carts or items to be purchased.

All of these departments are talking about the same "thing", or so it appears, but it looks, behaves, and is acted upon in very different ways.
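
To make that concrete, here is a minimal sketch (purely hypothetical types, not taken from any real system) of how three of these departments might model "milk" in entirely different shapes:

    using System;

    // Product catalog: milk as a concept.
    public class CatalogProduct
    {
        public string Id { get; set; }               // e.g. "products/milk-1l" (hypothetical id)
        public string Name { get; set; }
        public string NutritionalValues { get; set; }
        public string PictureUrl { get; set; }
    }

    // Stock / warehouse: physical cartons, expiry dates, and quantities.
    public class StockItem
    {
        public string ProductId { get; set; }        // references the catalog concept
        public DateTime ExpiryDate { get; set; }
        public int QuantityOnShelf { get; set; }
        public int QuantityInWarehouse { get; set; }
    }

    // Cashier: a fungible sale line; which specific carton was sold doesn't matter.
    public class SaleLine
    {
        public string ProductId { get; set; }
        public int Quantity { get; set; }
        public decimal UnitPrice { get; set; }
    }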
