Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by email or phone:

ayende@ayende.com

+972 52-548-6969


Posts: 6,738 | Comments: 48,776


Abusing system flexibility to avoid paying collect tolls

time to read 3 min | 512 words

I’m going to feel like an old man for this post, but if you were born after 1995, it is likely that you have no idea what I’m talking about, crazy as this sounds to me.

Before there was a phone in every pocket, there were land lines. A land line phone was like today’s phone, but much larger. You could only make voice calls, and if you wanted to screen your calls you needed to buy another appliance. If you watch the first few seasons of Friends, you’ll see how important a detail that can be. If you were out of the house or office and needed to place a call, you could use something called a public phone booth or a pay phone.

Sadly, the easiest way I can convey what this was is to invoke the Tardis: a small booth in which you had a public access phone. Because phone calls used to cost a lot, these phones had a way to drop some coins or tokens into the phone to pay for the call.

As a child, I didn’t have a wallet and still needed to occasionally make calls. Being stuck without cash at hand wasn’t such a strange thing, so there was another way to place the call. You could reverse the charge: instead of the person placing the call paying for it, you could call collect. In that case, the person answering the call would be paying for it. Naturally, since money is involved, you need the other party to accept the charge before actually charging them.

At some point in time, you called a special number and told the operator which number you wanted to call collect. The operator would ring this number and ask for permission to connect the call and charge the receiver. I think that the rate for a collect call was significantly higher than for a normal call, so you wouldn’t normally do that.

As part of the system automation, the phone company replaced the manual operator collect call with an automated system. You would record a short message, which would be played to the other party. If they wanted to accept the call (and the charge), they could press 1 on the phone, or disconnect to avoid the charge.

As a kid, I quickly learned that instead of telling the other party who is calling and why (so they would accept the call), I could just tell them what my phone number is. In this way, they would write down the number, refuse the call and then call me back. That would avoid the collect toll charge.

I remember that at some point the phone company made the length of the collect hello message really short, but I got around that by speaking really fast (or sometimes by making two separate calls). I remember having to practice saying the phone number a few times to get it done in the right time.

This code has expectations from the reader

time to read 2 min | 396 words

I just had a discussion with a colleague about a fix to some non-trivial code. The question was what comments should go into the code to explain what was going on. If you care to know, this is related to the prefetching strategy that RavenDB uses to reduce the amount of I/O that is required (especially on slow disks). The details don’t actually matter. The problem is that there are multiple relatively complex issues there, from managing I/O to thread safety in the critical code path (using dirty reads intentionally), etc.

The problem with doing this is that the code is complex, but it is a fairly straightforward progression from the kind of code we usually write in performance sensitive sections. The fear was that by over commenting the code, we would make it seem too easy to change. This is the kind of code that sits in the perf critical section; you change it after fasting for a day or two (with strong encouragement on meditation about little vs. big endian and why half endian is so rare).

In other words, in practice, you change it only when you have a reason, and you back up that change with a battery of performance tests: anything from the usual benchmarks to running production loads on various machines to poring over system traces.

Given the amount of effort that is expected from any change to this code, I consider it a good idea for people who read it to understand that there is a hurdle that must be jumped before it should be modified. Thus, we decided to skip some of the comments on the reasoning behind the overall design. Here is the most important comment in this code. It is there to explain a particular choice of value and the reasoning that must be applied when it is changed.

What about the whole complexity of the prefetching in general? That isn’t documented in code, because reading code comments scattered throughout will make it very hard to grok. This is detailed in the architecture guide, which goes over these details.

For myself, I find it really awesome to go over a codebase and figure out what reasoning lies behind the code. But when I have people working on my projects? It is better to give them a hand than a riddle.

The candidate’s portfolio

time to read 1 min | 136 words

This is a screen shot from a CV I just read:

image

The CV itself is kind of boring. Just graduated, did so and so course and was excellent in foo and bar.

We see dozens of CVs like that on a regular basis. But the portfolio link was very nice. It linked to a Google Drive folder with a bunch of games that the candidate made, in various languages.

I didn’t actually go and read all the code, but I skimmed through a bunch of projects there. I actually like the portfolio a lot better than a GitHub link. A portfolio is explicitly about showing a potential employer what you can do. A GitHub link can be used for many things.

Looking at a candidate’s GitHub profile

time to read 3 min | 505 words

When a candidate sends a CV and includes a GitHub profile, that almost always guarantees that I’ll give that profile a look. The most interesting thing from my perspective in a GitHub profile is that it allows me to look at the candidate’s work. There aren’t that many candidates with GitHub profile links, and not having a link isn’t something that will cause me to rule out a candidate. But I thought it would be interesting to share some of my findings from such trawling of repositories.

Here is an example of something that I don’t like:

image

In fact, in most code bases, I’ll skim very quickly to find the data access code. SQL Injection is a pet peeve of mine, and seeing how a candidate’s code handles user input is an easy way to get a first impression. It isn’t always indicative of “this person has no skills and is careless”, mind. But I found that it is a good place to start, especially because mostly I’ll see sample projects and half finished stuff. So seeing how they treat this particular issue (which is easily found and should be familiar to most developers) is a good quick check. Then again, here is the same candidate, with another repository:

image

This is using Hibernate, by the way. And that kind of hurt my feelings, to be fair.

On the other hand, a different candidate:

image

That is much better, and shows that they pay attention to other functional requirements.

In general, I consider the presence of a GitHub link in a CV as an invitation to evaluate the candidate’s work and will do so with the goal of understanding their approach, the quality of their code and their skills. As such, if you include a GitHub link in a CV, I would recommend considering it to be your public face and a criterion for evaluation.

This is an advantage. It means that the mere existence of the GitHub link makes you stand out from the crowd. On the other hand, it also means that your code is under scrutiny.

My advice here is for people starting out, without much background. As such, having a straightforward way to be evaluated on your skills is a plus. I would suggest making it easier. For example, a clear README is nice, especially if you explain what you were trying to do. “Playing around with Angular to see how it feels” is a great thing to have, because it gives context to the person reading your code. Especially for web applications and client side work, having a visible demo that I can quickly look at is great.

On the other hand, having well known bad practices (such as SQL Injection, plain text passwords, etc) in the code is a big negative.
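To make the SQL Injection point concrete, here is a minimal sketch of the difference I look for in data access code. It is not taken from any candidate’s repository. The `users` table, the function names and the in-memory SQLite database are all assumptions made up for this illustration.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # BAD: user input is concatenated straight into the SQL text,
    # so the caller can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # GOOD: the value travels as a bound parameter and is never
    # interpreted as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "nobody' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # every row leaks
    print(find_user_safe(conn, payload))    # no user has that name
```

The unsafe version returns every row for that payload, while the parameterized version treats the whole string as a (non existent) user name. Seeing the second form in a sample project is exactly the kind of quick positive signal I am talking about.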

The GitHub profile you don’t want in your CV

time to read 1 min | 108 words

I just got a CV from a candidate looking for a junior position. I looked at the CV (and oh my God, did this guy have a lot of acronyms in there). I noted that he has a GitHub account in the CV, so naturally I checked it.

There is a single repository there, which I’ll present to you in all its glory:

image

This is actually a negative. If he didn’t have a GitHub account, I wouldn’t have minded. But including one that is in this shape is not a good idea.

Toddlers, cursing and preparing ahead of time

time to read 2 min | 275 words

My daughter is 3¼ years old now. About the time that she was born, I decided that I needed to make a small change in my language. Whenever I felt the urge to curse, I would say a food’s name. For example, after being puked on, my reaction would be some variant of: “PASTA”, “PASTA BOLOGNESE” or other pasta favorites.

As time went by, I got better and better at expressing emotion through increasingly disturbing food references. My current favorite is: “Pasta Bolognese with pickled carrots in a bun with anchovies and raw eggs”.

A couple of days ago, I took my daughter and a friend to an ice cream shop. As expected of an ice cream shop in the middle of (very hot) summer, the place was packed. My daughter was quite excited to go there and expressed her emotions by standing up and shouting at the top of her lungs (she is three, with a voice that carries like a foghorn): “PASTA! PASTA BOLOGNESE” over and over again.

This is a crowded shop, full of small kids and parents. I got some looks for the little girl holding up a full ice cream cone and shouting about pasta, but it was infinitely preferable to the alternative.

An unforeseen side effect, however, is that because I can, I’m very free with pasta based profanities. This has led to what is effectively a competition, with her trying to cause me to go overboard with that.

And now I must go back to work, before the gluten police arrive.

The Incredibles II

time to read 2 min | 273 words

I just got back from watching the Incredibles 2. The previous movie was a favorite of mine from first view, and it is one of the few movies that I can actually bear to watch multiple times. I was hoping for a sequel almost from the moment I finished the first movie, and it took over a decade to get it.

I actually sat down with my 3-year-old daughter to watch the first movie before I went to see the second one. I’m not sure how much she got from it, although she is very fond of trains and really loved the train scene (and then kept asking where the train is). It is unusual for me to actually “prepare” to see a movie, by the way. But it did mean that I had the plot sharp in my head and that I could directly compare the two movies.

First, in terms of the plot, it was funny, especially since I have a kid now and could appreciate a lot more of the not so subtle digs at parenthood.

Second, in terms of visuals, wow, it improved by a lot. The original movie held up really well in terms of visuals over the past 12 years, but the new one is visibly better in this respect.

Also, at this point I nearly got a heart attack because a talking book (again, my daughter) started neighing at me in the middle of the night just as I got into the house.

Highly recommended.

When I gave up on pointers

time to read 2 min | 372 words

I started programming with that orange turtle (I think it was supposed to be green, but we had bad CRT screens) by drawing stuff on the screen. I think that I was in fifth grade or so. I later graduated to VB (IIRC, that was VB3 or VB4), but my first formal programming education was done in Pascal. And I was pretty good (for a high school kid who merely dabbled), but I just couldn’t figure out pointers. I mean, they made absolutely no sense whatsoever.

Take the example of an “infinite” size stack. That was the example that we were given during class, and I just couldn’t follow it. Start with a simple fixed size stack, like so:

You might notice that this code is limited: if you store more than 3 items, the extra values will be silently ignored, which is probably not what you want. High school me would agree that this is bad, and therefore increase the STACK_SIZE variable to a ridiculously high size (such as 100). That would surely be big enough for everything, right?

I remember really struggling with the concept of dynamic memory management, not so much because of the API, but because I couldn’t make any sort of sense of what I was supposed to do there.
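The class listing isn’t reproduced here, so what follows is only a rough sketch of the two ideas, written in Python rather than the Pascal we used. The class names are made up for this illustration; only `STACK_SIZE` and the limit of 3 come from the text above. The first stack has a fixed capacity and silently drops extra values. The second allocates a new node per item, with the `below` reference standing in for the pointer.

```python
STACK_SIZE = 3  # fixed capacity, as in the example described above


class FixedStack:
    """Array backed stack; pushes beyond the capacity are silently ignored."""

    def __init__(self, capacity=STACK_SIZE):
        self._items = [None] * capacity  # storage allocated once, up front
        self._count = 0

    def push(self, value):
        if self._count == len(self._items):
            return  # full: the value is silently dropped
        self._items[self._count] = value
        self._count += 1

    def pop(self):
        if self._count == 0:
            raise IndexError("pop from empty stack")
        self._count -= 1
        return self._items[self._count]

    def __len__(self):
        return self._count


class Node:
    """One dynamically allocated cell; `below` plays the role of the pointer."""

    def __init__(self, value, below):
        self.value = value
        self.below = below


class LinkedStack:
    """The "infinite" stack: every push allocates a fresh node."""

    def __init__(self):
        self._top = None
        self._count = 0

    def push(self, value):
        self._top = Node(value, self._top)  # new node points at the old top
        self._count += 1

    def pop(self):
        if self._top is None:
            raise IndexError("pop from empty stack")
        node = self._top
        self._top = node.below
        self._count -= 1
        return node.value

    def __len__(self):
        return self._count
```

Pushing five values into `FixedStack` keeps only the first three, while `LinkedStack` keeps all five. Python hides the allocation and release that Pascal or C make you write by hand, which is the part that tripped me up.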

After high school, I went and took a high end course in C++. That took about a year, and I highly recommend the course (even though I don’t think they run it anymore). It taught me a lot about basic stuff such as how things actually work. We started with low level C in DOS, and built on top of that all the way to MFC and ATL. And at some point, the instructor introduced dynamic memory management. And it was so blindingly obvious that I never actually realized that I was learning the same concept that gave me so much grief in the past.

I had that experience several times since then. I try to learn something, and I just bounce, hard. A while later, I do the same thing, or fight a slightly different problem, and get it. But I’m not sure how I go from “what the hell” to “oh, this is obvious”.

Why RavenDB isn’t written in F#, or the cost of the esoteric choice

time to read 7 min | 1218 words

In a recent post, a commenter suggested that using F# rather than C# would dramatically reduce the code size (measured in number of lines).

My reply to that was:

F# would also lead to a lot more complexity, reduced participation in the community, harder to find developers and increased costs all around.

And the data to back up this statement:

C# Developers F# Developers

image

image

Nitpicker corner: Now, I realize that this is a sensitive topic, so I’ll note that this isn’t meant to be a scientific observation. It is a data point that amply demonstrates my point. I’m not going to run a full bore study. And yes, those numbers are about jobs, not people, but I’m assuming that the numbers are at least roughly comparable.

The reply to this was:

You have that option to hire cheaper developers. I think that the cheapest developers usually will actually increase your costs. But if that is your way, then I wish you good luck, and I accept that as an answer. How about "a lot more complexity"?

Now, let me try to explain my thinking. In particular, I would strongly disagree with the “cheapest developers” mentality. That is very far from what I’m trying to achieve. You usually get what you pay for, and trying to save on software development costs when your product is software is pretty much the definition of penny wise and pound foolish.

But let us ignore such crass terms as money and look at availability. There are fewer than 500 jobs for F# developers (with salary ranges implying that there aren’t a whole lot of F# developers queuing up for those jobs). There are tens of thousands of jobs for C# developers, and again, the salary range suggests that there isn’t a dearth of qualified candidates that would cause demand to raise the costs. From those numbers, and my own experience, I can say the following.

There are a lot more C# developers than there are F# developers. I know that this is a stunning conclusion, likely to shatter the worldview of certain people. But I think that you would find it hard to refute that. Now, let us try to build on this conclusion.

First, there was the original point, that F# leads to a reduced number of lines. I’m not going to argue that, mostly because software development isn’t an issue of who can type the most. The primary costs of development are design, test, debugging, production proofing, etc. The act of actually typing is pretty unimportant.

For fun, I checked out the line count numbers for similar projects (RavenDB & CouchDB). The count of lines in the Raven.Database project is roughly 100K. The count of lines in the CouchDB src folder is roughly 45K. CouchDB is written in Erlang, which is another functional language, so we are at least not comparing apples to camels here. We’ll ignore things like different feature sets, different platforms, and the different languages for now, and just say that an F# program can deliver with 25% of the lines of code of a comparable C# program.

Note that I’m not actually agreeing with this statement, I’m just using this as a basis for the rest of this post. And to (try to) forestall nitpickers: it is easy to show great differences in development time and lines of code in specific cases where F# is ideally suited to the task. But we are talking about general purpose usage here.

Now, for the sake of argument, we’ll even assume that the cost of F# development is 50% of the cost of C# development. That is, that the reduction in line count actually has a real effect on the time and efficiency. In other words, if an F# program is 25% of the size of a similar C# program, we’ll not assume that each line takes 4 times as much time to write.

Where does this leave us? It leaves us with a potential pool of people to hire that is vanishingly small. What are the implications of writing software in a language that has fewer people familiar with it?

Well, it is harder to find people to hire. That is true not only for people that you hire “as is”. Let us assume that you’re going to give those people additional training after hiring them, so they would know F# and can work on your product. An already steep learning curve has just become that much steeper. Not only that, but this additional training means that the people you hire are more expensive (there is an additional period in which they are only learning). In addition to all of that, it will be harder to hire people, not just because you can’t find people already experienced with F#, but because people don’t want to work for you.

Most developers at least try to pay attention to the market, and they make a very simple calculation. If I spend the next 2 – 5 years working in F#, what kind of hirability am I going to have in the end? Am I going to be one of those trying to get the < 500 F# jobs, or am I going to be in the position to find a job among the tens of thousands of C# jobs?

Now, let us consider another aspect of this: the community around a project. I actually have a pretty hard time finding any significant F# OSS projects. But leaving that aside, the number of contributors, and the ability of users to go into your codebase and look for themselves, is a major advantage. We have had users skip the bug report entirely and just send us a Pull Request for an issue they ran into; others have contributed (significantly) to the project. That is possible only because there is a wide appeal. If the language is not well known, the number of people that are going to spend the time and do something with it is going to be dramatically lower.

Finally, there is the complexity angle. Consider any major effort required. Recently, we have been working on porting RavenDB to Linux. Now, F# works on Linux, but anyone that we would go to in order to help us port RavenDB to Linux would have this additional (rare) requirement: they would need to understand F# as well as Linux & Mono. Any problem that we would run into would have to first be simplified to a C# sample so it could be readily understood by people who aren’t familiar with F#, etc.

To go back to the beginning, using F# might have reduced the lines of code count, but it wouldn’t reduce the time to actually build the software and it would limit the number of people that can participate in the project, either as employees or Open Source contributors.

My view on crowd funding

time to read 3 min | 511 words

After my previous post, I was asked what I’m thinking about the notion of crowd funding, which is currently all the rage.

The answer is complicated. I’m focusing right now on things like Kickstarter and its siblings, because I’m familiar with how they work. The basic premise is pretty great. You have some idea (usually a product) that requires initial capital and has some well known market. By directly contacting the target audience, you can get the seed money, judge demand and have very low risk overall. The “investors” put in small amounts of money, the loss of which they can tolerate without hardship. The project gets money for very little effort and gets great marketing along the way.

This is great, if you are doing a product. Something that can be sold. For instance, let us say that we want to do a major feature, like adding time series capabilities to RavenDB. Let us say that we start a Kickstarter campaign for this, asking for 150,000 USD and promising backers that they’ll get a free license out of early sponsorship.

I’ll get into the exact costs associated with this option in a bit. But before we go there, remember the premise of my previous post. It isn’t money to build a specific product. It is money that is required to purchase something for the business itself. Of course, buying that cool car will raise morale and I have a spreadsheet that says that it will increase the effectiveness of the team by 17.4% (although it will decrease parking space by 37%). So it makes sense to go with that, from a business perspective. However, there is very little that I can do to actually make people want to back the “we want a cool car” notion. At least, I don’t think so, but the internet does have some dark corners.

Back to the notion of using this to build products. There is a very basic problem here. RavenDB isn’t targeting individuals. It is a database platform, and most of our customers are businesses or enterprises. That leads to a very different mindset. Speculative investment in something like this is going to be much rarer, harder and fraught with issues. An Open Source project can do that, since it makes sense for a business to invest in a project it is using, but there are very few who actually manage to do that. A quick search of Kickstarter doesn’t show any major open source project soliciting funds there.

Kickstarter makes sense for personal stuff, things that you actually get to hold, or need to buy. Something of some scarcity. Doing this for commercial software makes very little sense, and for open source, it is an even bigger problem. For open source projects that depend on donations, usually you have a valid commercial reason for people to donate (Linux, Wikipedia, etc).

I’m open to contrarian points of view, mind. But I don’t think that crowd funding is applicable for the kind of things that I would want to use it for.
