time to read 9 min | 1692 words

Okay!

After diving into the CouchDB source code and doing an in-depth review of btree:lookup, I am ready for the real challenge: figuring out how CouchDB writes to the btree. This is the really interesting part, from my point of view.

The heart of this functionality is the query_modify method. This method allows us to perform a batch of operations on the tree in one go. Following my own ideal that all remote APIs (and this is talking to disk, hence remote) should support batching, I really like the idea.

image

The first part is dedicated to building the actions list. I am not going to go over that in detail, but I will say that it sorts things first by key, and then by action.

So, for example, let us say that I want to perform the following actions:

query_modify(tree, {'c', 'a'},'a','a')

What will happen is that we will first query the current value of 'a', then we will remove it, and then we will insert it. Finally, we will query the value of 'c'. This is done in order to ensure good traversal performance of the tree, as well as to ensure consistency when performing multiple operations.
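To make the batching concrete, here is a rough C# sketch of that sort order (all the names and the relative ordering of the action types are my assumptions, not CouchDB's code):

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: order a batch by key, then by action type, so a single
// ordered traversal of the tree can serve the whole batch.
enum ActionType { Fetch = 1, Remove = 2, Insert = 3 } // assumed relative ordering

record BtreeAction(string Key, ActionType Type, string Value = null);

static class ActionSorter
{
    public static List<BtreeAction> Sort(IEnumerable<BtreeAction> actions) =>
        actions.OrderBy(a => a.Key, StringComparer.Ordinal)
               .ThenBy(a => a.Type)
               .ToList();
}

// For the batch above this yields: fetch 'a', remove 'a', insert 'a', fetch 'c'.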

The part that really interests us is modify_node, which performs the actual work:

image

Now we are starting to see how things really happen. We can see that if the root is null we create a value node, and if it is not, we find the appropriate node. This follows what we saw in the lookup.

Moving on, we forward the call to modify_kpnode or modify_kvnode. I am going to touch kpnode first, since it is bound to be simpler (kvnode needs to handle splitting, I would assume).

image

This is far from trivial code, and I don't think that I understand it fully yet, but what it basically does is fairly simple. First we find the first node that matches the current FirstActionKey. If this is the last node that we have to deal with, we call modify_node on it, accumulate the result, and return it. If it is not the last node, we split the work, sending everything that is less than or equal to the current FirstActionKey to be handled by modify_node (which is likely to be a key/value node and thus handled directly), and continue to handle the rest using modify_kpnode.

In a way, this is the exact reverse of how lookup_kvnode is handled.

modify_kvnode is complex. I can't fit the function on a single screen at 1280x1024 resolution, so I am going to split it into two sections. It is not ideal, since they are supposed to go together, but I'm sure you'll manage.

image

The first part handles the case where we have no more actions to perform. In this case, we can simply return the results. The second handles the case where we run out of items in the node. Note how insert works, for example. You can see that the btree works the same way Erlang does: we always rewrite the entire node, rather than modify it in place. remove is also interesting. If we got to this point and haven't found the item, it doesn't exist, so we can move on. Same for fetching.

Now let us see the more complex part: what happens if we do find the items in the value node?

image

Note that we have AccNode, which is our accumulator. We find the first node that matches ActionKey, and then we take the NodeTuple and the AccNode and turn them into a reversed list. This copies all the items that are less than the current one to the ResultNode; those are the ones that we are not interested in, so we can just copy them as is.

The next part handles the actual adding/removing/fetching from the node. It is pretty self explanatory, I think, so I'll leave it at that.

So now we understand how modify_kvnode and modify_kpnode work. But there is nothing here about splitting nodes, which I find suspicious. Let us go back to modify_node and look at what else is going on there:

image

Ah, note the handling of NewNodeList; that is probably where the money is.

We make a couple of tests. The first is to see if there are any nodes left, the second is to see if we changed the node list (by comparing it to the old value). We don't care about any of those at the moment, so we will focus on the last one. write_node is called, and this is likely where we will see what we are looking for.

image

Well, this is simple enough, and chunkify seems like what we were looking for. However, it bothers me that we write all the nodes to disk. It seems... wrong somehow. More specifically, since we are always appending, aren't we going to break the btree? There is also the reduce_node call that is hiding there, which we also need to check. It is being called after the node was written to disk, so I am not sure what is going on here.

Let us read chunkify first, and then deal with reduce_node.

image

Well... it looks like the chunkify function sucks. But anyway, what is going on there is fairly simple. We check if the list that we were passed is greater than CHUNK_THRESHOLD. This is set to 1279, for some strange reason that I can't follow. I assume that the intent is to ensure blocks of less than 1280, but no sector size that I have heard of comes in that size.

The second part is more interesting (and complex). OutputChunks is a list of lists of the elements that came in the InList. This piece of code is very short, but it does a tremendous amount of work.
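If I read it correctly, the overall shape is something like this C# sketch (the 1279 threshold is the value mentioned above; the size measure and all the names are my own stand-ins for Erlang's term size, not CouchDB's code):

using System;
using System.Collections.Generic;

static class Chunkifier
{
    const int ChunkThreshold = 1279; // the value CouchDB uses, per the discussion above

    // Split items into chunks whose accumulated (approximate) byte size stays
    // under the threshold. sizeOf is a stand-in for measuring an Erlang term.
    public static List<List<T>> Chunkify<T>(IList<T> items, Func<T, int> sizeOf)
    {
        var chunks = new List<List<T>>();
        var current = new List<T>();
        var currentSize = 0;
        foreach (var item in items)
        {
            var size = sizeOf(item);
            if (currentSize + size > ChunkThreshold && current.Count > 0)
            {
                chunks.Add(current);      // close the current chunk
                current = new List<T>();
                currentSize = 0;
            }
            current.Add(item);
            currentSize += size;
        }
        if (current.Count > 0)
            chunks.Add(current);
        return chunks;
    }
}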

And now we can move back to reduce_node, like I promised.

image

This is short, to the point, and interesting. The idea of rereduce is that when something has changed, you don't have to go and recalculate everything from scratch; you can take partially reduced results and the new values and combine them to produce the same result you would get if you reduced over the entire data set.

As you can see, calling reduce_node on a key pointer node will cause a re-reduction, while on a value node it just reduces after a map. I am assuming that the thought was that value nodes are small enough that there is no point in optimizing this.
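To make the reduce/rereduce distinction concrete, here is a tiny sketch (my own example, not CouchDB code), using a simple count as the reduce function:

using System.Collections.Generic;
using System.Linq;

static class CountReduce
{
    // reduce: run over the raw key/value pairs of a value node.
    public static int Reduce(IEnumerable<KeyValuePair<string, string>> kvs) => kvs.Count();

    // rereduce: combine the already-reduced results stored on the child pointers,
    // instead of visiting every key/value pair again.
    public static int Rereduce(IEnumerable<int> partialCounts) => partialCounts.Sum();
}

// Rereduce(new[] { 3, 5, 2 }) must agree with what a single Reduce over the
// combined data would have produced (here, 10).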

There are a few interesting points that need to be raised, which I don't have answers for at the moment.

  • Why is this happening after we write to file?
  • How does this relate to CouchDB's ability to shell out to JavaScript in order to perform maps and reductions?
  • What ensures that the ordering of the node reduction matches the tree hierarchy?

At any rate, this completes our examination of write_node and modify_node. We can now go back to where we started, query_modify:

image

We are nearly at the end. We have seen that the nodes are written to disk after being chunkified. Note that nothing actually has a reference to them at this point. We do have KeyPointers, but they aren't attached to anything. If we crashed right now (directly after modify_node), there is no state change as far as we are concerned; we just lost some disk space that we will recover in the next compaction.

I am pretty sure that complete_root is the one responsible for hooking everything together, so let us see what it does...

image

Yes, I expected complete_root to be complicated as well :-)

What is going on here is the actual missing piece. This is what takes all the value nodes and turns them into pointer nodes, and does so recursively until we finally get to the point where only a single value is returned, which is the root node. There is also handling for no nodes, in which case the tree is empty.

Basically, what this means is that the way CouchDB is able to achieve ACID using a btree is by saving the modified tree structure to disk on each commit. Since it is only the modified part that is saved, and since btree structures are really small, there is no real storage penalty. Now I want to go and find what actually saves the new root tree to disk, since query_modify does not modify the actual header (which means that in the case of a crash, nothing will change from the point of view of the system).

Right now I suspect that this is intentional, that this allows combining even more actions into a single transaction, even beyond what you can do in a single query_modify. This is especially true since the common interface for those would usually be add, remove, etc.

As an interesting side effect, this is also how CouchDB is able to handle MVCC. Since everything is immutable, it can just hand a tree reference to the calling code and forget about it. No changes ever occur, so you get serializable isolation level by default. And, from what I can see, basically for free.
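A minimal sketch of why immutability gives you MVCC essentially for free (this is my mental model, not CouchDB code): readers hold on to whatever root pointer they started with, and a commit only ever publishes a new root.

using System.Threading;

class AppendOnlyTree
{
    // The only mutable cell in the whole design: the offset of the current root.
    long currentRootPointer;

    // A reader grabs the root once and keeps traversing that snapshot; later
    // commits append new nodes and swap the root, but never touch the nodes
    // this reader is looking at.
    public long BeginRead() => Interlocked.Read(ref currentRootPointer);

    // A writer appends its modified nodes elsewhere, then publishes the new root.
    public void Commit(long newRootPointer) =>
        Interlocked.Exchange(ref currentRootPointer, newRootPointer);
}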

Going over the couch_httpd file, it seems that the main interface is couch_db, so I am heading that way... and I run into some very complex functions that I don't really feel like reading right now.

Hope this post doesn't contain too many mistakes...

time to read 4 min | 789 words

I want to dive deep into CouchDB's file format, which is interesting because it maintains ACID guarantees and the code is small enough to make it possible to read.

The file starts with a header, with the following structure:

"gmk\0"
db_header:
    writer_version
    update_seq
    summary_stream_state,
    fulldocinfo_by_id_btree_state
    docinfo_by_seq_btree_state
    local_docs_btree_state
    purge_seq
    purged_docs
// zero padding to 2048
md5 (including zero padding)

At the moment, I am not sure what some of the fields are (all the state fields and purged_docs), and there is an indication that this header can get larger than 2Kb. I'll ignore it for now and go study the way CouchDB retrieves a node. The internal data structure for the CouchDB file is a btree.

Here is the root lookup/2, which takes the btree and a list of keys:

image

It makes a call to lookup/3, the first clause of which is error handling for a null btree, and the second one is the really interesting one.

get_node will perform a read from the file at the specified location. As you can see, the Pointer we get passed is the btree root at this stage. So this basically reads a term (an Erlang data structure) from the file. It is important to note that Erlang has stable serialization in the face of versioning, unlike .Net.

So, we get the node, and it is either a kp or a kv node (I think that kv is key/value and kp is key/pointer). Since key value seems to be easier to understand, I am going to look it up first.

image

As usual, we are actually interested in the last one, with the first two for error handling. The first is to search for an empty list of keys, which returns the reversed output. This is a standard pattern in Erlang, where you accumulate output until you run out of input to process in a recursive manner, at which point you simply return the reversed accumulator, since you need to return the data in the same order that you got it.
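For C# readers, the accumulate-and-reverse shape looks roughly like this (my own illustration, not CouchDB code):

using System.Collections.Generic;

static class AccumulateReverse
{
    // Recursive shape of the Erlang pattern: peel one input off the front,
    // prepend its result onto the accumulator, recurse, and reverse at the end
    // so the output comes back in the same order as the input.
    public static List<string> Process(IReadOnlyList<string> input, int index, List<string> acc)
    {
        if (index == input.Count)
        {
            acc.Reverse();
            return acc;
        }
        acc.Insert(0, Handle(input[index]));  // prepend, like [Result | Acc]
        return Process(input, index + 1, acc);
    }

    static string Handle(string item) => item.ToUpperInvariant(); // stand-in work
}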

Indeed, we can see from the signature of the third function that this is how it works. We split LookupKey from RestLookupKeys. We will ignore the second function for now, since I don't understand its purpose.

find_first_gteq seemed strange at first, until I realized that gteq stands for greater than or equal to. This performs a simple binary search on NodeTuple. Assuming that it contains a list of ordered tuples (key, value), it will give you the index of the first item that is greater than or equal to the LookupKey. The rest of the function is a simple recursive search for all the other items in the RestLookupKeys.
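A rough C# equivalent of what find_first_gteq seems to be doing (my own sketch, assuming a plain sorted array of keys):

static class NodeSearch
{
    // Return the index of the first entry whose key is >= lookupKey, assuming
    // keys[] is sorted ascending. A classic lower-bound binary search.
    public static int FindFirstGteq(string[] keys, string lookupKey)
    {
        int low = 0, high = keys.Length; // high is exclusive
        while (low < high)
        {
            int mid = (low + high) / 2;
            if (string.CompareOrdinal(keys[mid], lookupKey) < 0)
                low = mid + 1;
            else
                high = mid;
        }
        return low; // == keys.Length when every key is smaller than lookupKey
    }
}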

That is interesting, since it means that all the items are in memory. The only file access that we had was in the get_node call in lookup. That is interesting, since I would assume that it is entirely possible for the keys to be spread across a wide distribution of nodes. I am going to assume that this is there for the best case scenario, where the root node is also a value node that has no branches. Let us look at the handling of kp_node and see what is going on.

And indeed, this seems to be the case:

image

As usual, the first two are stop conditions, and the last one is what we are really interested in.

We get the first key that interests us, as before. But now, instead of having the actual value, we have a PointerInfo. This is the pointer to where this resides on the disk, I assume.

We then split the LookupKeys list on those that are greater than FirstLookupKey. For all those that are less than or equal, we call back to lookup, to recursively find the results for them. Then we call again for all those that are greater than FirstLookupKey, passing them along.

In short, this is an implementation of reading a btree from a file. I started to call it simple, but then I realized that it isn't, really. It is elegant, though.

Of course, this is still missing a critical piece, the part in which we actually write the data, but for now I am actually able to understand how the file format works. Now I have to wonder about attachments, and how the actual document content is stored, beyond just the key.

Considering that just yesterday I gave up on reading lookup for being too complex, I am pretty happy with this.

time to read 14 min | 2617 words

I like the ideas that Erlang promotes, but I find myself with a big problem when trying to actually read Erlang. That is, I can read sample code, more or less, but real code? Not so much. I won't even touch on writing the code.

One of the problems is the lack of IDE support. I tried both ErlIDE and Erlybird, and neither of them gave me so much as syntax highlighting. They should, according to this post, but I couldn't get it to work. I ended up with Emacs, which is a tool that I just don't understand. It supports syntax highlighting, which is what I want in order to be able to read the code more easily, so that is fine.

Emacs is annoying, since I don't understand it, but I can live with that. The source code that I want to read is CouchDB, which is a Document DB with a REST interface that has gotten some attention lately. I find the idea fascinating, and some of the concepts that they raise in the technical overview are quite interesting. Anyway, I chose that to be the source code that will help me understand Erlang better. Along the way, I expect to learn something about how CouchDB is implemented.

Unlike my usual code reviews, I don't have the expertise or the background to actually judge anything with Erlang, so please take that into account.

One of the things that you notice very early with CouchDB is how little code there is in it. I would expect a project for a database to be in the hundreds of thousands of lines and hundreds of files. But apparently there is something else going on. There are about 25 files, the biggest of them being 1,200 lines or so. I know that the architecture is drastically different (CouchDB shells out to other processes to do some of the things that it does), but I don't think it is just that.

As usual, I am going to start reading code in file name order, and jump around in order to understand what is going on.

couch_btree is strange; I would expect Erlang to already have an implementation of a btree. Although, considering functional languages' bias toward lists, that may not be the case. A more probable option is that CouchDB couldn't use a default implementation because of the different constraints it is working under.

This took me a while to figure out:

image

The equivalent in C# is:

public class btree
{
	public object fd, root;
	public Func<object, object> extract_kv = kvp => kvp;   // identity by default
	public Func<object, object> assemble_kv = kvp => kvp;  // identity by default
	public Func<object, object, bool> less = (a, b) => ((IComparable)a).CompareTo(b) < 0;
	public object reduce = null;
}

public object Extract(btree btree, object val)
{
	return btree.extract_kv(val);
}

The ability to magically extract values out of a record makes the code extremely readable, once you know what is going on. I am not sure why the Extract variable is on the right side, but I can live with it.

Here is another piece that I consider extremely elegant:

image

You call set_options with a list of tuples. The first value in each tuple is the option you are setting, the second is the actual value to set. This is very similar to Ruby's hash parameters.

Note how it uses recursion to systematically handle each of those items: it handles the first one, then calls itself with the rest. Elegant.

It is also interesting to see how it deals with immutable structures, in that it keeps constructing new ones as copies of the old.  The only strange thing here is why we map split to extract and join to assemble. I think it would be best to name them the same.
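Building on the hypothetical C# btree sketch above, the recursive option handling might look roughly like this (the split/join/less/reduce option names follow the description above; everything else is my own stand-in, not CouchDB code):

using System;
using System.Collections.Generic;

static class BtreeOptions
{
    // Peel the first (name, value) tuple off the options list, build a fresh
    // copy of the btree record with that field set, and recurse on the rest.
    public static btree set_options(btree bt, List<(string name, object value)> options)
    {
        if (options.Count == 0)
            return bt;

        var (name, value) = options[0];
        var rest = options.GetRange(1, options.Count - 1);

        var copy = new btree
        {
            fd = bt.fd, root = bt.root,
            extract_kv = bt.extract_kv, assemble_kv = bt.assemble_kv,
            less = bt.less, reduce = bt.reduce
        };
        switch (name)
        {
            case "split":  copy.extract_kv  = (Func<object, object>)value; break;  // split -> extract
            case "join":   copy.assemble_kv = (Func<object, object>)value; break;  // join -> assemble
            case "less":   copy.less        = (Func<object, object, bool>)value; break;
            case "reduce": copy.reduce      = value; break;
        }
        return set_options(copy, rest);
    }
}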

There is also the implementation of reduce:

image

This gave me a headache for a while, before I figured it out. Let us say that we are calling final_reduce with a btree and a value. We expect that value to always be a tuple of two lists. We call the btree.reduce function on the value. And this is the sequence of operations that goes on:

  • if the value is two empty lists, we call reduce with an empty list, and return its value.
  • if the value is composed of an empty list and a list with a single value, then we have reduced the value to its final state, and we can return that value.
  • if the value is an empty list and a list of reduced values, we call reduce again to re-reduce the list, and return its value.
  • if the value is two lists, we call reduce on the first and prepend the result to the second list. Then we call final_reduce again with an empty list and the new second list.

Elegant, if you can follow what is going on.
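My attempt at expressing those four cases in C# (a sketch only; the rereduce flag is my assumption about how the reduce function distinguishes raw values from partial results, and all the names are mine):

using System;
using System.Collections.Generic;

static class FinalReduce
{
    // reduce(rereduce, values): rereduce == false means "values" are raw key/values,
    // rereduce == true means "values" are already-reduced partial results.
    public static object Run(Func<bool, List<object>, object> reduce,
                             List<object> keyValues, List<object> reductions)
    {
        if (keyValues.Count == 0 && reductions.Count == 0)
            return reduce(false, new List<object>());   // nothing at all

        if (keyValues.Count == 0 && reductions.Count == 1)
            return reductions[0];                        // already fully reduced

        if (keyValues.Count == 0)
            return reduce(true, reductions);             // re-reduce the partials

        // Raw key/values plus partials: reduce the key/values first, prepend the
        // result to the partials, and finish with the empty-list case above.
        var reduced = reduce(false, keyValues);
        var newReductions = new List<object>(reductions);
        newReductions.Insert(0, reduced);
        return Run(reduce, new List<object>(), newReductions);
    }
}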

There is a set of lookup functions that I am not sure I can understand. That seems to be a problem in general with the CouchDB source code: there is very little documentation, and I don't think that it is just my newbie status with Erlang that is making this difficult.

I got up to get_node (line 305) before I realized that this isn't actually an in-memory structure. It is backed by couch_file:pread_term, whatever that is. And then we have write_node. I am skipping ahead and see things like reduce_stream_kv_node2, which I am pretty sure are names discouraged in any language.

Anyway, general observation tells me that we have kvnode and kpnode. I think that kvnode is key/value node and kpnode is key/pointer node. I am lost beyond that. Trying to read a function with eleven arguments taxes my ability to actually understand what each of them is doing.

I found it very interesting to find test functions at the end of the file. They seem to be testing the btree functionality. Okay, I am giving up on understanding exactly how the btree works (not uncommon when starting to read code, mind you), so I'll just move on and record anything new that I have to say about the btree if I find something.

Next target, couch_config. This one actually has documentation! Be still, my heart:

image

Not sure what an ets table is, mind you.

couch_config is not just an API, like couch_btree is. It is a process, using Erlang's OTP in order to create an isolated process:

image

gen_server is a generic server component. Think about this like a separate process that is running inside the Erlang VM.

Hm, looks like ets is an API for handling in-memory databases. Apparently they are implemented using Judy Arrays, a data structure that aims to reduce CPU cache misses.

This is the public API for couch_config:

image

It is a fairly typical OTP process, as far as I understand. All calls to the process are translated to messages and sent to the server process to handle. I do wonder about the ets:lookup and ets:match calls that the get is making. I assume that those have to be sync calls.

Reading the ini file itself is an interesting piece of code:

image

In the first case, I am not sure why we need io:format("~s~n", [Msg]); that seems like a no-op to me. The file parsing itself (the lists:foldl) is truly functional code. It took me a long time to actually figure out what is going on there, but I am impressed with the ability to express so much with so little code.

The last line uses a list comprehension to write all the values to the ets table.

On the other hand, it is fairly complex, for something that procedural code would handle with extreme ease and be more readable.

Let us take a look at the rest of the API, okay?

image

This is the part that actually handles the calls that can be made to it (the public API that we examined earlier). Init is pretty easy to figure out. We create a new in-memory table and load all the ini files (using a list comprehension to do so).

The last line is interesting. Because servers need to maintain state, they do it by always returning their state to their caller. It is the responsibility of the Erlang platform (actually, I think it is OTP specifically) to maintain this state between calls. The closest thing that I can think of is that this is similar to the user's session, which can live beyond the span of a single request.

However, I am not sure how Erlang deals with concurrent requests to the same process, since then two calls might result in different state. Maybe they solve this by serializing access to processes, and creating more processes to handle concurrency (that sounds likely, but I am not sure).
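The closest C# picture I can form of that state threading (my own sketch, not how OTP is implemented) is a message loop that hands the current state to each handler and keeps whatever state comes back, which would also explain why access to a single process is naturally serialized:

using System;
using System.Collections.Concurrent;

class TinyGenServer<TState, TMessage>
{
    readonly BlockingCollection<TMessage> mailbox = new();

    public void Send(TMessage message) => mailbox.Add(message);

    // One message at a time: the handler receives the current state and returns
    // the next one, so there is never concurrent access to the state.
    public void Run(TState initialState, Func<TMessage, TState, TState> handleCall)
    {
        var state = initialState;
        foreach (var message in mailbox.GetConsumingEnumerable())
            state = handleCall(message, state);
    }
}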

The second function, handle_call of set will call insert_and_commit, which we will examine shortly, and just return the current state. The function handle_call of delete is pretty obvious, I think, as is handle_call of all.

handle_call of register is interesting. We use this to register a callback function. The first thing we do is set up a monitor on the process that asked us for the callback. When that process goes down, we will be called (I'll not show that, it is pretty trivial code).

Then we prepend the new callback and the process id to the notify_funs variable (the callbacks functions) and return it as the new state.

But what are we doing with those?

image

Well, we call them in the insert_and_commit method. The first thing we do is insert into the table, then we use a list comprehension over all the callbacks (notify_funs) and call them. Note that we use a catch there to ignore errors from other processes. We write the newly added config item to file as well, on the last line.

And this is it for couch_config. couch_config_writer isn't really worth talking about.  Moving on to the first item of real meat. The couch_db.erl file itself! Unsurprisingly, couch_db, too, is an OTP managed process.  And the first two functions in the file tell us quite a bit.

image

catch, like all things in Erlang, is an expression. So start_link will return the error, instead of throwing it. I am not quite sure why this is used, it seems strange, like seeing a method whose return type is System.Exception.

start_link0, however, is interesting. couch_file:open returns an Fd, I would assume that this is a File Descriptor, but the call to unlink below means that it is actually a process identifier, which means that couch_file:open creates a new process. Probably we have a process per file. I find this interesting. We have seen couch_file used in the btree, but I haven't noticed it being its own process before.

Note the error recovery for file not found. We then actually start the server, after we ensure that the file exists and is valid, delete the old file if it exists, and return.

We then have a few functions that just forward to the rest of couch_db:

image

Looks like those are used as a facade. delete_doc is funny:

image

I am surprised that I can actually read this. What this does is create a new document for all the relevant versions that were passed. I initially read it as modifying the doc and using pattern matching to do so, but it appears that this is actually creating a new one. I hope that I'll be able to understand update_docs.

In the meantime, this is fairly readable, I think:

image

This translates to the following C# code:

public dynamic open_doc(dynamic db, dynamic id, dynamic options)
{
	var doc = open_doc_internal(db, id, options);
	if (doc.deleted)
	{
		if (options.Contains("deleted"))
			return doc;
		return null;
	}
	return doc;
}

About the same line count. And I am not sure what is more readable.

I am skipping a few functions that are not interesting, but I was proud that I was able to figure this one out:

image

We start with a sorted list of documents, which we pass to group_alike_docs/2:

  • if the first arg is an empty list, return the reversed list from the second arg.
  • If the first argument is a list with values, and the second one is an empty list, create a new bucket for that document, and recurse.

The case where both lists are not empty requires its own explanation. Note this piece of code:

[#doc{id=BucketId}|_] = Bucket,

This is the equivalent of the following C# code:

var BucketId = Bucket.First().Id;

Now, given that, we can test whether the bucket id equals the current document id. If it does, we add the document to the existing bucket; if it does not, we create a new one. The syntax is pretty confusing, I admit, but it makes sense when you realize that we are dealing with a sorted list.
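Here is how I picture that grouping in C# (my own sketch, with a hypothetical Doc record; it assumes, as noted above, that the documents arrive sorted by id):

using System.Collections.Generic;

record Doc(string Id, string Rev);

static class DocGrouping
{
    // Group a sorted list of document revisions into buckets by id.
    // Because the input is sorted, a new id always starts a new bucket.
    public static List<List<Doc>> GroupAlikeDocs(IEnumerable<Doc> sortedDocs)
    {
        var buckets = new List<List<Doc>>();
        foreach (var doc in sortedDocs)
        {
            if (buckets.Count > 0 && buckets[^1][0].Id == doc.Id)
                buckets[^1].Add(doc);               // same id as the current bucket
            else
                buckets.Add(new List<Doc> { doc }); // new id, new bucket
        }
        return buckets;
    }
}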

By this time I am pretty sure that you are tired of me spelling out all the CouchDB code. So I'll only point out things that are of real interest.

couch_server is really elegant, and I was able to figure out all the code very easily. One thing that is apparent to me is that I don't understand the choice of using sync vs. async calls in Erlang. couch_server is all synchronous calls, for example.

That is all for now, I reviewed couch_file as well, and I think that I need to go back to couch_btree to understand it more.

couch_ft_query is very interesting. It is basically a way to shell out to an external process to do additional work. This fits very well with the way that Erlang itself works. What seems to be going on is that full text search is being shelled out elsewhere, and the result of calling that JavaScript file is a set of doc ids and scores, which presumably can be used later to get an ordered list from the database itself.

This seems to indicate that there is a separate process that handles the actual indexing of the data (Lucene / Solr seems to be a natural choice here). It isn't currently used, which maps to what I know about the current state of CouchDB. It is interesting to note that Lucene is also a document based store.

Moving on, couch_httpd is where the dispatching of all commands occurs. I took only a single part, which represents how most of the code in this module looks:

image

I really like the way pattern matching is used here. Although I must admit that it is strange to see function declarations being used as logic.

couch_view is the one that is responsible for views, obviously. I can't follow everything it is doing now, but it is shelling out some functionality to couch_query_servers, which shells the work out, in turn, to... a JavaScript file. This one, to be precise. CouchDB is using an external process in order to handle all of its views processing. This gives it immense flexibility, since a user can define a view that does everything they want. Communication is done using standard IO, which surprised me, but it seems like the only way you could do cross platform process communication. Especially if you wanted to communicate with JavaScript.

All in all, that has been a very interesting codebase, that is all for now...

For the single reader who actually waded through all of that, thanks.

time to read 2 min | 321 words

Recently there has been a lot of discussion about how we can make development easier. It usually starts with someone stating "X is too hard, we must make X approachable to the masses".

My response to that was:

You get what you pay for, deal with it.

It was considered rude, which wasn't my intention, but that is beside the point.

One of the things that I just can't understand is the assumption that development is unskilled labor. That you can just get a bunch of apes from the zoo and crack the whip until the project is done.

Sorry, it just doesn't work like this.

Development requires a lot of skill, it requires quite a lot of knowledge, and at least some measure of affinity. It takes time and effort to be a good developer. You won't be a good developer if you seek out "X in 24 Hours" or "Y in 21 days". Those things only barely scratch the surface, and that is not going to help at all with real problems.

And yes, a lot of the people who call themselves developers should put down their keyboards and go home.

I don't think that we need to apologize if we are working at a high level that makes it hard for beginners. Coming back to the idea that this is not something that you can just pick up.

And yes, experience matters. And no, one year repeated fifteen times does not count.

Teach Yourself Programming in Ten Years is a good read, which I recommend. But the underlying premise is that there is quite a lot of theory and practical experience underlying what we are doing on a day to day basis. If you don't get that, you are out of the game.

I refuse to be ashamed of requiring people to understand advanced concepts. That is what their job is all about.

time to read 1 min | 126 words

That is pretty amazing, since that was a big pain point for developing EC2-powered systems. They support both Oracle and MySQL, in addition to SimpleDB, which is a non-relational DB.

From the looks of things, however, there is a significant difference between the Oracle and MySQL offerings. MySQL is a DB limited to a single machine. They talk about the ability to dynamically scale the machine (which sounds just awesome) from small to extra large based on requirements, but not about multi-instance databases.

Oracle, however, does have this ability, and it is supported on EC2. So you can scale your database horizontally, rather than vertically. At least, that is how I read things.

I found this to be extremely interesting.

time to read 4 min | 636 words

I am currently reading another Erlang book, and I am once again impressed by the elegance of the language. Just to be clear, I don't have any intention to actually implement what I am talking about here; this is merely a way to organize my thoughts. And to avoid nitpickers: yes, I know of Retlang.

Processes

Erlang's processes are granular and lightweight. There is no real equivalent in OS threads or processes, or even in .Net's AppDomains. A quick, back of the envelope design for this in .Net would lead to the following interface:

public class ErlangProcess
{
	public ErlangProcess(ICommand cmd);
	public static ProcessHandle Current { get; }
	public MailBox MailBox { get; }
	public IDictionary Dictionary { get; }

	public Action Execution { get; set; }
	public Func<Message> ExecutionFilter { get; set; }
}

A couple of interesting aspects of the design: we always start a process with a command, but we allow waiting using a delegate + filter. This is quite intentional; the reason is the spawn() API call, which looks like this:

public ProcessHandle Spawn<TCommand>();

We do not allow passing an instance, only the type. The reason for that is that passing instances between processes opens up a chance for threading bugs. For that matter, we don't give you back a reference to the process either, just a handle to it.

Messages

Message passing is an important concept for Erlang, and something that I consider to be more and more essential to the way I think about software. Sending a message is trivial, all you need to do is call:

public void Send(ProcessHandle process, object msg);

Receiving a message is a lot more complex, because you may have multiple receivers or conditional receivers. C#'s lack of pattern matching syntax is really annoying in this regard (Boo has that, though :-) ). But I was able to come up with the following API:

public void Receive<TMsg>(
	Action<TMsg> process);

public void Receive<TMsg>(
	Expression<Func<TMsg, bool>> condition, 
	Action<TMsg> process);

A simple example of using this API would be:

public class PMap : ICommand
{
	List<object> results = new List<object>();

	public void Execute()
	{
		this.Receive(delegate(MapMessage msg)
		{
			foreach(var item in msg.Items)
			{
				Spawn<ActionExec>().Send(new ActionExecMessage(Self(), msg.Action, item));
				this.Receive(delegate(ProcessedItemMessage itemMsg)
				{
					results.Add(itemMsg.Item);
					if(results.Count == msg.Items.Length)
						msg.Parent.Send(new ResultsMessage( results.ToArray() ));
				});
			}
		});
	}
}

This demonstrates a couple of important ideas. Chief among them is how we actually communicate between processes. Another interesting issue is the actual execution of this. Note that there is no threading involved; we are just registering to be notified at some later point. When we have no more receivers registered, the process dies.

Execution environment

The processes should be executed by something. In this case, I think we can define a very simple set of rules for the scheduler:

  • A process is in runnable state if it has a message to process.
  • A process is in stopped state if there are no messages matching the current receivers list.
  • A process with no receivers is done, and will be killed.
  • A process may only run on a single thread at any given point.
  • It is allowed to move processes between threads (no thread affinity).
  • Processes have no priorities, but there is a preference for LIFO scheduling.
  • A process unit of work is the processing of a single message (not sure about that, though).
  • Since the only thing that can wake a process is a message, the responsibility for moving a process from stopped to runnable is in the hands of the process MailBox implementation.

Okay, that is enough for now.

Comments?

time to read 2 min | 269 words

I am writing about documentation at the moment, and I found myself writing the following:

I don’t think that I can emphasize enough how important it is to have a good first impression in the DSL. It can literally make or break your project. We should make above reasonable efforts to ensure that the first impression of the user from our system would be positive.

This includes investing time in building a good looking UI and snappy graphics. They might not actually have a lot of value for the project from a technical perspective, not even from the point of view of day to day usage in some cases, but they are crucially important from a social engineering perspective.

A project that looks good is pleasant to use, easier to demo and in general easier to get funding for.

This also includes the documentation. If we can make something work in a short amount of time, we gain a level of trust from the users. "Hey, I can make it go bang!" is important to gain acceptance. The first stage should be a very easy one, even if you have to design to enable that specifically.

After reading that, I quickly added this as well:

Note, however, that you should be wary of creating a demoware project, one that is strictly focused on demoing well, without actually adding value in real world conditions. Such projects may demo well, and get funding and support, but they tend to fall into the land of tortureware very rapidly, making things harder to do, instead of easier.

Beware of the demoware.

time to read 2 min | 283 words

For some reason, I am having a lot of trouble lately getting Rhino Queues out of my head. I don't actually have a project to use it on, but it keeps popping up in my mind. Before committing to the fifth or sixth rewrite of the project, I decided to take a stab at testing its performance.

The scenario is sending a hundred thousand (100,000) messages, with sizes ranging from 4Kb to 16Kb, which are the most common sizes for most messages.

On my local machine, I can push that amount of data in 02:02.02, so just above two minutes. This is for pushing 1,024,540,307 bytes, or just under a gigabyte. This translates to over 800 messages a second and close to 8.5 MB per second.

Considering that I am pushing everything over HTTP, that is just fine by me. But the real test would be to see how it works over the Internet.

I have a limited upload capacity here, so trying to push a gigabyte up would take too long. Instead, I confined myself to 5,000 messages, of the same varying sizes as before.

This time, over the Internet, it took 27:25.95. Just short of half an hour to push 51,135,538 bytes (~48 MB).

But that is testing throughput. While that is interesting on its own, it doesn't really tell us much about another important quality of the system, latency. Over the Internet, it took an average of 25 seconds to send a message batch. This is pretty fast, but I don't think it is great.

Anyway, getting back into the code didn't really help much, and I think that I will need to rewrite it again.

That Agile Thing

time to read 2 min | 367 words

Agile introduction is an interesting problem, one that I have learned to avoid. I am not comfortable standing up to a business owner and saying (paraphrased, of course), "if you will do it my way, your life will be better". At least, not about development methodologies; I have no problem saying that about techniques, design, or tools. The reason that I don't feel comfortable saying that is that there are too many issues surrounding agile introduction to talk confidently about the benefits.

I usually never mention agile to the customer at all. What I am doing, however, is insisting on a regularly scheduled demo. Usually every week or two. Oh, and I ask the customer what they want to see in the next demo. Given that I have a very short amount of time between demos, I can usually get a very concise set of features for the next demo.

Having demos every week or two really strengthens the confidence in the project. So from the point of view of the customer, we have regular demos and they get to choose what they will see in the next demo; that is all.

From the point of view of the team, having to have a demo every week or two makes a lot of difference in the way we have to work. As a simple example, I can't demo a UML diagram, so we can't have a six month design phase. Having to accept new requirements for each demo means that we need to enable ourselves to make changes very rapidly, so we go to practices such as TDD, Continuous Integration, etc. It also means that the design of the application tends to be much more lightweight than it would be otherwise.

In other words, everything flows from the single initial requirement, having a demo every week or two.

Once you have that, you have free rein (more or less) to implement agile practices, as they are needed, in order to get the demo up and running. You get customer involvement from the feature selection for the next demo, and you get customer buy-in from having the demos at frequent intervals, always showing additional value.

time to read 1 min | 76 words

Recently I was asked, in two different situations, what kind of a developer I am. I refused to answer. I am not a C# developer, or a database developer, or an agile developer (I don't even know what that means).

If pressed, I would admit that I am mostly familiar with the .Net platform, but I am not going to limit myself to that. I don't even believe that trying to put such labels on people is useful.
