Ayende @ Rahien

Refunds available at head office

Immutable collections performance, take II

Why is the performance of an immutable list over 16 times slower than a standard list? I took a peek at what it was actually doing, and it made a lot of sense. In order to maintain efficient indexing access, the actual storage of the data in the immutable list is a binary tree. With the key being used as the indexer.

This result is a much higher cost for pretty much everything. Let us look at the following:

   1: var listsp = Stopwatch.StartNew();
   2: var list = new List<int>(Enumerable.Range(0, 10*1000*1000));
   3:  
   4: for (int i = 0; i < list.Count; i++)
   5: {
   6:     var _ = list[i];
   7: }
   8:  
   9: Console.WriteLine(listsp.Elapsed);
  10:  
  11: var ilist = ImmutableList<int>.Empty.AddRange(list);
  12: listsp.Restart();
  13:  
  14: for (int i = 0; i < ilist.Count; i++)
  15: {
  16:     var _ = ilist[i];
  17: }
  18: Console.WriteLine(listsp.Elapsed);

This List<T> is 0.23 seconds, ImmutableList<T> takes 1.28 seconds. When I use foreach, instead, we get 0.22 seconds vs. 2.29 seconds.

As you can see from the blog post describing them, because immutable collections are mostly implemented as binary trees, I don’t really think that there is a good way to approach this as is. The problem is that the immutable collections actually need to do a lot more than what I need them to do.

Now, it might have been more acceptable to use them if the perf was limited to just writing to them, it might have been acceptable. But as you can see, we have the same problem when we are reading, and that is quite unacceptable.

Comments

Steffen Forkmann
12/06/2013 10:18 AM by
Steffen Forkmann

Structures like Vector (see http://www.navision-blog.de/2012/05/29/porting-clojures-persistent-data-structures-to-net-part-1-of-n-persistentvector/ which comes originally from Clojure) are using Hash array mapped tries (see http://en.wikipedia.org/wiki/Hasharraymapped_trie) which have much better constant factors that binary or red/black trees.

There are also implementations that use transients to speed up bulk writes (See http://www.navision-blog.de/blog/2012/05/29/porting-clojures-persistent-data-structures-to-net-part-2-of-n-transientvector/).

Maybe you should look in that direction.

Howard Chu
12/06/2013 03:35 PM by
Howard Chu

Gosh, if only there was a nice fast immutable B+tree you could use instead of a binary tree. Oh wait, there is...

Ayende Rahien
12/07/2013 06:49 PM by
Ayende Rahien

Howard, B+Tree isn't really what we want to do here. To start with, we cannot afford the minimum 4KB cost for every modification. Worse, we don't want O(logN) search perf, I really want to get the O(1) performance that I can get from a hash table.

Steffen Forkmann
12/08/2013 08:56 AM by
Steffen Forkmann

There is no way to get a theoretical O(1), but Vector does at most 6 hops for all practical problems. So it's quasi O(1).

Ayende Rahien
12/08/2013 09:37 AM by
Ayende Rahien

Steffen, The problem isn't with list/vector. My major problem is with dictionary.

Steffen Forkmann
12/08/2013 10:04 AM by
Steffen Forkmann

You can implement a map on top of vector.

https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/APersistentMap.java

Comments have been closed on this topic.