Ayende @ Rahien

Refunds available at head office

SafeList, SafeDictionary–fast immutable structures

After the issues we run into with immutable structures, I decided that I still wanted to maintain the same guarantees that immutable data structures gave us, but that still maintain the speed of mutable structures.

I managed to do that by making different design choices when building my implementation. While the BCL immutable collections are based on binary trees, and attempt to spare GC pressure in favor of CPU cycles, I find that for a lot of the things that we do, we can spare the memory in favor of actually being faster overall. In particular, this is true since the immutable collections are order of magnitude slower for reads than their mutable counterparts. I could have dealt with the slow writes, somehow. But we do a lot of reads, and that is utterly unacceptable.

Here is how I implemented SafeList:

   1: public class SafeList<T> : IEnumerable<T>
   2: {
   3:     List<T> _inner = new List<T>();
   4:  
   5:     public static readonly SafeList<T> Empty = new SafeList<T>();
   6:  
   7:     private SafeList()
   8:     {
   9:  
  10:     }
  11:  
  12:     public SafeList<T> AddRange(IEnumerable<T> items)
  13:     {
  14:         var inner = new List<T>(_inner);
  15:         inner.AddRange(items);
  16:         return new SafeList<T>
  17:         {
  18:             _inner = inner
  19:         };
  20:     }
  21:  
  22:     public SafeList<T> Add(T item)
  23:     {
  24:         return new SafeList<T>
  25:         {
  26:             _inner = new List<T>(_inner) {item}
  27:         };
  28:     }
  29:  
  30:     public int Count
  31:     {
  32:         get { return _inner.Count; }
  33:     }
  34:  
  35:     public T this[int i]
  36:     {
  37:         get { return _inner[i]; }
  38:     }
  39:  
  40:  
  41:     public SafeList<T> RemoveAll(Func<T, bool> filter)
  42:     {
  43:         return new SafeList<T>
  44:         {
  45:             _inner = new List<T>(_inner.Where(x => filter(x) == false))
  46:         };
  47:     }
  48:  
  49:     public T Find(Predicate<T> predicate)
  50:     {
  51:         return _inner.Find(predicate);
  52:     }
  53:  
  54:     public SafeList<T> RemoveAllAndGetDiscards(Predicate<T> filter, out List<T> discards)
  55:     {
  56:         var list = new List<T>();
  57:  
  58:         discards = list;
  59:  
  60:         return new SafeList<T>
  61:         {
  62:             _inner = new List<T>(_inner.Where(x =>
  63:             {
  64:                 if (filter(x) == false)
  65:                 {
  66:                     list.Add(x);
  67:                     return false;
  68:                 }
  69:                 return true;
  70:             }))
  71:         };
  72:  
  73:     }
  74: }

The implementation for SafeDictionary is pretty similar as well. Note that I have dedicated & optimized methods for the type of things that I need in my code.

But the key part here is that I gain the safety by always creating another copy of the underlying implementation. We are never actually mutating anything.

Sure, this results in us having to deal with a lot more allocations, but we get the same speed for reads as you do for the standard mutable collections. And more importantly, we are still faster for writes.

Comments

Rafal
12/09/2013 10:11 AM by
Rafal

But now a simple 'Add' executes in linear time so I can't see how this can be faster than a mutable implementation.

Ayende Rahien
12/09/2013 10:15 AM by
Ayende Rahien

Rafal, Add happens a lot less often than reads. And it can't be faster than the mutable impl. We are trying to get something that is both safe, immutable and fast.

Rafal
12/09/2013 10:19 AM by
Rafal

Yeah, I was referring to your last sentence, that the immutable implementation is faster for writes. So probably this means 'faster than BCL collections', not 'faster than mutable collections'.

Steffen Forkmann
12/09/2013 10:57 AM by
Steffen Forkmann

Let's say it's fast for a special use case: * very few writes * very few different versions of the list

Fabian Wetzel
12/09/2013 11:12 AM by
Fabian Wetzel

Hi,

I am unsure if Add() is really safe in a multithreaded environment:

inner = new List(inner) {item}

isn't this only a shorter writing for;

inner = new List(inner); _inner.Add(item);

..if yes, then there exists a time, were _inner is visible to another thread while not fully constructed.

Ayende Rahien
12/09/2013 11:20 AM by
Ayende Rahien

Fabian, No, it goes to a temp variable first, so it is never exposed to the outside world before it happens.

Andy
12/09/2013 12:38 PM by
Andy

@FabianWetzel. Es it is indeed so. So the SafeList is not immutable if you can change it's state and keep the instance reference. And it is not thread safe, if the Add or any other changing method don't return a new collection, because it can be changed inbetween. If the Add/Remove calls are so infrequent and thread safety so negligible, you can use the fast mutable collections. Otherwise I see no other possibility than to make a compromise - whether don't bother about thread safety or about mutability. One possibility I see to use Interlocked.CompareExchange, what makes the implementation even slower than ImmutableXYZ

public void Add(T value)
{
    List<T> snapshot, newCache;
    do
    {
        snapshot = _inner;
        newCache = new List<T>(_inner) { value };
    } while (!ReferenceEquals(Interlocked.CompareExchange(ref _inner, newCache, snapshot), snapshot));
}
Ayende Rahien
12/09/2013 12:40 PM by
Ayende Rahien

Andy, Huh? Add returns a new list, and doesn't modify the existing instance.

Paul Turner
12/09/2013 01:20 PM by
Paul Turner

I might be inclined to declare _inner as IReadOnlyList, to make the intent (this state doesn't mutate) absolutely clear, except it doesn't expose the Find method.

It still could be valuable to refactor that field into a readonly field though. A helping check from the compiler to go "yes, this really never changes".

Paul Turner
12/09/2013 01:29 PM by
Paul Turner

Having looked at the implementation of Find, I think I'd be comfortable re-implementing this to get that IReadOnlyList:


public T Find(Predicate match)
{
    if (match == null)
    {
        throw new ArgumentNullException("match");
    }

    for (int i = 0; i 
Paul Turner
12/09/2013 01:31 PM by
Paul Turner

Damn, that < ate my code block!

public T Find(Predicate<T> match)
{
    if (match == null)
    {
        throw new ArgumentNullException("match");
    }

    for (int i = 0; i < _inner.Count; i++)
    {
        if (match(this._inner[i]))
        {
            return this._inner[i];
        }
    }

    return default(T);
}
Paul Turner
12/09/2013 02:45 PM by
Paul Turner

@Fabian

inner = new List<T>(inner) {item};
is effectively:

_inner = new List<T>(_inner);
_inner.Add(item);

Source: C# Language Specification - 7.5.10.3 Collection Initializers

In this scenario, it's safe, because

_inner
is a member of the
SafeList<T>
which is being composed and has not yet returned from the method. No other threads could possibly be able to access the instance yet.

Harry
12/09/2013 03:44 PM by
Harry

Would you be "leaking" an empty list every time you set _inner explicitly after creating new SafeList?

List _inner = new List();

return new SafeList { _inner = inner // Overwriting empty list, so this is "leaking" };

That seems like unnecessary GC pressure.

njy
12/09/2013 08:01 PM by
njy

Just 2 minor improvements: do not auto-initialize the _inner like @Harry said, and provide an addiitonal private ctor to pass an existing list instead of re-setting later, and check the AddRange parameter for null/empty param. Since the code using it will be generic and may receive empty lists, that's a minor optimization that doesn't cost a thing and may provide some gains here and there.

Gist here: https://gist.github.com/anonymous/edeca36cfc2a353c7c08

Knaģis
12/09/2013 08:08 PM by
Knaģis

Since List is using an array internally, you should initialize the new instances with precalculated Capacity, otherwise for certain number of items you will end up copying the data around twice.

http://www.dotnetframework.org/default.aspx/4@0/4@0/DEVDIV_TFS/Dev10/Releases/RTMRel/ndp/clr/src/BCL/System/Collections/Generic/List@cs/1305376/List@cs

njy
12/09/2013 08:08 PM by
njy

@Oren: one more thing, you know what bothers me? That each and every change will generate a new instance. What if, in a certain amount of time, there are a lot of writes and a few reads (like in a bulk insert scenario) or if if reads a lot but not "near" (from a time-related point of view) the writes? An idea, probably silly: why not use the impl of the IEnumerable.GetEnumerator() to store-and-return an enumerator for a new list only if the data is changed since the last time? This way if nobody make a read, no new list would be created. Does this make any sense?

Piers
12/09/2013 10:35 PM by
Piers

Hi Oren,

I'm probably way off here, but have you looked at how F# deals with immutable lists? Do they just use the same .Net Immutable Collections under the covers or are the classes in Microsoft.FSharp.Collections their own hand rolled implementations? You may be able to re-use these from your C# code.

kpvleeuwen
12/10/2013 10:59 AM by
kpvleeuwen

I think it is a bit confusing that RemoveAllAndGetDiscards returns the discards and has the list with items removed as out parameter - and is named discards.

Kijana Woodard
12/10/2013 07:51 PM by
Kijana Woodard

@kopvleeuwen??? "if (filter(x) == false)"

kpvleeuwen
12/23/2013 02:50 PM by
kpvleeuwen

Well, yes, the double-negative makes it end up in the wrong list. Please run the following test if you do not believe me: https://gist.github.com/koen-lee/8098277

Comments have been closed on this topic.