Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


Happy New Year, and across the board profiler discount


To celebrate the new year, I decided to offer a single-day coupon that will get you a 35% discount on all my profiler products.

The coupon code is valid until the 1st of January 2011, but I won’t mention in which timezone, so you might want to hurry up.

The coupon code is:

HNY-45K2D465DD

You can use it to buy:

Happy new year!

Public Service Announcement: Git master repositories for the Rhino Tools projects


There have been some changes, and it seems that it is hard to track them. Here is where you can find the master repositories for the Rhino Tools projects:

We are hiring!


In 2005, I got out of the army, and looked for a job. I felt that I had all the necessary qualifications (at the time, I had already written Rhino Mocks), but I ran into an interesting problem. I didn’t have any commercial experience. I am sure that you are familiar with the tale. All the jobs required at least 2 – 3 years of experience, and the list of qualifications looked like someone had just raided the TLA storage facilities.

That was pretty annoying at the time, and I wished I could do something about it. Now I can.

We aren’t hiring additional full time employees (at least not right now), but we are hiring interns. The main idea is to enable people to gain commercial experience while working on cool software.

Some information that you probably need to know:

  • You have to know how to program in C#. If you have OSS credentials, that is all the better, but it is not required.
  • This is a paid position, you are not expected to work for free.
  • It is not a remote position, our offices are located in the Southern Industrial Area in Hadera, Israel.
  • I’m not going to ask you to do all the annoying work. There is some annoying stuff to do (mostly docs), but look below to see what exactly I have in mind.

This position entails:

Working on our shipping products (the Uber Prof line of products and the Raven DB / Raven MQ line).

Among other things, I have earmarked the following features for you:

  • Raven DB’s query optimizer.
  • Raven DB’s auto sharding & scaling.
  • Raven DB’s clustering support.
  • Uber Prof’s production profiling feature.

The other things that are involved include all the usual stuff, bug fixes, doc writing and standard (in other words, not that exciting) features.

Duration:

This is a three- to six-month position.

What is required:

Commercial experience is not required, but I would want to see code that you wrote.

Answer: This code should never hit production


Originally posted at 12/15/2010

Yesterday I asked what is wrong with the following code:

public ISet<string> GetTerms(string index, string field)
{
    if(field == null) throw new ArgumentNullException("field");
    if(index == null) throw new ArgumentNullException("index");
    
    var result = new HashSet<string>();
    var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index);
    IndexSearcher searcher;
    using(currentIndexSearcher.Use(out searcher))
    {
        var termEnum = searcher.GetIndexReader().Terms(new Term(field));
        while (field.Equals(termEnum.Term().Field()))
        {
            result.Add(termEnum.Term().Text());

            if (termEnum.Next() == false)
                break;
        }
    }

    return result;
}

The answer to that is quite simple: this code doesn’t have any paging available. This means that if we execute this piece of code on a field with a very high number of unique items (such as, for example, email addresses), we would return all the results in one shot. That is, if we can actually fit all of them into memory. Anything that can run over a potentially unbounded result set should have paging as part of its basic API.

This is not optional.

Here is the correct piece of code:

public ISet<string> GetTerms(string index, string field, string fromValue, int pageSize)
{
    if(field == null) throw new ArgumentNullException("field");
    if(index == null) throw new ArgumentNullException("index");
    
    var result = new HashSet<string>();
    var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index);
    IndexSearcher searcher;
    using(currentIndexSearcher.Use(out searcher))
    {
        var termEnum = searcher.GetIndexReader().Terms(new Term(field, fromValue ?? string.Empty));
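        if (string.IsNullOrEmpty(fromValue) == false)// need to skip this value
        {
            // Term() can return null once there are no more terms, so guard against it
            while(termEnum.Term() != null && fromValue.Equals(termEnum.Term().Text()))
            {
                if (termEnum.Next() == false)
                    return result;
            }
        }
        while (termEnum.Term() != null && field.Equals(termEnum.Term().Field()))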
        {
            result.Add(termEnum.Term().Text());

            if (result.Count >= pageSize)
                break;

            if (termEnum.Next() == false)
                break;
        }
    }

    return result;
}

And that is quite efficient, even for searching large data sets.

For bonus points, the calling code ensures that pageSize cannot be too big :-)
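The calling side is not shown in this post, so here is a minimal sketch of what it might look like, under stated assumptions: a SortedSet stands in for the Lucene term enumerator, and TermPaging, MaxPageSize and ReadAll are hypothetical names invented for illustration. It clamps pageSize and walks the pages by passing the last term of each page as the next fromValue.

```csharp
using System;
using System.Collections.Generic;

public static class TermPaging
{
    // Hypothetical upper bound; the post does not say what limit the real caller uses.
    public const int MaxPageSize = 1024;

    // In-memory stand-in for the term enumerator: returns up to pageSize terms
    // that sort strictly after fromValue (null means "start at the beginning").
    // Unlike a real index seek, this scans from the start on every call.
    public static List<string> GetTerms(SortedSet<string> terms, string fromValue, int pageSize)
    {
        // Clamp the page size so a caller cannot request an unbounded page.
        pageSize = Math.Max(1, Math.Min(pageSize, MaxPageSize));

        var result = new List<string>();
        foreach (var term in terms)
        {
            // Skip everything up to and including the last term of the previous page.
            if (fromValue != null && terms.Comparer.Compare(term, fromValue) <= 0)
                continue;

            result.Add(term);
            if (result.Count >= pageSize)
                break;
        }
        return result;
    }

    // Walks every page, feeding the last term of one page into the next call.
    public static List<string> ReadAll(SortedSet<string> terms, int pageSize)
    {
        var all = new List<string>();
        string last = null;
        while (true)
        {
            var page = GetTerms(terms, last, pageSize);
            if (page.Count == 0)
                break;
            all.AddRange(page);
            last = page[page.Count - 1];
        }
        return all;
    }
}
```

Each call holds at most one page in memory, which is the whole point of the fix above.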

Challenge: This code should never hit production


This code should never get the chance to go to production. It is horribly broken in a rather subtle way. Do you see it?

public ISet<string> GetTerms(string index, string field)
{
    if(field == null) throw new ArgumentNullException("field");
    if(index == null) throw new ArgumentNullException("index");
    
    var result = new HashSet<string>();
    var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index);
    IndexSearcher searcher;
    using(currentIndexSearcher.Use(out searcher))
    {
        var termEnum = searcher.GetIndexReader().Terms(new Term(field));
        while (field.Equals(termEnum.Term().Field()))
        {
            result.Add(termEnum.Term().Text());

            if (termEnum.Next() == false)
                break;
        }
    }

    return result;
}

As usual, I’ll post the answer tomorrow.

Detailed Answer: Your own ThreadLocal


Originally posted at 12/15/2010

In the last post, we showed this code:

public class CloseableThreadLocal
{
    [ThreadStatic]
    public static Dictionary<object, object> slots;

    private readonly object holder = new object();
    private Dictionary<object, object> capturedSlots;

    private Dictionary<object, object> Slots
    {
        get
        {
            if (slots == null)
                slots = new Dictionary<object, object>();
            capturedSlots = slots;
            return slots;
        }
    }


    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        lock (Slots)
        {
            if (Slots.TryGetValue(holder, out val))
            {
                return val;
            }
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        lock (Slots)
        {
            Slots[holder] = val;
        }
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (capturedSlots == null)
            return;
        lock (capturedSlots)
            capturedSlots.Remove(holder); // the dictionary key is holder, not this
    }

    ~CloseableThreadLocal()
    {
        if (capturedSlots == null)
            return;
        lock (capturedSlots)
            capturedSlots.Remove(holder);
    }
}

And then I asked whether there are additional things that you may want to do here.

The obvious one is to think about locking. For one thing, we have now introduced locking for everything. Would that be expensive?

The answer is: probably not. We can expect very little contention here; most of the operations are always going to be on the same thread, after all.

I would probably just change this to be a ConcurrentDictionary, though, and remove all explicit locking. And with that, we need to think about whether it would make sense to make this a static variable, rather than a thread static variable.
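As a rough sketch of that ConcurrentDictionary idea (not code from the actual project), the class could look something like the following. The name ConcurrentCloseableThreadLocal and the private Remove helper are made up for illustration. The field stays thread static here, so the question of switching to a plain static is left open, and the sketch keeps the original's behavior of capturing only the dictionary of the thread that touched the instance last.

```csharp
using System;
using System.Collections.Concurrent;

public class ConcurrentCloseableThreadLocal
{
    // Still one dictionary per thread; only the explicit locks are gone.
    [ThreadStatic]
    private static ConcurrentDictionary<object, object> slots;

    private readonly object holder = new object();
    private ConcurrentDictionary<object, object> capturedSlots;

    private ConcurrentDictionary<object, object> Slots
    {
        get
        {
            if (slots == null)
                slots = new ConcurrentDictionary<object, object>();
            capturedSlots = slots; // remembered so Close and the finalizer can reach it
            return slots;
        }
    }

    public virtual object InitialValue()
    {
        return null;
    }

    public virtual object Get()
    {
        object val;
        if (Slots.TryGetValue(holder, out val))
            return val;
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        Slots[holder] = val;
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        Remove();
    }

    ~ConcurrentCloseableThreadLocal()
    {
        Remove();
    }

    // Safe to call from the finalizer thread: TryRemove needs no external lock.
    private void Remove()
    {
        var captured = capturedSlots;
        object ignored;
        if (captured != null)
            captured.TryRemove(holder, out ignored);
    }
}
```

The trade-off is a somewhat heavier dictionary per thread in exchange for no lock statements in Get, Set, Close and the finalizer.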

Answer: Your own ThreadLocal


Originally posted at 12/15/2010

Well, the problem with our last answer was that we didn’t protect ourselves from multithreaded access to the slots variable. Here is the code with this fixed:

public class CloseableThreadLocal
{
    [ThreadStatic]
    public static Dictionary<object, object> slots;

    private readonly object holder = new object();
    private Dictionary<object, object> capturedSlots;

    private Dictionary<object, object> Slots
    {
        get
        {
            if (slots == null)
                slots = new Dictionary<object, object>();
            capturedSlots = slots;
            return slots;
        }
    }


    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        lock (Slots)
        {
            if (Slots.TryGetValue(holder, out val))
            {
                return val;
            }
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        lock (Slots)
        {
            Slots[holder] = val;
        }
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (capturedSlots == null)
            return;
        lock (capturedSlots)
            capturedSlots.Remove(holder); // the dictionary key is holder, not this
    }

    ~CloseableThreadLocal()
    {
        if (capturedSlots == null)
            return;
        lock (capturedSlots)
            capturedSlots.Remove(holder);
    }
}

Is this it? Are there still issues that we need to handle?

Wrong Answer #3: Your own Thread Local


Originally posted at 12/15/2010

Last time, we forgot that the slots dictionary is a thread static variable, and that the finalizer is going to run on another thread… Let us fix this, too:

public class CloseableThreadLocal
{
    [ThreadStatic]
    public static Dictionary<object, object> slots;

    private readonly object holder = new object();
    private Dictionary<object, object> capturedSlots;

    private Dictionary<object, object> Slots
    {
        get
        {
            if (slots == null)
                slots = new Dictionary<object, object>();
            capturedSlots = slots;
            return slots;
        }
    }


    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        if (Slots.TryGetValue(holder, out val))
        {
            return val;
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        Slots[holder] = val;
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (capturedSlots != null)
            capturedSlots.Remove(holder); // the dictionary key is holder, not this
    }

    ~CloseableThreadLocal()
    {
        if (capturedSlots == null)
            return;
        capturedSlots.Remove(holder);
    }
}

And now it works!

Except… Under some very rare scenarios, it will not do so.

What are those scenarios? Why do we care? And how do we fix this?

Wrong Answer #2: Your own ThreadLocal


Originally posted at 12/15/2010

Well, last time we tried to introduce a finalizer, but we forgot that we were using our own instance as the key to the slots dictionary, which prevented us from being collected, hence the finalizer never ran.

All problems can be solved by adding an additional level of indirection… instead of using our own instance, let us use a separate instance:

public class CloseableThreadLocal
{
    [ThreadStatic] public static Dictionary<object, object> slots;

    object holder = new object();
    public static Dictionary<object, object> Slots
    {
        get { return slots ?? (slots = new Dictionary<object, object>()); }
    }

    public /*protected internal*/ virtual Object InitialValue()
    {
        return null;
    }

    public virtual Object Get()
    {
        object val;

        if (Slots.TryGetValue(holder, out val))
        {
            return val;
        }
        val = InitialValue();
        Set(val);
        return val;
    }

    public virtual void Set(object val)
    {
        Slots[holder] = val;
    }

    public virtual void Close()
    {
        GC.SuppressFinalize(this);
        if (slots != null)// intentionally using the field here, to avoid creating the instance
            slots.Remove(holder);
    }

    ~CloseableThreadLocal()
    {
        Close();
    }
}

Now it should work, right?

Except that it doesn’t… We still see that the data isn’t cleaned up properly, even though the finalizer is run.

Why?
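One way to poke at this question is to look at how a thread static field behaves when it is read from a thread other than the one that filled it, which is the situation a finalizer is in. The sketch below is a standalone illustration, and ThreadStaticDemo and OtherThreadSeesNull are names invented for it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThreadStaticDemo
{
    [ThreadStatic]
    private static Dictionary<object, object> slots;

    // Fills the thread static field on the calling thread, then reports
    // whether a second thread finds the same field still null.
    public static bool OtherThreadSeesNull()
    {
        slots = new Dictionary<object, object>();
        slots[new object()] = "value";

        bool sawNull = false;
        var other = new Thread(() => sawNull = slots == null);
        other.Start();
        other.Join();
        return sawNull;
    }
}
```

Every thread gets its own copy of a thread static field, so code that reads slots from a different thread is not looking at the dictionary that actually holds the data.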
