Interview questionfix the index

time to read 4 min | 630 words

This is something that goes into the “what to ask a candidate”.

Given the following class:

public class Indexer
{
    private Dictionary<string, List<string>> terms = 
        new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);

    public void Index(string docId, string text)
    {
        var words = text.Split();
        foreach (var term in words)
        {
            List<string> val;
            if (terms.TryGetValue(term, out val) == false)
            {
                val = new List<string>();
                terms[term] = val;
            }
            val.Add(docId);
        }
    }

    public List<string> Query(string term)
    {
        List<string> val;
        terms.TryGetValue(term, out val);
        return val ?? new List<string>();
    }
}

This class have the following tests:

public class IndexTests
{
    [Fact]
    public void CanIndexAndQuery()
    {
        var index = new Indexer();
        index.Index("users/1", "Oren Eini");
        index.Index("users/2", "Hibernating Rhinos");

        Assert.Contains("users/1", index.Query("eini"));
        Assert.Contains("users/2", index.Query("rhinos"));
    }

    [Fact]
    public void CanUpdate()
    {
        var index = new Indexer();
        index.Index("users/1", "Oren Eini");
        //updating
        index.Index("users/1", "Ayende Rahien");

        Assert.Contains("users/1", index.Query("Rahien"));
        Assert.Empty(index.Query("eini"));
    }
}

The first test passes, but the second fails.

The task is to get the CanUpdate test to pass, while keeping memory utilization and CPU costs as small as possible. You can change the internal implementation of the Indexer as you see fit.

After CanUpdate is passing, implement a Delete(string docId) method.

More posts in "Interview question" series:

  1. (29 Sep 2016) Stackoverflow THAT
  2. (30 Mar 2015) fix the index
  3. (27 Aug 2014) That annoying 3rd party service