Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 5,953 | Comments: 44,410

filter by tags archive

ChallengeThis code should never hit production


This code should never have the chance to go to production, it is horribly broken in a rather subtle way, do you see it?

public ISet<string> GetTerms(string index, string field)
{
    if(field == null) throw new ArgumentNullException("field");
    if(index == null) throw new ArgumentNullException("index");
    
    var result = new HashSet<string>();
    var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index);
    IndexSearcher searcher;
    using(currentIndexSearcher.Use(out searcher))
    {
        var termEnum = searcher.GetIndexReader().Terms(new Term(field));
        while (field.Equals(termEnum.Term().Field()))
        {
           result.Add(termEnum.Term().Text());

            if (termEnum.Next() == false)
                break;
        }
    }

    return result;
}

As usual, I’ll post the answer tomorrow.

More posts in "Challenge" series:

  1. (28 Apr 2015) What is the meaning of this change?
  2. (26 Sep 2013) Spot the bug
  3. (27 May 2013) The problem of locking down tasks…
  4. (17 Oct 2011) Minimum number of round trips
  5. (23 Aug 2011) Recent Comments with Future Posts
  6. (02 Aug 2011) Modifying execution approaches
  7. (29 Apr 2011) Stop the leaks
  8. (23 Dec 2010) This code should never hit production
  9. (17 Dec 2010) Your own ThreadLocal
  10. (03 Dec 2010) Querying relative information with RavenDB
  11. (29 Jun 2010) Find the bug
  12. (23 Jun 2010) Dynamically dynamic
  13. (28 Apr 2010) What killed the application?
  14. (19 Mar 2010) What does this code do?
  15. (04 Mar 2010) Robust enumeration over external code
  16. (16 Feb 2010) Premature optimization, and all of that…
  17. (12 Feb 2010) Efficient querying
  18. (10 Feb 2010) Find the resource leak
  19. (21 Oct 2009) Can you spot the bug?
  20. (18 Oct 2009) Why is this wrong?
  21. (17 Oct 2009) Write the check in comment
  22. (15 Sep 2009) NH Prof Exporting Reports
  23. (02 Sep 2009) The lazy loaded inheritance many to one association OR/M conundrum
  24. (01 Sep 2009) Why isn’t select broken?
  25. (06 Aug 2009) Find the bug fixes
  26. (26 May 2009) Find the bug
  27. (14 May 2009) multi threaded test failure
  28. (11 May 2009) The regex that doesn’t match
  29. (24 Mar 2009) probability based selection
  30. (13 Mar 2009) C# Rewriting
  31. (18 Feb 2009) write a self extracting program
  32. (04 Sep 2008) Don't stop with the first DSL abstraction
  33. (02 Aug 2008) What is the problem?
  34. (28 Jul 2008) What does this code do?
  35. (26 Jul 2008) Find the bug fix
  36. (05 Jul 2008) Find the deadlock
  37. (03 Jul 2008) Find the bug
  38. (02 Jul 2008) What is wrong with this code
  39. (05 Jun 2008) why did the tests fail?
  40. (27 May 2008) Striving for better syntax
  41. (13 Apr 2008) calling generics without the generic type
  42. (12 Apr 2008) The directory tree
  43. (24 Mar 2008) Find the version
  44. (21 Jan 2008) Strongly typing weakly typed code
  45. (28 Jun 2007) Windsor Null Object Dependency Facility

Comments

Ryan Heath

Are you talking about API?

It seems currentIndexSearcher is a IndexSearcher too?

One could use currentIndexSearcher without resorting to currentIndexSearcher.Use ...

Another thing that seems wrong is the using statement, which implies a limited scope. Is the out param searcher bound to that scope?

If so, then the API is asking, no, demanding for problems ;)

// Ryan

Patrick Huizinga

Easy, you throw the ArgumentNullExceptions in the wrong order. :-P

But what happens if __termEnum has zero terms?

@Ryan, in the shown code __searcher is only used inside the using block, so I don't think Ayende had that in mind. Though I do wonder why this particular api was chosen instead of returning a disposable searcher.

tobi
  1. the code assumes that the termEnum will contain at least one item (the nex() check must be above the first usage)

  2. the indexreader is not disposed (don't know if that is necessary).

Ayende Rahien

Ryan,

This is a way to handle a value that may change by another thread, thing about it as ref countring.

cbp
cbp

What is being disposed by the using statement - whatever gets returned by the Use() method I presume - which could be the currentIndexSearcher itself if it is a fluent interface. The IndexSearch won't be disposed, neither will the reader returned by GetIndexReader. The code is not very clear.

Marc

what happens if index is an empty string or refers to a non-existant indexer.

If currentSearchIndex is then null you get problems with the .Use

Ayende Rahien

cbp,

The problem isn't with the use statement.

evereq

1) Preconditions are just too simple! both checked for != null, but do not checked to be not empty for example!

2) a lof of additional checks (asserts) are missed inside method body, for example: 'searcher' can be null, but code attempt to use it! Same as for 'currentIndexSearcher' - also can be potentially null! I.e. code author assume that he know interface for IndexStorage.GetCurrentIndexSearcher and currentIndexSearcher.Use methods and fact that they should throw InvalidOperationException in case if can't finish they work and return valid values! That potentially can be changed later (say after some refactoring), introducing problems for that client code! I.e. insert more asserts!

3) Method have name GetTerms, so why he return set of strings, but not instead say Dictionary of Term objects (i.e. key will be Term text and value will be term objects)? I.e. code author limit significantly possible reuse of method with so nice name! (who know, maybe in version 3.x of Lucene terms will have more properties than just field and text?)

4) termEnum.Next() should be called before call to termEnum.Term()? At least it's what I remember, needs to check on Lucene docs..

5) Should you Close() terms enum?

:)

Yitzchok

if(string.IsNullOrWhiteSpace(field) throw new ArgumentNullException("field"); ...

Why do you need "field.Equals(..)"?

Dalibor Čarapić

I find it amusing when Ayende posts some code without any context and then asks to spot something wrong. People desperately try to find something wrong with it and find bazilion code 'smells' which probably have nothing to do with the problem in question.

Good luck.

Ayende Rahien

Evereq,

1 & 2) All of which would generate a nice exception is happened.

3) Because terms is what it returns?

4) Not accurate for the usage specified.

5) Yes, fixed, thanks, but not what I meant.

Ayende Rahien

Dalibor,

Yep, that is the case.

I consider the problem glaringly obvious, and no one sees it

Diego

Hi. I've been reading your blog from a long time, and I must say that your blog became really boring.

All you do is:

1) publish your commercial products (Uber Prof, the key-value DB, etc) and

2) paste code fragments as challenges.

I still know you're a really good programmer, it's just your blog isn't good anymore (it's my opinion of course).

Diego

Ayende Rahien

Diego,

I am sorry to hear that, but you are of course free to stop reading me

Ryan Heath

Ok, I'll give it another shot:

Should you return IEnumerable <string instead of ISet <string ?

IEnumerable does not allow you to change the sequence while via ISet.Add one can.

Of course one could cast to ISet ...

and ISet implies that there are no double entries ...

// Ryan

evereq

@Ayende and Dalibor! Any code like that contains a lot of issues (actually ANY code contain issues :D, ALL the code!), so it's just interesting to try to catch some of them! Most of issues can be founded even without deep knowledge of context or even C# etc! :) So Ayende +1 to post such questions! I found it interesting to at least see how other people think, what direction they go etc :)

Regarding question: maybe it's HashSet? I.e. it's cannot contain duplicates? ;-)

P.S. about 3) from my comment about: it's very questionable question what is terms! In Lucene term is OBJECT (or better to say class), i.e. it's not only text, but it's also some other data, at least 'field' in current Lucene version! So because it's just made sense to continue with "object thinking", I am sure it's better to return Terms collection (terms objects) than just strings from method with name "GetTerms" - i.e. behavior should be expressed in domain terms, not on primitive data types. And Lucene / storage domain consists from "Terms", not just strings :) So sure it's completely valid to return strings, but it have a "smell" for me :)

Tom
Tom

The resultlist is compared item by item to parameter called field.

When it does not match the function returns.

So it does not necessarily go through the whole resultset and therefore might return an incomplete result.

Naiem

I don't know what the API does but the whole while condition looks wierd to me.

while (field.Equals(termEnum.Term().Field()))

If the searcher returns terms that relate to the given field, then what is the purpose of this condition. If it doesn't and Term() has other fields, then the loop can break anytime, since I don't think searcher returns sorted results.

However, the rest of the code is too simple to be terribly broken!

Kevin Fairclough

Are duplicates based on case important/required?

Felix

termEnum.Term() probably evaluate to null if there is no term...

Jason Meckley

There is no limit to the size of the result. performance would suffer if the result returned 1000s or millions of items.

evereq

@Naiem etc: seems nothing wrong with

while (field.Equals(termEnum.Term().Field()))

because previous line (i.e. var termEnum = searcher.GetIndexReader().Terms(new Term(field));) extracts all terms STARTING at a given term (in our case starting with a empty term but in given field). But agree - code looks more like a hack, than as normal one :D

Brad White

Diego,

I fully agree with you. I keep the blog in my RSS reader on the off chance that something interesting is posted. It use to be that all the posts were very relevant to software developers, and of general interest.

Now it has become very commercial and when we are not being subtly marketed to we get "code challenges," which for the most part are out of context chunks of code, are just used as SEO fodder and filler between the marketing.

Sigh. Another good blog bites the dust.

David Pendray

Diego, Brad,

And yet this is one of the most well-commented .net blogs around... especially posts such as these... go figure

Oleksii

Hi Ayende,

I think you get a NullReference exception at this line of code (as the searcher has never been initialized):

var termEnum = searcher.GetIndexReader().Terms(new Term(field));

Oleksii

evereq

ha! The question was "This code should never have the chance to go to production, it is horribly broken in a rather subtle way, do you see it?"

So if Jason right, we can't put such code into production? What if we know that all our documents are small and contain very small amount of terms (say max few thousands - such code WILL work with that amount!)? What about rule that such optimizations should be done when try are really required? etc!

I.e. I am not so happy with that answer if it is correct one: it's not horribly broken code because of his performance - it's more likely that code broken because of

a) Hacks when it was possible to made task without hacks!

b) Code not safe enough - absence of right pre-conditions, post-conditions, invariants, asserts.. i.e. no design by contract, its not "defensive" style of programming etc

c) BUGS in code (like the one others and me point out)

d) Code is 10s lines of code, but actually do same as Lucene method IndexReader.Terms(term) :D I.e. all that huge code with bugs was possible to replace with one single line without ANY hacks :D (Ok, if you need additional where you feel free to use LinQ here)

So, really disappointed with "question / answer" pair! Perfect question, perfect answer (i.e. Jason do correct!), but just both did not feet each other IMHO :)

James Curran

Ayende...

I did suggest a means of spicing up this blog. (BTW, did you get the last email on the topic; I sent it about 10 days ago...)

Andrey Titov

I guess termEnum.Next() should be called before accessing termEnum.Term(). It likely it should works as IEnumerable's MoveNext() and Current, otherwise termEnum should guarantee there is always at least one element.

NC
NC

That code should never go into production because it's rubbish. It doesn't read at all. It makes absolutely no sense. This is the first piece of code you've posted that's made me say 'WTF' out loud.

Code fails.

Steve Py

Well my glaring obvious point would be that the comparison is case-sensitive. Not knowing the requirements & expected behaviour of the app since I didn't write it, this is the most obvious thing that comes to mind that might be unexpected.

Luke Schafer

Diego and Brad - for people making commercial ventures, his insights are great. For people who need a break at work, the code challenges are great. I read more than I ever did.

NC - no, I think the problem here is you.

Nadav

don't know if its the problem you're talking about, but it seems that the while statement expects the temsEnum to be sorted by this:

termEnum.Term().Field()

and i guess the condition in the while statement is expected to go over the terms with the field given as parameter and then stop when reaching terms with .Field() different than the given field, because if it is sorted, you can stop here, or, of course, when reached the end of the enumeration...

And if i'm correct, the problem might be when the terms with .Field() that equals to the given field are in the middle (at least not right at the beggining), it won't get into the while statement at all, and it should have iterated until it finds something that equals to the given term before reaching the while statement:

while (!field.Equals(termEnum.Term().Field()) && termEnum.Next() );

this will skip the terms with different .Field()

Brad

Actually Luke, I agree with NC. That code is pretty horrible.

Ayende Rahien

Evereq,

1) Sorry, but there are no hacks here.

2) The code would fail under certain condition. It would throw an exception. But it would not corrupt state, and the only action I could take would be to throw an exception myself.

3) There was one bug that I saw and fixed, but bugs are not barrier for prod.

Ayende Rahien

James,

Sorry, I didn't get that email, can you resend?

Ayende Rahien

Steve,

The code actually reads a lower-case only data.

tytusse

Not knowing the API I would guess:

database.IndexStorage.GetCurrentIndexSearcher(index) might return null (I can imagine, that index with given name might not exist), which will result in NullPointerException in using() clause?

evereq

@Ayende:

1) When I speak about hacks, I just mean that according to comments here, most of developers can't get it after first reading! I.e. it's hard to be sure that code works without bugs... Sure it's not really 'hacks', but for many developers it looks that way. And it always better to keep your code easy to read / understand by others :) Reread comments to figure out what most of guys here think as "hacks" - every time they point that something wrong, while it was not, it's a Hack! :) At least I think so :)

2) Sure, agree... i.e. it would not corrupt state probably (especially if you fix bug with TermEnum that can lead to leak of resources), but still it was possible to simplify future factoring if you add more static / dynamic checks and use design by contract or at least check more in pre-conditions!

3) 'bugs are not barrier for prod." :D It's depends how significant bugs and who is your customer :) !

P.S. Are you sure that really Next() should not be called before Term()??? I remember that, maybe from Java, but it was here :)

Ayende Rahien

Evereq,

1) You are not supposed to be able to understand it at first reading. There is a lot of missing context that you don't have available.

2) If it is an exception anyway, I wouldn't bother. This is deeply internal code, the code above it should make any required checks.

3) barrier to prod is a design problem. Bugs are easily fixed, design issues, not so much

About Next(), not in the provided scenario, no. When you search for an item, it gets to the first item.

bob
bob

I still come here because it's like tdwtf but a special Israeli edition

.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats