ChallengeThis code should never hit production
This code should never have the chance to go to production, it is horribly broken in a rather subtle way, do you see it?
public ISet<string> GetTerms(string index, string field) { if(field == null) throw new ArgumentNullException("field"); if(index == null) throw new ArgumentNullException("index"); var result = new HashSet<string>(); var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index); IndexSearcher searcher; using(currentIndexSearcher.Use(out searcher)) { var termEnum = searcher.GetIndexReader().Terms(new Term(field)); while (field.Equals(termEnum.Term().Field())) { result.Add(termEnum.Term().Text()); if (termEnum.Next() == false) break; } } return result; }
As usual, I’ll post the answer tomorrow.
More posts in "Challenge" series:
- (01 Jul 2024) Efficient snapshotable state
- (13 Oct 2023) Fastest node selection metastable error state–answer
- (12 Oct 2023) Fastest node selection metastable error state
- (19 Sep 2023) Spot the bug
- (04 Jan 2023) what does this code print?
- (14 Dec 2022) What does this code print?
- (01 Jul 2022) Find the stack smash bug… – answer
- (30 Jun 2022) Find the stack smash bug…
- (03 Jun 2022) Spot the data corruption
- (06 May 2022) Spot the optimization–solution
- (05 May 2022) Spot the optimization
- (06 Apr 2022) Why is this code broken?
- (16 Dec 2021) Find the slow down–answer
- (15 Dec 2021) Find the slow down
- (03 Nov 2021) The code review bug that gives me nightmares–The fix
- (02 Nov 2021) The code review bug that gives me nightmares–the issue
- (01 Nov 2021) The code review bug that gives me nightmares
- (16 Jun 2021) Detecting livelihood in a distributed cluster
- (21 Apr 2020) Generate matching shard id–answer
- (20 Apr 2020) Generate matching shard id
- (02 Jan 2020) Spot the bug in the stream
- (28 Sep 2018) The loop that leaks–Answer
- (27 Sep 2018) The loop that leaks
- (03 Apr 2018) The invisible concurrency bug–Answer
- (02 Apr 2018) The invisible concurrency bug
- (31 Jan 2018) Find the bug in the fix–answer
- (30 Jan 2018) Find the bug in the fix
- (19 Jan 2017) What does this code do?
- (26 Jul 2016) The race condition in the TCP stack, answer
- (25 Jul 2016) The race condition in the TCP stack
- (28 Apr 2015) What is the meaning of this change?
- (26 Sep 2013) Spot the bug
- (27 May 2013) The problem of locking down tasks…
- (17 Oct 2011) Minimum number of round trips
- (23 Aug 2011) Recent Comments with Future Posts
- (02 Aug 2011) Modifying execution approaches
- (29 Apr 2011) Stop the leaks
- (23 Dec 2010) This code should never hit production
- (17 Dec 2010) Your own ThreadLocal
- (03 Dec 2010) Querying relative information with RavenDB
- (29 Jun 2010) Find the bug
- (23 Jun 2010) Dynamically dynamic
- (28 Apr 2010) What killed the application?
- (19 Mar 2010) What does this code do?
- (04 Mar 2010) Robust enumeration over external code
- (16 Feb 2010) Premature optimization, and all of that…
- (12 Feb 2010) Efficient querying
- (10 Feb 2010) Find the resource leak
- (21 Oct 2009) Can you spot the bug?
- (18 Oct 2009) Why is this wrong?
- (17 Oct 2009) Write the check in comment
- (15 Sep 2009) NH Prof Exporting Reports
- (02 Sep 2009) The lazy loaded inheritance many to one association OR/M conundrum
- (01 Sep 2009) Why isn’t select broken?
- (06 Aug 2009) Find the bug fixes
- (26 May 2009) Find the bug
- (14 May 2009) multi threaded test failure
- (11 May 2009) The regex that doesn’t match
- (24 Mar 2009) probability based selection
- (13 Mar 2009) C# Rewriting
- (18 Feb 2009) write a self extracting program
- (04 Sep 2008) Don't stop with the first DSL abstraction
- (02 Aug 2008) What is the problem?
- (28 Jul 2008) What does this code do?
- (26 Jul 2008) Find the bug fix
- (05 Jul 2008) Find the deadlock
- (03 Jul 2008) Find the bug
- (02 Jul 2008) What is wrong with this code
- (05 Jun 2008) why did the tests fail?
- (27 May 2008) Striving for better syntax
- (13 Apr 2008) calling generics without the generic type
- (12 Apr 2008) The directory tree
- (24 Mar 2008) Find the version
- (21 Jan 2008) Strongly typing weakly typed code
- (28 Jun 2007) Windsor Null Object Dependency Facility
Comments
Are you talking about API?
It seems currentIndexSearcher is a IndexSearcher too?
One could use currentIndexSearcher without resorting to currentIndexSearcher.Use ...
Another thing that seems wrong is the using statement, which implies a limited scope. Is the out param searcher bound to that scope?
If so, then the API is asking, no, demanding for problems ;)
// Ryan
Easy, you throw the ArgumentNullExceptions in the wrong order. :-P
But what happens if __termEnum has zero terms?
@Ryan, in the shown code __searcher is only used inside the using block, so I don't think Ayende had that in mind. Though I do wonder why this particular api was chosen instead of returning a disposable searcher.
the code assumes that the termEnum will contain at least one item (the nex() check must be above the first usage)
the indexreader is not disposed (don't know if that is necessary).
Ryan,
This is a way to handle a value that may change by another thread, thing about it as ref countring.
What is being disposed by the using statement - whatever gets returned by the Use() method I presume - which could be the currentIndexSearcher itself if it is a fluent interface. The IndexSearch won't be disposed, neither will the reader returned by GetIndexReader. The code is not very clear.
what happens if index is an empty string or refers to a non-existant indexer.
If currentSearchIndex is then null you get problems with the .Use
cbp,
The problem isn't with the use statement.
1) Preconditions are just too simple! both checked for != null, but do not checked to be not empty for example!
2) a lof of additional checks (asserts) are missed inside method body, for example: 'searcher' can be null, but code attempt to use it! Same as for 'currentIndexSearcher' - also can be potentially null! I.e. code author assume that he know interface for IndexStorage.GetCurrentIndexSearcher and currentIndexSearcher.Use methods and fact that they should throw InvalidOperationException in case if can't finish they work and return valid values! That potentially can be changed later (say after some refactoring), introducing problems for that client code! I.e. insert more asserts!
3) Method have name GetTerms, so why he return set of strings, but not instead say Dictionary of Term objects (i.e. key will be Term text and value will be term objects)? I.e. code author limit significantly possible reuse of method with so nice name! (who know, maybe in version 3.x of Lucene terms will have more properties than just field and text?)
4) termEnum.Next() should be called before call to termEnum.Term()? At least it's what I remember, needs to check on Lucene docs..
5) Should you Close() terms enum?
:)
if(string.IsNullOrWhiteSpace(field) throw new ArgumentNullException("field"); ...
Why do you need "field.Equals(..)"?
I find it amusing when Ayende posts some code without any context and then asks to spot something wrong. People desperately try to find something wrong with it and find bazilion code 'smells' which probably have nothing to do with the problem in question.
Good luck.
Evereq,
1 & 2) All of which would generate a nice exception is happened.
3) Because terms is what it returns?
4) Not accurate for the usage specified.
5) Yes, fixed, thanks, but not what I meant.
Dalibor,
Yep, that is the case.
I consider the problem glaringly obvious, and no one sees it
Hi. I've been reading your blog from a long time, and I must say that your blog became really boring.
All you do is:
1) publish your commercial products (Uber Prof, the key-value DB, etc) and
2) paste code fragments as challenges.
I still know you're a really good programmer, it's just your blog isn't good anymore (it's my opinion of course).
Diego
Diego,
I am sorry to hear that, but you are of course free to stop reading me
Ok, I'll give it another shot:
Should you return IEnumerable <string instead of ISet <string ?
IEnumerable does not allow you to change the sequence while via ISet.Add one can.
Of course one could cast to ISet ...
and ISet implies that there are no double entries ...
// Ryan
@Ayende and Dalibor! Any code like that contains a lot of issues (actually ANY code contain issues :D, ALL the code!), so it's just interesting to try to catch some of them! Most of issues can be founded even without deep knowledge of context or even C# etc! :) So Ayende +1 to post such questions! I found it interesting to at least see how other people think, what direction they go etc :)
Regarding question: maybe it's HashSet? I.e. it's cannot contain duplicates? ;-)
P.S. about 3) from my comment about: it's very questionable question what is terms! In Lucene term is OBJECT (or better to say class), i.e. it's not only text, but it's also some other data, at least 'field' in current Lucene version! So because it's just made sense to continue with "object thinking", I am sure it's better to return Terms collection (terms objects) than just strings from method with name "GetTerms" - i.e. behavior should be expressed in domain terms, not on primitive data types. And Lucene / storage domain consists from "Terms", not just strings :) So sure it's completely valid to return strings, but it have a "smell" for me :)
The resultlist is compared item by item to parameter called field.
When it does not match the function returns.
So it does not necessarily go through the whole resultset and therefore might return an incomplete result.
I don't know what the API does but the whole while condition looks wierd to me.
while (field.Equals(termEnum.Term().Field()))
If the searcher returns terms that relate to the given field, then what is the purpose of this condition. If it doesn't and Term() has other fields, then the loop can break anytime, since I don't think searcher returns sorted results.
However, the rest of the code is too simple to be terribly broken!
Are duplicates based on case important/required?
termEnum.Term() probably evaluate to null if there is no term...
There is no limit to the size of the result. performance would suffer if the result returned 1000s or millions of items.
@Naiem etc: seems nothing wrong with
while (field.Equals(termEnum.Term().Field()))
because previous line (i.e. var termEnum = searcher.GetIndexReader().Terms(new Term(field));) extracts all terms STARTING at a given term (in our case starting with a empty term but in given field). But agree - code looks more like a hack, than as normal one :D
Diego,
I fully agree with you. I keep the blog in my RSS reader on the off chance that something interesting is posted. It use to be that all the posts were very relevant to software developers, and of general interest.
Now it has become very commercial and when we are not being subtly marketed to we get "code challenges," which for the most part are out of context chunks of code, are just used as SEO fodder and filler between the marketing.
Sigh. Another good blog bites the dust.
Diego, Brad,
And yet this is one of the most well-commented .net blogs around... especially posts such as these... go figure
Hi Ayende,
I think you get a NullReference exception at this line of code (as the searcher has never been initialized):
var termEnum = searcher.GetIndexReader().Terms(new Term(field));
Oleksii
Jason,
Yeah!
ha! The question was "This code should never have the chance to go to production, it is horribly broken in a rather subtle way, do you see it?"
So if Jason right, we can't put such code into production? What if we know that all our documents are small and contain very small amount of terms (say max few thousands - such code WILL work with that amount!)? What about rule that such optimizations should be done when try are really required? etc!
I.e. I am not so happy with that answer if it is correct one: it's not horribly broken code because of his performance - it's more likely that code broken because of
a) Hacks when it was possible to made task without hacks!
b) Code not safe enough - absence of right pre-conditions, post-conditions, invariants, asserts.. i.e. no design by contract, its not "defensive" style of programming etc
c) BUGS in code (like the one others and me point out)
d) Code is 10s lines of code, but actually do same as Lucene method IndexReader.Terms(term) :D I.e. all that huge code with bugs was possible to replace with one single line without ANY hacks :D (Ok, if you need additional where you feel free to use LinQ here)
So, really disappointed with "question / answer" pair! Perfect question, perfect answer (i.e. Jason do correct!), but just both did not feet each other IMHO :)
Ayende...
I did suggest a means of spicing up this blog. (BTW, did you get the last email on the topic; I sent it about 10 days ago...)
I guess termEnum.Next() should be called before accessing termEnum.Term(). It likely it should works as IEnumerable's MoveNext() and Current, otherwise termEnum should guarantee there is always at least one element.
That code should never go into production because it's rubbish. It doesn't read at all. It makes absolutely no sense. This is the first piece of code you've posted that's made me say 'WTF' out loud.
Code fails.
Well my glaring obvious point would be that the comparison is case-sensitive. Not knowing the requirements & expected behaviour of the app since I didn't write it, this is the most obvious thing that comes to mind that might be unexpected.
Diego and Brad - for people making commercial ventures, his insights are great. For people who need a break at work, the code challenges are great. I read more than I ever did.
NC - no, I think the problem here is you.
don't know if its the problem you're talking about, but it seems that the while statement expects the temsEnum to be sorted by this:
termEnum.Term().Field()
and i guess the condition in the while statement is expected to go over the terms with the field given as parameter and then stop when reaching terms with .Field() different than the given field, because if it is sorted, you can stop here, or, of course, when reached the end of the enumeration...
And if i'm correct, the problem might be when the terms with .Field() that equals to the given field are in the middle (at least not right at the beggining), it won't get into the while statement at all, and it should have iterated until it finds something that equals to the given term before reaching the while statement:
while (!field.Equals(termEnum.Term().Field()) && termEnum.Next() );
this will skip the terms with different .Field()
Actually Luke, I agree with NC. That code is pretty horrible.
Evereq,
1) Sorry, but there are no hacks here.
2) The code would fail under certain condition. It would throw an exception. But it would not corrupt state, and the only action I could take would be to throw an exception myself.
3) There was one bug that I saw and fixed, but bugs are not barrier for prod.
James,
Sorry, I didn't get that email, can you resend?
Steve,
The code actually reads a lower-case only data.
Not knowing the API I would guess:
database.IndexStorage.GetCurrentIndexSearcher(index) might return null (I can imagine, that index with given name might not exist), which will result in NullPointerException in using() clause?
@Ayende:
1) When I speak about hacks, I just mean that according to comments here, most of developers can't get it after first reading! I.e. it's hard to be sure that code works without bugs... Sure it's not really 'hacks', but for many developers it looks that way. And it always better to keep your code easy to read / understand by others :) Reread comments to figure out what most of guys here think as "hacks" - every time they point that something wrong, while it was not, it's a Hack! :) At least I think so :)
2) Sure, agree... i.e. it would not corrupt state probably (especially if you fix bug with TermEnum that can lead to leak of resources), but still it was possible to simplify future factoring if you add more static / dynamic checks and use design by contract or at least check more in pre-conditions!
3) 'bugs are not barrier for prod." :D It's depends how significant bugs and who is your customer :) !
P.S. Are you sure that really Next() should not be called before Term()??? I remember that, maybe from Java, but it was here :)
Evereq,
1) You are not supposed to be able to understand it at first reading. There is a lot of missing context that you don't have available.
2) If it is an exception anyway, I wouldn't bother. This is deeply internal code, the code above it should make any required checks.
3) barrier to prod is a design problem. Bugs are easily fixed, design issues, not so much
About Next(), not in the provided scenario, no. When you search for an item, it gets to the first item.
I still come here because it's like tdwtf but a special Israeli edition
.
Comment preview