Challenge: This code should never hit production

Dec 23 2010

This code should never have the chance to go to production. It is horribly broken in a rather subtle way. Do you see it?

public ISet<string> GetTerms(string index, string field)
{
    if(field == null) throw new ArgumentNullException("field");
    if(index == null) throw new ArgumentNullException("index");
    
    var result = new HashSet<string>();
    var currentIndexSearcher = database.IndexStorage.GetCurrentIndexSearcher(index);
    IndexSearcher searcher;
    using(currentIndexSearcher.Use(out searcher))
    {
        var termEnum = searcher.GetIndexReader().Terms(new Term(field));
        while (field.Equals(termEnum.Term().Field()))
        {
            result.Add(termEnum.Term().Text());

            if (termEnum.Next() == false)
                break;
        }
    }

    return result;
}

As usual, I’ll post the answer tomorrow.

Comments

23 Dec 2010
10:36 AM

Ryan Heath

Are you talking about API?

It seems currentIndexSearcher is an IndexSearcher too?

One could use currentIndexSearcher without resorting to currentIndexSearcher.Use ...

Another thing that seems wrong is the using statement, which implies a limited scope. Is the out param searcher bound to that scope?

If so, then the API is asking, no, demanding for problems ;)

// Ryan

23 Dec 2010
11:06 AM

Patrick Huizinga

Easy, you throw the ArgumentNullExceptions in the wrong order. :-P

But what happens if termEnum has zero terms?

@Ryan, in the shown code searcher is only used inside the using block, so I don't think Ayende had that in mind. Though I do wonder why this particular API was chosen instead of returning a disposable searcher.

23 Dec 2010
11:23 AM

tobi

The code assumes that termEnum will contain at least one item (the Next() check must come before the first usage).
The IndexReader is not disposed (I don't know if that is necessary).

23 Dec 2010
12:09 PM

Ayende Rahien

Ryan,

This is a way to handle a value that may be changed by another thread. Think of it as ref counting.

23 Dec 2010
12:16 PM

cbp

What is being disposed by the using statement? Whatever gets returned by the Use() method, I presume, which could be the currentIndexSearcher itself if it is a fluent interface. The IndexSearcher won't be disposed, and neither will the reader returned by GetIndexReader. The code is not very clear.

23 Dec 2010
12:19 PM

Marc

What happens if index is an empty string or refers to a non-existent index?

If currentIndexSearcher is then null, you get problems with the .Use call.

23 Dec 2010
12:19 PM

Ayende Rahien

cbp,

The problem isn't with the use statement.

23 Dec 2010
12:31 PM

evereq

1) The preconditions are too simple! Both arguments are checked for null, but neither is checked for being empty, for example.

2) A lot of additional checks (asserts) are missing inside the method body. For example, 'searcher' can be null, but the code attempts to use it. The same goes for 'currentIndexSearcher', which can also potentially be null. That is, the author assumes knowledge of how IndexStorage.GetCurrentIndexSearcher and currentIndexSearcher.Use behave, and that they throw InvalidOperationException if they can't finish their work and return valid values. That could change later (say, after some refactoring), introducing problems for this client code. In short, insert more asserts!

3) The method is named GetTerms, so why does it return a set of strings rather than, say, a Dictionary of Term objects (where the key is the Term text and the value is the Term object)? The author significantly limits the possible reuse of a method with such a nice name. (Who knows, maybe in version 3.x of Lucene terms will have more properties than just field and text?)

4) Shouldn't termEnum.Next() be called before the call to termEnum.Term()? At least that's what I remember; I need to check the Lucene docs.

5) Should you Close() the terms enum?

23 Dec 2010
12:33 PM

Yitzchok

if(string.IsNullOrWhiteSpace(field)) throw new ArgumentNullException("field"); ...

Why do you need "field.Equals(..)"?

23 Dec 2010
12:49 PM

Dalibor Čarapić

I find it amusing when Ayende posts some code without any context and then asks us to spot something wrong. People desperately try to find something wrong with it and find a bazillion code 'smells' which probably have nothing to do with the problem in question.

Good luck.

23 Dec 2010
13:00 PM

Ayende Rahien

Evereq,

1 & 2) All of which would generate a nice exception if they happened.

3) Because terms is what it returns?

4) Not accurate for the usage specified.

5) Yes, fixed, thanks, but not what I meant.

23 Dec 2010
13:00 PM

Ayende Rahien

Dalibor,

Yep, that is the case.

I consider the problem glaringly obvious, and no one sees it.

23 Dec 2010
13:08 PM

Diego

Hi. I've been reading your blog for a long time, and I must say that your blog has become really boring.

All you do is:

1) publish your commercial products (Uber Prof, the key-value DB, etc) and

2) paste code fragments as challenges.

I still know you're a really good programmer, it's just your blog isn't good anymore (it's my opinion of course).

Diego

23 Dec 2010
13:14 PM

Ayende Rahien

Diego,

I am sorry to hear that, but you are of course free to stop reading me

23 Dec 2010
13:18 PM

Ryan Heath

Ok, I'll give it another shot:

Should you return IEnumerable<string> instead of ISet<string>?

IEnumerable does not allow you to change the sequence while via ISet.Add one can.

Of course one could cast to ISet ...

and ISet implies that there are no double entries ...

// Ryan

23 Dec 2010
13:20 PM

evereq

@Ayende and Dalibor! Any code like that contains a lot of issues (actually ANY code contains issues :D, ALL the code!), so it's just interesting to try to catch some of them! Most of the issues can be found even without deep knowledge of the context, or even of C#! :) So Ayende, +1 for posting such questions! I find it interesting to at least see how other people think, what direction they go in, etc. :)

Regarding the question: maybe it's the HashSet? I.e. it cannot contain duplicates? ;-)

P.S. About 3) from my comment above: what terms are is very debatable! In Lucene a term is an OBJECT (or better to say, a class), i.e. it's not only text but also some other data, at least 'field' in the current Lucene version. So, because it makes sense to continue with "object thinking", I am sure it's better to return a collection of Term objects than just strings from a method named "GetTerms", i.e. behavior should be expressed in domain terms, not in primitive data types. And the Lucene / storage domain consists of "Terms", not just strings :) So sure, it's completely valid to return strings, but it has a "smell" for me :)

23 Dec 2010
13:21 PM

Tom

The resultlist is compared item by item to parameter called field.

When it does not match the function returns.

So it does not necessarily go through the whole resultset and therefore might return an incomplete result.

23 Dec 2010
13:29 PM

Naiem

I don't know what the API does, but the whole while condition looks weird to me.

while (field.Equals(termEnum.Term().Field()))

If the searcher returns terms that relate to the given field, then what is the purpose of this condition. If it doesn't and Term() has other fields, then the loop can break anytime, since I don't think searcher returns sorted results.

However, the rest of the code is too simple to be terribly broken!

23 Dec 2010
13:46 PM

Kevin Fairclough

Are duplicates based on case important/required?

23 Dec 2010
13:56 PM

Felix

termEnum.Term() probably evaluates to null if there is no term...

23 Dec 2010
14:01 PM

Jason Meckley

There is no limit to the size of the result. Performance would suffer if the result contained thousands or millions of items.

23 Dec 2010
14:02 PM

evereq

@Naiem etc.: there seems to be nothing wrong with

while (field.Equals(termEnum.Term().Field()))

because the previous line (i.e. var termEnum = searcher.GetIndexReader().Terms(new Term(field));) extracts all terms STARTING at a given term (in our case, starting with an empty term in the given field). But I agree, the code looks more like a hack than normal code :D

23 Dec 2010
14:07 PM

Brad White

Diego,

I fully agree with you. I keep the blog in my RSS reader on the off chance that something interesting is posted. It used to be that all the posts were very relevant to software developers, and of general interest.

Now it has become very commercial and when we are not being subtly marketed to we get "code challenges," which for the most part are out of context chunks of code, are just used as SEO fodder and filler between the marketing.

Sigh. Another good blog bites the dust.

23 Dec 2010
14:23 PM

David Pendray

Diego, Brad,

And yet this is one of the most well-commented .net blogs around... especially posts such as these... go figure

23 Dec 2010
15:14 PM

Oleksii

Hi Ayende,

I think you get a NullReference exception at this line of code (as the searcher has never been initialized):

var termEnum = searcher.GetIndexReader().Terms(new Term(field));

Oleksii

23 Dec 2010
15:17 PM

Ayende Rahien

Jason,

Yeah!
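
To illustrate the point about unbounded results, here is a minimal sketch that assumes the remedy is to make the caller ask for one page at a time. The `TermPaging` class, the `fromValue` and `pageSize` parameters, and the in-memory sequence of (field, text) pairs are all hypothetical stand-ins for the term dictionary. None of this is the Lucene.NET or RavenDB API.

```csharp
using System;
using System.Collections.Generic;

public static class TermPaging
{
    // sortedTerms models a term dictionary: (field, text) pairs in ordinal order,
    // first by field and then by text. This is an assumption for illustration only.
    public static List<string> GetTerms(
        IEnumerable<(string Field, string Text)> sortedTerms,
        string field, string fromValue, int pageSize)
    {
        if (field == null) throw new ArgumentNullException(nameof(field));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        var result = new List<string>();
        foreach (var term in sortedTerms)
        {
            int byField = string.CompareOrdinal(term.Field, field);
            if (byField < 0) continue; // not at the requested field yet (a real index would seek here)
            if (byField > 0) break;    // past the field; sorted order means nothing more can match

            // Resume strictly after the last value the caller has already received.
            if (fromValue != null && string.CompareOrdinal(term.Text, fromValue) <= 0) continue;

            result.Add(term.Text);
            if (result.Count >= pageSize) break; // hard upper bound on the result size
        }
        return result;
    }
}
```

A caller would pass null as `fromValue` for the first page, then pass the last term of each page to fetch the next one, so no single call can return more than `pageSize` items.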

23 Dec 2010
15:48 PM

evereq

Ha! The question was "This code should never have the chance to go to production, it is horribly broken in a rather subtle way, do you see it?"

So if Jason is right, we can't put such code into production? What if we know that all our documents are small and contain a very small number of terms (say a few thousand at most; such code WILL work with that amount!)? What about the rule that such optimizations should be done only when they are really required? Etc.!

I.e. I am not so happy with that answer if it is the correct one: the code is not horribly broken because of its performance. It's more likely that the code is broken because of:

a) Hacks, when it was possible to do the task without hacks!

b) Code that is not safe enough: the absence of proper pre-conditions, post-conditions, invariants, asserts... i.e. no design by contract, it's not a "defensive" style of programming, etc.

c) BUGS in the code (like the one others and I pointed out)

d) The code is tens of lines long, but actually does the same as the Lucene method IndexReader.Terms(term) :D I.e. all that huge code with bugs could be replaced with one single line without ANY hacks :D (OK, if you need an additional where, feel free to use LINQ here)

So, really disappointed with the "question / answer" pair! Perfect question, perfect answer (i.e. Jason is correct!), but the two just do not fit each other IMHO :)

23 Dec 2010
16:31 PM

James Curran

Ayende...

I did suggest a means of spicing up this blog. (BTW, did you get the last email on the topic; I sent it about 10 days ago...)

23 Dec 2010
18:00 PM

Andrey Titov

I guess termEnum.Next() should be called before accessing termEnum.Term(). Most likely it should work like IEnumerable's MoveNext() and Current; otherwise termEnum should guarantee there is always at least one element.

23 Dec 2010
22:45 PM

That code should never go into production because it's rubbish. It doesn't read at all. It makes absolutely no sense. This is the first piece of code you've posted that's made me say 'WTF' out loud.

Code fails.

23 Dec 2010
23:07 PM

Steve Py

Well, my glaringly obvious point would be that the comparison is case-sensitive. Not knowing the requirements and expected behaviour of the app, since I didn't write it, this is the most obvious thing that comes to mind that might be unexpected.

23 Dec 2010
23:22 PM

Luke Schafer

Diego and Brad - for people making commercial ventures, his insights are great. For people who need a break at work, the code challenges are great. I read more than I ever did.

NC - no, I think the problem here is you.

23 Dec 2010
23:49 PM

Nadav

I don't know if it's the problem you're talking about, but it seems that the while statement expects termEnum to be sorted by this:

termEnum.Term().Field()

and I guess the condition in the while statement is expected to go over the terms with the field given as a parameter and then stop when reaching terms with a .Field() different from the given field (because if it is sorted, you can stop there) or, of course, when reaching the end of the enumeration...

And if I'm correct, the problem might be that when the terms whose .Field() equals the given field are in the middle (or at least not right at the beginning), it won't get into the while statement at all. It should have iterated until it found something that equals the given field before reaching the while statement:

while (!field.Equals(termEnum.Term().Field()) && termEnum.Next() );

This will skip the terms with a different .Field().

24 Dec 2010
00:02 AM

Brad

Actually Luke, I agree with NC. That code is pretty horrible.

24 Dec 2010
04:33 AM

Ayende Rahien

Evereq,

1) Sorry, but there are no hacks here.

2) The code would fail under certain conditions. It would throw an exception. But it would not corrupt state, and the only action I could take would be to throw an exception myself.

3) There was one bug that I saw and fixed, but bugs are not a barrier to prod.

24 Dec 2010
04:34 AM

Ayende Rahien

James,

Sorry, I didn't get that email, can you resend?

24 Dec 2010
04:35 AM

Ayende Rahien

Steve,

The code actually reads lower-case-only data.

24 Dec 2010
10:01 AM

tytusse

Not knowing the API I would guess:

database.IndexStorage.GetCurrentIndexSearcher(index) might return null (I can imagine that an index with the given name might not exist), which will result in a NullReferenceException in the using() clause?

24 Dec 2010
12:57 PM

evereq

@Ayende:

1) When I speak about hacks, I just mean that, according to the comments here, most developers can't get it on first reading! I.e. it's hard to be sure that the code works without bugs... Sure, they are not really 'hacks', but to many developers it looks that way. And it is always better to keep your code easy for others to read and understand :) Reread the comments to figure out what most of the guys here think of as "hacks": every time they pointed out that something was wrong when it was not, it's a hack! :) At least I think so :)

2) Sure, agreed... i.e. it probably would not corrupt state (especially if you fix the bug with TermEnum that can lead to a resource leak), but it would still be possible to simplify future refactoring if you added more static / dynamic checks and used design by contract, or at least checked more in the pre-conditions!

3) "bugs are not barrier for prod." :D It depends on how significant the bugs are and who your customer is :) !

P.S. Are you sure that Next() really should not be called before Term()??? I remember that, maybe from Java, but it was there :)

25 Dec 2010
14:34 PM

Ayende Rahien

Evereq,

1) You are not supposed to be able to understand it at first reading. There is a lot of missing context that you don't have available.

2) If it is an exception anyway, I wouldn't bother. This is deeply internal code, the code above it should make any required checks.

3) A barrier to prod is a design problem. Bugs are easily fixed; design issues, not so much.

About Next(), not in the provided scenario, no. When you search for an item, it gets to the first item.

26 Dec 2010
22:29 PM

bob

I still come here because it's like tdwtf but a special Israeli edition

Oren Eini

CEO of RavenDB
