Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 18 | Comments: 66

filter by tags archive

Lucene is beautiful

time to read 1 min | 191 words

So, after I finished telling you how much I don’t like the lucene.net codebase, what is this post about?

Well, I don’t like the code, but then again, I generally don’t like to read low level code. The ideas behind Lucene are actually quite amazingly powerful in their simplicity.

At its core, Lucene is just a set of sorted dictionaries on disk (greatly simplified, I know). Everything else is build on top of that, and if you grok what is going on there, you would be quite amazed at the number of things that this has made possible.

Indexing in Lucene is done by a pipeline of documents and fields and analyzers, which all participate together to generate those dictionaries. Searching in lucene is done by traversing those dictionaries in various ways, and combining the results in interesting ways.

I am not going to go into details about how it works, you can read all about that here. The important thing is that once you have grasped the essential structure inside lucene, the rest are just details.

The concept and the way the implementation fell out are quite beautiful.


Comments

tobi

It really is a NoSQL store. It uses sorted string tables on disk and merges them periodically, just like Cassandra and probably others.

Chris

I love this sentence: "amazingly powerful in their simplicity". That might be the best thought I've read in a long time.

Bart Czernicki

Anyone that is a fan on Lucene.NET should buy that book (even though it is targeted for Java), because Lucene.NET (version 3.03 RC2) matches that version of the book exactly. Lucene 2.x != 3.x (big changes especially in the Analyzer classes)

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

  1. RavenDB 3.0 New Stable Release - 3 hours from now
  2. Production postmortem: The case of the lying configuration file - about one day from now
  3. Production postmortem: The industry at large - 2 days from now
  4. The insidious cost of allocations - 3 days from now
  5. Buffer allocation strategies: A possible solution - 6 days from now

And 4 more posts are pending...

There are posts all the way to Sep 11, 2015

RECENT SERIES

  1. Find the bug (5):
    20 Apr 2011 - Why do I get a Null Reference Exception?
  2. Production postmortem (10):
    31 Aug 2015 - The case of the memory eater and high load
  3. What is new in RavenDB 3.5 (7):
    12 Aug 2015 - Monitoring support
  4. Career planning (6):
    24 Jul 2015 - The immortal choices aren't
View all series

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats