Ayende @ Rahien

Oren Eini aka Ayende Rahien CEO of Hibernating Rhinos LTD, which develops RavenDB, a NoSQL Open Source Document Database.

time to read 2 min | 329 words

When you search for some text in RavenDB, the search is case-insensitive by default. This means that when you run this query:

image

You’ll get users with any capitalization of “Oren”. You can ask RavenDB to do a case-sensitive search, like so:

image

In this case, you’ll find only exact matches, including casing. So far, that isn’t really surprising, right?

Under what conditions will you need to do searches like that? Well, it is usually when the data itself is case-sensitive. User names on Unix are a good example of that, but you may also have Base64 data (where case matters), product keys, etc.

What is interesting is that case sensitivity is usually a property of the field.

Now, how does RavenDB handle this scenario? One option would be to index the data as is and compare it using a case-insensitive comparator. That usually ends up being quite expensive. It’s cheaper by far to normalize the text and compare it using ordinals.
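To make that trade-off concrete, here is a minimal Python sketch of the "normalize once, compare by ordinals" idea. It is not RavenDB's code: `build_terms` and `contains` are hypothetical helpers, and Python's `str.lower()` stands in for whatever normalization the real analyzer applies.

```python
from bisect import bisect_left


def build_terms(values):
    # Hypothetical indexing step: normalize each value once,
    # then keep the unique terms sorted by code point (ordinal order).
    return sorted({value.lower() for value in values})


def contains(terms, query):
    # Normalize the query the same way, then run a plain ordinal
    # binary search. No case-insensitive comparator is involved.
    needle = query.lower()
    position = bisect_left(terms, needle)
    return position < len(terms) and terms[position] == needle


terms = build_terms(["Oren", "OREN", "ayende", "Rahien"])
found = contains(terms, "oReN")
```

The cost of case folding is paid once per value at indexing time and once per query, instead of on every comparison made during the search.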

The exact() method tells us how the field is supposed to be treated. This is done at indexing time. If we want to be able to query in both a case-sensitive and a case-insensitive manner, we need to have two fields. Here is what this looks like:

image

We indexed the name field twice, marking it as case sensitive for the second index field.

Here is what actually happens behind the scenes because of this configuration:

image

 

The analyzer determines which terms are generated for each index field. The first index field (Name) uses the default analyzer, LowerCaseKeywordAnalyzer, while the second index field (ExactName) uses the default exact analyzer, KeywordAnalyzer.
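The effect of the two analyzers can be sketched with a toy inverted index in Python. This is an illustration under stated assumptions, not RavenDB's implementation: `TinyIndex`, `lower_case_keyword`, and `keyword` are hypothetical names that only mimic the behavior described above (one lowercased term per value versus one unmodified term per value).

```python
from collections import defaultdict


def lower_case_keyword(value):
    # Mimics the described default behavior: the whole value as one lowercased term.
    return [value.lower()]


def keyword(value):
    # Mimics the described exact behavior: the whole value as one term, casing kept.
    return [value]


class TinyIndex:
    def __init__(self, fields):
        # fields maps an index field name to (source property, analyzer).
        self.fields = fields
        self.postings = {name: defaultdict(set) for name in fields}

    def add(self, doc_id, doc):
        # The same source property can feed several index fields,
        # each producing its own terms.
        for name, (source, analyzer) in self.fields.items():
            for term in analyzer(doc[source]):
                self.postings[name][term].add(doc_id)

    def search(self, field, text):
        # The query text goes through the same analyzer as the field.
        _, analyzer = self.fields[field]
        hits = set()
        for term in analyzer(text):
            hits |= self.postings[field].get(term, set())
        return hits


index = TinyIndex({
    "Name": ("name", lower_case_keyword),
    "ExactName": ("name", keyword),
})
index.add("users/1", {"name": "Oren"})
index.add("users/2", {"name": "OREN"})
index.add("users/3", {"name": "oren"})

any_case = index.search("Name", "Oren")
exact_only = index.search("ExactName", "Oren")
```

Here `any_case` contains all three documents, because the Name field holds the single term `oren`, while `exact_only` contains just `users/1`, because ExactName keeps `Oren`, `OREN`, and `oren` as three separate terms.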

time to read 1 min | 130 words

I am currently teaching a RavenDB Course, and we were just talking about indexing. In particular, Search Indexes, like the one below:

image

After we defined this guy, we took a look at the stats.

As you can see, indexing 1 million documents took just over 2 minutes (with full text support enabled). More interestingly, you can see how we rapidly increased the number of items indexed per batch so that indexing finished faster.

image

Quite nice.
