Geo Location & Spatial Searches with RavenDB–Part IV-Searching

Jun 21 2012

Now that we have all of the data loaded, we need to be able to search on it. To do that, we define the following index:

It is a very simple one, mapping the start and end of each range for each location.
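The index itself is not reproduced here. As a rough illustration of what "mapping the start and end of each range for each location" means, here is a minimal in-memory sketch in plain C#. The `Location` and `IpRange` classes, their property names, and the `RangeIndex.Map` helper are all hypothetical. In RavenDB the same projection would live in a map index (for example one defined with `AbstractIndexCreationTask`), not in LINQ to objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape; the post does not show the actual classes.
public class IpRange
{
    public long Start { get; set; } // first address of the range, as a number
    public long End { get; set; }   // last address of the range, inclusive
}

public class Location
{
    public string Id { get; set; }
    public string City { get; set; }
    public List<IpRange> Ranges { get; set; } = new List<IpRange>();
}

// One index entry per range, pointing back at the location that owns it.
public record RangeIndexEntry(long Start, long End, string LocationId);

public static class RangeIndex
{
    // Equivalent of the map step: flatten every range of every location
    // into a (Start, End, LocationId) entry.
    public static List<RangeIndexEntry> Map(IEnumerable<Location> locations) =>
        (from location in locations
         from range in location.Ranges
         select new RangeIndexEntry(range.Start, range.End, location.Id))
        .ToList();
}
```

Because each entry carries only the two bounds and the location id, a lookup never has to load and scan the location documents themselves.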

The next step is actually doing the search, and this is where we ran into some issues. The problem was with the data:

Let us take the first range and translate it into IP addresses in the format that you are probably more used to:

Start: 0.177.195.68 End: 255.177.195.68

Yep, it is little endian vs. big endian, here to bite us once more. The octets come out in reverse order, so this range is actually 68.195.177.0 to 68.195.177.255.
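To see how the reversal happens, assume (hypothetically) that the first range is stored as the MaxMind-style numbers 1153675520 and 1153675775, where the first octet is the most significant byte. Formatting those numbers high byte first gives the real addresses. Formatting them low byte first gives exactly the addresses shown above. One likely way to end up with the reversed form is passing such a number to `new IPAddress(long)` on a little-endian machine, since that constructor assumes its argument is already in network byte order. The `IpNumberFormat` class and its method names are made up for this sketch.

```csharp
using System;

public static class IpNumberFormat
{
    // Correct reading: the first octet is the most significant byte.
    public static string ToDottedQuad(long n) =>
        $"{(n >> 24) & 0xFF}.{(n >> 16) & 0xFF}.{(n >> 8) & 0xFF}.{n & 0xFF}";

    // Reading the same number lowest byte first reverses the octets,
    // which is how 68.195.177.0 turns into 0.177.195.68.
    public static string ToDottedQuadReversed(long n) =>
        $"{n & 0xFF}.{(n >> 8) & 0xFF}.{(n >> 16) & 0xFF}.{(n >> 24) & 0xFF}";
}
```

For the assumed start value, `ToDottedQuad(1153675520)` gives "68.195.177.0" while `ToDottedQuadReversed(1153675520)` gives "0.177.195.68".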

It took me a while to figure that out, I’ll admit. The upshot is that we have to reverse the IP address before we can search on it properly. Thankfully, that is easily done, and we have the following masterpiece:

The data source that we have only supports IPv4, so that is all we allow. We reverse the IP, then do a range search based on the result.
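The post's actual `GetLocationByIp` method is not shown, so the following is only a sketch of the steps just described: reject anything that is not IPv4, turn the address into a big-endian number, then find the range that contains it. `IpLookup`, `ToIpNumber`, and `FindLocationId` are hypothetical names. The final step here uses LINQ to objects over index-like entries; against RavenDB it would instead be a query on the index asking for `Start <= n` and `End >= n`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;

public static class IpLookup
{
    // Converts an IPv4 address to the number used by the data, with the
    // first octet as the most significant byte. IPv6 is rejected because
    // the data source only covers IPv4.
    public static long ToIpNumber(IPAddress ip)
    {
        if (ip.AddressFamily != AddressFamily.InterNetwork)
            throw new ArgumentException("Only IPv4 addresses are supported", nameof(ip));

        byte[] b = ip.GetAddressBytes(); // octets in network (big-endian) order
        return ((long)b[0] << 24) | ((long)b[1] << 16) | ((long)b[2] << 8) | b[3];
    }

    // Range search: return the location whose [Start, End] contains the
    // address, or null when no range matches.
    public static string FindLocationId(
        IEnumerable<(long Start, long End, string LocationId)> entries, IPAddress ip)
    {
        long n = ToIpNumber(ip);
        return entries
            .Where(e => e.Start <= n && n <= e.End)
            .Select(e => e.LocationId)
            .FirstOrDefault();
    }
}
```

For 209.85.217.172 this works out to 209 * 16777216 + 85 * 65536 + 217 * 256 + 172 = 3512064428, which is larger than `int.MaxValue`, so the sketch keeps the number in a `long`.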

Now we can use it like this:

var location = session.GetLocationByIp(IPAddress.Parse("209.85.217.172"));

Which tells us that this is a Mountain View, CA, USA address.

More importantly for our purposes, it tells us that this is located at 37.4192, -122.0574. We will use that shortly to do spatial searches for RavenDB events near you, which I’ll discuss in my next post.

Oh, and just for fun: you might remember that in previous posts I mentioned that MaxMind (the source for this geo location information) had to create its own binary format because relational databases took several seconds to process each query?

The query above completed in 53 milliseconds on my machine, without any tuning on our part. And that is before we introduce caching into the mix.

Tags:

raven

Comments

21 Jun 2012
10:34 AM

Has anybody tried to load it into a relational database, add some indexes, and perform such a query (apart from MaxMind, just to verify)? I somehow find it hard to believe that it takes several seconds...

21 Jun 2012
10:37 AM

Ayende Rahien

AG, I assume that MaxMind did.

21 Jun 2012
10:43 AM

@Ayende of course he did; what I meant is that somebody should verify that. Perhaps he is doing something wrong there. By the size of the data, it doesn't seem to me that it should take that long. Anyway, that's a thing to try.

21 Jun 2012
12:17 PM

Rafal

Index lookup in a relational database? It takes no more than a few milliseconds. I don't have to do any coding to know that... Probably the MaxMind guys were talking about a database without any indexing, the kind you get by importing the CSV file in the dumbest possible way. Only then would it take a few seconds to do a table scan...

21 Jun 2012
13:55 PM

tobi

Materialized views are the genius of RavenDB.

Not sure why RavenDB tries to proliferate denormalization on the writing side. I think we should write normalized and read denormalized. That's the best of both worlds.

I think that denormalization on the writing side is generally completely misguided.

21 Jun 2012
13:57 PM

Ayende Rahien

Tobi, very simple reason: it is HARD to track denormalization on reads.

21 Jun 2012
14:17 PM

So the equivalent SQL query, run on my machine (nothing special really...) against MS SQL (with the database imported but no optimisation at all), takes somewhere between 60 and 170 ms. I tried it for a few IPs. That is waaay less than several seconds, and there are plenty of places where it can be improved (with indexes).

21 Jun 2012
14:32 PM

tobi

Ayende,

that might be a reason. I just know that materialized views in SQL Server are a joy to use. I write normalized, I read with full performance. Joins become no-ops at runtime. Often it is possible to save on sorting and filtering as well. Complex queries become single range scans on some index.

21 Jun 2012
14:42 PM

Ayende Rahien

Tobi, I wrote a bunch of posts about materialized views a few years back. They are great, but in an RDBMS they have severe limitations, and for the same reason you can't really do full denormalization during reads.

21 Jun 2012
21:22 PM

Janus007

AG, I agree. It's only 2 million rows; Microsoft SQL Server will do such a search in milliseconds, no doubt about that.

MaxMind is terribly wrong, but that is not up for discussion :), maybe they were talking about Excel LOL

Oren Eini

CEO of RavenDB