Geo Location & Spatial Searches with RavenDB–Part IV-Searching
Now we have all of the data loaded in, we need to be able to search on it. In order to do that, we define the following index:
It is a very simple one, mapping the start and end of each range for each location.
The next step is actually doing the search, and this is where we run into some issues. The problem was with the data:
Let us take the first range and translate that to IP addresses in the format that you are probably more used to:
Start: 0.177.195.68 End: 255.177.195.68
Yep, it is little endian vs. big endian here to bite us once more.
It took me a while to figure it out, I’ll admit. In other words, we have to reverse the IP address before we can search on it properly. Thankfully, that is easily done, and we have the following masterpiece:
The data source that we have only support IPv4, so that is what we allow. We reverse the IP, then do a range search based on this.
Now we can use it like this:
var location = session.GetLocationByIp(IPAddress.Parse("209.85.217.172"));
Which tells us that this is a Mountain View, CA, USA address.
More importantly for our purposes, it tells us that this is located at: 37.4192, -122.0574 We will use that shortly to do spatial searches for RavenDB events near you, which I’ll discuss in my next post.
Oh, and just for fun. You might remember that in previous posts I mentioned that MaxMind (the source for this geo location information) had to create its own binary format because relational databases took several second to process each query?
The query above completed in 53 milliseconds on my machine, without any tuning on our part. And that is before we introduce caching into the mix.
Comments
Has anybody tried to load it into relational database, add some indexes, and perform such query (apart from MaxMind, just to verify)? I somehow find it hard to belive that it takes several seconds...
AG, I assume that MaxMind did.
@Ayende of course he did, what I meant is that somebody should verify that. Perhaps he is doing something wrong there. By the size of the data, it doesn't seem to me that it should take that long. Anyway that's a thing to try.
Index lookup in a relational database? It takes no more than few milliseconds. I don't have to do any coding to know that... Probably MaxMind guys were talking about database without any indexing that you get by importing the csv file in dumbest possible way. Only then it would take few seconds to do a table scan...
Materialized views are the genius of RavenDB.
Not sure why RavenDB tries to proliferate denormalization on the writing side. I think we should write normalized and read denormalized. That's the best of both worlds.
I think that denormalization on the writing side is generally completely misguided.
Tobi, Very simple reason, it is HARD to track denormalization on reads.
So equivalent sql query run on my machine (nothing special really...) and ms sql (with the database imported but no optimisation at all) takes something between 60 and 170 ms. Tried for a few IPs. That is waaay less than several seconds, and has plenty of places it can be improved (with indexes).
Ayende,
that might be a reason. I just know that materialized views in SQL Server are a joy to use. I write normalized, I read with full performance. Joins become no-ops at runtime. Often it is possible to save on sorting and filtering as well. Complex queries become single range scans on some index.
Tobi, I wrote a bunch of posts about materialized views a few years back, they are great, but in RDMBS they have severe limitations, and for the same reason you can't really do full denormalization during read.
AG, I agree it's only 2 mill rows, Microsoft Sql will do such a search in milliseconds, no doubt about that.
Max Mind is terrible wrong, but that is not up to discuss that :), but maybe they were talking about Excel LOL
Comment preview