Ayende @ Rahien

Refunds available at head office

RavenDB On Azure

It took a while, but it is here. The most requested feature on the Azure Store is here:

image

This is currently only available on the East US region. That is going to change, but it will take a bit of time. You can vote on which regions you want RavenHQ on Azure to expand to.

RavenHQ on Azure can be used in one of two ways. You can purchase it via the Azure Marketplace, in which case you have to deal only with a single invoice, and you can manage everything through the Azure site. However, the Azure Marketplace doesn’t currently support prorated and tiered billing, which means that the plans that you purchase in the marketplace have hard limits on data. You could also purchase those same plans directly from RavenHQ and take advantage of usage based billing, which allows you to use more storage than what’s included in the plan at a prorated cost.

RavenHQ is now offering a lower price point for replicated plans, so you don’t have to think twice before jumping into the high availability option.

What is my query doing?

Recently we had to deal with several customer support requests about slow queries in RavenDB. Just to give you some idea of the scope: we consider a query slow if it takes more than 50ms to execute (excluding client side caching).

In this case, we had gotten reports about queries that took multiple seconds to run. That was strange, and we were able to reproduce this locally, at which point we were hit with a “Duh!” moment. In all cases, the underlying issue wasn’t that the query took a long time to execute, it was that the result of the query was very large. Typical documents were in the multi megabyte range, and the query returned scores of those. That means that the actual cost of the query was just transporting the data to the client.

Let us imagine that you have this query:

session.Query<User>()
    .Where(x => x.Age >= 21)
    .ToList();

And for some reason it is slower than you would like. The first thing to do would probably be to see what the raw execution times on the server side are:

RavenQueryStatistics queryStats;
session.Query<User>()
    .Customize(x => x.ShowTimings())
    .Statistics(out queryStats)
    .Where(x => x.Age >= 21)
    .ToList();

Now you have the following information:
  • queryStats.DurationMilliseconds – the server side total query execution time
  • queryStats.TimingsInMilliseconds – the server side query execution time, per each distinct operation
    • Lucene search – the time to query the Lucene index
    • Loading documents – the time to load the relevant documents from disk
    • Transforming results – the time to execute the result transformer, if any
  • queryStats.ResultSize – the uncompressed size of the response from the server

This should give you a good indication on the relative costs.

In most cases, the issue was resolved by the customer by specifying a transformer and selecting just the properties they needed for their use case. That transformed (pun intended) a query that returned 50+ MB to one that returned 67Kb.
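For illustration, here is roughly what such a transformer could look like. This is a minimal sketch, assuming a User document with Name and Age properties; the transformer name and the projected fields are made up for this example:

// A sketch of a result transformer that projects only the fields the client
// actually needs, so only a small payload crosses the wire.
public class UsersSummaryTransformer : AbstractTransformerCreationTask<User>
{
    public class Result
    {
        public string Name { get; set; }
        public int Age { get; set; }
    }

    public UsersSummaryTransformer()
    {
        TransformResults = users => from user in users
                                    select new { user.Name, user.Age };
    }
}

// Querying with the transformer:
var summaries = session.Query<User>()
    .Where(x => x.Age >= 21)
    .TransformWith<UsersSummaryTransformer, UsersSummaryTransformer.Result>()
    .ToList();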

Metrics hunt in RavenDB

You might have noticed that we are paying a lot of attention to operational concerns in RavenDB 3.0. This is especially true because we moved away from performance counters to metrics.net, which makes it much easier and more lightweight to add metrics to RavenDB.

As a result of that, we are adding a lot of stuff that will be very useful for ops teams, from monitoring the duration of queries to the bandwidth available for replication, to a host of other things.

What I wanted to ask is what kind of things do you want us to track?

Small touches: Complex text in RavenDB

This is what we call a “mini feature”, something that you’ll probably not notice unless it is pointed out to you. Often, we want to store documents that contain multi line string properties. JSON has a very simple way to handle that:

image

And it works, and if the text is small, it is even readable. But it doesn’t really work for anything even remotely complex or long. So we have worked to fix that:

image

Now you can actually read this much more easily. We run into this when we look at stack trace information, where without line breaks, it is nearly impossible to see what is going on.

RavenDB Replication Topology Visualizer

Even more goodies are coming in RavenDB 3.0. Below you can see how to visualize the replication topology in a RavenDB Cluster. You can also see that the t5 database is down (marked as red).

image

This is important, since this gives us the ability to check the status of the topology from the point of view of the actual nodes. So a node might be up for one server, but not for the other, and this will show up here.

Besides, it is a cool graphic that you can use in your system documentation, and it makes things much easier to explain.

Geo distribution and high availability in RavenDB

A customer asks in the mailing list:

Due to data protection requirements, we have to store a user’s data closest to where they signed up. For example if I sign up and I’m in London, my data should be stored in the EU.

Given this, how do we ensure when replicating (we will have level 4 redundancy eventually), that any data originally written to a node within say the EU does not get replicated to a node in the states?

The answer here is to use two features of RavenDB together: sharding and replication. It is a good thing that they are orthogonal and can work together seamlessly.

Here is how it looks:

image

The London based user will be sharded to the Ireland server. This server will be replicating to other Ireland based servers (or to other servers in the EU). The data never leaves the EU (satisfying the data protection rules), but we get the high availability that we desire.

At the same time, Canadian customers will be served from nearby US based servers, and they, too, will be replicating to nearby servers.

From a deployment standpoint, what we need to do is the following:

  • Set up a geo distributed sharded cluster using the user’s location.
  • Each shard would be a separate server, which is replicating to N other servers in the nearby geographical area.

And that is pretty much it…
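For illustration only, here is a minimal sketch of what the client side sharding setup might look like. The shard names, URLs and the Region property are assumptions for this example, not an actual configuration:

// Each shard is a separate RavenDB server in the relevant geographical area.
var shards = new Dictionary<string, IDocumentStore>
{
    { "eu", new DocumentStore { Url = "http://ravendb-ireland:8080" } },
    { "us", new DocumentStore { Url = "http://ravendb-virginia:8080" } },
};

// Route each user document to the shard matching its Region property,
// which is set once at signup.
var shardStrategy = new ShardStrategy(shards)
    .ShardingOn<User>(user => user.Region);

var store = new ShardedDocumentStore(shardStrategy).Initialize();

// Replication is then configured per shard, so the EU shard only ever
// replicates to other EU based servers.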

Avoid where in a reduce clause

We got a customer question about a map/reduce index that produced the wrong results. The problem was a mismatch between the conceptual model and how Map/Reduce actually works.

Let us take the following silly example. We want to find all the animal owners that have more than a single animal. We define an index like so:

// map
from a in docs.Animals
select new { a.Owner, Names = new[]{a.Name} }

// reduce
from r in results
group r by r.Owner into g
where g.Sum(x=>x.Names.Length) > 1
select new { Owner = g.Key, Names = g.SelectMany(x=>x.Names) }

And here is our input:

{ "Owner": "users/1", "Name": "Arava" }    // animals/1
{ "Owner": "users/1", "Name": "Oscar" }    // animals/2
{ "Owner": "users/1", "Name": "Phoebe" }   // animals/3

What would be the output of this index?

At first glance, you might guess that it would be:

{ "Owner": "users/1", "Names": ["Arava", "Oscar", "Phoebe" ] }

But you would be wrong. The actual output of this index… It is nothing. This index actually has no output.

But why?

To answer that, let us ask the following question. What would be the output for the following input?

{ "Owner": "users/1", "Name": "Arava" } // animals/1

That would be nothing, because it would be filtered out by the where in the reduce clause. This is the underlying reason why this index has no output.

If we feed it the input one document at a time, it has no output. It is only if we give it all the data upfront that it has any output. But that isn’t how Map/Reduce works with RavenDB. Map/Reduce is incremental and recursive, which means that we can (and do) run it on individual documents or blocks of documents independently. In order to ensure that, we actually always run the reduce function on the output of each individual document’s map result.

That, in turn, means that the index above has no output.

To write this index properly, I would have to do this:

// map
from a in docs.Animals
select new { a.Owner, Names = new[]{a.Name}, Count = 1 }

// reduce
from r in results
group r by r.Owner into g
select new { Owner = g.Key, Names = g.SelectMany(x=>x.Names), Count = g.Sum(x=>x.Count) }

And do the filter of Count > 1 in the query itself.
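For example, the query against the fixed index might look roughly like this (the index name and the result class are assumptions for this sketch):

// The reduce output shape of the index above.
public class OwnerAnimals
{
    public string Owner { get; set; }
    public string[] Names { get; set; }
    public int Count { get; set; }
}

// Filter on Count at query time, instead of inside the reduce.
var multiAnimalOwners = session.Query<OwnerAnimals>("Animals/ByOwner")
    .Where(x => x.Count > 1)
    .ToList();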

Map/Reduce visualizer, take II

Yes, I talked about this already, but we made some additional improvements that make it even cooler.

Here is a document:

image

And here is the index:

image

Now, let us look at what happens when we go to the map/reduce visualizer:

image

This is a highly zoomed out picture, let us zoom a little (click it for higher zoom):

image

As you can see, we can see all the documents that have any reduce keys shared with the reduce keys from the document we started from. That is going to make explaining, and debugging, map/reduce so much easier.

For that matter, here is an example of the visualizer showing us a multi step reduce, which is an optimization that happens when we have a lot of entries for the same reduce key. Now we can actually show you how this works:

image

Pretty cool!

Introducing inefficiencies into RavenDB, on purpose

Yes, I chose the title on purpose. The topic of this post is this issue. In RavenDB, we use replication to ensure high availability and load balancing. We have been using that for the past five years now, and in general, it has been great, robust and absolutely amazing when you need it.

But like all software, it can run into interesting scenarios. In this case, we had three nodes, call them A, B and C. In the beginning, we had just A & B; node A was the master node, to which all the data was written, and node B was there as a hot spare. The customer wanted to upgrade to a new RavenDB version, and they wanted to do that with zero downtime. They set up a new node, with the new RavenDB server, and because A was the master server, they decided to replicate from node B to the new node. Except… nothing appeared to be happening.

No documents were replicating to the new node, however, there was a lot of CPU and I/O. But nothing was actually happening. The customer opened a support call, and it didn’t take long to figure out what was going on. They had set up the replication between the nodes with the default “replicate only documents that were changed on this node”. However, since this was the hot spare node, no documents were ever changed on that node. All the documents in the server were replicated from the primary node.

The code for that actually looks like this:

public IEnumerable<Doc> GetDocsToReplicate(Etag lastReplicatedEtag)
{
    foreach (var doc in Docs.After(lastReplicatedEtag))
    {
        if (ModifiedOnThisDatabase(doc) == false)
            continue;
        yield return doc;
    }
}

var docsToReplicate = GetDocsToReplicate(etag).Take(1024).ToList();
Replicate(docsToReplicate);

However, since there were no documents that were modified on this node, this meant that we had to scan through all the documents in the database. Since this was a large database, this process took time.

The administrators on the server noted the high I/O, and that a single thread was constantly busy, and decided that this was likely a hung thread. This being the hot spare, they restarted the server. Of course, that aborted the operation midway, and when the database started, it just started everything from scratch.

The actual solution was to tell the database, “just replicate all docs, even those that were replicated to you”. That is the quick fix, of course.

The long term fix was to actually make sure that we abort the operation after a while, report to the remote server that we scanned up to a certain point and had nothing to show for it, and go back to the replication loop. The source database would then query the remote server for the last etag that was replicated, the remote server would respond with the etag that we asked it to remember, and we would continue from that point.

The entire process is probably slower (we make a lot more remote calls, and instead of just going through everything in one go, we have to stop, make a bunch of remote calls, then resume). But the end result is that the process is now resumable. And an admin will be able to see some measure of progress for the replication, even in that scenario.
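A rough sketch of that long term fix might look something like the following. This is not the actual RavenDB code; ReplicationBatch, MaxDocsToScanPerBatch and the surrounding types are hypothetical, just to make the idea concrete:

public ReplicationBatch GetDocsToReplicate(Etag lastReplicatedEtag)
{
    var batch = new ReplicationBatch();
    var scanned = 0;
    foreach (var doc in Docs.After(lastReplicatedEtag))
    {
        // Always remember how far we scanned, even if we send nothing.
        batch.LastScannedEtag = doc.Etag;
        if (++scanned > MaxDocsToScanPerBatch)
            break; // give up for now, we will resume from LastScannedEtag

        if (ModifiedOnThisDatabase(doc) == false)
            continue;

        batch.Docs.Add(doc);
        if (batch.Docs.Count >= 1024)
            break;
    }
    return batch;
}

// The destination stores LastScannedEtag and hands it back on the next
// replication request, so a restart no longer starts from scratch.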

The fallacy of distributed transactions

This can be a very short post, just: See CAP. Unfortunately, we have a lot of people who actually have experience in using distributed transactions, and have a good reason to believe that they work. The answer is that yes, they do, as long as you don’t run into some of the interesting edge cases.

By the way, that is not a new observation, see The Two Generals.

Allow me to demonstrate. Assume that we have a distributed system with the following actors:

image

This is a fairly typical setup. You have a worker that pulls messages from a queue and reads/writes to a database based on those messages. To coordinate between them, it uses a transaction coordinator such as MSDTC.

Transaction coordinators use a two phase commit (or sometimes a three phase commit) protocol to ensure that either all the data is committed, or none of it is.

The general logic goes like this:

  • For each participant in the transaction, send a prepare message.
    • If the participant answered “prepared”, it is guaranteeing that the transaction can be committed.
  • If any of the participants failed on prepare, abort the whole transaction.
  • If all of the participants successfully prepared, send the commit message to all of them.

The actual details are a bit more involved, obviously, but that is pretty much it.
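To make the failure windows concrete, here is a toy sketch of the coordinator side of a two phase commit; the participant interface here is made up for this example:

public interface ITransactionParticipant
{
    bool Prepare(Guid txId);   // true == "prepared", commit is now guaranteed
    void Commit(Guid txId);
    void Rollback(Guid txId);
}

public static void RunTwoPhaseCommit(Guid txId, IList<ITransactionParticipant> participants)
{
    // Phase 1: ask everyone to prepare.
    foreach (var participant in participants)
    {
        if (participant.Prepare(txId))
            continue;

        // Any failure to prepare aborts the whole transaction.
        foreach (var p in participants)
            p.Rollback(txId);
        return;
    }

    // Phase 2: commit everyone. If the coordinator dies, or loses contact
    // with a participant, partway through this loop, some participants have
    // committed and some are stuck in "prepared"; that is exactly the
    // scenario discussed below.
    foreach (var participant in participants)
        participant.Commit(txId);
}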

Now, let us take a look at an interesting scenario. Worker #1 is pulling (in a distributed transaction) a message from the queue, and based on that message, it modifies the database. Then it tells the transaction coordinator that it can commit the transaction. At this point, the TC sends the prepare message to the database and the queue. Both reply that they have successfully prepared the transaction to be committed. The TC sends a commit message to the queue, completing the transaction from its point of view. However, at this point, it is unable to communicate with the database.

What happens now?

Before I answer that, let us look at another such scenario. The TC needs to commit a transaction, it sends a prepare message to the database, and receives a successful reply. However, before it manages to send a prepare message to the queue, it becomes unavailable.

Note that from the point of view of the database, the situation looks exactly the same. It got (and successfully replied to) a Prepare message, then it didn’t hear back from the transaction coordinator afterward.

Now, that is where it gets interesting. In an abstract world, we can just wait with the pending transaction until the connection with the coordinator is resumed, and we can actually get a “commit / abort” notification.

But we aren’t in an abstract world. When we have such a scenario, we are actually locking records in the database (because they are in the process of being modified). What happens when another client comes to us and wants to modify the same record?

For example, it is quite common to host the business logic, queue and transaction coordinator on the same worker instance, while the database is on a separate machine. That means that in the image above, if Worker #1 isn’t available, we recover by directing all the users to the 2nd worker. However, at that point, we have a transaction that was prepared, but not committed.

When the user continues to make requests to our system, the 2nd worker, which has its own queue and transaction coordinator, is going to try to access the user’s record. That is the same record that is currently locked because of the ongoing transaction.

If we just let it hang in this manner, we have essentially created a situation where the user’s data becomes utterly unavailable (at least for writes). In order to resolve that, transactions come with a timeout. So after the timeout has expired, we can roll back that transaction. Of course, that leads to a very interesting situation.

Let us go back to the first scenario we explored. In this scenario, the queue got both Prepare & Commit messages, while the database got just a Prepare message. The timeout has expired, and the database has rolled back the transaction.  In other words, as far as the queue is concerned, the transaction committed, and the message is gone. As far as the database is concerned, that transaction was rolled back, and never happened.

Of course, the chance that something like that can happen in one of your systems? Probably one in a million.

When a race condition is what you want…

I have an interesting situation that I am not sure how to resolve. We need to record the last request time for a RavenDB database. Now, this last request time is mostly used to show the user, and to decide when a database is idle, and can be shut down.

As such, it doesn’t have to be really accurate (a skew of even a few seconds is just fine). However, under load, there are many requests coming in (in the case presented to us, 4,000 concurrent requests), and they all need to update the last request time.

Obviously, in such a scenario, we don’t really care about the actual value. But the question is, how do we deal with that? In particular, I want to avoid a situation where we do a lot of writes to the same value in an unprotected manner, mostly because it is likely to cause contention between cores.

Any ideas?

It is actually fine for us to go slightly back (so thread A at time T+1 and thread B at time T+2 running concurrently, and the end result is T+1), which is why I said that a race is fine for us. But what I want to avoid is any form of locking / contention.

I wrote the following test code:

class Program
{
    static void Main(string[] args)
    {
        var threads = new List<Thread>();

        var holder = new Holder();

        var mre = new ManualResetEvent(false);

        for (int i = 0; i < 2500; i++)
        {
            var thread = new Thread(delegate()
            {
                mre.WaitOne();
                for (long j = 0; j < 500*1000; j++)
                {
                    holder.Now = j;
                }
            });
            thread.Start();
            threads.Add(thread);
        }

        mre.Set();

        threads.ForEach(t => t.Join());


        Console.WriteLine(holder.Now);
    }
}

public class Holder
{
    public long Now;
}

And it looks like it is doing what I want it to. This creates a lot of contention on the same value, but it is also giving me the right value. And again, the definition of right here is very approximate. The problem is that while I know how to write thread safe code, I’m not sure if this is a good way to go about doing this.

Note that this code (yes, even with 2,500 threads) runs quite fast, in under a second. Trying to use Interlocked.Exchange is drastically more expensive, and Interlocked.CompareExchange is even worse.
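For reference, these are roughly the alternatives being compared against the unprotected write inside the same loop. Both are standard BCL calls; the compare-and-swap loop is just one possible way to make sure the value only ever moves forward:

// Simple atomic overwrite, still much more expensive than a plain write:
Interlocked.Exchange(ref holder.Now, j);

// Or a compare-and-swap loop that never moves the value backwards:
long current;
do
{
    current = Volatile.Read(ref holder.Now);
    if (j <= current)
        break;
} while (Interlocked.CompareExchange(ref holder.Now, j, current) != current);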

But it is just not sitting well with me.

Message passing, performance–take 2

In my previous post, I did some rough “benchmarks” to see how message passing options behave. I got some great comments, and I thought I’d expand on that.

The baseline for this was a blocking queue, using which we managed to get:

145,271,000 msgs in 00:00:10.4597977 for 13,888,510 ops/sec

And the async BufferBlock, using which we got:

43,268,149 msgs in 00:00:10 for 4,326,815 ops/sec.

Using LMAX Disruptor we got a disappointing:

29,791,996 msgs in 00:00:10.0003334 for 2,979,100 ops/sec

However, it was pointed out that I can significantly improve this if I changed the code to be:

var disruptor = new Disruptor.Dsl.Disruptor<Holder>(() => new Holder(), new SingleThreadedClaimStrategy(256), new YieldingWaitStrategy(), TaskScheduler.Default);

After which we get a very nice:

141,501,999 msgs in 00:00:10.0000051 for 14,150,193 ops/sec

Another request I got was for testing this with a concurrent queue, which is what it is actually meant for. The code is the same as the blocking queue version, we just changed Bus<string> to ConcurrentQueue<string>.

Using that, we got:

170,726,000 msgs in 00:00:10.0000042 for 17,072,593 ops/sec

And yes, this is pretty much just because I could. Any of those methods gives significantly more throughput than anything I actually need.
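For completeness, the ConcurrentQueue variant is just the same Read/Send loops used for the Bus<string> benchmark, with the lock based queue swapped out. A sketch of the relevant fragments:

var bus = new ConcurrentQueue<string>();

// Producer side:
bus.Enqueue("test");

// Consumer side:
string msg;
while (bus.TryDequeue(out msg))
{
    count++;
}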

Message passing, performance

I got some replies about the async event loop post, mentioning LMAX Disruptor and performance. I decided to see for myself what the fuss was all about.

You can read about the LMAX Disruptor, but basically, it is a very fast single process messaging library.

I wondered what that meant, so I wrote my own messaging library:

public class Bus<T>
{
    Queue<T> q = new Queue<T>();

    public void Enqueue(T msg)
    {
        lock (q)
        {
            q.Enqueue(msg);
        }
    }

    public bool TryDequeue(out T msg)
    {
        lock (q)
        {
            if (q.Count == 0)
            {
                msg = default(T);
                return false;
            }
            msg = q.Dequeue();
            return true;
        }
    }
}

I think that you’ll agree that this is a thing of beauty and elegant coding. I then tested this with the following code:

public static void Read(Bus<string> bus)
{
    int count = 0;
    var sp = Stopwatch.StartNew();
    while (sp.Elapsed.TotalSeconds < 10)
    {
        string msg;
        while (bus.TryDequeue(out msg))
        {
            count++;
        }
    }
    sp.Stop();

    Console.WriteLine("{0:#,#;;0} msgs in {1} for {2:#,#} ops/sec", count, sp.Elapsed, (count / sp.Elapsed.TotalSeconds));
}

public static void Send(Bus<string> bus)
{
    var sp = Stopwatch.StartNew();
    while (sp.Elapsed.TotalSeconds < 10)
    {
        for (int i = 0; i < 1000; i++)
        {
            bus.Enqueue("test");
        }
    }
}

var bus = new Bus<string>();

ThreadPool.QueueUserWorkItem(state => Send(bus));

ThreadPool.QueueUserWorkItem(state => Read(bus));

The result of this code?

145,271,000 msgs in 00:00:10.4597977 for 13,888,510 ops/sec

Now, what happens when we use the DataFlow’s BufferBlock as the bus?

public static async Task ReadAsync(BufferBlock<string> bus)
{
    int count = 0;
    var sp = Stopwatch.StartNew();
    while (sp.Elapsed.TotalSeconds < 10)
    {
        try
        {
            await bus.ReceiveAsync(TimeSpan.FromMilliseconds(5));
            count++;
        }
        catch (TaskCanceledException)
        {
        }
    }
    sp.Stop();

    Console.WriteLine("{0:#,#;;0} msgs in {1} for {2:#,#} ops/sec", count, sp.Elapsed, (count / sp.Elapsed.TotalSeconds));
}

public static async Task SendAsync(BufferBlock<string> bus)
{
    var sp = Stopwatch.StartNew();
    while (sp.Elapsed.TotalSeconds < 10)
    {
        for (int i = 0; i < 1000; i++)
        {
            await bus.SendAsync("test");
        }
    }
}

What we get is:

43,268,149 msgs in 00:00:10 for 4,326,815 ops/sec.

I then decided to check what happens with the .NET port of the LMAX Disruptor. Here is the code:

public class Holder
{
    public string Val;
}

internal class CounterHandler : IEventHandler<Holder>
{
    public int Count;

    public void OnNext(Holder data, long sequence, bool endOfBatch)
    {
        Count++;
    }
}

static void Main(string[] args)
{
    var disruptor = new Disruptor.Dsl.Disruptor<Holder>(() => new Holder(), 1024, TaskScheduler.Default);
    var counterHandler = new CounterHandler();
    disruptor.HandleEventsWith(counterHandler);

    var ringBuffer = disruptor.Start();

    var sp = Stopwatch.StartNew();
    while (sp.Elapsed.TotalSeconds < 10)
    {
        for (var i = 0; i < 1000; i++)
        {
            long sequenceNo = ringBuffer.Next();

            ringBuffer[sequenceNo].Val = "test";

            ringBuffer.Publish(sequenceNo);
        }
    }
    Console.WriteLine("{0:#,#;;0} msgs in {1} for {2:#,#} ops/sec", counterHandler.Count, sp.Elapsed, (counterHandler.Count / sp.Elapsed.TotalSeconds));
}

And the resulting performance is:

29,791,996 msgs in 00:00:10.0003334 for 2,979,100 ops/sec

Now, I’ll be the first to agree that this is really and absolutely not even close to being a fair benchmark. It is testing wildly different things. Disruptor is using a ring buffer, the BufferBlock isn’t, and the original Bus implementation just uses an unbounded queue.

But that is a very telling benchmark as well. Pretty much because it doesn’t matter. What I need this for is network protocol handling. As such, even assuming that every single byte is a message, we would have to go far beyond what any reasonable pipe can be expected to handle.

Async event loops in C#

I’m designing a new component, and I want to reduce the amount of complexity involved in dealing with it. This is a networked component, and after designing several such, I wanted to remove one area of complexity, which is the use of explicitly concurrent code. Because of that, I decided to go with the following architecture:

image

 

The network code is just reading messages from the network, and putting them in an in memory queue. Then we have a single threaded event loop that simply goes over the queue and processes those messages.

All of the code that is actually processing messages is single threaded, which makes it oh so much easier to work with.

Now, I can do this quite easily with a BlockingCollection<T>, which is how I have usually done this sort of thing so far. It is simple, robust and easy to understand. It also ties up a full thread for the event loop, which can be a shame if you don’t get a lot of messages.

So I decided to experiment with async approaches. In particular, using the BufferBlock<T> from the DataFlow assemblies.

I came up with the following code:

var cts = new CancellationTokenSource();

var q = new BufferBlock<int>(new DataflowBlockOptions
{
    CancellationToken = cts.Token,
});

This just creates the buffer block, but the nice thing here is that I can set up a “global” cancellation token for all operations on it. The problem is that this actually generates bad exceptions (InvalidOperationException, instead of TaskCanceledException). Well, I’m not sure if bad is the right term, but it isn’t the one I would expect here, at least. If you pass a cancellation token directly to the method, you get the behavior I expected.

At any rate, the code for the event loop now looks like this:

private static async Task EventLoop(BufferBlock<object> bufferBlock, CancellationToken cancellationToken)
{
    while (true)
    {
        object msg;
        try
        {
            msg = await bufferBlock.ReceiveAsync(TimeSpan.FromSeconds(3), cancellationToken);
        }
        catch (TimeoutException)
        {
            NoMessagesInTimeout();
            continue;
        }
        catch (Exception)
        {
            break;
        }
        ProcessMessage(msg);
    }
}

And that is pretty much it. We have a good way to handle timeouts, and processing messages, and we don’t take up a thread. We can also be easily cancelled. I still need to run this through a lot more testing, in particular, to verify that this doesn’t cause issues when we need to debug this sort of system, but it looks promising.
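For context, a minimal sketch of how this might be wired up; the producer side here is made up, since the real one is the network read loop:

var cts = new CancellationTokenSource();
var bufferBlock = new BufferBlock<object>();

// Start the event loop; all message processing happens on it.
var eventLoopTask = EventLoop(bufferBlock, cts.Token);

// The network code posts incoming messages into the queue as they arrive.
bufferBlock.Post(new object() /* a parsed message */);

// On shutdown, cancel the token; ReceiveAsync throws and the loop exits.
cts.Cancel();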

RavenDB Events

In the following months, there are going to be quite a few RavenDB Events, and I have been remiss in talking about them.

  • The Triangle RavenDB User Group is meeting in Raleigh. Jul 30, you can register here.
  • Mauro is going to be giving a 3 days course in RavenDB in London. Aug 11 – Aug 13, you can register here.
  • The Arizona RavenDB User Group is meeting in Scottsdale. Aug 12, you can register here.
  • We are going to be giving 2 full day events in Sweden. Sep 18 – Sep 19, you can register here.
  • I’m also going to be speaking in NSB Conf in New York City. Sep 29 – Sep 30, you can register here.
  • The RavenDB in Action book should come out in October. You can purchase the early access copy here.
  • We are going to show up for Oredev with a lot of exciting stuff. Nov 4 – Nov 7, you can register here.

We also have a couple of surprises, but I’ll keep them for later.

DSLs in Boo, and a look back

About 6 years ago, I started writing the DSLs in Boo book. It came out in 2010, and today I got an email saying that it is now officially out of print. It was never a hugely popular book, so I’m not really surprised, but it really got me thinking.

I got to build several DSLs for production during the time I was writing this book, but afterward, I pretty much pivoted hard to RavenDB, and haven’t done much with DSLs since. However, the knowledge acquired during the writing of this book has actually been quite helpful when writing RavenDB itself.

I’m not talking about the design aspects of writing a DSL, or the business decisions that are involved with that, although that is certainly a factor. I’m talking about the actual technical details of working with a language, a parser, etc.

In fact, you probably won’t see it, but RavenDB indexes and transformers are actually DSLs, and they use a lot of the techniques that I talk about in the book. We start with something that looks like C# code, but what ends up running is actually something that is far different. The Linq provider, too, relies heavily on those same techniques. We show you one thing but actually do something quite different under the covers.

It is interesting to see how the actual design of RavenDB was influenced by my own history and the choices I made in various places. If I wasn’t well versed in abusing a language, I would probably have had to go with something like CouchDB’s views, for example.

Help us select a theme for RavenDB 3.0

We are getting closer & closer to a 3.0 release. And as I mentioned, we are doing a lot of UI work. I’m quite excited about that, even though I don’t think you’ll be aware of all of the changes that we are adding.

But here is one that will be very visible, the new theme for the studio. So far, we have gone with the default theme, and pushed the actual design for later. Now we have some options to consider:

Dark:

dark

Darkly:

darkly

Flatly:

Flatly

Lumen:

lumen

RClassic:

RClassic

Space lab:

space lab

What do you think is best?

European RavenDB Days

After the success of the RavenDB conference, we have teamed up with Oredev to bring the RavenDB conference to Europe.

 

I’m very pleased to announce that we’ll be doing 2 full day events, in Malmo and in Stockholm in September. You can read all about it here.

We’re going to have talks by core team members, customer stories and actual in-the-field experiences.

Looking forward to seeing you there…

SQL Injection & Optimizations Path

It is silly, but I just had a conversation with one of our developers on SQL Injection. In RavenDB we support replicating to a relational database, which obviously requires using SQL. We are doing things properly, with parameters and everything.

No chance for SQL Injection from there. Great, and that would have been the end of a very short blog post, if that was everything.

As it turned out, there is a significant performance difference between:

@p1 = 'users/1'
@p2 = 'users/2'

DELETE FROM Users WHERE Id IN (@p1,@p2)

And:

DELETE FROM Users WHERE Id IN ('users/1', 'users/2')

Enough that we added that as an option. The reason why is related to the vagaries of the database query optimizer, and isn’t really relevant here.

This is off by default, obviously. And we use parameters by choice & preference. But we still added a minimal “protection” by adding:

sqlValue.Replace("'", "''")

Considering that this isn’t meant for user input (it is for document ids), that is still somewhat annoying.

Any suggestions on how to improve this?

Visualizing Map/Reduce in RavenDB 3.0

We are working hard on RavenDB 3.0 and a lot of the work now goes into the spit & polish phase. Usually this is a pretty boring part. Making sure that intellisense works in the right places, that the UI makes sense, that we give the right results, that the logs are helpful, etc.

But occasionally we get to actually do really cool stuff in this phase, and one of them is this guy:

image

Map/Reduce is a very powerful technique, but it can be hard to figure out issues sometimes. We have always had debug endpoints to help actually figure this out, but we are now putting a lot of attention on surfacing them to the user and making this easy to understand.

The map/reduce visualizer is a great example of that.

Compression finale

After a fairly long road, we are done. We have all the pieces: generating a shared dictionary, writing using Huffman encoding, and getting the results back out.

Hopefully by now the theory behind it is fairly clear to you, and it is time to actually put this into practice.

I have 100,000 random user documents in this file, and I want to see what kind of compression I can get from a shared dictionary approach. The project with the code for all of this can be found here: Rhea Compression (most of it is basically a port of FemtoZip to .NET).

The actual file size is 8.49 MB, and when compressing it with Zip (Windows’ send to compressed folder) it turns into a 1.93 MB file.

The original file size in bytes is: 8,809,353.

I then tried to compress each document individually using GZipStream, resulting in a total of 10,004,614 bytes used, or 9.5 MB! In other words, and not to anyone’s surprise (I hope), we see an increase in the file size.

However, when using Rhea’s compression, we do the following:

var trainer = new CompressionTrainer();

for (int i = 0; i < json.Length/100; i++)
{
    trainer.TrainOn(json[i*100]);
}

var compressionHandler = trainer.CreateHandler();

This creates a shared dictionary from every 100th document. So we have 1,000 documents as our sampling data. Then, I compressed all the individual documents one at a time.

The result took 2,593,235 bytes, or just 2.47 MB. That is about 29% of the original size. Note that we did this with a 34Kb shared dictionary.

Here is the actual compression code:

foreach (var doc in docs)
{
    size += doc.Length;
    ms.SetLength(0);
    compressedSize += compressionHandler.Compress(doc, ms);
}

And that is pretty much it. Rhea Compression is on github, and that concludes my spike into compression. In general, Rhea (and FemtoZip, obviously) are meant for very specific scenarios. I have high hopes to be able to use it in the future for doing great things.

Huffman coding and encoding compressed data

So far, we have dealt with relatively straightforward topics. Given a corpus, find a shared dictionary that we can then use to extract repeated patterns. This is the classic view of compression, even if real world compression schemes are actually a lot more convoluted. Now that we have compressed text like the following, what comes next?

<-43,6>11,'n<-60,6>Anna Nepal<-40,13><-18,8><-68,8>awest@twinte.gov'}

Obviously, it would be pretty important to encode this properly. Here is an example of a bad encoding scheme:

[prefix – byte – 1 for compressed, 0 for literal]

[length – int – length of compressed / literal]

[offset – int – back reference if the prefix was set to 1]

The problem here is that we need 6 bytes to encode a one letter literal, and 9 bytes to encode any back reference. Using this inefficient method, encoding the compressed string above actually takes more space than the original one.

As you can imagine, there has been a lot of research into this. The RFC 1951 specification, for example, sets out how to do that in detail, although I find this explanation much easier to go through.

Let us see a simple Huffman encoding scheme. We will use “this is an example of a huffman tree” as our text, and first, we’ll build the table.

image

And now we can encode the above sentence in 135 bits, instead of 288.

For decoding, we just walk the tree, and stop the first time that we hit a leaf.

For example, let us see how we would decode ‘ ‘ and ‘r’.

Space is very common, appearing 7 times in the text. As such, it is encoded as 111. So, start from the root, and move right three times. We are in a leaf node, and we output space.

The letter R, on the other hand, appears in the text only once, and its encoding is 10111. So move right from the root node, then left, then three times right, resulting in the leaf node R.

So the results of Huffman encoding using the above table is:

[t] = 0001
[h] = 1010
[i] = 1011
[s] = 0000
[ ] = 111
[i] = 1011
[s] = 0000
[ ] = 111
[a] = 010
[n] = 0011
[ ] = 111
[e] = 011
[x] = 10001
[a] = 010
[m] = 0010
[p] = 10010
[l] = 110010
[e] = 011
[ ] = 111
[o] = 110011
[f] = 1101
[ ] = 111
[a] = 010
[ ] = 111
[h] = 1010
[u] = 10000
[f] = 1101
[f] = 1101
[m] = 0010
[a] = 010
[n] = 0011
[ ] = 111
[t] = 0001
[r] = 10011
[e] = 011
[e] = 011

However, we have a problem here: this results in 135 bits, or 17 bytes (vs. 36 bytes for the original text). However, 17 bytes contain 136 bits. How do we avoid corruption in this case? The answer is that we have to include an EOF marker in there. We do that by just adding a new item to our Huffman table and encoding it normally.

So, that is all set, and we are ready to go, right? Almost; we still need to decide how to actually encode the literals and compressed data. GZip (and FemtoZip) use a really nice way of handling that.

The Huffman table contains the frequencies of individual byte values, but instead of having just 256 entries (0 – 255), FemtoZip uses 513 entries:

  • 256 entries for the actual byte values.
  • 256 entries for lengths.
  • 1 entry for the EOF marker.

What do length entries mean? Basically, we store the values 256 – 511 as lengths. When we read a Huffman value, it will give us a number in the range of 0 – 512.

If it is 512, this is the EOF marker, and we are done.

If it is 0 – 255, it is a byte literal, and we can treat it as such.

But if it is in the range of 256 – 511, it means that this is a length. Note that because lengths also cluster around common values, it is very likely that we’ll be able to store most lengths in under a byte.

Following a length, we’ll store the actual backward offset. And again, FemtoZip is using Huffman encoding to do that. This is actually done by encoding the offset as multiple 4 bit entries. The idea is to gain as much as possible from commonalities in the actual byte patterns of the offsets.
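To make the decoding side of this concrete, here is a rough sketch of the decode loop for the 513 entry scheme described above. The HuffmanDecoder type and the helper methods are made up for illustration, they are not the FemtoZip API:

while (true)
{
    int symbol = decoder.ReadSymbol();  // a value in the range 0 - 512
    if (symbol == 512)
        break;                          // the EOF marker

    if (symbol < 256)
    {
        output.WriteByte((byte)symbol); // a literal byte
        continue;
    }

    int length = symbol - 256;          // 256 - 511 encode match lengths
    int offset = decoder.ReadOffset();  // itself Huffman coded, in 4 bit chunks
    CopyFromHistory(output, offset, length);
}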

Yes, it is confusing, and it took me a while to figure it out. But it is also very sleek to see this in action.

On my next post, I’ll bring it all together and attempt to actually generate something that produces compressed data and can decompress it back successfully…

Using shared dictionary during compression

After the process of creating the actual dictionary, it is time for us to actually make use of it, isn’t it?

Again, I’m following the code from FemtoZip here, and I’m going to explain how it actually uses the dictionary for compression. The magic is happening in the PrefixHash class. Let us see the calling code:

var dic = Encoding.UTF8.GetBytes("asonerryson@eterson','.mil'ame':'{'id':','country':'P','email':','country':'");

var text = Encoding.UTF8.GetBytes("{'id':11,'name':'Anna West','country':'Nepal','email':'awest@twinte.gov'}");

var prefixHash = new PrefixHash(dic, true);

var bestMatch = prefixHash.GetBestMatch(0, text);
Console.WriteLine(Encoding.UTF8.GetString(dic, bestMatch.BestMatchIndex, bestMatch.BestMatchLength));

The output of this code is: {‘id’:

How does this work?

When we create the prefixHash, it generates the following table by hashing every 4 bytes of the dictionary and storing the relevant position for each hash.

hash[  5] =  58;
hash[ 11] =  71;
hash[ 13] =  22;
hash[ 14] =  11;
hash[ 17] =  23;
hash[ 30] =  16;
hash[ 34] =  61;
hash[ 35] =   5;
hash[ 37] =  70;
hash[ 41] =  65;
hash[ 45] =  54;
hash[ 49] =  66;
hash[ 57] =  36;
hash[ 58] =  29;
hash[ 63] =  57;
hash[ 65] =  60;
hash[ 66] =  56;
hash[ 67] =   2;
hash[ 72] =   0;
hash[ 73] =  28;
hash[ 78] =  62;
hash[ 79] =  35;
hash[ 80] =  51;
hash[ 87] =  55;
hash[ 89] =  15;
hash[ 91] =   9;
hash[ 96] =   7;
hash[ 99] =  67;
hash[100] =  52;
hash[105] =  21;
hash[108] =  25;
hash[109] =  69;
hash[110] =  68;
hash[111] =  64;
hash[118] =  17;
hash[120] =   4;
hash[125] =  33;
hash[126] =   3;
hash[127] =  26;
hash[130] =  18;
hash[131] =  31;
hash[132] =  59;

The hash of {‘id (the first 4 bytes) is 125. And as you can see, that maps to position 33. That means that there is a good chance that the value {‘id appears at position 33 in the dictionary. What we do then is check, and continue running through the dictionary as long as we have a match.

That is how we can figure out that there is a 6 character match starting at position 33. The actual code is more involved, of course, and we need to figure out if there might be another match, elsewhere in the dictionary, that might serve better. Another issue when we need to actually compress is that while we can use the dictionary for compression, it is also possible to use the plain text we are compressing as another dictionary as well, which is what FemtoZip is doing.

Basically, the logic goes like this: check the current position for a match in the dictionary, then check if we have already seen this value in the plain text we have processed so far. Select the longest match, then output that.
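A rough sketch of that selection logic might look like this. This is not the actual FemtoZip or Rhea code; the hash names, MinMatchLength and the output helpers are made up for illustration:

// Try both the shared dictionary and the text we have already processed,
// and keep whichever match is longer.
var dictMatch = dictionaryHash.GetBestMatch(position, text);
var textMatch = previousTextHash.GetBestMatch(position, text);

var best = dictMatch.BestMatchLength >= textMatch.BestMatchLength
    ? dictMatch
    : textMatch;

if (best.BestMatchLength >= MinMatchLength) // e.g. at least 4 bytes, an assumption
{
    // Emitted as a negative, backward relative offset, as in the output below.
    output.EmitBackReference(best, position);
}
else
{
    output.EmitLiteral(text[position]);
}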

Here is how it is all actually used:

var dic = Encoding.UTF8.GetBytes("asonerryson@eterson','.mil'ame':'{'id':','country':'P','email':','country':'");

var text = Encoding.UTF8.GetBytes("{'id':11,'name':'Anna Nepal','country':'Nepal','email':'awest@twinte.gov'}");

var substringPacker = new SubstringPacker(dic);
substringPacker.Pack(text, new DebugPackerOutput(), Console.Out);

The output is meant to be human readable, and the compressed text is:

<-43,6>11,'n<-60,6>Anna Nepal<-40,13><-18,8><-68,8>awest@twinte.gov'}

Note that we saved a total of 41 characters due to compression (assuming we don’t count the cost of actually encoding this).

Now, what about those references? They look very strange with those negative numbers. The reason those numbers are negative is actually quite simple. They aren’t dictionary entries, like you would think. Instead, they are back references. In other words, the first <-43,6> call is actually saying: go backward 43 bytes, then copy 6 bytes.

But we just started reading the compressed text, where do we go backward to? The answer is that we go backward into the dictionary. So all the references in the text are always relative to our current position. Let us resolve this compressed string one step at a time.

<-43,6> means go back 43 bytes into the dictionary and copy 6 bytes, giving us a string of:

{‘id’:

Then we have the literal “11,'n”, which we append to the string:

{‘id’:11,’n

Now we need to go 60 bytes back (from the current end of the string) and copy 6 bytes giving us:

{‘id’:11,’name’:’

The literal “Anna Nepal” giving us:

{‘id’:11,’name’:’Anna Nepal

Then we have to go 40 characters back, and copy 13 bytes:

{‘id’:11,’name’:’Anna Nepal’,’country’:’

Now this is fun, we have to go 18 chars back, and for the first time, we aren’t hitting the dictionary, we are using the actual string that we uncompressed to generate the rest of the string:

{‘id’:11,’name’:’Anna Nepal’,’country’:’Nepal’,

Another backward reference, 68 steps and copying 8 bytes (again to the dictionary):

{‘id’:11,’name’:’Anna Nepal’,’country’:’Nepal’,’email’:’

The literal awest@twinte.gov’} completes the picture, giving us the full text:

{‘id’:11,’name’:’Anna Nepal’,’country’:’Nepal’,’email’:’awest@twinte.gov’}

And that is how FemtoZip works. And that is pretty neat.

The actual implementation is doing Huffman compression as well, but I’ll touch on that in a later post.
