Ayende @ Rahien

It's a girl

Application analysis: Northwind.NET

For an article I am writing, I wanted to compare a RavenDB model to a relational model, and I stumbled upon the following Northwind.NET project.

I plugged in the Entity Framework Profiler and set out to watch what was going on. To be truthful, I expected it to be bad, but I honestly did not expect what I got. Here is a question, how many queries does it take to render the following screen?

image

The answer, believe it or not, is 17:

image

You might have noticed that most of the queries look quite similar, and indeed, they are. We are talking about 16(!) identical queries:

SELECT [Extent1].[ID]           AS [ID],
       [Extent1].[Name]         AS [Name],
       [Extent1].[Description]  AS [Description],
       [Extent1].[Picture]      AS [Picture],
       [Extent1].[RowTimeStamp] AS [RowTimeStamp]
FROM   [dbo].[Category] AS [Extent1]

Looking at the stack trace for one of those queries led me to:

image

And to this piece of code:

image

You might note that dynamic is used there; for what reason, I cannot even guess. Just to check, I added a ToArray() to the result of GetEntitySet, and the number of queries dropped from 17 to 2, which is far more reasonable. The problem was that we passed an IQueryable to the data binding engine, which ended up evaluating the query multiple times.
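To make the failure mode concrete, here is a small self-contained sketch (LINQ to Objects with a counter standing in for the database round trip; the names are mine, not Northwind.NET's) showing how a deferred query is re-executed on every enumeration, while ToArray() materializes it once:

```csharp
using System;
using System.Linq;

public static class DeferredQueryDemo
{
    // Count how many times the "query" runs when a deferred sequence is
    // enumerated twice (Count() then First()), the way a data binding
    // engine might enumerate its source.
    public static int ExecutionsWhenDeferred()
    {
        var executions = 0;
        var query = Enumerable.Range(1, 3).Select(i => { executions++; return i; });
        query.Count();  // first enumeration: runs the selector 3 times
        query.First();  // second enumeration: runs it once more
        return executions; // 4
    }

    // Same two operations against a materialized array: the query runs once.
    public static int ExecutionsWhenMaterialized()
    {
        var executions = 0;
        var query = Enumerable.Range(1, 3).Select(i => { executions++; return i; });
        var materialized = query.ToArray(); // single pass over the source
        materialized.Count();
        materialized.First();
        return executions; // 3
    }

    public static void Main()
    {
        Console.WriteLine(ExecutionsWhenDeferred());     // 4
        Console.WriteLine(ExecutionsWhenMaterialized()); // 3
    }
}
```

Data binding engines routinely enumerate their source more than once, which is exactly how an IQueryable bound directly to the UI turns into 16 identical SELECTs.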

And EF Prof actually warns about that, too:

image

At any rate, I am afraid that this project suffers from similar issues all around; it is actually too bad to serve as the bad example that I intended it to be.

Windows 7 Conflict Resolution Dialog: How I hate thee

Do you see this thing?

image

I hate it. (Yes, I know that Win 8 has a better one)

The reason that I hate it: there are no shortcut keys for “Move and Replace”, so I have to use the freaking mouse, which takes forever.

Published at

Originally posted at

Comments (19)

Expanding your horizons: Actions

In theory, there is no difference between theory and real life.

In my previous blog post, I discussed my belief that the best value you get from learning is learning the very basics of how our machines operate, from memory management in operating systems to the details of how network protocols like TCP/IP work.

Some of that has got to be theoretical study, actually reading about how those things work, but theory isn’t enough. I don’t care if you know the TCP spec by heart; if you haven’t actually built a real system with it and experienced the pain points, it isn’t really meaningful. The best way to learn, at least in my own experience, is to actually do something.

Because that teaches you several very interesting things:

  • What are the differences between the spec and what is actually implemented?
  • How to resolve common (and not so common) problems?

The latter is probably the most important thing. I think that I learned most of what I know about HTTP in the process of building an RSS feed reader. I learned a lot about TCP from implementing a proxy system, and I did a lot of learning from a series of failed projects regarding distributed programming in general.

I learned a lot about file systems and how to work with file based storage from Practical File System Design and from building Rhino Queues and Rhino DHT. In retrospect, I did a lot of very different projects in various areas and technologies.

The best way that I know to get better is to do, to fail, and to learn from what didn’t work. I don’t know of any shortcuts, although I am familiar with plenty of ways of making the road much longer (and usually not very pleasant).

In short, if you want to get better, pick something that you don’t know how to do, and then do it. You might fail, you likely will, but you’ll learn a lot from failing.

I keep drawing a blank when people ask me to suggest options for things to try building, so I thought that I would ask the readers of this blog. What sort of things do you think would be useful to build? Things that would push most people out of their comfort zone and make them learn the fundamentals of how things work.


Expanding your horizons

One of the questions that I routinely get asked is “how do you learn”. And the answer that I keep giving is that I accidentally started learning things from the basic building blocks. I still count a C/C++ course that I took over a decade ago as one of the chief reasons why I have a good grounding in how computers actually operate. During that course, we had to do everything from building parts of the C standard library on our own to constructing much of the foundation of C++ features in plain C.

That gave me enough understanding of how things are actually implemented to be able to grasp how things are behaving elsewhere. Digging deep into the implementation is almost never a wasted effort. And if you can’t peel away the layer of abstractions, you can’t really say that you know what you are doing.

For example, I count myself ignorant in all matters WCF, but I have full confidence that I can build a system using it. Not because I understand WCF itself, but because I understand the arena in which it plays. I don’t need to really understand how a certain technology works if I already know the rules it has to play by.

Picking on WCF again: if you don’t know firewalls and routers, you can’t really build a WCF system, regardless of how good your memory is about the myriad ways of configuring WCF to do your will. If you can’t use Wireshark to figure out why the system is slow to respond to requests, it doesn’t matter if you can compose a WCF envelope message literally on the back of a real world envelope. If you don’t grok the Fallacies of Distributed Computing, you shouldn’t be trying to build a real system where WCF is used, regardless of whatever certificate you have from Microsoft.

The interesting bit is that for most of what we do, the rules are fairly consistent. We all have to play in Turing’s sandbox, after all.

What this means is that learning the details of IP and TCP will be worth it over and over again. Understanding things like memory fetch latencies will be relevant in five years and in ten. Knowing what actually goes on in the system, even if at a somewhat abstracted level, is important. That is what makes you the master of the system, instead of its slave.

Some of the things that I especially value (this is off the top of my head, and isn’t a closed list) are:

  • TCP / UDP – how they actually work.
  • HTTP – and its implications (for example, state management).
  • The Fallacies of Distributed Computing.
  • Disk based storage – working with it efficiently, how file systems work.
  • Memory management in the OS and in your environment.

Obviously, this is a very short list, and again, it isn’t comprehensive. It is just meant to give you some indication of the things that I have found to be useful over and over again.

That kind of knowledge isn’t something that is replaced often, and it will help you understand how anyone else has to interact with the same constraints. In fact, it often allows you to accurately guess how they solved a certain problem, because you are aware of the same alternatives that the other side had to choose from.

In short, if you seek to be a better developer, dig deep and learn the real basic building blocks for our profession.

In my next post, I’ll discuss strategies for doing that.


Transitive Replication in RavenDB

TL;DR:

Replication topologies make my head hurt.

One of our customers had an interesting requirement, several months ago:

image

Basically, he wanted to write a document at node #1, and have it replicate, through node #2, to node #3. That was an easy enough change, and we did that. But then we got another issue from a different customer, who had the following topology:

image

And that client’s problem was that when making a write to node #1, it would be replicated to nodes 2 – 4, each of which would then try to update the other two with the new replication information (skipping node #1, because it is the source). That would cause… issues, because they already had that document in place.

In order to resolve that, I added a configuration option, which controls whether the node that we replicate to should receive only documents that were modified on the current node, or whether it should also receive documents that were replicated to us from other nodes.
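For illustration only (the exact type and property names varied between RavenDB versions, so treat these as assumptions rather than the definitive API), the option ended up surfacing on the replication destination configuration, roughly like this:

```csharp
// Hypothetical sketch of configuring a replication destination document.
// TransitiveReplicationBehavior controls what gets forwarded:
//   None      - only send documents modified on the current node
//   Replicate - also forward documents that arrived via replication
session.Store(new ReplicationDocument
{
    Destinations =
    {
        new ReplicationDestination
        {
            Url = "http://node2:8080",
            TransitiveReplicationBehavior = TransitiveReplicationOptions.Replicate
        }
    }
});
session.SaveChanges();
```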

It is a relatively small change, code wise. Of course, documenting this, and all of the options that follow, is going to be a much bigger task, because now you have to make a distinction between replicating nodes, gateway nodes, etc.


What is up with all those 404?

Recently I had a spate of posts showing up and then returning 404. The actual reason was that the server and the database disagreed with one another about the timezone of the post (one was using UTC, the other local time). Sorry for the trouble; this shouldn’t happen any longer.
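This bug class is easy to reproduce in isolation. A minimal sketch (illustrative names, not the blog engine's actual code): a publish timestamp stored in local time compared against a UTC clock makes a live post look unpublished, hence the 404:

```csharp
using System;

public static class TimezoneBugDemo
{
    // The server's question: is this post already published?
    public static bool IsPublished(DateTime publishedAt, DateTime now)
    {
        return publishedAt <= now;
    }

    public static void Main()
    {
        // The database stored local time (say, UTC+2) while the server
        // compared against UTC, so a live post looked unpublished: 404.
        var publishedLocal = new DateTime(2011, 12, 20, 14, 0, 0); // 14:00 local (UTC+2)
        var nowUtc = new DateTime(2011, 12, 20, 12, 30, 0);        // 12:30 UTC = 14:30 local

        Console.WriteLine(IsPublished(publishedLocal, nowUtc)); // False - the 404

        // Comparing apples to apples (both UTC) gives the right answer.
        var publishedUtc = publishedLocal.AddHours(-2);
        Console.WriteLine(IsPublished(publishedUtc, nowUtc)); // True
    }
}
```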


Mixing Integrated Authentication and Anonymous Authentication with PreAuthenticate = true doesn’t work

This StackOverflow question indicates that it is half a bug and half a feature, but it sure as hell looks like a bug to me.

Let us assume that we have a couple of endpoints in our application, called http://localhost:8080/secure and http://localhost:8080/public. As you can imagine, the secure endpoint is… well, secure, and requires authentication. The public endpoint does not.

We want to optimize the number of requests we make, so we specify PreAuthenticate = true; and that is where all hell breaks loose.

The problem is that when issuing a request with an entity body (in other words, PUT / POST) with PreAuthenticate = true, the .NET framework will first issue a PUT / POST request with an empty body to the server, presumably to get the 401 authentication challenge. At that point, if the endpoint that it happened to reach is public, the empty-body request will be accepted as a standard request, and processing will be attempted. The problem is that it has an empty body, so that has a very strong likelihood of failing.

This error cost me a day and a half or so. Here is the full repro:

static void Main()
{
    new Thread(Server)
    {
        IsBackground = true
    }.Start();

    Thread.Sleep(500); // let the server start

    bool secure = false;
    while (true)
    {
        secure = !secure;
        Console.Write("Sending: ");
        var str = new string('a', 621);
        var req = WebRequest.Create(secure ? "http://localhost:8080/secure" : "http://localhost:8080/public");
        req.Method = "PUT";

        var byteCount = Encoding.UTF8.GetByteCount(str);
        req.UseDefaultCredentials = true;
        req.Credentials = CredentialCache.DefaultCredentials;
        req.PreAuthenticate = true;
        req.ContentLength = byteCount;

        using(var stream = req.GetRequestStream())
        {
            var bytes = Encoding.UTF8.GetBytes(str);
            stream.Write(bytes, 0, bytes.Length);
            stream.Flush();
        }

        req.GetResponse().Close();

    }

}

And the server code:

public static void Server()
{
    var listener = new HttpListener();
    listener.Prefixes.Add("http://+:8080/");
    listener.AuthenticationSchemes = AuthenticationSchemes.IntegratedWindowsAuthentication | AuthenticationSchemes.Anonymous;
    listener.AuthenticationSchemeSelectorDelegate = request =>
    {

        return request.RawUrl.Contains("public") ? AuthenticationSchemes.Anonymous : AuthenticationSchemes.IntegratedWindowsAuthentication;
    };

    listener.Start();

    while (true)
    {
        var context = listener.GetContext();
        Console.WriteLine(context.User != null ? context.User.Identity.Name : "Anonymous");
        using(var reader = new StreamReader(context.Request.InputStream))
        {
            var readToEnd = reader.ReadToEnd();
            if(string.IsNullOrEmpty(readToEnd))
            {
                Console.WriteLine("WTF?!");
                Environment.Exit(1);
            }
        }

        context.Response.StatusCode = 200;
        context.Response.Close();
    }
}

If we set PreAuthenticate to false, everything works, but then we make twice as many requests. The annoying thing is that if the framework sent the bloody entity body along with the authentication attempt, trying to authenticate against a public endpoint would be harmless.

This is quite annoying.


Stupid smart code: Solution

The reason that I said that this is very stupid code?

public static void WriteDataToRequest(HttpWebRequest req, string data)
{
    var byteCount = Encoding.UTF8.GetByteCount(data);
    req.ContentLength = byteCount;
    using (var dataStream = req.GetRequestStream())
    {
        if(byteCount <= 0x1000) // small size, just let the system allocate it
        {
            var bytes = Encoding.UTF8.GetBytes(data);
            dataStream.Write(bytes, 0, bytes.Length);
            dataStream.Flush();
            return;
        }

        var buffer = new byte[0x1000];
        var maxCharsThatCanFitInBuffer = buffer.Length / Encoding.UTF8.GetMaxByteCount(1);
        var charBuffer = new char[maxCharsThatCanFitInBuffer];
        int start = 0;
        var encoder = Encoding.UTF8.GetEncoder();
        while (start < data.Length)
        {
            var charCount = Math.Min(charBuffer.Length, data.Length - start);

            data.CopyTo(start, charBuffer, 0, charCount);
            var bytes = encoder.GetBytes(charBuffer, 0, charCount, buffer, 0, false);
            dataStream.Write(buffer, 0, bytes);
            start += charCount;
        }
        dataStream.Flush();
    }
}

Because all of this lovely code can be replaced with a simple:

public static void WriteDataToRequest(HttpWebRequest req, string data)
{
    // Use a BOM-less UTF8, otherwise the StreamWriter emits a three byte
    // preamble and the body no longer matches the ContentLength we set.
    var utf8NoBom = new UTF8Encoding(false);
    req.ContentLength = utf8NoBom.GetByteCount(data);

    using (var dataStream = req.GetRequestStream())
    using (var writer = new StreamWriter(dataStream, utf8NoBom))
    {
        writer.Write(data);
        writer.Flush();
    }
}

And that is so much better.


Stupid smart code

We had the following code:

public static void WriteDataToRequest(HttpWebRequest req, string data)
{
    var byteArray = Encoding.UTF8.GetBytes(data);

    req.ContentLength = byteArray.Length;

    using (var dataStream = req.GetRequestStream())
    {
        dataStream.Write(byteArray, 0, byteArray.Length);
        dataStream.Flush();
    }
}

And that is a problem, because it allocates the memory twice, once for the string, once for the buffer. I changed that to this:

public static void WriteDataToRequest(HttpWebRequest req, string data)
{
    var byteCount = Encoding.UTF8.GetByteCount(data);
    req.ContentLength = byteCount;
    using (var dataStream = req.GetRequestStream())
    {
        if(byteCount <= 0x1000) // small size, just let the system allocate it
        {
            var bytes = Encoding.UTF8.GetBytes(data);
            dataStream.Write(bytes, 0, bytes.Length);
            dataStream.Flush();
            return;
        }

        var buffer = new byte[0x1000];
        var maxCharsThatCanFitInBuffer = buffer.Length / Encoding.UTF8.GetMaxByteCount(1);
        var charBuffer = new char[maxCharsThatCanFitInBuffer];
        int start = 0;
        var encoder = Encoding.UTF8.GetEncoder();
        while (start < data.Length)
        {
            var charCount = Math.Min(charBuffer.Length, data.Length - start);

            data.CopyTo(start, charBuffer, 0, charCount);
            var bytes = encoder.GetBytes(charBuffer, 0, charCount, buffer, 0, false);
            dataStream.Write(buffer, 0, bytes);
            start += charCount;
        }
        dataStream.Flush();
    }
}

And I was quite proud of myself.

Then I realized that I was stupid. Why?


I am turning 0x1E tomorrow

In hex, I am still a teenager Smile.

To celebrate, starting from December 20th all the way to the new year, I have decided to offer a 30% discount on all the profilers. All you need to do is use the following coupon code:

01E-45K2D46V6K

The offer is valid for:


The best argument for scale out

I am writing a presentation, and I thought it would be interesting to get some numbers:

  • PowerEdge T110 II (Basic) – 8 GB, 3.1 GHz Quad 4T: $1,350.00
  • PowerEdge T110 II (Basic) – 32 GB, 3.4 GHz Quad 8T: $12,103.00
  • PowerEdge C2100 – 192 GB, 2 x 3 GHz: $19,960.00
  • IBM System x3850 X5 – 8 x 2.4 GHz, 2,048 GB: $645,605.00
  • Blue Gene/P – 14 teraflops, 4,096 CPUs: $1,300,000
  • K Computer (fastest supercomputer) – 10 petaflops, 705,024 cores, 1,377 TB: $10,000,000 annual operating cost (no data on actual cost to build)

And then what?


Win Free Copies of NHibernate 3 Beginner's Guide

I have teamed up with Packt Publishing and we are organizing a giveaway in which three lucky winners stand a chance to win a copy of the NHibernate 3 Beginner’s Guide.

Overview of NHibernate 3 Beginner's Guide

clip_image002

  • Clear, precise step-by-step directions to get you up and running quickly
  • Test, profile, and monitor data access to tune the performance and make your applications fly
  • Reduce hours of application development time and get better application architecture and performance

Read more about this book and download a free sample chapter.

How to Enter?

All you need to do is head on over to the book page, look through the product description of this book, and drop a line via the comments below to let us know what interests you the most about this book. It’s that simple!

Product description for NHibernate book: http://www.packtpub.com/nhibernate-3-beginners-guide/book

Winners from the U.S. and Europe can either choose a physical copy of the book or the eBook. Users from other locales are limited to the eBook only.

Deadline


The contest will close on 31/12/11 PT. Winners will be contacted by email, so be sure to use your real email address when you comment!


Rhino Service Bus & RavenDB integration

One of the interesting things about Rhino Service Bus is that I explicitly designed it to work nicely with Unit of Work style data access libraries. When I did that, I worked mainly with NHibernate, but it turns out that it is really easy to integrate RavenDB as well; all you need is the following message module:

public class RavenDbMessageModule : IMessageModule
{
    private readonly IDocumentStore documentStore;

    [ThreadStatic]
    private static IDocumentSession currentSession;

    public static IDocumentSession CurrentSession
    {
        get { return currentSession; }
    }

    public RavenDbMessageModule(IDocumentStore documentStore)
    {
        this.documentStore = documentStore;
    }

    public void Init(ITransport transport, IServiceBus serviceBus)
    {
        transport.MessageArrived += TransportOnMessageArrived;
        transport.MessageProcessingCompleted += TransportOnMessageProcessingCompleted;
    }

    public void Stop(ITransport transport, IServiceBus serviceBus)
    {
        transport.MessageArrived -= TransportOnMessageArrived;
        transport.MessageProcessingCompleted -= TransportOnMessageProcessingCompleted;
    }

    private static void TransportOnMessageProcessingCompleted(CurrentMessageInformation currentMessageInformation, Exception exception)
    {
        if (currentSession != null)
        {
            if (exception == null)
                currentSession.SaveChanges();
            currentSession.Dispose();
        }
        currentSession = null;
    }

    private bool TransportOnMessageArrived(CurrentMessageInformation currentMessageInformation)
    {
        if (currentSession == null)
            currentSession = documentStore.OpenSession();
        return false;
    }
}

This is fairly simple. We register for the message arrived and message processing completed events. When a message arrives, we create a new session for the consumers. When the message processing completes without error, we call SaveChanges, and then dispose of the session.
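The commit-on-success / always-dispose lifecycle is the heart of the module, so here is a self-contained sketch of the same pattern, with a FakeSession standing in for IDocumentSession (no bus required; the names are mine, not Rhino Service Bus's):

```csharp
using System;

// FakeSession stands in for IDocumentSession: it just records whether
// SaveChanges and Dispose were called.
public class FakeSession : IDisposable
{
    public bool SavedChanges { get; private set; }
    public bool Disposed { get; private set; }
    public void SaveChanges() { SavedChanges = true; }
    public void Dispose() { Disposed = true; }
}

public static class UnitOfWorkSketch
{
    [ThreadStatic]
    private static FakeSession currentSession;

    // Mirrors the module: open a session when a message arrives, commit
    // only if the handler succeeded, always dispose.
    public static FakeSession ProcessMessage(Action handler)
    {
        currentSession = new FakeSession(); // MessageArrived
        Exception error = null;
        try
        {
            handler();
        }
        catch (Exception e)
        {
            error = e;
        }
        // MessageProcessingCompleted
        var session = currentSession;
        if (error == null)
            session.SaveChanges(); // commit only on success
        session.Dispose();
        currentSession = null;
        return session;
    }
}
```

An exception from any consumer skips SaveChanges, so a poisoned message doesn't leave half-written documents behind.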

The rest of it is pretty simple as well, we need to provide a BootStrapper:

public class BootStrapper : CastleBootStrapper
{
    IDocumentStore store;

    protected override void ConfigureContainer()
    {
        store = new DocumentStore
        {
            ConnectionStringName = "RavenDB"
        }.Initialize();

        IndexCreation.CreateIndexes(typeof(BootStrapper).Assembly, store);

        Container.Register(
            Component.For<IDocumentStore>()
                .Instance(store),
            Component.For<IMessageModule>()
                .ImplementedBy<RavenDbMessageModule>(),
            Component.For<IDocumentSession>()
                .UsingFactoryMethod(() => RavenDbMessageModule.CurrentSession)
            );

        base.ConfigureContainer();
    }
}

This basically just needs to create the document store and expose it to the container. We resolve the document session from the current one (the one managed by the module).

All in all, it is quite neat, and it takes very little time / complexity to set up.

Implementing RavenDB Indexes

I got a couple of interesting questions about RavenDB implementation, and I thought it would make a good blog post.

Is it somewhat correct to say: When doing a map and reduce you use a dynamic or static query-index (not sure what you call them) that has compiled a c# class which will be used when deserializing the JSON to this class. This is done at server side, in memory, right? You run the query against the Lucene indexes and then deserialize the JSON and apply the projection/reduce to create the result?

No, this isn’t the case. We take the linq expression that makes up the index definition, and then we compile that as a class. That class doesn’t represent the documents we index; it represents the indexing operation. It would probably be easier to explain with an example. Here is a simple index definition:

from user in docs.Users select new { user.Name }

RavenDB is going to take this code and translate into something like this:

using Raven.Abstractions;
using Raven.Database.Linq;
using System.Linq;
using System.Collections.Generic;
using System.Collections;
using System;
using Raven.Database.Linq.PrivateExtensions;
using Lucene.Net.Documents;
public class Index_MyIndex : AbstractViewGenerator
{
    public Index_MyIndex()
    {
        this.ViewText = @"from user in docs.Users select new { user.Name }";
        this.ForEntityNames.Add("Users");
        this.AddMapDefinition(docs => from user in docs
            where user["@metadata"]["Raven-Entity-Name"] == "Users"
            select new { user.Name, __document_id = user.__document_id });
        this.AddField("__document_id");
        this.AddField("Name");
        this.AddQueryParameterForMap("__document_id");
        this.AddQueryParameterForMap("Name");
        this.AddQueryParameterForReduce("__document_id");
        this.AddQueryParameterForReduce("Name");
    }
}

There is a lot going on in here, but most of it is just internal bookkeeping for RavenDB. The important thing is the AddMapDefinition call. You can see that we have taken the index definition, processed it a bit, and then we treat it like a lambda. That is how RavenDB is able to go from having an index as text to executing that index in memory.

Note that this has nothing whatsoever to do with deserialization. That is handled by another part of RavenDB, where we use the dynamic feature (along with a host of other stuff) to make it possible to run linq queries over schema less information.

Another thing to remember is that indexing runs over the documents stored in the database (the input) and the results go to Lucene (the output). We never read information from Lucene as input for an index.

Could you describe how a drop of a property and a rename of a property will affect the Query-indexes?

If the index isn’t modified, it will try to index a missing property. That is basically a no-op. In fact, we can even do nested indexing into a missing property and still have no issues, because the indexing code inside RavenDB uses Null Objects for most things, so you can do things like user.HelloWorld.NiceToMeetYou.Too and that would basically be translated into a “don’t index me” value, instead of throwing.
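The Null Object trick is easy to sketch in plain C# with DynamicObject (this is my simplified stand-in, not RavenDB's actual implementation): every member access returns the same placeholder, so arbitrarily deep paths over missing properties never throw:

```csharp
using System;
using System.Dynamic;

// A minimal Null Object: any member access returns the same instance,
// so arbitrarily deep paths over missing properties never throw - they
// just yield a "don't index me" marker.
public class NullObject : DynamicObject
{
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        result = this; // keep returning ourselves, at any depth
        return true;
    }

    public override string ToString()
    {
        return "<missing>";
    }
}

public static class NullObjectDemo
{
    public static string Probe()
    {
        dynamic missing = new NullObject();
        // Deep access over properties that do not exist - no exception.
        return missing.HelloWorld.NiceToMeetYou.Too.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Probe()); // <missing>
    }
}
```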

For more general information about RavenDB migrations, you can see:


Async tests in Silverlight

One of the things that we do is build a lot of stuff in Silverlight; usually, those things are either libraries or UI. Testing Silverlight was always a problem, but at least there is an OOTB solution for that.

Unfortunately, the moment that you start talking about async tests (for example, when you want to run a web server to check things), you need to use EnqueueCallback, EnqueueConditional and other constructs that make the test nearly impossible to read.

Luckily for us, Christopher Bennage stopped here for a while and created a solution.

It allows you to take the following sync test:

[Fact]
public void CanUpload()
{
    var ms = new MemoryStream();
    var streamWriter = new StreamWriter(ms);
    var expected = new string('a',1024);
    streamWriter.Write(expected);
    streamWriter.Flush();
    ms.Position = 0;

    var client = NewClient(); 
    client.UploadAsync("abc.txt", ms).Wait();

    var downloadString = webClient.DownloadString("/files/abc.txt");
    Assert.Equal(expected, downloadString);
}

And translate it to:

[Asynchronous]
public IEnumerable<Task> CanUpload()
{
    var ms = new MemoryStream();
    var streamWriter = new StreamWriter(ms);
    var expected = new string('a', 1024);
    streamWriter.Write(expected);
    streamWriter.Flush();
    ms.Position = 0;

    yield return client.UploadAsync("abc.txt", ms);

    var async = webClient.DownloadStringTaskAsync("/files/abc.txt");
    yield return async;

    Assert.AreEqual(expected, async.Result);
}

It makes things so much easier. To set this up, just reference the project and add the following in the App.xaml.cs file:

private void Application_Startup(object sender, StartupEventArgs e)
{
    UnitTestSystem.RegisterUnitTestProvider(new RavenCustomProvider());
    RootVisual = UnitTestSystem.CreateTestPage();
}

And you get tests that are now easy to write and run in Silverlight.

Setting up a Rhino Service Bus Application: Part II–One way bus

One really nice feature of Rhino Service Bus is the notion of the one way bus. What is that? It is a miniature implementation of the bus that supports only sending messages, not receiving them. In what world is this useful?

It turns out, quite a few. A one way bus is usually used for web apps that just send commands / events to another system and have no need to interact with the bus beyond that, or for command line tools that just send a message, etc. The advantage of the one way bus is that you don’t need your own endpoint to use it; you can just start it, send some messages, and go away.

Here is how you set it up. As usual, we start from the configuration (this assumes you have the Rhino.ServiceBus.Castle nuget package):

<?xml version="1.0"?>
<configuration>
  <configSections>
    <section name="rhino.esb" type="Rhino.ServiceBus.Config.BusConfigurationSection, Rhino.ServiceBus"/>
  </configSections>
  <rhino.esb>
    <messages>
      <add name="HibernatingRhinos.Orders.Backend.Messages"
           endpoint="msmq://localhost/Orders.Backend"/>
    </messages>
  </rhino.esb>
</configuration>

You might note that we don’t have any bus/endpoint configuration, only the list of message owners.

Now, the next step is just to create the actual one way bus:

var container = new WindsorContainer();
new OnewayRhinoServiceBusConfiguration()
    .UseCastleWindsor(container)
    .Configure();

var onewayBus = container.Resolve<IOnewayBus>();
onewayBus.Send(new TestMsg
{
    Name = "ayende"
});

And that is all…