Ayende @ Rahien

Hi!
My name is Oren Eini
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969


Implementing a document database: simple queries


And now this passes:

public class PerformingQueries
{
    const string query = @"
var pagesByTitle = 
from doc in docs
where doc.type == ""page""
select new { Key = doc.title, Value = doc.content, Size = (int)doc.size };
";

    [Fact]
    public void Can_query_json()
    {
        var serializer = new JsonSerializer();
        var docs = (JArray)serializer.Deserialize(
                new JsonTextReader(
                    new StringReader(
                        @"[
{'type':'page', title: 'hello', content: 'foobar', size: 2},
{'type':'page', title: 'there', content: 'foobar 2', size: 3},
{'type':'revision', size: 4}
]")));
        var compiled = new LinqTransformer(query, "docs", typeof(JsonDynamicObject)).Compile();
        var compiledQuery = (AbstractViewGenerator<JsonDynamicObject>)Activator.CreateInstance(compiled);
        var actual = compiledQuery.Execute(docs.Select(x => new JsonDynamicObject(x)))
            .Cast<object>().ToArray();
        var expected = new[]
        {
            "{ Key = hello, Value = foobar, Size = 2 }",
            "{ Key = there, Value = foobar 2, Size = 3 }"
        };

        Assert.Equal(expected.Length, actual.Length);
        for (var i = 0; i < expected.Length; i++)
        {
            Assert.Equal(expected[i], actual[i].ToString());
        }
    }
}

You wouldn’t believe how much effort it took, and all in all, implementing this is about 500 lines of code or so.

It depends on what you are optimizing for…


Today I was writing this code:

public class FakeRandomValueGenerator : IRandomValueGenerator
{
    private readonly int valueToReturn;

    public FakeRandomValueGenerator(int valueToReturn)
    {
        this.valueToReturn = valueToReturn;
    }

    public int Next(int min, int max)
    {
        return valueToReturn;
    }
}

This caused some concern to my pair, who asked me why I was hand rolling a stub instead of using a mocking framework. My answer was simple.

It will take me less time to write the class than it would take me to bring up the Add Reference dialog.

Challenge: probability based selection


Here is an interesting problem that I just ran into. I need to select a value from a (small) set based on percentage. That seems like it would be simple, but for some reason I can’t figure out an elegant way of doing this.

Here is my current solution:

var chances = new Page[100];
int index = 0;
foreach (var page in pages)
{
    for (int i = index; i < index + page.PercentageToShow; i++)
    {
        chances[i] = page;
    }
    index += page.PercentageToShow;
}
return chances[new Random().Next(0, 100)];

This satisfies the requirement, but it is… not as elegant as I would wish it to be.

I may have N values, for small N. There isn’t any limitation on the percentage allocation, so we may have (50%, 10%, 12%, 28%). We are assured that the numbers will always add up to 100.
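
One possible alternative, offered only as a sketch and not as the answer the post settled on, is to walk the pages while keeping a running total of the percentages and return the first page whose upper bound exceeds the random roll. The PageSelector class and the Name property are hypothetical; only Page and PercentageToShow come from the code above, and the sketch assumes the percentages add up to 100.

```csharp
using System;
using System.Collections.Generic;

public class Page
{
    public string Name { get; set; }            // hypothetical, for illustration only
    public int PercentageToShow { get; set; }
}

public static class PageSelector
{
    // roll is expected to be in [0, 100), e.g. new Random().Next(0, 100).
    public static Page Select(IEnumerable<Page> pages, int roll)
    {
        int upperBound = 0;
        foreach (var page in pages)
        {
            // Each page owns the range [previous upperBound, upperBound).
            upperBound += page.PercentageToShow;
            if (roll < upperBound)
                return page;
        }
        // Only reachable if the percentages add up to less than roll + 1.
        throw new InvalidOperationException("Percentages must add up to 100.");
    }
}
```

This avoids allocating the 100-element array, at the cost of a linear scan over the pages, which should not matter for small N.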

Microsoft Connect - FAIL (yet again)


I really think that Microsoft should close Connect. Because it is pretty obvious that they aren't managing that properly.

Let us take a look at yet another interesting bug report. This is related to a bug in System.Data that makes using System.Transactions even trickier than you would initially believe.

It was acknowledged as a bug by Miguel Gasca (from Microsoft), and a Connect issue was reported.

That was in 2007(!). It was resolved a month later, by "Microsoft", as an "External" issue.

That bug is still here today, two years later, and still impacting customers. That is after a full release and SP1. The situation is that now I have to work around this bug because Microsoft cannot manage its own bugs database in a way that would allow it to... oh, I don't know, fix bugs!

FAIL

And you know what, I wouldn't be annoyed with this if this wasn't an ongoing pattern with Connect.

FAIL

M is to DSL as Drag & Drop is to programming


I have been sitting on this post for a while now, because that was my first impression after seeing Oslo & M and all the hype around it from the PDC. To be frank, I had a hard time believing my own gut feeling. I kept having the feeling that I am missing something, which is why I avoided talking about this so far.

But, as time passed, and as we started to see more and more about Oslo and M, it validated my initial thinking. Now, just to be clear, I don’t intend to even touch on the whole of Oslo in this post. I don’t have a problem stating that I still don’t see the whole point there, but that is beside the point (no pun intended). What I would like to talk about is the M language, its usage, and the DSL that Microsoft shows as samples.

I see M as a whole lot of effort trying to optimize something that is really not that interesting, complex or really very hard. I look at the M language, the way that you work with it, the tooling and the API, and I would fully agree that it is a nice parser generator.

What it is not, I have to say, is a DSL toolkit. It is just one, small, part of building a DSL. And, to be perfectly honest, M is the drag & drop of DSL. It looks good, at first glance, but then you dig just a little deeper and you see what is actually going on, and you realize that you are probably not where you wanted to be. I see it as trying very hard to optimize opening the car’s door. While I assume that there is someone in the world for whom optimizing the opening of a car door is crucial, I don’t really see it as an important feature. More to the point, it has negligible effect on the time taken for the primary task for which we use a car, the actual driving!

Why am I saying that?

Well, M is used for defining the syntax of the language, which is what most people look at. It does a good job there, but it also stops there. And there is a lot of stuff other than the syntax that you really care about.

Here is a snippet from MisBehave, which was an attempt to build a BDD framework on top of M:

image

Pretty impressive syntax, right?

The problem is that there isn’t really a good way to take this and translate that into something that is executable. Not without doing a lot of work. And that is why I am saying that M isn’t really an important piece of the stack. The actual syntax definition isn’t really that important. It is all the other things that you do with the DSL that matter.

Let us take a look at MUrl:

image

I am looking at this, and after looking at the source code, I still can’t figure out the point.

Yes, this is a demo DSL. But it is a good example that shows how you can completely miss the point with regards to a DSL. What problem does this DSL solve? What benefits do I get from integrating that into my system?

How does this help me solve a real problem?

The answer is that it doesn’t. The only remotely useful case that I can think of is if I really want to be able to issue REST calls from the command line, and even then, there are better ways of doing that on the command line. MUrl is an exercise in abstraction for the sake of abstraction. More than that, it gives the impression that it is something that it is not.

If you want to show me a DSL, show me one that has logic, not one that is a glorified serialization format. That is the sweet spot for a DSL, to extract policy decisions from your systems, so you can work with them at a higher level and have an easier time making changes.

M is not a language for creating DSL. It is a language to define a serialization format, that is all.

I don’t do errors, and my name isn’t Google


For some reason, a lot of people come to me with a phrase that is like fingernails on a blackboard to me: “I have an error”.

I try hard to be polite, so I usually don’t say what comes to mind immediately. Usually a variation on any of the following:

  • Good for you
  • That is nice
  • So do I, let's get them together and see if we can breed them
  • And what do you want me to do about it?
  • ARGH!

The main problem is that this is usually all that is said. So it is a declaration of totally useless information.

Something that I find almost as offensive is asking me for information that is easily findable in Google.

Here is an example that demonstrates both of those issues:

image

I am not going to actually comment further. Suffice to say that this post is my way of taking out the frustration of having to do this yet again. And don’t get too gung ho on the actual example, that isn’t actually that important.

I wonder if I can get a title of Chief Google Searcher or something like that.

MEF & Open Generic Types


I read Glenn's post about MEF not supporting open generic types with something resembling shock. The idea that it isn't supporting this never even crossed my mind; it was a given that this is a mandatory feature for any container in the .NET land.

Just to give you an idea, what this means is that you can't register Repository<T> and then resolve Repository<Order>. In 2006, I wrote an article for MSDN detailing what has since become a very common use of this pattern. Generic specialization is not something that I would consider optional, it is one of the most common usage patterns of containers in the .NET land. IRepository<T> is probably the most common example that people quote, but there are others as well.

This is not a simple feature, let me make it clear. Not simple at all. I should know, I implemented that feature for both Object Builder and Windsor. But it is not one that I would consider optional.
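
To make the Repository<T> case concrete, here is a minimal sketch of what resolving against an open generic registration involves when the container keeps CLR Type information. TinyContainer is a hypothetical toy written for this illustration; it is not the API of MEF, Windsor or Object Builder, and it assumes the implementation takes the same generic arguments, in the same order, as the service and has a parameterless constructor.

```csharp
using System;
using System.Collections.Generic;

public interface IRepository<T> { }
public class Repository<T> : IRepository<T> { }
public class Order { }

// Hypothetical toy container, for illustration only.
public class TinyContainer
{
    private readonly Dictionary<Type, Type> registrations = new Dictionary<Type, Type>();

    public void Register(Type service, Type implementation)
    {
        registrations[service] = implementation;
    }

    public object Resolve(Type service)
    {
        Type implementation;
        if (registrations.TryGetValue(service, out implementation))
            return Activator.CreateInstance(implementation);

        // No exact match: fall back to the open generic definition,
        // e.g. IRepository<Order> -> IRepository<>.
        if (service.IsGenericType &&
            registrations.TryGetValue(service.GetGenericTypeDefinition(), out implementation))
        {
            // Close the open implementation over the requested type arguments.
            var closed = implementation.MakeGenericType(service.GetGenericArguments());
            return Activator.CreateInstance(closed);
        }

        throw new InvalidOperationException("No registration for " + service);
    }
}
```

With this in place, registering typeof(IRepository<>) against typeof(Repository<>) lets a later Resolve(typeof(IRepository<Order>)) produce a Repository<Order>. The fallback step depends on Type.GetGenericTypeDefinition and Type.MakeGenericType, which is the kind of information a purely string-keyed lookup does not carry.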

I am even more disturbed by the actual reasoning behind not supporting this. It is a technical limitation of MEF, because internally all components are resolved by string matching rather than by CLR Types. This decision severely limits the things that MEF can do. Not supporting what is (in my opinion) a pretty crucial feature is one example of that, but there are other implications. It means that you can't really do things like polymorphic resolution, and that your choices in extending the container are very limited, because the container isn't going to carry the information that is required to make those decisions.

I would advise the MEF team to rethink the decision to base the component resolution on strings. At this point in time, it is still possible to change things (and yes, I know it isn't as easy as I make it seem), because not supporting open generic types is bad, but not having the ability to do so, and the reason for that (not keeping CLR Type information), are even worse. I get that MEF needs to work with DLR objects as well, but that means that MEF makes the decision to have lousier support for CLR idioms for the benefit of the DLR.

Considering the usage numbers for both of them, I can't see this being a good decision. It is certainly possible to support them both, but if there are any tradeoffs that have to be made, I would suggest that it should be the DLR, and not the CLR, that gets the second-class role.

NH-1711 – Inappropriate error handling with NH 2.1 Alpha 1 when distributed transaction fails can cause application crashes


The actual problem has been fixed and it will be part of NH 2.1 Alpha 2. That is why we call them alphas, after all :-)

The actual bug is a pretty convoluted mess, to be frank. And it is no wonder that it slipped by me. Yes, I am the one responsible for that, so I guess I am making excuses. Let me tell you about the actual scenario. When you are using NHibernate 2.1 Alpha 1 (it does not affect NHibernate 2.0 or 2.0.1) with a System.Transactions.Transaction, there is a slightly different code path that we have to go through, because the actual work that has to be done is no longer controlled by NHibernate, but by the DTC infrastructure.

So far, so good. Most of the time, this work is done on the same thread that the application is running on. There are cases, specifically when using a DTC with multiple durable enlistments, in which the actual work is done on a worker thread. The problem is that if there is an error during that phase, for example, if we are trying to execute an invalid command, or run into a transaction deadlock, NHibernate wouldn’t properly handle this error, and it would bubble up. The result of an unhandled thread exception is, of course, an application crash.

That is considered to be a bad thing, I understand, so after being able to isolate the problem, I went ahead and fixed this. You can get the trunk now and get the fix, or wait until Alpha 2 is released.

Who is impacted by this?

You have to use multiple different durable enlistments inside a distributed transaction for the error condition to even be applicable. The problem is that there is one very common scenario that will run into this every single time. The .NET Service Buses all wrap their processing in a TransactionScope, and then tend to have multiple durable enlistments (the DB and MSMQ). This means that if you are using NServiceBus, Rhino ServiceBus or MassTransit alongside NHibernate 2.1 Alpha 1, you are probably impacted by this issue.

As I mentioned, a fix has already been committed (r4149) and it will be part of NHibernate 2.1 Alpha 2.

3rd party integration assumption - other system was written by a drunken monkey typing with his feet


If you have heard me speak, you are probably aware that I tend to use this analogy a lot. Any 3rd party system that I have to integrate with was written by a drunken monkey typing with his feet.

So far, I am sad to say, that assumption has been quite accurate over a large range of projects.

You are probably already familiar with concepts of System Boundary and Anti Corruption Layer and how to apply them in order to keep the crazy monkey shtick away from your system, so I am not going to talk about those.

What I do want to talk about now is something slightly different, it is not related to the actual 3rd party system itself, it is related to its management.

One of the more annoying things about 3rd party stuff is that it is usually broken in… interesting ways. So you really want to be able to test that thing during the integration phase, so you would know what to expect. That seems like a very simple concept, right? All you have to do is to hit the QA env. and have a ball. That brings us back to the system management, and the really big screw ups that are happening there.

Before we continue, I want to state that when I am integrating with a 3rd party, I usually do so for some business reason, so it tends to be the case that I want to give them money, customers, pay per operation, or something else that is to the benefit of that 3rd party! Making integration with your software hard has a direct effect on the number of people who are trying to give you money!

That is why I am so surprised by the amount of trouble that you have to go through in some organizations (I would say, the majority of organizations). Here is a partial list of things that I ran into recently:

  • Having no QA env. – We were told to basically just work off of the spec and push it to production. You can imagine how big a success that was.
  • Having a QA env. that was a different version than the one in production, and not telling the people who are integrating with you anything about that.
  • A variation of the above: having a QA env. that is significantly different from production.
  • Having a QA env. that has real world effects. For example, if I am testing my bulk mail integration, I do not expect all those emails to be actually sent! Or, in another memorable case, if I am integrating with a merchant provider and testing authorization, I do not want to see those in my real credit report!
  • Having a QA env. that requires frequent human intervention. For example:
    • In order to validate that your integration has been successful, you have to call someone and wait until they verify that yes, the appropriate values are there in the appropriate system. Each and every time.
    • The integration is a one way operation (imagine something like CreateUser, which would fail if it already exists), and you cannot use any dummy values (imagine that you need to pass a valid credit card to that function), you have to have real ones. So every time you test the integration, you have to call someone and have them reset that information.
  • Having a QA env. that is down for two weeks just as we were supposed to test the integration.

That is why I am saying that the system management is such a crucial thing. And why I am so surprised and disappointed to see so many organizations get it wrong in ways that are so not funny.

If you are building a system that people will integrate with, consider, as soon as possible, the implications of not having a good testing environment for your system. As a matter of fact, I suggest building QA hooks from day one, so you can pass a flag to the system that would tell it “this is only for tests”, which would mean that any external action on the system would be omitted, but all the logic and behavior are retained. 
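
As a rough sketch of such a QA hook, assuming a bulk mail service like the one in the list above: the caller passes a testOnly flag, the same validation logic runs either way, and only the external action (actually sending the email) is skipped. BulkMailService, SendAll and the send delegate are hypothetical names invented for this illustration, and the "@" check is a stand-in for whatever real logic the system would apply.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical service, for illustration only.
public class BulkMailService
{
    private readonly Action<string, string> send; // the external action: (recipient, body)

    public BulkMailService(Action<string, string> send)
    {
        this.send = send;
    }

    // Returns the recipients that passed validation. When testOnly is true,
    // the same validation runs but nothing is handed to the external sender.
    public List<string> SendAll(IEnumerable<string> recipients, string body, bool testOnly)
    {
        var accepted = new List<string>();
        foreach (var recipient in recipients)
        {
            // Stand-in validation; identical in both modes.
            if (string.IsNullOrEmpty(recipient) || !recipient.Contains("@"))
                continue;

            accepted.Add(recipient);
            if (!testOnly)
                send(recipient, body);
        }
        return accepted;
    }
}
```

An integrator can then exercise the whole call path and inspect the result without any real emails going out.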
