Ayende @ Rahien

Refunds available at head office

Silverlight Queues: Design

I got a question about the feasibility of porting Rhino Queues to Silverlight, but that isn’t really something that can work in that scenario. So I sat down to write the design for a Silverlight queuing system. Just to be clear, I currently have no intention of actually building this (well, not unless someone is willing to pay for it or I get a project that requires it). This is merely a way to get the design problem out of my head.

Why can’t we just port Rhino Queues to Silverlight?

There are two reasons: one is technical, the second relates to the different way that Silverlight applications are used. The first problem is that Rhino Queues requires a way to listen for incoming messages, but Silverlight offers no way of doing so (reasonably so, FWIW, since that would turn the Silverlight app into a server, and there are probably security issues with that).

The second problem is simply that, in the way most Silverlight solutions are structured, there is going to be a WAN between the Silverlight app and the server, and a WAN between the clients. Direct connections between the server & a Silverlight client are likely to be possible only over HTTP, and there isn’t going to be any direct communication between different clients.

This usually looks something like this:

image

Rhino Queues is built on the notion of independent agents, each of them containing a full queuing stack. That model will not work for Silverlight. In Silverlight, we need to support all the usual queuing features, but each agent (Silverlight application) is no longer independent.

The Silverlight Queues model will be slightly different. We introduce a Queues server component which will hold the queues & messages internally.

image

On the client side, the process of sending messages is:

  • Queues.Send(“server”, “Orders”, new Buy{ … });
  • The message is saved to isolated storage.
  • A background thread sends the messages from isolated storage to the Queues server.
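The send path above can be sketched roughly like this. This is a sketch only: the `QueuesClient` API, the local store, and the pump loop are my assumptions, and a real Silverlight client would persist messages to `IsolatedStorageFile` and ship them with an async HTTP POST.

```csharp
// Hypothetical sketch of the client-side send path: messages are
// persisted locally first, then shipped by a background sender.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public class OutgoingMessage
{
    public string Server;
    public string Queue;
    public byte[] Payload;
}

public class QueuesClient
{
    // Stands in for isolated storage; a real Silverlight client would
    // serialize each message to a file in IsolatedStorageFile.
    private readonly ConcurrentQueue<OutgoingMessage> localStore =
        new ConcurrentQueue<OutgoingMessage>();

    public readonly List<OutgoingMessage> Sent = new List<OutgoingMessage>();

    public void Send(string server, string queue, byte[] payload)
    {
        // Queues.Send(...) saves the message durably first, so the UI
        // thread never waits on the network.
        localStore.Enqueue(new OutgoingMessage { Server = server, Queue = queue, Payload = payload });
    }

    // A background thread drains the local store to the Queues server.
    public void PumpOnce()
    {
        OutgoingMessage message;
        while (localStore.TryDequeue(out message))
        {
            // In a real client this would be an async HTTP POST to the
            // Queues server, removing the message from isolated storage
            // only after the server confirms receipt.
            Sent.Add(message);
        }
    }
}
```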

The process of receiving messages is:

  • A background thread periodically polls the server for new messages (an alternative is Comet-style push, using something like WebSync).
  • Messages are written to isolated storage.
  • The Silverlight application acknowledges receipt of the messages.
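The receive loop can be sketched in the same spirit. Again, the `IQueuesServer` interface and the method names are assumptions of mine, not a real API; the point is the ordering: persist locally, then acknowledge.

```csharp
// Hypothetical sketch of the client-side receive loop: poll, persist,
// then acknowledge so the server can forget the delivered messages.
using System;
using System.Collections.Generic;

public interface IQueuesServer
{
    List<string> FetchPending(string client);   // assumed server call
    void Acknowledge(string client, int count); // assumed server call
}

public class QueuesReceiver
{
    private readonly IQueuesServer server;

    // Stands in for isolated storage on the Silverlight client.
    public readonly List<string> LocalStore = new List<string>();

    public QueuesReceiver(IQueuesServer server) { this.server = server; }

    // Run periodically from a background thread (or triggered by a
    // Comet-style push such as WebSync).
    public void PollOnce(string client)
    {
        var messages = server.FetchPending(client);
        if (messages.Count == 0) return;
        LocalStore.AddRange(messages);              // write locally first
        server.Acknowledge(client, messages.Count); // only then acknowledge
    }
}
```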

There are some twists here, though.

In a Silverlight application, there are going to be several possible use cases:

  • Each Silverlight application has its own “queue” in the Queues server. That allows a one-way messaging platform with point-to-point notification. That is likely to be common in Silverlight applications that handle business transactions.
  • Topics – in addition to a queue per client, we might also want to allow subscriptions to topics. For example, I may be interested in getting notifications when the system settings have changed. That is something that is shared among all (or a lot of) clients. The Queues server should support this as well.

An interesting approach to making the Queues server even more stateless is to remove the acknowledgement step and change it to “give me all messages after…”
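The stateless variant amounts to replacing the acknowledgement call with a cursor that the client holds. The sequence-number scheme below is my assumption (an etag would work just as well); the server keeps no per-client state at all.

```csharp
// Hypothetical sketch: the client remembers the last sequence number it
// has seen, and the server just answers "give me all messages after X".
using System;
using System.Collections.Generic;
using System.Linq;

public class StatelessQueueServer
{
    private readonly List<string> messages = new List<string>();

    public void Publish(string message) { messages.Add(message); }

    // No per-client state on the server; the cursor lives on the client.
    public List<string> GetMessagesAfter(int lastSeen)
    {
        return messages.Skip(lastSeen).ToList();
    }
}

public class StatelessQueueClient
{
    public int LastSeen; // persisted in isolated storage between runs

    public List<string> Poll(StatelessQueueServer server)
    {
        var batch = server.GetMessagesAfter(LastSeen);
        LastSeen += batch.Count; // advance the cursor instead of acknowledging
        return batch;
    }
}
```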

Yes, I know that this is a very high level design, but there really isn’t much here that I would call very complex.

How much interest is there in a queuing system for Silverlight?

The false myth of encapsulating data access in the DAL

This is a question that I get routinely, both from random strangers and when I am at clients.

I would like to design a system/application using NHibernate. But I also want to be so flexible that in the future, if I unplug NHibernate and use the ADO.NET Entity Framework or another framework, my application should not crash.

In short, I am completely opposed to even trying to do something like that.

It is based on flawed assumptions

A lot of the drive behind this is historical, built in the time when data access layers directly accessed a database using its own dialect, resulting in the need to create just such an encapsulation in order to support multiple databases.

The issue with this drive is that it is no longer a factor; all modern OR/Ms can handle multiple databases effectively. Moreover, modern OR/Ms are no longer just ways to execute some SQL and get a result back, which is how old style DALs were written. An OR/M takes on a lot more responsibilities, from change tracking to cache management, from ensuring optimistic concurrency to managing optimal communication with the database.

And those features matter, a lot. Not only that, but they are different between each OR/M.

It doesn’t work, and you’ll find that out too late

The main problem is that no matter how hard you try, there are going to be subtle and not so subtle differences between different OR/Ms, those changes can drastically affect how you build your application.

Here are a few examples, using NHibernate and EF.

Feature                 NHibernate                 Entity Framework
Futures                 Yes                        No
Batching                Yes                        No
Transaction handling    Requires explicit code     Implicitly handled
Caching                 1st & 2nd level caching    1st level caching only

This isn’t intended to be NH vs. EF, and it doesn’t even pretend to be unbiased; I am simply pointing out a few examples of features that you can take advantage of, which can greatly benefit you in one situation and do not exist in the other.

It has a high cost

In order to facilitate this data access encapsulation, you have to do one of two things:

  • Use the lowest common denominator, preventing you from using the real benefits of the OR/M in question.
  • Bleed those features through the DAL, allowing you to make use of those features, but preventing you from switching at a later time.

Either of those adds complexity, reduces flexibility and creates confusion down the road. And in general, it still doesn’t work.

There are other barriers than the API

Here is an example from a real client, which insists on creating this encapsulation and hiding NHibernate inside their DAL. They ran into the previously mentioned problems, where there are NHibernate features specifically designed to solve some of their problems, but which they have a hard time using through their DAL.

Worse, from the migration perspective, most of the barrier to moving from NHibernate isn’t in the API. Their entity model makes heavy use of NHibernate’s <any/> feature, which a large percentage of other OR/Ms do not support. And that is merely the example that springs most forcibly to mind; there are others.

The real world doesn’t support it, even for the simplest scenarios

A while ago I tried porting the NerdDinner application to NHibernate. Just to point it out, that application has a single entity, and was designed with a nice encapsulation between the data access and the rest of the code. In order to make the port, I had to modify significant parts of the codebase. And that is about the simplest example there can be.

The role of encapsulation

Now, I know that some people will read this as an attack on encapsulation of data access, but that isn’t the case. By all means, encapsulate to your heart’s content. But the purpose of this encapsulation is important. Encapsulating to make things easier to work with? Great. Encapsulating so that you can switch OR/Ms? It won’t work, and it will be costly and painful.

So how do you move between OR/Ms?

There are reasons why some people want to move from one data access technology to another. I was involved in several such efforts, and the approach that we used in each of those cases was porting, rather than trying to drop in a new IDataAccess implementation.

EF Prof and Code Only

I just finished touching up a new feature for EF Prof: support for Entity Framework’s Code Only feature. What you see below is EF Prof tracking the Code Only Nerd Dinner example:

image

I tried to tackle the same thing in CTP3, but I was unable to resolve it. Using CTP4, it was about as easy as I could wish for.

Just for fun, the following screen shot looks like it contains a bug, but it doesn’t (at least, not to my knowledge). If you can spot what the bug is, I am going to hand you a 25% discount coupon for EF Prof. If you can tell me why it is not a bug, I would double that.

As an aside, am I the only one who is bothered by the use of @@IDENTITY by EF? I thought that we weren’t supposed to make use of that. Moreover, why write this complex statement when you can write SELECT @@IDENTITY?

A question of an untenable situation

One of the most common issues that I run into in my work is getting all sorts of questions that sound really strange. For example, I recently got a question that went something like this:

What is the impact of reflection on NHibernate’s performance?

I started to answer that there isn’t any impact that you would notice; even beyond the major optimizations that NH has in that area, you are accessing a remote database, which is much more expensive. But then they told me that they had profiled the application and found reflection showing up in the results.

I asked them what their scenario was, and the exchange went like this:

Well, when we load a million rows…

And that is your problem…

To be fair, they actually had a reasonable reason to want to do that. I disagree with the solution that they had, but it was a reasonable approach to the problem at hand.

Reviewing CommunityCourses: A RavenDB application

The code for CommunityCourses can be found here: http://github.com/adam7/CommunityCourses

This is a RavenDB application whose existence I learned about roughly an hour ago. The following is a typical code review about the application, not limited to just RavenDB.

Tests

When I first opened the solution, I was very happy to see that there are tests for the application, but I was disappointed when I actually opened the test project.

image

The only tests that exist seem to be the default ones that come with ASP.Net MVC.

I would rather have no test project than a test project like that.

Account Controller, yet again

Annoyed by the test project, I headed for my favorite target for frustration: the MVC AccountController and its horrid attempt at creating DI or abstractions.

Imagine my surprise when I found this lovely creature instead:

[HandleError]
public partial class AccountController : Controller
{
    public virtual ActionResult LogOn()
    {
        return View();
    }

    [AcceptVerbs(HttpVerbs.Post)]
    public virtual ActionResult LogOn(string userName, string password)
    {
        if (FormsAuthentication.Authenticate(userName, password))
        {
            FormsAuthentication.SetAuthCookie(userName, false);
            return RedirectToAction("Index", "Home");
        }
        else
        {
            ModelState.AddModelError("logon", "Invalid username or password");
            return View();
        }
    }

    public virtual ActionResult LogOff()
    {
        FormsAuthentication.SignOut();
        return RedirectToAction("Index", "Home");
    }
}

Yes: simple, working, uncomplicated! You don’t have to think when reading this code; I like it. That is how the AccountController should have been, by default.

The model

I then turned to the model. Community Courses is a RavenDB application, so I am very interested in seeing how this is handled. The first thing that I noticed was this:

image

That was interesting; I am not used to seeing static classes in the model. But then I looked into those classes, and it all became clear:

image

This is essentially a lookup class.

Then I switched to the rest of the model, the following image shows a partial view of the model, annotated a bit:

image

The classes with the Id property (highlighted) are presumed to be Root Entities (in other words, they would each reside in their own document). I am not absolutely sure that this is the case yet, but I am sure enough to point out a potential problem.

Did you notice that we have references to things like Address, Person and Centre in the model? They are marked with red and green points.

A green point is where we reference a class that doesn’t have an id, and is therefore considered to be a value type which is embedded in the parent document. A red point, however, indicates what I believe will be a common problem for people coming to RavenDB from an OR/M background.

RavenDB doesn’t support references (this is by design), and the result of referencing a Root Entity from another Root Entity is that the referenced entity is embedded inside the referencing document. This is precisely what you want for value types like Address, and precisely what you don’t want for references. You can see in TasterSession that there are actually two references to Tutor, one for the Tutor data and one for the TutorId. I think that this is an indication of hitting that problem.

For myself, I would prefer not to denormalize the entire referenced entity, but only the key properties that are needed for processing the referencing entity. That makes it easier to understand the distinction between the Person instance that is mapped to people/4955 and the Person instance that is held in Centre.Contact.
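One way to express that preference in the model is a small denormalized-reference type that carries only the id plus the key properties. The class and property names below are mine, not the application's; this is just a sketch of the shape:

```csharp
// Hypothetical sketch: instead of embedding a full copy of Person in a
// referencing document, embed only the id and the properties needed to
// display / process the referencing entity.
public class Person
{
    public string Id { get; set; }    // e.g. "people/4955"
    public string Name { get; set; }
    public string Email { get; set; } // stays only in the people/* document
}

public class PersonReference
{
    public string Id { get; set; }
    public string Name { get; set; }  // denormalized key property

    public static PersonReference From(Person person)
    {
        return new PersonReference { Id = person.Id, Name = person.Name };
    }
}

public class TasterSession
{
    public string Id { get; set; }
    public PersonReference Tutor { get; set; } // a reference, not a full copy
}
```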

Session management

Next on the list, because it is so easy to get wrong (I have seen so many flawed NHibernate session management implementations):

image

This is great: we have a session per request, which is what I would expect in a web application.

I might quibble with the call to SaveChanges, though. I like to make that one explicit, rather than implicit, but that is just that, a quibble.

Initializing RavenDB

This is worth speaking about (it happens in Application_Start):

image

RavenDB is pretty conventional, and Community Courses overrides one of those conventions to make it easier to work with MVC. By default, RavenDB will use ids like “people/391” and “centres/912”; by changing the identity parts separator, it will generate ids like “people-952” and “centres-1923”. The latter are easier to work with in MVC because they don’t contain a routing character.
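To illustrate what the convention does, here is a toy sketch of the id generation logic. This is not the actual RavenDB client code; in the real client this is a convention setting on the document store, and I won't swear to the exact property name.

```csharp
// Hypothetical sketch of the id generation convention: the collection
// name and the numeric part are joined by a configurable separator.
using System;

public class IdConventions
{
    public string IdentityPartsSeparator = "/"; // RavenDB's default

    public string GenerateId(string collection, int number)
    {
        return collection + IdentityPartsSeparator + number;
    }
}
```

With the default separator you get “people/391”; switching the separator to “-” yields “people-391”, which avoids the routing character in MVC URLs.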

Centre Controller

This is a simple CRUD controller, but it is worth examining nonetheless:

image

Those are all pretty simple. The only thing of real interest is the Index action, which uses a query on an index to get the results.

Currently the application doesn’t support paging, but it probably should, which wouldn’t complicate things all that much (adding Skip & Take, that is about it).

Next, the Create/Edit actions:

image

They are short & to the point; nothing much to do here. I like this style of coding very much. You could fall asleep while writing it. The only comment beyond that is that those methods are so similar that I would consider merging them into a single action.

Course Controller

Things are starting to get much more interesting here, when we see this method:

image

Overall, it is pretty good, but I am very sensitive to making remote calls, so I would change the code to make only a single remote call:

CourseViewModel ConvertToCourseViewModel(Course course)
{
    var ids = new List<string> { course.CentreId, course.TutorId, course.UnitId };
    if(course.VerifierId != null)
        ids.Add(course.VerifierId);
    ids.AddRange(course.StudentIds);

    var results = MvcApplication.CurrentSession.Load<object>(ids.ToArray());

    var courseViewModel = new CourseViewModel
    {
        Centre = (Centre)results[0],
        CentreId = course.CentreId,
        EndDate = course.EndDate,
        Id = course.Id,
        Name = course.Name,
        StartDate = course.StartDate,
        StudentIds = course.StudentIds,
        Tutor = (Person)results[1],
        TutorId = course.TutorId,
        Unit = (Unit)results[2],
        UnitId = course.UnitId,                
        VerifierId = course.VerifierId
    };
    int toSkip = 3;
    if (course.VerifierId != null)
    {
        toSkip += 1;
        courseViewModel.Verifier = (Person)results[3];
    }

    courseViewModel.Students = results.Skip(toSkip).Cast<Person>().ToList();

    return courseViewModel;
}

This is slightly more complex, but I think that the benefits outweigh the additional complexity.

5N+1 requests ain’t healthy

The following code is going to cause a problem:

image

It is going to cause a problem because it makes a single remote call (the Query) and for every result from this query it is going to perform 5 remote calls inside ConvertToCourseViewModel.

In other words, if we have twenty courses to display, this code will execute a hundred remote calls. That is going to be a problem. Let us look at what the document courses-1 looks like:

{
    "CentreId": "centres-1",
    "Status": "Upcoming",
    "Name": "NHibernate",
    "StartDate": "/Date(1280361600000)/",
    "EndDate": "/Date(1280534400000)/",
    "UnitId": "units-1",
    "TutorId": "people-1",
    "VerifierId": null,
    "StudentIds": [
        "people-1"
    ]
}

And here is what the UI looks like:

image

I think you can figure out what I am going to suggest, right? Instead of pulling all of this data at read time (very expensive), we are going to denormalize the data at write time, leading to a document that looks like this:

{
    "CentreId": { "Id":  "centres-1", "Name": "SkillsMatter London" },
    "Status": "Upcoming",
    "Name": "NHibernate",
    "StartDate": "/Date(1280361600000)/",
    "EndDate": "/Date(1280534400000)/",
    "UnitId": { "Id": "units-1", "Name": "1 - Introduction to Face Painting"},
    "TutorId": { "Id": "people-1", "Name": "Mr Oren Eini" },
    "VerifierId": null,
    "StudentIds": [
        { "Id": "people-1", "Name": "Ayende Rahien" }
    ]
}

Using this approach, we can handle the Index action shown above with a single remote call. And that is much better.

I am going to ignore actions whose structure we already covered (Edit, Details, etc), and focus on the interesting ones, the next of which is:

image

This is an excellent example of how things should be. (Well, almost: I would remove the unnecessary Store call and move the StudentIds.Add to just before the foreach, so that all the data access happens first; that makes the code easier to scan.) Using an OR/M, this code would generate 8 remote calls, but because Raven’s documents are loaded as a single unit, we have only 3 here (and if we really wanted, we could drop it to one).

Next, we update a particular session / module in a student.

image

We can drop the unnecessary calls to Store, but beside that, it is pretty good code. I don’t like that moduleId / sessionId are compared to the Name property; that seems confusing to me.

Charting with RavenDB 

I am showing only the parts that are using RavenDB here:

image

There is one problem with this code: it doesn’t work. Well, to be a bit more accurate, it doesn’t work if you have enough data. This code ignores what happens when you have enough people to start paging, and it does a lot of work all the time. It can be significantly improved by introducing a map/reduce index to do all the hard work for us:

image

This will perform the calculation once, updating it whenever a document changes. It will also result in much less traffic going over the network, since the data that we will get back will look like this:

image

Person Controller doesn’t contain anything that we haven’t seen before. So we will skip that and move directly to…

Taster Session Controller

image

I sincerely hope that those comments were generated by a tool.

The following three methods all share the same problem:

image

They all copy Centre & Person to the TasterSession entity. The problem with that is that it generates the following JSON:

image

References in RavenDB are always embedded in the referencing entity. This should be a denormalized reference instead (just Id & Name, most probably).

Other aspects

I focused exclusively on the controller / Raven code, and paid absolutely no attention to any UI / JS / View code. I can tell you that the UI looks & behaves really nice, but that is about it.

Summary

All in all, this was a codebase that was a pleasure to read. There are some problems, but they are going to be easy to fix.

RavenDB Authorization Bundle Design

I used to be able to just sit down and write some code, and eventually things would work. Just In Time Design. That is how I wrote things like Rhino Mocks, for example.

Several years ago (2007, to be exact) I started doing more detailed upfront design. Those designs aren’t carved in stone, but they are helpful in setting everything in motion properly. Of course, in some cases those designs need a lot of time to percolate. At any rate, this is the design for the Authorization Bundle for RavenDB. I would welcome any comments about it. I gave some background on some of the guiding thoughts about the subject in this post.

Note: This design is written before the code; it reflects the general principles of how I intend to approach the problem, but it is not a binding design, and things will change.

Rhino Security’s design has affected the design of this system heavily. In essence, this is a port (of a sort) of Rhino Security to RavenDB, with the necessary changes for the move to a NoSQL database. I am pretty happy with the design, and I actually think that we might do back porting to Rhino Security at some point.

Important Assumptions

The most important assumption that we make for the first version is that we can trust the client not to lie about which user is executing a certain operation. That assumes the following deployment scenario:

image

In other words, only the application server can talk to the RavenDB server and the application server is running trusted code.

To be clear, this design does not apply if users can connect directly to the database and lie about who they are. However, that scenario is expected to crop up eventually, even though it is out of scope for the current version. Our design needs to be future proofed in that regard.

Context & User

Since we can trust the client calling us, we can rely on the client to tell us which user a particular action is executed on behalf of, and what the context of the operation is.

From the client API perspective, we are talking about:

using(var session = documentStore.OpenSession())
{
    session.SecureFor("raven/authorization/users/8458", "/Operations/Debt/Finalize");

    var debtsQuery = from debt in session.Query<Debt>("Debts/ByDepartment")
                     where debt.Department == department
                     orderby debt.Amount
                     select debt;

    var debts = debtsQuery.Take(25).ToList();

    // do something with the debts
}

I am not really happy with this API, but I think it would do for now. There are a couple of things to note with regards to this API:

  • The user specified is using the reserved namespace “raven/”. This allows the authorization bundle to have a well known format for the user documents.
  • The operation specified is using the Rhino Security conventions for operations. By using this format, we can easily construct hierarchical permissions.

Defining Users

The format of the authorization user document is as follows:

// doc id /raven/authorization/users/2929
{
    "Name": "Ayende Rahien",
    "Roles": ["/Administrators", "/DebtAgents/Managers"],
    "Permissions": [
        { "Operation": "/Operations/Debts/Finalize", "Tag": "/Tags/Debts/High", "Allow": true, "Priority": 1 }
    ]
}

There are several things to note here:

  • The format isn’t what an application needs for a User document. This entry is meant for the authorization bundle’s use, not for an application’s use. You can use the same format for both, of course, by extending the authorization user document, but I’ll ignore this for now.
  • Note that the Roles that we have are hierarchical as well. This is important, since we will use that when defining permissions. Beyond that, Roles are used in a similar manner to groups in something like Active Directory. And the hierarchical format allows us to manage that sort of hierarchical grouping inside Raven easily.
  • Note that we can also define permissions on the user for documents that are tagged with a particular tag. This is important if we want to grant a specific user permission for a group of documents.

Roles

The main function of roles is to define permissions for a set of tagged documents. A role document will look like this:

// doc id /raven/authorization/roles/DebtAgents/Managers
{
    "Permissions": [
        { "Operation": "/Operations/Debts/Finalize", "Tag": "/Tags/Debts/High", "Allow": true, "Priority": 1 }
    ]
}

Defining permissions

Permissions are defined on individual documents, using RavenDB’s metadata feature. Here is an example of one such document, with the authorization metadata:

// doc id /debts/2931
{
  "@metadata": {
    "Authorization": {
      "Tags": [
        "/Tags/Debts/High"
      ],
      "Permissions": [
        {
          "User": "raven/authorization/users/2929",
          "Operation": "/Operations/Debts",
          "Allow": true,
          "Priority": 3
        },
        {
          "User": "raven/authorization/roles/DebtsAgents/Managers",
          "Operation": "/Operations/Debts",
          "Allow": false,
          "Priority": 1
        }
      ]
    }
  },
  "Amount": 301581.92,
  "Debtor": {
    "Name": "Samuel Byrom",
    "Id": "debtors/82985"
  }
  //more document data
}

Tags, operations and roles are hierarchical. But the way they work is quite different.

  • For Tags and Operations, having permission for “/Debts” gives you permission to “/Debts/Finalize”.
  • For roles, it is the other way around: if you are a member of “/DebtAgents/Managers”, you are also a member of “/DebtAgents”.

The Authorization Bundle uses all of those rules to apply permissions.
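These hierarchy rules can be sketched as simple prefix checks, with the direction reversed for roles. This is my reading of the design, not the bundle's actual code; the class and method names are mine:

```csharp
using System;

public static class Hierarchy
{
    // For tags and operations: permission on "/Operations/Debts" covers
    // "/Operations/Debts/Finalize", so the *granted* value must be a
    // prefix of the *requested* one.
    public static bool OperationCovers(string granted, string requested)
    {
        return requested.Equals(granted, StringComparison.OrdinalIgnoreCase)
            || requested.StartsWith(granted + "/", StringComparison.OrdinalIgnoreCase);
    }

    // For roles it is the other way around: a member of
    // "/DebtAgents/Managers" is also a member of "/DebtAgents", so the
    // *required* role must be a prefix of the role the user holds.
    public static bool RoleSatisfies(string userRole, string requiredRole)
    {
        return userRole.Equals(requiredRole, StringComparison.OrdinalIgnoreCase)
            || userRole.StartsWith(requiredRole + "/", StringComparison.OrdinalIgnoreCase);
    }
}
```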

Applying permissions

I think that it should be pretty obvious by now how the Authorization Bundle makes a decision about whether a particular operation is allowed or denied, but the responses to denying an operation are worth some note.

  • When performing a query over a set of documents, some of which we don’t have permission for under the specified operation, those documents are filtered out of the query.
  • When loading a document by id, when we don’t have the permission to do so under the specified operation, an error is raised.
  • When trying to write to a document (either PUT or DELETE), when we don’t have the permission to do so under the specified operation, an error is raised.

That is pretty much as detailed as I want things to be at this stage. Thoughts?

RavenDB Index Management

When I wrote RavenDB, I started from the server and built the client last. That had some interesting effects on RavenDB; for example, you can see detailed docs about the HTTP API, because that is what I had when I wrote most of the docs.

In the context of indexes, that meant that I thought a lot more about defining and working with indexes from the WebUI perspective, rather than from the client perspective. Now that Raven has users that actually put it through its paces, I have found that most people want to be able to define their indexes completely in code, and want to be able to re-create those indexes from code.

And that calls for an integral solution from Raven for this issue. Here is how you do this:

  • You define your index creation as a class, such as this one:
    public class Movies_ByActor : AbstractIndexCreationTask
    {
        public override IndexDefinition CreateIndexDefinition()
        {
            return new IndexDefinition<Movie>
            {
                Map = movies => from movie in movies
                                select new {movie.Name}
            }
            .ToIndexDefinition(DocumentStore.Conventions);
        }
    }
  • Somewhere in your startup routine, you include the following line of code:
    IndexCreation.CreateIndexes(typeof(Movies_ByActor).Assembly, store);

And that is it, Raven will scan the provided assembly (you can also provide a MEF catalog, for more complex scenarios) and create all those indexes for you, skipping the creation if the new index definition matches the index definition in the database.

This also provides a small bit of convention: as you can see, the class name is Movies_ByActor, but the index name will be Movies/ByActor. You can override that by overriding the IndexName property.
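The naming convention itself amounts to replacing underscores with slashes, something along these lines (a sketch of the convention, not the actual Raven client code):

```csharp
public static class IndexNaming
{
    // "Movies_ByActor" => "Movies/ByActor"
    public static string FromTypeName(string typeName)
    {
        return typeName.Replace("_", "/");
    }
}
```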

Find the bug: Accidental code reviews

I was working with a client on a problem they had in integrating EF Prof into their application, when my eye caught the following code (anonymized, obviously):

public static class ContextHelper
{
    private static Acme.Entities.EntitiesObjectContext _context;

    public static Acme.Entities.EntitiesObjectContext CurrentContext
    {
        get { return _context ?? (_context = new Acme.Entities.EntitiesObjectContext()); }
    }
}

That caused me to stop everything and focus the client’s attention on the problems that this code can cause.

What were those problems?

A series of posts about NHibernate tooling

I intend to write a series of posts about NHibernate tooling, and I thought that before I start, I should ask people to point me to tools that I might not be familiar with.

Tools that are currently on the list to post about:

  • LLBLGen 3.0
  • Pleasant Modeler
  • Active Writer
  • NHibernate Query Analyzer

Any others that you’d like me to check out?

Real world authorization implementation considerations

Nitpicker corner: this post discusses authorization, which assumes that you already know who the user is. Discussion of authentication methods, i.e. how we decide who the user is, is outside the scope of this post.

I have had a lot of experience with building security systems. After all, sooner or later, whatever your project is, you are going to need one. At some point, I got tired enough of doing that that I wrote Rhino Security, which codifies a lot of the lessons that I learned from all of those projects. And I learned a lot from using Rhino Security in real world projects as well.

When coming to design the authorization bundle for RavenDB, I decided to make a conscious effort to detail the underlying premises that I hold when approaching the design of a security system.

You can’t manage authorization at the infrastructure level

That seems to be the instinctual response of most developers when faced with the problem: “we will push it to the infrastructure and handle this automatically”. The usual arguments are that we want to avoid the possibility of the developer forgetting to include the security checks, and that it makes it easier to develop.

The problem is that when you put security decisions in the infrastructure, you are losing the context in which a certain operation is performed. And context matters. It matters even more when we consider the fact that there are actually two separate levels of security that we need to consider:

  • Infrastructure related – can I read / write to this document?
  • Business related – can I perform [business operation] on this entity?

Very often, we try to use the first to apply the second. This is often the case when we have a business rule that specifies that a user shouldn’t be able to access certain documents, which we try to apply at the infrastructure level.

For a change, we will use the example of a debt collection agency.

As a debt collector, I can negotiate a settlement plan with a debtor, so the agency can resolve the debt.

  • Debt collectors can only negotiate settlement plans for debts under $50,000
  • Only managers can negotiate settlement plans for debts over $50,000

Seems simple, right? We will assume that we have a solution in store and say that the DebtCollectors role can’t read/write documents about settlement plans of over $50K. I am not sure how you would actually implement this, but let us say that we did just that. We solved the problem at the infrastructure level and everyone is happy.

Then we run into a problem: a Debt Collector may not be allowed to do the actual negotiation with a heavy debtor, but there is a whole lot of additional work that the Debt Collector should do (checking for collateral, future prospects, background checks, etc).

The way that the agency works, the Debt Collector does a lot of the preliminary work, then the manager does the actual negotiation. That means that for the same entity, under different contexts, we have very different security rules. And these sorts of requirements are the ones that are going to give you fits when you try to apply them at the infrastructure level.

You can argue that these sorts of rules are business logic, not security rules, but the way the business thinks of them, that is exactly what they are.

The logged on user isn’t the actual user

There is another aspect to this. Usually when we need to implement a security system like this, people throw into the ring the notion of Row Level Security, allowing access to specific rows to specific logins. That is a non-starter from the get go, for several reasons. The previous point about infrastructure level security applies here as well, but the major problem is that it just doesn’t work when you have more than a pittance of users.

All the Row Level Security solutions that I am aware of (I am thinking specifically of some solutions provided by database vendors) require you to log into the database as a specific user, whose credentials can then be checked against specific row permissions.

Consider the case where you have a large number of users, and you have to log into the database for each user using their credentials. What is going to be the effect on the system?

Well, there are going to be two major problems. The first is that you can wave goodbye to small & unimportant things like connection pooling; since each user has their own login, they can’t share connections, which is going to substantially increase the cost of talking to the database.

The second is a bit more complex to explain. When the system performs an operation as a result of a user action, there are distinct differences between work that the system performs on behalf of the user and work that the system performs on behalf of the system.

Let us go back to our Debt Collection Agency and look at an example:

As a Debt Collector, I can finalize a settlement plan with a debtor, so the agency can make a lot of money.

  • A Debt Collector may only view settlement plans for the vendors that they handle debt collection for.
  • A settlement plan cannot be finalized if (along with other plans that may exist) it would result in over 70% of the debtor’s salary going into paying debts.

This is a pretty simple scenario. If I am collecting debts for ACME, I can’t take a peek and see how the debts owed to EMCA, ACME’s competitor, are handled. And naturally, if the debtor’s income isn’t sufficient to pay the debt, it is pretty obvious that the settlement plan isn’t valid, and we need to consider something else.

Now, let us look at how we would actually implement this. The first rule specifies that we can’t see other settlement plans, but for us to enforce the second rule, we must see them, even if they belong to other creditors. In other words, we have a rule where the system needs to execute in the context of the system and not in the context of the user.

You would be surprised how often such scenarios come up when building complex systems. When your security system relies on the logged on user for security filtering, you are going to run into a pretty hard problem when the time comes to handle those scenarios.
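One way to express that requirement in code (a sketch only; the `AsSystem` idea and all the names here are hypothetical) is to make the security context explicit, so the system can temporarily step outside the user’s context:

```csharp
using System;

// Illustrative sketch of an explicit security context.
public class SecurityContext
{
    public bool BypassSecurity { get; private set; }

    // Run the enclosed work on behalf of the system, not the logged on user;
    // disposing the returned token restores user level filtering.
    public IDisposable AsSystem()
    {
        BypassSecurity = true;
        return new RestoreContext(this);
    }

    private class RestoreContext : IDisposable
    {
        private readonly SecurityContext context;
        public RestoreContext(SecurityContext context) { this.context = context; }
        public void Dispose() { context.BypassSecurity = false; }
    }
}
```

When checking the 70% salary rule, the query over all of the debtor’s settlement plans would run inside `using (context.AsSystem()) { ... }`, so the filtering that hides other creditors’ plans from the Debt Collector does not apply to the system’s own validation.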

Considerations

So, where does this leave us? It leaves us with the following considerations when the time comes to build an authorization implementation:

  • You can’t handle authorization in the infrastructure, there isn’t enough context to make decisions there.
  • Relying on the logged on user for row/document level security is a good way to have a wall hit your head at considerable speed.
  • Authorization must be optional, because we need to execute some operations to ensure valid state outside the security context of a single user.
  • Authorization isn’t limited to the small set of operations that you can perform from the infrastructure perspective (Read / Write); operations have business meaning that you need to consider.
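Taken together, these considerations hint at an API shape roughly like the following (hypothetical names, sketched here for illustration; this is not the actual RavenDB authorization bundle API):

```csharp
public interface IAuthorizationService
{
    // The operation carries business meaning ("Debt/Negotiate"),
    // not just the infrastructure level Read / Write.
    bool IsAllowed(string user, string operation, object entity);
}

// Null implementation: because authorization must be optional,
// system level work can run with no checks at all.
public class NoAuthorization : IAuthorizationService
{
    public bool IsAllowed(string user, string operation, object entity)
    {
        return true;
    }
}
```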

An interesting RavenDB bug

I got a very strange bug report recently,

The following index:

from movie in docs.Movies
from actor in movie.Actors
select new { Actor = actor }

Will produce multiple results for a single document, which poses a pretty big problem when you try to page through it. Imagine that each movie has 10 actors, and you are trying to page through this index for the first two movies by Charlie Chaplin. The first movie that matches Charlie Chaplin will have ten results returned from the index, so simple paging at the index level will give us the wrong results.

Here is my solution for that, which works, but makes me just a tad uneasy:

public IEnumerable<IndexQueryResult> Query(IndexQuery indexQuery)
{
    IndexSearcher indexSearcher;
    using (searcher.Use(out indexSearcher))
    {
        var previousDocuments = new HashSet<string>();
        var luceneQuery = GetLuceneQuery(indexQuery);
        var start = indexQuery.Start;
        var pageSize = indexQuery.PageSize;
        var skippedDocs = 0;
        var returnedResults = 0;
        do
        {
            if(skippedDocs > 0)
            {
                start = start + pageSize;
                // trying to guesstimate how many results we will need to read from the index
                // to get enough unique documents to match the page size
                pageSize = skippedDocs * indexQuery.PageSize; 
                skippedDocs = 0;
            }
            var search = ExecuteQuery(indexSearcher, luceneQuery, start, pageSize, indexQuery.SortedFields);
            indexQuery.TotalSize.Value = search.totalHits;
            for (var i = start; i < search.totalHits && (i - start) < pageSize; i++)
            {
                var document = indexSearcher.Doc(search.scoreDocs[i].doc);
                if (IsDuplicateDocument(document, indexQuery.FieldsToFetch, previousDocuments))
                {
                    skippedDocs++;
                    continue;
                }
                returnedResults++;
                yield return RetrieveDocument(document, indexQuery.FieldsToFetch);
            }
        } while (skippedDocs > 0 && returnedResults < indexQuery.PageSize);
    }
}

iTunes full screen movies incompatible with Large Font sizes?

I have a PC hooked up to my TV, but there seems to be a bug in iTunes: when I set the font size to be large enough to actually be readable, like so:

image

I lose the ability to view full screen movies in iTunes. When I switch a movie to full screen, it continues playing (I can hear it) but the display switches back to the iTunes library, rather than the movie.

I verified that resetting the font size fixes this problem; this is in iTunes 9.2.1.

Anyone run into this? Any solutions?

Tikal .NET forum: Introduction to NHibernate

I’ll be presenting on July 25th 10:00-11:30 in the Tikal .NET forum:

Tikal .NET forum is delighted to present an introduction to NHibernate, the leading and most advanced open source ORM (Object Relational Mapping) framework in the .NET domain, with integrated support for concurrency, distribution, fault tolerance and incremental code loading. ORM takes care of the burden of mapping between your .NET entities and the underlying relational database.

Following a concise introduction to the motivation for the rising interest in ORM, we will provide a general idea of the key features of the NHibernate framework compared to other frameworks, and talk about the impact they have on the production of highly scalable and fault-tolerant systems.

See you all on July 25th at 10:00 in Krypton, Hakfar-Hayarok, Ramat-Hasharon.


Buy vs. Build & YAGNI

I was recently at the Israeli ALT.Net tools night, and I had a very interesting discussion on installers. Installers are usually a very painful part of the release procedure. The installer format for Windows is MSI, which is… strange. It takes time to understand how MSI works, and even after you get that, it is still painful to work with. WiX is a great improvement when it comes to building MSI installations, but that doesn’t make it good. Other installer builders, such as InstallShield and NSIS, are just as awkward.

The discussion that I had related to the complexity of building an installer on those technologies.

My argument was that it simply made no sense to try to overcome the hurdles of the installer technologies; instead, we can write our own installer more easily than we can fuss with the existing ones. The installer can already assume the presence of the .NET framework, so that makes things even easier.

This is an application of a principle that I strongly believe in: single purpose, specially built tools & components can be advantageous over more generic ones, for your specific scenarios.

Case in point: the installer. Installers are complicated beasts because they must support a lot of complex scenarios (upgrading from 5.3.2 to 6.2.1, for example), be transactional, support uninstallation, etc. But for the installer in question, an upgrade is always an uninstall of the previous version & an install of the new one, and the only tasks it requires are copying files and modifying registry entries.

Given that set of requirements, we can design the following installer framework:

public interface IInstallerTask
{
    void Install();
    void Uninstall();
}

public class FileCopyTask : IInstallerTask
{
    public string Source { get; set; }
    public string Destination { get; set; }

    public void Install()
    {
        File.Copy(Source, Destination, overwrite: true);
    }

    public void Uninstall()
    {
        File.Delete(Destination);
    }
}

And building a particular installer would be:

ExecuteInstaller(
    Directory.GetFiles(extractedTempLocation)
        .Select(file =>
            new FileCopyTask
            {
                Source = file,
                // combine just the file name, not the full source path
                Destination = Path.Combine(destinationPath, Path.GetFileName(file))
            }
        ),
    new RegistryKeyTask
    {
        Key = "HKLM/Windows/CurrentVersion...",
        Value = 9
    }
);

This gives the ExecuteInstaller method a list of tasks to be executed, which can then be used to install or uninstall everything.
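For completeness, the `ExecuteInstaller` method itself can be just as small. Here is a sketch under the same simplifying assumptions (the rollback behavior and the `ActionTask` helper are my additions for illustration, not part of any actual framework):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IInstallerTask
{
    void Install();
    void Uninstall();
}

// Small helper task, used here for illustration.
public class ActionTask : IInstallerTask
{
    public Action OnInstall = () => { };
    public Action OnUninstall = () => { };
    public void Install() { OnInstall(); }
    public void Uninstall() { OnUninstall(); }
}

public static class Installer
{
    // Accepts both single tasks and sequences of tasks, matching the
    // call site above.
    public static void ExecuteInstaller(params object[] taskGroups)
    {
        var tasks = taskGroups
            .SelectMany(g => g as IEnumerable<IInstallerTask>
                             ?? new[] { (IInstallerTask)g })
            .ToList();

        var completed = new List<IInstallerTask>();
        try
        {
            foreach (var task in tasks)
            {
                task.Install();
                completed.Add(task);
            }
        }
        catch
        {
            // Poor man's transaction: undo whatever already ran, in reverse.
            completed.Reverse();
            completed.ForEach(t => t.Uninstall());
            throw;
        }
    }
}
```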

Yes, it is extremely simple, and yes, it wouldn’t fit many scenarios. But it is quick to do, matches the current and projected requirements, doesn’t introduce any new technology to the mix, and it works.

Contrast that with having someone on the team be the Installer expert (bad) or having to educate the entire team about installers (expensive).

NHProf new feature: Expensive queries report

It has been a while since we had a new major feature for the profiler, but here it is:

image

The expensive queries report will look at all your queries and surface the most expensive ones across all the sessions. This can give you a good indication of where you need to optimize things.

Naturally, this feature is available across all the profiler profiles (NHibernate Profiler, Entity Framework Profiler, Linq to SQL Profiler and Hibernate Profiler).

Find the issue

There is a design issue that is revealed in the following tests. Can you figure out why I changed the behavior and removed the tests?

image

image

image

NoSQL and Data Warehousing

I recently got this question on email, and I thought it would be a good subject for a post.

I wanted to get your thoughts about using NoSQL for data warehouse solutions. I have read mixed thoughts about this and curious where you stand.

Before we can talk about this, we need to understand what data warehousing is. Using the wiseGEEK definition:

Data warehousing is combining data from multiple and usually varied sources into one comprehensive and easily manipulated database. Common accessing systems of data warehousing include queries, analysis and reporting. Because data warehousing creates one database in the end, the number of sources can be anything you want it to be, provided that the system can handle the volume, of course. The final result, however, is homogeneous data, which can be more easily manipulated.

And if you follow that definition, it makes absolute sense to ask about data warehousing in a NoSQL situation. But remember, one of the things that tends to lead people to NoSQL land is the desire to scale in some manner (more data, more users, higher concurrency, cheaper TCO) beyond what is possible using a SQL solution. In order to achieve that goal, you have to be willing to accept the tradeoff associated with it, which is reduced flexibility. You can query a relational database every which way, but most NoSQL solutions have very strict rules about how you can query them, for example.

By the way, I am probably abusing the term SQL here. I mean the whole set of technologies generally associated with relational databases, so in this case, I am talking about OLAP data stores, which are the typical solution for data warehousing scenarios. OLAP is usually queried with MDX, which looks like this:

SELECT
    { [Measures].[Sales Amount], 
        [Measures].[Tax Amount] } ON COLUMNS,
    { [Date].[Fiscal].[Fiscal Year].&[2002], 
        [Date].[Fiscal].[Fiscal Year].&[2003] } ON ROWS
FROM [Adventure Works]
WHERE ( [Sales Territory].[Southwest] )

OLAP & MDX, like the relational database & SQL, give us a lot of flexibility and power. But like relational databases, those come at a cost. At some point, if you have enough data, it becomes impractical to store it all in a single server, and the usual arguments for NoSQL solutions come to the fore.

At that point, we have to decide what it is that we want to get from the data warehouse. In other words, we need to design our solution to match the kind of reports that we want to get out. Of the NoSQL solutions out there (Key/Value stores, Document Databases, Graph Databases, Column Family Databases), I would probably choose a Column Family database for such a task, since my primary concern is being able to handle a large amount of data.

The type of reports that I need would dictate how I store the data itself, but once I have built the schema, everything else should just work.
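To illustrate what “the schema follows the reports” means, here is a hedged sketch (plain C# dictionaries standing in for a column family store such as Cassandra; the shape and names are mine): each report we want becomes a row key, and we pre-aggregate at write time rather than asking the store to join and group at query time.

```csharp
using System;
using System.Collections.Generic;

// Sketch: a column family modeled as row key -> (column name -> value).
// In a real column family store the outer key would be the row key and
// the inner dictionary the set of columns in that row.
public class SalesByRegionReport
{
    private readonly Dictionary<string, Dictionary<string, decimal>> rows =
        new Dictionary<string, Dictionary<string, decimal>>();

    // At write time we aggregate into the row shaped for the report,
    // instead of grouping at query time.
    public void RecordSale(string region, DateTime when, decimal amount)
    {
        var rowKey = "sales/" + region;
        if (!rows.TryGetValue(rowKey, out var columns))
            rows[rowKey] = columns = new Dictionary<string, decimal>();

        var column = when.ToString("yyyy-MM"); // one column per month
        columns.TryGetValue(column, out var current);
        columns[column] = current + amount;
    }

    // Reading the report is a single row lookup.
    public IDictionary<string, decimal> MonthlySales(string region)
    {
        return rows["sales/" + region];
    }
}
```

The flip side is exactly the reduced flexibility mentioned above: a question the schema wasn’t designed for (say, sales by product) requires storing the data a second time in another shape.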

In short, for data warehousing, I think that the relational / OLAP world has significant advantages, mostly because in many BI scenarios, you want to allow the users to explore the data, which is easy with the SQL toolset and harder with NoSQL solutions. But when you get too large (and large in OLAP scenarios is really large), you might want to consider limiting the users’ options and going with a NoSQL solution tailored to what they need.