Ayende @ Rahien

Refunds available at head office

My Passover Project: Introducing Rattlesnake.CLR

Okay, after spending quite a lot of time digging through the leveldb codebase, and with several years of working with RavenDB, I can say with confidence that the CLR makes it extremely hard to build high performance server side systems.

Mostly, the issues are related to GC and memory. In particular, not having any way to control memory allocation and/or the GC means that we can’t optimize those scenarios in any meaningful way. At the same time, I do not want to go back to the unmanaged world. As mentioned, I just came back from a very deep dive into a non trivial C++ codebase, and while I consider that codebase a really good one, that ain’t to say it is a pleasure to always be thinking about all the stuff that the CLR just takes away.

Therefore, I decided that I’m going to do something about it. And Rattlesnake.CLR was born:

image

The major features of Rattlesnake.CLR include explicit memory management when required. Let us say that we know that we are going to need some amount of memory for a while, and then all of that can be thrown away. This is extremely common in scenarios such as a web request: pretty much all the memory that you allocate while processing a web request can be safely freed as soon as the request is done. In RavenDB’s case, the memory we consume during indexing can be freed immediately when we stop indexing. Right now this is a painful process of making sure that we allocate within the same gen0 and hoping that it won’t be too expensive, or that we won’t get a complete halt of the entire server while it is releasing memory. It also makes it really hard to do things like limit the amount of memory your code uses.

Another requirement that I have is that Rattlesnake.CLR should be able to execute existing .NET assemblies without any additional steps, since I don’t fancy doing ports of stuff that already exists.

In order to handle this scenario with the given constraints, we have:

   1: var heap = Heap.Create(HeapOptions.None, 
   2:     1024 * 1024,
   3:     512 * 1024 * 1024);
   4:  
   5: using(MemoryAllocations.AllocateFrom(heap))
   6: {
   7:    var sb = new StringBuilder();
   8:    for(var i = 0; i < 100; i ++ )
   9:          sb.AppendLine(i.ToString());
  10:    Console.WriteLine(sb.ToString());
  11: }
  12:  
  13: heap.Destroy(); 

All the code within the using statement is allocated in our own heap. In line 13, we are destroying all of that memory in one fell swoop.

There are a few notes about this that we probably should address:

  • By default, memory allocated by this form is not subject to any form of GC. The idea is that this whole heap is getting released immediately.
  • Note the last two parameters to Heap.Create. The first is the initial size of the heap, and the second is the max size. We now have a real way to actually limit the amount of memory a piece of code will use. This is really important in server applications where avoiding paging is critical.
  • For that matter, we can now figure out how much memory a particular piece of code uses, and allocate our resources accordingly.
  • You can use multiple heaps at the same time, although only one can be installed as the default allocation at a given point in time.

There is an explicit heap.GarbageCollect() method that will do GC only on that heap, and which you can schedule at your own convenience. You can have two heaps, and allocate from one while you are GCing the other. And yes, that means that GCs using this method will not stop the process!

Memory allocated on the heap is obviously only valid as long as the heap is valid. That means that once the heap is destroyed, you can’t access any of the objects that were created there. This has implications for things like caches. We provide the MemoryAllocations.AllocateOnGlobalHeap<T>(args) method to force an allocation onto the global heap instead, if you want this memory to be always available and subject to GC.

This is early days yet, but we already see some really interesting performance improvements!

How does this work?

While an early experiment with Rattlesnake.CLR was based on the Mono runtime, I quickly decided that I wanted to keep using the MS CLR. Now, in order to handle this I had to do some unnatural things (to say the least), but I think that I even managed to make this a supported option. Essentially, we are using the CLR Hosting API for this. In particular:

  • ICLRGCManager
  • IHostMalloc
  • IHostMemoryManager

You can use Rattlesnake.CLR like this:

.\Rattlesnake.exe Raven.Server.exe

Just for fun, we also allow you to place limits on the default heap, so you can be sure that you aren’t allocating too much there.

.\Rattlesnake.exe Raven.Server.exe --max-default-heap-size=256MB

We are still running some tests, but this is looking really good.

Hibernating Rhinos Practices: A Sample Project

I have previously stated that one of the things that I am looking for in a candidate is the actual candidate code. Now, I won’t accept “this is a project that I did for a client / employer”, and while it is nice to be pointed at a URL from the last project the candidate took part in, it is not a really good way to evaluate someone’s abilities.

Ideally, I would like to have someone that has an OSS portfolio that we can look at, but that isn’t always relevant. Instead, I decided to send potential candidates the following:

Hi,

I would like to give you a small project, and see how you handle that.

The task at hand is to build a website for Webinars questions. We run bi-weekly webinars for our users, and we want to do the following:

  • Show the users a list of our webinars (The data is here: http://www.youtube.com/user/hibernatingrhinos)
  • Show a list of the next few scheduled webinars (in the user’s own time zone)
  • Allow the users to submit questions, comment on questions and vote on questions for the next webinar.
  • Allow the admin to mark specific questions as answered in a specific webinar (after it was uploaded to YouTube).
  • Manage Spam for questions & comments.

The project should be written in C#, beyond that, feel free to use whatever technologies that you are most comfortable with.

Things that we will be looking at:

  • Code quality
  • Architecture
  • Ease of modification
  • Efficiency of implementation
  • Ease of setup & deployment

Please send us the link to a Git repository containing the project, as well as any instructions that might be necessary.

Thanks in advance,

     Oren Eini

This post will go live about two weeks after I started sending this to candidates, so I am not sure yet what the response would be.

Defensive coding is your friend

We just had a failing test:

image

As you can see, we assumed that Fiddler is running when it isn’t. Here is the bug:

image

Now, this is great when I am testing things out and want to check what is going over the wire using Fiddler, but I always have to remember to revert this change, otherwise we will have a failing test and a failing build.

That isn’t very friction free, so I added the following:

image

Now the code is smart enough to not fail the test if we didn’t do things right.
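A minimal sketch of this kind of guard, assuming the tests route traffic through Fiddler by using the localhost.fiddler host alias and that Fiddler shows up as a process named "fiddler". The class name, method names and the port are hypothetical; this is not the actual test code.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class TestServerUrl
{
    // Assumption: a running Fiddler instance is visible as a process named "fiddler".
    public static bool IsFiddlerRunning()
    {
        try
        {
            return Process.GetProcessesByName("fiddler").Any();
        }
        catch (Exception)
        {
            // If we cannot enumerate processes, behave as if Fiddler is not there.
            return false;
        }
    }

    // Builds the URL the tests should talk to; the Fiddler alias is used only on request.
    public static string Build(int port, bool viaFiddler)
    {
        var host = viaFiddler ? "localhost.fiddler" : "localhost";
        return "http://" + host + ":" + port;
    }

    // Only goes through Fiddler when it is actually running.
    public static string Pick(int port)
    {
        return Build(port, IsFiddlerRunning());
    }
}
```

With a helper like this the Fiddler-friendly URL can stay in the code permanently, since it is only used when there is something listening for it.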

Hibernating Rhinos Practices: Pairing, testing and decision making

We actually pair quite a lot, either physically (most of our stations have two keyboards & mice for that exact purpose) or remotely (Skype / Team Viewer).

2013-01-27 14.05.24 HDR

And yet, I would say that for the vast majority of cases, we don’t pair. Pairing is usually called for when we need two pairs of eyes to look at a problem, for non trivial debugging and that is about it.

Testing is something that I deeply believe in, at the same time that I distrust unit testing. Most of our tests are actually system tests that test the system end to end. Here is an example of such a test:

[Fact]
public void CanProjectAndSort()
{
    using(var store = NewDocumentStore())
    {
        using(var session = store.OpenSession())
        {
            session.Store(new Account
            {
                Profile = new Profile
                {
                    FavoriteColor = "Red",
                    Name = "Yo"
                }
            });
            session.SaveChanges();
        }
        using(var session = store.OpenSession())
        {
            var results = (from a in session.Query<Account>()
                           .Customize(x => x.WaitForNonStaleResults())
                           orderby a.Profile.Name
                           select new {a.Id, a.Profile.Name, a.Profile.FavoriteColor}).ToArray();


            Assert.Equal("Red", results[0].FavoriteColor);
        }
    }
}

Most of our new features are usually built first, then get tests for them. Mostly because it is more efficient to get things done by experimenting a lot without having tests to tie you down.

Decision making is something that I am trying to work on. For the most part, I have things that I feel very strongly about. Production worthiness is one such scenario, and I get annoyed if something is obviously stupid, but a lot of the time decisions can fall into the either/or category, or are truly preference issues. I still think that too much goes through me, including things that probably should not. I am trying to encourage things so I wouldn’t be in the loop so much. We are making progress, but we aren’t there yet.

Note that this post is mostly here to serve as a point of discussion. I am not really sure what to put in here, the practices we do are pretty natural, from my point of view. And I would appreciate any comments asking for clarifications.

Hibernating Rhinos Practices: We are hiring again

As part of this series, I wanted to take the time and let you know that we are hiring full time developers again.

This is applicable solely for developers in Israel.

We are working with C# (although I’ll admit that sometimes we make it scream a little bit):

image

Candidates should be able to provide a project (and preferably more than one) that we can look at to see their code. It has got to be your code. If it ain’t yours (if it is code that you wrote for an employer, or if it is a university code project) I don’t wanna see it.

We are talking about a full time employee position, working on RavenDB, Uber Profiler, RavenFS and a bunch of other stuff that I don’t want to talk about yet.

Ping me with your CV if you are interested.

Hibernating Rhinos Practices: Development Workflow

The development workflow refers to how a developer decides what to do next, how tasks are organized, assigned and worked on.

Typically, we dedicate a lot of the Israeli team’s time to doing ongoing support and maintenance tasks. So a lot of the work is things that show up on the mailing lists. We usually triage them to one of four levels:

  • Interesting stuff that is outside of core competencies, or stuff that is nice to have that we don’t have resources for. We would usually handle that by requesting a pull request, or creating a low priority issue.
  • Feature requests / ideas – usually go to the issue tracker and wait there until assigned / there is time to do them.
  • Bugs in our products – depending on severity, usually they are fixed on the spot, sometimes they are low priority and get to the issue tracker.
  • Priority Bugs – usually get to the top of the list over anything and everything else.

It is obviously a bit more complex, because if we are working on a particular area already, we usually also take the time to cover the easy-to-do stuff from the issue tracker.

Important things:

  • We generally don’t pay attention to releases, unless we have one pending for a product (for example, upcoming stable release for RavenDB).
  • We don’t usually try to prioritize issues. Most of them are just there, and get picked up by whoever gets them first.

We follow slightly different workflows for Uber Prof & RavenDB. With Uber Prof, every single push generates a client visible build, and we have auto update to make sure that most people run on the very latest.

With RavenDB, we have the unstable builds, which is what every single push translates to, and the stable builds, which have a much more involved release process.

We tend to emphasize getting things out the door over the Thirteen Steps to Properly Release Software.

An important rule of thumb: if you are still in the office by 7 PM, you had better have shown up at 11 or so. Just because zombies are cool nowadays doesn’t mean you have to be one. I am personally exempted from the rule, though.

Next, I’ll discuss pairing, testing and decision making.

Hibernating Rhinos Practices: Intro

I was asked to comment a bit on our internal practices in Hibernating Rhinos. Before I can do that, I have to explain about how we are structured.

  • The development team in Israel composes the core of the company.
  • There are additional contractors that do work in Poland, the States and the UK.

We rarely make distinctions between locations for work, although obviously we have specializations. Samuel is our go to guy for “Make things pretty” and “Silverlight hairloss”, for example, Arek is really good at pointing in the right direction when there is a problem, and so on.

We currently have the following projects in place:

  • RavenDB
  • Uber Profiler
  • RavenFS
  • License / Orders Management
  • RavenDB.Net
  • HibernatingRhinos.com
  • ayende.com

Note that this is probably a partial list. And you might have noticed that I also included internal stuff, because that is also work, and something that we do.

In general, there isn’t a lot of “you work on this, or you work on that”, although again, there are areas of specialization. Fitzchak has been doing a lot of the work on Uber Prof, and Daniel is spending a lot of time on the RavenDB Studio. That doesn’t mean that tomorrow you wouldn’t find Fitzchak hacking on RavenDB indexes or Pawel working on exporting the profiler data to Excel, and so on.

Next, I’ll discuss how we deal with the development workflow.

Riddle me this, why won’t this code work?

The following code will not result in the expected output:

using(var mem = new MemoryStream())
{
    using(var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen:true))
    {
        gzip.WriteByte(1);
        gzip.WriteByte(2);
        gzip.WriteByte(1);
        gzip.Flush();
    }
    
    using (var gzip = new GZipStream(mem, CompressionMode.Compress, leaveOpen: true))
    {
        gzip.WriteByte(2);
        gzip.WriteByte(1);
        gzip.WriteByte(2);
        gzip.Flush();
    }

    mem.Position = 0;

    using (var gzip = new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true))
    {
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
    }


    using (var gzip = new GZipStream(mem, CompressionMode.Decompress, leaveOpen: true))
    {
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
        Console.WriteLine(gzip.ReadByte());
    }
}

Why? And what can be done to solve this?


Tooling shout out: .NET Memory Profiler

image

To start with, I don’t have any association with them. I got nothing (no money, free license, promise of goodwill or anything else at all) from SciTech Software (the creators of .NET Memory Profiler).

This tool has been instrumental in figuring out our recent memory issues. I have tried dotTrace Memory, JustTrace and WinDBG, but this tool outshone them all and was able to point us quite quickly to the root cause that we had to deal with, and from there, it was quite easy to reach a solution.

Highly recommended.


Implementing LRU cache

In my last post I mentioned that checking whether a user is an administrator or not using an Active Directory query can be slow. That means that we can’t just make use of that as is; we have to cache the result.

When caching is involved, we have to consider a few things. When do we expire the data? How much memory are we going to use? How do we handle concurrency?

The first thing that pops to mind is the usage of MemoryCache, now part of the .NET framework and easily accessible. Sadly, this is a heavy weight object, it creates its own threads to manage its state, which probably means we don’t want to use it for a fairly simple feature like this.

Instead, I implemented the following:

public class CachingAdminFinder
{
    private class CachedResult
    {
        public int Usage;
        public DateTime Timestamp;
        public bool Value;
    }

    private const int CacheMaxSize = 25;
    private static readonly TimeSpan MaxDuration = TimeSpan.FromMinutes(15);
    private readonly ConcurrentDictionary<SecurityIdentifier, CachedResult> cache =
        new ConcurrentDictionary<SecurityIdentifier, CachedResult>();


    public bool IsAdministrator(WindowsIdentity windowsIdentity)
    {
        if (windowsIdentity == null) throw new ArgumentNullException("windowsIdentity");
        if (windowsIdentity.User == null)
            throw new ArgumentException("Could not find user on the windowsIdentity", "windowsIdentity");

        CachedResult value;
        if (cache.TryGetValue(windowsIdentity.User, out value) && (DateTime.UtcNow - value.Timestamp) <= MaxDuration)
        {
            Interlocked.Increment(ref value.Usage);
            return value.Value;
        }
        bool isAdministratorNoCache;
        try
        {
            isAdministratorNoCache = IsAdministratorNoCache(windowsIdentity.Name);
        }
        catch (Exception e)
        {
            log.WarnException("Could not determine whether user is admin or not, assuming not", e);
            return false;
        }
        var cachedResult = new CachedResult
            {
                Usage = value == null ? 1 : value.Usage + 1,
                Value = isAdministratorNoCache,
                Timestamp = DateTime.UtcNow
            };

        cache.AddOrUpdate(windowsIdentity.User, cachedResult, (_, __) => cachedResult);
        if (cache.Count > CacheMaxSize)
        {
            foreach (var source in cache
                .OrderByDescending(x => x.Value.Usage)
                .ThenByDescending(x => x.Value.Timestamp) // among equally used entries, keep the freshest
                .Skip(CacheMaxSize))
            {
                if (source.Key == windowsIdentity.User)
                    continue; // we don't want to remove the one we just added
                CachedResult ignored;
                cache.TryRemove(source.Key, out ignored);
            }
        }

        return isAdministratorNoCache;
    }

    private static bool IsAdministratorNoCache(string username)
    {
       // see previous post
    }
}

Amusingly enough, properly handling the cache takes (much) more code than it takes to actually get the value.

We use ConcurrentDictionary as the backing store for our cache, and we enhance the value with usage & timestamp information. Those come in handy when the cache grows too big and needs to be trimmed.

Note that we also make sure to check the source every 15 minutes or so, because there is nothing as annoying as “you have to restart the server for it to pick the change”. We also handle the case where we can’t get this information for some reason.

In practice, I doubt that we will ever hit the cache max size limit, but I wouldn’t have been able to live with myself without adding the check.

Are you an administrator?

In RavenDB vNext, we tightened the security story a bit. Some operations that used to be possible for standard users are now administrator operations. For example, creating a new database requires you to be an admin.

Figuring out whether you are an admin is a bit tough, though. In particular, we use the following logic to determine that:

  • If you logged in using OAuth, the credentials will tell us whether you are an admin or not.
  • If you are logged in using Windows Auth, we make the following assumptions:
    • If you are a Windows Admin, you are an administrator (ouch!).
    • If you are running as the same user as the one RavenDB is using, you are an administrator (debug / dev scenarios).
  • If you are running embedded, you are an admin.

You might have noticed that there is an “ouch” on the Windows Admin line. The reason for that is that it is actually quite hard to figure that one out. RavenDB is running as a web server, and when we use Windows Auth, we get a WindowsIdentity that we can use. The problem is with UAC. When that is turned on, what we get is the non elevated user. But that user is not an Admin in the Windows sense of the word. We don’t actually care about that (it isn’t like we need to impersonate the user), we just use that as a “yes/no” for certain ops.

This is documented here: https://connect.microsoft.com/VisualStudio/feedback/details/679546/problem-with-windowsprincipal-isinrole-when-uac-is-enabled

The resolution is by design.

So… we need another way to check for this. Luckily, since we don’t need impersonation, we can just check Active Directory for that. Here is how we do so:

private static bool IsAdministratorNoCache(string username)
{
    PrincipalContext ctx;
    try
    {
        Domain.GetComputerDomain();
        try
        {
            ctx = new PrincipalContext(ContextType.Domain);
        }
        catch (PrincipalServerDownException)
        {
            // can't access domain, check local machine instead 
            ctx = new PrincipalContext(ContextType.Machine);
        }
    }
    catch (ActiveDirectoryObjectNotFoundException)
    {
        // not in a domain
        ctx = new PrincipalContext(ContextType.Machine);
    }
    var up = UserPrincipal.FindByIdentity(ctx, username);
    if (up != null)
    {
        PrincipalSearchResult<Principal> authGroups = up.GetAuthorizationGroups();
        return authGroups.Any(principal =>
                              principal.Sid.IsWellKnown(WellKnownSidType.BuiltinAdministratorsSid) ||
                              principal.Sid.IsWellKnown(WellKnownSidType.AccountDomainAdminsSid) ||
                              principal.Sid.IsWellKnown(WellKnownSidType.AccountAdministratorSid) ||
                              principal.Sid.IsWellKnown(WellKnownSidType.AccountEnterpriseAdminsSid));
    }
    return false;
}

Here we check whether the user is, directly or indirectly, an admin. Note that we have to take care of cases in which we are running inside & outside a domain, as well as cases where the domain controller is down.

This works, but there is just one problem with that, it is sloooow. As in, multiple seconds slow. Even on the local machine without any domain involved.

I’ll discuss how we solved that in my next post.

OH: This is beautiful

I just said that when I got this on the debugger:

image

I think that I need to get off the computer, and I’ll do so, just as soon as I am done with this feature.

I hate Hello World questions

That is what I call things like this:

image

It was part of a question I was asked, and the question contained things like:

I've a field (Field2) in MyClass as nested Dictionary and i have an index for Field 1.

There are several issues with this style of question:

  • It makes it harder to answer, because you have to keep a mapping of that in your head.
  • There is no meaning to the question, we can’t figure out whether this is a good or bad scenario.
  • Usually the problem description contains references to the real model, which I haven’t seen and don’t know anything about.

Annoying.


It uses async, run for the hills (On .Net 4.0)

One of the major problems in .NET 4.0 async operation stuff is the fact that an unobserved exception will ruthlessly kill your application.

Let us look at an example:

image

On startup, check the server for any updates, without slowing down my system startup time. All well and good, as long as that server is reachable.

When it isn’t, it will throw an exception, but not on the current thread; it will be thrown on another thread, and when the task is finalized, it will raise an UnobservedTaskException. Okay, so I’ll fix that and write code like this:

CheckForUpdatesAsync().ContinueWith(task=> GC.KeepAlive(task.Exception));

And that would almost work, except the implementation of CheckForUpdatesAsync is:

private static Task CheckForUpdatesAsync()
{
    var webRequest = WebRequest.Create("http://myserver.com/update-check");
    webRequest.Method = "POST";
    return webRequest.GetRequestStreamAsync()
        .ContinueWith(task => task.Result.WriteAsync(CurrentVersion))
        .ContinueWith(task => webRequest.GetResponseAsync())
        .ContinueWith(task => new StreamReader(task.Result.GetResponseStream()).ReadToEnd())
        .ContinueWith(task =>
                          {
                              if (task.Result != "UpToDate")
                                  ShowUpdateDialogToUser();
                          });
}

Note the line that calls WriteAsync (highlighted in the original post), where we are essentially ignoring the failure to write to the server. That task is going to go away unobserved, and as a result, when GC happens, you’ll have an unobserved task exception.
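To make the failure mode concrete, here is a small sketch. It is not the update check itself; FailingStepAsync is a hypothetical stand-in for a step that fails. A continuation that returns a Task produces a Task<Task>: the outer task completes successfully as soon as the inner one has been handed back, so nobody ever looks at the inner failure. Unwrap() flattens the two, so the failure surfaces on the task you actually hold.

```csharp
using System;
using System.Threading.Tasks;

public static class UnwrapDemo
{
    // Hypothetical stand-in for an async step that fails (e.g. the server is unreachable).
    public static Task FailingStepAsync()
    {
        var tcs = new TaskCompletionSource<object>();
        tcs.SetException(new InvalidOperationException("server unreachable"));
        return tcs.Task;
    }

    // The continuation returns a Task, so the result is a Task<Task>.
    // The outer task succeeds even though the inner one is faulted.
    public static Task<Task> Nested()
    {
        return Task.Factory.StartNew(() => { })
            .ContinueWith(_ => FailingStepAsync());
    }

    // Unwrap() gives back a single task that reflects the inner task's outcome.
    public static Task Flattened()
    {
        return Task.Factory.StartNew(() => { })
            .ContinueWith(_ => FailingStepAsync())
            .Unwrap();
    }
}
```

Attaching the GC.KeepAlive(task.Exception) continuation to Nested() observes nothing useful, because the outer task never faults. Attaching it to Flattened() does observe the failure.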

This sort of error has all of the fun aspects of a good problem:

  • Only happens during errors
  • Async in nature
  • Brings down your application
  • Error location and error notification are completely divorced from one another

It is actually worse than having a memory leak!

This post explains some of the changes made with regards to unobserved exceptions in 4.5, and I wholeheartedly support this, but in 4.0, writing code that uses the TPL is easy and fun, but requires careful code review to make sure that you aren’t leaking an unobserved exception.
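As a last line of defense on 4.0, a process-wide handler can mark stray exceptions as observed so they do not take the application down. This is a sketch of that idea using TaskScheduler.UnobservedTaskException; the class name and the report callback are hypothetical hooks for whatever logging you use.

```csharp
using System;
using System.Threading.Tasks;

public static class UnobservedTaskSafetyNet
{
    // Call once at startup. 'report' is a hypothetical logging hook.
    public static void Install(Action<AggregateException> report)
    {
        TaskScheduler.UnobservedTaskException += (sender, args) =>
        {
            report(args.Exception); // at least leave a trace of what failed
            args.SetObserved();     // prevents the exception from escalating
        };
    }
}
```

This does not replace the code review. The handler only runs when the faulted task is finalized, so the report still arrives far away from the code that caused it.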


Why I LOVE ReSharper

image

I mean, just look at this. This is magic.

More to the point, think about what this means to try to implement something like that. I wouldn’t know where to even start.


Truth in advertising

So I am at a client site discussing things about their new version of the software. And while it is easy to promise things, we did a POC to make sure that things will work out.

Because I know users, I made sure to have this ready:

image

I do software behavior, but I know my limitations, and doing good looking UI is one of them.

I have had too many software demos blow up because of silly UI issues.


When using the Task Parallel Library, Wait() is a BAD warning sign

Take a look at the following code:

public static Task ParseAsync(IPartialDataAccess source, IPartialDataAccess seed, Stream output, IEnumerable<RdcNeed> needList)
{
    return Task.Factory.StartNew(() =>
    {
        foreach (var item in needList)
        {
            switch (item.BlockType)
            {
                case RdcNeedType.Source:
                    source.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength)).Wait();
                    break;
                case RdcNeedType.Seed:
                    seed.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength)).Wait();
                    break;
                default:
                    throw new NotSupportedException();
            }
        }
    });
}

Do you see the problem in here?

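It is a result of a code review comment about improper use of async in a project. That comment resulted in a lot of Task showing up in method return types, but not in any measurable improvement in the codebase’s actual use of asynchronicity. Here, the method returns a Task, but every CopyToAsync call is immediately followed by Wait(), so a thread sits blocked for the whole operation anyway.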

The problem is that when you need to work with such things in C# 4.0, you have to do some annoying things to get the code to work properly. In particular, this method was modified to be:

public static Task ParseAsync(IPartialDataAccess source, IPartialDataAccess seed, Stream output, IList<RdcNeed> needList, int position = 0)
{
  if(position>= needList.Count)
  {
        return new CompletedTask();
  }
  var item = needList[position];
  Task task;
            
  switch (item.BlockType)
  {
        case RdcNeedType.Source:
            task = source.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength));
            break;
        case RdcNeedType.Seed:
            task = seed.CopyToAsync(output, Convert.ToInt64(item.FileOffset), Convert.ToInt64(item.BlockLength));
            break;
        default:
            throw new NotSupportedException();
  }

  return task.ContinueWith(resultTask =>
    {
        if (resultTask.Status == TaskStatus.Faulted)
            resultTask.Wait(); // throws
        return ParseAsync(source, seed, output, needList, position + 1);
    }).Unwrap();
}

This code is more complex, but it is actually making proper use of the TPL. We have changed the loop into a recursive function, so we can use ContinueWith to move on to the next iteration of the loop.

And no, I can’t wait to get to C# 5.0 and have proper await work.
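For comparison, here is a sketch of what the same method could look like with C# 5.0 await. The type definitions are hypothetical minimal versions of the ones used above, and ByteArrayAccess exists only so the sketch can be run; none of them are the real project types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical minimal versions of the types used in the post.
public enum RdcNeedType { Source, Seed }

public class RdcNeed
{
    public RdcNeedType BlockType;
    public ulong FileOffset;
    public ulong BlockLength;
}

public interface IPartialDataAccess
{
    Task CopyToAsync(Stream target, long from, long length);
}

// In-memory implementation, only here so the sketch can be executed.
public class ByteArrayAccess : IPartialDataAccess
{
    private readonly byte[] data;
    public ByteArrayAccess(byte[] data) { this.data = data; }

    public Task CopyToAsync(Stream target, long from, long length)
    {
        return target.WriteAsync(data, (int)from, (int)length);
    }
}

public static class NeedListParser
{
    // The original loop comes back; each copy is awaited instead of blocked on.
    public static async Task ParseAsync(IPartialDataAccess source, IPartialDataAccess seed,
                                        Stream output, IEnumerable<RdcNeed> needList)
    {
        foreach (var item in needList)
        {
            var from = Convert.ToInt64(item.FileOffset);
            var length = Convert.ToInt64(item.BlockLength);
            switch (item.BlockType)
            {
                case RdcNeedType.Source:
                    await source.CopyToAsync(output, from, length);
                    break;
                case RdcNeedType.Seed:
                    await seed.CopyToAsync(output, from, length);
                    break;
                default:
                    throw new NotSupportedException();
            }
        }
    }
}
```

The compiler generates the continuation plumbing, and a failed copy faults the returned task without the explicit Faulted check.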

Security decisions: Separate Operations & Queries

The question came up several times in the mailing list with regards to how the RavenDB Authorization Bundle operates, and I think it serves a broader discussion.

Let us imagine a system where we have contracts, which may be in several states:

  • Mine – Contracts that an employee signed.
  • Done – Standard users can view, Lawyers assigned to the company can sign.
  • Draft – Lawyers can view / edit, Partners can approve.
  • Proposed – Lawyers can create / edit, but only the lawyer that created it can view it, Partners can accept.

So far, fairly simple, right? Except the pure hell that you are going to get into when you are trying to show the users all of the contracts that they can see, sorted by edit date and in the NDA category.

Why am I being so negative here? Well, let us look at what we are going to have to do in the most trivial of cases:

image

In this sort of system, we are going to have to show the user all of the contracts that they are allowed to see, and show them some indication what operations they can do on each.

The problem is that generating this sort of view is expensive, especially when you have a large amount of data to work through. More interestingly, from a UX perspective, it also doesn’t really work that well. Most users would want a better separation of the things that they can do, probably something like this:

image

This allows us to do a first level filtering on the data itself, rather than try to apply security rules to it.

In the first case, we need to get all the contracts that we are allowed to see. The security rules above are really simple, mind. But trying to translate them into an efficient query is going to be pretty hard, both in terms of the code required and the cost to actually perform the query on the server. There are other things that are involved as well, such as paging and sorting in such an environment. I have created several such systems in the past, Rhino Security is probably the most well known of them, and it gets really hard to optimize things and make sure that everything works when you start getting more complex security rules (especially when you have a user editable security system, which is a common request).

The second case is cheaper because we can limit the choices that we see in the query itself. We may still need to apply security concerns, but those go through the query directly, rather than a security sub system. This kind of change usually forces people to be more explicit in what they want, and it results in a system that tends to be simpler. The security rules aren’t just something arbitrary that can be defined, they are actually visible on the screen (My Contracts, Drafts, etc). Changing them isn’t something that is done on an administrator’s whim.

Yes, this is a way to manage the client and their expectations, but that is important. But what about the complex security that they want?

That might still be there, certainly, but that would be active mostly for operations (stuff that happens on a single entity), not on things that happen over all entities. It is drastically easier to make a single entity security decision work efficiently than to make it work over the whole set inside the database.
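As a rough sketch of the difference, assume a hypothetical Contract class with an owner and a status (none of these names or rules come from a real system, they are made up for illustration). Each on-screen view becomes a plain query filter, and the more complex rule is only evaluated for one entity at a time:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model, for illustration only.
public enum ContractStatus { Draft, Active }

public class Contract
{
    public string Id { get; set; }
    public string OwnerId { get; set; }
    public ContractStatus Status { get; set; }
}

public static class ContractViews
{
    // "My Contracts": the filter is part of the query itself.
    public static IQueryable<Contract> MyContracts(IQueryable<Contract> contracts, string userId)
    {
        return contracts.Where(c => c.OwnerId == userId && c.Status == ContractStatus.Active);
    }

    // "Drafts": again a plain filter that the database can evaluate.
    public static IQueryable<Contract> Drafts(IQueryable<Contract> contracts, string userId)
    {
        return contracts.Where(c => c.OwnerId == userId && c.Status == ContractStatus.Draft);
    }
}

public static class ContractPermissions
{
    // Per-entity decision: it runs against one loaded contract, so it can be
    // as elaborate as needed without affecting the cost of the list queries.
    // The rule used here (owners may edit their own drafts) is an assumption.
    public static bool CanEdit(Contract contract, string userId)
    {
        return contract.OwnerId == userId && contract.Status == ContractStatus.Draft;
    }
}
```

Paging and sorting then apply to the filtered query as usual, because the security filter is just another Where clause.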

Hiding values, API keys and other fun stuff

This post is mostly about fun ideas. In one scenario, we had the need to show data to the user, but there was some concern with regard to the hackability of the URL.

In general, you should handle such things within your code, checking permissions, etc. But I decided to see if I could do something nice here, and I got this:

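private static string HideValues(string entityId, string tenantId, byte[] key, byte[] iv)
{
    using (var rijndael = Rijndael.Create())
    using (var memoryStream = new MemoryStream())
    {
        rijndael.Key = key;
        rijndael.IV = iv;
        using (var cryptoStream = new CryptoStream(memoryStream, rijndael.CreateEncryptor(), CryptoStreamMode.Write))
        using (var binaryWriter = new BinaryWriter(cryptoStream))
        {
            // BinaryWriter length-prefixes each string, so both values can be read back in order.
            binaryWriter.Write(entityId);
            binaryWriter.Write(tenantId);
            binaryWriter.Flush();

            // The final padded block is written when the streams are disposed at the end of this block.
            cryptoStream.Flush();
        }
        // ToArray still works after the stream has been closed.
        var bytes = memoryStream.ToArray();
        var sb = new StringBuilder();
        for (int index = 0; index < bytes.Length; index++)
        {
            var b = bytes[index];
            // "X2" keeps leading zeros, so every byte is exactly two hex digits and can be parsed back.
            sb.Append(b.ToString("X2"));
            // Add separators so the value looks like a guid.
            if (index % (bytes.Length/4) == 0 && index > 0)
                sb.Append('-');
        }
        return sb.ToString();
    }
}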

This will generate a “guid looking” value that we can send to the user. When they send it back to us, we can decrypt it and figure out what is actually going on in there.

Because the value is encrypted, a successful decryption tells us that this is a key we issued; a made-up value would not decrypt to valid data.

Passing 15 and 32 as the first two values, I got the following value back: 2A8AC8888-46B92092-BFD81393-7A6FB1

And it handles larger values just as easily, of course. Quite fun, even if I say so myself. Not sure if this is useful, but I got into writing code because it is a great hobby.
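The post only shows the encrypting half. A minimal round-trip sketch might look like the following. The HiddenValues class and the Hide and TryReveal names are made up for illustration, and it uses Aes.Create() instead of Rijndael.Create(), which is a substitution rather than the code above. It also assumes every byte is written as exactly two hex digits, otherwise the value cannot be parsed back reliably.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, for illustration only.
public static class HiddenValues
{
    // Encrypts the two ids and returns them as an uppercase hex string.
    public static string Hide(string entityId, string tenantId, byte[] key, byte[] iv)
    {
        using (var aes = Aes.Create())
        using (var memoryStream = new MemoryStream())
        {
            aes.Key = key;
            aes.IV = iv;
            using (var cryptoStream = new CryptoStream(memoryStream, aes.CreateEncryptor(), CryptoStreamMode.Write))
            using (var writer = new BinaryWriter(cryptoStream))
            {
                writer.Write(entityId);
                writer.Write(tenantId);
            }
            // BitConverter emits two hex digits per byte, separated by dashes.
            return BitConverter.ToString(memoryStream.ToArray()).Replace("-", "");
        }
    }

    // Returns false when the value is not hex or does not decrypt to two strings.
    public static bool TryReveal(string hidden, byte[] key, byte[] iv, out string entityId, out string tenantId)
    {
        entityId = null;
        tenantId = null;
        if (string.IsNullOrEmpty(hidden))
            return false;
        var hex = hidden.Replace("-", ""); // separators are cosmetic
        if (hex.Length % 2 != 0)
            return false;
        try
        {
            var bytes = new byte[hex.Length / 2];
            for (var i = 0; i < bytes.Length; i++)
                bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);

            using (var aes = Aes.Create())
            {
                aes.Key = key;
                aes.IV = iv;
                using (var cryptoStream = new CryptoStream(new MemoryStream(bytes), aes.CreateDecryptor(), CryptoStreamMode.Read))
                using (var reader = new BinaryReader(cryptoStream))
                {
                    var first = reader.ReadString();
                    var second = reader.ReadString();
                    entityId = first;
                    tenantId = second;
                    return true;
                }
            }
        }
        catch (Exception e) when (e is CryptographicException || e is FormatException || e is IOException)
        {
            entityId = null;
            tenantId = null;
            return false;
        }
    }
}
```

Note that plain encryption is not tamper proofing: a modified value will usually fail to decrypt, but that is not guaranteed. If the value has to be trusted, adding an HMAC over the encrypted bytes is the usual way to detect changes, and the permission checks in your code still need to be there.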

The economics of continuous deployment

One of the things that I did, almost by accident, when we started Hibernating Rhinos was to create a CI server and a public daily build server. And every single successful build ended up in customer hands. That was awesome in many respects. It removed a lot of the “we have got to make a new release” pressure, because we were making new releases, sometimes multiple times a day.

When we started with RavenDB, it was obvious to me that this was what we were going to do with it as well, because the advantages of this approach are so clear. With RavenDB, we needed a two stage system, but still, every single build gets to the customers’ hands.

Awesome, great, outstanding, exceptional and other such synonyms. As long as you look at this from one angle, the one in which we are only concerned about the technical challenges of delivering software. The problem is that there are additional things to note here. Economic challenges.

Let us take the profiler as a good example. It was released in beta on Jan 1, 2009, and since then we have had 920 separate builds, adding a ton of new features, capabilities, improving performance, making things smoother and in general making it a better product.

That is over 3 years without a major release, mostly because we never had the need for one. We kept delivering software on a day to day basis.

During that time, we delivered features such as viewing the result set, checking the query plan of a query (in all major databases), exporting the entire session to HTML so you can send it to your DBA, CI integration and so much more. It has been wonderful.

Except… this has one implication that I didn’t think of at the time. If you bought NH Prof on the 1st Jan, 2009 you got 3 years of product updates, for no additional cost. And unless we create a new major version, you can keep using the software, including all the updates and improvements, without paying.

That is great for the very early customers, but not so good for the people who need to eat so they can work on the profiler. Let us think about the implications of this a bit more, okay?

In order for us to actually make money, we have to:

  • keep expanding our one-off customer base, which is going to hit a limit at some point.
  • create a new version, getting the old customers to purchase the updates.

Seems simple, right? This is what most companies do, and how most software is sold. You get a license for version 1 and you buy a license for version 2.

So far, so good. But let us consider the implications of that. In order to get the old users to buy the new one, I have to put some really nice stuff in the next version. Which means that I have to do a lot of “secret” development, because I can’t just release it in our usual continuous deployment mode. That sucks. And it also means that features that are already coded are actually disabled because we defer them to the next version.

So, the next version of the profilers is going to have to have some interesting features to get people to buy it. One of them is production profiling. It has actually been around for quite a while. It has simply been #ifdef’ed out of the product, because it is something that we keep for the next version.

I just checked, and I was acutely surprised by what I found. The initial work for production profiling was done in Jan 2010, and it has been working since then. I got sidetracked with RavenDB, so I never had the chance to actually complete the rest of the features for 2.x and release them all.

In mid 2010 we started experimenting with subscriptions. Instead of having a one time payment model, we moved to a pay as you go. So as long as you were using the profiler, you were paying for it, and in return, we provided all of those new features.

I have been thinking about this a lot lately. I strongly lean toward making the next version of the profiler (coming soon, and it will have a bunch of nice features) subscription only.

My current thinking is to allow two modes of buying the product: a monthly / yearly subscription, and a one time fee that gives you 18 months of usage (and doesn’t re-charge). That would allow us to keep producing software in incremental steps, without having to go away for a while and work in secret on big ticket features just so we can have enough stuff to put on the “why you should buy 2.x” list.

I would appreciate any feedback that you may have.

Architecture > Code

Steve Py asks an interesting question in one of the comments to my On Infinite Scalability post:

Can you elaborate more on: "Note, those changes are not changes to the code, they are architectural and system changes. Where before you had a single database, now you have many. Where before you could use ACID, now you have to use BASE. You need to push a lot more tasks to the background, the user interaction changes, etc."

When you talk about jumping from 1 server to multiple servers, ACID to BASE, and how user interaction changes, how do you quantify that this is done without code changes?

The answer to that is that there is a mistaken assumption here. Changing the architecture is going to change the code. But that is rarely what matters, because changing the architecture is a big change. If you are moving from a single DB to multiple databases, for example, there are going to be code changes, but that isn’t what you worry about. The major change is the architecture differences (how do you split the data, how do you do reporting, can some of the dbs be down, etc).

Moving from ACID to BASE is an even greater change. The code might change a little or change drastically, but that isn’t where a lot of the effort is. Just defining the new system behavior in those scenarios is going to be much more complex. For example, taking something as simple as “user names are unique” would move from being a unique constraint in the database to something that needs to be able to handle that sort of thing in a reasonable fashion.
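To make the “user names are unique” example a bit more concrete, here is one possible sketch, not the only way to do it. It assumes each node accepts registrations on its own and a background task later compares them; the Registration and UserNameReconciler names, and the “earliest registration wins” rule, are all assumptions made for illustration. The code is the easy part. Deciding what happens to the user who lost the name is the architectural question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of a registration accepted by some node.
public class Registration
{
    public string UserId { get; set; }
    public string UserName { get; set; }
    public DateTime RegisteredAtUtc { get; set; }
}

public static class UserNameReconciler
{
    // For each user name, the earliest registration keeps the name and every
    // later one is returned, so the system can ask those users to pick another.
    public static List<Registration> FindConflictLosers(IEnumerable<Registration> registrations)
    {
        return registrations
            .GroupBy(r => r.UserName, StringComparer.OrdinalIgnoreCase)
            .SelectMany(g => g
                .OrderBy(r => r.RegisteredAtUtc)
                .ThenBy(r => r.UserId, StringComparer.Ordinal) // deterministic tie-break
                .Skip(1))
            .ToList();
    }
}
```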

Depending on your original architecture, it might be anything from replacing a single service implementation to re-writing significant parts of the code.

Expanding your horizons: Actions

In theory, there is no difference between theory and real life.

In my previous blog post, I discussed my belief that the best value you get from learning comes from learning the very basics of how our machines operate, from memory management in operating systems to the details of how network protocols like TCP/IP work.

Some of that has got to be theoretical study, actually reading about how those things work, but theory isn’t enough. I don’t care if you know the TCP specs by heart, if you haven’t actually built a real system with it, and experienced the pain points, it isn’t really meaningful. The best way to learn, at least from my own experiences, is to actually do something.

Because that teaches you several very interesting things:

  • What are the differences between the spec and what is actually implemented?
  • How to resolve common (and not so common) problems?

The latter is probably the most important thing. I think that I learned most of what I know about HTTP in the process of building an RSS feed reader. I learned a lot about TCP from implementing a proxy system, and I did a lot of learning from a series of failed projects regarding distributed programming in general.

I learned a lot about file systems and how to work with file based storage from Practical File System Design and from building Rhino Queues and Rhino DHT. In retrospect, I did a lot of very different projects in various areas and technologies.

The best way that I know to get better is to do, to fail, and to learn from what didn’t work. I don’t know of any shortcuts, although I am familiar with plenty of ways of making the road much longer (and usually not very pleasant).

In short, if you want to get better, pick something that you don’t know how to do, and then do it. You might fail, you likely will, but you’ll learn a lot from failing.

I keep drawing a blank when people ask me to suggest options for things to try building, so I thought that I would ask the readers of this blog. What sort of things do you think would be useful to build? Things that would push most people out of their comfort zone and make them learn the fundamentals of how things work.

Expanding your horizons

One of the questions that I routinely get asked is “how do you learn”. And the answer that I keep giving is that I accidentally started learning things from the basic building blocks. I still count a C/C++ course that I took over a decade ago as one of the chief reasons why I have a good grounding in how computers actually operate. During that course, we had to do everything from building parts of the C standard library on our own to constructing much of the foundation of C++ features in plain C.

That gave me enough understanding of how things are actually implemented to be able to grasp how things are behaving elsewhere. Digging deep into the implementation is almost never a wasted effort. And if you can’t peel away the layer of abstractions, you can’t really say that you know what you are doing.

For example, I count myself ignorant in all manners about WCF, but I have full confidence that I can build a system using it. Not because I understand WCF itself, but because I understand the arena in which it plays. I don’t need to really understand how a certain technology works, if I already know the rules it has to play by.

Picking on WCF again, if you don’t know firewalls and routers, you can’t really build a WCF system, regardless of how good your memory is about the myriad ways of configuring WCF to do your will. If you can’t use WireShark to figure out why the system is slow to respond to requests, it doesn’t matter if you can compose a WCF envelope message literally on the back of a real world envelope. If you don’t grok the Fallacies of Distributed Computing, you shouldn’t be trying to build a real system where WCF is used, regardless of whatever certificate you have from Microsoft.

The interesting bit is that for most of what we do, the rules are fairly consistent. We all have to play in Turing’s sand box, after all.

What this means is that learning the details of IP and TCP will be worth it over and over again. Understanding things like memory fetch latencies will be relevant in 5 years and in ten. Knowing what actually goes on in the system, even if it is at a somewhat abstracted level, is important. That is what makes you the master of the system, instead of its slave.

Some of the things that I especially value (this is off the top of my head, and it isn’t a closed list) are:

  • TCP / UDP – how do they actually work.
  • HTTP – and implications (for example, state management).
  • The Fallacies of Distributed Computing.
  • Disk based storage – efficiently working with it, how file systems work.
  • Memory management in OS and your environment.

Obviously, this is a very short list, and again, it isn’t comprehensive. It is just meant to give you some indication of the things that I have found to be useful over and over and over again.

That kind of knowledge isn’t something that is replaced often, and it will help you understand how anyone else has to interact with the same constraints. In fact, it often allows you to accurately guess how they solve a certain problem, because you are aware of the same alternatives that the other side had to choose from.

In short, if you seek to be a better developer, dig deep and learn the real basic building blocks for our profession.

In my next post, I’ll discuss strategies for doing that.

Frictionless development: Web.config and connection strings

This is something that I actually run into a lot at customer sites. There is a lot of friction between the different connection strings that developers use during development. For example, we may have one developer using:

<add name="RavenDB" connectionString="Url=http://localhost:8080" />

While another is using:

<add name="RavenDB" connectionString="Url=http://localhost:8191;Database=TheApp" />

This usually causes a hell of a lot of trouble in most teams (or maybe you have developers that use SQL Express, and some who installed SQL Development, etc).

That is friction, and you want to deal with that as soon as possible. The easiest thing to do is actually:

<add name="Ayende-PC" connectionString="Url=http://localhost:8080" />
<add name="Ayende-Laptop" connectionString="Url=http://localhost:8191;Database=TheApp" />

This works, because the connection string name is now the machine name (System.Environment.MachineName). That is a great first step, because it means that you can get things done without fighting over the connection string in the web.config.

Another alternative is to have a default connection string, and allow a machine to “override” it by specifying a connection string under its own name.

It is a small thing, but it actually helps quite a lot. You can extend this to other settings as well. For apps that have a lot of settings, I usually take them out of the Web.config into a Default.config file, and the configuration reader is set to look for [MachineName].config first, and only then at the Default.config file.
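A minimal sketch of the lookup, assuming the connection strings have already been loaded into a dictionary keyed by name (the MachineSettings class and its Resolve method are made up for illustration, and reading the actual config files is left out): try the machine name first, and only then fall back to the default entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class MachineSettings
{
    // Returns the machine-specific connection string when one exists,
    // otherwise the one registered under defaultName.
    public static string Resolve(IDictionary<string, string> connectionStrings, string machineName, string defaultName)
    {
        string value;
        if (TryFind(connectionStrings, machineName, out value))
            return value;
        if (TryFind(connectionStrings, defaultName, out value))
            return value;
        throw new InvalidOperationException(
            "No connection string found for '" + machineName + "' or '" + defaultName + "'");
    }

    // Name lookup ignores case, so "Ayende-PC" and "AYENDE-PC" match.
    private static bool TryFind(IDictionary<string, string> connectionStrings, string name, out string value)
    {
        foreach (var pair in connectionStrings)
        {
            if (string.Equals(pair.Key, name, StringComparison.OrdinalIgnoreCase))
            {
                value = pair.Value;
                return true;
            }
        }
        value = null;
        return false;
    }
}
```

At startup this would be called as MachineSettings.Resolve(connectionStrings, Environment.MachineName, "RavenDB"), so a developer only adds an entry when their machine differs from the default.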

Negative hiring decisions, Part II

Another case of a candidate completing a task at home and sending it to me which resulted in a negative hiring decision is this:

protected void Button1_Click(object sender, EventArgs e)
{
    string connectionString = @"Data Source=OFFICE7-PC\SQLEXPRESS;Integrated Security=True";
    string sqlQuery = "Select UserName From  [Users].[dbo].[UsersInfo] Where UserName = ' " + TextBox1.Text + "' and Password = ' " + TextBox2.Text+"'";
    
    using (SqlConnection connection = new SqlConnection(connectionString))
    {
        SqlCommand command = new SqlCommand(sqlQuery, connection);
        connection.Open();
        SqlDataReader reader = command.ExecuteReader();
        try
        {
            while (reader.Read())
            {
           
            }
        }
        finally
        {
            if (reader.HasRows)
            {
                reader.Close();
                Response.Redirect(string.Format("WebForm2.aspx?UserName={0}&Password={1}", TextBox1.Text, TextBox2.Text));
            }
            else
            {
                reader.Close();
                Label1.Text = "Wrong user or password";
            }
        }
    }
}

The straw that really broke the camel’s back in this case was the naming of WebForm2. I could sort of figure out the rest, but not bothering to give a real name to the page was over the top.