Saturday, March 20, 2010 #

My Erlang Religious Moment

It is not often that a piece of code cause me to have a religious moment, but this one managed to:

image

posted @ Saturday, March 20, 2010 12:00 PM | Feedback (2)

Friday, March 19, 2010 #

Challenge: What does this code do?

What language is it? What is it doing? Why?

image

posted @ Friday, March 19, 2010 12:00 PM | Feedback (17)

Thursday, March 18, 2010 #

Rhino ETL Video

Paul Barriere has a video up of a presentation about Rhino ETL:

ETL stands for Extract, Transform, Load. For example, you receive files or other data from vendors or other third parties which you need to manipulate in some way and then insert into your own database. Rhino ETL is an open source C# package that I have used for dozens of production processes quite successfully. By using C# for your ETL tasks you can create testable, reusable components more easily than with tools like SSIS and DTS.

It is good to see more information available on Rhino ETL.

posted @ Thursday, March 18, 2010 12:00 PM | Feedback (2)

Wednesday, March 17, 2010 #

Software pricing - Don’t Just Roll the Dice

I found this ebook very informative. I very happy to learn that I at least thought about most of the things that the book recommend, but I wish I read it two years ago.

If you are an ISVN, I highly recommended, because it is not just some dry academic reading, it contains examples that you can easily relate to, probably examples that you are already familiar with.

posted @ Wednesday, March 17, 2010 12:00 PM | Feedback (1)

Sunday, March 14, 2010 #

What is map/reduce for, anyway?

Yesterday I gave a visual explanation about map/reduce, and the question came up about how to handle computing navigation instructions using map/reduce. That made it clear that while (I hope) what map/reduce is might be clearer, what it is for is not.

Map/reduce is a technique aimed to solve a very simple problem, you have a lot of data and you want to go through it in parallel, probably on multiple machines. The whole idea with the concept is that you can crunch through massive data sets in realistic time frame. In order for map/reduce to be useful, you need several things:

  • The calculation that you need to run is one that can be composed. That is, you can run the calculation on a subset of the data, and merge it with the result of another subset.
    • Most aggregation / statistical functions allow this, in one form or another.
  • The final result is smaller than the initial data set.
  • The calculation has no dependencies external input except the dataset being processed.
  • Your dataset size is big enough that splitting it up for independent computations will not hurt overall performance.

Now, given those assumptions, you can create a map/reduce job, and submit it to a cluster of machines that would execute it. I am ignoring data locality and failover to make the explanation simple, although they do make for interesting wrinkles in the implementation.

Map/reduce is not applicable, however, in scenarios where the dataset alone is not sufficient to perform the operation. In the case of the navigation computation example, you can’t really handle this via map/reduce because you lack key data point (the starting and ending points). Trying to computing paths from all points to all other points is probably a losing proposition, unless you have a very small graph. The same applies if you have a 1:1 mapping between input and output. Oh, Map/Reduce will still work, but the resulting output is probably going to be too big to be really useful. It also means that you have a simple parallel problem, not a map/reduce sort of problem.

If you need fresh results, map/reduce isn’t applicable either, it is an inherently a batch operation, not an online one. Trying to invoke map/reduce operation for a user request is going to be very expensive, and not something that you really want to do.

Another set of problems that you can’t really apply map/reduce to are recursive problems. Fibonacci being the most well known among them. You cannot apply map/reduce to Fibonacci for the simple reason that you need the previous values before you can compute the current one. That means that you can’t break it apart to sub computations that can be run independently.

If you data size is small enough to fit on a single machine, it is probably going to be faster to process it as a single reduce(map(data)) operation, than go through the entire map/reduce process (which require synchronization). In the end, map/reduce is just a simple paradigm for gaining concurrency, as such it is subject to the benefits and limitations of all parallel programs. The most basic one of them is Amdahl's law.

Map/reduce is a very hot topic, but you need to realize what it is for. It isn’t some magic formula from Google to make things run faster, it is just Select and GroupBy, run over a distributed network.

posted @ Monday, March 15, 2010 12:00 PM | Feedback (4)

Map / Reduce – A visual explanation

Map/Reduce is a term commonly thrown about these days, in essence, it is just a way to take a big task and divide it into discrete tasks that can be done in parallel. A common use case for Map/Reduce is in document database, which is why I found myself thinking deeply about this.

Let us say that we have a set of documents with the following form:

{
  "type": "post",
  "name": "Raven's Map/Reduce functionality",
  "blog_id": 1342,
  "post_id": 29293921,
  "tags": ["raven", "nosql"],
  "post_content": "<p>...</p>",
  "comments": [
    { 
      "source_ip": '124.2.21.2',
      "author": "martin",
      "text": "..."
  }]
}

And we want to answer a question over more than a single document. That sort of operation requires us to use aggregation, and over large amount of data, that is best done using Map/Reduce, to split the work.

Map / Reduce is just a pair of functions, operating over a list of data. In C#, LInq actually gives us a great chance to do things in a way that make it very easy to understand and work with. Let us say that we want to be about to get a count of comments per blog. We can do that using the following Map / Reduce queries:

from post in docs.posts
select new {
  post.blog_id, 
  comments_length = comments.length 
  };

from agg in results
group agg by agg.key into g
select new { 
  agg.blog_id, 
  comments_length = g.Sum(x=>x.comments_length) 
  };

There are a couple of things to note here:

  • The first query is the map query, it maps the input document into the final format.
  • The second query is the reduce query, it operate over a set of results and produce an answer.
  • Note that the reduce query must return its result in the same format that it received it, why will be explained shortly.
  • The first value in the result is the key, which is what we are aggregating on (think the group by clause in SQL).

Let us see how this works, we start by applying the map query to the set of documents that we have, producing this output:

image

The next step is to start reducing the results, in real Map/Reduce algorithms, we partition the original input, and work toward the final result. In this case, imagine that the output of the first step was divided into groups of 3 (so 4 groups overall), and then the reduce query was applied to it, giving us:

image

You can see why it was called reduce, for every batch, we apply a sum by blog_id to get a new Total Comments value. We started with 11 rows, and we ended up with just 10. That is where it gets interesting, because we are still not done, we can still reduce the data further.

This is what we do in the third step, reducing the data further still. That is why the input & output format of the reduce query must match, we will feed the output of several the reduce queries as the input of a new one. You can also see that now we moved from having 10 rows to have just 7.

image

And the final step is:

image

And now we are done, we can reduce the data any further because all the keys are unique.

There is another interesting property of Map / Reduce, let us say that I just added a comment to a post, that would obviously invalidate the results of the query, right?

Well, yes, but not all of them. Assuming that I added a comment to the post whose id is 10, what would I need to do to recalculate the right result?

  • Map Doc #10 again
  • Reduce Step 2, Batch #3 again
  • Reduce Step 3, Batch #1 again
  • Reduce Step 4

What is important is that I did not have to touch quite a bit of the data, making the recalculation effort far cheaper than it would be otherwise.

And that is (more or less) the notion of Map / Reduce.

posted @ Sunday, March 14, 2010 12:00 PM | Feedback (19)

Saturday, March 13, 2010 #

NHibernate symbol & source server support

A symbol server allows you to debug into code that you don’t have on your machine by querying a symbol server and a source server for the details.

SymbolSource.org has added support for NHibernate, and you can download an unofficial build of the current trunk which can be used with the symbol server here.

Moving on, all of NHibernate’s releases will be tracked on Symbol Source, so even if you don’t have the source available, you can just hit F11 and debug into NHibernate’s source.

posted @ Saturday, March 13, 2010 12:00 PM | Feedback (1)

Friday, March 12, 2010 #

You can’t learn the basics from the pros

There was a question recently in the NHibernate mailing list from a guy wanting to learn how to write database agnostic code from the NHibernate source code.

While I suppose that this is possible, I can’t really think of a worst way to learn how to write database agnostic code than reading an OR/M code. The reason for that is quiet simple, an OR/M isn’t just about one thing, it is about doing a lot of things together and bringing them together into a single whole. Yes, most OR/M are database agnostic, but trying to figure out the principles of that from the code base is going to be very hard.

That leads to an even more interesting problem, it is very hard for a beginner to actually learn something useful from a professional. That is true in any field, of course. In software, the problem is that most pros would simply skip whole steps that beginners are thought to be crucial. It isn’t from neglect, it is because they are going through those steps, but not in a conscious level.

Very often, I’ll come up with a design, and when it is only when I need to justify it to someone else that I actually realize the finer points of what have actually gone through my own head (another reason that having a blog is useful).

I think that there is a reason that we have names like code smells, you can immediately sense a problem in a smelly codebase, although it may take you a while to actually articulate it.

posted @ Friday, March 12, 2010 12:00 PM | Feedback (10)

Thursday, March 11, 2010 #

Sometimes you really need a profiler handy

As part of the performance work I am doing for Uber Prof, I fixed a couple of issues related to profiling very busy applications.

Here is the result on processing a 4GB input file.

image 

I can’t say that the profiler is happy about it, but it works.

posted @ Thursday, March 11, 2010 12:00 PM | Feedback (2)

Wednesday, March 10, 2010 #

Using Lucene – External Indexes

Lucene is a document indexing engine, that is its sole goal, and it does so beautifully. The interesting bit about using Lucene is that it probably wouldn’t be your main data store, but it is likely to be an important piece of your architecture.

The major shift in thinking with Lucene is that while indexing is relatively expensive, querying is free (well, not really, but you get my drift). Compare that to a relational database, where it is usually the inserts that are cheap, but queries are usually what cause us issues. RDBMS are also very good in giving us different views on top of our existing data, but the more you want from them, the more they have to do. We hit that query performance limit again. And we haven’t started talking about locking, transactions or concurrency yet.

Lucene doesn’t know how to do things you didn’t set it up to do. But what it does do, it does very fast.

Add to that the fact that in most applications, reads happen far more often than write, and you get a different constraint system. Because queries are expensive on the RDBMS, we try to make few of them, and we try to make a single query do most of the work. That isn’t necessarily the best strategy, but it is a very common one.

With Lucene, it is cheap to query, so it makes a lot more sense to perform several individual queries and process their results together to get the final result that you need. It may require somewhat more work (although there are things like Solr that would do it for you), but it is results in a far faster system performance overall.

In addition to that, since the Lucene index is important, but can always be re-created from source data (it may take some time, though), it doesn’t require all the ceremony associated with DB servers. Instead of buying an expensive server, get a few cheap ones. Lucene scale easily, after all. And since you only use Lucene for indexing, your actual DB querying pattern shift. Instead of making complex queries in the database, you make them in Lucene, and you only hit the DB with queries by primary key, which are the fastest possible way to get the data.

In effect, you outsourced your queries from the expensive machines to the cheap ones, and for a change, you actually got better performance overall

posted @ Wednesday, March 10, 2010 12:00 PM | Feedback (18)

Tuesday, March 09, 2010 #

Hibernate Profiler New Feature: Parameters Values

One of the annoying things about the Hibernate port of the profiler was that JDBC didn’t provide us with the parameters values.

Eric has just fixed and that is now live:

hibernate-parameters

Enjoy…

posted @ Tuesday, March 09, 2010 2:12 PM | Feedback (0)

Cut the abstractions by putting test hooks

I have been hearing about testable code for a long time now. It looks somewhat like this, although I had to cut on the number of interfaces along the way.

We go through a lot of contortions to be able to do something fairly simple, avoid hitting a data store in our tests.

This is actually inaccurate, we are putting in a lot of effort into being able to do that without changing production code. There are even a lot of explanations how testable code is decoupled, and easy to change, etc.

In my experience, one common problem is that we put in too much abstraction in our code. Sometimes it actually serve a purpose, but in many cases, it is just something that we do to enable testing. But we still pay the hit in the design complexity anyway.

We can throw all of that away, and keep only what we need to run production code, but that would mean that we would have harder time with the tests. But we can resolve the issue very easily by making my infrastructure aware of testing, such as this example:

image

But now your production code was changed by tests?!

Yes, it was, so? I never really got the problem people had with this, but at this day and age, putting in the hooks to make testing easier just make sense. Yes, you can go with the “let us add abstractions until we can do it”, but it is much cheaper and faster to go with this approach.

Moreover, notice that this is part of the infrastructure code, which I don’t consider as production code (you don’t touch it very often, although of course it has to be production ready), so I don’t have any issue with this.

Nitpicker corner: Let us skip the whole TypeMock discussion, shall we?

TDD fanatic corner: I don’t really care about testable code, I care about tested code. If I have a regression suite, that is just what i need.

posted @ Tuesday, March 09, 2010 12:00 PM | Feedback (15)

Monday, March 08, 2010 #

Lessons learned from building the NHibernate Profiler

Last week I gave a talk about the things I learned from building NH Prof.

Skills Matter had recorded the session and made it available.

Looking forward for your comments, but I should disclaimer that this was after a full day of teaching and on 50 min of sleep in the last 48 hours

posted @ Monday, March 08, 2010 12:00 PM | Feedback (5)

Slaying relational hydras (or dating them)

Sometimes client confidentiality can be really annoying, because the problem sets & suggested solutions that come up are really interesting. That said, since I am interesting in having future clients, it is pretty much a must have. As such, the current post represent a real world customer problem, but probably in a totally different content. In fact, I would be surprised if the customer was able to recognize the problem as his.

That said, the problem is actually quite simple. Consider a dating site, where you can input your details and what you seek, and the site will match you with the appropriate person. I am going to ignore a lot of things here, so if you actually have built a dating site, try not to cringe.

At the most basic level, we have two screens, the My Details screen, where the users can specifies their stats and their preferences:

image

And the results screen, which shows the user the candidate matching their preferences.

There is just one interesting tidbit, the list of qualities is pretty big (hundreds or thousands of potential qualities).

Can you design a relational model that would be a good fit for this? And allow efficient searching?

I gave it some thought, and I can’t think of one, but maybe you can.

I’ll follow up on this post in a day or two, showing how to implement the problem using Raven.

posted @ Monday, March 08, 2010 10:46 AM | Feedback (26)

Monday, March 01, 2010 #

Rhino Divan DB – Performance

The usual caveat applies, I am running this in process, using small data size and small number of documents.

This isn’t so much as real benchmarks, but they are good enough to give me a rough indication about where i am heading, and whatever or not i am going in completely the wrong direction.

Those two should be obvious:

image image

This one is more interesting, RDB doesn’t do immediate indexing, I chose to accept higher CUD throughput and make indexing a background process. That means that the index may be inconsistent for a short while, but it greatly reduce the amount of work required to insert/update a document.

But, what is that short while in which the document and the DB may be inconsistent. The average time seems to be around 25ms in my tests, with some spikes toward 100 ms in some of the starting results. In general, it looks like things are improving the longer the database run. Trying it out over a 5,000 document size give me an average update duration of 27ms, but note that I am testing the absolute worst usage pattern, lot of small documents inserted one at a time with index requests coming in as well.

image

Be that as it may, having inconsistency period measured in a few milliseconds seems acceptable to me. Especially since RDB is nice enough to actually tell me if there are any inconsistencies in the results, so I can chose whatever to accept them or retry the request.

posted @ Sunday, March 07, 2010 12:00 PM | Feedback (10)

Saturday, March 06, 2010 #

Getting code ready for production

I am currently doing the production-ready pass through the Rhino DivanDB code base, and I thought that this change was interesting enough to post about:

public void Execute()
{
    while(context.DoWork)
    {
        bool foundWork = false;
        transactionalStorage.Batch(actions =>
        {
           var task = actions.GetFirstTask();
           if(task == null)
           {
               actions.Commit(); 
               return;
           }
           foundWork = true;

           task.Execute(context);

           actions.CompleteCurrentTask();

           actions.Commit();
        });
        if(foundWork == false)
            context.WaitForWork();
    }
}

This is “just get things working” phase. When getting a piece of code ready for production, I am looking for several things:

  • If this is running in production, and I get the log file, will I be able to understand what is going on?
  • Should this code handle any exceptions?
  • What happens if I send values from a previous version? From a future version?
  • Am I doing unbounded operations?
  • For error handling, can I avoid any memory allocations?

The result for this piece of code was:

public void Execute()
{
    while(context.DoWork)
    {
        bool foundWork = false;
        transactionalStorage.Batch(actions =>
        {
            var taskAsJson = actions.GetFirstTask();
            if (taskAsJson == null)
            {
                actions.Commit();
                return;
            }
            log.DebugFormat("Executing {0}", taskAsJson);
            foundWork = true;

            Task task;
            try
            {
                task = Task.ToTask(taskAsJson);
                try
                {
                    task.Execute(context);
                }
                catch (Exception e)
                {
                    if (log.IsWarnEnabled)
                    {
                        log.Warn(string.Format("Task {0} has failed and was deleted without completing any work", taskAsJson), e);
                    }
                }
            }
            catch (Exception e)
            {
                log.Error("Could not create instance of a task from " + taskAsJson, e);
            }

            actions.CompleteCurrentTask();
            actions.Commit();
        });
        if(foundWork == false)
            context.WaitForWork();
    }
}

The code size blows up really quickly.

posted @ Saturday, March 06, 2010 12:00 PM | Feedback (24)

Friday, March 05, 2010 #

Sometimes I have code blinders on

This is a piece of code that I am using in RDB, at some point, it threw a null reference exception:

image

I am ashamed to admit that I started doing some really deep debugging to understand the bug (this happen only under very strange circumstances).

When I figured out what it was, I was deeply ashamed, this is easy.

posted @ Friday, March 05, 2010 12:00 PM | Feedback (31)

Actual scenario testing with Raven

Yesterday I posted about doing scenario testing with Raven, and I showed the concept of what i am doing. This time, I wanted to show what I am actually talking about, and how this is implemented. Here are the current scenarios for Raven.

image

Each scenario is looks something like this (showing PutAndGetDocument here):

image

And the second request:

image

The scenarios are being picked up using:

public class AllScenariosWithoutExplicitScenario
{
    [Theory]
    [PropertyData("ScenariosWithoutExplicitScenario")]
    public void Execute(string file)

        new Scenario(Path.Combine(ScenariosPath, file+".saz")).Execute();
    }

    public static string ScenariosPath
    {
        get
        {
            return Directory.Exists(@"..\..\bin") // running in VS
                       ? @"..\..\Scenarios" : @"..\Raven.Scenarios\Scenarios";
        }
    }

    public static IEnumerable<object[]> ScenariosWithoutExplicitScenario
    {
        get
        {
            foreach (var file in Directory.GetFiles(ScenariosPath,"*.saz"))
            {
                if (typeof(Scenario).Assembly.GetType("Raven.Scenarios." +
                          Path.GetFileNameWithoutExtension(file) +"Scenario") != null)
                    continue;
                yield return new object[] {Path.GetFileNameWithoutExtension(file)};
            };
        }
    }
}

There are two reasons why I am ignoring explicit scenarios. Adding a class for a specific scenario allows me to run the scenario in the debugger, and also allow me to selectively skip certain scenarios if I need to.

Scenario.Execute is fairly involved, it parse the Fiddler’s saz file, build appropriate request and compare to the expect response, it is also smart enough to handle changing things like ETags and pass them along.

The end result is that I can very easily add new scenarios as I get new features to that requires tests.

posted @ Friday, March 05, 2010 2:09 AM | Feedback (8)

Thursday, March 04, 2010 #

Is select (System.Uri) broken?

I can’t really figure out what is going on!

Take a look:

image

The value :

http://localhost:58080/indexes/categoriesByName?query=CategoryName%3ABeverages&start=0&pageSize=25

And the problem is that I can’t figure out why calling this once would fail, but calling it the second time would fail. That is leaving aside the fact this looks like a pretty good url to me.

Any ideas? This is perfectly reproducible on one project, but I can’t reproduce this on another project.

Updates:

  • This is System.Uri
  • The issue that it fails the first time, and works the second!
  • The exception is:
  • System.ArgumentNullException: Value cannot be null.
    Parameter name: str
       at System.Security.Permissions.FileIOPermission.HasIllegalCharacters(String[] str)
       at System.Security.Permissions.FileIOPermission.AddPathList(FileIOPermissionAccess access, AccessControlActions control, String[] pathListOrig, Boolean checkForDuplicates, Boolean needFullPath, Boolean copyPathList)
       at System.Security.Permissions.FileIOPermission..ctor(FileIOPermissionAccess access, String path)
       at System.Uri.ParseConfigFile(String file, IdnScopeFromConfig& idnStateConfig, IriParsingFromConfig& iriParsingConfig)
       at System.Uri.GetConfig(UriIdnScope& idnScope, Boolean& iriParsing)
       at System.Uri.InitializeUriConfig()
       at System.Uri.InitializeUri(ParsingError err, UriKind uriKind, UriFormatException& e)
       at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
       at System.Uri..ctor(String uriString)
       at Raven.Scenarios.Scenario.GetUri_WorkaroundForStrangeBug(String uriString) in C:\Work\ravendb\Raven.Scenarios\Scenario.cs:line 155

  • This is a console application.

Okay, I can reproduce this now, here it how it got there:

public class Strange : MarshalByRefObject
{
    public void WTF()
    {
        Console.WriteLine(AppDomain.CurrentDomain.SetupInformation.ConfigurationFile);
        new Uri("http://localhost:58080/indexes/categoriesByName?query=CategoryName%3ABeverages&start=0&pageSize=25");
    }
}

public class Program
{
    private static void Main()
    {
        var instanceAndUnwrap = (Strange) AppDomain.CreateDomain("test", null, new AppDomainSetup
        {
            ConfigurationFile = ""
        }).CreateInstanceAndUnwrap("ConsoleApplication5", "ConsoleApplication5.Strange");
        instanceAndUnwrap.WTF();
    }
}

That took some time to figure out.

The reason that I got this issue is that I am running this code as part of a unit test, and the xUnit seems to be running my system under the following conditions.

posted @ Thursday, March 04, 2010 9:05 PM | Feedback (20)

Scenario based testing in Rhino DivanDB

Here is a unit test testing Rhino DivanDB:

image

Here is a test that tests the same thing, using scenario based approach:

image

What are those strange files? Well, let us take a pick at the first one:

0_PutDoc.request 0_PutDoc.response

PUT /docs HTTP/1.1
Content-Length: 283

{
    "_id": "ayende",
    "email": "ayende@ayende.com",
    "projects": [
        "rhino mocks",
        "nhibernate",
        "rhino service bus",
        "rhino divan db",
        "rhino persistent hash table",
        "rhino distributed hash table",
        "rhino etl",
        "rhino security",
        "rampaging rhinos"
    ]
}

HTTP/1.1 201 Created
Connection: close
Content-Length: 15
Content-Type: application/json; charset=utf-8
Date: Sat, 27 Feb 2010 08:12:08 GMT
Server: Kayak

{"id":"ayende"}

Those are just test files, corresponding to the request and the expected response.

RBD’s turn those into tests, by issuing each request in turn and asserting on the actual output. This is slightly more complicated than it seems, because some requests contains things like dates, or generated guids. The scenario runner is aware of those and resolve those automatically. Another issue is dealing with potentially stale requests, especially because we are issuing requests on the same data immediately Again, this is something that the scenario runner handles internally, and we don’t have to worry about it.

There are some things here that may not be immediately apparent. We are doing pure state base testing, in fact, this is black box testing. The scenarios define the external API of the system, which is a nice addition.

We don’t care about the actual implementation, look at the unit test, we need to setup a db instance, start the background threads, etc. If I modify the DocumentDatabase constructor, or the initialization process, I need to touch each test that uses it. I can try to encapsulate that, but in many cases, you really can’t do that upfront. SpinBackgroundWorkers, for example, is something that is required in only some of the unit tests, and it is a late addition. So I would have to go and add it to each of the tests that require it.

Because the scenarios don’t have any intrinsic knowledge about the server, any require change is something that you would have to do in a single location, nothing more.

Users can send me a failure scenarios. I am using this extensively with NH Prof (InitializeOfflineLogging), and it is amazing. When a user runs into a problem, I can tell them, please send me a Fiddler trace of the issue, and I can turn that into a repeatable test in a matter of moments.

I actually thought about using Fiddler’s saz files as the format for my scenarios, but I would have to actually understand them first. :-) It doesn’t look hard, but flat files seemed easier still.

Actually, I went ahead and made the modification, because now i have even less friction, just record a Fiddler session, drop it in a folder, and I have a test. Turned out that the Fiddler format is very easy to work with.

posted @ Thursday, March 04, 2010 12:00 PM | Feedback (4)

Challenge: Robust enumeration over external code

Here is an interesting little problem:

public class Program
{
    private static void Main()
    {
        foreach (int i in RobustEnumerating(Enumerable.Range(0, 10), FaultyFunc))
        {
            Console.WriteLine(i);
        }
    }

    public static IEnumerable<T> RobustEnumerating<T>(
        IEnumerable<T> input,Func<IEnumerable<T>, IEnumerable<T>> func)
    {
        // how to do this?
        return func(input);

    }

    public static IEnumerable<int> FaultyFunc(IEnumerable<int> source)
    {
        foreach (int i in source)
        {
            yield return i/(i%2);
        }
    }
}

This code should not throw, but print:

1
3
5
7
9

Can you make this happen? You can only change the RobustEnumerating method, nothing else in the code

posted @ Thursday, March 04, 2010 3:54 AM | Feedback (98)

Wednesday, March 03, 2010 #

Git is teh SUCK

Today, I had two separate incidents in which my git repository was corrupted! To the point that nothing, git fsck or git reflog or git just-work-or-i-WILL-shoot-you didn’t work.

The first time, there was no harm done, I just cloned my repository again, and moved on. The second time that it happened, it was after I had ~10 commits locally that weren’t pushed. I had my working copy intact, but I didn’t want to lose the history. I asked around, and got a couple of suggestion to move to mercurial instead, because git has no engineering behind it.

Based on that feedback, I …

Oh, wait, it isn’t this sort of a post.

What I actually did was setup Process Monitor and watched what git.exe was actually doing. I noticed that it was searching for a .git/objects directory, and couldn’t find it anywhere in the path. Indeed, looking there myself, it appeared clear that there was no objects directory under the .git dir. And checking in other repositories showed that they had it. So now I knew why, but I still had no idea who the #*@# decided to randomly @#$%( my repository, totally derailing my productivity.

That is where having multiple personalities come in handy, he did it. The one that isn’t writing this blog post, at some point during the day, there was a need to zip the repository and send it somewhere. Since the working copy is full of crap, that idiot issued the following:

ls -R obj | rm –F

ls -R bin | rm –F

(Not the exact commands, the idiot used the UI to do a search & delete).

You can guess the following from there. At this point, having come to this astounding discovery, I heroically went to the recycle bin, found the objects directory there, and rescued it! All is well, except that there is still a thrashing for uncommon stupidity owed.

And remember, it wasn’t me, it was the other one who did that!

And yes, the spelling mistake in the title is intentional.

posted @ Wednesday, March 03, 2010 7:55 AM | Feedback (17)

Tuesday, March 02, 2010 #

DotNetRocks on Domain Specific Languages

Last week I recorded a DNR session about Domain Specific Languages.

I also talked just a bit about writing your own DSL, some challenges that I run into since the book, the Boo language and why it is suitable for DSLs and how the entire process works.

Looking forward for your comments.

posted @ Tuesday, March 02, 2010 8:59 PM | Feedback (5)

Where do git repositories go when they die?

My RDB repository started giving me this error;

fatal: Not a git repository (or any of the parent directories): .git

I don’t think that I did anything to it, but it is still dead.

image

Any ideas how to recover this?

Update: Found why Git doesn't like my repository, it doesn't have .git\objects, but I have no idea where it could have gone to… or why.

posted @ Tuesday, March 02, 2010 3:33 PM | Feedback (6)

Rhino DivanDB – A full coding sample – Embedded

Rhino Divan DB is going to come in at least two forms, embedded, and remote. The following is a full example of starting DivanDB, defining a view, adding some documents and then querying the database.

Note that here we want to ensure that we get the most up to date result, so we refuse to accept a potentially stale query.

image

This outputs the right result, by the way :-)

posted @ Tuesday, March 02, 2010 12:00 PM | Feedback (25)