Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 7 min | 1352 words

System.DateTime is a value type in .NET, which means that it can never be null. But what happens when you have a nullable date time in the database, and you load it into a DateTime type?

Consider this simple example. Mapping:

<property name="UpdatedDate" nullable="True"  column="updated_date" type="DateTime" />

Property:

public DateTime UpdatedDate
{
    get { return m_dateTime; }
    set { m_dateTime = value; }
}

When NHibernate loads a null value from the database, it cannot put a null in the UpdatedDate property, because the CLR doesn't allow it. Instead, the UpdatedDate property is set to the default DateTime value, in this case 01/01/0001.

This can cause two major issues down the road. The first is that 01/01/0001 is not a valid value for a SQL Server datetime column (the earliest date it accepts is 01/01/1753), so when you try to save the value, it will throw an exception. The second is that when NHibernate needs to track changes, it will check whether null != 01/01/0001, and it will turn out that yes, this entity (which we never touched) has changed.
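Both issues can be seen without NHibernate at all. The following is a minimal sketch using only the base class library; `NullDateSketch` and its method names are made up for illustration, and `LooksChanged` is a naive stand-in for change tracking, not NHibernate's actual code.

```csharp
using System;
using System.Data.SqlTypes;

public static class NullDateSketch
{
    // The value a non-nullable DateTime property holds after a NULL column is loaded.
    public static DateTime LoadedFromNull()
    {
        return default(DateTime); // 01/01/0001, the same as DateTime.MinValue
    }

    // True when the value is earlier than the minimum of a SQL Server
    // datetime column (SqlDateTime.MinValue, 01/01/1753).
    public static bool IsTooEarlyForSqlServer(DateTime value)
    {
        return value < SqlDateTime.MinValue.Value;
    }

    // Naive dirty check: compares what was read from the database
    // with what the property currently holds.
    public static bool LooksChanged(DateTime? fromDatabase, DateTime? fromProperty)
    {
        return fromDatabase != fromProperty;
    }
}
```

Here `IsTooEarlyForSqlServer(LoadedFromNull())` is true, and so is `LooksChanged(null, LoadedFromNull())`, even though nobody touched the entity. With a nullable property the second call becomes `LooksChanged(null, null)`, which is false.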

This can cause an extra update (and thus an exception), which can cause some fairly significant head scratching. The solution is simple: you need to let NHibernate know what to do with null values. This can be done by simply using a nullable type, such as:

public DateTime? UpdatedDate
{
    get { return m_dateTime; }
    set { m_dateTime = value; }
}

Or by using the Nullables.dll library for .NET 1.1, and specifying the correct type in the mapping:

<property name="UpdatedDate" column="updated_date" type="Nullables.NHibernate.NullableDateTimeType, Nullables.NHibernate" />

With this property:

public Nullables.NullableDateTime UpdatedDate
{
    get { return m_dateTime; }
    set { m_dateTime = value; }
}

Thanks to Sheraz Khan for finding this out and bringing it to my attention. Since then, I have run into the issue a couple of times, and I hope that if I blog about it, I will remember to match nullabilities.

Code smell

time to read 2 min | 370 words

Why does it have to be like this?

if($F('<%= PolicyDescription.ClientID %>')==
    '<asp:Literal runat="server" Text="<%$ Resources:PolicyResources, DescriptionNotFound %>"/>')
{
    alert('<asp:Literal runat="server" Text="<%$ Resources:PolicyResources, EnterValidPolicyNumberOrDescription %>"/>');
    return;
}

time to read 25 min | 4844 words

I just had a head-scratching, hair-pulling, what-the-hell-is-going-on bug. Basically, a classic producer / consumer issue. Each work item is composed of several sub-items, which should always be processed as a single unit (exactly one time). The problem was that I suddenly started to get duplicate processing of the same work item, but with different sub-items each time.

I went over the code with a fine-tooth comb and couldn't see anything wrong. I investigated the database, and everything was fine there as well. I knew that I was in trouble when I considered going down to the SQL Server protocol level to check whether network problems were somehow causing this issue.

Here is a simplified version of the producer (and yes, I would never write this kind of code for production, test, or anything but a short and to the point demo). As you can see, it merely generates batches of 500 records into a table.

private static void Producer()
{
    SqlConnection connection = new SqlConnection(connectionString);
    int count = 0;
    while (true)
    {
        connection.Open();
        SqlTransaction sqlTransaction = connection.BeginTransaction(isolationLevel);
        // Insert 500 rows that all share the same Id, inside a single transaction.
        for (int i = 0; i < 500; i++)
        {
            SqlCommand sqlCommand = connection.CreateCommand();
            sqlCommand.Transaction = sqlTransaction;
            sqlCommand.CommandText = "INSERT INTO t (Id) VALUES(@p1)";
            sqlCommand.Parameters.AddWithValue("@p1", count);
            sqlCommand.ExecuteNonQuery();
            sqlCommand.Dispose();
        }
        sqlTransaction.Commit();
        Console.WriteLine("Wrote 500 records with count " + count);
        count += 1;
        connection.Close();
    }
}

And here is the consumer, which reads from the table and ensures that it reads in batches of 500:

private static void Consumer()
{
    SqlConnection connection = new SqlConnection(connectionString);
    while (true)
    {
        connection.Open();
        SqlTransaction sqlTransaction = connection.BeginTransaction(isolationLevel);
        SqlCommand sqlCommand = connection.CreateCommand();
        sqlCommand.Transaction = sqlTransaction;
        // One result row per Id; each should count exactly one committed batch.
        sqlCommand.CommandText = "SELECT COUNT(*) FROM t GROUP BY id";
        SqlDataReader sqlDataReader = sqlCommand.ExecuteReader();
        if (sqlDataReader.RecordsAffected != -1)
            Console.WriteLine("Read: {0}", sqlDataReader.RecordsAffected);
        while (sqlDataReader.Read())
        {
            int count = sqlDataReader.GetInt32(0);
            Console.WriteLine("Count = {0}", count);
            // Anything other than 500 means we saw only part of a batch.
            if (count != 500)
                Environment.Exit(1);
        }
        sqlDataReader.Dispose();
        sqlCommand.Dispose();
        sqlTransaction.Commit();
        connection.Close();
    }
}

Note that both methods use the same isolationLevel. Here is the code that runs both methods:

private static void Main()
{
    Delete();
    new Thread(Producer).Start();
    new Thread(Consumer).Start();
}

If I set isolationLevel to ReadCommitted or RepeatableRead, this consistently fails within a couple of seconds: the consumer manages to do a partial read of the records that were inserted by the producer. If I set the isolationLevel to Serializable or Snapshot, it behaves as expected.

I may be missing something, but I would expect ReadCommitted to only allow the second transaction to read... well, committed rows. Just to point out, the real scenario doesn't involve aggregation, and I am seeing the same issue. I suppose that the new records were committed (and thus made visible) when the query had already scanned part of the table, and thus it missed some of the rows, while Snapshot and Serializable force either a static image all the way through or waiting till the end.
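That guess can be restated as a toy model with no database involved. In this sketch a "table" is an array of slots, a scan reads it slot by slot, and a whole batch commits when the scan reaches a given position. `ScanRaceSketch`, the slot layout and every parameter are assumptions made for illustration; this is not how SQL Server stores or scans rows.

```csharp
using System;

public static class ScanRaceSketch
{
    // Returns how many rows of one batch a slot-by-slot scan sees when the
    // whole batch commits at the moment the scan reaches position commitAt.
    // snapshot = true models reading from an image taken before the scan began.
    public static int RowsSeenByScan(int slots, int batchSize, int commitAt, bool snapshot)
    {
        bool[] table = new bool[slots];
        bool[] view = snapshot ? (bool[])table.Clone() : table;
        int seen = 0;
        for (int pos = 0; pos < slots; pos++)
        {
            if (pos == commitAt)
            {
                // The writer commits: its rows land evenly across the table,
                // some behind the scan position and some ahead of it.
                int stride = slots / batchSize;
                for (int i = 0; i < batchSize; i++)
                    table[i * stride] = true;
            }
            if (view[pos])
                seen++;
        }
        return seen;
    }
}
```

With 1000 slots and a batch of 500 committing at position 500, `RowsSeenByScan(1000, 500, 500, false)` returns 250: every row is committed, yet the scan sees only half the batch. The same call with `snapshot` set to true returns 0, so the batch is either seen whole on a later scan or not at all.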

Any comments?

time to read 3 min | 514 words

Jeremy Miller has posted about the DRY principle, as well as the Wormhole anti-pattern. One of the pain points that was brought up was adding a field to the screen. We are currently extremely fluid with regard to the entity design, which means that new fields keep popping up and old fields are going away. I am also extremely fond of ReSharper, and I loathe tables like "Cutsomers". The spelling mistake is intentional here, but I have seen systems that have this in production. This makes it fun to work with the system.

The process we use is to have a single authoritative source for our model. That model is the entities. Our entities are Active Record classes (but they do not inherit from ActiveRecordBase), so I get to specify what they do inside the code. Part of the build process generates the database from our entities.

Assume that we are working with comments, and we have a new requirement, to show the commenter's IP. What we would need to do is:

  • Add the IP Property to Comment (then run the build):
    [Property(NotNull = true)]
    public virtual string Ip { get { ... } set { ... } }
  • When we create the comment, add the IP Address of the current user.
  • When we display the comment, show the IP Address as well.

In practice, we often add several fields at the same time, but it is the same principle. The only pain point in this process is the data: we have test data that needs to be modified when we re-generate the database, which can be a pain to deal with. I am searching for an answer in this matter, by the way.

Another thing that I had some experience with is moving stuff from configuration to code. There should be only one truth, but the instinct to put it in XML is the wrong approach, IMO. There are no tools to refactor XML, while code is extremely flexible in the right hands. I had success in parsing code and generating additional code from it. In that case, it was to generate performance counter create/use code from an abstract class that contains just their definitions.
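A rough sketch of that idea follows, with one substitution stated plainly: it uses reflection over a compiled abstract class rather than parsing source code. `OrderCounters`, `CounterCodeGenerator` and the `counters` variable in the emitted text are all hypothetical names; only the shape of the approach comes from the description above.

```csharp
using System;
using System.Reflection;
using System.Text;

// Hypothetical definitions class: it only declares which counters exist.
public abstract class OrderCounters
{
    public abstract long OrdersProcessed { get; }
    public abstract long OrdersFailed { get; }
}

public static class CounterCodeGenerator
{
    // Emits one line of counter-creation source text per public property
    // declared on the definitions type. The output is text, not executed here.
    public static string Generate(Type definitions)
    {
        StringBuilder code = new StringBuilder();
        foreach (PropertyInfo property in definitions.GetProperties())
        {
            code.AppendLine(
                "counters.Add(new CounterCreationData(\"" + property.Name +
                "\", \"\", PerformanceCounterType.NumberOfItems64));");
        }
        return code.ToString();
    }
}
```

Calling `CounterCodeGenerator.Generate(typeof(OrderCounters))` yields one generated line per counter, so adding a counter means adding a single property to the definitions class.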

At any rate, what I am trying to say is that you should aspire to a single truth as well, and you should carefully consider what kind of a truth it is. <Truth/> is not an easy approach.

time to read 3 min | 482 words

Update: This article is from 1999, which I somehow missed, so it is not news by far. Still wrong, even for its time, though.

This really annoys me. The author of the article tries to preach an OO methodology for writing UI. In general, I don't have an issue with that, but his argument contains this:

An object-oriented solution tries to encapsulate those things [adding a field to a screen] that are likely to change in such a way that a change to one part of the program won't impact the rest of the program at all. For example, an object-oriented solution to the problems I just discussed requires a Name class, objects of which know how to both display and initialize themselves.

I am not sure where to begin. To start with, an entity should not have ties to the UI. Then we have this dream vision about editing a single class, and suddenly all the relevant screens sprout the new field, and we can start working on it. UI doesn't work like this. It doesn't work like this because adding a new field to a screen can horribly break the screen. It might push the Ok/Cancel buttons outside of the visible realm of the screen, it might break the tab order, and it will definitely break the flow of the screen.

I am all for OO and encapsulation, but this is not a problem that you can generalize.

This just gets better when you keep on reading:

The other bugaboo that I want to put to death is the notion of different views into the same object, usually characterized by the question: "Suppose you need to display this data as a pie chart over here and a grid over there? How can you do this if the object displays itself?"

Now let's get real. How often in your work has this problem actually come up?

Um, all the time? The simplest version is showing items in a list and then in a details view, but I just finished talking about the different views that I present to the various aspects in my systems. Beyond that, I have different views for different screens. In one screen, I am showing an Employee's personal data, in another the Employee's salary data, etc.

time to read 2 min | 238 words

In the Open Letter to Scott Guthrie, David talks about copying the features from OSS projects, and brings MsBuild as an example. MsBuild is a project that reached feature parity with NAnt; there is basically no technical reason that I know of to prefer one over the other. NAnt has more tasks, but MsBuild has quite a few as well.

The problem is that feature parity is just that, and it is not enough. Both NAnt and MsBuild are XML-based programming languages. Read the last sentence again, and you can probably tell what I don't like in both. A much better approach to this would have been to keep the same targets & actions model, but to skip the XML and go with an imperative language for the actual implementation.

XML is nice because you can parse it easily, but it is not something that is meant for human consumption. A better alternative would have been something like Rake, but on .NET. (About two years ago, I played with porting NAnt to Boo. Not the project itself, but the syntax. I got to the point where it was very readable and left it. I don't think that there is anything left of the project anymore, although the sources should be up somewhere.) At any rate, this would have been a significant improvement over the existing technology, not just another feature-parity product.
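To give a feel for what "targets & actions without the XML" could look like, here is a minimal sketch in C#. It is not Rake, NAnt or MsBuild; `BuildScript`, `AddTarget` and `Run` are invented names, and a real tool would need far more (error handling, cycle reporting, incremental builds).

```csharp
using System;
using System.Collections.Generic;

public class BuildScript
{
    private class Target
    {
        public string[] DependsOn;
        public Action Body;
    }

    private readonly Dictionary<string, Target> targets = new Dictionary<string, Target>();
    private readonly HashSet<string> visited = new HashSet<string>();

    // Registers a named target, the targets it depends on, and the code it runs.
    public void AddTarget(string name, string[] dependsOn, Action body)
    {
        targets[name] = new Target { DependsOn = dependsOn, Body = body };
    }

    // Runs the dependencies first, then the target itself; each target runs at most once.
    public void Run(string name)
    {
        if (!visited.Add(name))
            return;
        foreach (string dependency in targets[name].DependsOn)
            Run(dependency);
        targets[name].Body();
    }
}
```

A script would register "compile" with no dependencies, "test" depending on "compile", and "package" depending on both, each with an ordinary lambda as its body. `Run("package")` then executes compile, test and package in that order, and the bodies are plain code that can be refactored like any other code.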

time to read 2 min | 224 words

Jeremy Miller talks about lines of code in tests vs. production, and he raises an important issue with regard to the tests:

I say that mainly because unit test code is composed almost entirely of stand alone methods.  They don't interrelate or interract with each other.  The responsibility of a unit test is almost always the same:  setup data, do something, check the results.  The hardest thing about coding is deciding what to code, not the mechanical act of typing. 

This is worth repeating. The tests are not a good place to show off how smart you can be with regard to design patterns and triple indirect factories. About the only refactoring that I would do in the tests (aside from rename method) is to refactor the creation of the test data. Even then, the names are something like CreateCustomer_WithTwoOrders_ButWithNoCredit(). I am asserting on the data from this method, so it had better be clear and easy to find.
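As a small illustration of such a creation method, here is a sketch. Only the method name comes from the text above; the `Customer` and `Order` classes, their members and the order totals are hypothetical, kept to the bare minimum needed to hang a test on.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain classes, just enough for the example.
public class Order
{
    public decimal Total;
}

public class Customer
{
    public decimal Credit;
    public readonly List<Order> Orders = new List<Order>();

    public bool CanPlaceOrder(decimal total)
    {
        return Credit >= total;
    }
}

public static class TestData
{
    // The name states exactly what a test is allowed to assert on.
    public static Customer CreateCustomer_WithTwoOrders_ButWithNoCredit()
    {
        Customer customer = new Customer();
        customer.Credit = 0m;
        customer.Orders.Add(new Order { Total = 10m });
        customer.Orders.Add(new Order { Total = 25m });
        return customer;
    }
}
```

A test that asserts `customer.Orders.Count == 2` or that `CanPlaceOrder(1m)` is false can be checked against the method name alone, without reading the setup code.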

A good test, assuming that I understand the technology being tested and the testing/mocking framework, should reveal its intent, preferably via its name, but from the code as well.

time to read 3 min | 456 words

What works for the database doesn't work for an entity, and what works for an entity is a really bad idea if you are trying to call a web service; a web service's data is really not the right thing when you want a report. Your customer has a Birthday field, which I couldn't care less about. I really want to track the last time a user signed in, but you don't have that information available to give me. You really don't want to give the employee salary data to the employee listing grid, etc.

I see a lot of discussion about trying to have a single Entity (in the pure sense: something with business meaning) that maps to every representation and any scenario. The needs, constraints and requirements for each of those are very different. Trying to make them all fit in one representation makes this a very bad representation.

Let us consider a simple entity, the Employee. It contains data such as name, date of birth, hire date, salary, etc. On the database side, it may be represented as a temporal table for each property, linked together by the employee id. In the business logic, I have an EmployeeSnapshot, which is not temporal, but is valid for a certain date only. On the UI, I have only a limited view of the employee (with salary information, for instance), and a web service exposes additional employee information (hierarchies), to be consumed by the CRM (for permissions).

For this simple entity, I already have four different representations. I might be able to satisfy them all within a single class, but what does it give me? A heavyweight hybrid that I can't really change without affecting all parts of the system.

In my current project, I have:

  • Policy - Active Record Entity - Business logic
  • AjaxianPolicy - Simple view of the fields that I want to show on the screen - Strictly DTO

When I get to the part that I need web services for, I will also add a PolicyMessage class.

On the surface, this seems to break the DRY principle. I have now exposed the fact that Policy has an Id property in several places. I don't think that this applies here. It may look like repetition and more work, but I end up with a simpler model to work with, and I can optimize each individual case independently. And by optimize I do not mean performance, I mean ease of use, compatibility, etc.
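A minimal sketch of that split might look like this. The class names and the Id property come from the list above; `Description`, `ExpiresOn`, `IsExpired` and the `From` method are hypothetical, and the Active Record attributes are left out so the example stands alone.

```csharp
using System;

// Stand-in for the Active Record entity: this is where business logic lives.
public class Policy
{
    public int Id;
    public string Description;
    public DateTime ExpiresOn;

    public bool IsExpired(DateTime today)
    {
        return ExpiresOn < today;
    }
}

// Strictly a DTO: only the fields the screen shows, no behavior.
public class AjaxianPolicy
{
    public int Id;
    public string Description;

    // Copies just the screen fields from the entity.
    public static AjaxianPolicy From(Policy policy)
    {
        AjaxianPolicy view = new AjaxianPolicy();
        view.Id = policy.Id;
        view.Description = policy.Description;
        return view;
    }
}
```

Id is repeated in both classes, but the screen never sees ExpiresOn or the business logic, and either class can change shape without forcing a change on the other beyond the one mapping method.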
