Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

time to read 3 min | 532 words

Hi, my name is Oren, and I am stupid. I forgot that "select() ain't broken" holds even if I wrote the select().

Let me tell you the story. We are using Rhino ETL to do ETL stuff, and one of the tasks that we had to do was importing an 80 MB file into the CRM, including all sorts of transformations and joins along the way. Until we got to that part, it was working very well, but with the 80 MB file, it read the file and then appeared to hang.

Rhino ETL is a new project, and it is heavily multi threaded. Since the only way that I know to test multi threading is by the test of time, I wasn't overly confident that I got it perfectly right, and it looked like the 80 MB file was triggering some threading issues. After debugging the issue futilely for a while, and after carelessly applying lock() with reckless abandon also failed to do the trick, I decided that I had no alternative but to rip out the homegrown threading solution that I had there in favor of something more robust.

I chose Retlang for that, and I am really happy with the library, but that isn't the story that I have to tell today. As anyone who has dealt with threads knows, they are complex, nasty and hard. Trying to follow ten threads at the same time is not helpful by any meaning of the word. And trying to replace the threading solution with a radically different model is hard. I have a post or two on big refactoring as a result of that, but again, this isn't the story that I have to tell.

After basing Rhino ETL on Retlang, and getting enough of the tests to pass in a satisfactory manner, I turned back to the real issue at hand, and tried to run the 80 MB file through the new Rhino ETL.

It. Showed. The. Exact. Same. Problem.

At that point, I was on the phone, speaking with the hiring managers about getting a thousand monkeys to type the information in, but I kept tapping the keyboard, probably out of force of habit. I couldn't believe that two such radically different implementations would exhibit the same problem. I added logging and tracked down everything. I even added color coded logs, just to be sure that I was seeing everything.

I tracked it down to a log statement that basically said:

Join CustomerWithFoodTypeLookup out 0 rows, Left 103,123 rows, Right 0 rows.

Take a look at this, and try to tell me what the problem is. 

We are joining an empty set to a non empty set, giving... an empty set.

The fix, if you can call it that, was to change the join to:

if Left.FooTypeId = Left.OldFoodTypeId or Right.FooTypeId is null

I rewrote the threading layer because I couldn't track down the root cause: an inner join where a left outer join was needed. When I found that, I literally got up and started choking the developer responsible for the script. Then I went and hit myself on the head, repeatedly.
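To make the difference concrete, here is a minimal sketch in C# LINQ rather than in the Rhino ETL script itself. The Customer and FoodType shapes are made up for illustration; the real rows are not shown in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinSketch
{
    // Hypothetical row shapes, for illustration only.
    public class Customer { public string Name; public int FoodTypeId; }
    public class FoodType { public int Id; public string Label; }

    // Inner join: a left row is emitted only when a matching right row exists.
    public static List<string> InnerJoin(List<Customer> left, List<FoodType> right)
    {
        return (from c in left
                join f in right on c.FoodTypeId equals f.Id
                select c.Name + ":" + f.Label).ToList();
    }

    // Left outer join: every left row is emitted; a missing match gives a null right side.
    public static List<string> LeftOuterJoin(List<Customer> left, List<FoodType> right)
    {
        return (from c in left
                join f in right on c.FoodTypeId equals f.Id into matches
                from m in matches.DefaultIfEmpty()
                select c.Name + ":" + (m == null ? "(none)" : m.Label)).ToList();
    }
}
```

With a non empty left side and an empty right side, InnerJoin returns nothing at all, while LeftOuterJoin returns one row per customer.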

time to read 2 min | 321 words

I am reading the Erlang book right now (post coming as soon as I finish it), but so far I have managed to grasp the idea of heavy use of recursion and functions for everything. It is very different from imperative languages, but I think I can make the shift.

Joe Functional, however, may have some trouble going the other way around. The following were found in production (but I'll not comment any further on their origin):

public static int CharCount(string strSource, string strToCount, bool IgnoreCase)
{
    if (IgnoreCase)
    {
        return CharCount(strSource.ToLower(), strToCount.ToLower(), true);
    }
    return CharCount(strSource, strToCount, false);
}

Let us ignore the inconsistent naming convention and the misleading function name, and really think about what this does...
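For contrast, here is a minimal sketch of a version that terminates. It assumes the intent was to count non-overlapping occurrences of one string inside another, which is a guess; the name CountOccurrences and the exact semantics are mine, not the original author's.

```csharp
using System;

public static class CountSketch
{
    // Counts non-overlapping occurrences of 'value' in 'source' with a plain loop.
    // Assumed intent only: the original never returns, so its real goal is unknown.
    public static int CountOccurrences(string source, string value, bool ignoreCase)
    {
        if (string.IsNullOrEmpty(value)) return 0;

        StringComparison comparison = ignoreCase
            ? StringComparison.OrdinalIgnoreCase
            : StringComparison.Ordinal;

        int count = 0;
        int index = source.IndexOf(value, 0, comparison);
        while (index != -1)
        {
            count++;
            // Continue searching right after the match that was just counted.
            index = source.IndexOf(value, index + value.Length, comparison);
        }
        return count;
    }
}
```

Passing a StringComparison avoids allocating lowered copies of both strings just to ignore case.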

public static string ToSingleSpace(string strParam)
{
    int index = strParam.IndexOf("  ");
    if (index == -1)
    {
        return strParam;
    }
    return ToSingleSpace(strParam.Substring(0, index) + strParam.Substring(index + 1));
}

The pattern continues. At least I can hazard a guess about what this one does, but I wouldn't want to pipe this post through it.
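A minimal sketch of the same behavior without the recursion, assuming the goal is simply to collapse every run of spaces into a single space. It makes one pass over the input instead of allocating a new string and a new stack frame per removed space.

```csharp
using System.Text;

public static class SpaceSketch
{
    // Collapses each run of spaces into a single space in one pass.
    public static string ToSingleSpace(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool previousWasSpace = false;
        foreach (char c in input)
        {
            // Skip a space that directly follows another space.
            if (c == ' ' && previousWasSpace) continue;
            sb.Append(c);
            previousWasSpace = (c == ' ');
        }
        return sb.ToString();
    }
}
```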

public static string Reverse(string strParam)
{
    if ((strParam.Length != 1) && (strParam.Length != 0))
    {
        return (Reverse(strParam.Substring(1)) + strParam.Substring(0, 1));
    }
    return strParam;
}

"Reverse a string" is something that I like to ask in interviews, but I don't suppose that this implementation will be found sufficient.
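For reference, a minimal sketch of the boring answer, using only Array.Reverse from the base class library. Like the version above, it reverses UTF-16 code units, so surrogate pairs and combining characters would come out mangled; handling those is left out on purpose.

```csharp
using System;

public static class ReverseSketch
{
    // Reverses the string in place on a char array: one allocation, no recursion.
    public static string Reverse(string input)
    {
        char[] chars = input.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```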

time to read 2 min | 216 words

As I said, after the email exchange with Babylon, I was left with a sour taste regarding them. I restored my key from a backup and put the episode behind me, until a few minutes ago, when I got this dialog on my system. I was shocked, because I couldn't believe that someone like Babylon would resort to this kind of tactic.

As you can see (the dialog below), I am a registered user, so it is not "reminding" me that I need to register. This is invasive advertising on my desktop, without my permission!

image

There is a word for that: spyware!

You may notice the link at the bottom of the dialog, where it says that I can opt out by going to their site. I did, and it took me here (with a bunch of query strings that I stripped), where it asked me for a CAPTCHA to un-register from this!

 WTF!?!?!

image

time to read 2 min | 372 words

Just got (after three weeks) a response to my issue with them:

The confirmation letter you received initially displayed this message : "This page is your proof of purchase. Please print and save it for future reference."
 
The responsibility of keeping track of your license details is yours.
 
Keeping track of user license  and hosting it in our data base, consumes a lot of storage space, and demands the attention of a large crew of people, dealing with both software and hardware, and as I wrote before, this task is not free of charge.

So, basically they are saying it is my fault (correct) and that I should deal with it (wrong). Wrong, that is, if the goal is to make me a happy customer. Every other company that I did business with was more than happy to resend me the license details, sometimes several years after the fact. The fact that Babylon is treating a customer like this is amazing.

Hiding behind "it costs a lot of money" is a sad excuse for someone who knows how this stuff works. My response:

I am aware of that, but you know what, that is not a nice way of getting repeated business.
And don't try to tell me that keep the user data cost a lot of money that you wouldn't spend anyway.
Frankly, I expect response times that are greater than three weeks and a far more civil treatment as a paying customer.

 

 

time to read 4 min | 686 words

I would preface by saying that I am using Web Forms at my current project because the client insisted. They are paying, they get to make the decisions, end of story.

This is a rant, plain and simple. I don't do those often, but I have had too many annoyances lately not to.

I have several years of experience working with Web Forms, in projects of various sizes. I have committed my Sins Against the GridView and memorized the page lifecycle. I have walked the path of Page.ProcessRequest, and learnt at a distant monastery about the mysteries of the ViewState. I have studied the intricate play of controls' events and the secrets of Data Binding.

I get Web Forms. I know how they work.

I have detailed elsewhere what I don't like about WebForms (trying to be stateful, complex, lying, etc). Today it got to the point where I gave up trying. This was after we had been hitting, repeatedly, obstacle after obstacle with Web Forms.

Today we had no less than three such incidents, all of them related to the various intricate ways in which the whole messy pipeline works together to create a result that is more complex than the sum of all its parts and the legacy systems next door. Today was far from being an exception.

We didn't even try to do anything complex. Take a bunch of objects, bind them to a GridView, rinse-repeat. There was a set of pages that were almost literally forms over data, exactly the use case for WebForms, or so I thought. In a set of about a dozen pages, with really minor differences between one another, we had at least one serious problem on each page. Invariably it was related to some weird way WebForms expected me to work.

In order to get the data from the request, I need to bind the data again in page load, and in a later event, I need to process it, after the Web Forms engine has magically filled the values from the request. And woe upon you if you dare to mix a data source with a call to this.DataBind(); for you are a heretic and deserve to lose the user's input without warning, and only by the grace of the good lord will you be saved.

At one point today I resorted to an http handler and a set of calls to Response.Write(), because that was easier than trying to get the functionality from the WebForms engine.

At this point, I feel like I am trying to build on top of a house of cards, where the slightest movement will send everything crumbling down. I found a bug that was totally my fault today, and we had about five minutes of hilarity about finally finding something that we could easily fix. While this is very instructive in terms of learning how the internals of the WebForms engine work, it is a complete waste of time otherwise. When I need to be absolutely aware, at all times, of all the implications of what is going on all around me, I can't really do anything at all.

I have learned my lessons from past projects: thou shalt not try to be smart with the WebForms framework, it will be smart right back at you and bite you in the ass. I am not trying to do anything smart here. I am literally scaling down everything that I do on the UI layer to the set of what is wholly approved and blessed by Microsoft. I am not even trying to work around technical limitations, I am just trying to build working software.

At this point, I am more willing to debug Dynamic Proxy's IL generation engine than my pages, because at least with Dynamic Proxy, I have some control over what is going on, hard as it may be. The WebForms engine gives the illusion of control, but it does things in completely arbitrary ways, contrary to the principle of least surprise and common sense.

I am sick of it.

time to read 4 min | 682 words

Assume that I have this piece of code: 
public interface InterfaceWithExplicitImpl<T>
{
   IEnumerator<T> GetEnum1();
}
Now, check the picture. This is 100% reproducible, but only under a very specific set of scenarios (basically, running System.Reflection.Emit stuff). I am not sure how exactly I am supposed to handle this sort of issue.
 
WTF
 
Here is the code to reproduce the issue:
You need to reference Dynamic Proxy 2, but this code produces stuff that I would bet is flat out impossible.
 
class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Before it has {0} methods", typeof(MyInterfaceWithExplicitImpl<int>).GetMethods().Length);

        ProxyGenerator generator = new ProxyGenerator(new PersistentProxyBuilder());
        generator.CreateInterfaceProxyWithoutTarget(typeof(MyInterfaceWithExplicitImpl<int>), new StandardInterceptor());
        generator = new ProxyGenerator(new PersistentProxyBuilder());
        generator.CreateInterfaceProxyWithoutTarget(typeof(MyGenericInterface<object>),
                                                    new Type[] { typeof(MyInterfaceWithExplicitImpl<int>) },
                                                    new StandardInterceptor());

        Console.WriteLine("After it has {0} methods", typeof(MyInterfaceWithExplicitImpl<int>).GetMethods().Length);
    }
}

public interface MyInterfaceWithExplicitImpl<T>
{
    IEnumerator<T> GetEnum1();
}

public interface MyGenericInterface<T> where T : new()
{
    T DoSomething(T t);
}
time to read 1 min | 163 words

I just tried to visit this URL: http://pauloquicoli.spaces.live.com/Blog/ in Firefox, and I got this message:

Sorry, we are unable to complete your task at this time. The Windows Live Spaces service is experiencing difficulties. Please try your task again later.

In IE, it works as expected. WTF?!

Update:

Looks like this was the issue. I had this cookie:
Name: sc_stgclstbl_107
Value:
YjdmOTYxYjQyMjYyNWJkZjE6Rzk2bmRvVzA3WlF6UldO
YlZiNWpnZC9uSkZ2SzVhWHRkSW9iK2RR
T3E4R2lTbUM4cytUSGdKUFZNaVJpY2tkSTYrZC9takRvc09RPQ

The value is not valid base64, apparently, which is what caused the issue.

For Experts Only

time to read 2 min | 349 words

There are some things that programmers really shouldn't do. I was once called in to help figure out why a batch process would run for a few hours at 100% CPU. The previous guy had claimed that this was a natural occurrence for the task at hand (loading an XML file to the DB), and that he had already optimized it as far as possible.

int count = 0;
XmlDocument xdoc = new XmlDocument();
xdoc.Load(filePath);

foreach(XmlNode node in xdoc.SelectNodes("data/row")) 
{
   count++;
   new Thread(delegate(object state){
        XmlNode n = (XmlNode)state;
        using(SqlConnection con = new SqlConnection("... "))
        using(SqlCommand cmd = con.CreateCommand ())
        {
            cmd.CommandText = "INSERT INTO .... ";
            cmd.ExecuteNonQuery();
        }
        Interlocked.Decrement(ref count);
   }).Start(node);
}

while(count!=0); 

After de-optimizing the code, I managed to get a 10,000% performance improvement, and you could actually use the server for more than a single task.
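The de-optimized code itself isn't shown here, so what follows is only a minimal sketch of the general shape: walk the rows one by one on the calling thread, with no thread per row and no busy wait. ImportRows and the insertRow callback are hypothetical names standing in for the elided INSERT statement.

```csharp
using System;
using System.Xml;

public static class ImportSketch
{
    // Processes the rows sequentially and returns how many were handled.
    // 'insertRow' is a hypothetical callback standing in for the real INSERT.
    public static int ImportRows(XmlDocument xdoc, Action<XmlNode> insertRow)
    {
        int count = 0;
        foreach (XmlNode node in xdoc.SelectNodes("data/row"))
        {
            insertRow(node);
            count++;
        }
        return count;
    }
}
```

In a real import, the callback would presumably reuse a single open SqlConnection rather than creating a new one per row, but that part is an assumption on my side.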
