Oren Eini
CEO of RavenDB, a NoSQL Open Source Document Database


The NIH dance

time to read 2 min | 356 words

I started thinking about all the types of stuff that I had to write or participate in, and I find it… interesting.

  1. Database – Rhino.DivanDB (hobby project).
  2. Data Access – Multitude of those.
  3. OR/M – NHibernate, obviously, but a few others as well.
  4. Distributed caching systems – NMemcached – several of those.
  5. Distributed queuing systems – Rhino Queues actually has ~7 different implementations.
  6. Distributed hash table – Rhino DHT is in production.
  7. Persistent hash tables – Rhino PHT, of course, but I actually had to write a different implementation for the profiler as well.
  8. Mocking framework – Rhino Mocks, obviously.
  9. Web frameworks – I am referring to MonoRail, although I only dabbled there, to be truthful. Rhino Igloo was a lot of fun, too, if only because I had to.
  10. Text templating language – Brail
  11. Inversion of Control containers – Windsor, and a few custom ones.
  12. AOP – I actually built several implementations; the most fun was the Code DOM approach :-)
  13. Dynamic Proxies & IL weaving – Castle Dynamic Proxy, not the recommended way to learn IL, I must say.
  14. CMS systems – several, but I really like Impleo and the concept behind it.
  15. ETL system – It took three tries to get right.
  16. Security system – Rhino Security was fun to design, and quite interesting to implement.
  17. Licensing framework – because trying to buy one commercially just didn’t work.
  18. Service Bus – which I consider to be one of my best coding efforts.
  19. CI Server – so I can get good GitHub integration.
  20. Domain Specific Language framework – well, I did write the book on that :-)
  21. Source control server – SvnBridge

I haven’t written a testing framework, though.

I am probably forgetting a lot of stuff, actually…

time to read 4 min | 716 words

A recent change in the profiler has resulted in the following dialog showing up whenever you close the application on x64 Vista/Win7 machines.

[Screenshot: the Program Compatibility Assistant dialog shown when closing the profiler]

Just to be clear, I am not using Flash in any way, but something is triggering this check.

Basically, I think that somewhere a call like the one described here is being made, checking for the presence of Flash, and that is what triggers the PCA dialog. That makes a sort of sense, mostly because we now shell out to IE to do some stuff for us (we use WPF’s built-in WebBrowser control).

Now, the documentation for that says:

PCA is intended to detect issues with older programs and not intended to monitor programs developed for Windows Vista and Windows Server 2008. The best option to exclude a program from PCA is to include, with the program, an application manifest with run level (either Administrator or as limited users) marking for UAC. This marking means the program is tested to work under UAC (and Windows Vista and Windows Server 2008). PCA checks for this manifest and will exclude the program. This process applies for both installer and regular programs.

The problem, however, is that even after I included the $@#$(@# manifest, it is still showing the bloody dialog.

I find it quite annoying. Here is the custom manifest that comes with the profiler.

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <v3:trustInfo xmlns:v3="urn:schemas-microsoft-com:asm.v3">
    <v3:security>
      <v3:requestedPrivileges>
        <v3:requestedExecutionLevel level="asInvoker" uiAccess="false" />
      </v3:requestedPrivileges>
    </v3:security>
  </v3:trustInfo>
</assembly>

As far as I can see, it should work.

Any ideas?

And Twitter came to the rescue and told me that I need to specify that I am compatible with Win7. The current manifest, which fixes the issue, is:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <v3:trustInfo xmlns:v3="urn:schemas-microsoft-com:asm.v3">
    <v3:security>
      <v3:requestedPrivileges>
        <v3:requestedExecutionLevel level="asInvoker" uiAccess="false" />
      </v3:requestedPrivileges>
    </v3:security>
  </v3:trustInfo>
  <compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
    <application>
      <!--The ID below indicates application support for Windows Vista -->
      <supportedOS Id="{e2011457-1546-43c5-a5fe-008deee3d3f0}"/>
      <!--The ID below indicates application support for Windows 7 -->
      <supportedOS Id="{35138b9a-5d96-4fbd-8e2d-a2440225f93a}"/>
    </application>
  </compatibility>
</assembly>

I would like to thank Paul Betts for handing me the answer in less than 3 minutes.

time to read 4 min | 691 words

I run across the following quote a while ago, and I found it quite interesting.

“As to the methods there may be a million and then some, but principles are few. The man who grasps principles can successfully select his own methods. The man who tries methods, ignoring principles, is sure to have trouble.”

- Ralph Waldo Emerson (1803-1882)

I have been programming, in one form or another, for about fifteen years, but I can put my finger on the precise moment in which I moved from a mere dabbler to a professional. That was in 1999, when I decided that I had had enough of toying with Pascal, VB6 & Java Applets. It was the height of the bubble, and I wanted to learn just enough to be able to get a job doing something that I enjoyed. I had about a year open to me, and I registered for a C/C++ course at a local college.

In hindsight, that was one of the best things that I have ever done. That course taught me C and pointers, and then C++ and OO. It also introduced me to concepts that I have been using ever since. Admittedly, I don’t want to look at any of my code from that time period, but that is probably a good thing :-) The most important part of the course was that it taught me how computers work, by introducing C first and forcing me to write my own implementation of any system call that I wanted to make.

I studied programming in High School as well, and I distinctly remember being utterly and completely baffled by strange things like dynamic memory and pointers. I mean, why not just allocate a bigger array? During that course I actually grasped pointers for the first time, and even looking back over the last couple of weeks, a lot of my recent performance work is directly based on things that I learned there.

After completing that course, I got several books to help me understand the fundamentals better: Operating Systems Design and Implementation, Modern Operating Systems, and Operating System Concepts, to understand how an operating system works, not just a single program; Win32 System Programming, which I read mainly to understand the environment in which I was working; and Windows Sockets Network Programming, from which I learned the basic concepts of networking.

The common thread throughout all of them is that initially I focused on understanding the technical environment in which I was working, getting to understand how things work at a very low level. And while it may appear that understanding those low-level details is nice in terms of general education but has little relevance to where I spend most of my time, that is quite inaccurate. The serialization system recently built for the profiler was heavily influenced by my reading of the OS books, for example.

For that matter, other good books are Practical File System Design, talking about the BeOS file system, which I found utterly fascinating, and Virtual Machine Design and Implementation in C/C++, which is a horrible book, but one that gave me the confidence to do a lot of things, since I saw how trivially simple it was to build such things.

Coming back to the quote at the beginning of this post, understanding the underlying principles has allowed me to approach a new technology with the confidence that I understand how it must work, because I understand the environment in which it works. Oh, there are a lot of details that you need to get, but once you have the conceptual model of a technology in mind, it is so much easier to get to grips with it.

Interestingly enough, I only got to software design books at a much later stage, and even today, I find the low level details quite fascinating, when I can get new material in a subject that is interesting to me.

time to read 2 min | 299 words

I spent some time today trying to optimize the amount of data the profiler is sending on the wire. My first thought was that I could simply wrap the output stream with a compressing stream and use that; indeed, in my initial testing, it proved to be quite simple to do and reduced the amount of data being sent by a factor of 5. I played around a bit more and discovered that different compression implementations can bring me up to a factor of 50!

Unfortunately, I did all my initial testing on files, and while the profiler is able to read files just fine, it is most commonly used for live profiling, to see what is going on in the application right now. The problem here is that adding compression is a truly marvelous way to screw that up. Basically, I want to compress live data, and most compression libraries are not up for that task. It gets a bit more complex when you realize that what I actually wanted was a way to get compression to work on relatively small data chunks.

When you think about how most compression algorithms work (there is a dictionary in there somewhere), you realize what the problem is. You need to keep updating the dictionary while you are compressing the stream, and at the same time, you need the dictionary to decompress things. That makes it… difficult to handle. I thought about compressing small chunks (say, every 256KB), but then I ran into problems figuring out exactly when I am supposed to flush them, how to handle partial messages, and more.
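Just to make the idea concrete, here is a minimal sketch of the chunked approach I was considering: compress each message buffer on its own and length-prefix it, so the reader can decompress chunk by chunk. The names are illustrative, not the profiler’s actual code, and note that each chunk is compressed with a fresh dictionary, which is part of the tradeoff discussed above.

using System.IO;
using System.IO.Compression;

public static class ChunkedCompression
{
    // Compress a single message buffer independently and write it with a length
    // prefix, so the reader knows where the chunk ends without a shared dictionary.
    public static void WriteCompressedChunk(Stream output, byte[] message)
    {
        using (var buffer = new MemoryStream())
        {
            using (var deflate = new DeflateStream(buffer, CompressionMode.Compress, true /* leave buffer open */))
            {
                deflate.Write(message, 0, message.Length);
            }

            byte[] compressed = buffer.ToArray();

            var writer = new BinaryWriter(output);
            writer.Write(compressed.Length); // length prefix marks the chunk boundary
            writer.Write(compressed);
            writer.Flush();
        }
    }
}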

In the end, I decided that while it was a very interesting trial run, this is not something that is likely to show good ROI.

time to read 3 min | 521 words


One of the more interesting points of my posts about Entity Framework & NHibernate is the discovery of things that Entity Framework can do that NHibernate cannot. In fact, if you’ll read the posts, instead of the comments, you can see that this is precisely what I asked, but people didn’t really read the text.

I wanted to dedicate this post to ghost objects, and how NHibernate deals with them.

Before we start, let me explain what ghost objects are. Let us say that you have a many-to-one polymorphic association, such as the one represented as Comment.Post.

A post may be either a Post or an Article, and since NHibernate will by default lazy load the association, NHibernate will generate a proxy object (also called a ghost object). That, in turn, results in several common issues: leaking this and the inability to cast to the proper type are the most common ones.
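A minimal sketch of what that looks like in practice, using the Comment.Post example (the variable names are just for illustration):

var comment = session.Get<Comment>(commentId);

// comment.Post is an NHibernate proxy that derives from Post, not from the
// concrete subclass, so both of these fail even when the row really is an Article:
var article = comment.Post as Article;       // null
bool isArticle = comment.Post is Article;    // false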

In practice, this is something that you would generally run into when you are violating the Liskov Substitution Principle, so my general recommendation is to just fix your design.

Nevertheless, since the question pops up occasionally, I thought that I might write in a bit more detail on how to resolve this. Basically, the main issue is that at the point in time when we are loading the Comment entity, we don’t have enough information to know what the actual entity type is. The simplest way to work around this issue is to tell NHibernate to load the associated entity as part of the parent entity load.

In the case of the comment, we can do it like this:

<many-to-one name="Post" 
             lazy="false"
             outer-join="true"
             column="PostId"/>

The lazy=”false” tells NHibernate to load the association eagerly, while the outer-join will add a join to load it in a single query. One thing to note, however, is that (by design) HQL queries will ignore any fetching hints in the mapping, so you would have to specify join fetch explicitly in the query; otherwise it would generate a separate query for that.
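For example, here is a sketch of specifying the join fetch explicitly in the query; the entity names follow the Comment.Post example, and the exact query shape is illustrative:

var comment = session.CreateQuery("from Comment c join fetch c.Post where c.Id = :id")
    .SetParameter("id", commentId)
    .UniqueResult<Comment>();

// comment.Post is now the real Post (or Article) instance, loaded in the same
// select, so there is no proxy to worry about.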

Since we eagerly load the associated entity, and we know its type, we don’t have to deal with any proxies, and can avoid the ghost objects problem completely.

time to read 1 min | 193 words

Christopher Bennage has submitted a Mix talk that I think is interesting :-)

LinqToSQL and EntityFramework Profilers: Case Study

If you aren’t already familiar with the UberProf suite of ORM profilers, you can read tales of the development on Ayende’s blog. Rob and I built the UI side of the application, and we learned a lot in the process. I’d like to do a talk where we discuss the challenges of the project, how we solved them, and what we did wrong.

Yes, NHProf will be included too. (I submitted a case study for it last year, and it didn’t get picked. I have to sneak it in).

A few interesting aspects:

  • we built this using MVVM, but well before Caliburn reached maturity.
  • the four separate apps (NHProf, EFProf, L2SProf, HProf) all use a single code base.
  • we’re about to port the project from WPF to Silverlight.

Please vote for this session.

time to read 11 min | 2113 words

Jose was kind enough to post a review of my sample Effectus application. This post is a reply to that.

The first thing that Jose didn’t like was the fact that I didn’t put an abstraction layer in front of NHibernate’s usage in my application. There are several reasons for that; the first, and simplest, is that I was trying to explain how to use NHibernate, and for that, I wanted to deal with the lowest common denominator, not show off a particular wrapper implementation. Showing NHibernate usage directly means that even if you are using another library with it, you would still be able to take advantage of the information that I give you.

Second, and quite important, is that by using NHibernate directly I can take advantage of NHibernate features explicitly meant to support certain scenarios, but which are not likely to be exposed when libraries wrap NHibernate. A good example of that is in Jose’s sample code, which makes use of an ISession instead of an IStatelessSession to load the data for the main screen. As explained in the article, the difference between the two is important, and in the context where it is used it introduces what is effectively a memory leak into Jose’s implementation, as well as the chance for some really interesting errors down the road if the session runs into an error.

Third, Jose brings up the following:

This presenter is GLUED to NHibernate. And this means for instance that you can’t test it without NHibernate, and this means that you can’t change your persistence layer.

Yes, and that is intentional. That is the very task that those presenters are for. Trying to abstract that away just means that I put an additional layer of abstraction that does absolutely nothing. For that matter, let us look at Jose’s implementation using an abstraction layer compared to mine.

Here is my implementation:

private void LoadPage(int page)
{
    using (var tx = StatelessSession.BeginTransaction())
    {
        var actions = StatelessSession.CreateCriteria<ToDoAction>()
            .SetFirstResult(page * PageSize)
            .SetMaxResults(PageSize)
            .List<ToDoAction>();

        var total = StatelessSession.CreateCriteria<ToDoAction>()
            .SetProjection(Projections.RowCount())
            .UniqueResult<int>();

        this.NumberOfPages.Value = total / PageSize + (total % PageSize == 0 ? 0 : 1);
        this.Model = new Model
        {
            Actions = new ObservableCollection<ToDoAction>(actions),
            NumberOfPages = NumberOfPages,
            CurrentPage = CurrentPage + 1
        };
        this.CurrentPage.Value = page;

        tx.Commit();
    }
}

And here is Jose’s:

private void LoadPage(int page)
{
    var actions = _toDoActionsDao.RetrieveAll()
                                 .Skip(page * PageSize)
                                 .Take(PageSize).ToList();

    var total = _toDoActionsDao.RetrieveAll().Count();
    
    NumberOfPages.Value = total / PageSize 
                        + (total % PageSize == 0 ? 0 : 1);
    
    Model = new Model
    {
        Actions = new ObservableCollection<ToDoAction>(actions),
        NumberOfPages = NumberOfPages,
        CurrentPage = CurrentPage + 1
    };
    
    CurrentPage.Value = page;
}

The code is still doing the exact same thing; in other words, we haven’t moved any logic away, and most of the code is dealing with data access details. My approach for testing this is to make use of an in-memory database, which results in a single test that makes sure the code works, instead of a series of tests that verify that each piece works independently plus my single test. I find it much more effective in terms of time & effort.
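For reference, here is a rough sketch of the kind of in-memory database setup I have in mind, using SQLite; the mapping assembly name is hypothetical, and the exact SchemaExport overload may differ between NHibernate versions:

var cfg = new NHibernate.Cfg.Configuration()
    .SetProperty(NHibernate.Cfg.Environment.Dialect,
                 typeof(NHibernate.Dialect.SQLiteDialect).AssemblyQualifiedName)
    .SetProperty(NHibernate.Cfg.Environment.ConnectionDriver,
                 typeof(NHibernate.Driver.SQLite20Driver).AssemblyQualifiedName)
    .SetProperty(NHibernate.Cfg.Environment.ConnectionString,
                 "Data Source=:memory:;Version=3;New=True;")
    .AddAssembly("Effectus"); // hypothetical assembly containing the mappings

var sessionFactory = cfg.BuildSessionFactory();

using (var session = sessionFactory.OpenSession())
{
    // The in-memory database lives only as long as this connection, so the
    // schema has to be created on the session's own connection.
    new NHibernate.Tool.hbm2ddl.SchemaExport(cfg)
        .Execute(false, true, false, session.Connection, null);

    // ... exercise LoadPage / OnSave against this session and assert on the results
}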

As for the idea of changing your persistence layer, forget about it. It isn’t going to happen in any real world application without a lot of effort, so you might as well save yourself the effort of working with the lowest common denominator and take full advantage of the framework. Just to note, when I ported the NerdDinner code to NHibernate (an application that makes use of a single table), I had to make major changes to the application.

Jose didn’t like this method:

public void OnSave()
{
    bool successfulSave;
    try
    {
        using (var tx = Session.BeginTransaction())
        {
            // this isn't strictly necessary, NHibernate will 
            // automatically do it for us, but it makes things
            // more explicit
            Session.Update(Model.Action);

            tx.Commit();
        }
        successfulSave = true;
    }
    catch (StaleObjectStateException)
    {
        var mergeResult = Presenters.ShowDialog<MergeResult?>("Merge", Model.Action);
        successfulSave = mergeResult != null;

        ReplaceSessionAfterError();
    }

    // we call ActionUpdated anyway, either we updated the value ourselves
    // or we encountered a concurrency conflict, in which case we _still_
    // want other parts of the application to update themselves with the values
    // from the db
    EventPublisher.Publish(new ActionUpdated
    {
        Id = Model.Action.Id
    }, this);

    if (successfulSave)
        View.Close();
}

Because:

  • ReplaceSessionAfterError, too much responsibility for a presenter.
  • Session/Transaction again.
  • There is a BUG: the publish mechanism works in sync with the rest of the code. This means… that this window is not going to close until others have finished handling the event. For the given example, the Edit window is not going to close until the main window finishes refreshing the current page.
  • Too much logic for this method = hard to test.

I fully agree that this is a complex method, and Jose’s refactoring for that is an improvement indeed.

[PersistenceConversation(ConversationEndMode = EndMode.End)]
public virtual void OnSave()
{
    _toDoActionsDao.Update(Model.Action);
    EventPublisher.Enlist(new ActionUpdated { Id = Model.Action.Id }, this);
    View.Close();
}

public override void OnException(Exception exception)
{
    if (exception is StaleEntityException)
    {
        Presenters.ShowDialog<MergeResult?>("Merge", Model.Action);
        EventPublisher.Enlist(new ActionUpdated { Id = Model.Action.Id }, this);
    }
}

Jose has implemented the following changes:

  • I’ve defined a new convention: if a public method in the presenter throws an exception, the OnException method will be called. This is done by a Castle interceptor, and this is the last line of defense for unhandled exceptions.
  • I’m using “StaleEntityException” rather than “StaleObjectStateException”; this is MY exception. This is easily done by a CpBT artifact.
  • I’m not calling “EventPublisher.Publish” anymore; this code uses EventPublisher.Enlist. Here, I’ve split the “Publish” code into two different methods, one for Enlist and another for Raise. The enlisted events will be raised right after the OnSave method is called, and thus after the window is closed.
  • Also, notice that here is the conversation-per-business-transaction pattern in all its splendor. The two former methods are conversation participants, with EndMode equal to Continue. This means that the NH Session will remain open. The OnSave method has EndMode equal to End, which means that right after the method finishes, CpBT will internally flush the Unit of Work and close it.

This is better than the original implementation, but I think it can be made better still. First, OnException as a generic method is a bad idea, for the simple reason that the exception logic for different methods is different. I would probably define a [MethodName]Error(Exception e) convention instead, which would make it easier to separate the error logic for different methods.
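To make that concrete, here is a rough sketch of how such a convention could be wired up with a Castle DynamicProxy interceptor; all of the names here are hypothetical, not code from either project:

using System;
using Castle.DynamicProxy;

public class MethodErrorInterceptor : IInterceptor
{
    public void Intercept(IInvocation invocation)
    {
        try
        {
            invocation.Proceed();
        }
        catch (Exception e)
        {
            // Look for a per-method handler such as OnSaveError(Exception)
            var handler = invocation.TargetType.GetMethod(
                invocation.Method.Name + "Error",
                new[] { typeof(Exception) });

            if (handler == null)
                throw; // no specific handler, let the exception bubble up

            handler.Invoke(invocation.InvocationTarget, new object[] { e });
        }
    }
}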

Again, I don’t find any usefulness in abstracting away the underlying framework; I haven’t seen a single case where it was useful, but I have seen a lot of cases where it hurt the team & the project.

The idea about splitting the publication and raising is really nice, I agree.
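As I understand it, the split amounts to something along these lines; this is a sketch of the concept only, not Jose’s actual code:

using System;
using System.Collections.Generic;

public class EventPublisher
{
    private readonly List<Action> enlisted = new List<Action>();

    // Queue the event; nothing is dispatched yet.
    public void Enlist<T>(T @event, object sender)
    {
        enlisted.Add(() => Publish(@event, sender));
    }

    // Called by the infrastructure after the presenter method returns (and the
    // window has already closed), so handlers cannot block the close.
    public void Raise()
    {
        foreach (var publish in enlisted)
            publish();
        enlisted.Clear();
    }

    // The original synchronous dispatch to all subscribers.
    public void Publish<T>(T @event, object sender)
    {
        // ... notify the subscribers for typeof(T)
    }
}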

However, there is a problem in the code with regards to session handling in cases of error. There is a pretty good reason why I introduced the ReplaceSessionAfterError method. In general, I want to keep my session alive for the duration of the form, because I get a lot of benefits out of that. But, if the session has run into an error, I need to replace it and all the objects associated with it. Closing the session no matter what is going on is not a good solution, and you can’t really solve the problem in a generic way without calling back to the presenter that generated the error.
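Roughly, what a method like ReplaceSessionAfterError has to do is something like the following; this is a sketch, not the actual Effectus code, and the field names are assumed:

private void ReplaceSessionAfterError()
{
    // Once a session has thrown, NHibernate considers it unusable, so we
    // discard it together with everything it has loaded...
    Session.Dispose();

    // ...open a fresh session for the remainder of the form's lifetime...
    Session = sessionFactory.OpenSession();

    // ...and re-attach the model by reloading it through the new session.
    Model.Action = Session.Get<ToDoAction>(Model.Action.Id);
}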

time to read 2 min | 263 words

I got a request in email to add something like Disqus to my blog, which would allow a richer platform for the commenting that goes on here. I think that the request and my reply are interesting enough to warrant this blog post.

My comment system is the default Subtext one, but there are several advantages to the way it works. You can read the full explanation in the Joel on Software post about the matter, but basically, threading encourages people to go off on tangents, while a single thread of conversation makes it significantly easier to have only one conversation.

There is another reason, one that is personally important to me: I want to "own" the comments. Not own in terms of copyright, but own in terms of having control of the data itself. Having the comments (a hugely important part of the blog) managed by a 3rd party which might shut down and take all the comments with it is not acceptable.

That is probably a false fear, but it is something that I take under consideration. The reasoning about the type of interaction going on in the comments is a lot more important. There is also something else to consider: if a post gets too hot (generating too many comments), I am either going to close comments on it or open a new post with a summary of what went on in the previous post’s comment thread anyway, so there are some checks & balances that keep a comment thread from growing too large.

time to read 1 min | 174 words

Let us see if you can help me here. I found myself facing a rather unpleasant realization: I need to communicate between two processes (that compose a single system) under some rather draconian constraints on how they are going to be used.

Quite simply, I need to expose an interface with ~25 methods on it to the second process. So far, it is pretty easy to do, right?

The problem is that the first process is a standard .NET executable, which may also be run on the Mono platform while the second is a Silverlight application. My first thought, just use remoting to handle this, failed because remoting isn’t available to Silverlight. My second thought, to use WCF, failed because that isn’t available on Mono.

Building a simple RPC system is pretty easy, so that doesn’t worry me. The reason I am reluctant to do so is that I really don’t want to build yet another piece of infrastructure. At some point, even for me, the NIH flag starts to pop up.
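To give a sense of why the RPC part itself doesn’t worry me, here is a bare-bones sketch of the sort of thing I mean on the full framework / Mono side; everything here is hypothetical, and a real version would need proper argument serialization rather than a single string body. The Silverlight side could then call it with WebClient’s async upload methods.

using System.IO;
using System.Net;

public class SimpleRpcHost
{
    private readonly object service;   // the object implementing the ~25 method interface
    private readonly HttpListener listener = new HttpListener();

    public SimpleRpcHost(object service, string prefix)
    {
        this.service = service;
        listener.Prefixes.Add(prefix);  // e.g. "http://localhost:8080/rpc/"
    }

    public void Start()
    {
        listener.Start();
        while (listener.IsListening)
        {
            HttpListenerContext ctx = listener.GetContext();

            // The method name comes from the last URL segment and the single
            // argument from the request body; dispatch is done by reflection.
            string methodName = ctx.Request.Url.Segments[ctx.Request.Url.Segments.Length - 1];
            string argument = new StreamReader(ctx.Request.InputStream).ReadToEnd();

            object result = service.GetType()
                .GetMethod(methodName)
                .Invoke(service, new object[] { argument });

            using (var writer = new StreamWriter(ctx.Response.OutputStream))
                writer.Write(result);
            ctx.Response.Close();
        }
    }
}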

time to read 1 min | 98 words

I was quite amazed by the number of conspiracy theories that were brought up by this post. Some of them in the comments, some of them in private communications.

The reason, the real & only one, that I had so many posts lately about performance is quite simple. I did a lot of that recently, and one aspect of perf testing that I didn’t talk about is that most perf test runs take a long time, which means that I had a lot of free time. Free time for me usually translates into posting time :-)
