Real World NHibernate: Reducing startup times for large numbers of entities
The scenario that Christiaan Baes needed to solve was reducing the startup time of a WinForms application. The main issue is that the initial load of the application should be fast, but in this case we are feeding NHibernate about a hundred entities, so it takes a few seconds to process them.
I asked Christiaan to send me profiler results of the code, and it looked all right on his end, so it was time to look at NHibernate and see what she had to say about that.
The test scenario was startup time for a thousand entities. I think that this is a nice shiny number that probably represents far too many entities for a single application, but let us go with it.
Initial testing showed that for this number of entities, startup took about 14 seconds just to get the session factory started. That didn't seem right to me. NHibernate does a lot of reflection on startup, but even so, 14 seconds was quite a bit. A deeper analysis showed that I was wrong: it wasn't reflection that was costing so much, it was building the Configuration that took the time, not building the session factory.
Why was it taking so long? Well, NHibernate uses XML for configuration, and that XML is validated against a schema. That took 11 seconds for a thousand documents.
Problem.
I played a bit with the code, but it remained steady at 11 seconds just for reading the configuration in. Eventually I tried merging the 1000 XML documents into a single big document, and that had a significant effect: it dropped the time to just over 3 seconds, for an overall startup time of 5.5 seconds on a 1000-entity session factory.
If you go this route, I strongly suggest doing the merge as a pre-build step, rather than trying to work with a single unwieldy artifact.
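A pre-build merge step along those lines might look like the following minimal sketch. It assumes every `*.hbm.xml` file shares the same root element and the same root attributes (namespace, `assembly`, and so on); files that differ there would need those attributes pushed down onto each `<class>` first. `MappingMerger` and its methods are hypothetical names, not part of NHibernate.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical pre-build helper: folds many *.hbm.xml documents into a single
// <hibernate-mapping> document. Assumes all inputs share the same root element
// and the same root attributes.
public static class MappingMerger
{
    public static XDocument Merge(IEnumerable<XDocument> mappings)
    {
        XElement merged = null;
        foreach (XDocument doc in mappings)
        {
            XElement root = doc.Root ?? throw new ArgumentException("Empty mapping document.");
            if (merged == null)
            {
                // Copy the first root's name and attributes (including xmlns).
                merged = new XElement(root.Name, root.Attributes());
            }
            else if (root.Name != merged.Name)
            {
                throw new InvalidOperationException($"Unexpected root element {root.Name}.");
            }
            // Copy every child (<class>, <query>, ...) under the single merged root.
            merged.Add(root.Elements());
        }
        return new XDocument(merged ?? throw new ArgumentException("No mappings supplied."));
    }

    // Intended to be called from a pre-build task; the output file is what gets embedded.
    public static void MergeDirectory(string inputDir, string outputFile)
    {
        IEnumerable<XDocument> docs = Directory.GetFiles(inputDir, "*.hbm.xml")
            .OrderBy(f => f, StringComparer.Ordinal)   // deterministic output
            .Select(f => XDocument.Load(f));
        Merge(docs).Save(outputFile);
    }
}
```

The individual mapping files stay the thing you edit; only the merged output is embedded in the application.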
Still not good enough, I hear you think, right?
Here are a few other suggestions that are worth keeping in mind:
- Why are you initializing it on startup? And does it have to be on the main thread? If you can push it to a background thread, then this is usually all that you would need to do.
- Do you really need all the entities from the get-go? If not, you can create two session factories: a fast-initializing one that can respond to the initial requests and, at the same time, the global session factory. Once the global factory is ready, swap it in.
- Do you really need all those entities in a single session factory? Having a lot of entities is usually a sign that several distinct domains are mixed together. It would probably be better to split them into different session factories.
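The first two suggestions both come down to building the expensive factory off the UI thread and publishing it when it is ready. Here is a minimal sketch, assuming a generic `T` stands in for `ISessionFactory` and the two builder delegates wrap a small `Configuration` and the full one. `SwappableFactory` is a hypothetical helper, not an NHibernate type.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: T stands in for ISessionFactory. buildFast builds the
// small startup factory, buildFull the complete one.
public sealed class SwappableFactory<T> where T : class
{
    private T _current;

    // Completes once the full factory has been published (faults if buildFull throws).
    public Task Ready { get; }

    public SwappableFactory(Func<T> buildFast, Func<T> buildFull)
    {
        // Build the small factory synchronously so the first screens can use it.
        _current = buildFast();
        // Build the full factory on a thread-pool thread, then publish it.
        Ready = Task.Run(() => Volatile.Write(ref _current, buildFull()));
    }

    // Always returns whichever factory is newest at the time of the call.
    public T Current => Volatile.Read(ref _current);
}
```

Code that opens sessions should ask for `Current` each time rather than holding on to a factory. In a real application the replaced small factory should also be disposed once nothing is using it; the sketch leaves that out.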
If all of the above still isn't enough for you, the next step is persisting the configuration itself (probably using serialization). That is not supported by NHibernate at the moment, although I am more than willing to accept a patch that would add this functionality.
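Since NHibernate offers nothing for this yet, the following is only a sketch of the general cache-or-rebuild pattern: load a persisted snapshot when it is newer than every mapping file, otherwise do the slow work and write a fresh snapshot. `StartupCache` and the snapshot type are hypothetical. It uses `XmlSerializer` on a plain data snapshot because `BinaryFormatter`, which marking classes `[Serializable]` usually relies on, is obsolete in current .NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical cache: TSnapshot is a serializer-friendly snapshot of whatever
// the expensive startup step produces.
public static class StartupCache
{
    public static TSnapshot LoadOrBuild<TSnapshot>(
        string cachePath, IEnumerable<string> sourceFiles, Func<TSnapshot> build)
    {
        var serializer = new XmlSerializer(typeof(TSnapshot));

        // The cache is trusted only if it is newer than every source file.
        DateTime newestSource = sourceFiles
            .Select(f => File.GetLastWriteTimeUtc(f))
            .DefaultIfEmpty(DateTime.MinValue)
            .Max();

        if (File.Exists(cachePath) && File.GetLastWriteTimeUtc(cachePath) > newestSource)
        {
            using (var stream = File.OpenRead(cachePath))
                return (TSnapshot)serializer.Deserialize(stream);
        }

        // Cache missing or stale: do the slow work, then persist the result.
        TSnapshot snapshot = build();
        using (var stream = File.Create(cachePath))
            serializer.Serialize(stream, snapshot);
        return snapshot;
    }
}
```

The hard part, which the sketch does not attempt, is getting NHibernate's own configuration state into a form that can be serialized at all.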
Hope this helps...
Comments
death to XML =o), XML is sloooooww
This problem was raised at ALT.NET. One of the ideas that was tossed around was persistable configuration which you mention above. Another possible solution that was discussed was to JIT the session factory. I'm sure that if this is a bad idea you'll tell me!
Certainly I think that it would be beneficial to do something here because database integration tests are slow anyway and spending 10 secs to build the session factory is too long. Maybe the community can agree upon an approach and then we can take care of implementation?
Is NHibernate actually doing XSD validation on the documents? If so, could it be disabled to see whether that improves things somewhat? If not, then what exactly does NHibernate do to read all the configuration documents? I mean, 1000 files should generate a bit of IO, but not enough to account for such a big difference by itself (though I guess it is building the DOM that results in that).
Since I've fallen in like with Binsor I'd like Binsor for NHibernate...
Maybe Boobernate.
Thanks for looking at my problem, I will change it soon and make it faster.
Regarding initializing only a subset of entities using a startup session factory and then replacing it with a global session factory: how can I decide which entities to include in one session factory and which in the other?
I only have an assembly with all my entities inside.
Thanks
Here's a tip from the code-generation guild :)
Why don't you generate C# code as an option for the XML interpretation, where the C# code simply builds the inner structures that would otherwise be built from the XML data? You could even have a command-line tool that does that at compile time (pre-build, actually), where the C# code is either embedded into the main program, OR placed in a separate assembly that NHibernate loads on demand.
Sure, you lose the 'flexibility' of changing mappings on production systems. Well... anyone who wants that option is not really that sane...
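To make the suggestion concrete, a pre-build generator could read the mapping XML and emit C# source that recreates the same data without parsing anything at runtime. This is a minimal sketch that handles only `<class name table>`; the generated `GeneratedMappings` class and `ClassInfo` type are hypothetical, and the emitted source would still need a matching `ClassInfo` definition to compile.

```csharp
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical pre-build generator: turns <class> entries from a mapping
// document into C# source that rebuilds the same data at runtime.
public static class MappingCodeGenerator
{
    private static readonly XNamespace Ns = "urn:nhibernate-mapping-2.2";

    public static string Generate(XDocument mapping)
    {
        var sb = new StringBuilder();
        sb.AppendLine("public static class GeneratedMappings");
        sb.AppendLine("{");
        sb.AppendLine("    public static readonly ClassInfo[] Classes =");
        sb.AppendLine("    {");
        foreach (XElement cls in mapping.Root.Elements(Ns + "class"))
        {
            string name = (string)cls.Attribute("name");
            // Fall back to the entity name when no table is given.
            string table = (string)cls.Attribute("table") ?? name;
            sb.AppendLine($"        new ClassInfo({Literal(name)}, {Literal(table)}),");
        }
        sb.AppendLine("    };");
        sb.AppendLine("}");
        return sb.ToString();
    }

    // Minimal C# string-literal escaping for values that come from XML.
    private static string Literal(string s) =>
        "\"" + s.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";
}
```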
I am interested in maybe doing the patch, but how would you go about it? I guess we would have to make some sort of binary file format for NHibernate configurations. Another possibility would be to make a compiler for the mappings, for example to a binary file format and do it as a pre-build step, then it should be possible to load it very fast. What do you think?
Tomas,
Yes, it is doing XSD validation. And that takes most of the time.
There is no way to build the configuration without it, and I am worried about default values and similar things in the schema that might be missed.
In a quick test, AddReaderUnvalidate did give a significant performance boost.
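To make the trade-off concrete, here is a BCL-only sketch of two ways to read a mapping document: with an `XmlSchemaSet` compiled once and shared across all files, or with no validation at all. `MappingReaders` is a hypothetical helper. It does not reproduce what NHibernate does internally, nor what `AddReaderUnvalidate` does.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper comparing validated and unvalidated reads.
public static class MappingReaders
{
    // Compile the schema once; the same set can be reused for every document.
    public static XmlSchemaSet Compile(string targetNamespace, string xsd)
    {
        var set = new XmlSchemaSet();
        set.Add(targetNamespace, XmlReader.Create(new StringReader(xsd)));
        set.Compile();
        return set;
    }

    // With no ValidationEventHandler attached, schema errors are thrown as exceptions.
    public static XmlReaderSettings Validating(XmlSchemaSet compiledSchemas) =>
        new XmlReaderSettings { ValidationType = ValidationType.Schema, Schemas = compiledSchemas };

    // Faster, but nothing the XSD would have checked or supplied
    // (such as default attribute values) is applied.
    public static XmlReaderSettings Unvalidated() =>
        new XmlReaderSettings { ValidationType = ValidationType.None };

    public static XmlDocument Load(string xml, XmlReaderSettings settings)
    {
        var doc = new XmlDocument();
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
            doc.Load(reader);
        return doc;
    }
}
```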
Alessandro,
Create a second configuration file, and instead of specifying <assembly>, specify just <resource> with the relevant resources.
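One way to decide which entities go where is to keep an explicit list of the entities the first screens need and filter the assembly's embedded resources by it. The sketch below assumes mapping resources are named `<Namespace>.<Entity>.hbm.xml`; the `StartupMappings` helper and the entity list are hypothetical. Each selected name could then go into the second configuration file as a resource mapping entry, or be passed to `Configuration.AddResource(name, assembly)` if your NHibernate version exposes that overload.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper: picks the embedded *.hbm.xml resources for a small
// startup factory, given a hand-maintained set of entity names.
public static class StartupMappings
{
    private const string Suffix = ".hbm.xml";

    public static IReadOnlyList<string> Pick(
        IEnumerable<string> resourceNames, ISet<string> startupEntities)
    {
        return resourceNames
            .Where(r => r.EndsWith(Suffix, StringComparison.OrdinalIgnoreCase))
            .Where(r => startupEntities.Contains(EntityName(r)))
            .OrderBy(r => r, StringComparer.Ordinal)
            .ToList();
    }

    public static IReadOnlyList<string> Pick(Assembly assembly, ISet<string> startupEntities) =>
        Pick(assembly.GetManifestResourceNames(), startupEntities);

    // "MyApp.Domain.Customer.hbm.xml" -> "Customer"
    private static string EntityName(string resourceName)
    {
        string withoutSuffix = resourceName.Substring(0, resourceName.Length - Suffix.Length);
        int lastDot = withoutSuffix.LastIndexOf('.');
        return lastDot < 0 ? withoutSuffix : withoutSuffix.Substring(lastDot + 1);
    }
}
```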
Frans,
Yes, that is the next step.
I don't think C# code is the way to go here. I experimented with CodeDOM-based persistence; it is a nice idea, but awkward to maintain.
I think that we will simply allow saving and loading a session factory.
Philip,
My first approach would be to slap Serializable on everything, and then see what works and what not.
Writing custom binary formats is not something that I wish to do.
I've actually started refactoring/recoding the XML mapping parsing bits in the NH trunk.
My goal is to be able to use XML deserialization to turn the xml documents into an object graph of simple objects. The effect of this is that this DTO-like object graph could also be binary serialized/deserialized. Currently the mapping parser takes the XML data and directly turns that into more heavyweight NH objects that often cannot be serialized all in one step. Once this process is split into two steps, a schema caching step could be introduced to speed up startup time by more than a few seconds. This is not an insignificant improvement for desktop applications.
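The two-step idea (XML into a plain object graph first, heavyweight objects second) can be illustrated with `XmlSerializer`. The DTO classes below are hypothetical and cover only a tiny subset of the mapping schema; the generated classes in the NH trunk are far more complete.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical DTOs for a tiny subset of a mapping document.
[XmlRoot("hibernate-mapping", Namespace = "urn:nhibernate-mapping-2.2")]
public class MappingDto
{
    [XmlAttribute("assembly")] public string Assembly { get; set; }

    [XmlElement("class", Namespace = "urn:nhibernate-mapping-2.2")]
    public ClassDto[] Classes { get; set; }
}

public class ClassDto
{
    [XmlAttribute("name")] public string Name { get; set; }
    [XmlAttribute("table")] public string Table { get; set; }
}

public static class MappingDtoReader
{
    private static readonly XmlSerializer Serializer = new XmlSerializer(typeof(MappingDto));

    // Step one: plain XML deserialization, no NHibernate objects involved.
    public static MappingDto Read(TextReader xml) => (MappingDto)Serializer.Deserialize(xml);

    // Because the graph is plain data, it can be written back out (or cached
    // in another format) without touching the heavyweight runtime objects.
    public static string Write(MappingDto dto)
    {
        using (var writer = new StringWriter())
        {
            Serializer.Serialize(writer, dto);
            return writer.ToString();
        }
    }
}
```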
So far it's been a pretty significant effort (most of this processing was done in one large static class), and I haven't had much time to work on it recently, what with moving and starting a new job. I will get back to it once the dust settles.
Ayende, that makes perfect sense. Great ideas!
I definitely second the approach for "boobernate", but maybe under a different name... I do have to get clients to approve this stuff, ya know!
It would be great to have an NHibernate API for serializing/deserializing whatever gets output from the Configuration, seems like it would be a bunch of hashtables (for settings and the reflection cache) -- and also the mapping format in memory. All of that sounds serializable, so I think your approach could bear fruit.
I'm thinking something like this:
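Configuration cfg = new Configuration();
// add your mapping assemblies here, e.g. cfg.AddAssembly(...)
ISessionFactory factory = cfg.BuildSessionFactory();
// hypothetical API, proposed here; NHibernate has no such method
NHibernate.Utility.PersistConfiguration(cfg, factory, "config-cache.nhconfig");
Slap that into a pre-build NAnt/MSBuild task, and then you can do this in your application:
if (File.Exists("config-cache.nhconfig"))
{
    // load the cached configuration / session factory (hypothetical API)
}
else
{
    // fall back to building the Configuration from the XML mappings
}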
I'm not entirely sure what those methods would look like off the top of my head, but it seems doable.
Thoughts?
I’m with Frans; this is one of those areas where code generation shines. Consider a DSL such as ActiveWriter, which could store the map in XML and generate C# on the fly. Problem solved without pre-build steps. For others who like hand-editing XML, use a pre-build C# code generator.
This is one of the reasons I don’t use NHibernate: great product, but I just don’t like hacking XML and can’t afford the startup times. Why recompile something each time an app launches? It's a waste of time and energy.
Another improvement that could be made is to break the map into two parts, schema and map, the map acts as a bridge between the schema and the objects. So if you convert a relation from 1:N to N:N you only change the schema and all maps update.
Mike D,
The startup times in this case are for the far-out end of the scale. In practice, startup times are not usually significant.
If you want to avoid hacking XML, use ActiveRecord; it makes things much easier.
And I don't agree with the "let us just change this tiny bit" approach, it doesn't work in practice.
In the NH trunk, there is already code that generates C# classes from the xml schema for the mapping document. There is already code that can convert an entire mapping document into an object graph of those classes via XmlSerialization. The mapping document parser is already PARTIALLY taking advantage of this, and will eventually COMPLETELY utilize this as the first step to go from XML to fully realized NH objects.
The fully realized objects cannot be guaranteed to be serializable, partially because one can define new types in one's own assemblies outside the control of NH. Plus the order of operations is significant. This is where these schema objects come into play. They are very simple classes that are fully serializable. This would allow them to be cached via binary serialization to disk. Some rudimentary performance tests on this were promising.
In theory, one could build a fluent builder interface around these objects to allow definition of the mapping documents in C# as opposed to XML. (And still not be mixed in with your domain objects like the MappingAttributes stuff.) That would actually be pretty awesome I think. But that can't happen until I get the code migration done. (That idea kinda motivates me to get back into it. Damn real life in the way.)
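Purely as an illustration of what such a fluent builder over plain mapping objects might feel like, here is a sketch. Every type and method in it (`ClassMapping`, `PropertyMapping`, `ClassMap`) is hypothetical and unrelated to NHibernate's actual classes.

```csharp
using System.Collections.Generic;

// Hypothetical plain mapping objects (not NHibernate types).
public sealed class ClassMapping
{
    public string Name { get; set; }
    public string Table { get; set; }
    public List<PropertyMapping> Properties { get; } = new List<PropertyMapping>();
}

public sealed class PropertyMapping
{
    public string Name { get; set; }
    public string Column { get; set; }
}

// Hypothetical fluent builder that fills in the plain objects from C#,
// kept separate from the domain classes themselves.
public sealed class ClassMap
{
    private readonly ClassMapping _mapping;

    // The table defaults to the entity name until Table(...) is called.
    private ClassMap(string name) => _mapping = new ClassMapping { Name = name, Table = name };

    public static ClassMap For(string entityName) => new ClassMap(entityName);

    public ClassMap Table(string table) { _mapping.Table = table; return this; }

    // The column defaults to the property name, as a simple convention.
    public ClassMap Property(string name, string column = null)
    {
        _mapping.Properties.Add(new PropertyMapping { Name = name, Column = column ?? name });
        return this;
    }

    public ClassMapping Build() => _mapping;
}
```

Usage: `ClassMap.For("Customer").Table("Customers").Property("Name").Build()`.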
FYI, much of the relevant code is under these folders:
https://nhibernate.svn.sourceforge.net/svnroot/nhibernate/trunk/nhibernate/src/NHibernate/Cfg/MappingSchema/
https://nhibernate.svn.sourceforge.net/svnroot/nhibernate/trunk/nhibernate/src/NHibernate/Cfg/XmlHbmBinding/
Can we assume that using Attributes instead of XML mappings remove all of this startup time?
I'm a tad surprised no one mentioned attributes at all, given the relatively extreme counter-ideas being thrown around. I appreciate that migrating from XML to attributes would be a big task, but they at least deserve a mention somewhere in the discussion...?