Ayende @ Rahien

Ayende Rahien commented on RavenDB: Let us write our own JSON Parser, NOT

Fri, 13 May 2011 06:38:22 GMT

James, I absolutely agree with you here. And I think that you have done a tremendous job in making it very easy for us to use JSON. RavenDB has benefited greatly from being able to make use of all the good stuff that is in JSON.Net. For that matter, the mere fact that we could spend a few days and customize just the parts that were problematic for us is another testament to a well-written piece of code.

James Newton-King commented on RavenDB: Let us write our own JSON Parser, NOT

Thu, 12 May 2011 23:04:58 GMT

My perspective with Json.NET is that ease of use and flexibility far outweigh performance in importance. Everyone cares about getting stuff done quickly and well, while only 5% (if that) are writing code where the performance-critical code is JSON serialization.

Ayende Rahien commented on RavenDB: Let us write our own JSON Parser, NOT

Sun, 01 May 2011 08:52:59 GMT

Demis,
1) Performance is important only as much as it affects the perceived perf. Beyond a certain point, it is no longer relevant. A user can't tell if a page rendered in 50 ms or 100 ms, for example.
2 & 3) As I said, I am not building a standard app, I am building something that requires a lot of details about the actual document and modifying how to work with it.
4) This is important: objects of size > 85 KB are NOT COMPACTED. You might want to read up on the Large Object Heap and the implications of such a thing for building server products. You might want to read this article: [msdn.microsoft.com/en-us/magazine/cc534993.aspx](http://msdn.microsoft.com/en-us/magazine/cc534993.aspx)
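To put the 85 KB figure in concrete terms, here is a rough back-of-the-envelope sketch. It is written in Python purely for brevity (the thread itself is about .NET), and it rests on assumptions that are not stated in the comments: a large-object threshold of 85,000 bytes and two bytes per character for a UTF-16 .NET string, with the small per-object header ignored. The function names are hypothetical.

```python
# Rough estimate of when a .NET string would be allocated on the
# Large Object Heap. Assumptions (not from the thread): threshold of
# 85,000 bytes, UTF-16 strings at 2 bytes per char, header ignored.
LOH_THRESHOLD_BYTES = 85_000
BYTES_PER_CHAR = 2


def approx_string_bytes(char_count: int) -> int:
    """Approximate payload size of a string with char_count characters."""
    return char_count * BYTES_PER_CHAR


def lands_on_loh(char_count: int) -> bool:
    """True if a string of this length would be treated as a large object."""
    return approx_string_bytes(char_count) >= LOH_THRESHOLD_BYTES


if __name__ == "__main__":
    # A small payload, the approximate boundary, and a 3 MB (in chars) document.
    for chars in (10_000, 42_500, 3 * 1024 * 1024):
        print(chars, approx_string_bytes(chars), lands_on_loh(chars))
```

Under these assumptions, any string longer than roughly 42,500 characters is a large object, so a multi-megabyte JSON document read into a single string is far past the boundary.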

Demis Bellot commented on RavenDB: Let us write our own JSON Parser, NOT

Sun, 01 May 2011 08:47:18 GMT

"1) For the most part, most people would find whatever they are using to be more than enough. Performance only matters if you run into a perf issue."

Depends if you consider performance of primary importance or not. From what I can see sitting on various mailing lists / NoSQL groups, perf seems to matter a lot more outside .NET culture, where perceived perf / response times are deeply linked to end user UX / satisfaction. This is understood a lot better in alt lang fx and dev platforms, where there are fewer enterprise/SC teachings and heavyweight fx's to cloud any focus on perf/scalability. However, most top internet properties treat performance as being of paramount importance, but I guess it is something that's disregarded in the areas where .NET is positioned in the enterprise.

"2) Regarding the API. I am talking about support for things like converters, selecting which properties/fields will be serialized and which will not be, modifying the way we are reading/saving the data, etc. 3) I need to have access to the DOM in RavenDB. Dictionary loses the minimal amount of type information that is already there in JSON, and is not acceptable."

Sounds like you want access to a quasi-strong typed API that's not the POCO that created it? I guess that's fair enough; I suspect as a Document DB server you have some unique requirements. For most other use-cases, where the mapping takes place at the data/DTO models, I can't see why this is needed.

"4) Having only string as an input (and if you have a Stream and read that to a string it is the same) is a big problem when you are dealing with large documents, because you are going to create a single continuous (and very large) string. That results in having a lot of fragmentation in the LOH, which can result in Out Of Memory Exceptions in a server application. That is not acceptable for us in RavenDB. I understand CPU vs. Mem tradeoffs, but not considering the LOH and the implications on fragmentation means that you are leaving yourself open to some very bad issues down the road."

Yeah, I don't buy this. You may be referring to large asset files, which, I agree, you should always stream, as they have the potential to be upwards of 1 GB in size. However, I very much doubt this fragmentation is a real-world concern for data documents, which are unlikely to be more than a few MB in size. The buffer only lasts a short time, the few ms it takes to deserialize it into an object graph (which generally takes up a similar amount of space as the buffer), until it's reclaimed by the GC. The .NET GC, as you know, is self-compacting, which as a result has no long-term fragmentation problems. I'd be very interested in any links to the contrary, where adding and reclaiming a few MB periodically in .NET causes 'Out Of Memory Exceptions', as this is news to me.

"5) Regarding my benchmark, take a look at Raven.Tryouts, the PerfTest class. It isn't a really interesting case, but we are using a 3 MB document as our source information here. And we saw about 100% improvement between the two options."

ok kool, I'll give it a look when I run into some free time, thx.

Ayende Rahien commented on RavenDB: Let us write our own JSON Parser, NOT

Sun, 01 May 2011 07:56:24 GMT

Demis,
1) For the most part, most people would find whatever they are using to be more than enough. Performance only matters if you run into a perf issue.
2) Regarding the API. I am talking about support for things like converters, selecting which properties/fields will be serialized and which will not be, modifying the way we are reading/saving the data, etc.
3) I need to have access to the DOM in RavenDB. Dictionary loses the minimal amount of type information that is already there in JSON, and is not acceptable.
4) Having only string as an input (and if you have a Stream and read that to a string it is the same) is a big problem when you are dealing with large documents, because you are going to create a single continuous (and very large) string. That results in having a lot of fragmentation in the LOH, which can result in Out Of Memory Exceptions in a server application. That is not acceptable for us in RavenDB. I understand CPU vs. Mem tradeoffs, but not considering the LOH and the implications on fragmentation means that you are leaving yourself open to some very bad issues down the road.
5) Regarding my benchmark, take a look at Raven.Tryouts, the PerfTest class. It isn't a really interesting case, but we are using a 3 MB document as our source information here. And we saw about 100% improvement between the two options.
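The difference between buffering a whole document and consuming it from a stream can be sketched roughly as follows. This is Python rather than .NET, and it is not how RavenDB or any of the libraries discussed here work: `iter_chunks` and `max_depth` are hypothetical helpers, and the chunk size is an arbitrary assumption. The point is only that a consumer which carries its state across chunks never needs one contiguous buffer the size of the document.

```python
import io
from typing import Iterable, Iterator, TextIO

CHUNK_CHARS = 16 * 1024  # arbitrary chunk size, chosen for illustration


def iter_chunks(stream: TextIO, chunk_chars: int = CHUNK_CHARS) -> Iterator[str]:
    """Yield the stream's content in pieces of at most chunk_chars characters."""
    while True:
        chunk = stream.read(chunk_chars)
        if not chunk:
            return
        yield chunk


def max_depth(chunks: Iterable[str]) -> int:
    """Compute the deepest object/array nesting of a JSON text incrementally.

    The scanner state (depth, in_string, escaped) survives chunk boundaries,
    so no chunk ever has to hold a complete document.
    """
    depth = deepest = 0
    in_string = escaped = False
    for chunk in chunks:
        for ch in chunk:
            if in_string:
                if escaped:
                    escaped = False          # character after a backslash
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False        # closing quote
            elif ch == '"':
                in_string = True
            elif ch in "{[":
                depth += 1
                deepest = max(deepest, depth)
            elif ch in "}]":
                depth -= 1
    return deepest


if __name__ == "__main__":
    doc = '{"a": [1, {"b": "}]"}], "c": "x\\"{"}'
    # Same answer whether the text arrives whole or in tiny pieces.
    print(max_depth([doc]), max_depth(iter_chunks(io.StringIO(doc), 4)))
```

A real streaming JSON reader does far more than count brackets, but the memory profile is the same idea: the largest single allocation is bounded by the chunk size, not by the document size.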

Demis Bellot commented on RavenDB: Let us write our own JSON Parser, NOT

Sat, 30 Apr 2011 21:29:25 GMT

Hi Ayende, Good to see you're tackling the JSON in .NET problem. IMO the perf of JSON parsers was pretty lame in .NET (i.e. MS's JSON parser is slower than their XML one, etc). Since JSON is an increasingly important serialization format, it was frustrating to see the poor options we had shipped with the .NET framework. Even today I'm still seeing tutorials recommending the use of JavaScriptSerializer, which I've found to be over 100x slower than protobuf-net in my benchmarks ( [http://www.servicestack.net/mythz_blog/?p=344](http://www.servicestack.net/mythz_blog/?p=344)) and which I've had to exclude because it was infeasible to run benchmarks with a high N.

Anyway, since you've evaluated ServiceStack.JsonSerializer, I would like to provide my own feedback on the limitations you've listed:

- "It wasn’t really nearly as rich in terms of API and functionality." Well, it's a JSON C# POCO serializer which can serialize basically any .NET collection, C# POCO types, anonymous and late-bound types, etc; basically any clean .NET DTO/domain object graph. I hear about JSON serializers with DataSet, XML > JSON support, etc, but am really not sure/convinced of the use-cases requiring this. IMO these are features/solutions to design problems you shouldn't have to begin with.

- "It was focused purely on reading to and from .NET objects, with no JSON DOM supported." Another area where I don't see value is a JSON DOM, compared to just using a normal generic Dictionary / List to build and maintain a dynamic data payload. C# has intrinsic support for populating generic collection types (i.e. collection initializers), making it much easier and more natural to populate from C#. Also, any C# POCO type that is serialized can be deserialized as a Dictionary and vice-versa, allowing you to still parse a JSON payload without the C# POCO type that created it. For examples of dynamic JSON parsing with ServiceStack.JsonSerializer see: [http://goo.gl/G8CNI](http://goo.gl/G8CNI) (parsing GitHub pull request) and [http://goo.gl/k8ayt](http://goo.gl/k8ayt). These examples show you can trivially parse an ad hoc JSON payload, populating your own strong-typed model. It would be nice to know of scenarios where a JSON DOM would prove beneficial.

- "The only input format it had was a string." That's a little misleading, which I hope you will clarify. The JsonSerializer exposes APIs to parse JSON via a string, TextReader or Stream (see the src for JsonSerializer: [http://goo.gl/IH5hN](http://goo.gl/IH5hN)). What you're referring to is that behind the scenes I read that into a buffer, which is what I use to deserialize. This is done purely for perf reasons, on the view that CPU efficiency matters more than memory efficiency (memory is becoming more plentiful). I've decided to do this because only a small fraction of use-cases would benefit from a streaming JSON API, i.e. who benefits from/uses a partially populated domain model? Is RavenDB doing any processing on a partial JSON document/dataset? So with the 80/20 rule in mind, I've discarded the calling overhead of reading from a stream for better perf. Note that for serialization I still write to a stream, since it's important not to buffer the output where writing to a stream will yield perf benefits, pushing the serialized output to the response stream as soon as you can.

Also, do you have your benchmarks available? I'm personally curious how ServiceStack.JsonSerializer stacks up against your latest efforts :)

Anyway, great to see you're still blogging and focused on perf; it's a feature that is continually overlooked in the monolithic frameworks being produced in the .NET space today.

BTW I'm trying to start a body of knowledge around perf/scaling in .NET, so if you ever want to contribute a piece on either, maybe a piece on where RavenDB is faster or scales better than the traditional .NET persistence options, it'd be a very welcome addition to: [https://github.com/mythz/ScalingDotNET](https://github.com/mythz/ScalingDotNET). Once I get enough info on the subject I'll make a website dedicated to it.

Chris Wright commented on RavenDB: Let us write our own JSON Parser, NOT

Mon, 25 Apr 2011 16:18:19 GMT

Towa, O(N) key lookup time is bad. That should be O(1) ideally, though O(log N) might also be acceptable. I can't think of a data structure that would have O(N**2) lookup times.
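The complexity point can be illustrated with a small sketch. It assumes, as one plausible reading of the "O(N)" remark and not something the thread confirms, a DOM object that stores its properties as a list of name/value pairs, and compares it with a hash-based index. The example is in Python for brevity, and both function names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def lookup_linear(pairs: List[Tuple[str, Any]], key: str) -> Tuple[Optional[Any], int]:
    """Scan name/value pairs in order: O(N) comparisons in the worst case.

    Returns the value (or None) and the number of key comparisons made.
    """
    comparisons = 0
    for name, value in pairs:
        comparisons += 1
        if name == key:
            return value, comparisons
    return None, comparisons


def lookup_hashed(index: Dict[str, Any], key: str) -> Optional[Any]:
    """Hash-table lookup: O(1) on average, independent of property count."""
    return index.get(key)


if __name__ == "__main__":
    pairs = [(f"prop{i}", i) for i in range(1000)]
    index = dict(pairs)
    # The last property costs a full scan in the list-based layout.
    print(lookup_linear(pairs, "prop999"))
    print(lookup_hashed(index, "prop999"))
```

With the list-based layout, the cost of reading one property grows with the number of properties on the object, which is why O(N) is poor for a key lookup even though it would be fine for, say, a single pass over a document.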

Towa commented on RavenDB: Let us write our own JSON Parser, NOT

Mon, 25 Apr 2011 15:01:17 GMT

You wrote: "and all operations on the DOM are O(N)". But O(N) is actually pretty good. Did you mean O(N^2)?

Ayende Rahien commented on RavenDB: Let us write our own JSON Parser, NOT

Mon, 25 Apr 2011 06:54:03 GMT

Rafiki, It is part of the RavenDB source code, and available under the same license as RavenDB.

Rafiki commented on RavenDB: Let us write our own JSON Parser, NOT

Sun, 24 Apr 2011 22:28:48 GMT

Great work! Will your DOM be available under the same license as Newtonsoft.Json (CC as far as I know)?

mattmc3 commented on RavenDB: Let us write our own JSON Parser, NOT

Sun, 24 Apr 2011 18:46:42 GMT

The simplest way to draw a distinction between whether you write it yourself or use a 3rd party library is to ask yourself whether it's your organization's or your application's core competency. If it isn't, then you're wasting time and resources creating something that's probably available in a "good enough" form elsewhere. An example of this might be Mozilla's XUL... neat, but totally tangential to creating a nice browser. If it is within your app's core competency, then "good enough" might prove not to be, well, good enough. If your application stores documents as JSON like RavenDB does, you better believe that if you're successful enough you're going to end up highly customizing something or writing your own JSON parser entirely. It's the classic build vs. buy conversation, and it's always surprised me how often people get it wrong.