<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:copyright="http://blogs.law.harvard.edu/tech/rss" xmlns:image="http://purl.org/rss/1.0/modules/image/">
    <channel>
        <title>Databases</title>
        <link>http://ayende.com/Blog/category/502.aspx</link>
        <description>Databases</description>
        <language>en-US</language>
        <copyright>Ayende Rahien</copyright>
        <managingEditor>Ayende@ayende.com</managingEditor>
        <generator>Subtext Version 2.0.0.0</generator>
        <item>
            <title>Slaying relational hydras (or dating them)</title>
            <link>http://ayende.com/Blog/archive/2010/03/08/slaying-relational-hydras-or-dating-them.aspx</link>
            <description>&lt;p&gt;Sometimes client confidentiality can be &lt;em&gt;really&lt;/em&gt; annoying, because the problem sets &amp;amp; suggested solutions that come up are &lt;em&gt;really&lt;/em&gt; interesting. That said, since I &lt;em&gt;am&lt;/em&gt; interested in having &lt;em&gt;future&lt;/em&gt; clients, it is pretty much a must have. As such, the current post represents a real world customer problem, but probably in a totally different context. In fact, I would be surprised if the customer was able to recognize the problem as his.&lt;/p&gt;  &lt;p&gt;That said, the problem is actually quite simple. Consider a dating site, where you can input your details and what you seek, and the site will match you with the appropriate person. I am going to ignore a lot of things here, so if you actually have built a dating site, try not to cringe.&lt;/p&gt;  &lt;p&gt;At the most basic level, we have two screens, the My Details screen, where users can specify their stats and their preferences:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingrelationalhydrasordatingthem_7B6F/image_2.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingrelationalhydrasordatingthem_7B6F/image_thumb.png" width="439" height="406" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;And the results screen, which shows the user the candidates matching their preferences.&lt;/p&gt;  &lt;p&gt;There is just one interesting tidbit: the list of qualities is pretty big (hundreds or thousands of potential qualities).&lt;/p&gt;  &lt;p&gt;Can you design a relational model that would be a good fit for this? 
And allow efficient searching?&lt;/p&gt;  &lt;p&gt;I gave it some thought, and I can’t think of one, but maybe you can.&lt;/p&gt;  &lt;p&gt;I’ll follow up on this post in a day or two, showing how to solve the problem using Raven.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11355.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2010/03/08/slaying-relational-hydras-or-dating-them.aspx</guid>
            <pubDate>Mon, 08 Mar 2010 08:46:00 GMT</pubDate>
            <wfw:comment>http://ayende.com/Blog/comments/11355.aspx</wfw:comment>
            <comments>http://ayende.com/Blog/archive/2010/03/08/slaying-relational-hydras-or-dating-them.aspx#feedback</comments>
            <slash:comments>25</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11355.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Fun with a non relational databases</title>
            <link>http://ayende.com/Blog/archive/2010/02/26/fun-with-a-non-relational-databases.aspx</link>
            <description>&lt;p&gt;My post showing a &lt;a href="http://ayende.com/Blog/archive/2010/02/22/slaying-relational-dragons.aspx"&gt;different approach for handling data got a lot of traffic&lt;/a&gt;, and a lot of good comments. But I think that there is some misunderstanding with regard to the capabilities of NoSQL databases, so I am going to try to expand on those capabilities in this post.&lt;/p&gt;  &lt;p&gt;Instead of hand waving, and since I am thinking about this a lot lately, we will assume that we are talking about DivanDB (unreleased version), which has the following API:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;JsonDoc[] Get(params Id[] ids);&lt;/li&gt;    &lt;li&gt;Set(params JsonDoc[] docs);&lt;/li&gt;    &lt;li&gt;JsonDoc[] Query(string indexName, string query);&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;DivanDB is a simple Document database, storing documents as Json, and using Lucene as an indexer for the data. An index is defined by specifying a function that creates it:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;var booksByAuthor = from doc in documents     &lt;br /&gt;                                  where doc.type == "book"      &lt;br /&gt;                                  from author in doc.authors      &lt;br /&gt;                                  select new { author };&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;And the data looks like this:&lt;/p&gt;  &lt;p&gt;&lt;img alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingtherelationaldragon_A925/image_thumb_5.png" /&gt;&lt;img alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingtherelationaldragon_A925/image_thumb_7.png" /&gt;&lt;/p&gt;  &lt;p /&gt;  &lt;p&gt;Indexing is done at write time; you can think of those indexes as materialized views.&lt;/p&gt;  &lt;p&gt;It appears that people assume that just because you aren’t using an RDBMS, you can’t use queries. Here are a few options to show how you &lt;em&gt;can&lt;/em&gt; do so. 
&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Books by author:&lt;/strong&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Query("booksByAuthor", "author:weber");&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;&lt;strong&gt;Books by category:&lt;/strong&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Query("booksByCategory", "category:scifi");&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;How is the books by category index defined? I think you can guess.&lt;/p&gt;  &lt;blockquote&gt;var booksByCategory = from doc in documents   &lt;br /&gt;                                  where doc.type == "book"    &lt;br /&gt;                                  from category in doc.categories    &lt;br /&gt;                                  select new { category };&lt;/blockquote&gt;  &lt;p&gt;What other queries did people bring up?&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Who has book X in their queue?&lt;/strong&gt;&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;Query("usersByQueuedBooks", "book:41");&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;And the view?&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;var usersByQueuedBooks = from doc in documents     &lt;br /&gt;                                  where doc.type == "user"      &lt;br /&gt;                                  from book in doc.queued_books      &lt;br /&gt;                                  select new { book };&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I’ll leave the implementation of &lt;strong&gt;Who has book X checked out&lt;/strong&gt; as an exercise for the reader.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;What about deletes? &lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Using this system, it looks like deletes might be really expensive, right? Well, that depends on what exactly you want here.&lt;/p&gt;  &lt;p&gt;My default approach would be to consider exactly what you want. As Udi pointed out, in the real world, you &lt;a href="http://www.udidahan.com/2009/09/01/dont-delete-just-dont/"&gt;&lt;em&gt;don’t delete&lt;/em&gt;&lt;/a&gt;&lt;em&gt;. 
&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;But it is actually fairly easy to support something like this cheaply. It is all about defining the association and letting the DB handle this (although I am not fond of the syntax I came up with):&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;var books = from book in documents     &lt;br /&gt;            where book.type == "book"      &lt;br /&gt;            select new { book = book.id };      &lt;br /&gt;var checkedOutBooks =    from user in documents      &lt;br /&gt;                        where user.type == "user"      &lt;br /&gt;                        from book in user.checked_out      &lt;br /&gt;                        select new { book }; &lt;/p&gt;    &lt;p&gt;var queuedBooks =    from user in documents     &lt;br /&gt;                    where user.type == "user"      &lt;br /&gt;                    from book in user.queued_books      &lt;br /&gt;                    select new { book };      &lt;br /&gt;      &lt;br /&gt;FK_DisallowDeletion(books, checkedOutBooks);      &lt;br /&gt;FK_RemoveElement(books, queuedBooks);&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Under the covers, this creates indexes and can check them at delete / insert time. &lt;/p&gt;  &lt;p&gt;However, I would probably not implement this for Rhino DivanDB, mostly because I agree with Udi.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11329.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2010/02/26/fun-with-a-non-relational-databases.aspx</guid>
            <pubDate>Fri, 26 Feb 2010 10:00:00 GMT</pubDate>
            <wfw:comment>http://ayende.com/Blog/comments/11329.aspx</wfw:comment>
            <comments>http://ayende.com/Blog/archive/2010/02/26/fun-with-a-non-relational-databases.aspx#feedback</comments>
            <slash:comments>17</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11329.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Rhino Divan DB reboot idea</title>
            <link>http://ayende.com/Blog/archive/2010/02/25/rhino-divan-db-reboot-idea.aspx</link>
            <description>&lt;p&gt;Divan DB is my pet database. I created it to&lt;em&gt; scratch an itch &lt;/em&gt;[Nitpickers: please note this!], to see if I could create a Couch DB like system in .NET. You can read all about it in the &lt;a href="http://ayende.com/Blog/archive/2009/03/08/designing-a-document-database.aspx"&gt;following series of posts&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;It stalled for a while, mostly because I ran into the hard problems (building map/reduce views). But I think that I actually have a better idea: instead of trying to build something that would just mimic Couch DB, build a .NET based Document DB, which is actually a very achievable goal.&lt;/p&gt;  &lt;p&gt;The way it would work is actually pretty simple: the server would accept Json-formatted documents, like these:&lt;/p&gt;  &lt;blockquote&gt;   &lt;pre class="csharpcode"&gt;[
    {
        &lt;span class="str"&gt;"id"&lt;/span&gt;: 153,
        &lt;span class="str"&gt;"type"&lt;/span&gt;: &lt;span class="str"&gt;"book"&lt;/span&gt;,
        &lt;span class="str"&gt;"name"&lt;/span&gt;: &lt;span class="str"&gt;"Storm from the Shadows"&lt;/span&gt;,
        &lt;span class="str"&gt;"authors"&lt;/span&gt;: [
            &lt;span class="str"&gt;"David Weber"&lt;/span&gt;
        ],
        &lt;span class="str"&gt;"categories"&lt;/span&gt;: [
            &lt;span class="str"&gt;"SciFi"&lt;/span&gt;,
            &lt;span class="str"&gt;"Awesome"&lt;/span&gt;,
            &lt;span class="str"&gt;"You gotta read it"&lt;/span&gt;
        ],
        &lt;span class="str"&gt;"avg_stars"&lt;/span&gt;: 4.5,
        &lt;span class="str"&gt;"reviews"&lt;/span&gt;: [13,5423,423,123,512]
    },
    {
        &lt;span class="str"&gt;"id"&lt;/span&gt;: 1337,
        &lt;span class="str"&gt;"type"&lt;/span&gt;: &lt;span class="str"&gt;"book"&lt;/span&gt;,
        &lt;span class="str"&gt;"name"&lt;/span&gt;: &lt;span class="str"&gt;"DSLs in Boo"&lt;/span&gt;,
        &lt;span class="str"&gt;"authors"&lt;/span&gt;: [
            &lt;span class="str"&gt;"Ayende Rahien"&lt;/span&gt;,
            &lt;span class="str"&gt;"Oren Eini"&lt;/span&gt;
        ],
        &lt;span class="str"&gt;"categories"&lt;/span&gt;: [
            &lt;span class="str"&gt;"DSL"&lt;/span&gt;,
            &lt;span class="str"&gt;".NET"&lt;/span&gt;,
            &lt;span class="str"&gt;"You REALLY gotta read it"&lt;/span&gt;
        ],
        &lt;span class="str"&gt;"avg_stars"&lt;/span&gt;: 7,
        &lt;span class="str"&gt;"reviews"&lt;/span&gt;: [843,214,451]
    }
]&lt;/pre&gt;
  &lt;style type="text/css"&gt;&lt;![CDATA[
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;&lt;/blockquote&gt;

&lt;p&gt;Querying could be done either by id, or using a query on an index. Indexes can be defined using the following syntax:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;var&lt;/span&gt; booksByTitle = 
   &lt;span class="kwrd"&gt;from&lt;/span&gt; book &lt;span class="kwrd"&gt;in&lt;/span&gt; docs
   &lt;span class="kwrd"&gt;where&lt;/span&gt; book.type == &lt;span class="str"&gt;"book"&lt;/span&gt;
   &lt;span class="kwrd"&gt;select&lt;/span&gt; &lt;span class="kwrd"&gt;new&lt;/span&gt; { book.title };&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;The fun part here is that this index would be translated into a Lucene index, which means that you could search it with a Lucene query:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Query("booksByTitle", "title:Boo") –&amp;gt; documents that match this query.&lt;/p&gt;
&lt;/blockquote&gt;
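&lt;p&gt;To make the idea concrete, here is a minimal Python sketch of an index definition feeding a field-based lookup. It is an illustration only, not how Divan DB is built: a plain in-memory dictionary stands in for Lucene, the names books_by_title, build_index and query are hypothetical, and the sketch assumes the indexed title comes from the "name" property of the sample documents shown earlier.&lt;/p&gt;

```python
from collections import defaultdict

# Sample documents, trimmed from the JSON shown earlier; the user is made up.
DOCS = [
    {"id": 153, "type": "book", "name": "Storm from the Shadows"},
    {"id": 1337, "type": "book", "name": "DSLs in Boo"},
    {"id": 7, "type": "user", "name": "ayende"},
]


def books_by_title(doc):
    """Index definition: yield (field, value) pairs for book documents only."""
    if doc["type"] == "book":
        yield "title", doc["name"]


def build_index(docs, index_fn):
    """Map each lowercased (field, term) pair to the ids of matching documents."""
    index = defaultdict(set)
    for doc in docs:
        for field, value in index_fn(doc):
            for term in value.lower().split():
                index[(field, term)].add(doc["id"])
    return index


def query(index, docs, text):
    """Answer a single 'field:term' query such as 'title:Boo'."""
    field, term = text.split(":", 1)
    ids = index.get((field, term.lower()), set())
    return [doc for doc in docs if doc["id"] in ids]


INDEX = build_index(DOCS, books_by_title)
print([doc["name"] for doc in query(INDEX, DOCS, "title:Boo")])
```

&lt;p&gt;The index is built once from the documents, so answering a query is a dictionary lookup rather than a scan with a filter, which is the property the Lucene-backed version is meant to provide.&lt;/p&gt;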

&lt;p&gt;As well as apply any &amp;amp; all the usual Lucene tricks.&lt;/p&gt;

&lt;p&gt;You don’t get Map/Reduce using this method, but the amount of complexity you have is quite low, and the implementation should take only several days to build.&lt;/p&gt;

&lt;p&gt;Thoughts?&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11324.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2010/02/25/rhino-divan-db-reboot-idea.aspx</guid>
            <pubDate>Thu, 25 Feb 2010 10:00:00 GMT</pubDate>
            <wfw:comment>http://ayende.com/Blog/comments/11324.aspx</wfw:comment>
            <comments>http://ayende.com/Blog/archive/2010/02/25/rhino-divan-db-reboot-idea.aspx#feedback</comments>
            <slash:comments>36</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11324.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Slaying relational dragons</title>
            <link>http://ayende.com/Blog/archive/2010/02/22/slaying-relational-dragons.aspx</link>
            <description>&lt;p&gt;I recently had a fascinating &lt;a href="http://nhprof.com/commercialsupport"&gt;support call&lt;/a&gt;, talking about how to optimize a &lt;em&gt;very&lt;/em&gt; big model and an access pattern that basically required having the entire model in memory for performing certain operations.&lt;/p&gt;  &lt;p&gt;A pleasant surprise was that it &lt;em&gt;wasn’t&lt;/em&gt; horrible (when I get called, there is usually a mess), which is what made things interesting. In the space of two hours, we managed to:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Reduce the number of queries by 90%.&lt;/li&gt;    &lt;li&gt;Reduce the &lt;em&gt;size&lt;/em&gt; of queries by 52%.&lt;/li&gt;    &lt;li&gt;Increase responsiveness by 60%, even for a data set an order of magnitude larger.&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;My default answer whenever I am asked when to use NHibernate is: Whenever you use a relational database. &lt;/p&gt;  &lt;p&gt;My strong recommendation at the end of that support call? Don’t use a relational DB for what you are doing.&lt;/p&gt;  &lt;p&gt;The ERD just below has absolutely nothing to do with the support call, but hopefully it will help make the example. 
Note that I dropped some of the association tables, to make it simpler.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingtherelationaldragon_A925/image_6.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingtherelationaldragon_A925/image_thumb_2.png" width="706" height="511" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;And the scenario we have to deal with is this one:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingtherelationaldragon_A925/image_8.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingtherelationaldragon_A925/image_thumb_3.png" width="640" height="583" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;em&gt;Every &lt;/em&gt;single table in the ERD is touched by this screen.  Using a relational database, I would need something like the following to get all this data:&lt;/p&gt;  &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; * 
&lt;span class="kwrd"&gt;FROM&lt;/span&gt;   Users 
&lt;span class="kwrd"&gt;WHERE&lt;/span&gt;  Id = @UserID 

&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; * 
&lt;span class="kwrd"&gt;FROM&lt;/span&gt;   Subscriptions 
&lt;span class="kwrd"&gt;WHERE&lt;/span&gt;  UserId = @UserId 
       &lt;span class="kwrd"&gt;AND&lt;/span&gt; GETDATE() &lt;span class="kwrd"&gt;BETWEEN&lt;/span&gt; StartDate &lt;span class="kwrd"&gt;AND&lt;/span&gt; EndDate 

&lt;span class="kwrd"&gt;SELECT&lt;/span&gt;   &lt;span class="kwrd"&gt;MIN&lt;/span&gt;(CheckedBooks.CheckedAt), 
         Books.Name, 
         Books.ImageUrl, 
         &lt;span class="kwrd"&gt;AVG&lt;/span&gt;(Reviews.NumberOfStars), 
         GROUP_CONCAT(&lt;span class="str"&gt;', '&lt;/span&gt;,Authors.Name), 
         GROUP_CONCAT(&lt;span class="str"&gt;', '&lt;/span&gt;,Categories.Name) 
&lt;span class="kwrd"&gt;FROM&lt;/span&gt;     CheckedBooks 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Books 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; BookToAuthors 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Authors 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; AuthorId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Reviews 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; BooksCategories 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Categories 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; CategoryId 
&lt;span class="kwrd"&gt;WHERE&lt;/span&gt;    CheckedBooks.UserId = @UserId 
&lt;span class="kwrd"&gt;GROUP&lt;/span&gt; &lt;span class="kwrd"&gt;BY&lt;/span&gt; BookId 

&lt;span class="kwrd"&gt;SELECT&lt;/span&gt;   Books.Name, 
         Books.ImageUrl, 
         &lt;span class="kwrd"&gt;AVG&lt;/span&gt;(Reviews.NumberOfStars), 
         GROUP_CONCAT(&lt;span class="str"&gt;', '&lt;/span&gt;,Authors.Name), 
         GROUP_CONCAT(&lt;span class="str"&gt;', '&lt;/span&gt;,Categories.Name) 
&lt;span class="kwrd"&gt;FROM&lt;/span&gt;     Books 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; BookToAuthors 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Authors 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; AuthorId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Reviews 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; BooksCategories 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Categories 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; CategoryId 
&lt;span class="kwrd"&gt;WHERE&lt;/span&gt;    BookId &lt;span class="kwrd"&gt;IN&lt;/span&gt; (&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; BookID 
                    &lt;span class="kwrd"&gt;FROM&lt;/span&gt;   QueuedBooks 
                    &lt;span class="kwrd"&gt;WHERE&lt;/span&gt;  UserId = @UserId) 
&lt;span class="kwrd"&gt;GROUP&lt;/span&gt; &lt;span class="kwrd"&gt;BY&lt;/span&gt; BookId 

&lt;span class="kwrd"&gt;SELECT&lt;/span&gt;   Books.Name, 
         Books.ImageUrl, 
         &lt;span class="kwrd"&gt;AVG&lt;/span&gt;(Reviews.NumberOfStars), 
         GROUP_CONCAT(&lt;span class="str"&gt;', '&lt;/span&gt;,Authors.Name), 
         GROUP_CONCAT(&lt;span class="str"&gt;', '&lt;/span&gt;,Categories.Name) 
&lt;span class="kwrd"&gt;FROM&lt;/span&gt;     Books 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; BookToAuthors 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Authors 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; AuthorId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Reviews 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; BooksCategories 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Categories 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; CategoryId 
&lt;span class="kwrd"&gt;WHERE&lt;/span&gt;    BookId &lt;span class="kwrd"&gt;IN&lt;/span&gt; (&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; BookID 
                    &lt;span class="kwrd"&gt;FROM&lt;/span&gt;   RecommendedBooks 
                    &lt;span class="kwrd"&gt;WHERE&lt;/span&gt;  UserId = @UserId) 
&lt;span class="kwrd"&gt;GROUP&lt;/span&gt; &lt;span class="kwrd"&gt;BY&lt;/span&gt; BookId 

&lt;span class="kwrd"&gt;SELECT&lt;/span&gt;   Books.Name, 
         Books.ImageUrl, 
         &lt;span class="kwrd"&gt;AVG&lt;/span&gt;(Reviews.NumberOfStars), 
         GROUP_CONCAT(&lt;span class="str"&gt;', '&lt;/span&gt;,Authors.Name), 
         GROUP_CONCAT(&lt;span class="str"&gt;', '&lt;/span&gt;,Categories.Name) 
&lt;span class="kwrd"&gt;FROM&lt;/span&gt;     Books 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; BookToAuthors 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Authors 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; AuthorId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Reviews 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; BooksCategories 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; BookId 
         &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Categories 
           &lt;span class="kwrd"&gt;ON&lt;/span&gt; CategoryId 
&lt;span class="kwrd"&gt;WHERE&lt;/span&gt;    Books.Name &lt;span class="kwrd"&gt;LIKE&lt;/span&gt; @&lt;span class="kwrd"&gt;search&lt;/span&gt; 
          &lt;span class="kwrd"&gt;OR&lt;/span&gt; Categories.Name &lt;span class="kwrd"&gt;LIKE&lt;/span&gt; @&lt;span class="kwrd"&gt;search&lt;/span&gt; 
          &lt;span class="kwrd"&gt;OR&lt;/span&gt; Reviews.Review &lt;span class="kwrd"&gt;LIKE&lt;/span&gt; @&lt;span class="kwrd"&gt;search&lt;/span&gt; 
&lt;span class="kwrd"&gt;GROUP&lt;/span&gt; &lt;span class="kwrd"&gt;BY&lt;/span&gt; BookId&lt;/pre&gt;

&lt;p&gt;Yes, this is a fairly simplistic approach, without de-normalization, and I would never perform searches in this manner, but… notice how complex things are getting. For bonus points, look at the fourth query: the queued books are &lt;em&gt;ordered&lt;/em&gt;, so try to figure out how we can get the order in a meaningful way. I shudder to think about the execution plan of this set of queries, even if we ignore the last one, which does full text searching in the slowest possible way. And this is just for bringing the data for a single screen, assuming that magically it will show up (you need to do a &lt;em&gt;lot &lt;/em&gt;of manipulation at the app level to make this happen). &lt;/p&gt;

&lt;p&gt;The problem is simple: our data access pattern and the data storage technology that we use are at odds with one another. While relational modeling dictates normalization, our actual data usage means that we don’t really deal with a single-row entity, with relatively rare access to associations, which is the best case for OLTP. Nor are we dealing with set based logic, which is the best case for OLAP / Relational based queries.&lt;/p&gt;

&lt;p&gt;Instead, we are dealing with an aggregate that spans multiple tables, mostly because we have no other way to express lists and many to many associations in a relational database.&lt;/p&gt;

&lt;p&gt;Let us see how we could handle things if we were using a document or key/value database. We would have two aggregates, User and Book. &lt;/p&gt;

&lt;p&gt;GetUser(userId) –&amp;gt; would result in:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingtherelationaldragon_A925/image_12.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingtherelationaldragon_A925/image_thumb_5.png" width="224" height="546" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;We can now issue another query, to bring the associated books. GetBooks(153, 1337) would result in:&lt;/p&gt;

&lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingtherelationaldragon_A925/image_16.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/Slayingtherelationaldragon_A925/image_thumb_7.png" width="248" height="514" /&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Note that the entire data structure is different; we haven’t just copied the normalized relational model, we have a totally different model. An aggregate (similar to DDD’s aggregate) is a single entity that &lt;em&gt;contains&lt;/em&gt; anything except other aggregates. References to other aggregates are allowed (from user to all the books), but most of the entity’s data is stored as a single value.&lt;/p&gt;

&lt;p&gt;That has several interesting implications. First, we need just two queries to get the data for the screen: one to get the user’s data, and a second to get the books that we need to display. Reducing remote calls is something that you &lt;em&gt;really&lt;/em&gt; care about, and simplifying the queries to mere lookups by id is going to have a significant effect as well.&lt;/p&gt;
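&lt;p&gt;As a rough illustration of the two-call pattern, here is a minimal Python sketch. Everything in it is an assumption made for the example: the dictionary STORE stands in for the document database, the user document and its id are made up, and get and load_user_screen are hypothetical helpers rather than a real API.&lt;/p&gt;

```python
# In-memory stand-in for a document store, keyed by document id.
STORE = {
    42: {
        "id": 42,
        "type": "user",
        "name": "ayende",
        "checked_out": [153],
        "queued_books": [1337],
    },
    153: {"id": 153, "type": "book", "name": "Storm from the Shadows"},
    1337: {"id": 1337, "type": "book", "name": "DSLs in Boo"},
}


def get(*ids):
    """Fetch several documents by id in one call, in the requested order."""
    return [STORE[doc_id] for doc_id in ids]


def load_user_screen(user_id):
    """Build the screen data with exactly two lookups by id."""
    (user,) = get(user_id)  # first call: the user aggregate
    book_ids = user["checked_out"] + user["queued_books"]
    books = {book["id"]: book for book in get(*book_ids)}  # second call: books
    return {
        "user": user["name"],
        "checked_out": [books[i]["name"] for i in user["checked_out"]],
        "queued": [books[i]["name"] for i in user["queued_books"]],
    }


print(load_user_screen(42))
```

&lt;p&gt;Because the queue is stored as a list inside the user document, its order comes back for free, with no extra ordering column to maintain.&lt;/p&gt;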

&lt;p&gt;By changing the data storage technology, we also enforced a very rigid aggregate boundary. Transactions become much simpler as well, since most transactions will now modify only a single aggregate, which is a single operation, no matter how many actual operations we perform on that aggregate. And by tailoring the data structure that we use to match our needs, we have natural aggregate boundaries.&lt;/p&gt;
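&lt;p&gt;The single-operation point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: STORE is an in-memory stand-in for the database, WRITES is a made-up log that counts write calls, and get, put and move_to_front_and_add are hypothetical helpers.&lt;/p&gt;

```python
import copy

# In-memory stand-in for the document store, plus a log of write calls.
STORE = {42: {"id": 42, "type": "user", "queued_books": [1337, 153]}}
WRITES = []


def get(doc_id):
    """Return a private copy of the stored document."""
    return copy.deepcopy(STORE[doc_id])


def put(doc):
    """Replace the stored document in one step and record the write."""
    WRITES.append(doc["id"])
    STORE[doc["id"]] = copy.deepcopy(doc)


def move_to_front_and_add(user_id, front_id, new_id):
    """Apply several changes to the queue, then save the aggregate once."""
    user = get(user_id)
    queue = user["queued_books"]
    queue.remove(front_id)      # change 1: take the book out of its old slot
    queue.insert(0, front_id)   # change 2: put it at the head of the queue
    queue.append(new_id)        # change 3: queue one more book at the end
    put(user)                   # a single write covers all three changes


move_to_front_and_add(42, 153, 99)
print(STORE[42]["queued_books"], WRITES)
```

&lt;p&gt;Three logical changes reach the store as one write of one document, which is why the transaction boundary and the aggregate boundary line up.&lt;/p&gt;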

&lt;p&gt;The end result is a &lt;em&gt;far&lt;/em&gt; simpler method of working with the data. It may mean that we have to do more work upfront, but look at the type of work we would have to do in order to try to solve our problems using the relational model. I know what model &lt;em&gt;I&lt;/em&gt; would want for this sort of a problem.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11323.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2010/02/22/slaying-relational-dragons.aspx</guid>
            <pubDate>Mon, 22 Feb 2010 08:00:00 GMT</pubDate>
            <wfw:comment>http://ayende.com/Blog/comments/11323.aspx</wfw:comment>
            <comments>http://ayende.com/Blog/archive/2010/02/22/slaying-relational-dragons.aspx#feedback</comments>
            <slash:comments>79</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11323.aspx</wfw:commentRss>
        </item>
        <item>
            <title>You see that database? OFF WITH HIS HEAD!</title>
            <link>http://ayende.com/Blog/archive/2009/10/30/you-see-that-database-off-with-his-head.aspx</link>
            <description>&lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/YouseethatdatabaseOFFWITHHISHEAD_D173/image_2.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; margin-left: 0px; border-top: 0px; margin-right: 0px; border-right: 0px" title="image" border="0" alt="image" align="right" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/YouseethatdatabaseOFFWITHHISHEAD_D173/image_thumb.png" width="328" height="321" /&gt;&lt;/a&gt; A while ago I was chatting with a friend that complained about a migration script timing out on his local machine. When I looked at the script, it was fairly obvious what was wrong:&lt;/p&gt;  &lt;blockquote&gt;   &lt;pre class="csharpcode"&gt;&lt;span class="kwrd"&gt;DECLARE&lt;/span&gt; @VideoID UNIQUEIDENTIFIER
&lt;span class="kwrd"&gt;DECLARE&lt;/span&gt; @NewID UNIQUEIDENTIFIER

&lt;span class="kwrd"&gt;DECLARE&lt;/span&gt; VideoCursor &lt;span class="kwrd"&gt;CURSOR&lt;/span&gt; READ_ONLY
&lt;span class="kwrd"&gt;FOR&lt;/span&gt;
&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; ID &lt;span class="kwrd"&gt;FROM&lt;/span&gt; Video

&lt;span class="kwrd"&gt;OPEN&lt;/span&gt; VideoCursor

&lt;span class="kwrd"&gt;FETCH&lt;/span&gt; &lt;span class="kwrd"&gt;NEXT&lt;/span&gt; &lt;span class="kwrd"&gt;FROM&lt;/span&gt; VideoCursor
&lt;span class="kwrd"&gt;INTO&lt;/span&gt; @VideoID

&lt;span class="kwrd"&gt;WHILE&lt;/span&gt; &lt;span class="preproc"&gt;@@FETCH_STATUS&lt;/span&gt; = 0
&lt;span class="kwrd"&gt;BEGIN&lt;/span&gt;
    &lt;span class="kwrd"&gt;SET&lt;/span&gt; @NewID = NEWID()
   
    INSERT &lt;span class="kwrd"&gt;INTO&lt;/span&gt; Content (ID, Body, FormatBody, Plain)
        &lt;span class="kwrd"&gt;SELECT&lt;/span&gt; @NewID, ContentItem.Body, Video.FormatBody, Video.Plain
        &lt;span class="kwrd"&gt;FROM&lt;/span&gt; ContentItem
        &lt;span class="kwrd"&gt;INNER&lt;/span&gt; &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Video
        &lt;span class="kwrd"&gt;ON&lt;/span&gt; Video.Id=ContentItem.ID
        &lt;span class="kwrd"&gt;WHERE&lt;/span&gt; Video.Id=@VideoID
   
    &lt;span class="kwrd"&gt;UPDATE&lt;/span&gt; Video &lt;span class="kwrd"&gt;SET&lt;/span&gt; ContentId=@NewID &lt;span class="kwrd"&gt;WHERE&lt;/span&gt; Video.Id=@VideoID
       
    &lt;span class="kwrd"&gt;UPDATE&lt;/span&gt; ThumbImage &lt;span class="kwrd"&gt;SET&lt;/span&gt; ContentId=@NewID &lt;span class="kwrd"&gt;WHERE&lt;/span&gt; Video_id=@VideoID
   
    &lt;span class="kwrd"&gt;FETCH&lt;/span&gt; &lt;span class="kwrd"&gt;NEXT&lt;/span&gt; &lt;span class="kwrd"&gt;FROM&lt;/span&gt; VideoCursor
    &lt;span class="kwrd"&gt;INTO&lt;/span&gt; @VideoID
END&lt;/pre&gt;
  &lt;style type="text/css"&gt;&lt;![CDATA[
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }]]&gt;&lt;/style&gt;&lt;/blockquote&gt;

&lt;p&gt;The script was using a cursor!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Every time you want to use a cursor, you must fast for three days while reading the memoirs of Edgar F. Codd.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Cursors are evil!&lt;/p&gt;

&lt;p&gt;Let us see how we can make this work using set based logic, shall we?&lt;/p&gt;

&lt;blockquote&gt;
  &lt;pre class="csharpcode"&gt;&lt;span class="rem"&gt;-- Generate one new id per video and stage the rows in a temp table&lt;/span&gt;
&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; newid() &lt;span class="kwrd"&gt;as&lt;/span&gt; NewId, Video.Id &lt;span class="kwrd"&gt;as&lt;/span&gt; OldId, ContentItem.Body, Video.FormatBody, Video.Plain
&lt;span class="kwrd"&gt;INTO&lt;/span&gt; #TempContent
&lt;span class="kwrd"&gt;FROM&lt;/span&gt; ContentItem
&lt;span class="kwrd"&gt;INNER&lt;/span&gt; &lt;span class="kwrd"&gt;JOIN&lt;/span&gt; Video
&lt;span class="kwrd"&gt;ON&lt;/span&gt; Video.Id=ContentItem.ID

&lt;span class="rem"&gt;-- Copy the staged rows into Content in a single statement&lt;/span&gt;
INSERT &lt;span class="kwrd"&gt;INTO&lt;/span&gt; Content(ID, Body, FormatBody, Plain)
&lt;span class="kwrd"&gt;SELECT&lt;/span&gt; NewId, Body, FormatBody, Plain
&lt;span class="kwrd"&gt;FROM&lt;/span&gt; #TempContent

&lt;span class="kwrd"&gt;UPDATE&lt;/span&gt; Video 
&lt;span class="kwrd"&gt;SET&lt;/span&gt; ContentId=NewId 
&lt;span class="kwrd"&gt;FROM&lt;/span&gt; #TempContent
&lt;span class="kwrd"&gt;WHERE&lt;/span&gt; Video.Id=OldId

&lt;span class="kwrd"&gt;UPDATE&lt;/span&gt; ThumbImage 
&lt;span class="kwrd"&gt;SET&lt;/span&gt; ContentId=NewId 
&lt;span class="kwrd"&gt;FROM&lt;/span&gt; #TempContent
&lt;span class="kwrd"&gt;WHERE&lt;/span&gt; ThumbImage.Video_id=OldId

&lt;span class="kwrd"&gt;DROP&lt;/span&gt; &lt;span class="kwrd"&gt;TABLE&lt;/span&gt; #TempContent&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;I can &lt;em&gt;assure&lt;/em&gt; you that this will work faster, read better, get parallelized by the database and in general be better behaved than the previous version.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11185.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2009/10/30/you-see-that-database-off-with-his-head.aspx</guid>
            <pubDate>Fri, 30 Oct 2009 10:00:00 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2009/10/30/you-see-that-database-off-with-his-head.aspx#feedback</comments>
            <slash:comments>29</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11185.aspx</wfw:commentRss>
        </item>
        <item>
            <title>JAOO: More on Evolving the Key/Value Programming Model to a Higher Level from Billy Newport</title>
            <link>http://ayende.com/Blog/archive/2009/10/20/jaoo-more-on-evolving-the-keyvalue-programming-model-to-a.aspx</link>
            <description>&lt;p&gt;As I already mentioned, this presentation had me thinking. Billy presented a system called Redis, a Key/Value store intended for attribute based storage.&lt;/p&gt;  &lt;p&gt;That means that something like User { Id = 123, Name = “billy”, Email = “billy@example.org” } is actually stored as:&lt;/p&gt;  &lt;blockquote&gt;   &lt;div&gt;     &lt;pre style="border-bottom-style: none; padding-bottom: 0px; line-height: 12pt; border-right-style: none; background-color: #f4f4f4; margin: 0em; padding-left: 0px; width: 100%; padding-right: 0px; font-family: consolas, 'Courier New', courier, monospace; border-top-style: none; color: black; font-size: 8pt; border-left-style: none; overflow: visible; padding-top: 0px"&gt;{ &lt;span style="color: #006080"&gt;"uid:123:name"&lt;/span&gt;: &lt;span style="color: #006080"&gt;"billy"&lt;/span&gt; } 
{ &lt;span style="color: #006080"&gt;"uid:123:email"&lt;/span&gt;: &lt;span style="color: #006080"&gt;"billy@example.org"&lt;/span&gt; }
{ &lt;span style="color: #006080"&gt;"uname:billy"&lt;/span&gt;: &lt;span style="color: #006080"&gt;"123"&lt;/span&gt; } &lt;/pre&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;

&lt;p&gt;Each of those lines represents a different key/value pair in the Redis store. According to Billy, this has a lot of implications. On the advantage side, you get no schema and very easy support for just adding stuff as you go along. On the other hand, Redis does not support transactions and it is easy to “corrupt” the database during development (usually as a result of a programming bug).&lt;/p&gt;

&lt;p&gt;What actually bothered me the most was the implication for the number of remote calls that are being made. The problem shows itself very well in this code sample (a twitter clone), which shows inserting a new twit:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;div&gt;
    &lt;pre style="border-bottom-style: none; padding-bottom: 0px; line-height: 12pt; border-right-style: none; background-color: #f4f4f4; margin: 0em; padding-left: 0px; width: 100%; padding-right: 0px; font-family: consolas, 'Courier New', courier, monospace; border-top-style: none; color: black; font-size: 8pt; border-left-style: none; overflow: visible; padding-top: 0px"&gt;&lt;span style="color: #0000ff"&gt;long&lt;/span&gt; postid = R.str_long.incr(&lt;span style="color: #006080"&gt;"nextPostId"&lt;/span&gt;); 
&lt;span style="color: #0000ff"&gt;long&lt;/span&gt; userId = PageUtils.getUserID(request); 
&lt;span style="color: #0000ff"&gt;long&lt;/span&gt; time = System.currentTimeMillis(); 
String post = Long.toString(userId)+&lt;span style="color: #006080"&gt;"|"&lt;/span&gt; + Long.toString(time)+&lt;span style="color: #006080"&gt;"|"&lt;/span&gt;+status; 
R.c_str_str.set(&lt;span style="color: #006080"&gt;"p:"&lt;/span&gt;+Long.toString(postid), post);
List&amp;lt;Long&amp;gt; followersList = R.str_long.smembers(Long.toString(userId)+&lt;span style="color: #006080"&gt;":followers"&lt;/span&gt;); 
&lt;span style="color: #0000ff"&gt;if&lt;/span&gt;(followersList == &lt;span style="color: #0000ff"&gt;null&lt;/span&gt;) 
   followersList = &lt;span style="color: #0000ff"&gt;new&lt;/span&gt; ArrayList&amp;lt;Long&amp;gt;(); 
HashSet&amp;lt;Long&amp;gt; followerSet = &lt;span style="color: #0000ff"&gt;new&lt;/span&gt; HashSet&amp;lt;Long&amp;gt;(followersList); 
followerSet.add(userId); 
&lt;span style="color: #0000ff"&gt;long&lt;/span&gt; replyId = PageUtils.isReply(status); 
&lt;span style="color: #0000ff"&gt;if&lt;/span&gt;(replyId != -1) 
   followerSet.add(replyId); 
&lt;span style="color: #008000"&gt;// one remote call per follower &lt;/span&gt;
&lt;span style="color: #0000ff"&gt;for&lt;/span&gt;(&lt;span style="color: #0000ff"&gt;long&lt;/span&gt; i : followerSet) 
    R.str_long.lpush(Long.toString(i)+&lt;span style="color: #006080"&gt;":posts"&lt;/span&gt;, postid); 
&lt;span style="color: #008000"&gt;// -1 uid is global timeline &lt;/span&gt;
String globalKey = Long.toString(-1)+&lt;span style="color: #006080"&gt;":posts"&lt;/span&gt;; 
R.str_long.lpush(globalKey,postid); 
R.str_long.ltrim(globalKey, 200);&lt;/pre&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;

&lt;p&gt;I &lt;em&gt;really&lt;/em&gt; don’t like the API, mostly because it reminds me of C, but the conventions are pretty easy to figure out. &lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;R is the static gateway into the Redis API&lt;/li&gt;

  &lt;li&gt;str_long = store long&lt;/li&gt;

  &lt;li&gt;c_str_str – store string and keep it in nearby cache&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem with this type of code is the number of remote calls and the &lt;em&gt;locality&lt;/em&gt; of those calls. With a typical sharded set of servers, you are going to have lots of calls going all &lt;em&gt;over&lt;/em&gt; the place. And when you get into people that have thousands and millions of followers, the application is simply going to die.&lt;/p&gt;

&lt;p&gt;A better solution is required. Billy suggested using async programming or sending code to the data store to execute there.&lt;/p&gt;

&lt;p&gt;I have a different variant on the solution.&lt;/p&gt;

&lt;p&gt;We will start from the assumption that we &lt;em&gt;really&lt;/em&gt; want to reduce remote calls, and that system performance in the face of a large number of writes (without impacting reads) is important. The benefit of using something like Redis is that it is very easy to get started, very easy to change things around and great for rapid development. We want to keep that for now, so I am going to focus on a solution based on the same premise.&lt;/p&gt;

&lt;p&gt;The first thing to go is the notion that a key can sit anywhere that it wants. In a key/value store, it is important to be able to control locality of reference. We change the key format so it is now: [server key]@[local key]. What does this mean? It means that for the previously mentioned user, this is the format it will be stored as:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;div&gt;
    &lt;pre style="border-bottom-style: none; padding-bottom: 0px; line-height: 12pt; border-right-style: none; background-color: #f4f4f4; margin: 0em; padding-left: 0px; width: 100%; padding-right: 0px; font-family: consolas, 'Courier New', courier, monospace; border-top-style: none; color: black; font-size: 8pt; border-left-style: none; overflow: visible; padding-top: 0px"&gt;{ &lt;span style="color: #006080"&gt;"uid:123@name"&lt;/span&gt;: &lt;span style="color: #006080"&gt;"billy"&lt;/span&gt; } 
{ &lt;span style="color: #006080"&gt;"uid:123@email"&lt;/span&gt;: &lt;span style="color: #006080"&gt;"billy@example.org"&lt;/span&gt; }
{ &lt;span style="color: #006080"&gt;"uname@billy"&lt;/span&gt;: &lt;span style="color: #006080"&gt;"123"&lt;/span&gt; } &lt;/pre&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;

&lt;p&gt;We use the first part of the key (before the @) to find the appropriate server. This means that everything with a prefix of “uid:123” is known to reside on the same server. This allows you to do things like transactions on a single operation of setting multiple keys.&lt;/p&gt;
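
&lt;p&gt;A minimal sketch of that routing rule, in Python for brevity. The server names, the CRC32 hash and the split_key / server_for helpers are assumptions made for illustration; none of them are part of Redis.&lt;/p&gt;

```python
import zlib

# Hypothetical shard list; a real deployment would load this from configuration.
SERVERS = ["redis-a", "redis-b", "redis-c"]


def split_key(key):
    """Split '[server key]@[local key]' into its two parts."""
    server_key, sep, local_key = key.partition("@")
    if sep == "" or server_key == "" or local_key == "":
        raise ValueError("expected '[server key]@[local key]', got %r" % key)
    return server_key, local_key


def server_for(key, servers=SERVERS):
    """Pick a server by hashing only the part before the '@'."""
    server_key, _ = split_key(key)
    # CRC32 is stable across processes, unlike Python's built-in hash().
    return servers[zlib.crc32(server_key.encode("utf-8")) % len(servers)]


# Both attributes of user 123 share the "uid:123" server key,
# so they are routed to the same server.
assert server_for("uid:123@name") == server_for("uid:123@email")
```

&lt;p&gt;Hashing only the server key is the whole point: the local key can vary freely without moving the data to another server.&lt;/p&gt;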

&lt;p&gt;Once we have that, we can start adding to the API. Instead of getting a single key at a time, we can get a set of values in one remote call. That has the potential of significantly reducing the number of remote calls we will make.&lt;/p&gt;
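
&lt;p&gt;Such a batched read could look roughly like this. It is a sketch under stated assumptions: FakeServer is an in-memory stand-in that merely counts calls, and multi_get and server_index are hypothetical names, not an existing Redis client API.&lt;/p&gt;

```python
import zlib
from collections import defaultdict


class FakeServer:
    """In-memory stand-in for one store node; counts remote calls."""

    def __init__(self):
        self.data = {}
        self.calls = 0

    def get_many(self, keys):
        self.calls += 1  # one round trip, however many keys are requested
        return {k: self.data.get(k) for k in keys}


def server_index(key, count):
    # Route on the part before '@' so related keys share a server.
    server_key = key.partition("@")[0]
    return zlib.crc32(server_key.encode("utf-8")) % count


def multi_get(servers, keys):
    """Fetch all keys using at most one call per server involved."""
    by_server = defaultdict(list)
    for key in keys:
        by_server[server_index(key, len(servers))].append(key)
    result = {}
    for index, group in by_server.items():
        result.update(servers[index].get_many(group))
    return result


servers = [FakeServer() for _ in range(3)]
for key, value in [("uid:123@name", "billy"), ("uid:123@email", "billy@example.org")]:
    servers[server_index(key, 3)].data[key] = value

user = multi_get(servers, ["uid:123@name", "uid:123@email"])
assert user["uid:123@name"] == "billy"
# Both keys share the "uid:123" prefix, so a single remote call is enough.
assert sum(s.calls for s in servers) == 1
```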

&lt;p&gt;Next, we need to consider repeated operations. By that I mean anything where we have a loop in which we call the store. That is a &lt;em&gt;killer &lt;/em&gt;when you are talking about any data of significant size. We need to find a good solution for this.&lt;/p&gt;

&lt;p&gt;Billy suggested sending a JRuby script (or similar) to the server and executing it there, saving the network roundtrips. While this is certainly possible, I think it would be a mistake. I have a much simpler solution: teach the data store about repeated operations. Let us take as a good example the copying that we are doing of a new twit to all your followers. Instead of reading the entire list of followers into memory, and then writing the status to every single one of them, let us do something different:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;div&gt;
    &lt;pre style="border-bottom-style: none; padding-bottom: 0px; line-height: 12pt; border-right-style: none; background-color: #f4f4f4; margin: 0em; padding-left: 0px; width: 100%; padding-right: 0px; font-family: consolas, 'Courier New', courier, monospace; border-top-style: none; color: black; font-size: 8pt; border-left-style: none; overflow: visible; padding-top: 0px"&gt;Redis.PushToAllListFoundIn(&lt;span style="color: #006080"&gt;"uid:"&lt;/span&gt;+user_id+&lt;span style="color: #006080"&gt;"@followers"&lt;/span&gt;, status, &lt;span style="color: #006080"&gt;"{0}@posts"&lt;/span&gt;);&lt;/pre&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;

&lt;p&gt;I am using the .NET conventions here because otherwise I would go mad. As you can see, we instruct Redis to go to a particular list, and copy the status that we pass it to all the keys found in the list (after formatting the key with the pattern). This gives the data store enough information about this to be able to optimize this operation considerably.&lt;/p&gt;
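
&lt;p&gt;To make the intended semantics concrete, here is a rough in-memory model of such an operation, in Python for brevity. ListStore and push_to_all_lists_found_in are hypothetical names; no such Redis command exists. The sketch also assumes the followers list holds server keys such as “uid:7”, so that the formatted key lands next to that follower's other data.&lt;/p&gt;

```python
from collections import defaultdict


class ListStore:
    """In-memory stand-in for a store whose values are lists."""

    def __init__(self):
        self.lists = defaultdict(list)

    def lpush(self, key, value):
        self.lists[key].insert(0, value)

    def push_to_all_lists_found_in(self, source_key, value, key_pattern):
        """Push value onto every list named by the members of source_key.

        Runs inside the store, so the caller makes one call no matter
        how many members the source list has.
        """
        # Read via .get so a missing source list is not created as a side effect.
        members = self.lists.get(source_key, [])
        for member in list(members):
            self.lpush(key_pattern.format(member), value)
        return len(members)


store = ListStore()
store.lists["uid:123@followers"] = ["uid:7", "uid:9"]
pushed = store.push_to_all_lists_found_in("uid:123@followers", "hello world", "{0}@posts")
assert pushed == 2
assert store.lists["uid:7@posts"] == ["hello world"]
```

&lt;p&gt;In a sharded setup the store would still have to forward the push to whichever servers own the target lists, but that traffic stays between servers instead of going through the client.&lt;/p&gt;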

&lt;p&gt;With just these few changes, I think that you gain enormously, and you retain the very simple model of using a key/value store.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11163.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2009/10/20/jaoo-more-on-evolving-the-keyvalue-programming-model-to-a.aspx</guid>
            <pubDate>Tue, 20 Oct 2009 00:28:00 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2009/10/20/jaoo-more-on-evolving-the-keyvalue-programming-model-to-a.aspx#feedback</comments>
            <slash:comments>6</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11163.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Soft Deletes aren’t Append Only model</title>
            <link>http://ayende.com/Blog/archive/2009/09/06/soft-deletes-arenrsquot-append-only-model.aspx</link>
            <description>&lt;p&gt;There seems to be some confusion regarding my post about &lt;a href="http://ayende.com/Blog/archive/2009/08/30/avoid-soft-deletes.aspx"&gt;soft deletes&lt;/a&gt;; in particular, &lt;a href="http://bigjimindc.blogspot.com/2009/08/temporal-database-design-soft-deletes.html"&gt;people brought up the idea of append only models&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;I had the chance to work on both types of systems, and I can tell you that I would much rather work with an append only model than with soft deletes. The append only model means that you can only ever insert, never delete or update.&lt;/p&gt;  &lt;p&gt;Think about the way your bank account works. If I had the clerk transfer money from one account to another, and he made a typo and sent ten times the amount that I wanted, the bank will not “delete” the transaction. What will happen is that there will be a separate transaction, canceling the first one. &lt;/p&gt;  &lt;p&gt;There are several reasons for going with this approach, which Jim has brought up in &lt;a href="http://bigjimindc.blogspot.com/2009/08/temporal-database-design-soft-deletes.html"&gt;his post&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;ul&gt;     &lt;li&gt;automatic audit logging, since nothing is ever UPDATE'd or DELETE'd, you've got a constant trail of changes&lt;/li&gt;      &lt;li&gt;automatic support for infinite undo/roll-back support of data, as you simply load a prior version and then save as usual&lt;/li&gt;      &lt;li&gt;automatic support for labeling of versions, much like in source/version control systems, at an individual record level, table level, "aggregate root level", or database level&lt;/li&gt;      &lt;li&gt;automatic support for "back querying" a system, in search of what the situation looked like last month, last year, etc. 
(though raising this "aspect", as in AOP, to the ORM level would be crucial)&lt;/li&gt;   &lt;/ul&gt; &lt;/blockquote&gt;  &lt;p&gt;As I said, this makes things much simpler from a lot of aspects. It does mean that you have a more complex data model (because all associations are now using: Id + max(version) ), but that is manageable.&lt;/p&gt;  &lt;p&gt;But, as I said, there is a distinct difference between that and soft deletes. Soft deletes, as I refer to them, pertain to IsDeleted columns that perform a logical deletion in the database. I don’t really like those, and I explained my reasoning in my previous post.&lt;/p&gt;  &lt;p&gt;Append Only models introduce some complexity with regards to managing things, but in general, they force you to think in a very different fashion than CUD models. For one thing, you are almost always going to have a different reporting model, instead of trying to query the append only model directly (which gets to be complicated).&lt;/p&gt;  &lt;p&gt;There is one thing that I want to emphasize: using an Append Only model &lt;em&gt;should&lt;/em&gt; be reflected in your API. Trying to abstract that away is going to lead to a &lt;em&gt;world&lt;/em&gt; of pain.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/11083.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2009/09/06/soft-deletes-arenrsquot-append-only-model.aspx</guid>
            <pubDate>Sun, 06 Sep 2009 07:50:00 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2009/09/06/soft-deletes-arenrsquot-append-only-model.aspx#feedback</comments>
            <slash:comments>14</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/11083.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Buying the pot as a way of winning the database wars?</title>
            <link>http://ayende.com/Blog/archive/2009/04/20/buying-the-pot-as-a-way-of-winning-the-database.aspx</link>
            <description>&lt;p&gt;The news is just out, &lt;a href="http://www.chron.com/disp/story.mpl/ap/top/all/6382172.html"&gt;Oracle is buying Sun&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;This is especially interesting since Oracle has previously bought InnoDB (a key component for MySQL in the enterprise) and Sun has bought MySQL. This means that, off the top of my head, Oracle is now the owner of the following database products:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Oracle DB&lt;/li&gt;    &lt;li&gt;Berkeley DB&lt;/li&gt;    &lt;li&gt;MySQL &amp;amp; InnoDB&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;I am pretty sure that they have others, but even so, that is quite a respectable list, I should think.&lt;/p&gt;  &lt;p&gt;Of course, with Sun, Oracle is also getting Java…&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/10885.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2009/04/20/buying-the-pot-as-a-way-of-winning-the-database.aspx</guid>
            <pubDate>Mon, 20 Apr 2009 14:00:03 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2009/04/20/buying-the-pot-as-a-way-of-winning-the-database.aspx#feedback</comments>
            <slash:comments>16</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/10885.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Repository is the new Singleton</title>
            <link>http://ayende.com/Blog/archive/2009/04/17/repository-is-the-new-singleton.aspx</link>
            <description>&lt;p&gt;I mentioned in passing that I don’t much like the Repository pattern anymore, and got a lot of responses to that. This is the answering post, and yes, the title was chosen to get a rise out of you.&lt;/p&gt;  &lt;p&gt;There are actually two separate issues that need to be handled here. One of them is my issue with the actual pattern and the second is the pattern &lt;em&gt;usage&lt;/em&gt;. The most commonly used definition of Repository is the one in &lt;a href="http://martinfowler.com/eaaCatalog/repository.html"&gt;Patterns of Enterprise Application Architecture&lt;/a&gt;:&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;A system with a complex domain model often benefits from a layer, such as the one provided by Data Mapper, that isolates domain objects from details of the database access code. In such systems it can be worthwhile to build another layer of abstraction over the mapping layer where query construction code is concentrated. This becomes more important when there are a large number of domain classes or heavy querying. In these cases particularly, adding this layer helps minimize duplicate query logic.&lt;/p&gt;    &lt;p&gt;A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. Client objects construct query specifications declaratively and submit them to Repository for satisfaction. Objects can be added to and removed from the Repository, as they can from a simple collection of objects, and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes. Conceptually, a Repository encapsulates the set of objects persisted in a data store and the operations performed over them, providing a more object-oriented view of the persistence layer. 
Repository also supports the objective of achieving a clean separation and one-way dependency between the domain and data mapping layers.&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;Note that while the actual pattern descriptions in PoEAA and DDD are very nearly identical, the actual &lt;a href="http://domaindrivendesign.org/discussion/messageboardarchive/Repositories.html"&gt;reasoning&lt;/a&gt; behind them is different, and the DDD repository is limited to aggregate roots only.&lt;/p&gt;  &lt;p&gt;So, what is the problem with that?&lt;/p&gt;  &lt;p&gt;The problem with this pattern is that it totally ignores the existence of mature persistence technologies, such as NHibernate. NHibernate already provides an illusion of in memory access; in fact, that is its sole reason for existing. Declarative queries, check. OO view on the persistence store, check. One way dependency between the domain and the data store, check.&lt;/p&gt;  &lt;p&gt;So, what do I gain by using the repository pattern when I already have NHibernate (or similar, most OR/M have matching capabilities by now)?&lt;/p&gt;  &lt;p&gt;Not much, really, except an additional abstraction. More than that, the details of persistence storage are:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Complex&lt;/li&gt;    &lt;li&gt;Context sensitive&lt;/li&gt;    &lt;li&gt;Important&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Trying to hide that behind a repository interface usually leads us to a repository that has methods like:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;FindCustomer(id)&lt;/li&gt;    &lt;li&gt;FindCustomerWithAddresses(id)&lt;/li&gt;    &lt;li&gt;FindCustomerWith..&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;It gets worse when you have complex search criteria &lt;em&gt;and&lt;/em&gt; a complex fetch plan. Then you are stuck either creating a method for each combination that you use or generalizing that. 
Generalizing that only means that you now have an additional abstraction that usually maps pretty closely to the persistent storage that you use.&lt;/p&gt;  &lt;p&gt;From my perspective, that is additional code that doesn’t have to be written.&lt;/p&gt;  &lt;p&gt;Wait, I can hear you say, but repositories encapsulate queries, and removing query logic duplication is one of the reasons for them in the first place.&lt;/p&gt;  &lt;p&gt;Well, yes, but encapsulation of queries should not be done in the repository. Queries are &lt;em&gt;complex, &lt;/em&gt;and you want to encapsulate them in their own object. In most cases, I have something like this:&lt;/p&gt;  &lt;p&gt;&lt;a href="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/RepositoryisthenewSingleton_10C1A/image_6.png"&gt;&lt;img title="image" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="205" alt="image" src="http://ayende.com/Blog/images/ayende_com/Blog/WindowsLiveWriter/RepositoryisthenewSingleton_10C1A/image_thumb_2.png" width="259" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;GetQuery takes an ISession and returns an ICriteria, which means that my code gets the chance to set paging, ordering, fetching strategies, etc. That is not the responsibility of the query object, and trying to hide it only adds an additional abstraction that doesn’t actually &lt;em&gt;give&lt;/em&gt; me anything.&lt;/p&gt;  &lt;p&gt;I mentioned that I have two problems with the repository pattern, the second being the way it is being used.&lt;/p&gt;  &lt;p&gt;Quite frankly, and here I fully share the blame, the Repository pattern is popular. A lot of people use it, mostly because of the DDD association. 
I am currently of the opinion that DDD should be approached with caution, since if you don’t actually need it (or lack the prerequisites for it, such as a business expert to work closely with or an app that can actually benefit from it), it is probably going to be more painful to use DDD than to go without.&lt;/p&gt;  &lt;p&gt;More than that, the &lt;em&gt;way&lt;/em&gt; that most people use a Repository more closely follows the DAO pattern, not the Repository pattern. But Repository sounds cooler, so they call it that.&lt;/p&gt;  &lt;p&gt;My current approach to data access is:&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;When using a database, use NHibernate’s ISession directly&lt;/li&gt;    &lt;li&gt;Encapsulate complex queries into query objects that construct an ICriteria query that I can get and manipulate further&lt;/li&gt;    &lt;li&gt;When using something other than a database, create a DAO for that, respecting the underlying storage implementation &lt;/li&gt;    &lt;li&gt;Don’t try to &lt;a href="http://davybrion.com/blog/2009/04/educate-developers-instead-of-protecting-them/"&gt;protect developers&lt;/a&gt;&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Let us see how many calls for my lynching we get now…&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/10877.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2009/04/17/repository-is-the-new-singleton.aspx</guid>
            <pubDate>Fri, 17 Apr 2009 16:03:57 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2009/04/17/repository-is-the-new-singleton.aspx#feedback</comments>
            <slash:comments>76</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/10877.aspx</wfw:commentRss>
        </item>
        <item>
            <title>Implementing a document database: simple queries</title>
            <link>http://ayende.com/Blog/archive/2009/03/25/implementing-a-document-database-simple-queries.aspx</link>
            <description>&lt;p&gt;And now this passes:&lt;/p&gt;  &lt;blockquote&gt;   &lt;div&gt;     &lt;pre style="padding-right: 0px; padding-left: 0px; font-size: 8pt; padding-bottom: 0px; margin: 0em; overflow: visible; width: 100%; color: black; border-top-style: none; line-height: 12pt; padding-top: 0px; font-family: consolas, 'Courier New', courier, monospace; border-right-style: none; border-left-style: none; background-color: #f4f4f4; border-bottom-style: none"&gt;&lt;span style="color: #0000ff"&gt;public&lt;/span&gt; &lt;span style="color: #0000ff"&gt;class&lt;/span&gt; PerformingQueries
{
    &lt;span style="color: #0000ff"&gt;const&lt;/span&gt; &lt;span style="color: #0000ff"&gt;string&lt;/span&gt; query = &lt;span style="color: #006080"&gt;@"
var pagesByTitle = 
from doc in docs
where doc.type == "&lt;/span&gt;&lt;span style="color: #006080"&gt;"page"&lt;/span&gt;&lt;span style="color: #006080"&gt;"
select new { Key = doc.title, Value = doc.content, Size = (int)doc.size };
"&lt;/span&gt;;

    [Fact]
    &lt;span style="color: #0000ff"&gt;public&lt;/span&gt; &lt;span style="color: #0000ff"&gt;void&lt;/span&gt; Can_query_json()
    {
        &lt;span style="color: #0000ff"&gt;var&lt;/span&gt; serializer = &lt;span style="color: #0000ff"&gt;new&lt;/span&gt; JsonSerializer();
        &lt;span style="color: #0000ff"&gt;var&lt;/span&gt; docs = (JArray)serializer.Deserialize(
                &lt;span style="color: #0000ff"&gt;new&lt;/span&gt; JsonTextReader(
                    &lt;span style="color: #0000ff"&gt;new&lt;/span&gt; StringReader(
                        &lt;span style="color: #006080"&gt;@"[
{'type':'page', title: 'hello', content: 'foobar', size: 2},
{'type':'page', title: 'there', content: 'foobar 2', size: 3},
{'type':'revision', size: 4}
]"&lt;/span&gt;)));
        &lt;span style="color: #0000ff"&gt;var&lt;/span&gt; compiled = &lt;span style="color: #0000ff"&gt;new&lt;/span&gt; LinqTransformer(query, &lt;span style="color: #006080"&gt;"docs"&lt;/span&gt;, &lt;span style="color: #0000ff"&gt;typeof&lt;/span&gt;(JsonDynamicObject)).Compile();
        &lt;span style="color: #0000ff"&gt;var&lt;/span&gt; compiledQuery = (AbstractViewGenerator&amp;lt;JsonDynamicObject&amp;gt;)Activator.CreateInstance(compiled);
        &lt;span style="color: #0000ff"&gt;var&lt;/span&gt; actual = compiledQuery.Execute(docs.Select(x =&amp;gt; &lt;span style="color: #0000ff"&gt;new&lt;/span&gt; JsonDynamicObject(x)))
            .Cast&amp;lt;&lt;span style="color: #0000ff"&gt;object&lt;/span&gt;&amp;gt;().ToArray();
        &lt;span style="color: #0000ff"&gt;var&lt;/span&gt; expected = &lt;span style="color: #0000ff"&gt;new&lt;/span&gt;[]
        {
            &lt;span style="color: #006080"&gt;"{ Key = hello, Value = foobar, Size = 2 }"&lt;/span&gt;,
            &lt;span style="color: #006080"&gt;"{ Key = there, Value = foobar 2, Size = 3 }"&lt;/span&gt;
        };

        Assert.Equal(expected.Length, actual.Length);
        &lt;span style="color: #0000ff"&gt;for&lt;/span&gt; (&lt;span style="color: #0000ff"&gt;var&lt;/span&gt; i = 0; i &amp;lt; expected.Length; i++)
        {
            Assert.Equal(expected[i], actual[i].ToString());
        }
    }
}&lt;/pre&gt;
  &lt;/div&gt;
&lt;/blockquote&gt;

&lt;p&gt;You wouldn’t &lt;em&gt;believe &lt;/em&gt;how much effort it took, and all in all, implementing this is about 500 lines of code or so.&lt;/p&gt;&lt;img src="http://ayende.com/Blog/aggbug/10838.aspx" width="1" height="1" /&gt;</description>
            <dc:creator>Ayende Rahien</dc:creator>
            <guid>http://ayende.com/Blog/archive/2009/03/25/implementing-a-document-database-simple-queries.aspx</guid>
            <pubDate>Wed, 25 Mar 2009 03:57:27 GMT</pubDate>
            <comments>http://ayende.com/Blog/archive/2009/03/25/implementing-a-document-database-simple-queries.aspx#feedback</comments>
            <slash:comments>12</slash:comments>
            <wfw:commentRss>http://ayende.com/Blog/comments/commentRss/10838.aspx</wfw:commentRss>
        </item>
    </channel>
</rss>