Ayende @ Rahien

Ayende @ Rahienhttp://ayende.comAyende @ RahienCopyright (C) Ayende Rahien 2004 - 2021 (c) 202660evereq commented on Database assisted denormalization@Joe "that SQL statement with the grouping is not 'too complex'. People learn that on the first day in a SQL class and its not database specific as long as you use ANSI sql which is what they all can understand." I don't tell "too complex to understand :D" (I see before queries 10 pages long and understand them very well...)... And No, it IS TOO complex to SCALE ones you start using partitioning for example :) and will have 100s millions of records in DB (think how your grouping will work in that case)! Or if you will need to federate some tables from another storage? etc. About "database specific" - I mean approaches with Materialized Views etc, not just ANSI sql features sure thing :) Try for example to do it in MySQL your hands (because of no build-in support for Materialized Views in MySQL for now!) http://ayende.com/4601/database-assisted-denormalization#comment25http://ayende.com/4601/database-assisted-denormalization#comment25Thu, 02 Sep 2010 10:25:15 GMTAjai Shankar commented on Database assisted denormalizationOk, here is my take on this - any aggregation (map/reduce) you do in a document database, there is nothing preventing you from doing the same in a relational world. In our website, we analyze prospect behavior and there might be decisions made on the the number of vsiits etc, so we build separate tables that aggregates the #times somebody has visited, #times they submitted a page, saw terms & conditions etc. Obviously these are had to compute real time even with proper indexes. Good example is in the home page you want to show not the #posts, but the #posts per month, or per category... Also not for such simple aggregations, but data warehouse solutions like analysis services play a huge role in pre-computing these aggregations by different dimensions and allow to drill down extremely fast... Ultimately all it matters is the right tool for the job :-) Ajai http://ayende.com/4601/database-assisted-denormalization#comment24http://ayende.com/4601/database-assisted-denormalization#comment24Fri, 27 Aug 2010 18:45:07 GMTFrans Bouma commented on Database assisted denormalization> This had the exact same behavior as the first query (13 seconds and the suggestion for adding the index). That's because the SQL of the view is embedded in the query at the spot where you name the view, so it's not using some special shortcut, because it's a view. I also agree with others here, that NOEXPAND should be used to trigger the usage of the index. http://ayende.com/4601/database-assisted-denormalization#comment23http://ayende.com/4601/database-assisted-denormalization#comment23Tue, 24 Aug 2010 08:30:02 GMTSet commented on Database assisted denormalizationmeisinger speaks the truth. Don't think instead of SQL Server, never do. And for missing index hints, these features already exist in SQL Server 2005 but less direct ( DMVs ) or via the dashboard. http://ayende.com/4601/database-assisted-denormalization#comment22http://ayende.com/4601/database-assisted-denormalization#comment22Tue, 24 Aug 2010 05:59:53 GMTTapio Kulmala commented on Database assisted denormalizationThe same doc also says : "Only Enterprise and Developer editions support automatic query-to-indexed-view matching. Reference the indexed view by name and include the NOEXPAND hint to have the query processor use the indexed view in all other editions." http://ayende.com/4601/database-assisted-denormalization#comment21http://ayende.com/4601/database-assisted-denormalization#comment21Tue, 24 Aug 2010 04:23:01 GMTPure Krome commented on Database assisted denormalization@Ayende - I use indexed views a bit (i don' t have many, but I use them where I can) because they are abstracting a lot of scehmas into some really useful forms. ANYWAYS - a massive trap for young players (It's a Trap!) is that indexed views do NOT use the index (WTF?) on SQL SERVER EXPRESS or STANDARD editions without manually specifying WITH (NOEXPAND) eg. select Id, Title, Subtitle, PostCount from BlogsWithPostCount WITH (NOEXPAND) where Id = 365819 It's one of the 'bonuses' when u pay for an enterprise, plus plus plus edition of SQL SERVER. Try that query now - 0 secs again? Proof: "You must use NOEXPAND in all versions of SQL Server other than Developer and Enterprise editions to have SQL Server process a query against an indexed view directly. " MSDN docs: [http://msdn.microsoft.com/en-us/library/dd171921](http://msdn.microsoft.com/en-us/library/dd171921)(SQL.100).aspx IT'S A TRAP! http://ayende.com/4601/database-assisted-denormalization#comment20http://ayende.com/4601/database-assisted-denormalization#comment20Tue, 24 Aug 2010 00:24:36 GMTmattmc3 commented on Database assisted denormalization@meisinger - Agreed. Especially since this post doesn't detail the actual performance benchmarks of the original query, or whether simpler strategies like verifying/adding indexes were tried. Premature performance optimization is the root of all kinds of evil. http://ayende.com/4601/database-assisted-denormalization#comment19http://ayende.com/4601/database-assisted-denormalization#comment19Mon, 23 Aug 2010 21:07:25 GMTmeisinger commented on Database assisted denormalizationmanaging the count yourself will always lead you into a corner when considering simple inserts, updated, and deletes managing the count yourself is fine when you get into bulk operations where some posts will be inserted and some will be deleted... complexity arises then take into considerations where/when something fails... i would much rather take the hit for the aggregate, index the tables properly and give myself a chance to enhance or change the criteria for "count" than attempting to manage it myself through some column where no body knows what "count" means just my two cents http://ayende.com/4601/database-assisted-denormalization#comment18http://ayende.com/4601/database-assisted-denormalization#comment18Mon, 23 Aug 2010 19:53:38 GMTTapio Kulmala commented on Database assisted denormalization... and add the postcount into IDX_BlogsWithPostCount http://ayende.com/4601/database-assisted-denormalization#comment17http://ayende.com/4601/database-assisted-denormalization#comment17Mon, 23 Aug 2010 19:05:54 GMTTapio Kulmala commented on Database assisted denormalizationWhen you tested this, did you execute dbcc freeproccache and dbcc dropcleanbuffers before each select? http://ayende.com/4601/database-assisted-denormalization#comment16http://ayende.com/4601/database-assisted-denormalization#comment16Mon, 23 Aug 2010 18:54:16 GMTtobi commented on Database assisted denormalizationUnfortunately many people do not know how to get extreme performance from sql server. There is a handful of tricks that seemingly nobody used but that are absolute killers. Materialized views and covering nonclustered indexes (which are equivalent to a second _clustered_ index!) are the most underused features in my opinion. http://ayende.com/4601/database-assisted-denormalization#comment15http://ayende.com/4601/database-assisted-denormalization#comment15Mon, 23 Aug 2010 15:11:45 GMTJoe commented on Database assisted denormalizationIts rudimentary SQL to have an index on a table with 20 million rows. I don't understand why you would be trying to do it any other way than have that index. If you're using a SQL database then it should be used as recommended. evereq, that SQL statement with the grouping is not 'too complex'. People learn that on the first day in a SQL class and its not database specific as long as you use ANSI sql which is what they all can understand. http://ayende.com/4601/database-assisted-denormalization#comment14http://ayende.com/4601/database-assisted-denormalization#comment14Mon, 23 Aug 2010 15:01:00 GMTjames commented on Database assisted denormalizationlike tobi said, use the with(noexpand) hint. the results of the entire view (including the aggregate) was persisted when you created a clustered index. however, by default, the query optimizer will expand the view first, and if the estimated plan it finds falls withing a certain threshold, it will just use it (to save time in the optimizer). by using the "noexpand" hint, you tell the optimizer to skip expanding the query and just use the clustered index. http://ayende.com/4601/database-assisted-denormalization#comment13http://ayende.com/4601/database-assisted-denormalization#comment13Mon, 23 Aug 2010 14:22:27 GMTAndrew Borodin commented on Database assisted denormalizationNonclustered index is perfect for calculating such aggregates. You do not need to read actual data, just to count rows. Nonclustered indexes are even faster then clustered for this. select dbo.Blogs.Id, dbo.Blogs.Title, dbo.Blogs.Subtitle, (select COUNT(*) from Posts where Posts.BlogId = Blogs.Id AND Posts.Something is not null) as PostCount from dbo.Blogs I suppose, this query will be much more slower, because you actually need to read all posts many times (due to clustered vs nonclustered index fragmentation) http://ayende.com/4601/database-assisted-denormalization#comment12http://ayende.com/4601/database-assisted-denormalization#comment12Mon, 23 Aug 2010 13:03:18 GMTAriel commented on Database assisted denormalizationGood post. Thanks. http://ayende.com/4601/database-assisted-denormalization#comment11http://ayende.com/4601/database-assisted-denormalization#comment11Mon, 23 Aug 2010 12:02:05 GMTmattmc3 commented on Database assisted denormalizationIn my experience, COUNT(*) queries on a properly index field are not too expensive. I suspect that you over-thought this and made this more complicated than it needed to be. Put a NONCLUSTERED index on the BlogID field in your Posts table, and perhaps compute statistics as well. Then, let us know the performance results of your original query: select dbo.Blogs.Id, dbo.Blogs.Title, dbo.Blogs.Subtitle, (select COUNT(*) from Posts where Posts.BlogId = Blogs.Id) as PostCount from dbo.Blogs http://ayende.com/4601/database-assisted-denormalization#comment10http://ayende.com/4601/database-assisted-denormalization#comment10Mon, 23 Aug 2010 11:37:36 GMTscooletz commented on Database assisted denormalizationIf you considered a query with a few more queries, I'd propose saving a denormalized view of your entity (with all the counts, calculated during saving) in another table, each time, you save your basic entity. Then, for query purposes, use the second table. http://ayende.com/4601/database-assisted-denormalization#comment9http://ayende.com/4601/database-assisted-denormalization#comment9Mon, 23 Aug 2010 10:44:07 GMTtobi commented on Database assisted denormalizationI have experimented with indexed views a lot. The automatic matching is very fragile and uncapable If you add the with (noexpand) hint your query will fly. I found the following limitations on automatic indexed view maching: Max 2 joins, only simple predicates with rules not fully known to me. Now we have another limitation. Very annoying because those views are so damn powerful. I hope we get stacked indexed views in the next release which are going to rock big time. http://ayende.com/4601/database-assisted-denormalization#comment8http://ayende.com/4601/database-assisted-denormalization#comment8Mon, 23 Aug 2010 10:39:52 GMTevereq commented on Database assisted denormalizationha! ;-) P.S. I just not like complex queries really, because of possible federation / partitioning and BAD support of complex queries in such cases (especially with indexing efficiently etc)... And because many projects after some time, probably will start using sharding / federation anyway, I really afraid how you will scale with selected approach in long run... Again we don't speak as I see about few blogs web site :) We speak more as I see in terms of some SaaS solution with whole a LOT of blogs (in your example > 1 milion) and > 20 mil posts that growing :D http://ayende.com/4601/database-assisted-denormalization#comment7http://ayende.com/4601/database-assisted-denormalization#comment7Mon, 23 Aug 2010 10:07:43 GMTAyende Rahien commented on Database assisted denormalizationEvereq, Oh, okay, phew! I thought you were suggesting scanning the data in the cache to calculate it. http://ayende.com/4601/database-assisted-denormalization#comment6http://ayende.com/4601/database-assisted-denormalization#comment6Mon, 23 Aug 2010 09:59:19 GMTevereq commented on Database assisted denormalization@Ayende Rahien. What I mean is that because you have so big amount of blogs / posts you probably use cache, right? So probably you anyway spend some time to maintain vector cache for each blog posts, right (to display on blog page all latest posts for given blog)? If so, why not to maintain as well following structure in your cache hashtable: Key | Value (count of posts) blog_count_by_id_4594594954 | 54 blog_count_by_id_9695695950 | 21 And if you maintain it in cache, it will take you exactly 1ms (or even less) to get count for blogId = 4594594954 :) which will be 54 in our example :) Also note that Memcached support "atomic" increment, so you even not need to issue locks etc :) After working some time with high loaded / huge data in storage systems, I prefer to have all data like this in cache anyway, and go to database only for "backup" when cache become invalid for some reason... Sure this apply only when you build something "big" :) If you have less load (both storage size and amount of transactions per minute) probably I think it's best way to go with simple query and just carefully select indexes in database like you done in post... But think for a minute if you use federation? What will be with your query? etc... http://ayende.com/4601/database-assisted-denormalization#comment5http://ayende.com/4601/database-assisted-denormalization#comment5Mon, 23 Aug 2010 09:57:55 GMTMatt commented on Database assisted denormalizationYour other option is to implement a trigger on the Posts table that updates the blog table whenever a post changes. You could then adjust your ORM so the PostCount value on the Blogs table is a read-only field, and that would work. I'm not a huge fan of triggers personally as I belive they're a good way to "hide" logic, but I thought I'd throw it in there. http://ayende.com/4601/database-assisted-denormalization#comment4http://ayende.com/4601/database-assisted-denormalization#comment4Mon, 23 Aug 2010 09:37:57 GMTAyende Rahien commented on Database assisted denormalizationEvereq, Show me the count that count the quantity in the cache. Oh, and remember to add 1 ms overhead for every memcached call, see where you get there. http://ayende.com/4601/database-assisted-denormalization#comment3http://ayende.com/4601/database-assisted-denormalization#comment3Mon, 23 Aug 2010 09:36:02 GMTHoyaBaptiste commented on Database assisted denormalizationIs it safe to say that on similar big tables (high record count), there should be an index on foreign keys? SQL Server does not do this automatically. http://ayende.com/4601/database-assisted-denormalization#comment2http://ayende.com/4601/database-assisted-denormalization#comment2Mon, 23 Aug 2010 09:34:28 GMTevereq commented on Database assisted denormalization"I can always add a PostCount property to the Blogs table, but that would require me to manage that myself" - yep and this will be probably most simple, universal (will work with any database) and most fast (from performance point of view) solution in RDMBS ! And it's very easy to do this actually! I simply can't accept as solution any query you write in post, sorry, because they simply NOT scale (too many groupings, left joins), looks too complex (i.e. require whole a lot of experienced in DBA) and database specific (i.e. say in MySQL it will be too problematic to made something like this possible) etc... In the other hand, in most real life situations, you anyway have some kind of vector cache (say in Memcached) for all your blog posts, so it's not a big issue to calculate qty in cache directly and does not go to database at all if your cache valid :) If your cache for some reason become not valid (say you just restart cache or something else), than I don't see any issue to calculate qty of posts in database using simply count query from begging of the post because it will be not so frequently anyway. So my vote to move it to a bit higher level than database and / or just use most simple solution with denormalization :) Agree or? http://ayende.com/4601/database-assisted-denormalization#comment1http://ayende.com/4601/database-assisted-denormalization#comment1Mon, 23 Aug 2010 09:33:18 GMT