Ayende @ Rahien

Hi!
My name is Ayende Rahien
Founder of Hibernating Rhinos LTD and RavenDB.
You can reach me by phone or email:

ayende@ayende.com

+972 52-548-6969

, @ Q c

Posts: 5,949 | Comments: 44,548

filter by tags archive

Database assisted denormalization – Oracle edition


I decided to take a chance (installing Oracle is a big leap :-) ) and see how things match in Oracle.

I decided to run the following query:

SELECT deptno, 
       dname, 
       loc, 
       (SELECT COUNT(*) 
        FROM   emp 
        WHERE  emp.deptno = dept.deptno) AS empcount 
FROM   dept 
WHERE  deptno = 20 

Please note that I run in on a database that had (total) maybe a 100 records, so the results may be skewed.

image

Like in the SQL Server case, we need to create an index on the FK column. I did so, after which I got:

image

Then I dropped that index and create a simple view:

CREATE VIEW depswithempcount 
AS 
  SELECT deptno, 
         dname, 
         loc, 
         (SELECT COUNT(*) 
          FROM   emp 
          WHERE  emp.deptno = dept.deptno) AS empcount 
  FROM   dept 

Querying on top of that gives me the same query plan as before. Trying to create a materialized view out of this fails, because of the subquery expression, I’ll have to express the view in terms of joins, instead. Like this:

SELECT dept.deptno, 
       dname, 
       loc, 
       COUNT(*) empcount 
FROM   dept 
       LEFT JOIN emp 
         ON dept.deptno = emp.deptno 
WHERE  dept.deptno = 20 
GROUP  BY dept.deptno, 
          dname, 
          loc 

Interestingly enough, this is a different query plan than the subquery, with SQL Server, those two query exhibit identical query plans.

image

Now, to turn that into an materialized view.

CREATE materialized VIEW deptwithempcount 
AS SELECT dept.deptno, 
          dname, 
          loc, 
          COUNT(*) empcount 
   FROM   dept 
          left join emp 
            ON dept.deptno = emp.deptno 
   GROUP  BY dept.deptno, 
             dname, 
             loc 

And querying on this gives us very interesting results:

select * from deptwithempcount 
where deptno = 20

image

Unlike SQL Server, we can see that Oracle is reading everything from the view. But let us try one more thing, before we conclude this with a victory.

update emp 
set deptno = 10
where deptno = 20;

select * from deptwithempcount 
where deptno = 20

But now, when we re-run the materialized view query, we see the results as they were at the creation of the view.

There appears to be a set of options to control that, but the one that I want (RERESH FAST), which update the view as soon as data changes will not work with this query, since it consider it too complex. I didn’t investigate too deeply, but it seems that this is another dead end.


Comments

Justin

Probably one of the biggest deficiencies in the various RDBMS offerings out there is their inability to materialize complex views.

This is a hard problem to solve but could be improved and fits very well into the relational model.

If you went with a eventually consistent model with a background view generation like some NOSQL products, the the complex view update could be deferred, but you could get stale results.

You should try a trigger based solution with MSSQL or Oracle at least as a comparison.

Not sure what the aversion to triggers is, they are ideal for creating complex user defined "indexes" such as what you are doing with the Post Count.

Materialize views are essentially just declarative system managed triggers that update another table.

Bruno Martinez

Ayende,

Put both tables in the same table cluster, so that finding the employees is not another disk seek. This gets close to document data bases.

tobi

"This is a hard problem to solve" It is not. It can be solved by using stacked indexed views (search on google to find a pdf from MS; alas not implemented).

I was appalled as I learned about oracles deferred materialized views. I believe there is no synchronous option which makes them worthless for many applications. What an unnecessary crippling.

tobi

On the other hand If you want a deferred update system then the automatic view matching in the query optimizer must come in very handy.

Sam
Sam

I'm with Justin - relational databases have an existing, proven, built-in mechanism to implement what you're trying to accomplish; they're called triggers.

On one occasion I remember fiddling around with indexed views for a week before giving up and implementing a trigger solution. Quickest, easiest, most sensible and most performant change I ever made to that system.

Frans Bouma

@Tobi: materialized/indexed views are by definition used in functionality which consumes large buckets of data and processes them for a lengthy period of time for readonly purposes. This alone makes it already not necessary to have the views be 'up to date' at all time, as that's only possible with a live query.

many people think data has to be 'non-stale' at any given moment, but they forget that only some data has to be 'non-stale'. All the rest can be consumed from for example materialized/indexed views. The post count is a good example for data which is totally acceptable to be 'stale': who's going to check whether user X has indeed 103 posts? no-one.

It therefore comes down to:

  • decide what data elements can be stale

  • make sure the non-stale data is fast and easy to retrieve and manipulate.

And with 'stale' I don't mean data which is a year old, but for example materialized views (or your own tables with duplicated denormalized data, same thing, if your db doesn't support materialized views) which are refreshed every 10 minutes. For reports about monthly sales etc., that's perfectly fine.

This also leads to the point where one could decide to use two distinct databases instead of 1: one for the non-stale data and one for the stale data, which is for example updated every night.

It's true that this means one has to think about what the data in a system really means at time T and time T+t. But IMHO that's a good thing, it makes one realize that a database is more than just a bucket to store bits in, and actually a machine which can make things really easy.

gunteman

"many people think data has to be 'non-stale' at any given moment, but they forget that only some data has to be 'non-stale'. All the rest can be consumed from for example materialized/indexed views. The post count is a good example for data which is totally acceptable to be 'stale': who's going to check whether user X has indeed 103 posts? no-one."

Word!

tobi

@Frans Bouma: Materialized views are not only used for aggregation. You can also save tons of join with them. Example: A forum thread. At least the following tables contribute one row for each post: ForumPosts, Users, UserAvatarImages. Then the resultset has to be sorted by datetime. All of this can be materialized so that the forum page needs exactly one range seek. Very cheap, can do that 2k times per second per core.

Imran

Have you tried using count(1) instead of count(*) and seeing if it makes a difference to your execution time? Also if it is ok for the count to be a little stale you could store it as a normal numerical field that gets updated by a batch process (database job in sql server) thats runs every so often?

Simon Mac

@ Justin

+1 for materialized / schema bound views, being "sytem shortcuts" for triggers.

Ultimately the DB needs to infer if ANY object insert / update / delete would alter the output of ANY of the materialized views (Which I guess are little more than tables anyway).

I suspect this is the reason they are fairly limited in complexity, otherwise they could cripple modification performance, but in a way that is completely non-obvious.

Whilst triggers too maybe less than obvious, at least they are tied to the table you find the bottleneck on.

Would be interesting to see the query plan for the insert / updates on tables that form part of the materialized views.

tobi

The query plan does multiple updates in sequence. For aggregation views it actually aggregates the changed rows for each different view.

Comment preview

Comments have been closed on this topic.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. The RavenDB Comic Strip (3):
    28 May 2015 - Part III – High availability & sleeping soundly
  2. Special Offer (2):
    27 May 2015 - 29% discount for all our products
  3. RavenDB Sharding (3):
    22 May 2015 - Adding a new shard to an existing cluster, splitting the shard
  4. Challenge (45):
    28 Apr 2015 - What is the meaning of this change?
  5. Interview question (2):
    30 Mar 2015 - fix the index
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats