Ayende @ Rahien

Refunds available at head office

Dealing with hierarchical structures in databases

I have a very simple requirement, I need to create a hierarchy of users' groups. So you can do something like:

  • Administrators
    • DBA
      • SQLite DBA

If you are a member of SQLite DBA group, you are implicitly a member of the Administrators group.

In the database, it is trivial to model this:

image

Except that then we run into the problem of dealing with the hierarchy. We can't really ask questions that involve more than one level of the hierarchy easily.  Some databases has support for hierarchical operators, but that is different from one database to the next. That is a problem, since I need it to work across databases, and without doing too much fancy stuff.

We can work around the problem by introducing a new table:

image

Now we move the burden of the hierarchy from the query phase to the data entry phase.

From the point of view of the entity, we have this:

image

Please ignore the death star shape and concentrate on the details :-)

Here is how we are getting all the data in the tree:

public virtual UsersGroup[] GetAssociatedUsersGroupFor(IUser user)
{
    DetachedCriteria directGroupsCriteria = DetachedCriteria.For<UsersGroup>()
        .CreateAlias("Users", "user")
        .Add(Expression.Eq("user.id", user.SecurityInfo.Identifier))
        .SetProjection(Projections.Id());

    DetachedCriteria allGroupsCriteria = DetachedCriteria.For<UsersGroup>()
        .CreateAlias("Users", "user", JoinType.LeftOuterJoin)
        .CreateAlias("AllChildren", "child", JoinType.LeftOuterJoin)
        .Add(
            Subqueries.PropertyIn("child.id", directGroupsCriteria) ||
            Expression.Eq("user.id", user.SecurityInfo.Identifier));

    ICollection<UsersGroup> usersGroups = 
        usersGroupRepository.FindAll(allGroupsCriteria, Order.Asc("Name"));
    return Collection.ToArray<UsersGroup>(usersGroups);
}

Note that here we don't care whatever we are associated with a group directly or indirectly. This is an important consideration in some scenarios (mostly when you want to display information to the user), so we need some way to chart the hierarchy, right?

Here is how we are doing this:

public virtual UsersGroup[] GetAncestryAssociation(IUser user, string usersGroupName)
{
    UsersGroup desiredGroup = GetUsersGroupByName(usersGroupName);
    ICollection<UsersGroup> directGroups =
        usersGroupRepository.FindAll(GetDirectUserGroupsCriteria(user));
    if (directGroups.Contains(desiredGroup))
    {
        return new UsersGroup[] { desiredGroup };
    }
    // as a nice benefit, this does an eager load of all the groups in the hierarchy
    // in an efficient way, so we don't have SELECT N + 1 here, nor do we need
    // to load the Users collection (which may be very large) to check if we are associated
    // directly or not
    UsersGroup[] associatedGroups = GetAssociatedUsersGroupFor(user);
    if (Array.IndexOf(associatedGroups, desiredGroup) == -1)
    {
        return new UsersGroup[0];
    }
    // now we need to find out the path to it
    List<UsersGroup> shortest = new List<UsersGroup>();
    foreach (UsersGroup usersGroup in associatedGroups)
    {
        List<UsersGroup> path = new List<UsersGroup>();
        UsersGroup current = usersGroup;
        while (current.Parent != null && current != desiredGroup)
        {
            path.Add(current);
            current = current.Parent;
        }
        if (current != null)
            path.Add(current);
        // Valid paths are those that are contains the desired group
        // and start in one of the groups that are directly associated
        // with the user
        if (path.Contains(desiredGroup) && directGroups.Contains(path[0]))
        {
            shortest = Min(shortest, path);
        }
    }
    return shortest.ToArray();
}

As an aside, this is about as complex a method as I can tolerate, and even that just barely.

I mentioned that the burden was when creating it, right? Here is what I meant:

public UsersGroup CreateChildUserGroupOf(string parentGroupName, string usersGroupName)
{
    UsersGroup parent = GetUsersGroupByName(parentGroupName);
    Guard.Against<ArgumentException>(parent == null,
                                     "Parent users group '" + parentGroupName + "' does not exists");

    UsersGroup group = CreateUsersGroup(usersGroupName);
    group.Parent = parent;
    group.AllParents.AddAll(parent.AllParents);
    group.AllParents.Add(parent);
    parent.Directchildren.Add(group);
    parent.AllChildren.Add(group);
    return group;
}

We could hide it all inside the Parent's property setter, but we still need to deal with it.

And that is all you need to do in order to get it cross database hierarchical structures working.

Comments

Andrew
01/22/2008 01:31 AM by
Andrew

Or you could just use Nested Sets.

Andrew
01/22/2008 01:44 AM by
Andrew

I think this is the original paper:

http://www.dbmsmag.com/9603d06.html

Ayende Rahien
01/22/2008 02:26 AM by
Ayende Rahien

I think I read that a while ago.

But doesn't this mean that you have to do a lot of work when you add items to the tree?

Bunter
01/22/2008 03:21 AM by
Bunter

Now put few thousand nodes in your tree giving it some depth and calculate, how many records you'll have in UsersGroupsHierarchy.

Pete W
01/22/2008 03:23 AM by
Pete W

Heirarchical data is a natural phenomenon in so many models.

There's a couple of great ways to implement it.

One of the nicer approaches I have tried has a table with at least 3 columns: the ID, the Parent ID, and a varchar column that has an ordered collection of IDs representing the "Path" of IDs starting from the root to the current row, in a pipe delimited format.

If you maintain and persist this path as a pipe delimited string, you can have very fast string-comparison searches for finding the children of any node in the tree.

Then again, this kind of plan is optimized for high-read, low-update scenarios so your mileage may vary.

Ayende Rahien
01/22/2008 03:26 AM by
Ayende Rahien

Bunter,

H, where H is the height of the deepest tree.

Assuming that in most cases we are fairly flat, I don't see an issue.

It also tend to affect only the changed rows, nothing else.

Phil Haselden
01/22/2008 03:37 AM by
Phil Haselden

As far as I can tell there are 3 main methods for dealing with trees in SQL: Adjacency List, Nested Sets, and Materialized Path.

Joe Celko has a section on nested sets etc in SQL for Smarties. He also has a whole book called Graphs, Trees and Hierarchies which may be worth a look (haven't read it myself).

Another reference worth a look is chapter 9 of Inside Microsoft® SQL

Server™ 2005: T-SQL Programming by Itzik Ben-Gan et al.

Bunter
01/22/2008 03:45 AM by
Bunter

H? How come? Did i misread it and you don't have all possible indirect path element relations in database i.e. if it would be a family tree grandson would have entry for grandfather, grandgrandfather etc?

Andrew
01/22/2008 03:45 AM by
Andrew

But doesn't this mean that you have to do a lot of work when you add items to the tree?

True, one of those annoying trade-offs :-) Ideally, support should be at the mapper level (like RoR has) - something we have on our backlog for LightSpeed 2.

Ayende Rahien
01/22/2008 03:52 AM by
Ayende Rahien

Bunter,

Yes, that is the H I was talking about.

Times however items you have.

Again, assume shallow hierarchies are common, this gives you 2 - 5 rows per item, worst case.

Ayende Rahien
01/22/2008 05:06 AM by
Ayende Rahien

Bunter,

For ~4000 groups, with average nesting level of 7, you get ~26,000 rows in the association table.

What is more important from my point of view is that to traverse 50 levels of hierarchy I need the following 3 queries:

-- get desired group

SELECT this.Id AS Id50_,

   this_.Name   AS Name5_0_,

   this_.Parent AS Parent5_0_

FROM Security.dbo.securityUsersGroups this

WHERE this_.Name = @p0;

-- get user's direct groups

SELECT this.Id AS Id51_,

   this_.Name      AS Name5_1_,

   this_.Parent    AS Parent5_1_,

   users3_.GroupId AS GroupId__3_,

   user1_.Id       AS UserId3_,

   user1_.Id       AS Id1_0_,

   user1_.Name     AS Name1_0_

FROM Security.dbo.securityUsersGroups this

   INNER JOIN Security.dbo.security_UsersToUsersGroups users3_

     ON this_.Id = users3_.GroupId

   INNER JOIN Security.dbo.Users user1_

     ON users3_.UserId = user1_.Id

WHERE user1_.Id = @p0;

-- get user's group with ancestry

SELECT this.Id AS Id52_,

     this_.Name               AS Name5_2_,

     this_.Parent             AS Parent5_2_,

     users4_.GroupId          AS GroupId__4_,

     user1_.Id                AS UserId4_,

     user1_.Id                AS Id1_0_,

     user1_.Name              AS Name1_0_,

     allchildre6_.ParentGroup AS ParentGr1___5_,

     child2_.Id               AS ChildGroup5_,

     child2_.Id               AS Id5_1_,

     child2_.Name             AS Name5_1_,

     child2_.Parent           AS Parent5_1_

FROM Security.dbo.securityUsersGroups this

     LEFT OUTER JOIN Security.dbo.security_UsersToUsersGroups users4_

       ON this_.Id = users4_.GroupId

     LEFT OUTER JOIN Security.dbo.Users user1_

       ON users4_.UserId = user1_.Id

     LEFT OUTER JOIN Security.dbo.security_UsersGroupsHierarchy allchildre6_

       ON this_.Id = allchildre6_.ParentGroup

     LEFT OUTER JOIN Security.dbo.security_UsersGroups child2_

       ON allchildre6_.ChildGroup = child2_.Id

WHERE (child2.Id IN (SELECT this0.Id AS y0

                     FROM   Security.dbo.security_UsersGroups this_0_

                            INNER JOIN Security.dbo.security_UsersToUsersGroups users3_

                              ON this_0_.Id = users3_.GroupId

                            INNER JOIN Security.dbo.Users user1_

                              ON users3_.UserId = user1_.Id

                     WHERE  user1_.Id = @p0)

       OR user1_.Id = @p1)

ORDER BY this_.Name ASC;

Jan Van Ryswyck
01/22/2008 06:32 AM by
Jan Van Ryswyck

We are using the approach of Nested Sets as Andrew pointed out. It makes filling the table in the database a lot harder to implement, but makes the querying a lot faster.

Manuel Abadia
01/22/2008 08:36 AM by
Manuel Abadia

Hierarchical structures in the DB are a pain to deal with.

I hope that when NHibernate uses an AST approach, each DB driver can handle this transparently.

Frans Bouma
01/22/2008 09:25 AM by
Frans Bouma

Use Celko's approach:

http://groups-beta.google.com/group/microsoft.public.sqlserver.programming/browse_frm/thread/a0971b548a1daa06/d94afd1d56858df4?q=tree

With that you can write normal SQL queries to fetch data from it. You can also pre-calc the hierarchies with a trigger which comes down to what you're doing more or less. The downside is that it's less logical. Celko's approach of a tree with adjacency lists (a tree is a graph after all), makes it very easy to do. Inserting is a bit of work, however that's no different than you have now.

Materialized paths is what we used in our CMS and which looks similar to what you have.

Supporting this in an o/r mapper ('fetch the hierarchy efficiently') is often a nightmare, because the user has constructed a naive table structure (i.e. with an FK to self). Fetching that efficiently is a single table scan and building the tree with an O(n) algo by traversing the set once, using a dictionary for bookkeeping. But often people don't want the whole hierarchy, just a part of it, and then the problems begin.

Tim Wilde
01/22/2008 09:56 AM by
Tim Wilde

I was talking with my brother about this sort of thing in the context of site maps in a database, and he pointed me to the following article which is a very interesting aproach. Maybe worth a look?

http://www.sitepoint.com/article/hierarchical-data-database

Ayende Rahien
01/22/2008 11:48 AM by
Ayende Rahien

Frans,

This approach gives you easy querying capabilities.

In addition to that, if you want hierarchy with a certain part, you can easily define it with a level column, and use that to limit the query, no?

Frans Bouma
01/22/2008 03:56 PM by
Frans Bouma

Correct. The downside (IMHO) is that you have to precalc the paths (if I understand your approach as the same as materialized paths) after insert/update, but after that it's pretty easy. If not a lot of mutations take place, it is an easy way of doing hierarchies in db's.

A level column might increase the # of recalculations of the paths I think.

Jeff Tucker
01/22/2008 07:19 PM by
Jeff Tucker

I think Common Table Expressions (CTE's) are what you're looking for here. IIRC, it's part of ANSI standard SQL and is implemented by MS Sql Server 2005 and Oracle, not sure about MySQL and Postgres though I suspect that MySQL doesn't have support for it and postgres probably does. Anyway, what if you just use this either as a stored proc to serialize to XML, write the sql yourself, or spend the 30 seconds it would probably take you to add this functionality to NHibernate or something? I would argue that databases are capable of handling this and querying data on it much faster than any code anyone could write (unless you're doing it in assembly by hand, but who wants to do that?) and it allows you to use your simple model with one table. I'd have to check the query plans to be sure, but I think you'd also save a table scan by using a CTE instead of joining on another table like you are.

Ayende Rahien
01/22/2008 07:31 PM by
Ayende Rahien

Jeff,

I would be pleased if you spend the 30 seconds to add this to NHibernate, I promise to apply the patch.

And CTE are be no means commons across databases. For instance, oracle doesn't support CTE, but uses the connect by syntax.

Jeff Tucker
01/22/2008 08:00 PM by
Jeff Tucker

I'd love to add this to NHibernate but it would take me a lot longer to do than it would take you. I recall a time in devteach when you added a feature to (I think) Castle and blogged about it during one of the presentations, so I was just joking about your superhuman ability to write code quickly.

I've actually strongly considered fixing all the Oracle support in NHibernate because that is what's stopping us from using it (there are a few other trivial things that would also be easy to fix) but I haven't had the time yet and all the Oracle connectors that exist for .Net have crappy support for CLOBS (CLOBS = NTEXT for those non-Oracle people reading this) just to name one thing that I have about them. I know Oracle doesn't explicitly support CTE's but my point was that the concept of CTE's is supported across both databases. I'll add NHibernate to the list of projects that I'm working on in my spare time, but I think for now my linux projects are taking priority because of what I'm trying to do for my home network.

Joshua McKinney
01/22/2008 11:09 PM by
Joshua McKinney

There's a codeproject article that demonstrates one approach to this at http://www.codeproject.com/KB/database/ModelingDAGsonSQLDBs.aspx

Kalpesh
01/23/2008 04:45 PM by
Kalpesh

Pardon my ignorance, if I haven't understood the problem correctly.

Can this thing be solved in a way that is similar to an enum?

e.g.

Administrator = 1

DBA = 2

SQLite DBA = 4

So, a user with all the rights is combination of Administrator + DBA + SQLite DBA (similar to enum with FlagsAttribute).

Once again, if I have misunderstood the problem - I apologize.

Ayende Rahien
01/23/2008 05:28 PM by
Ayende Rahien

Kalpesh,

No.

A member of the DBA group is a member of the Administrator group.

Those are groups, not permissions

And those are flexible at runtime, not hard coded.

Comments have been closed on this topic.