Dealing with hierarchical structures in databases
I have a very simple requirement, I need to create a hierarchy of users' groups. So you can do something like:
- Administrators
- DBA
- SQLite DBA
If you are a member of SQLite DBA group, you are implicitly a member of the Administrators group.
In the database, it is trivial to model this:
Except that then we run into the problem of dealing with the hierarchy. We can't really ask questions that involve more than one level of the hierarchy easily. Some databases has support for hierarchical operators, but that is different from one database to the next. That is a problem, since I need it to work across databases, and without doing too much fancy stuff.
We can work around the problem by introducing a new table:
Now we move the burden of the hierarchy from the query phase to the data entry phase.
From the point of view of the entity, we have this:
Please ignore the death star shape and concentrate on the details :-)
Here is how we are getting all the data in the tree:
public virtual UsersGroup[] GetAssociatedUsersGroupFor(IUser user) { DetachedCriteria directGroupsCriteria = DetachedCriteria.For<UsersGroup>() .CreateAlias("Users", "user") .Add(Expression.Eq("user.id", user.SecurityInfo.Identifier)) .SetProjection(Projections.Id()); DetachedCriteria allGroupsCriteria = DetachedCriteria.For<UsersGroup>() .CreateAlias("Users", "user", JoinType.LeftOuterJoin) .CreateAlias("AllChildren", "child", JoinType.LeftOuterJoin) .Add( Subqueries.PropertyIn("child.id", directGroupsCriteria) || Expression.Eq("user.id", user.SecurityInfo.Identifier)); ICollection<UsersGroup> usersGroups = usersGroupRepository.FindAll(allGroupsCriteria, Order.Asc("Name")); return Collection.ToArray<UsersGroup>(usersGroups); }
Note that here we don't care whatever we are associated with a group directly or indirectly. This is an important consideration in some scenarios (mostly when you want to display information to the user), so we need some way to chart the hierarchy, right?
Here is how we are doing this:
public virtual UsersGroup[] GetAncestryAssociation(IUser user, string usersGroupName) { UsersGroup desiredGroup = GetUsersGroupByName(usersGroupName); ICollection<UsersGroup> directGroups = usersGroupRepository.FindAll(GetDirectUserGroupsCriteria(user)); if (directGroups.Contains(desiredGroup)) { return new UsersGroup[] { desiredGroup }; } // as a nice benefit, this does an eager load of all the groups in the hierarchy // in an efficient way, so we don't have SELECT N + 1 here, nor do we need // to load the Users collection (which may be very large) to check if we are associated // directly or not UsersGroup[] associatedGroups = GetAssociatedUsersGroupFor(user); if (Array.IndexOf(associatedGroups, desiredGroup) == -1) { return new UsersGroup[0]; } // now we need to find out the path to it List<UsersGroup> shortest = new List<UsersGroup>(); foreach (UsersGroup usersGroup in associatedGroups) { List<UsersGroup> path = new List<UsersGroup>(); UsersGroup current = usersGroup; while (current.Parent != null && current != desiredGroup) { path.Add(current); current = current.Parent; } if (current != null) path.Add(current); // Valid paths are those that are contains the desired group // and start in one of the groups that are directly associated // with the user if (path.Contains(desiredGroup) && directGroups.Contains(path[0])) { shortest = Min(shortest, path); } } return shortest.ToArray(); }
As an aside, this is about as complex a method as I can tolerate, and even that just barely.
I mentioned that the burden was when creating it, right? Here is what I meant:
public UsersGroup CreateChildUserGroupOf(string parentGroupName, string usersGroupName) { UsersGroup parent = GetUsersGroupByName(parentGroupName); Guard.Against<ArgumentException>(parent == null, "Parent users group '" + parentGroupName + "' does not exists"); UsersGroup group = CreateUsersGroup(usersGroupName); group.Parent = parent; group.AllParents.AddAll(parent.AllParents); group.AllParents.Add(parent); parent.Directchildren.Add(group); parent.AllChildren.Add(group); return group; }
We could hide it all inside the Parent's property setter, but we still need to deal with it.
And that is all you need to do in order to get it cross database hierarchical structures working.
Comments
Or you could just use Nested Sets.
Can you explain?
I think this is the original paper:
http://www.dbmsmag.com/9603d06.html
I think I read that a while ago.
But doesn't this mean that you have to do a lot of work when you add items to the tree?
Now put few thousand nodes in your tree giving it some depth and calculate, how many records you'll have in UsersGroupsHierarchy.
Heirarchical data is a natural phenomenon in so many models.
There's a couple of great ways to implement it.
One of the nicer approaches I have tried has a table with at least 3 columns: the ID, the Parent ID, and a varchar column that has an ordered collection of IDs representing the "Path" of IDs starting from the root to the current row, in a pipe delimited format.
If you maintain and persist this path as a pipe delimited string, you can have very fast string-comparison searches for finding the children of any node in the tree.
Then again, this kind of plan is optimized for high-read, low-update scenarios so your mileage may vary.
Bunter,
H, where H is the height of the deepest tree.
Assuming that in most cases we are fairly flat, I don't see an issue.
It also tend to affect only the changed rows, nothing else.
As far as I can tell there are 3 main methods for dealing with trees in SQL: Adjacency List, Nested Sets, and Materialized Path.
Joe Celko has a section on nested sets etc in SQL for Smarties. He also has a whole book called Graphs, Trees and Hierarchies which may be worth a look (haven't read it myself).
Another reference worth a look is chapter 9 of Inside Microsoft® SQL
Server™ 2005: T-SQL Programming by Itzik Ben-Gan et al.
H? How come? Did i misread it and you don't have all possible indirect path element relations in database i.e. if it would be a family tree grandson would have entry for grandfather, grandgrandfather etc?
True, one of those annoying trade-offs :-) Ideally, support should be at the mapper level (like RoR has) - something we have on our backlog for LightSpeed 2.
Bunter,
Yes, that is the H I was talking about.
Times however items you have.
Again, assume shallow hierarchies are common, this gives you 2 - 5 rows per item, worst case.
Bunter,
For ~4000 groups, with average nesting level of 7, you get ~26,000 rows in the association table.
What is more important from my point of view is that to traverse 50 levels of hierarchy I need the following 3 queries:
-- get desired group
SELECT this_.Id AS Id5_0_,
FROM Security.dbo.security_UsersGroups this_
WHERE this_.Name = @p0;
-- get user's direct groups
SELECT this_.Id AS Id5_1_,
FROM Security.dbo.security_UsersGroups this_
WHERE user1_.Id = @p0;
-- get user's group with ancestry
SELECT this_.Id AS Id5_2_,
FROM Security.dbo.security_UsersGroups this_
WHERE (child2_.Id IN (SELECT this_0_.Id AS y0_
ORDER BY this_.Name ASC;
We are using the approach of Nested Sets as Andrew pointed out. It makes filling the table in the database a lot harder to implement, but makes the querying a lot faster.
Hierarchical structures in the DB are a pain to deal with.
I hope that when NHibernate uses an AST approach, each DB driver can handle this transparently.
Use Celko's approach:
http://groups-beta.google.com/group/microsoft.public.sqlserver.programming/browse_frm/thread/a0971b548a1daa06/d94afd1d56858df4?q=tree
With that you can write normal SQL queries to fetch data from it. You can also pre-calc the hierarchies with a trigger which comes down to what you're doing more or less. The downside is that it's less logical. Celko's approach of a tree with adjacency lists (a tree is a graph after all), makes it very easy to do. Inserting is a bit of work, however that's no different than you have now.
Materialized paths is what we used in our CMS and which looks similar to what you have.
Supporting this in an o/r mapper ('fetch the hierarchy efficiently') is often a nightmare, because the user has constructed a naive table structure (i.e. with an FK to self). Fetching that efficiently is a single table scan and building the tree with an O(n) algo by traversing the set once, using a dictionary for bookkeeping. But often people don't want the whole hierarchy, just a part of it, and then the problems begin.
I was talking with my brother about this sort of thing in the context of site maps in a database, and he pointed me to the following article which is a very interesting aproach. Maybe worth a look?
http://www.sitepoint.com/article/hierarchical-data-database
Frans,
This approach gives you easy querying capabilities.
In addition to that, if you want hierarchy with a certain part, you can easily define it with a level column, and use that to limit the query, no?
Correct. The downside (IMHO) is that you have to precalc the paths (if I understand your approach as the same as materialized paths) after insert/update, but after that it's pretty easy. If not a lot of mutations take place, it is an easy way of doing hierarchies in db's.
A level column might increase the # of recalculations of the paths I think.
I think Common Table Expressions (CTE's) are what you're looking for here. IIRC, it's part of ANSI standard SQL and is implemented by MS Sql Server 2005 and Oracle, not sure about MySQL and Postgres though I suspect that MySQL doesn't have support for it and postgres probably does. Anyway, what if you just use this either as a stored proc to serialize to XML, write the sql yourself, or spend the 30 seconds it would probably take you to add this functionality to NHibernate or something? I would argue that databases are capable of handling this and querying data on it much faster than any code anyone could write (unless you're doing it in assembly by hand, but who wants to do that?) and it allows you to use your simple model with one table. I'd have to check the query plans to be sure, but I think you'd also save a table scan by using a CTE instead of joining on another table like you are.
Jeff,
I would be pleased if you spend the 30 seconds to add this to NHibernate, I promise to apply the patch.
And CTE are be no means commons across databases. For instance, oracle doesn't support CTE, but uses the connect by syntax.
I'd love to add this to NHibernate but it would take me a lot longer to do than it would take you. I recall a time in devteach when you added a feature to (I think) Castle and blogged about it during one of the presentations, so I was just joking about your superhuman ability to write code quickly.
I've actually strongly considered fixing all the Oracle support in NHibernate because that is what's stopping us from using it (there are a few other trivial things that would also be easy to fix) but I haven't had the time yet and all the Oracle connectors that exist for .Net have crappy support for CLOBS (CLOBS = NTEXT for those non-Oracle people reading this) just to name one thing that I have about them. I know Oracle doesn't explicitly support CTE's but my point was that the concept of CTE's is supported across both databases. I'll add NHibernate to the list of projects that I'm working on in my spare time, but I think for now my linux projects are taking priority because of what I'm trying to do for my home network.
There's a codeproject article that demonstrates one approach to this at http://www.codeproject.com/KB/database/Modeling_DAGs_on_SQL_DBs.aspx
Pardon my ignorance, if I haven't understood the problem correctly.
Can this thing be solved in a way that is similar to an enum?
e.g.
Administrator = 1
DBA = 2
SQLite DBA = 4
So, a user with all the rights is combination of Administrator + DBA + SQLite DBA (similar to enum with FlagsAttribute).
Once again, if I have misunderstood the problem - I apologize.
Kalpesh,
No.
A member of the DBA group is a member of the Administrator group.
Those are groups, not permissions
And those are flexible at runtime, not hard coded.
Comment preview