Welcome to HBH! If you have tried to register and didn't get a verification email, please using the following link to resend the verification email.

Hierarchical data and SQL


Hierarchical data and SQL

By GTADarkDude avatarGTADarkDude | 9902 Reads |
0     0

/ Introduction /

Hierarchical data, like in tree structures or other graphs, is hard to use directly in relational database management systems. Some trick has to be thought of in order to easily store and maintain such data. Lots of information can be found on this topic all over the internet, but I thought it would be a good idea to write a short article about this myself, to explain the basics.

/ The Case /

For this article, I thought I'd make up some sample data to use for examples. Here is the tree (special case of a directed acyclic graph) I'd like to store:

This tree represents some hierarchical structure, were Adams is Blake's manager and Evans is Blake's subordinate. The table will be called 'Personnel' in my examples.

/ Solution 1: Adjacency list /

In this solution, we represent the tree by representing its edges, like this:

	name varchar(40) not null primary key,
	mgr varchar(40) references Personnel(name)
);```
This means: there is an edge from employees to their managers.

Something you might have to do regularly, is asking if someone is (indirectly) subordinate of someone else. As the number of levels between those persons is unknown, we don\'t know how many times we have to self join Personnel. This means that this task can\'t be solved in traditional SQL. We would have to use a recursive query to solve this, like this (SQL3 format):
```markupWITH
	RECURSIVE Temp AS
		(SELECT name, mgr FROM Personnel)
	UNION
		(SELECT Temp.name, Temp.mgr
		FROM Temp, Personnel
		WHERE Temp.mgr = Personnel.name)
SELECT mgr
FROM Temp
WHERE name = \'James\' AND mgr = \'Adams\';```
However, lots of RDBMSs don\'t support this. Updating the table (adding or deleting an employee) can also require multiple queries. Therefore, this method is often not preferred. However, executing SELECT queries can be simplified by making sure that your table is transitively closed.

I\'ll use the transitive relationship \'is greater than\' as an example for this. Imagine A is greater than B and B is greater than C. This also implies that A is greater than C. A transitively closed table would include all such (implied) pairs. This would mean a much bigger database, yet SELECT queries like the one above will presumably execute faster. So whether this method is efficient all depends on the information you want to store and on the way in which you\'ll be using said information.


**/* Solution 2: Materialized path */**

This solutions stores the complete path to each employee. In my example, James would have a path of 1.2.1.1, which means that James is the first child of 1.2.1 (Garcia), that Garcia is the first child of 1.2 (Clark) and, finally, that Clark is the second child of 1 (Adams).
```markupCREATE TABLE Personnel (
	name varchar(40) not null primary key,
	path varchar(10) not null
);```
In this case, showing a given employee and all of his/her supervisors can be done (quite easily) with the LIKE operator:
```markupSELECT P1.name
FROM Personnel P1, Personnel P2
WHERE P2.path LIKE P1.path || \'%\'
AND P2.name = \'James\';```
In Oracle, || is the operator for string concatenation. In MySQL, we would use the CONCAT() function.

So querying the table is a lot easier than with the first solution. Updating, however, remains (relatively) hard, depending on the string manipulation functions your RDBMS has built in.


**/* Solution 3: Nested set */**

This solution uses the preorder tree traversal algorithm. (http://en.wikipedia.org/wiki/Tree_traversal) For each node, the moment of visit entry and exit is recorded as its left and right values. It is important that you know what this means. I\'m too lazy to make an image to explain this, but you can try Google Images (\"preorder tree traversal\" -> the image with the food tree) for this. Our table is going to look like this:
```markupCREATE TABLE Personnel (
	name varchar(40) not null primary key,
	lft tinyint not null unique check (lft>0),
	rgt tinyint not null unique check (rgt>1),
	check (lft<rgt)
);```
Clark\'s record has a lft value of 8 and a rgt value of 15. If we would want to show all of his subordinates, the query would be like this:
```markupSELECT name
FROM Personnel
WHERE lft BETWEEN 8 AND 15
ORDER BY lft ASC;```
On the other hand, all of James\' managers can be found with this query:
```markupSELECT name
FROM Personnel
WHERE lft < 10 AND rgt > 11
ORDER BY lft ASC;```
This would require a second query to find the lft and rgt values of a given employee though.

Intermezzo: The number of subordinates of a given person can be found with simple math: numDesc = (rgt-lft-1)/2 .

Updating this table is not that hard either. For example, lets hire a new employee called \'Lewis\' who is going to work for Blake. He\'s going to have the following values: lft=5, rgt=6. (That\'s between Evans and Ford.) First, we have to make some space by updating the lft and rgt values of all employees right of Lewis:
```markupUPDATE Personnel SET lft=lft+2 WHERE lft>4;
UPDATE Personnel SET rgt=rgt+2 WHERE rgt>4;```
Now Lewis can fill up the new space:
```markupINSERT INTRO Personnel SET name=\'Lewis\', lft=4, rgt=5;```

This last solution is probably the hardest to understand fully, yet you can do almost anything with this and it is usually a lot faster.


**/* Epilogue */**

First of all, I hope you liked this article and found it useful. English is not my native language, but I think I speak it well enough to make myself clear. There could still be some errors in this text though.

This article was meant as a quick overview. For more in-depth information, this link contains a really good article about the same subject:
http://www.dbazine.com/oracle/or-articles/tropashko4

I know my article looks very similar to the one in the link on first sight, but I want to assure you that I have not copied anything directly from this site. I got my information mainly from my college materials, but I found out later on that my lecturers based their slides on that specific article.


Greetings,

GTADarkDude

Comments
ynori7's avatar
ynori7 14 years ago

God I hate working with trees. Nice article though, well organized.

korg's avatar
korg 14 years ago

Good information, well written, really liked it. Trees are fun to pee out of.

ghost's avatar
ghost 14 years ago

Nice work interesting article.