Blog

How-Tos, Deep-Dives, Brain-Dumps, and More

MySQL - Adjacency List Model For Hierarchical Data Using CTE

Who is this article for?

Anyone looking for how Manage Hierarchical Data in MySQL with Adjacency List Model using CTE (Common Table Expressions).

In this article, I’m going to walk through on how to retrieve, insert and update hierarchical data.

Before You Begin

If you don't know what CTE's are, I would highly recommend reading about it. Since v8.x, MySQL has added support for CTE which it was missing for quite some time.

Before v8.X, it was challenging to retrieve data from the Adjacency List Model as outlined in the Mike Hillyer article.

Just to give you a brief overview of how Recursive CTE works, let's check the pseudo code below.

CTE DEFINITION (
  ANCHOR (SQL STATEMENT 1)
  UNION ALL
  RECURSIVE MEMBER (SQL STATEMENT 2)
)
SELECT * FROM CTE
  • The first line is the CTE definition, for a recursive CTE it looks something like WITH RECURSIVE cteName (columns) AS (
  • Then the "Anchor" part is executed and generates a result, let's assume it generated result "R1",
  • After the anchor, "Recursive Member" gets executed using "R1" as input, let's says it generated result set "R2".
  • Again, "Recursive Member" gets executed using "R2" as input, this goes on until "Recursive Member" output is null.
  • Finally, "Union All" is applied to all output, and that becomes our final result set.

Hopefully, that is not too confusing, if so, google and read more about CTE's.

Let go over some of the common terms with adjacency list.

  • Root Node - This is the primary node, it is usually the node that does not have a parent but is a parent for one or more entities.
  • Leaf nodes - These are the node that does not have any children.
  • Non-Leaf Node - These are the node that has one or more children.
  • Descendants - All children, grandchildren, etc. of a node.
  • Ancestors - All parents, grandparent, etc. of a node.

You must be asking, how this article is different from the one posted here. Honestly, there is not much of a difference, but in this article, we'll cover some extra set of queries and use multiple recursive CTE for some queries. This makes it a little more advanced when compared to mysqltutorial.org article.

For this article, I will be using the below graph as an adjacency list.

Ok, now let's jump into creating a table and see how we can do inserts, updates, and selects.

Step 1 - Create a table.

CREATE TABLE category (
  id int(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  name varchar(255) NOT NULL,
  parent_id int(10) UNSIGNED DEFAULT NULL,
  is_active int(1) UNSIGNED DEFAULT 1,
  PRIMARY KEY (id),
  FOREIGN KEY (parent_id) REFERENCES category (id)
    ON DELETE CASCADE ON UPDATE CASCADE
);

As you can see I have added one additional column (is_active), we will be using this column later on in the select queries.

Step 2 - Insert dummy records.

INSERT INTO category (name, parent_id) VALUES ("Electronics", null);
INSERT INTO category (name, parent_id) VALUES ("TV", 1);
INSERT INTO category (name, parent_id) VALUES ("Smart", 2);
INSERT INTO category (name, parent_id) VALUES ("4K Ultra HD", 2);
INSERT INTO category (name, parent_id) VALUES ("Curved", 2);
INSERT INTO category (name, parent_id) VALUES ("Camera", 1);
INSERT INTO category (name, parent_id) VALUES ("Computer", 1);
INSERT INTO category (name, parent_id) VALUES ("Desktop", 7);
INSERT INTO category (name, parent_id) VALUES ("Laptops", 7);
INSERT INTO category (name, parent_id) VALUES ("Work", 9);
INSERT INTO category (name, parent_id) VALUES ("Travel", 9);
INSERT INTO category (name, parent_id) VALUES ("All Around", 9);
INSERT INTO category (name, parent_id) VALUES ("Gaming", 9);
INSERT INTO category (name, parent_id) VALUES ("Tablet", 7);

Step 3 - Selecting Data via CTE

Finding the root node

SELECT 
  * 
FROM 
  category 
WHERE 
  parent_id IS NULL;

Finding the immediate children of a node

You can first retrieve the id of the root node and then supply it here, or you can do a sub where clause.

SELECT 
  * 
FROM 
  category 
WHERE 
  parent_id = (SELECT id FROM category WHERE parent_id IS NULL);

Finding the leaf nodes

SELECT
  c1.*
FROM
  category c1
    LEFT JOIN category c2 ON c2.parent_id = c1.id
WHERE
  c2.id IS NULL;

Retrieving the whole tree

WITH RECURSIVE shoppingCategories AS
(
  SELECT id, name, parent_id, 1 AS depth, name AS path
    FROM category
    WHERE parent_id IS NULL
  UNION ALL
  SELECT c.id, c.name, c.parent_id, sc.depth + 1, CONCAT(sc.path, ' > ', c.name)
    FROM shoppingCategories AS sc 
      JOIN category AS c ON sc.id = c.parent_id
)
SELECT * FROM shoppingCategories;

Retrieving the subtree.

We will now retrieve the Laptop subtree

WITH RECURSIVE shoppingCategories AS
(
  SELECT id, name, parent_id, 1 AS depth, name AS path
    FROM category
    WHERE id = 9
  UNION ALL
  SELECT c.id, c.name, c.parent_id, sc.depth + 1, CONCAT(sc.path, ' > ', c.name)
    FROM shoppingCategories AS sc 
      JOIN category AS c ON sc.id = c.parent_id
)
SELECT * FROM shoppingCategories;

Retrieving only active categories

Let's make the TV category as inactive, remember we added the flag called is_active.

UPDATE category SET is_active = 0 WHERE id = 2;

Ok, Now we'll retrieve the complete tree again, but this time we will exclude any nodes that are inactive and exclude their descendants.

WITH RECURSIVE shoppingCategories AS
(
  SELECT id, name, parent_id, 1 AS depth, name AS path
    FROM category
    WHERE parent_id IS NULL
  UNION ALL
  SELECT c.id, c.name, c.parent_id, sc.depth + 1, CONCAT(sc.path, ' > ', c.name)
    FROM shoppingCategories AS sc 
      JOIN category AS c ON sc.id = c.parent_id
    WHERE c.is_active = 1
)
SELECT * FROM shoppingCategories;

Retrieving the descendants and ancestors line

Now, let's say you are on the Laptop category and want to retrieve all the descendants and ancestors line. There are multiple ways to do that, I will go over the one that works best for me.

Retrieve all the descendants with positive depth and ancestors as negative depth.

WITH RECURSIVE
  ancestorCategories  AS
  (
    SELECT id, name, parent_id, 0 AS depth
      FROM category
      WHERE id = 9
    UNION ALL
    SELECT c.id, c.name, c.parent_id, ac.depth - 1
      FROM ancestorCategories AS ac 
        JOIN category AS c ON ac.parent_id = c.id
  ),
  descendantCategories  AS
  (
    SELECT id, name, parent_id, 0 AS depth
      FROM category
      WHERE id = 9
    UNION ALL
    SELECT c.id, c.name, c.parent_id, dc.depth + 1
      FROM descendantCategories AS dc 
        JOIN category AS c ON dc.id= c.parent_id
  )
SELECT * FROM ancestorCategories
UNION
SELECT * FROM descendantCategories
ORDER BY depth;

Tree Expansion Query

Let's say you are building an expanding tree. When you click on Electronics it shows you all Level 1 children, i.e. TV, Camera and Computer, then if you select Computer it will show all Level 1 of Ancestor, i.e., Electronics plus Level 1 of Computer.

It will look something like this.

Here is how you do that

SET @catID = 9;
WITH RECURSIVE
  ancestorCategories  AS
  (
    SELECT id, name, parent_id, 1 AS depth
      FROM category
      WHERE id = @catID
    UNION ALL
    SELECT c.id, c.name, c.parent_id, ac.depth - 1
      FROM ancestorCategories AS ac
        JOIN category AS c ON ac.parent_id = c.id
  ),
  ancestorDeptOneCategories  AS
  (
    SELECT id, name, parent_id, 1 AS depth
      FROM category
      WHERE parent_id IS null
    UNION ALL
    SELECT c.id, c.name, c.parent_id, adoc.depth + 1
      FROM ancestorDeptOneCategories AS adoc
        JOIN ancestorCategories ac ON ac.id = adoc.id
        JOIN category AS c ON adoc.id = c.parent_id
  )
SELECT * FROM ancestorDeptOneCategories
ORDER BY depth;

Here you can see we are passing the id as variable @catID, in the first CTE we are getting all the ancestors including self, and then in the second CTE ancestorDeptOneCategories we are getting all first level descendants of those ancestors.

Step 4 - Updating a parent.

This when you want to move a sub-tree. If we're going to move Laptopand all its descendants to a Camera we need to update the parent_id of Laptop record

UPDATE category SET parent_id = 6 where id = 9;

If we only want to move Laptop to Camera and promote its descendants we first update the parent of Level 1 children of Laptop, then update parent of Laptop.

UPDATE category SET parent_id = 7 where parent_id = 9;
UPDATE category SET parent_id = 6 where id = 9;

Step 5 - Deleting a parent.

To delete a node and its descendants, just remove the node itself, all the descendants will be deleted automatically by the DELETE CASCADE of the foreign key constraint.

So if we want to remove Laptop then -

DELETE FROM category WHERE id = 9;

Delete a node and promote it, descendants, we first promote the descendants and then remove the node.

UPDATE category SET parent_id=7 where parent_id=9;
    DELETE FROM category WHERE id = 9;

Conclusion

In my experience when you have a hierarchical relation that has nodes inserted or moved regularly, nested sets causes a lot of Database transaction locks and tends to be slow. We had faced issues when our hierarchy relationship was broken. If you implement nested set using mike hillyer methods, you will not be able to add unique constraints which mean the data can be corrupted quite easily.

From the StackOverflow

The Nested Set Model is nowadays not commonly used in databases, since it is more complex than the Adjacency List Model, given the fact that it requires managing two “pointers” instead of a single one. Actually, the Nested Set Model has been introduced in databases when it was complex or impossible to do recursive queries that traversed a hierarchy.

On the other hand, adjacency set offers much faster select, inserts and update. I found it to be much reliable and especially with CTE introduction, I don't see why we need to use a nested set.

If you want to run more performance test on adjacency set with a much larger database, I would recommend downloading data from ITIS.gov they have around half a million records saved in adjacency set.

In my future article, I will be covering how to record a particular hierarchy before any insert or update to the tree. This is useful if you want to show historical hierarchy at a given point in time.

References

POSTED IN MYSQL, CTE, BLOG