POSTED IN JSON, PERFORMANCE, PHP, BLOG

PHP - Performance - Adjacency set to nested JSON

Before You Begin

In this article, we will compare the performance of Method 1 and Method 2 as used in - PHP – Convert Adjacency set to nested JSON.

We will be utilizing an extensive data set with approximately 500K records. I have downloaded these records from itis.gov and made some minor modification for this article. Data can be downloaded from here - https://drive.google.com/open?id=115_d5PV4OQ948cWP5TLXUpN2TvLvvo_y

We will be using MySQL CTE and break the data into chunks to test the performance for different size of records. You can refer to the CTE article here - MySQL – Adjacency List Model For Hierarchical Data Using CTE.

There is a slight change made to getDataFromDatabase() method as we want to limit the records to test the performance. Below you can find the updated definition. I have added the placeholder inside the CTE as [ID_OF_ENTITY]. This will be our starting point and then I will retrieve all descendants for this entity and convert it to a nested response.

Method 1

<?php

// Initialize php setting - @node - NOT RECOMMENDED FOR PRODUCTION, update your php.ini if needed.
error_reporting(E_ALL);
ini_set('display_errors', 1);
ini_set('memory_limit', '-1');
ini_set('max_execution_time', 30000);

// Get the tree array from DB
$treeArray = getDataFromDatabase();

// Transform the data
$outputTree = transformTree($treeArray, [ID_OF_ENTITY]);

// Output the response
header('Content-Type: application/json');
echo json_encode($outputTree[0]);

/**
 * Transform the tree
 *
 * @param $treeArray
 * @param null $parentId
 * @return array
 */
function transformTree($treeArray, $parentId = null)
{
    $output = [];

    // Read through all nodes of the tree
    foreach ($treeArray as $node) {
        
    // If the node parent is same as parent passed in argument
        if ($node['parent_id'] == $parentId) {
            
            // Get all the children for that node, using recursive method
            $children = transformTree($treeArray, $node['id']);
            
            // If children are found, add it to the node children array
            if ($children) {
                $node['children'] = $children;
            }
            
            // Add the main node with/without children to the main output
            $output[] = $node;
            
            // Remove the node from main array to avoid duplicate reading, speed up the process
            unset($node);
        }
    }
    return $output;
}

/**
 * Get all records from DB and return array
 *
 * @return array
 */
function getDataFromDatabase()
{
    // Create database connection
    $dbConnection = new mysqli("localhost", "root", "secret", "adjacency");
    
    // Get the result from DB Table
    $records = $dbConnection->query("
        WITH RECURSIVE tree AS
        (
          SELECT id, name, parent_id
            FROM entities
            WHERE id = [ID_OF_ENTITY]
          UNION ALL
          SELECT e.id, e.name, e.parent_id
            FROM tree AS t
              JOIN entities AS e ON t.id = e.parent_id
        )
        SELECT id, name, parent_id FROM tree;
    ");
    
    // Fetch all records
    // @MYSQLI_ASSOC - Columns are returned into the array having the field name as the array index.
    $output = mysqli_fetch_all($records, MYSQLI_ASSOC);
    
    // Close the connection
    $dbConnection->close();
    
    return $output;
}

Method 2

<?php

// Initialize php setting - @node - NOT RECOMMENDED FOR PRODUCTION, update your php.ini if needed.
error_reporting(E_ALL);
ini_set('display_errors', 1);
ini_set('memory_limit', '-1');
ini_set('max_execution_time', 30000);

// Get the tree array from DB
$treeArray = getDataFromDatabase();

// Group by parent id
$treeArrayGroups = [];

foreach ($treeArray as $record) {
    $treeArrayGroups[$record['parent_id']][] = $record;
}

// Get the root
$rootArray = $treeArray[0];

// Transform the data
$outputTree = transformTree($treeArrayGroups, $rootArray);

// Output the response
header('Content-Type: application/json');
echo json_encode($outputTree);

/**
 * Transform the tree
 *
 * @param $treeArrayGroups
 * @param $rootArray
 * @return mixed
 */
function transformTree($treeArrayGroups, $rootArray)
{
    // Read through all nodes where parent is root array
    foreach ($treeArrayGroups[$rootArray['id']] as $child) {
        
        // If there is a group for that child, aka the child has children
        if (isset($treeArrayGroups[$child['id']])) {
            // Traverse into the child
            $newChild = transformTree($treeArrayGroups, $child);
        } else {
            $newChild = $child;
        }
        
        // Assign the child to the array of children in the root node
        $rootArray['children'][] = $newChild;
    }
    return $rootArray;
}

/**
 * Get all records from DB and return array
 *
 * @return array
 */
function getDataFromDatabase()
{
    // Create database connection
    $dbConnection = new mysqli("localhost", "root", "secret", "adjacency");
    
    // Get the result from DB Table
    $records = $dbConnection->query("
        WITH RECURSIVE tree AS
        (
          SELECT id, name, parent_id
            FROM entities
            WHERE id = [ID_OF_ENTITY]
          UNION ALL
          SELECT e.id, e.name, e.parent_id
            FROM tree AS t
              JOIN entities AS e ON t.id = e.parent_id
        )
        SELECT id, name, parent_id FROM tree;
    ");
    
    // Fetch all records
    // @MYSQLI_ASSOC - Columns are returned into the array having the field name as the array index.
    $output = mysqli_fetch_all($records, MYSQLI_ASSOC);
    
    // Close the connection
    $dbConnection->close();
    
    return $output;
}

Performance Comparision

For performance comparison, I am going to use blackfire.io. We will be ignoring the performance of all other components like MySQL except for transformTree().

Results

ID_OF_ENTITY = 846539, ~5,000 DESCENDANTS

# CALLS FOR transformTree() EXECUTION TIME I/O WAIT TIME CPU PEAK MEMORY
Method 1 - Result 4,960 1.58 s 58.9 ms 1.52 s 3.88 MB
Method 2 - Result 847 88 ms 79.4 ms 8.65 ms 4.41 MB

ID_OF_ENTITY = 846542, ~10,000 DESCENDANTS

# CALLS FOR transformTree() EXECUTION TIME I/O WAIT TIME CPU PEAK MEMORY
Method 1 - Result 12,014 21.3 s 40 ms 21.2 s 8.28 MB
Method 2 - Result 1,807 185 ms 158 ms 27.4 ms 10.3 MB

ID_OF_ENTITY = 846535, ~20,000 DESCENDANTS

# CALLS FOR transformTree() EXECUTION TIME I/O WAIT TIME CPU PEAK MEMORY
Method 1 - Result 19,059 1 min 5 s 82.3 ms 1 min 5 s 13.6 MB
Method 2 - Result 3,061 179 ms 122 ms 57.6 ms 16.8 MB

ID_OF_ENTITY = 846504, ~50,000 DESCENDANTS

# CALLS FOR transformTree() EXECUTION TIME I/O WAIT TIME CPU PEAK MEMORY
Method 1 - Result 54,703 11 min 33 s 1.03 s 11 min 32 s 37.4 MB
Method 2 - Result 8,589 534 ms 404 ms 129 ms 47.1 MB

Conclusion

As you can see from the above result set, Method 2 is significantly faster than Method 1 when it comes to big data set.

I tried to run Method 1 on the complete data set with ~ 500K records, and I waited for more than an hr, but it never executed. On the flip side - Method 2 was still able to complete within 10 seconds.

The main reason for Method 2 being superior is because of the recursive method called significantly less than Method 1.

In future articles, I will talk about how you can best mock any third party service while running PHPUnit tests inside Laravel. As making a real connection slows down the complete suite.