Hierarchical JSON and RediSQL

RediSQL is compiled including the JSON1 SQLite extensions. Hence, all the functions documented in JSON1 are available out of the box.

JSON1 is extremely flexible and powerful, as an example consider a report table that track sales in a company by year, quarter and week.

> REDISQL.CREATE_DB DB
> REDISQL.EXEC DB "CREATE TABLE sales(year STRING, quarter STRING, week STRING, total INT);"
> REDISQL.EXEC DB "INSERT INTO sales VALUES('2019', 'q1', '1', 100);"
> REDISQL.EXEC DB "INSERT INTO sales VALUES('2019', 'q1', '2', 125);"
> REDISQL.EXEC DB "INSERT INTO sales VALUES('2019', 'q2', '1', 200);" 
> REDISQL.EXEC DB "INSERT INTO sales VALUES('2019', 'q2', '2', 300);" 
> REDISQL.EXEC DB "INSERT INTO sales VALUES('2020', 'q1', '1', 400);" 
> REDISQL.EXEC DB "INSERT INTO sales VALUES('2020', 'q1', '2', 450);" 
> REDISQL.EXEC DB "INSERT INTO sales VALUES('2020', 'q2', '1', 500);" 

From this table we would like to generate a JSON report in the form:

{'2019': 
  {'q1': {'1': 100, '2': 125}, 
   'q2': {'1': 200, '2': 300}}, 
 '2020': 
  {'q1': {'1': 400, '2': 450}, 
   'q2': {'1': 500}}}

This is not a trivial problem, because SQL generally does not like to return data in this format, just a string. However the JSON1 module is flexible enough and CTE provide us with enough expressing power.

Let's see the final query first and then we will try to understand it piece by piece.

WITH quarters AS (
  WITH weeks AS (
      SELECT year, quarter, json_group_object(week, total) AS week_json 
      FROM sales 
      GROUP BY year, quarter
      ) 
  SELECT year, json_group_object(quarter, json(week_json)) AS quarters_json 
  FROM weeks 
  GROUP BY year
) 
SELECT json_group_object(year, json(quarters_json)) 
FROM quarters;"

This returns exactly the single line we are looking for.

It seems a difficult query, but working on it piece by piece we can understand it quickly.

The WITH constructor simply create a "virtual table" valid for the execution of the query.

The simplest way to understand this query is going inside-out.

SELECT year, quarter, json_group_object(week, total) AS week_json 
FROM sales 
GROUP BY year, quarter

json_group_object is an aggreate query and it returns a JSON string with the weeks as key and the totals as values.

> REDISQL.EXEC DB "SELECT year, quarter, json_group_object(week, total) AS week_json FROM sales GROUP BY year, quarter" 
1) 1) (integer) 2019
   2) "q1"
   3) "{\"1\":100,\"2\":125}"
2) 1) (integer) 2019
   2) "q2"
   3) "{\"1\":200,\"2\":300}"
3) 1) (integer) 2020
   2) "q1"
   3) "{\"1\":400,\"2\":450}"
4) 1) (integer) 2020
   2) "q2"
   3) "{\"1\":500}"

In this way we are able to create a JSON document that express the total of sales for each week. We compres the week columns in a flat JSON document.

The next step is similar, for the year, we compress each quarter in a JSON document, the difficulties lays in maintaining the total of the weeks.

WITH weeks AS (
    --- same query as above
    SELECT year, quarter, json_group_object(week, total) AS week_json 
    FROM sales 
    GROUP BY year, quarter
) 
  SELECT year, json_group_object(quarter, json(week_json)) AS quarters_json 
  FROM weeks 
  GROUP BY year

We introduce the WITH statement. Using the WITH statement we treat the result of the query above as a new table that we can use in the later statement, the new table is called week. Note how we conveniently associate a name (week_json) to the result of the json_group_object aggregation. This is useful to manipulate that JSON object.

The rest of the query is very similar, we are compressing all the quarters into a flat JSON object.

> REDISQL.EXEC DB "WITH weeks AS ( SELECT year, quarter, json_group_object(week, total) AS week_json FROM sales GROUP BY year, quarter) SELECT year, json_group_object(quarter, json(week_json)) AS quarters_json FROM weeks GROUP BY year"
1) 1) (integer) 2019
   2) "{\"q1\":{\"1\":100,\"2\":125},\"q2\":{\"1\":200,\"2\":300}}"
2) 1) (integer) 2020
   2) "{\"q1\":{\"1\":400,\"2\":450},\"q2\":{\"1\":500}}"

This query provided us, for each year, a JSON hierarchical structure that map quarters and weeks tp total sales.

Now, we can guess the last step, compress the years into another hierarchical JSON structure.

This yield to the original query:

WITH quarters AS (
  --- same query as above
  WITH weeks AS (
      SELECT year, quarter, json_group_object(week, total) AS week_json 
      FROM sales 
      GROUP BY year, quarter
      ) 
  SELECT year, json_group_object(quarter, json(week_json)) AS quarters_json 
  FROM weeks 
  GROUP BY year
) 
SELECT json_group_object(year, json(quarters_json)) 
FROM quarters;"

And let's see the result:

> REDISQL.EXEC DB "WITH quarters AS ( WITH weeks AS ( SELECT year, quarter, json_group_object(week, total) AS week_json FROM sales GROUP BY year, quarter ) SELECT year, json_group_object(quarter, json(week_json)) AS quarters_json FROM weeks GROUP BY year) SELECT json_group_object(year, json(quarters_json)) FROM quarters;"

1) 1) "{\"2019\":{\"q1\":{\"1\":100,\"2\":125},\"q2\":{\"1\":200,\"2\":300}},\"2020\":{\"q1\":{\"1\":400,\"2\":450},\"q2\":{\"1\":500}}}"

The last result is a hierarchical JSON structure where the years map to the quartes, the quarters map to the weeks and the weeks map to the sales.