photo by Joshua Woroniecki
One of my customers has a very elegant multi-tenanted application with the flexible schema achieved by the extensive use of JSON. While base tables are the same, JSON allow a lot of customisation. Unfortunately, the tradeoff of this is a lot of manipulation with the data on the way out of the database as none of the modern reporting tools has a good support of a complex nested JSON. So my job is to “relationalise” JSON in the reporting layer of the database to make it understandable by the reporting tools. Probably this is not a very typical scenario, but I got a lot of fun working with JSON in Postgres.
I have found some unexpected behaviour of the json_array_elements function. As I said, the data I’m dealing with allows a lot of customisation, so I expect some elements to be missing. Here is an example of such data.
CREATE TABLE test_json(
id integer,
objects json
)
INSERT INTO test_json(id, objects)
VALUES (1, '{"a": 1, "dates": [{"date": "2010-01-01"},
{"date": "2011-01-01"},
{"date": "2012-01-01"}]}'),
(2, '{"a": 2}'),
(3, '{"dates": [{"date": "2015-01-01"}]}');
So, the JSON for my record 1 contains values for both keys “a” and “dates”, while records 2 and 3 only contain values for one of the keys. To “relationalise” this data I need to turn each key into column and turn values into rows. In case of key “a” it is very easy and straightforward.
SELECT id,
objects->>'a' as a
FROM test_json
Output:
id | a
1 | 1
2 | 2
3 | NULL
There’s no key “a” for the third row, so it’s NULL, which is ok.
There is a handy function json_array_elements that can do the job of unnesting the “dates” array for me into separate rows. It creates duplication, but it’s also fine with me.
SELECT id,
json_array_elements(objects->'dates')->>'date' AS dates
FROM test_json
Output:
id | dates
1 | 2010-01-01
1 | 2011-01-01
1 | 2012-01-01
3 | 2015-01-01
There’s no value for row 2, so it’s not included into the result.
Unfortunately, when I combine everything in one query, the result I’m getting is aligned with the output of the json_array_elements function. I.e. even if there’s some extra data for the row 2, it is excluded from the result set.
SELECT id,
objects->>'a' as a,
json_array_elements(objects->'dates')->>'date' AS dates
FROM test_json
Output:
id | a | dates
1 | 1 | 2010-01-01
1 | 1 | 2011-01-01
1 | 1 | 2012-01-01
3| NULL | 2015-01-01
I can assume from other examples of this function’s use in the internet that something like INNER JOIN happens behind the scenes to combine the data of different grain. Unfortunately, this isn’t mentioned in the documentation, so becomes a surprise to people new to Postgres like myself.
So, if I want to achieve the following result, I have to perform json_array_elements as a subquery and then left join with the rest of the dataset.
id | a | dates
1 | 1 | 2010-01-01
1 | 1 | 2011-01-01
1 | 1 | 2012-01-01
2 | 2 | NULL
3 | NULL | 2015-01-01
Kate,
Data masseuse