Working with JSON in Postgres

by | Jan 12, 2022

 

One of my customers has a very elegant multi-tenanted application with the flexible schema achieved by the extensive use of JSON. While base tables are the same, JSON allow a lot of customisation.  Unfortunately, the tradeoff of this is a lot of manipulation with the data on the way out of the database as none of the modern reporting tools has a good support of a complex nested JSON. So my job is to “relationalise” JSON in the reporting layer of the database to make it understandable by the reporting tools. Probably this is not a very typical scenario, but I got a lot of fun working with JSON in Postgres.

I have found some unexpected behaviour of the json_array_elements function. As I said, the data I’m dealing with allows a lot of customisation, so I expect some elements to be missing. Here is an example of such data.

CREATE TABLE test_json(
id integer,
objects json
)
INSERT INTO test_json(id, objects)
  VALUES (1, '{"a": 1, "dates": [{"date": "2010-01-01"},
{"date": "2011-01-01"},
{"date": "2012-01-01"}]}'),
      (2, '{"a": 2}'),
       (3, '{"dates": [{"date": "2015-01-01"}]}');

So, the JSON for my record 1 contains values for both keys “a” and “dates”, while records 2 and 3 only contain values for one of the keys. To “relationalise” this data I need to turn each key into column and turn values into rows. In case of key “a” it is very easy and straightforward.

SELECT id,
objects->>'a' as a
FROM test_json

Output:

id a
1 1
2 2
3 NULL

There’s no key “a” for the third row, so it’s NULL, which is ok.

There is a handy function json_array_elements that can do the job of unnesting the “dates” array for me into separate rows. It creates duplication, but it’s also fine with me.

SELECT id,
json_array_elements(objects->'dates')->>'date' AS dates
FROM test_json

Output:

id dates
1 2010-01-01
1 2011-01-01
1 2012-01-01
3 2015-01-01

There’s no value for row 2, so it’s not included into the result.

Unfortunately, when I combine everything in one query, the result I’m getting is aligned with the output of the json_array_elements function. I.e. even if there’s some extra data for the row 2, it is excluded from the result set.

SELECT id,
objects->>'a' as a,
json_array_elements(objects->'dates')->>'date' AS dates
FROM test_json

Output:

id a dates
1 1 2010-01-01
1 1 2011-01-01
1 1 2012-01-01
3 NULL 2015-01-01

I can assume from other examples of this function’s use in the internet that something like INNER JOIN happens behind the scenes to combine the data of different grain. Unfortunately, this isn’t mentioned in the documentation, so becomes a surprise to people new to Postgres like myself.

So, if I want to achieve the following result, I have to perform json_array_elements as a subquery and then left join with the rest of the dataset.

id a dates
1 1 2010-01-01
1 1 2011-01-01
1 1 2012-01-01
2 2 NULL
3 NULL 2015-01-01

Kate,
Data masseuse

Kate writes technical blogs about data warehouses, and is a Data Vault convert who works mostly with MS SQL Server.

You can connect with Kate on LinkedIn, or read her other blogs here.

0 Comments
Submit a Comment

Your email address will not be published. Required fields are marked *