Querying JSON data in Amazon Redshift

By
Ben Lee
April 14, 2014


json


I’ll admit I’m not sold on the JSON format. However others have embraced the format and this leads to need to be able to store and query this type of data.
Amazon Redshift gives you a couple of options to deal with this data. Recently the Copy command has been enhanced to parse data from a file with data in JSON format. The issue with this approach is that you are expected to know exactly what data you want when you load the file  otherwise you will need to reload the file.
I believe an more elegant solution is to load each row of the JSON data into the database as VARCHARs and use the built in JSON parser capability to extract the desired data from the JSON as required.
For instance if you have the following JSON data.

{
 "stadium_id": 1,
 "name": "Eden Park",
 "location":{"city":"Auckland","country":"NZ" },
 "seats": 60000
}
{
 "stadium_id": 2,
 "name": "Westpac Stadium",
 "location":{"city":"Wellington","country":"NZ" },
 "seats": 35000
}
{
 "stadium_id": 3,
 "name": "Forsyth Barr Stadium",
 "location":{"city":"Dunedin","country":"NZ" },
 "seats": 30000
}
{
 "stadium_id": 4,
 "name": "Stadium Australia",
 "location":{"city":"Sydney","country":"AUS" },
 "seats": 83000
}

Create a table and insert the data

create table TEST.JSON_DATA(
   JSON VARCHAR(65000)
);
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 1, "name": "Eden Park", "location":{"city":"Auckland","country":"NZ" }, "seats": 60000}');
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 2, "name": "Westpac Stadium", "location":{"city":"Wellington","country":"NZ" }, "seats": 35000}');
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 3, "name": "Forsyth Barr Stadium", "location":{"city":"Dunedin","country":"NZ" }, "seats": 30000}');
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 4, "name": "Stadium Australia", "location":{"city":"Sydney","country":"AUS" }, "seats": 83000 }');

Using the JSON_EXTRACT_PATH_TEXT function you can write queries like the following to get value from the data

select
 json_extract_path_text(JSON,'stadium_id') stadium_id,
 json_extract_path_text(JSON,'name') name
from test.json_data
order by 1;
stadium_id    name
1             Eden Park
2             Westpac Stadium
3             Forsyth Barr Stadium
4             Stadium Australia
select
 json_extract_path_text(JSON,'stadium_id') stadium_id,
 json_extract_path_text(JSON,'name') name
from test.json_data
where json_extract_path_text(JSON,'seats') >50000
 and json_extract_path_text(JSON,'location','country') ='NZ'
order by 1;
stadium_id    name
1             Eden Park

Note the path element is case sensitive so if you get the case wrong you will get empty strings

select
 json_extract_path_text(JSON,'stadium_id') stadium_id,
 json_extract_path_text(JSON,'NAME') name
from test.json_data
order by 1;
stadium_id    name
1
2
3
4


All the code, all the fun – Ben

Ben writes blogs on the technical side of BI, the code, all the code and not much other than the code.

Connect with Ben on LinkedIn or read some of his other blogs here.

Copyright © 2019 OptimalBI LTD.