I’ll admit I’m not sold on the JSON format. However others have embraced the format and this leads to need to be able to store and query this type of data.
Amazon Redshift gives you a couple of options to deal with this data. Recently the Copy command has been enhanced to parse data from a file with data in JSON format. The issue with this approach is that you are expected to know exactly what data you want when you load the file otherwise you will need to reload the file.
I believe an more elegant solution is to load each row of the JSON data into the database as VARCHARs and use the built in JSON parser capability to extract the desired data from the JSON as required.
For instance if you have the following JSON data.
{
"stadium_id": 1,
"name": "Eden Park",
"location":{"city":"Auckland","country":"NZ" },
"seats": 60000
}
{
"stadium_id": 2,
"name": "Westpac Stadium",
"location":{"city":"Wellington","country":"NZ" },
"seats": 35000
}
{
"stadium_id": 3,
"name": "Forsyth Barr Stadium",
"location":{"city":"Dunedin","country":"NZ" },
"seats": 30000
}
{
"stadium_id": 4,
"name": "Stadium Australia",
"location":{"city":"Sydney","country":"AUS" },
"seats": 83000
}
Create a table and insert the data
create table TEST.JSON_DATA(
JSON VARCHAR(65000)
);
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 1, "name": "Eden Park", "location":{"city":"Auckland","country":"NZ" }, "seats": 60000}');
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 2, "name": "Westpac Stadium", "location":{"city":"Wellington","country":"NZ" }, "seats": 35000}');
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 3, "name": "Forsyth Barr Stadium", "location":{"city":"Dunedin","country":"NZ" }, "seats": 30000}');
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 4, "name": "Stadium Australia", "location":{"city":"Sydney","country":"AUS" }, "seats": 83000 }');
Using the JSON_EXTRACT_PATH_TEXT function you can write queries like the following to get value from the data
select
json_extract_path_text(JSON,'stadium_id') stadium_id,
json_extract_path_text(JSON,'name') name
from test.json_data
order by 1;
stadium_id name
1 Eden Park
2 Westpac Stadium
3 Forsyth Barr Stadium
4 Stadium Australia
select
json_extract_path_text(JSON,'stadium_id') stadium_id,
json_extract_path_text(JSON,'name') name
from test.json_data
where json_extract_path_text(JSON,'seats') >50000
and json_extract_path_text(JSON,'location','country') ='NZ'
order by 1;
stadium_id name
1 Eden Park
Note the path element is case sensitive so if you get the case wrong you will get empty strings
select
json_extract_path_text(JSON,'stadium_id') stadium_id,
json_extract_path_text(JSON,'NAME') name
from test.json_data
order by 1;
stadium_id name
1
2
3
4
All the code, all the fun – Ben