Querying JSON data in Amazon Redshift

by | Apr 14, 2014

 
json
I’ll admit I’m not sold on the JSON format. However others have embraced the format and this leads to need to be able to store and query this type of data.
Amazon Redshift gives you a couple of options to deal with this data. Recently the Copy command has been enhanced to parse data from a file with data in JSON format. The issue with this approach is that you are expected to know exactly what data you want when you load the file  otherwise you will need to reload the file.
I believe an more elegant solution is to load each row of the JSON data into the database as VARCHARs and use the built in JSON parser capability to extract the desired data from the JSON as required.
For instance if you have the following JSON data.

{
  "stadium_id": 1,
  "name": "Eden Park",
  "location":{"city":"Auckland","country":"NZ" },
  "seats": 60000
}
{
  "stadium_id": 2,
  "name": "Westpac Stadium",
  "location":{"city":"Wellington","country":"NZ" },
  "seats": 35000
}
{
  "stadium_id": 3,
  "name": "Forsyth Barr Stadium",
  "location":{"city":"Dunedin","country":"NZ" },
  "seats": 30000
}
{
  "stadium_id": 4,
  "name": "Stadium Australia",
  "location":{"city":"Sydney","country":"AUS" },
  "seats": 83000
}

Create a table and insert the data

create table TEST.JSON_DATA(
    JSON VARCHAR(65000)
);
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 1, "name": "Eden Park", "location":{"city":"Auckland","country":"NZ" }, "seats": 60000}');
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 2, "name": "Westpac Stadium", "location":{"city":"Wellington","country":"NZ" }, "seats": 35000}');
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 3, "name": "Forsyth Barr Stadium", "location":{"city":"Dunedin","country":"NZ" }, "seats": 30000}');
insert into TEST.JSON_DATA(JSON) values ('{ "stadium_id": 4, "name": "Stadium Australia", "location":{"city":"Sydney","country":"AUS" }, "seats": 83000 }');

Using the JSON_EXTRACT_PATH_TEXT function you can write queries like the following to get value from the data

select
  json_extract_path_text(JSON,'stadium_id') stadium_id,
  json_extract_path_text(JSON,'name') name
from test.json_data
order by 1;
stadium_id    name
1             Eden Park
2             Westpac Stadium
3             Forsyth Barr Stadium
4             Stadium Australia
select
  json_extract_path_text(JSON,'stadium_id') stadium_id,
  json_extract_path_text(JSON,'name') name
from test.json_data
where json_extract_path_text(JSON,'seats') >50000
  and json_extract_path_text(JSON,'location','country') ='NZ'
order by 1;
stadium_id    name
1             Eden Park

Note the path element is case sensitive so if you get the case wrong you will get empty strings

select
  json_extract_path_text(JSON,'stadium_id') stadium_id,
  json_extract_path_text(JSON,'NAME') name
from test.json_data
order by 1;
stadium_id    name
1
2
3
4

 

 
All the code, all the fun – Ben
Ben writes blogs about the technical side of BI the code, all the code and not much other than the code.
You can read Ben’s blog Impression of BEAM✲ all of Ben’s blogs here.
We run regular Agile courses with a business intelligence slant in both Wellington and Auckland. Find out more here.

1 Comment
Submit a Comment

Your email address will not be published. Required fields are marked *