Get the CSV file into S3 -> Define the Target Table -> Import the file
Get the CSV file into S3
Upload the CSV file into an S3 bucket using the AWS S3 console (or your favourite tool). Make sure you have the right permissions on the bucket: the access key you'll use later needs the ability to read the file (by default, only the user that created the bucket has access).
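If you prefer the command line, the AWS CLI can do the upload. A minimal sketch — the bucket and file names here are placeholders, not anything from this post:

```shell
# Upload the CSV to your bucket (bucket/file names are examples)
aws s3 cp mydata.csv s3://my-bucket/mydata.csv

# Confirm it arrived
aws s3 ls s3://my-bucket/
```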
Define the Target Table
If it doesn't already exist, you'll need to create the table you'll be populating. Make sure you define the columns correctly, otherwise you'll have some debugging to do later.
Something like this (I say "something like" as I'm no SQL guru 😉):
CREATE TABLE <table_name> (<column_name> <data_type>, …);
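As a concrete (entirely hypothetical) example, a table matching a simple sales CSV might look like:

```sql
-- Hypothetical table for a CSV with columns: sale_id, sale_date, amount, region
CREATE TABLE sales (
    sale_id   INTEGER NOT NULL,
    sale_date DATE,
    amount    DECIMAL(10,2),
    region    VARCHAR(50)
);
```

The column order and types need to line up with the columns in your CSV file, or the load will fail.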
Another thing to remember is to grant the right permissions to the user (or group) that will be querying this table:
GRANT SELECT on <table_name> to <user or group>;
Import the file
The simple bit: loading the CSV file into Redshift from S3 is a single command.
COPY <table_name> FROM 's3://<bucket_name>/<csv_file>' CREDENTIALS 'aws_access_key_id=<aws_access_key_id>;aws_secret_access_key=<aws_secret_access_key>' CSV <other_options>;
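Filled in with hypothetical names (table, bucket, and credentials are all made up here, with the keys redacted), and a couple of commonly useful options — IGNOREHEADER to skip the CSV's header row, and DATEFORMAT to tell Redshift how to parse date columns:

```sql
-- Hypothetical example; substitute your own table, bucket, and credentials
COPY sales
FROM 's3://my-bucket/mydata.csv'
CREDENTIALS 'aws_access_key_id=AKIA...;aws_secret_access_key=...'
CSV
IGNOREHEADER 1            -- skip the header row in the CSV file
DATEFORMAT 'YYYY-MM-DD';  -- how dates are written in the file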
And that is basically it.
As always, there are plenty of options on the COPY command that you'll want to look at, and there are good AWS resources to read:
Tutorial: Loading Data from Amazon S3 (http://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-data.html)
Barry, Preventer Of Chaos.
Barry blogs about how to stop chaos in your systems