I have been researching different ways that we can get data into AWS Redshift and found importing a CSV data into Redshift from AWS S3 is a very simple process.
Get the CSV file into S3 -> Define the Target Table -> Import the file
Get the CSV file into S3
Upload the CSV file into a S3 bucket using the AWS S3 interface (or your favourite tool). Make sure you have the right permissions on the bucket; The Access key you’ll use later needs the ability to read the file (by default only the User that created the bucket has access).
Define the Target Table
If not already created, you’ll need to create the table you’ll be populating. Make sure you have defined the columns correctly otherwise you’ll have some debugging to do later.
Something like (I say something like as I am not SQL guru 😉 )
CREATE TABLE <table_name> ( <variable_name> attributes, …);
Another thing to remember is to make sure you have granted the right permissions for the user that will be querying this table
GRANT SELECT on <table_name> to <user or group>;
Import the file
The simple bit, loading the CSV file into Redshift from S3 is one command.
COPY <table_name> FROM ‘s3://<bucket_name>/<csv_file>‘ CREDENTIALS ‘aws_access_key_id=< aws_access_key_id >;aws_secret_access_key=< aws_secret_access_key >‘ CSV <other_options> ;
And that is basically it.
Like always there are plenty of options on the COPY command that you’ll need to look at and there are good AWS resources to read
Tutorial: Loading Data from Amazon S3 (http://docs.aws.amazon.com/redshift/latest/dg/tutorial-loading-data.html)
Redshift Database Developer Guide (http://docs.aws.amazon.com/redshift/latest/dg/welcome.html).
Barry, Preventer Of Chaos.