Select Page

Trifacta Logo

Over the last few weeks I’ve been working through a series of ‘Question Time’ blogs, taking a look at the many sets of Open Data available from NZ and the wider world. Putting these together took a bit of time and a couple of tools so thought as well as sharing the outputs, I’d share the journey and learning too. Those awesome tools were Trifacta and Qlik Sense … and a little bit of Excel. Most of the data came in XLS or CSV form, with a KML thrown in for some mapping. To get from raw source to report ready I used Trifacta to prep my data.

When I started with Trifacta I had the wrong idea about it, I was trying to use it to change individual values in the data, but that’s not really what it’s designed for. I needed to change my thinking, it’s designed as an ETL (Extract, Transform, Load) tool. For those that aren’t familiar with the term, think about it as a tool to apply the same change across all of your data, not to edit individual values. While, technically, you can do that, it’s not really Trifacta’s purpose. Once I shifted my thinking, I realised it was an awesome on-the-fly ETL tool to prep my data. The most impressive part is that it doesn’t really need any kind of coding expertise just an understanding of the data itself and how you want to change it.

In the spirit of keeping my process consistent (and simple), I transformed in Trifacta before bringing it into Qlik Sense. That way I knew I had a clean, ready to go data set with minimal configuration (and mess) in Qlik.

First off, if you haven’t used Trifacta before, check out this guide to getting started. It’ll tell you all about installing, through to loading the data and the basics to get started. Once you’re familiar with the app, then the examples below will help you with some of those transformations.

When I first loaded the file I noticed a couple of pre-canned transformations, Trifacta worked out the format of my file and added in the transformations needed to parse the file. What’s cool is you can edit them, even loading the file is customisable! The three it added for my CSV were:

SPLIT ROWS
This looks for an end of line character and creates a new row each time it finds one.
EXAMPLE:  splitrows col: column1 on: ‘\n’ quote: ‘\”‘

SPLIT COLUMNS
This looks for a separator character, in my case it was a comma, Trifacta then breaks them out into columns.
EXAMPLE:  split col: column1 on: ‘,’ limit: 24 quote: ‘\”‘

REPLACE VALUE
The last function was to remove the quotes that signify a value. It does this by simply replacing them with nothing.
EXAMPLE:  replace col: * on: ‘\”‘ with: ” global: true

You can see and edit the script items using the Open Script button (top button circled below).

Trifacta Screenshot - Initial Parsing

On top of those initial transforms I needed to remove some extra lines and get rid of the empty columns. You can see by the bars at the top that my data is inconsistent due to the extra data in the file, so to get my data report ready I added a few more changes:

DROP COLUMN
Used to remove a column. Due to the formatting of the file, it had a small gap in between each column.
EXAMPLE:  drop col: column6

DELETE ROW
Used as a way to remove rows based on the value in a specific column, be careful – this will remove any rows which match that specific value.
EXAMPLE:  delete row: column2 == ‘Value to Exclude’

HEADER
Used to identify the top record as the header row, assigning those values to the column titles. In my example, I removed all of the descriptive rows first before applying the header row, ensuring the values were the top record.
EXAMPLE:  header

MERGE
Used as a way of merging two fields. For my example I had a month and year column as separate and wanted to join the two together separated by a hyphen.
EXAMPLE:  merge col: column3,’-‘,column2

RENAME
Used to rename columns in the data.
EXAMPLE:  rename col: column27 to: ‘Month’

Trifacta - Transformation Complete

Most of the transforms stripped out content I didn’t need, but there were a couple more; my dates were split over two columns (month and year), to chart them I needed to create a single value, Trifacta did this really simply.

Trifacta Merge Statement

I highlighted both columns (using the shift key) and scrolled along to Merge, it wasn’t quite perfect but close, I needed to flip them around and add a hyphen, so clicked on Modify script and edited it to the order I needed:

merge col: column3,’-‘,column2

After that I was ready to Generate Results and import into Qlik, ready to make something a bit more visual. If you want to find out what happened next, keep an eye out for Question Time: Behind the Scenes with Qlik Sense coming soon.

Keep exploring! Daniel.

%d bloggers like this: