Getting started with Trifacta

by | Nov 12, 2015

I’m doing some work over the next few weeks creating visualisations, using Trifacta for data prep and QlikSense to make them pretty. I’ve spent a little bit of time getting my head around Trifacta and thought I’d help others get started.
INSTALLING TRIFACTA
Trifacta is really easy to install, simply download the file and run through the setup.
First off; download either the Mac version or Windows installer from:
https://www.trifacta.com/trifacta-wrangler

  1. Mac: Open the downloaded file and drag Trifacta to your Applications folder
    Windows: Run installer, review and accept the license agreement and install
  2. Once finished, load up ‘Trifacta Wrangler’ from the start menu.
  3. You’ll need to register (or login if you already have an account), fill in the details, accept the license agreement (assuming you agree to it), register and you’re good to go.

LOADING THE DATA
After you’ve launched it, point it to the data you’re after. It’s smart, but the file structure still needs to be a table form. Just out of curiosity I pointed it at an Excel file which was part table, part descriptions and it wasn’t sure what to do with it. Not really surprising, but worth a test – it’s a smart tool, but not a mind reader.

  1. Load up Trifacta
  2. Click the ‘New Dataset’ button
  3. Add File > Choose File > Navigate to your file for upload
  4. Choose the sheet you want to import
  5. Give it a name and an optional Description > Create and Transform

This gets the data into Trifacta ready for manipulation. The next screen seems a bit daunting at first, unless you watch the tutorials. It’s not instantly clear what to do next, other than type some code in the transformation logic. Don’t worry, the application’s much smarter than that and a lot more user-friendly!
TRANSFORMING THE DATA
First off, you’ll notice it’s profiled the data for you, this gives you a great indicator as to what needs review. Trifacta works by selecting an example of what you want to transform, literally just highlight the piece of data you want to change and you’ll see options at the bottom of the screen.
Trifacta attempts to work out the types of fields you’re loading. My first column contains both text and commas, so it has detected it as a text field. Understandable mistake, and an easy one to resolve by clicking on the field type the ‘ABC’ button at the left of the field header.
Trifacta Example - wrong type
After I changed the type to number it told me that all of the data is wrong, so I needed to make some changes. I also noticed that there were a couple of values in my ‘Date_end_of_period’ field were causing issues too, represented in the Data Quality bar at the top by a little red blip.
Trifacta Example
First off, we need to select the problem by highlighting it with our cursor (click and drag). I highlighted just the first quotation before the number in row 1. Trifacta then worked out that I’m probably trying to do this for the quotations around my value and has some suggested fixes:
Trifacta Suggestions
I was happy with the first suggestion, so I selected Replace. In this case, I replaced it with nothing and just removed. I did that by simply clicking on the suggestion, then clicking on the ‘Add to Script’ button. I did the same for the comma to remove it, then the data quality bar changed to show that the majority were recognised but there were still a couple of problem values – the ‘ABC’ and ‘DEF’. I did exactly the same thing – highlight them and choose the Replace value.
Once you’ve chosen the transformation type, you can modify what it’s doing by clicking on the modify script button at the right.
Trifacta Modify Script
The bar on the right shows your transformation history and clicking one will roll the transformations back to that point. When you hover over you get the option to edit, allowing you to tweak the transformation code, e.g. Add a value to replace with, rather than blank, and delete, reverting it back to the original values.
Trifacta - Transformation Code
After you’ve told it what to do the only thing left is to ‘Generate Results’. Let Trifacta know the format you want then click the All Seeing Eye-con and it’ll show you where to find it:
Trifacta - View Results
Good luck on your Trifacta journey!
Keep exploring! Daniel.

0 Comments

Trackbacks/Pingbacks

  1. Guest Post - Getting Started with Trifacta | Trifacta - […] – Daniel Reed is a Process, Business and Data Analyst who has allowed us to repost his OptimalBI blog. If you’d also like…
  2. Question Time: Behind the scenes with Trifacta | OptimalBI - […] off, if you haven’t used Trifacta before, check out this guide to getting started. It’ll tell you all about…
  3. Guest Post - Question Time: Behind the Scenes with Trifacta | Trifacta - […] off, if you haven’t used Trifacta before, check out this guide to getting started. It’ll tell you all about…
Submit a Comment

Your email address will not be published. Required fields are marked *