I’m doing some work over the next few weeks creating visualisations, using Trifacta for data prep and QlikSense to make them pretty. I’ve spent a little bit of time getting my head around Trifacta and thought I’d help others get started.
Trifacta is really easy to install, simply download the file and run through the setup.
First off; download either the Mac version or Windows installer from:
- Mac: Open the downloaded file and drag Trifacta to your Applications folder
Windows: Run installer, review and accept the license agreement and install
- Once finished, load up ‘Trifacta Wrangler’ from the start menu.
- You’ll need to register (or login if you already have an account), fill in the details, accept the license agreement (assuming you agree to it), register and you’re good to go.
LOADING THE DATA
After you’ve launched it, point it to the data you’re after. It’s smart, but the file structure still needs to be a table form. Just out of curiosity I pointed it at an Excel file which was part table, part descriptions and it wasn’t sure what to do with it. Not really surprising, but worth a test – it’s a smart tool, but not a mind reader.
- Load up Trifacta
- Click the ‘New Dataset’ button
- Add File > Choose File > Navigate to your file for upload
- Choose the sheet you want to import
- Give it a name and an optional Description > Create and Transform
This gets the data into Trifacta ready for manipulation. The next screen seems a bit daunting at first, unless you watch the tutorials. It’s not instantly clear what to do next, other than type some code in the transformation logic. Don’t worry, the application’s much smarter than that and a lot more user-friendly!
TRANSFORMING THE DATA
First off, you’ll notice it’s profiled the data for you, this gives you a great indicator as to what needs review. Trifacta works by selecting an example of what you want to transform, literally just highlight the piece of data you want to change and you’ll see options at the bottom of the screen.
Trifacta attempts to work out the types of fields you’re loading. My first column contains both text and commas, so it has detected it as a text field. Understandable mistake, and an easy one to resolve by clicking on the field type the ‘ABC’ button at the left of the field header.
After I changed the type to number it told me that all of the data is wrong, so I needed to make some changes. I also noticed that there were a couple of values in my ‘Date_end_of_period’ field were causing issues too, represented in the Data Quality bar at the top by a little red blip.
First off, we need to select the problem by highlighting it with our cursor (click and drag). I highlighted just the first quotation before the number in row 1. Trifacta then worked out that I’m probably trying to do this for the quotations around my value and has some suggested fixes:
I was happy with the first suggestion, so I selected Replace. In this case, I replaced it with nothing and just removed. I did that by simply clicking on the suggestion, then clicking on the ‘Add to Script’ button. I did the same for the comma to remove it, then the data quality bar changed to show that the majority were recognised but there were still a couple of problem values – the ‘ABC’ and ‘DEF’. I did exactly the same thing – highlight them and choose the Replace value.
Once you’ve chosen the transformation type, you can modify what it’s doing by clicking on the modify script button at the right.
The bar on the right shows your transformation history and clicking one will roll the transformations back to that point. When you hover over you get the option to edit, allowing you to tweak the transformation code, e.g. Add a value to replace with, rather than blank, and delete, reverting it back to the original values.
After you’ve told it what to do the only thing left is to ‘Generate Results’. Let Trifacta know the format you want then click the All Seeing Eye-con and it’ll show you where to find it:
Good luck on your Trifacta journey!
Keep exploring! Daniel.