Use Trifacta to Wrangle your data

Nov 23, 2015

As mentioned in Shane Gibson’s blog post “Trifacta = I can code! (and free data wrangling on your desktop)”, some of us at OptimalBI have had the pleasure of using Trifacta’s Desktop Data Wrangling tool.
A little something from their website:

Data Wrangling to Simplify the Way People Work with Data

Trifacta enables organizations to use data to drive innovation by providing a more productive and accessible method of exploring and experimenting with data of all shapes and sizes. Data Analysts and Data Scientists have long relied on IT for access or for the preparation of diverse data. With Trifacta, a broad continuum of users, from business analysts to data scientists, are finally empowered to wrangle data themselves.

What is “Data Wrangling”?

Data wrangling is the process of taking data in its native format & making it usable for analysis.

I have been using it to see how easy it is to parse some logs (into csv or json) for a product of ours, and I must say it is relatively easy to use (compared to other data manipulation tools) once you get your mind out of the coding mindset.
What do I mean by this?
Point it to a file and it will try to perform some simple initial transformations for you, for example splitting rows on newlines or carriage returns, and splitting columns on delimiters if it thinks your file has them. If you don’t like what it does, you can simply delete the transformation with a click.
There is a point and click interface (although you can code if you want), with suggestions on which transformations and options to use. You highlight the text or area in the file that you want to do something with, and Trifacta gives you options, like splitting just once, on the whole row, or across the whole file. As you scroll through each option, it displays what the result would look like.
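To make the idea of those initial transformations concrete, here is a minimal Python sketch of the same kind of guess: split the text into rows, then split the rows into columns only if one candidate delimiter appears consistently. This is not how Trifacta does it internally; the function names, the candidate list, and the sample log lines are all made up for illustration.

```python
def guess_delimiter(lines, candidates=(",", "\t", ";", "|")):
    """Return the first candidate that appears the same non-zero number
    of times on every line, or None if no candidate qualifies."""
    for delim in candidates:
        counts = {line.count(delim) for line in lines}
        if len(counts) == 1 and counts != {0}:
            return delim
    return None


def initial_split(text):
    """Split text into rows, then into columns if a delimiter is detected."""
    # splitlines() handles \n, \r\n and \r line endings
    lines = [line for line in text.splitlines() if line]
    delim = guess_delimiter(lines)
    if delim is None:
        # No consistent delimiter: keep each row as a single column
        return [[line] for line in lines]
    return [line.split(delim) for line in lines]


# Hypothetical sample data, not taken from a real log
sample = "2015-11-23|INFO|started\r\n2015-11-23|WARN|disk low\r\n"
rows = initial_split(sample)
```

A real tool has to cope with quoted fields and ragged rows, which this sketch ignores.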
You build a recipe of sequential transformation steps, one change at a time. Using the point and click method, you systematically go through the steps you want to perform to ‘wrangle’ your file. When you get the output and structure you require, you can run the final recipe to get a structured csv or json file.
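For readers who do think in code, a recipe can be pictured as an ordered list of small functions, each taking the rows produced by the previous step. The sketch below is only an analogy in Python, not Trifacta’s own recipe language; the step names (`split_on`, `name_columns`, `drop_where`), the column names, and the log lines are hypothetical.

```python
import csv
import io
import json


def split_on(delim):
    """Step: split the single raw column of each row on a delimiter."""
    return lambda rows: [row[0].split(delim) for row in rows]


def name_columns(names):
    """Step: turn each list of values into a dict keyed by column name."""
    return lambda rows: [dict(zip(names, row)) for row in rows]


def drop_where(column, value):
    """Step: remove rows whose column equals the given value."""
    return lambda rows: [r for r in rows if r[column] != value]


def run_recipe(lines, recipe):
    """Apply each step in order, feeding its output to the next step."""
    rows = [[line] for line in lines]
    for step in recipe:
        rows = step(rows)
    return rows


columns = ["date", "level", "message"]
recipe = [split_on("|"), name_columns(columns), drop_where("level", "DEBUG")]

# Hypothetical log lines for illustration
log_lines = [
    "2015-11-23|INFO|started",
    "2015-11-23|DEBUG|heartbeat",
    "2015-11-23|WARN|disk low",
]
records = run_recipe(log_lines, recipe)

# The same records can be written out as json or csv
as_json = json.dumps(records)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerows(records)
as_csv = buffer.getvalue()
```

Deleting a step from the list is the code equivalent of removing a transformation with a click: the remaining steps still run in order.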
While you are transforming the file into the columns you want, the tool profiles the data in the columns. This enables you to spot anything out of the ordinary or see if you have missed something in the way you have split the file.
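As a rough idea of what column profiling gives you, the Python sketch below summarises one column: how many values it has, how many are empty, how many are distinct, and which are most common. It assumes nothing about how Trifacta computes its profiles; `profile_column` and the sample values are invented for illustration.

```python
from collections import Counter


def profile_column(values):
    """Summarise a column of strings: size, empties, distinct values, top values."""
    non_empty = [v for v in values if v.strip()]
    return {
        "count": len(values),
        "empty": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "top": Counter(non_empty).most_common(3),
    }


# Hypothetical column with one empty value and one typo ("INF0" with a zero)
levels = ["INFO", "WARN", "INFO", "", "INF0", "INFO"]
summary = profile_column(levels)
```

In this made-up column the empty value and the rare "INF0" both stand out in the summary, which is the kind of oddity a profile helps you spot.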
Now, this is nowhere near the full functionality of Trifacta. In fact, I have only used the tool to wrangle data into a format I like; it also provides profiling, validation, the ability to clean the data, and more. I am sure you’ll see some more blogs on this from my colleagues.
So if you need to play with some data and don’t know where to start, I suggest having a look at Trifacta’s Desktop Data Wrangling tool.
 
Barry, Preventer of Chaos

2 Comments
  1. rondunn

    When you see something like Trifacta, how does it alter your thinking about ODE? The lines between “data wrangling” and “data warehouse automation” are becoming quite indistinct. We keep looking at each new release of Alteryx, and asking ourselves the same question.

  2. Shane Gibson

    Hey Ron
    Data Vault is just one of the components that we need to deliver AgileBI. And that’s the bit that ODE is focussed on delivering.
    Trifacta solves a small problem for us, which is the ability to quickly parse and transform data before we load it into the staging layer (http://www.ode.ninja/data-layers/).
    We have used Attunity Replicate successfully for a while to load CDC data in, but couldn’t find an easy-to-use (and cost-effective) tool for parsing semi-structured data.
    We have also been playing with the concept of having a fast view flow for discovery, vs a single view flow for governed. We have some blogs coming soon showing the results of us using Trifacta and then Qlik to do really quick discovery on data.
    Once discovery is over then we think you still need a governed and repeatable process for managing data, aka Data Vault.
    But it would be fair to say with innovative products like Trifacta, Snowflake etc arriving all the time, it’s hard to keep up!
    And agree on the fact that one day “data wrangling” and “DW Automation” will converge, perhaps when the hype over Hadoop and Big Data finally dies.
    Ps. Congrats on the speed you’re adding stuff to Ajilius, you’re making it hard for us to keep up! (Not to mention Roelant’s latest thing of beauty http://roelantvos.com/blog/?p=1570).
    Good to see ANZ innovating and kicking arse in the data vault software world, we need a world conference in Aussie!


Trackbacks/Pingbacks

  1. Guest Post - Use Trifacta to Wrangle Your Data | Trifacta - […] Stevens, also known as the Preventer of Chaos, who wrote about data wrangling with Trifacta in his OptimalBI blog. If…