As mentioned in Shane Gibson’s blog post “Trifacta = I can code! (and free data wrangling on your desktop)”, some of us at OptimalBI have had the pleasure of using Trifacta’s Desktop Data Wrangling tool.
A little something from their website:
Data Wrangling to Simplify the Way People Work with Data
Trifacta enables organizations to use data to drive innovation by providing a more productive and accessible method of exploring and experimenting with data of all shapes and sizes. Data Analysts and Data Scientists have long relied on IT for access or for the preparation of diverse data. With Trifacta, a broad continuum of users, from business analysts to data scientists, are finally empowered to wrangle data themselves.
What is “Data Wrangling”?
Data wrangling is the process of taking data in its native format & making it usable for analysis.
I have been using it to see how easy it is to parse some logs (into CSV or JSON) for a product of ours, and I must say it is relatively easy to use (compared to other data manipulation tools) once you get out of the coding mindset.
What do I mean by this?
Point it to a file and it will try to perform some simple initial transformations for you, for example splitting rows on newline or carriage return, and splitting columns by delimiter if it thinks your file has them. If you don’t like what it does, you can simply delete the transformation with a click.
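For anyone still in the coding mindset, here is a rough Python sketch of what that kind of initial split amounts to. It is not Trifacta's code or API; the function names, the list of candidate delimiters and the "same count in every row" heuristic are all assumptions made for illustration.

```python
# Hypothetical candidates; a real tool would consider many more.
CANDIDATE_DELIMITERS = [",", "\t", "|", ";"]


def split_rows(text):
    """Split raw text into rows on newline or carriage return, skipping blank rows."""
    return [row for row in text.splitlines() if row.strip()]


def guess_delimiter(rows):
    """Return the first candidate that occurs the same non-zero number of
    times in every row, or None if no candidate does."""
    for delim in CANDIDATE_DELIMITERS:
        counts = {row.count(delim) for row in rows}
        if len(counts) == 1 and counts.pop() > 0:
            return delim
    return None


def initial_split(text):
    """Split text into rows, then into columns if a delimiter was detected."""
    rows = split_rows(text)
    delim = guess_delimiter(rows)
    if delim is None:
        # No delimiter found: keep each row as a single column.
        return [[row] for row in rows]
    return [row.split(delim) for row in rows]
```

Calling `initial_split("a|b|c\r\n1|2|3\n")` gives two rows of three columns each, while text with no consistent delimiter comes back as one column per row.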
There is a point and click interface (although you can code if you want), with suggestions on which transformations and options to use. You highlight the text/area in the file that you want to do something with and Trifacta gives you options, like split just once or on the whole row or file. As you scroll through each option it displays what the result would look like.
You build a recipe of sequential transformation steps, one change at a time. Using the point and click method you systematically go through the steps you want to perform to ‘wrangle’ your file. When you get the output/structure you require you can run the final recipe to get a structured CSV or JSON file.
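The recipe idea maps quite naturally onto an ordered list of functions applied one after another. The sketch below is plain Python, not Trifacta's own wrangling language; the log lines, column names and step functions are hypothetical and only illustrate the idea of sequential steps that can be added or removed.

```python
import json


def split_last_column(sep, times=1):
    """Build a step that splits the last column of each row on `sep`,
    at most `times` times."""
    def step(rows):
        return [row[:-1] + row[-1].split(sep, times) for row in rows]
    return step


def strip_whitespace(rows):
    """Step: trim leading and trailing whitespace from every cell."""
    return [[cell.strip() for cell in row] for row in rows]


def run_recipe(lines, recipe):
    """Start with one column per line, then apply each step in order."""
    rows = [[line] for line in lines]
    for step in recipe:
        rows = step(rows)
    return rows


# Hypothetical log lines, made up for this example.
lines = [
    "2016-03-01 10:15:02 ERROR Disk quota exceeded",
    "2016-03-01 10:15:07 INFO Job finished",
]

# The recipe is just an ordered list; deleting a step means removing it.
recipe = [
    split_last_column(" ", times=3),
    strip_whitespace,
]

rows = run_recipe(lines, recipe)
columns = ["date", "time", "level", "message"]  # assumed column names
records = [dict(zip(columns, row)) for row in rows]
print(json.dumps(records, indent=2))
```

Because every step takes rows and returns rows, you can run the recipe with any prefix of the list to see the intermediate result, which is roughly what the preview in the tool gives you without writing any of this.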
While you are transforming the file into the columns you want, the tool profiles the data in the columns. This enables you to spot anything out of the ordinary or see if you have missed something in the way you have split the file.
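To give a feel for what column profiling means, here is a minimal Python sketch that summarises each column of some already-split rows. It is an assumption-laden illustration, not how Trifacta computes its profiles; the function names and the chosen statistics (count, missing, distinct, most common values) are my own.

```python
from collections import Counter


def profile_column(values):
    """Summarise one column: total count, missing cells, distinct values
    and the three most common values."""
    non_empty = [v for v in values if v.strip()]
    return {
        "count": len(values),
        "missing": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "most_common": Counter(non_empty).most_common(3),
    }


def profile(rows, columns):
    """Profile every named column; rows that are too short count as missing."""
    return {
        name: profile_column([row[i] if i < len(row) else "" for row in rows])
        for i, name in enumerate(columns)
    }


# Hypothetical rows: the last one is short, as if a split had gone wrong.
sample_rows = [["ERROR", "Disk quota exceeded"], ["INFO", ""], ["INFO"]]
summary = profile(sample_rows, ["level", "message"])
```

A high "missing" count in one column, or an unexpected value among the most common ones, is the kind of thing that suggests a split went wrong somewhere upstream.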
Now, this is nowhere near the full functionality of Trifacta. In fact, I have only used the tool to wrangle data into a format I like; it also provides profiling, validation, the ability to clean the data, and more. I am sure you’ll see some more blogs on this from my colleagues.
So if you need to play with some data and don’t know where to start, I suggest having a look at Trifacta’s Desktop Data Wrangling tool.
Barry, Preventer of Chaos