OptimalBI | We do cool sh!t with data

What do I want you to get out of this blog? Well, I want you to use Enterprise Guide (EG) to lay out SAS processes and tasks beautifully.

Imagine the best hotel room you ever stayed in. Think of how it was the first moment you walked in: everything was in its place, everything was visible and easy to get at. Imagine the bathroom: soap, shower gel, shampoo, shower cap, everything neat and arranged just where you need it. Now think of your house today, or even your flat when you were at university. Are things where you need them, or is everything under a pile of clothes or dishes or pizza boxes? When we build our process flows in EG we have the choice to be scruffy students or the Burj Al Arab in Dubai.

Imagine if everyone in your company used EG in the same way and laid their projects out so that the analyst in accounts could easily pick up and edit the marketing analyst’s project, or the graduate with two weeks’ experience could find their way around a project created by a SAS veteran of 15 years. Sound good to you? Then keep reading.

Process flows
Let’s get straight into it! One of the best things about EG is process flows. I have heard these compared to tabs in Excel, which makes sense to a degree. We have all seen Excel spreadsheets with hundreds of tabs. Please don’t try this in EG: 10 to 15 process flows should be the maximum.
The process flows allow you to split your tasks into logical groupings.
When a whole project is run, the process flows run in the order in which they appear in the process explorer (you can grab them and move them up and down). Number and name your process flows, both for neatness and so that if they get moved around you still know the order they need to run in.
Put additional flows at the start of your project. These flows should tell users what the project is for, how and when it should be run, who can run it, what each process flow does, which variables can or may need to be changed and, most importantly, a change log.

A neat project could look something like this:
Process flows
0. Introduction and change control
1. Parameters
2. Extract data
3. Prepare data
4. Clean data
5. Transform data
6. Verify data
7. Output data
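
To illustrate why the numeric prefix is worth the effort: if the flows ever get dragged out of order, the numbers alone are enough to recover the intended run order. The Python sketch below is not an EG feature and does not talk to EG; it simply sorts a hypothetical, hand-typed list of flow names by their leading number, and the function name `run_order` is my own.

```python
import re

def run_order(flow_names):
    """Return flow names sorted by their leading number, e.g. '3. Prepare data'."""
    def leading_number(name):
        match = re.match(r"\s*(\d+)", name)
        if match is None:
            # An unnumbered flow cannot be placed, so fail loudly.
            raise ValueError(f"flow name has no numeric prefix: {name!r}")
        return int(match.group(1))
    return sorted(flow_names, key=leading_number)

# Hypothetical project whose flows were dragged out of order.
shuffled = [
    "4. Clean data",
    "0. Introduction and change control",
    "7. Output data",
    "2. Extract data",
    "1. Parameters",
    "6. Verify data",
    "3. Prepare data",
    "5. Transform data",
]

for name in run_order(shuffled):
    print(name)
```

Sorting on the number rather than the whole name means a flow numbered 10 still lands after 9, which plain alphabetical ordering would get wrong.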

Comments node
The comments node is awesome. This little node should be used liberally – everywhere. Each process flow should have at least one comments node in the top left-hand corner to tell any user what is contained in that flow and what it does.
Logical flow separation
Neatly spread your linked tasks out in your workspace window. If you find you have more than 40 nodes in your window, consider splitting them into another process flow. Say your ‘Accounts and Billing’ process flow gets to 60 nodes: split it into one process flow for billing and one for accounts, with around 30 nodes each. If your project gets to more than 15 process flows, I would split it into two projects. EG and your desktop only have a certain amount of resource, and you need to be cognisant of this.
Your flow would ideally have lines of nodes tied together, sometimes merging, sometimes diverging, but with no criss-crossing or random connections all over the place.
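
The rules of thumb above (no more than 40 nodes per process flow, no more than 15 process flows per project) are easy to turn into a quick checklist. The Python sketch below is only an illustration under my own assumptions: the node counts are typed in by hand, nothing is read from an EG project, and the name `layout_warnings` is made up for the example.

```python
MAX_NODES_PER_FLOW = 40      # rule of thumb from the text
MAX_FLOWS_PER_PROJECT = 15   # rule of thumb from the text

def layout_warnings(node_counts):
    """node_counts maps process flow name -> number of nodes.

    Returns a list of human-readable warnings; an empty list means
    the project stays within both rules of thumb.
    """
    warnings = []
    for flow, count in node_counts.items():
        if count > MAX_NODES_PER_FLOW:
            warnings.append(
                f"'{flow}' has {count} nodes; consider splitting it into separate flows"
            )
    if len(node_counts) > MAX_FLOWS_PER_PROJECT:
        warnings.append(
            f"project has {len(node_counts)} process flows; consider splitting it into two projects"
        )
    return warnings

# Hypothetical project with hand-counted nodes.
project = {"1. Parameters": 5, "2. Accounts and Billing": 60}

for warning in layout_warnings(project):
    print(warning)
```

The limits are kept as named constants so a team that settles on different numbers only has to change them in one place.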
Flow labelling
I recommend you use a comments node at the start of each path of nodes. Say you have one ‘Import data’ task that pulls from a spreadsheet and one ‘Query’ task that pulls in data from a library, and each path does cleaning and sorting tasks before the two meet in a query task that joins the data. Each path should have a comments node explaining very simply what the data is and where it is coming from.
Part 3b coming soon…

Enterprise Guide for SAS coders (Part 3a – Hotel Paradiso)