This is the short version:
- A new data core that uses memory more efficiently, with an open API for developers
- Improved integration with the Cloud
- Better graphs
- Easier access to help and new tutorials for beginners
- Lots of bug fixes
The new data core
RapidMiner says that the headline feature of 7.5 is the new data core, which manages data sets more efficiently. This is part of its focus on performance.
Some of the benefits they list for the new data core are:
- Lower memory footprint through more compact data representations. Better data management allows larger datasets to be processed.
- Sparse datasets are detected and handled by special data structures to increase efficiency.
- New data management modes (speed-optimized, auto, memory-optimized) allow you to select the optimal representation for your data.
- The data core now keeps data in a columnar format, which significantly speeds up many data preparation tasks that modify or create new attributes.
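To make the columnar and sparse points above a little more concrete, here is a minimal Python sketch of the general ideas. It is not how RapidMiner implements its data core, whose internals are not described in the release material; the names `SparseColumn`, `ColumnarTable` and `make_column`, and the 50% `SPARSITY_THRESHOLD`, are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Union

# Hypothetical rule: a column counts as sparse when at least this
# fraction of its values equals the default value (0.0 here).
SPARSITY_THRESHOLD = 0.5


class SparseColumn:
    """Stores only the non-default entries as {row_index: value}."""

    def __init__(self, values: List[float], default: float = 0.0) -> None:
        self.length = len(values)
        self.default = default
        self.entries = {i: v for i, v in enumerate(values) if v != default}

    def __getitem__(self, i: int) -> float:
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Rows that were never stored hold the default value.
        return self.entries.get(i, self.default)

    def __len__(self) -> int:
        return self.length


Column = Union[List[float], SparseColumn]


def make_column(values: List[float], default: float = 0.0) -> Column:
    """Pick a sparse or dense (plain list) representation for one attribute."""
    n = len(values)
    defaults = sum(1 for v in values if v == default)
    if n and defaults / n >= SPARSITY_THRESHOLD:
        return SparseColumn(values, default)
    return list(values)


class ColumnarTable:
    """Keeps one container per attribute instead of one record per row."""

    def __init__(self, columns: Dict[str, List[float]]) -> None:
        self.columns: Dict[str, Column] = {
            name: make_column(vals) for name, vals in columns.items()
        }

    def add_attribute(self, name: str, values: List[float]) -> None:
        # Creating an attribute only builds the new column;
        # the existing columns are neither copied nor rewritten.
        self.columns[name] = make_column(values)

    def row(self, i: int) -> Dict[str, float]:
        # Reassemble a single example from the per-attribute columns.
        return {name: col[i] for name, col in self.columns.items()}


ages = [34.0, 51.0, 27.0, 45.0]
table = ColumnarTable({"age": ages, "clicks": [0.0, 0.0, 3.0, 0.0]})
table.add_attribute("age_squared", [a * a for a in ages])

print(type(table.columns["clicks"]).__name__)  # SparseColumn
print(table.row(2))  # {'age': 27.0, 'clicks': 3.0, 'age_squared': 729.0}
```

In this layout, adding an attribute is cheap because `add_attribute` inserts one new column and leaves the others alone, whereas a row-oriented table would have to extend every record. The `clicks` column, with three zeros out of four values, ends up stored as a single entry.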
Developers will be pleased that the new data management layer has an open API, allowing them to create their own extensions. If you want to find out how APIs could help you, we have written a few blogs on the subject, such as Getting Started with a Web API and Using a REST API for the first time.
Connecting to the Cloud
RapidMiner’s ability to connect with data sources in the Cloud has been improved in two ways:
- New operators have been added for reading and writing data to/from Microsoft’s Azure Blob Storage.
- The operators Read Amazon S3, Loop Amazon S3, and Write Amazon S3 now support KMS encryption.
This will help you get your data into RapidMiner and keep it secure.
Better graphs
Getting data into RapidMiner and manipulating it is all very well, but it isn’t much good unless the output is useful. So it’s great to hear that RapidMiner has made improvements to its graphs.
- In tree graphs, lines now vary in thickness: the thicker the branch, the more information it carries.
- For clustering graphs, the node size is now scaled according to the actual size of the cluster to make it easier to identify bigger clusters.
Easier access to help and new tutorials
This is a hard one to tell you about. RapidMiner says that ‘there is a new mechanism to provide help, advice messages, and even important announcements to the user.’ But they don’t say what that mechanism is or how it works. Judging by the screenshot they show in the release documents (above), it seems to be a system that prompts you with relevant content when you do something. It will be interesting to see what this is like to use.
The new tutorials are to help beginners with RapidMiner Server and RapidMiner Radoop.
The very detailed release notes, What’s New in RapidMiner Studio 7.5.0?, outline the new features and enhancements, including improvements to:
- The undo and redo functionality
- Navigation up and down subprocesses
- The Remove Duplicates function
- The Execute Script operator
- Loading context data
- AutoMLP performance
There are also these additions and fixes:
- Memory leaks in the Handle Exception, Select Subprocess, and Branch operators have been fixed
- There are now more usable default date and date-time formats to choose from when importing data
- There is now a folder called buildingblocks in the .RapidMiner directory, which is also searched for .buildingblock files on startup.
There are also numerous bug fixes.
The latest release is RapidMiner Studio 7.5.1, but it comes with only two enhancements:
- There are now fewer unnecessary copies of example sets while running processes.
- The missing source description when opening data from the App Objects panel has now been added.
Success is preparation meets opportunity – Jack
Jack blogs about community, social media and how all this data stuff impacts the rest of us.
We run regular Data Requirements and Agile data warehouse training courses with an Agile business intelligence slant in both Wellington and Auckland.