Recently I discovered that the concept of Machine Learning often seems confusing or even mysterious, even to IT professionals.

I conducted a small survey, asking my colleagues and friends what they knew about it, and was surprised to get a wide range of answers. Is Machine Learning just Neural Networks recognizing a cat in a picture? Or maybe a collection of powerful algorithms that can recommend a new blog to a user based on their previous behavior? What about genetic algorithms? And of course, don’t forget Artificial Intelligence.

After completing a Machine Learning course and researching the topic, I have summarized and structured what I learned to answer these common questions:

**What is machine learning?**

One of the most commonly cited formal definitions, given by Tom Mitchell, a well-known American computer scientist, says:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.

According to this definition, the main components are:

- **experience** – data coming in different forms and formats: quantitative or qualitative, continuous or discrete; it can be represented as images, audio, video, text, time series, etc.
- **class of tasks** identifies what kind of problem we want to solve: recognizing a cat in a picture, identifying whether an email in your mailbox is spam, or recommending YouTube channels you might like.
- **a computer program** – a black-box algorithm (model, function) that, given the experience as input (for instance, a financial transaction), can tell whether it is fraudulent or not.
- **performance measure** – how good is the program at identifying objects in a given image? How often does it correctly identify spam emails?

To paraphrase, given a set of situations **X (inputs)** and a set of related reactions **Y (outputs)**, we assume that an implicit dependence between them exists. The set of pairs **(X,Y)** is called **a training set**. The ML problem can then be described as finding a function that captures this dependence and is able to **generalize**, that is, to produce correct outputs for previously unseen inputs.
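To make the notation concrete, here is a minimal sketch in Python under deliberately simple assumptions: each input X is a single number, each output Y is a number, and the function we search for is a straight line fitted by ordinary least squares (one possible choice among many, not something the definition prescribes). The data points and the `fit_line` helper are hypothetical.

```python
# Hypothetical training set: pairs (x, y) where y depends on x (roughly y = 2x + 1).
training_set = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8), (5.0, 11.1)]


def fit_line(pairs):
    """Fit y = a*x + b to (x, y) pairs by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    # The learned "function" that maps an input to a predicted output.
    return lambda x: a * x + b


model = fit_line(training_set)

# Generalization: predict an output for an input that is not in the training set.
print(round(model(6.0), 2))
```

The fitted line never saw x = 6, yet it produces a sensible estimate, which is what generalizing means in this toy setting.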

**Why do we use it at all?**

The most important feature of ML algorithms is their ability to learn from data and improve over time without being **explicitly programmed**. Can you imagine solving an email filtering task by enumerating the complete list of words that can be used in spam emails? Probably not. It is a daunting exercise, if it is possible at all. Instead, we can use inductive and deductive reasoning the same way we humans do in daily life: inductively generalizing from experience and deductively predicting outcomes using the generalized knowledge we acquire.
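As an illustration of learning from examples instead of enumerating rules, here is a minimal sketch of a spam filter based on multinomial naive Bayes, a classic technique chosen here only for illustration (the text does not name a specific algorithm). The six training emails and the helper names `train_naive_bayes` and `is_spam` are hypothetical; a real filter would need far more data and proper tokenization.

```python
import math
from collections import Counter

# Hypothetical labelled examples: (email text, is_spam).
emails = [
    ("win a free prize now", True),
    ("claim your free money", True),
    ("cheap prize offer win", True),
    ("meeting agenda for monday", False),
    ("lunch with the project team", False),
    ("monday project status report", False),
]


def train_naive_bayes(examples):
    """Count word frequencies per class; no hand-written spam rules anywhere."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocabulary


def is_spam(text, model):
    word_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        # Log prior of the class plus log likelihood of each word,
        # with Laplace (add-one) smoothing for unseen words.
        score = math.log(class_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocabulary)))
        scores[label] = score
    return scores[True] > scores[False]


model = train_naive_bayes(emails)
print(is_spam("free prize inside", model))          # True
print(is_spam("project meeting on monday", model))  # False
```

Nothing in the code lists spam words: the weights come entirely from the labelled examples, so adding new examples changes the filter's behavior without any reprogramming.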

**What are the main types of Machine Learning?**

There are several different ways to categorize ML algorithms, depending on the structure of the **experience** and the way an algorithm interacts with it:

- **supervised learning**, also known as learning with a teacher. Each example in the training data set consists of an input and a labelled output value. For the email filtering task, it is a set of emails and the {spam / not spam} labels assigned to them. The learning task is to find the function that gives the most accurate estimation of Y on the training data set;
- **unsupervised learning**, or learning without a teacher: training examples consist simply of inputs X, and the machine should explore the data and detect patterns. For example, in a customer segmentation problem the algorithm should split customers into groups with similar features;
- **semi-supervised learning**: in many real-life scenarios getting labelled data is a costly process, so a mixture of labelled and unlabelled examples may be used to train the model. Some of the use cases for this type
- **reinforcement learning**: the learning algorithm constantly interacts with the environment and uses its observations to take actions that minimize the risk or maximize the reward. A commonly cited example is the self-driving vehicle problem.
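A self-driving car is far too large for a short example, but the core reinforcement-learning loop (act, observe a reward, update) can be sketched with a multi-armed bandit, the simplest reinforcement-learning setting, which has no notion of state. The sketch below assumes an epsilon-greedy strategy and a hypothetical environment with three actions whose reward probabilities are hidden from the agent; `pull` and `epsilon_greedy` are illustrative names.

```python
import random

# Hypothetical environment: three actions ("arms"), each paying reward 1
# with a fixed probability that the agent does not know.
REWARD_PROBABILITIES = [0.2, 0.5, 0.8]


def pull(action, rng):
    """Environment step: return reward 1 or 0 for the chosen action."""
    return 1 if rng.random() < REWARD_PROBABILITIES[action] else 0


def epsilon_greedy(steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n_actions = len(REWARD_PROBABILITIES)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        reward = pull(action, rng)
        counts[action] += 1
        # Incremental update of the average reward observed for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = epsilon_greedy()
print("estimated rewards:", [round(e, 2) for e in estimates])
print("times each action was chosen:", counts)
```

The `epsilon` parameter trades exploration (trying actions to learn about them) against exploitation (choosing the action currently believed to be best); with `epsilon` set to 0, the agent could get stuck on whichever action it happened to try first.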

**What are the problems it can be used to solve?**

Some of the most common ML tasks are:

- **classification** assigns objects to known groups according to certain characteristics of the objects (is this the letter A in the picture?);
- **clustering** – the task is to explore the objects and group them according to their similarities (for instance, discover areas with similar land use in an earth database);
- **regression** explores the dependency between variables and predicts a numeric value, as opposed to classification, which predicts a class label (say, what is the price of a house with 2 bedrooms and 200 square meters of floor area, located in the Mt Victoria area of Wellington?);
- **association rules** is a method of discovering non-obvious relationships in the data (what other products can we suggest to a customer who purchases a laptop?);
- **dimensionality reduction** comes into play when we want a more compact representation of the data, with fewer features, that preserves as much of the information in the initial data set as possible (for instance, compressing an image).
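Clustering is easy to sketch with k-means, one of the methods mentioned in this article. The version below is minimal and rests on stated assumptions: two hypothetical customer features, a fixed number of clusters `k`, and the first `k` points used as initial centroids (real implementations usually pick them randomly or with k-means++). The `k_means` helper is illustrative.

```python
import math

# Hypothetical customers described by two features: (visits per month, average spend).
customers = [(1, 10), (2, 12), (1, 11), (8, 80), (9, 85), (10, 78)]


def k_means(points, k, iterations=10):
    """Plain k-means: alternate assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = list(points[:k])  # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


centroids, clusters = k_means(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

On this toy data the algorithm separates occasional low spenders from frequent high spenders, in the spirit of the customer segmentation example described earlier.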

Genetic algorithms, neural networks, support vector machines, logistic and linear regression, k-means clustering and principal component analysis are only a few of the many methods used to solve Machine Learning problems.

**How is experience organized?**

Organizing experience is crucial to the success of the learning process. The first step, called **feature selection**, chooses data features that are non-redundant and relevant to the task being solved. Then the data is normally split into three subsets. Depending on the class of tasks and the size of the data set, the proportion of each subset may vary. The most important principle is to use non-overlapping examples in the training and test sets.

- **a training set** is used to train a model and optimize its parameters.
- **a cross-validation set** comes in handy when choosing among models of different complexity, to identify the one that performs best.
- **a test set** helps to evaluate how well the final model performs on previously unseen data.
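The split itself is straightforward; what matters is that no example lands in more than one subset. Here is a minimal sketch, assuming a 60/20/20 proportion (an arbitrary choice, since the right proportions depend on the task and the data set size) and using `validation_set` for what this article calls the cross-validation set. The `split_dataset` helper is hypothetical.

```python
import random


def split_dataset(examples, train=0.6, validation=0.2, seed=42):
    """Shuffle once, then cut into three non-overlapping subsets.
    Whatever remains after the train and validation shares becomes the test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # shuffle so subsets are not ordered by collection time
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    train_set = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_val]
    test_set = shuffled[n_train + n_val:]
    return train_set, validation_set, test_set


data = list(range(100))  # stand-in for 100 labelled examples
train_set, validation_set, test_set = split_dataset(data)
print(len(train_set), len(validation_set), len(test_set))  # 60 20 20

# No example appears in more than one subset.
assert not set(train_set) & set(test_set)
assert not set(validation_set) & set(test_set)
assert not set(train_set) & set(validation_set)
```

Slicing a single shuffled list guarantees the subsets are disjoint by construction, which is simpler than sampling each subset separately and checking for overlap afterwards.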

**How do we measure model performance?**

Performance metrics vary with the class of ML task and are designed to measure how well the learning algorithm generalizes, in other words, how large its error is on examples it was not trained on. The cross-validation and test sets described in the previous section are commonly used here.

The **cost function** or **objective function** measures the algorithm’s learning error. It is constructed by hand to compare actual values with the predictions given by the model. Choosing a cost function requires a good understanding of the subject area and of what you are trying to achieve. Optimizing the cost function means tweaking the model’s internal parameters to make better predictions. In most cases, algorithms of different complexity are evaluated at the same time. Ideally, you want to select a model that avoids both **over-fitting (or high variance)**, when the model fits the training data too well and fails to generalize to new examples, and the opposite case, **under-fitting (or high bias)**, when the model is too simple to capture the underlying dependence.
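To see what optimizing a cost function means in practice, here is a minimal sketch that uses mean squared error as the cost and plain gradient descent to tweak the two internal parameters `w` and `b` of a linear model. The data, learning rate and number of steps are hypothetical values chosen so the example converges; gradient descent is one common optimizer, not the only one.

```python
# Hypothetical training data generated by y = 3x + 2 (no noise, to keep it simple).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 8.0, 11.0, 14.0]


def mse(w, b):
    """Cost function: mean squared error between predictions w*x + b and actual values."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(learning_rate=0.05, steps=2000):
    w, b = 0.0, 0.0  # internal parameters of the model, starting from zero
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Move the parameters a small step against the gradient to reduce the cost.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


print("initial cost:", mse(0.0, 0.0))
w, b = gradient_descent()
print("learned parameters:", round(w, 3), round(b, 3))
print("final cost:", mse(w, b))
```

The cost drops from 82.0 to practically zero as the parameters approach w = 3 and b = 2. A learning rate that is too large would make the updates overshoot and diverge instead.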

The next step is to evaluate the predictions of the trained model. A few common metrics are:

- accuracy (the ratio of correct predictions to the total number of examples), recall (the fraction of truly positive examples that have been classified as positive), precision (the fraction of examples classified as positive that are truly positive), F1 score (the harmonic mean of precision and recall), etc. – for classification problems;
- mean squared error, mean absolute error – for regression problems.
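The metrics above can be written out in a few lines. This is a minimal sketch, assuming binary labels with `True` as the positive class; the predictions and house prices are made up for illustration, and the helper names are hypothetical. Libraries such as scikit-learn provide tested implementations of these metrics.

```python
def classification_metrics(actual, predicted):
    """Accuracy, precision, recall and F1 for binary labels (True = positive class)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)      # true positives
    fp = sum(1 for a, p in pairs if not a and p)  # false positives
    fn = sum(1 for a, p in pairs if a and not p)  # false negatives
    correct = sum(1 for a, p in pairs if a == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def regression_metrics(actual, predicted):
    """Mean squared error and mean absolute error."""
    errors = [p - a for a, p in zip(actual, predicted)]
    mse = sum(e ** 2 for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae


# Hypothetical spam-filter results: True means "spam".
actual = [True, True, True, False, False, False, False, True]
predicted = [True, True, False, False, True, True, False, True]
print(classification_metrics(actual, predicted))

# Hypothetical house prices (in thousands) vs. model estimates.
print(regression_metrics([500, 650, 720], [520, 640, 700]))
```

In this example accuracy (0.625), precision (0.6) and recall (0.75) all differ, which is why a single number rarely tells the whole story for a classifier.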

In the end, we select the algorithm that gives us the most satisfactory results.

**Instead of a conclusion**

Machine Learning is a vast cross-disciplinary area, combining knowledge from diverse subjects such as linear algebra, probability theory, calculus, algorithm design and data preparation. In many cases, it can be considered an art.

In reality, the most time-consuming part of the learning process is understanding the task to be solved in depth: studying the problem domain, identifying the goals, then collecting the necessary data, integrating it, cleansing it and transforming it into a format a machine can understand. All of these steps involve human participation. Is the machine objective in the end? I find this saying by one computer science professor appropriate:

Computers are stupid. You only get what you code.

Anastasia