The Most Common Machine Learning Algorithms
For many novices, machine learning algorithms may seem too complicated to master. Some may even find them uninteresting. In most instances, you'll come across page-long descriptions of each algorithm, and admittedly, that's hard to slog through. Putting the time and energy into memorizing every little detail of each algorithm can be a painstaking experience, but if you truly want to be a machine learning expert you'll need to brush up on your algorithm knowledge. There's no way around it. But don't stress! We'll take this opportunity to simplify the learning experience and explain the core principles of the most common machine learning algorithms in clear terms. Broadly speaking, there are three types of machine learning algorithms, with specific algorithms in each category. We'll break them down below.
Supervised Learning Algorithms
There are two types of supervised learning algorithms: regression and classification. These models are built to make a prediction for an unknown input instance. The algorithm takes a known set of input data and the known responses to that data (the output) and learns a regression or classification model from them. The trained model can then produce a prediction for the response to new data or a test dataset.1 Types of supervised learning algorithms include:
- Linear Regression:2 This regression is used to estimate real values, such as the cost of cars, total sales, number of enrollments, etc., based on continuous variables. In this algorithm, we establish a relationship between independent and dependent variables by fitting a best-fit line. An easy way to think of linear regression is to imagine you're asked to arrange a set of cars by increasing cost, but you're not told their actual worth. You'll likely use visual clues and line them up using a combination of visual parameters.
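Fitting that best-fit line can be done in closed form. Below is a minimal sketch of simple linear regression using the least-squares formulas; the data points are invented purely for illustration.

```python
# A minimal sketch of simple linear regression: fit a best-fit line
# y = a*x + b to (x, y) pairs with the closed-form least-squares solution.
# The data below is made up for illustration (roughly y = 2x).

def fit_line(xs, ys):
    """Return slope a and intercept b that minimize squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 8.0]
a, b = fit_line(xs, ys)
```

The fitted slope comes out close to 2 and the intercept close to 0, matching the pattern in the toy data.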
- Logistic Regression:3 Don't let the name fool you. This algorithm is used for classification, not regression. Logistic regressions are specifically binary classification models: they can only be used to differentiate between two separate categories, such as whether someone is tall or short, or whether a car is cheap or expensive given its price. This means our data has two kinds of observations: Category 1 and Category 2 observations. You can also use this model on much more complex problems that involve a variety of traits, not just a single one.
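The tall-or-short example above can be sketched in a few lines: squash a weighted input through the sigmoid function to get a probability of Category 1, and fit the weight with gradient descent on the log-loss. The heights and labels here are invented for illustration.

```python
import math

# A toy sketch of binary logistic regression trained by gradient descent.
# Heights (inches) labeled 1 for "tall", 0 for "short"; data is invented.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

heights = [60, 62, 64, 70, 72, 74]
labels  = [0,  0,  0,  1,  1,  1]
mean = sum(heights) / len(heights)
xs = [h - mean for h in heights]   # center the feature for stability

w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    for x, y in zip(xs, labels):
        p = sigmoid(w * x + b)
        # gradient of the log-loss for one example
        w -= lr * (p - y) * x
        b -= lr * (p - y)

def predict_tall(height):
    return sigmoid(w * (height - mean) + b) > 0.5
```

After training, heights well above the mean get classified as "tall" and those well below as "short".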
- Naive Bayes:2 A Naive Bayes classifier assumes independence between predictors; it presumes that the presence of a specific trait in a group is unrelated to the presence of any other trait. For instance, a ball may be considered to be a football if it is brown, oblong and about seven inches in diameter. Even if these characteristics depend on each other or on the presence of other traits, a naive Bayes classifier would consider each of these attributes to contribute individually to the likelihood that the ball is a football. This model is known for its simplicity, its usefulness for very large data sets, and its ability to outperform even highly sophisticated classification methods.
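The ball example can be turned into a tiny categorical naive Bayes classifier: count how often each feature value appears per class, then score a new ball by multiplying the per-feature likelihoods as if they were independent. The (color, shape) data below is invented for illustration.

```python
import math
from collections import Counter, defaultdict

# A tiny sketch of a categorical naive Bayes classifier. Each feature is
# treated as independent given the class -- the "naive" assumption.
# The training examples are invented.

data = [
    (("brown", "oblong"), "football"),
    (("brown", "oblong"), "football"),
    (("brown", "round"),  "basketball"),
    (("orange", "round"), "basketball"),
    (("orange", "round"), "basketball"),
]

class_counts = Counter(label for _, label in data)
# feature_counts[class][feature_index][value] = count
feature_counts = defaultdict(lambda: defaultdict(Counter))
for features, label in data:
    for i, v in enumerate(features):
        feature_counts[label][i][v] += 1

def predict(features):
    best, best_score = None, float("-inf")
    for label, n in class_counts.items():
        # log prior + sum of log likelihoods, with add-one smoothing
        score = math.log(n / len(data))
        for i, v in enumerate(features):
            score += math.log((feature_counts[label][i][v] + 1) / (n + 2))
        if score > best_score:
            best, best_score = label, score
    return best
```

Working in log space avoids multiplying many small probabilities together, and the add-one smoothing keeps an unseen feature value from zeroing out a whole class.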
- KNN:4 The K-Nearest Neighbors (KNN) algorithm can be used for both classification and regression problems, but the industry generally uses it for classification. KNN is a simple algorithm that stores all available cases and classifies new cases by a majority vote of their k nearest neighbors, as measured by a distance function. In other words, a new case is assigned to the class most common among its k nearest neighbors. KNN translates to our real lives easily: if you want to learn about someone but know nothing about them, you might learn about their close friends and the groups they are involved in to gain more information.
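The "store everything, then vote" idea fits in a few lines. This sketch uses Euclidean distance and a majority vote over the k closest stored points; the points and labels are invented for illustration.

```python
import math
from collections import Counter

# A compact sketch of k-nearest neighbors classification: keep every
# training case, then label a new point by majority vote among the k
# closest stored points. Training data is invented.

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]

def knn_predict(point, k=3):
    # sort stored cases by Euclidean distance to the query point
    neighbors = sorted(train, key=lambda t: math.dist(t[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

A query near the (1, 1) cluster votes "A"; one near the (6, 6) cluster votes "B". Note that there is no training step at all: the cost is paid at prediction time, which is why KNN is sometimes called a "lazy" learner.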
Unsupervised Learning Algorithms
In unsupervised learning, there is no target or outcome variable to predict. These algorithms are used for grouping populations into different segments, much like customer segmentation. Examples of this type of algorithm include K-means, principal component analysis (PCA) and Apriori.
- K-Means:5 The purpose of this algorithm is simple: group related data points together and identify underlying patterns. It follows a simple, straightforward approach to classify a given data set into a specific number of clusters (think k clusters). Data points within a cluster are homogeneous, while points in different clusters are heterogeneous. Remember figuring out shapes from inkblots? K-means is comparable to this exercise: you look at the form and spread to interpret how many different clusters are present.
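The standard way to compute k-means is Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat. A bare-bones sketch on invented 1-D data:

```python
# A bare-bones sketch of k-means (Lloyd's algorithm) on 1-D points:
# alternately assign each point to its nearest centroid, then move each
# centroid to the mean of its assigned points. Data is invented and
# falls into two obvious groups, near 1 and near 9.

points = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
k = 2
centroids = [points[0], points[3]]  # simple deterministic initialization

for _ in range(10):
    # assignment step: each point joins its nearest centroid's cluster
    clusters = [[] for _ in range(k)]
    for p in points:
        nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # update step: move each centroid to the mean of its cluster
    centroids = [sum(c) / len(c) for c in clusters]
```

On this toy data the centroids settle at roughly 1.0 and 9.0. In practice initialization matters; real implementations typically run several random restarts or use a smarter seeding scheme.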
- Apriori:6 This is a traditional algorithm in data mining, used for mining frequent itemsets and the association rules derived from them. It is designed to run on a database containing many transactions, such as items purchased by consumers in a store. It is vital for efficient Market Basket Analysis, helping customers find their items more easily and boosting sales. The healthcare industry uses this algorithm to detect adverse drug reactions, generating association rules that show which combinations of medications and patient characteristics lead to those reactions.
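The core Apriori idea is that an itemset can only be frequent if all of its subsets are frequent, so candidate pairs are built only from items that already pass the support threshold. A simplified sketch of that first pruning step, on invented shopping baskets:

```python
from itertools import combinations
from collections import Counter

# A simplified sketch of the Apriori pruning idea: count item supports,
# keep items above a minimum support threshold, and form candidate
# pairs only from those frequent items. Baskets are invented.

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]
min_support = 3  # an itemset must appear in at least 3 baskets

item_counts = Counter(item for t in transactions for item in t)
frequent_items = {i for i, c in item_counts.items() if c >= min_support}

# candidate pairs come only from frequent items (the Apriori property)
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t & frequent_items), 2):
        pair_counts[pair] += 1
frequent_pairs = {p for p, c in pair_counts.items() if c >= min_support}
```

Here only {bread, milk} survives as a frequent pair; a full implementation would continue to triples and beyond, and then derive association rules from the surviving itemsets.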
- Principal Component Analysis (PCA):7 If you've ever tried to solve a problem and found there are too many variables, you know how overwhelming it can be. So how do you reduce the variables and only focus on certain ones? Principal component analysis is a system for feature extraction: it combines input variables in a precise way, enabling you to cut the "least important" variables while still retaining the most relevant aspects of all of them. The "new" variables produced by PCA are all independent of one another, which is an advantage because the assumptions of a linear model require its variables to be independent. If we choose to fit a linear regression model with these "new" variables, this assumption will unquestionably be satisfied.
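The heart of PCA is finding the direction of maximum variance, i.e. the leading eigenvector of the data's covariance matrix. This sketch recovers that first principal component for 2-D data via power iteration, a method that simply applies the covariance matrix repeatedly; the data is invented and lies roughly along the line y = x, so the component should point along the diagonal.

```python
import math

# A sketch of the core of PCA: find the first principal component (the
# direction of maximum variance) of centered 2-D data via power
# iteration on the 2x2 covariance matrix. Data is invented.

data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
centered = [(x - mx, y - my) for x, y in data]

# entries of the 2x2 covariance matrix
cxx = sum(x * x for x, _ in centered) / n
cyy = sum(y * y for _, y in centered) / n
cxy = sum(x * y for x, y in centered) / n

# power iteration: repeatedly apply the matrix and renormalize
v = (1.0, 0.0)
for _ in range(50):
    w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
    norm = math.hypot(*w)
    v = (w[0] / norm, w[1] / norm)
```

Projecting the data onto `v` gives the single "new" variable that captures the most variance; real libraries compute all components at once via an eigendecomposition or SVD.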
Reinforcement Learning Algorithms
Much like teaching your dog to do tricks, you can train a machine to make specific decisions. Reinforcement learning exposes a machine to a specific environment where it continuously trains itself through trial and error, capturing the best possible knowledge from experience to make accurate decisions. Types of reinforcement learning algorithms are the Markov decision process, Monte Carlo methods and temporal difference (TD) algorithms.
- Markov Decision Process:8 This model enables machines and agents to learn the ideal behavior within a particular environment in order to reach a specific state — or even multiple states, depending on what you want to accomplish. The goal is defined by a policy, which maps the agent's actions to the environment, and the Markov decision process endeavors to optimize the steps taken to reach that goal. This optimization is achieved with a reward feedback system, where various actions are weighted depending on the predicted state those actions will cause.
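The reward-feedback idea can be illustrated with value iteration on a tiny MDP. The sketch below uses an invented three-state corridor where reaching the rightmost state pays a reward; each state's value is repeatedly updated to the best achievable reward plus the discounted value of where that action leads.

```python
# A minimal sketch of value iteration on a tiny deterministic MDP:
# three states in a row, actions "left"/"right", and a reward of 1 for
# reaching the rightmost (goal) state. States and rewards are invented.

states = [0, 1, 2]            # state 2 is the goal
actions = {"left": -1, "right": +1}
gamma = 0.9                   # discount factor

def step(s, a):
    """Deterministic transition: move, clipped to the state range."""
    s2 = max(0, min(2, s + actions[a]))
    reward = 1.0 if s2 == 2 else 0.0
    return s2, reward

V = {s: 0.0 for s in states}
for _ in range(100):
    for s in [0, 1]:          # state 2 is terminal, so its value stays 0
        # Bellman update: best action's reward plus discounted next value
        V[s] = max(r + gamma * V[s2]
                   for s2, r in (step(s, a) for a in actions))
```

The values converge to V[1] = 1 and V[0] = 0.9: being one step from the goal is worth the full reward, and being two steps away is worth the reward discounted once. The greedy policy with respect to V is then "always move right".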
- Monte Carlo Methods:9 Monte Carlo methods are a subgroup of computational algorithms that use repeated random sampling to produce numerical estimates of unknown quantities. They make it possible to model complicated situations that involve several random variables and to evaluate the impact of risk. The uses of these methods are astonishingly broad and have led to several groundbreaking discoveries in the fields of physics, game theory and finance. There is an extensive range of Monte Carlo methods, but they all rely on random number generation to solve deterministic problems.
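The classic introductory example of "random sampling to estimate a deterministic quantity" is estimating pi: sample points uniformly in the unit square and count the fraction that land inside the quarter circle.

```python
import random

# A classic Monte Carlo sketch: estimate pi by sampling random points
# in the unit square and counting how many land inside the quarter
# circle of radius 1 (whose area is pi/4).

random.seed(0)                 # fixed seed so the run is reproducible
n = 100_000
inside = sum(1 for _ in range(n)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)
pi_estimate = 4 * inside / n
```

With 100,000 samples the estimate is typically within a few hundredths of pi; the error shrinks like 1/sqrt(n), which is the characteristic (and slow) convergence rate of Monte Carlo estimates.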
- Temporal Difference Method:10 Imagine you're traveling, and at the beginning of your journey, the GPS provides an estimated time of arrival (ETA) based on statistical data (i.e., traffic, road conditions, speed and number of stops). As you carry on and encounter traffic, clear roads, or make pit stops, the GPS refines its estimate and gives you updated ETAs. At each stage of the journey, the GPS provides an ETA — and this is exactly what temporal difference (TD) provides. In TD, we update the estimate for the current state based on the estimate of the subsequent state. Referring back to the GPS example, your GPS may notice that you've slowed down to 30 mph, so it updates its ETA estimate to add 30 minutes to the trip. But then you pass the traffic jam within a few minutes and speed back up, so the GPS takes 20 minutes off the ETA. The more you take this journey (playing more episodes), the more refined your estimates become with each trip.
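The GPS analogy maps directly onto the TD(0) update rule: each state's estimate is nudged toward the cost actually observed on the next leg plus the current estimate for the state after it. The trip segments and times below are invented for illustration.

```python
# A sketch of the TD(0) update from the GPS analogy: the estimated
# minutes remaining from each state is nudged toward (observed time on
# the next leg) + (estimate from the next state). Values are invented.

segment_times = [30, 20, 10]   # minutes for each leg of the trip
V = [0.0, 0.0, 0.0, 0.0]       # V[s] = estimated minutes remaining;
                               # V[3] is the destination, fixed at 0
alpha = 0.1                    # learning rate

for episode in range(500):     # repeat the journey many times
    for s, cost in enumerate(segment_times):
        # TD(0): move V[s] toward observed cost plus next state's estimate
        V[s] += alpha * (cost + V[s + 1] - V[s])
```

Over repeated trips the estimates converge to the true remaining times (60, 30 and 10 minutes). The key TD idea is visible in the update line: we learn from the *estimate* of the next state rather than waiting for the whole episode to finish, unlike Monte Carlo methods.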
Dive into Data Science with EmergingEd
If you’re a professional aspiring to advance your career with high-impact skills in machine learning and data science, consider how an online data science course from EmergingEd can help you make your next move.
1. Retrieved on March 6, 2020, from sciencedirect.com/topics/computer-science/supervised-learning
2. Retrieved on March 6, 2020, from analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/
3. Retrieved on March 6, 2020, from towardsdatascience.com/logistic-regression-explained-9ee73cede081
4. Retrieved on March 6, 2020, from towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761
5. Retrieved on March 6, 2020, from towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1
6. Retrieved on March 6, 2020, from hackerearth.com/blog/developers/beginners-tutorial-apriori-algorithm-data-mining-r-implementation
7. Retrieved on March 6, 2020, from towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c
8. Retrieved on March 6, 2020, from towardsdatascience.com/understanding-markov-decision-processes-b5862c192ddb
9. Retrieved on March 6, 2020, from towardsdatascience.com/an-overview-of-monte-carlo-methods-675384eb1694
10. Retrieved on March 6, 2020, from towardsdatascience.com/td-in-reinforcement-learning-the-easy-way-f92ecfa9f3ce