The first question most beginners ask when they set out to learn machine learning is which algorithm they should use. It is a good starting point: there is a wide variety of machine learning algorithms out there, and it is easy to get lost, especially as a beginner.
Even experienced data scientists cannot give a clear-cut answer on which algorithm is best before putting several of them to the test. Still, to narrow down which algorithm you should go for, there are a few factors you ought to consider first:
- Available processing time.
- How quickly you need the job done.
- The type of data you are working with.
Different types of machine learning tasks
The factors above will point you toward one of the main types of machine learning tasks, which are described below.
- Reinforcement Learning
This kind of machine learning task is about how a software agent should take actions in a given environment to maximize its cumulative reward.
- Supervised Learning
Supervised learning involves inferring a function from a labeled training set.
- Unsupervised Learning
Here you have less information about the training set: the training data is unlabeled, so the algorithm has to find structure in it on its own.
- Semi-supervised Learning
This type of learning is a blend of supervised and unsupervised learning: you use both labeled and unlabeled data.
With that said, here is a list of standard machine learning algorithms.
Decision Trees
Decision trees tend to mimic a human’s decision-making process. They are popular because they make predictions quickly, are easy to interpret, need little special data preparation, and are often reasonably accurate.
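To make the "easy to interpret" point concrete, here is a minimal sketch of the smallest possible decision tree, a single split (often called a decision stump). The toy data, the function names `fit_stump` and `predict`, and the choice of counting misclassifications as the split criterion are all assumptions made for illustration; real decision tree libraries typically use criteria such as Gini impurity or entropy and grow many splits.

```python
# Hypothetical toy data: (hours_studied, passed_exam)
samples = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]


def fit_stump(data):
    """Return the threshold that misclassifies the fewest training samples.

    The learned rule is: predict 1 if value > threshold, otherwise 0.
    """
    values = sorted(x for x, _ in data)
    # Candidate thresholds are the midpoints between consecutive sorted values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        # Count samples where the rule's prediction disagrees with the label.
        return sum((x > threshold) != bool(y) for x, y in data)

    return min(candidates, key=errors)


def predict(threshold, x):
    """Apply the single learned rule to a new value."""
    return 1 if x > threshold else 0


threshold = fit_stump(samples)
print(f"Rule: predict 1 if hours_studied > {threshold}")
print(predict(threshold, 2.0), predict(threshold, 5.5))
```

The whole model is one readable rule, which is why trees are easy to explain. A full decision tree simply repeats this kind of split on each resulting subset of the data.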
Logistic Regression
If you are tackling binary classification, logistic regression is a go-to algorithm. It passes a linear combination of the input features through a sigmoid function, which yields a probability between 0 and 1 for the positive class. Logistic regression tends to work better when you remove attributes that are highly correlated with each other and those that are unrelated to the output.
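The prediction step of logistic regression can be sketched in a few lines. This is only the forward computation (sigmoid applied to a linear combination); the weights and bias below are hypothetical values chosen for illustration, not learned from data, and the function names are assumptions of this sketch.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class: sigmoid of a linear combination."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(weights, bias, features, cutoff=0.5):
    """Turn the probability into a 0/1 label using a cutoff."""
    return 1 if predict_proba(weights, bias, features) >= cutoff else 0


# Hypothetical parameters; in practice these are fitted to training data.
weights = [2.0, -1.0]
bias = -0.5

print(predict_proba(weights, bias, [1.0, 0.5]))  # z = 1.0, so above 0.5
print(predict(weights, bias, [1.0, 0.5]))
print(predict(weights, bias, [0.0, 0.0]))
```

Training consists of adjusting `weights` and `bias` so that these probabilities match the labels in the training set as closely as possible.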
Neural Networks
Often described as the new kid on the block of data science, neural networks can be applied to a wide range of tasks. One drawback is that training them usually involves large, computationally expensive calculations.
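K-Nearest Neighbors (KNN)
K-nearest neighbors (not to be confused with K-means, which is a clustering algorithm) is a straightforward and effective machine learning algorithm. Predictions are made by searching the training data for the K closest instances and summarizing their outputs, for example with a majority vote for classification or an average for regression.
The algorithm needs a lot of storage space because it keeps the entire training set, but it does no work up front: computation happens only when a prediction is required.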
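The search-and-summarize procedure described above can be sketched as follows. The toy data, the function name `knn_predict`, the use of Euclidean distance, and the majority vote are assumptions of this sketch; other distance measures and summaries are possible.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (point, label)
training_data = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 7.5), "B"), ((6.5, 8.0), "B"),
]


def knn_predict(data, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # There is no training step: all the work happens here, at prediction time.
    neighbors = sorted(data, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


print(knn_predict(training_data, (1.2, 1.4)))
print(knn_predict(training_data, (6.8, 7.0)))
```

Note that the full `training_data` list must be kept around and scanned for every prediction, which is where the storage cost comes from.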
Linear Regression
Linear regression is one of the most popular and well-understood machine learning techniques. If you are new to machine learning, it is an excellent technique to start with. When using linear regression, remove highly correlated variables and, where possible, clean any noise out of your data.
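As a starting point, here is a minimal sketch of linear regression with a single input variable, fitted with the ordinary least squares formulas. The data and the function name `fit_line` are made up for illustration; with several input variables you would normally reach for a library rather than write this by hand.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical noise-free data that follows y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(slope * 5.0 + intercept)  # prediction for x = 5
```

With noisy data the fitted line will no longer pass through every point, which is why cleaning the data first tends to give a more reliable fit.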