Machine Learning Glossary



  • A/B testing – A statistical way of comparing two (or more) techniques, typically an incumbent against a new rival. A/B testing aims to determine which technique performs better and whether the difference is statistically significant. A/B testing usually compares two techniques using one measurement, but it can be applied to any finite number of techniques and measures.
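A minimal sketch of the statistics behind an A/B test, using a two-proportion z-test; the conversion counts below are hypothetical:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of variants A and B with a two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0: p_a == p_b
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal survival function
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 120/1000 conversions for A vs. 150/1000 for B
z, p = two_proportion_z_test(120, 1000, 150, 1000)
```

A p-value below a chosen threshold (commonly 0.05) suggests the observed difference between the two variants is unlikely to be due to chance alone.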
  • Machine Learning – The intersection of statistics and computer science, used to
    teach computers to perform tasks without being explicitly programmed.
  • Neural Networks – Learning algorithms loosely based on the brain’s neural structure. Neural networks consist of neurons that are connected to each other in layers; the weights of the connections between the neurons are adjusted during training.
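As an illustration, a single sigmoid neuron and a fully connected layer can be sketched in a few lines; the weights below are arbitrary illustrative numbers, not learned values:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of inputs passed through a sigmoid."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-activation))  # sigmoid squashes the sum into (0, 1)

def layer(inputs, weight_matrix, biases):
    """A fully connected layer is a list of neurons sharing the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_matrix, biases)]

# Two inputs -> hidden layer of two neurons -> one output neuron
hidden = layer([0.5, -1.0], [[0.1, 0.8], [-0.3, 0.2]], [0.0, 0.1])
output = layer(hidden, [[1.0, -1.0]], [0.0])
```

Training consists of adjusting the weights and biases (commonly via backpropagation) so the outputs approach the desired targets.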
  • Deep Learning – An umbrella term for machine learning methods based on learning data representations as opposed to algorithms based on fulfilling a given task. It includes architectures such as deep neural networks, deep belief networks and recurrent neural networks.
  • Neuroevolution – An umbrella term for machine learning methods that generate neural networks by randomly mutating their weights, biases, and architectures. The most common forms of neuroevolution are Neuroevolution of Augmenting Topologies (NEAT) and Interactively Constrained Neuro-Evolution (ICONE).
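A toy sketch of the idea, assuming a simple (1+1) evolution strategy that mutates only the weights (methods like NEAT additionally mutate the architecture); the target vector and fitness function are made up for illustration:

```python
import random

def mutate(weights, rate=0.1, scale=0.5):
    """Return a copy of the weights with small random Gaussian perturbations."""
    return [w + random.gauss(0, scale) if random.random() < rate else w
            for w in weights]

def evolve(fitness, weights, generations=200):
    """(1+1) evolution: keep a mutant only if it scores at least as well."""
    best, best_score = weights, fitness(weights)
    for _ in range(generations):
        candidate = mutate(best)
        score = fitness(candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best

random.seed(1)
# Toy fitness: drive the weights toward a hidden target vector.
target = [0.5, -0.2, 0.9]
fit = lambda w: -sum((a - b) ** 2 for a, b in zip(w, target))
evolved = evolve(fit, [0.0, 0.0, 0.0])
```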
  • Statistical Learning – The use of machine learning with the goal of
    statistical inference, whereby you draw conclusions from the data rather than
    focus on prediction accuracy.
  • Supervised Learning – Using labelled historical data to predict the future. Example: using historical data of prices at which houses were sold to predict the price at which your house will be sold. Regression and Classification come under supervised learning.
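The house-price example above can be sketched with a closed-form least-squares fit; the sizes and prices are hypothetical:

```python
def fit_line(xs, ys):
    """Least-squares fit of y ≈ a*x + b — a minimal supervised learner."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx  # slope and intercept

# Hypothetical historical data: house size (m²) vs. sale price (k$)
sizes = [50, 70, 90, 110, 130]
prices = [150, 200, 260, 300, 360]
slope, intercept = fit_line(sizes, prices)
predicted = slope * 100 + intercept  # predict the price of a 100 m² house
```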
  • Unsupervised Learning – Finding patterns in unlabelled data. Example: Grouping customers by purchasing behaviour. Clustering comes under unsupervised learning.
  • Reinforcement learning – Using a simulated or real environment in which a machine learning algorithm receives observations and sparse rewards and learns a policy for choosing actions. Reinforcement learning has been used to train virtual robots to balance themselves and to beat games designed for humans.
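A minimal sketch of tabular Q-learning on a made-up corridor environment with a single sparse reward at the far end:

```python
import random

# Tabular Q-learning on a 5-cell corridor: start in cell 0, reward only in cell 4.
N_STATES = 5
ACTIONS = (1, -1)                          # step right or left
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2      # learning rate, discount, exploration

random.seed(0)
for _ in range(200):                       # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = random.choice(ACTIONS) if random.random() < epsilon else \
            max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s2 == N_STATES - 1 else 0.0    # sparse reward at the goal
        # Q-learning update: move Q toward reward + discounted best future value
        Q[(s, a)] += alpha * (reward + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

# The learned greedy policy should walk right toward the reward in every cell.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)]
```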
  • Regression – A machine learning technique used to predict continuous values. Linear Regression is one of the most popular regression algorithms.
  • Classification – A machine learning technique used to predict discrete values. Logistic Regression is one of the most popular classification algorithms.
    f: x -> y Here 'f' is a function that takes 'x' as input and produces 'y' as output. If the output 'y' is a real (continuous) value, 'f' is a regression technique; if 'y' is a discrete (categorical) value, 'f' is a classification technique.
  • Association Rule Learning – A rule-based machine learning method for discovering interesting relations between variables in large databases.
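As a concrete sketch of association rule learning, the support and confidence of a rule can be computed directly over a hypothetical list of shopping baskets:

```python
# Transactions from a hypothetical shop; each set is one basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent): how often the rule 'if A then B' holds."""
    return support(antecedent | consequent) / support(antecedent)

s = support({"bread", "butter"})        # 3 of 5 baskets contain both
c = confidence({"bread"}, {"butter"})   # of baskets with bread, 3 of 4 have butter
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds.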
  • Clustering – Grouping unlabelled data by identifying patterns using statistics.
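A minimal sketch of clustering with Lloyd's k-means algorithm on one-dimensional data; the customer-spend figures are hypothetical:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm in 1-D: assign to nearest centre, recompute centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centre
            clusters[min(range(k), key=lambda i: abs(p - centres[i]))].append(p)
        # move each centre to the mean of its cluster (keep it if the cluster is empty)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)

# Hypothetical monthly customer spend (in $) forming two obvious groups
spend = [12, 15, 14, 11, 95, 102, 99, 97]
centres = kmeans(spend, 2)  # one centre per spending group
```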
  • Dimensionality Reduction – Reducing the number of random variables in the data, which can simplify models and sometimes improve predictions.
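One common dimensionality-reduction technique is principal component analysis (PCA); for two-dimensional data the leading component has a closed form. A sketch, with made-up points lying roughly along a line:

```python
import math

def principal_axis(points):
    """First principal component of 2-D data: the direction of maximum variance,
    from the closed-form eigendecomposition of the 2x2 covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # covariance matrix entries
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]]
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

# Hypothetical points lying roughly along y = 2x
pts = [(0, 0), (1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0)]
ux, uy = principal_axis(pts)
```

Projecting each point onto the axis (ux, uy) reduces the data from two dimensions to one while keeping most of its variance.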
  • Random forests – Random forests or random decision forests are an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
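A toy sketch of the idea, assuming each tree is a depth-1 decision stump trained on a bootstrap sample (real random forests grow deeper trees and also subsample features at each split); the dataset is hypothetical:

```python
import random
from collections import Counter

def fit_stump(data):
    """Best depth-1 tree: one feature, one threshold, majority label per side."""
    best, best_acc = None, -1.0
    for f in range(len(data[0][0])):
        for x, _ in data:
            thr = x[f]
            left = [y for xi, y in data if xi[f] <= thr]
            right = [y for xi, y in data if xi[f] > thr]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right or left).most_common(1)[0][0]
            acc = (sum(y == l_lab for y in left) +
                   sum(y == r_lab for y in right)) / len(data)
            if acc > best_acc:
                best, best_acc = (f, thr, l_lab, r_lab), acc
    return best

def predict_stump(tree, x):
    f, thr, l_lab, r_lab = tree
    return l_lab if x[f] <= thr else r_lab

def random_forest(data, n_trees=25, seed=0):
    """Bagging: each tree sees a bootstrap sample of the training data."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]

def predict_forest(trees, x):
    """The forest's prediction is the majority vote of its trees."""
    return Counter(predict_stump(t, x) for t in trees).most_common(1)[0][0]

# Hypothetical dataset: (hours studied, hours slept) -> pass (1) / fail (0)
data = [((1, 4), 0), ((2, 5), 0), ((3, 6), 0),
        ((6, 7), 1), ((7, 8), 1), ((8, 6), 1)]
forest = random_forest(data)
```

Averaging many trees trained on different bootstrap samples reduces the variance of any single tree.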
  • Bayesian networks – A Bayesian network is a probabilistic graphical model that relates a set of random variables and their conditional independencies via a directed acyclic graph (DAG). Put simply, it relates random variables to their conditional independencies for event prediction, and it plays a crucial role in reasoning from clues to causes.
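A minimal sketch using the classic rain/sprinkler/grass-wet DAG; the probability tables below are illustrative numbers, not real data:

```python
# Three-node DAG: Rain -> Sprinkler, and both Rain and Sprinkler -> GrassWet.
# Conditional probability tables (CPTs) with illustrative numbers:
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(S | Rain=True)
               False: {True: 0.4, False: 0.6}}    # P(S | Rain=False)
P_wet = {(True, True): 0.99, (True, False): 0.9,  # P(Wet=True | Sprinkler, Rain)
         (False, True): 0.8, (False, False): 0.0}

def p_wet():
    """Marginal P(GrassWet=True): sum the joint over all parent combinations."""
    total = 0.0
    for r in (True, False):
        for s in (True, False):
            # chain rule along the DAG: P(r) * P(s | r) * P(wet | s, r)
            total += P_rain[r] * P_sprinkler[r][s] * P_wet[(s, r)]
    return total
```

The DAG structure keeps the tables small: each variable only needs a CPT over its direct parents rather than over every other variable.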
  • Bias-variance tradeoff – Bias is the average difference between predicted values and actual values, whereas variance measures how much predictions trained on different samples of the same data differ from each other. If bias increases, the model has a high error in its predictions, which makes it underperform. High variance makes the model overfit: it trains itself too closely on the given dataset and performs poorly on data it hasn’t seen yet. Finding a balance between bias and variance is the key to making a good model.
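The tradeoff can be sketched empirically. Assuming a made-up estimator that shrinks the sample mean toward zero, shrinking adds bias but removes variance:

```python
import random

def simulate(estimator, true_mean=5.0, n=10, trials=2000, seed=0):
    """Empirical bias and variance of an estimator of the mean over many samples."""
    rng = random.Random(seed)
    estimates = [estimator([rng.gauss(true_mean, 2.0) for _ in range(n)])
                 for _ in range(trials)]
    avg = sum(estimates) / trials
    bias = avg - true_mean
    variance = sum((e - avg) ** 2 for e in estimates) / trials
    return bias, variance

sample_mean = lambda xs: sum(xs) / len(xs)         # unbiased, higher variance
shrunk_mean = lambda xs: 0.5 * sum(xs) / len(xs)   # biased toward 0, lower variance

b1, v1 = simulate(sample_mean)
b2, v2 = simulate(shrunk_mean)
# The shrunk estimator trades bias for variance: |b2| > |b1| while v2 < v1.
```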
