title: Data Mining
Data mining is the process by which computers analyse large sets of data to identify trends and to solve problems that require complex data analysis.
It combines statistics, machine learning, and database systems.
What is Data Mining?
Data mining is the way in which computers search huge repositories of data, applying algorithms to uncover the specific trends and patterns that the user is looking for.
Data mining works through models, built by data scientists and programmers, which usually rely on complex mathematical algorithms. This combines the ingenuity of the programmer with the brute force of the computer. As a result, the computer can identify large-scale patterns that help answer the question posed by the researcher.
Task Classes of Data Mining
Anomaly detection (outlier/change/deviation detection)
The identification of unusual data records that might be interesting, or of data errors that require further investigation.
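As a minimal sketch of the idea, the classic z-score rule flags records that lie far from the mean of the data. The sensor readings and the threshold below are illustrative choices, not part of any standard:

```python
# Toy anomaly detection: flag values whose z-score magnitude exceeds a
# threshold, i.e. values unusually far from the mean in standard deviations.
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Six normal sensor readings and one obvious anomaly.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 55.0]
print(zscore_outliers(readings, threshold=2.0))
```

Real systems use more robust detectors (the single large outlier here inflates the standard deviation itself), but the principle of scoring each record against the bulk of the data is the same.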
Association rule learning (dependency modelling)
Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
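The supermarket example can be sketched by simply counting how often pairs of items appear in the same basket, which is the core of algorithms such as Apriori. The baskets and the support threshold below are made up for illustration:

```python
# Toy market-basket analysis: count item pairs that co-occur in baskets
# and keep those meeting a minimum support (number of occurrences).
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support=2):
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) gives each unordered pair a canonical key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
print(frequent_pairs(baskets))
```

A full association-rule learner would go on to derive rules such as "bread → butter" with confidence scores; this sketch stops at the frequent-itemset counting step.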
Clustering
The task of discovering groups and structures in the data that are in some way or another “similar”, without using known structures in the data.
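One widely used clustering method is k-means, shown here in a deliberately minimal one-dimensional form with made-up data: points are assigned to their nearest centroid, centroids move to the mean of their cluster, and the two steps repeat.

```python
# Minimal 1-D k-means sketch: assign each point to its nearest centroid,
# recompute each centroid as the mean of its cluster, repeat.
def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Drop empty clusters; move each centroid to its cluster mean.
        centroids = [sum(m) / len(m) for m in clusters.values() if m]
    return sorted(centroids)

# Two obvious groups, around 1 and around 10.
data = [1.0, 1.2, 0.8, 9.8, 10.1, 10.4]
print(kmeans_1d(data, centroids=[0.0, 5.0]))
```

The algorithm is told only how many groups to look for, not where they are; the group structure emerges from the data, which is what distinguishes clustering from classification.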
Classification
The task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as “legitimate” or as “spam”.
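The spam example can be sketched with a toy word-count classifier in the spirit of naive Bayes: a new message is scored by how often its words appeared in previously labelled spam versus legitimate mail. The two training messages are invented for illustration:

```python
# Toy spam classifier: score a message against word counts gathered from
# labelled training messages, and pick the label with the higher score.
from collections import Counter

spam_words = Counter("win cash now win prize now".split())
ham_words = Counter("meeting agenda attached see you at the meeting".split())

def classify(message):
    words = message.split()
    spam_score = sum(spam_words[w] for w in words)  # Counter gives 0 for unseen words
    ham_score = sum(ham_words[w] for w in words)
    return "spam" if spam_score > ham_score else "legitimate"

print(classify("win cash prize"))
print(classify("see the agenda"))
```

This is exactly the "generalizing known structure to new data" pattern: the structure (word frequencies) is learned from labelled examples and then applied to messages the program has never seen.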
Regression
The task of finding a function that models the data with the least error, that is, of estimating the relationships among data or datasets.
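The simplest instance is fitting a straight line y = a·x + b by least squares, choosing the slope and intercept that minimise the squared error over the data points. The data below are constructed to lie exactly on a line:

```python
# Least-squares fit of a line y = slope * x + intercept, using the
# closed-form solution: slope = cov(x, y) / var(x).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]  # lies exactly on y = 2x + 1
print(fit_line(xs, ys))
```

With noisy real-world data the fitted function no longer passes through every point; "least error" means it is the line that deviates least overall.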
Summarization
The task of providing a more compact representation of the data set, including visualization and report generation.
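In its most basic descriptive form, summarization reduces a data set to a handful of statistics. A minimal sketch, with an arbitrary sample data set:

```python
# Descriptive summarization: replace a data set with a compact report
# of its basic statistics.
from statistics import mean, median

def summarize(values):
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```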
Data mining has many uses, some of which include: banking, retail, astronomy, medicine and government security. It can also be used for deep learning, which involves the training of virtual ‘neural networks’ for specific tasks. A notable use of this was by the DeepMind team, who created the system AlphaGo, which beat the Go world champion 4-1 in a series heralded as a historic breakthrough for artificial intelligence. AlphaGo involved the use of thousands of previous game records to train the neural network (i.e. data mining).
The term “Data mining” was introduced in the 1990s, but data mining is the evolution of a field with a long history. Early methods of identifying patterns in data include Bayes’ theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity and increasing power of computer technology has dramatically increased data collection, storage, and manipulation ability. As data sets have grown in size and complexity, direct “hands-on” data analysis has increasingly been augmented with indirect, automated data processing, aided by other discoveries in computer science, such as neural networks, cluster analysis, genetic algorithms (1950s), decision trees and decision rules (1960s), and support vector machines (1990s).
Data mining is the process of applying these methods with the intention of uncovering hidden patterns in large data sets. It bridges the gap from applied statistics and artificial intelligence (which usually provide the mathematical background) to database management by exploiting the way data is stored and indexed in databases to execute the actual learning and discovery algorithms more efficiently, allowing such methods to be applied to ever larger data sets.