Data mining. Textbook - page 3

Collecting many different kinds of data and applying mathematical and statistical tools to them makes it possible to produce results. In many cases, numerical calculation is a very effective way to obtain such data. However, this process usually requires real-world testing before the data are analyzed.

Agent mining

Agent-based mining is an interdisciplinary field that combines multi-agent systems with data mining and machine learning to solve problems in business and science.

Agents can be described as decentralized computing systems that have both computing and communication capabilities. They are modeled on data-processing and information-gathering algorithms, such as the "agent problem", a machine learning technique that tries to find solutions to business problems without relying on any data center.

Agents resemble distributed computers in which users share computing resources with each other. This lets agents exchange payloads and process data in parallel, which speeds up processing and lets them complete their tasks sooner.

A common use of agents is data processing and communication, for example searching and analyzing large amounts of data from multiple sources for specific patterns. Agents can be especially efficient in this role because no centralized server has to keep track of their activities.
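The parallel search and payload exchange described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `run_agents`, the in-memory `sources` dictionary, and the use of threads with one inbox queue per agent are all hypothetical choices made for the example, not part of any particular agent framework. Each agent scans its own source, sends its local match count to every peer, and then combines what it receives, so no central server collects the results.

```python
import queue
import threading


def run_agents(sources, pattern):
    """Run one agent per source; each ends up knowing the global match count.

    `sources` maps an agent name to a list of text records (hypothetical data).
    """
    names = list(sources)
    # One inbox per agent: peers write into it, only the owner reads from it.
    inboxes = {name: queue.Queue() for name in names}
    totals = {}

    def agent(name):
        # Step 1: process the local data independently of the other agents.
        local = sum(pattern in record for record in sources[name])
        # Step 2: send the local result (the "payload") to every peer.
        for peer in names:
            if peer != name:
                inboxes[peer].put((name, local))
        # Step 3: combine the local count with one payload from each peer.
        total = local
        for _ in range(len(names) - 1):
            _sender, count = inboxes[name].get()
            total += count
        totals[name] = total

    threads = [threading.Thread(target=agent, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals


if __name__ == "__main__":
    logs = {
        "a": ["error: disk full", "ok"],
        "b": ["error: timeout", "error: retry"],
        "c": ["ok"],
    }
    print(run_agents(logs, "error"))
```

Threads in one process stand in for agents on separate machines here; in a real deployment the inbox queues would be replaced by network messages, and the exchange step would need to handle peers that fail or respond late.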

Currently, two technologies in this area provide functionality similar to that of agents, although only one of them is widely used. The first is distributed computing, which is CPU-based and often uses centralized servers to store information. The second is local computing, which typically runs on local devices such as a laptop or mobile phone, with users sharing information with each other.

Anomaly detection

In data analysis, anomaly detection (also called outlier detection) is the identification of rare elements, events, or observations that are suspicious because they differ significantly from most of the data. One application of anomaly detection is in security or business intelligence, as a way to identify conditions that depart from the normal or expected distribution.

Anomalous data can differ from the mean in three ways. First, the values can be correlated with previous values; second, they can show a constant rate of change (otherwise they are outliers); and third, they can have zero mean. The reference distribution is usually taken to be the normal distribution.

Anomalies in the data can be detected by measuring each value's deviation from the mean and dividing it by the standard deviation. Because there is no theoretical upper limit on how large this multiple can be, it is computed for every item and used to mark the items that deviate strongly from the mean, although such items do not necessarily represent true anomalies.
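One common way to make a mean-based check concrete is the z-score: the distance of a value from the mean, measured in standard deviations. The sketch below uses that standard technique as an assumption; the function name `zscore_outliers` and the default threshold of 3.0 are illustrative choices, not something the text prescribes.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return (index, value) pairs whose |z-score| exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return []
    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


if __name__ == "__main__":
    readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
    # A lower threshold is used because the sample is small.
    print(zscore_outliers(readings, threshold=2.5))
```

With ten readings, the value 50 has a z-score just under 3, so it is flagged at a threshold of 2.5 but not at the default of 3.0. This shows why the threshold has to be chosen with the sample size in mind, and why a flagged item is a candidate for inspection rather than a confirmed anomaly.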