This research studies techniques for detecting anomalies in large-scale systems using machine learning and evaluates the efficiency of the different algorithms used in the anomaly detection procedure.
Research Problem
Anomaly detection is a key activity in monitoring the performance of large-scale systems. It identifies cases that are unusual within large volumes of data (Bao et al. 2018). Because so much of today's world is driven by data, analyzing that data accurately is essential. Large-scale systems generally consist of many software components running on various computing nodes. These components continuously record data in log files, which are analyzed to identify the causes of problems in the system. As the logs accumulate over time, detecting anomalies manually becomes laborious and error-prone. Automatic anomaly detection techniques are therefore needed to flag irregularities that manual analysis may overlook (Rouzbahani et al. 2020). Many industries, including financial services, retail, IT, defense, and healthcare, incorporate anomaly detection techniques into their infrastructure.
This research studies machine learning as a technique for anomaly detection (Erfani et al. 2016). Machine learning is chosen because it handles large datasets efficiently and is highly adaptive, which allows its algorithms to detect and classify anomalies within the large, complex datasets produced by large-scale systems. The main machine learning algorithms that will be applied to the data are the k-nearest neighbors (KNN) algorithm and Bayesian networks (Injadat et al. 2018). Anomaly detection will be performed on a labeled dataset containing both anomalous and normal samples, which allows a predictive model to be developed and then used to classify new data points (Ying et al. 2021). A comparative evaluation will then assess the efficiency of each machine learning algorithm used in this research. The following section sets out the objectives on which the research is based.
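To illustrate how a predictive model trained on labeled normal and anomalous samples could classify a new data point, the following is a minimal KNN sketch. It assumes each sample has two numeric features (hypothetically, normalized CPU load and error rate); the toy values in `TRAIN`, the helper names `euclidean` and `knn_predict`, and the choice `k=3` are illustrative assumptions, not the research dataset or the study's configuration.

```python
import math
from collections import Counter

# Hypothetical labeled samples: (feature vector, label). The two features could be,
# for example, normalized CPU load and error rate; the values are illustrative only.
TRAIN = [
    ((0.20, 0.01), "normal"),
    ((0.25, 0.02), "normal"),
    ((0.30, 0.01), "normal"),
    ((0.22, 0.03), "normal"),
    ((0.90, 0.40), "anomaly"),
    ((0.85, 0.35), "anomaly"),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, point, k=3):
    """Label `point` by majority vote among its k nearest labeled samples."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in [(0.24, 0.02), (0.88, 0.38)]:
        print(sample, "->", knn_predict(TRAIN, sample))
```

With real system data, features would normally be scaled to comparable ranges before computing distances, since Euclidean distance is otherwise dominated by the feature with the largest numeric range. An odd `k` avoids tied votes when there are only two classes.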
Research Objectives
The objectives of the research are:
The following section presents the research questions that the project will investigate and answer.
Research Questions
The research questions of the project are stated below.
RQ1: Which machine learning algorithms are used in the research? This question describes the machine learning algorithms used to detect anomalies in the system.
RQ2: How are the algorithms applied to detect anomalies in the provided dataset? This question explains how each algorithm identifies anomalies in the dataset.
RQ3: What results do the algorithms produce? This question compares the results obtained from the different algorithms.
RQ4: Which machine learning algorithm is best suited for anomaly detection in large-scale systems? This question identifies the algorithm that performs best at detecting anomalies in large-scale systems.
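Answering RQ3 and RQ4 requires comparing the algorithms' outputs on the same labeled test data. One possible way to make that comparison concrete, sketched here under the assumption that each algorithm outputs one predicted label per test sample and that "anomaly" is treated as the positive class, is to compute precision, recall, and F1 for each algorithm and then apply McNemar's test to the paired predictions of two algorithms. The choice of these metrics and of McNemar's test, the function names, and the example labels are illustrative assumptions, not the comparison method fixed by this proposal.

```python
import math


def confusion_counts(y_true, y_pred, positive="anomaly"):
    """Return (tp, fp, fn, tn) treating `positive` as the anomalous class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive="anomaly"):
    """Precision, recall and F1 for the anomalous class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mcnemar(y_true, pred_a, pred_b):
    """McNemar's test (continuity-corrected) on paired predictions of two models.

    b counts samples model A gets right and model B gets wrong; c the reverse.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    b = sum(a == t and o != t for t, a, o in zip(y_true, pred_a, pred_b))
    c = sum(a != t and o == t for t, a, o in zip(y_true, pred_a, pred_b))
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree on correctness
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical ground truth and predictions from two algorithms.
    y_true = ["normal", "anomaly", "normal", "anomaly", "normal", "anomaly"]
    pred_a = ["normal", "anomaly", "normal", "anomaly", "anomaly", "anomaly"]
    pred_b = ["normal", "anomaly", "normal", "normal", "normal", "normal"]
    print("A:", precision_recall_f1(y_true, pred_a))
    print("B:", precision_recall_f1(y_true, pred_b))
    print("McNemar:", mcnemar(y_true, pred_a, pred_b))
```

Precision and recall are reported for the anomaly class because anomalies are usually rare, so overall accuracy alone can look high even when most anomalies are missed. The chi-square approximation in McNemar's test is unreliable when the number of disagreements b + c is small; an exact binomial version of the test is then preferable.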
Research Hypothesis
H0: The machine learning algorithms are not efficient in detecting anomalies in large-scale systems.
H1: The machine learning algorithms are efficient in detecting anomalies in large-scale systems.
The research uses primary data containing both normal and anomalous values, which will be used to evaluate how efficiently the machine learning algorithms detect anomalies. The research applies quantitative data analysis to a non-probability sample, and the data analysis methods are the four algorithms chosen in the research, including the KNN algorithm and Bayesian networks (Baker et al. 2018).
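A full Bayesian network learns a graph of dependencies between variables. As a minimal, hedged illustration of the Bayesian approach, the sketch below uses Gaussian naive Bayes, the simplest Bayesian network structure, in which the class node is the only parent of every feature, so the features are assumed to be conditionally independent given the class. This simplification is swapped in for illustration and is not necessarily the network structure the research will use; the toy data and the names `fit_gaussian_nb`, `log_gaussian`, and `predict_nb` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical labeled samples (feature vector, label); illustrative values only.
TRAIN = [
    ((0.20, 0.01), "normal"),
    ((0.25, 0.02), "normal"),
    ((0.30, 0.01), "normal"),
    ((0.22, 0.03), "normal"),
    ((0.90, 0.40), "anomaly"),
    ((0.85, 0.35), "anomaly"),
]


def fit_gaussian_nb(train, eps=1e-9):
    """Estimate each class prior and a per-feature Gaussian (mean, variance)."""
    by_class = defaultdict(list)
    for features, label in train:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        # eps keeps the variance positive if a feature is constant within a class.
        variances = [sum((x - m) ** 2 for x in col) / n + eps
                     for col, m in zip(columns, means)]
        model[label] = (n / len(train), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log density of a normal distribution N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_nb(model, point):
    """Return the class with the largest unnormalized log posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(point, means, variances))
    return max(model, key=log_posterior)


if __name__ == "__main__":
    model = fit_gaussian_nb(TRAIN)
    for sample in [(0.24, 0.02), (0.88, 0.38)]:
        print(sample, "->", predict_nb(model, sample))
```

Working with log probabilities avoids numerical underflow when many per-feature likelihoods are multiplied together, which matters once the feature count grows.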
Because the research aims to measure the effectiveness of these algorithms, the results they produce will be compared using inferential statistics, and the statistical analysis will be presented with charts prepared in Excel. The research follows an experimental approach and a realist research philosophy. It aims to report the data as collected, without altering any values. The research will remain aligned with the objectives set in the proposal and will comply with the country's laws and regulations. The section below presents the estimated timeline for completing the research project, covering all the research objectives and questions.