Introduction
Big Data relies heavily on data mining techniques to make use of the data that has been collected (Solove, 2008). Data mining extracts information from large databases so that contemporary organizations can use it. The process poses substantial security, ethical and privacy threats. This analysis discusses the implications that contemporary organizations face, with relevant examples.
Big Data systems collect large and varied types of data, which data mining then extracts. The extracted data is further processed through data analytics to arrive at decisions (Abouelmehdi, Beni-Hssane, Khaloufi & Saadi, 2017). It encompasses company data, consumer data and other related information. Much of this information is personal or confidential, which may pose ethical, security and privacy threats. Companies therefore need to evaluate such data before using it for data analytics purposes.
Security threats arise from the use of data drawn from both old and new databases. Data mining professionals working with large volumes of data may extract and use extensive amounts of it (Jensen, Jensen & Brunak, 2012). The availability of huge volumes of data, such as consumer records and confidential information, may lead to considerable security threats.
Example of Security Threat: When a customer pays by credit card at a retail store and that information is made available to various data analytics professionals, considerable threats may arise. For this reason, Tesco allows associate companies to use only a limited part of its databases, and it has special programs for data coding and protection during data extraction.
Privacy threats in data analytics may arise when private information is collected from customers. Companies using Big Data may draw on consumer information through the technologies available to them, which can put that information at risk.
Example: IBM is devising mechanisms for providing accurate data models that can protect consumer privacy and confidentiality. Privacy threats are widespread, as almost every company today makes use of consumer data in analytics (Xu, Jiang, Wang, Yuan & Ren, 2014).
Ethical concerns in data analytics may arise when consumers are unaware that their private information is being used. Companies and governments at times make use of private consumer information for the greater good, but in every case they need to make consumers aware of that use (Chen, Chiang & Storey, 2012). Doing so would prevent ethical concerns from arising for both the company and the government.
Example: A government using income tax returns to estimate the earnings of a portion of the population raises an ethical concern.
Implications for Businesses Regarding Security, Privacy and Ethical Issues
The implications of security, privacy and ethical threats are considerable in the Big Data sector. Data mining has raised several concerns about brand names, in some cases damaging their reputation. Companies that make use of consumer data need to make consumers aware of it through a disclaimer (McDermott, 2017). This would prevent ethical implications or legal proceedings in the future. It would also give companies the opportunity to manage and establish their brand names and to build consumer brand loyalty. Rising consumer awareness in recent years has led companies to take steps towards informing consumers. This would lead to better dealings with consumers and help avoid litigation (Michael & Miller, 2013). A number of countries are now developing rules and regulations that prevent companies from making unethical use of their consumers' data and information. By complying, companies can avoid such legal proceedings and adopt sustainable procedures in their Big Data practices.
Data mining, although aimed at providing data analytics benefits to businesses, poses considerable threats to general consumers. Companies need to consider ways to reduce threats arising from security, privacy and ethical concerns. Consumers need to be made aware of how their data is used so that no defamation case arises against the brand name. This would prevent deterioration of the brand name and allow the company to conduct itself in an ethical manner.
- ANS: The DATA.txt file was created by including the relation, attribute, and data keywords, each preceded by an “@” symbol. The ‘DATA.txt’ file was then saved in .arff format and opened in Weka. A screenshot of the initial Weka screen is provided in Figure 1.
- ANS: The data was analyzed in Weka, and the histograms in Figure 2 reveal the characteristics of the information. Descriptive values of the attributes were observed in the Preprocess tab, and screenshots are provided (Figure 3 – Figure 7). For each attribute, Weka reported the number of missing values, the mean and standard deviation, and the maximum and minimum values. The scatter plots from the Visualize tab are presented in Figure 8. They showed highly scattered data points, indicating low correlation.
- The unsupervised Discretize filter was applied to the Assignment-4 attribute, and a screenshot of the filter output is provided in Figure 9. The filter converted the range of the numeric attribute into nominal values by simple binning. The resulting distribution appeared to be right-skewed.
- It was possible to fill in the missing values manually in the Weka Viewer window, and a screenshot is provided in Figure 10. The ‘Replace Missing Value’ filter was then used, and the missing values were replaced by the mean values of the attributes. The filter outputs are shown in screenshots (Figure 11 – Figure 15). The ‘Replace Missing Value’ filter screen is shown in Figure 16, which suggested the mode and the mean as the probable replacements for the missing values.
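The two preprocessing steps described above, mean replacement of missing values and simple binning, can be sketched in plain Python. This is only an illustration and not Weka's implementation. It assumes equal-width bins, the scores are made-up sample values rather than the Assignment-4 data, and the function names are hypothetical.

```python
def replace_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def equal_width_bins(values, n_bins):
    """Map each numeric value to the index of one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Made-up sample scores; None marks a missing value (assumption for illustration).
scores = [55.0, 70.0, None, 85.0, 90.0, 100.0]

filled = replace_missing_with_mean(scores)   # the mean of the observed scores is 80.0
bins = equal_width_bins(filled, 3)           # three intervals of width 15, starting at 55

print(filled)
print(bins)
```

Replacing the missing values before binning means every record receives a bin label, which matches the order in which the two filters would typically be applied when a complete nominal attribute is needed.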
References
Abouelmehdi, K., Beni-Hssane, A., Khaloufi, H., & Saadi, M. (2017). Big Data security and privacy in healthcare: a review. Procedia Computer Science, 113, 73-80.
Chen, H., Chiang, R. H., & Storey, V. C. (2012). Business intelligence and analytics: from big data to big impact. MIS Quarterly, 1165-1188.
Jensen, P. B., Jensen, L. J., & Brunak, S. (2012). Mining electronic health records: towards better research applications and clinical care. Nature Reviews Genetics, 13(6), 395.
McDermott, Y. (2017). Conceptualising the right to data protection in an era of Big Data. Big Data & Society, 4(1), 2053951716686994.
Michael, K., & Miller, K. W. (2013). Big data: New opportunities and new challenges [guest editors' introduction]. Computer, 46(6), 22-24.
Solove, D. J. (2008). Data mining and the security-liberty debate. The University of Chicago Law Review, 75(1), 343-362.
Xu, L., Jiang, C., Wang, J., Yuan, J., & Ren, Y. (2014). Information security in big data: privacy and data mining. IEEE Access, 2, 1149-1176.