The main objective of this project is to analyse the provided data file using data mining tools. The project is divided into five tasks: data acquisition, data pre-processing, mining tool preparation, clustering analysis and visualization. In data acquisition, the user downloads the project data file, the Ebola Discussion data set. In data pre-processing, the user extracts the relevant substring from each field. This step prepares the data for mining and improves performance through tokenization, stemming, named entity recognition and stop word removal; it also imputes the missing values in each field. In mining tool preparation, the user downloads and installs Weka, opens the Explorer, loads the provided data file, and finally removes the attributes or fields that are not meaningful for pattern analysis. In clustering analysis, the user clusters the provided data file. Finally, the user visualizes the clustered data. Each task is discussed and analysed in detail below.
In data acquisition, the user downloads the project data file, the Ebola Discussion data set. The provided data file is illustrated below (Han, Kamber & Pei, 2012).
In data pre-processing, the user extracts the relevant substring from each field. This step prepares the data for mining and improves performance through tokenization, stemming, named entity recognition and stop word removal. It also imputes the missing values in each field. Pre-processing of the provided data file was completed successfully (Hancock, 2012).
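As a rough illustration of the pre-processing steps named above, the following Python sketch tokenizes a message, removes stop words, and imputes missing values with the mode. It is a minimal sketch under stated assumptions, not the procedure used in the project: the stop-word list, the function names and the example language codes are all hypothetical, and stemming and named entity recognition are left out for brevity.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "in", "that", "of", "to", "and", "we"}

def tokenize(message):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", message.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def impute_mode(values):
    """Replace None entries with the most frequent non-missing value."""
    present = [v for v in values if v is not None]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

# The message text is taken from the data file; the language codes are made up.
tokens = remove_stop_words(tokenize("A professor in U S is telling Liberians"))
imputed = impute_mode(["en", None, "en", "fr"])
print(tokens)
print(imputed)
```

Replacing a missing nominal value with the mode mirrors the "Missing values globally replaced with mean/mode" line that Weka prints in its clustering output.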
Weka is data mining software that provides a collection of machine learning algorithms for data mining tasks. It is a collection of tools for data pre-processing, classification, regression, clustering, association rule mining and visualization.
Here, the user downloads and installs Weka. After installation, open the Explorer, then open the provided data file. This is illustrated below (Mitsa, 2010).
Finally, remove the attributes or fields that are not meaningful for pattern analysis, using the steps below. Choose Filter and apply StringToWordVector to transform the MESSAGE string into a vector of words.
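The idea behind turning a string attribute into a vector of words can be sketched in plain Python. This is not Weka's implementation of StringToWordVector; it is a simplified stand-in that assumes a binary word-presence encoding, and the function names and example messages are hypothetical.

```python
import re

def build_vocabulary(messages):
    """Collect every distinct word across the messages, in sorted order."""
    words = set()
    for message in messages:
        words.update(re.findall(r"[a-z]+", message.lower()))
    return sorted(words)

def to_word_vector(message, vocabulary):
    """Return a 0/1 vector marking which vocabulary words occur in the message."""
    present = set(re.findall(r"[a-z]+", message.lower()))
    return [1 if word in present else 0 for word in vocabulary]

# Two short, made-up messages stand in for the MESSAGE attribute.
messages = ["We conquered Ebola", "Ebola discussion"]
vocab = build_vocabulary(messages)
vectors = [to_word_vector(m, vocab) for m in messages]
print(vocab)
print(vectors)
```

Each message becomes one row with one numeric column per vocabulary word, which is the form a distance-based clusterer such as k-means needs.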
Cluster analysis is used to identify groups of similar instances within the provided data file, the Ebola Discussion data set. In Weka, the cluster mode can be set to use the training set, a supplied test set, a percentage split, or classes-to-clusters evaluation. The clustering panel also has an option to ignore selected attributes of the provided data file, depending on the requirements. The available clustering schemes include FarthestFirst, XMeans, EM, SimpleKMeans and Cobweb. Here, we use k-means to analyse the Ebola Discussion data file. In general, clustering lets a user group the data in order to find patterns in the given data file. One defining benefit of clustering compared with classification is that every attribute is used to analyse the provided data (Stahlbock, Abou-Nasr & Weiss, 2018).
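For readers unfamiliar with k-means, the algorithm can be sketched in a few lines of Python. This is an illustrative version under stated assumptions, not Weka's SimpleKMeans: it uses squared Euclidean distance, a fixed number of iterations, and random starting points drawn from the data, and the four example points and the seed are invented for the demonstration.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two numeric vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def k_means(points, k, iterations=10, seed=0):
    """Return (centroids, assignments, within-cluster sum of squared errors)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random starting points from the data
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    assignments = [nearest_centroid(p, centroids) for p in points]
    sse = sum(squared_distance(p, centroids[a])
              for p, a in zip(points, assignments))
    return centroids, assignments, sse

# Hypothetical toy data: two well-separated groups of two points each.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, assignments, sse = k_means(points, k=2)
print(centroids, assignments, sse)
```

The three returned values correspond to the quantities Weka reports for a k-means run: the final cluster centroids, the clustered instances, and the within-cluster sum of squared errors.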
In clustering analysis, the user clusters the provided data file using the steps below.
K Means
======
Number of iterations: 2
Within cluster sum of squared errors: 60131.00000000001
Initial starting points (random):
Cluster 0: 'A professor in U S is telling Liberians that the Defense Department manufactured Ebola _URL_ via','Mon Sep 29 13:51:10 +0000 2014'
Cluster 1: 'Goodluck Jonathan We Conquered Ebola We ll Crush Boko Haram President says President Goodluck Jonathan sai _URL_','Mon Sep 29 12:35:57 +0000 2014'
Missing values globally replaced with mean/mode
Final cluster centroids:
Time taken to build model (full training data) : 0.07 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 30434 (100%)
1 4 (0%)
The visualization of the k-means result is illustrated below.
The k-means result window displays the centroid of each cluster, together with statistics on the number and percentage of instances assigned to each cluster. Cluster centroids are the mean vectors of each cluster, so they can be used to characterize the clusters. The parameters of the clustering algorithm can be adjusted by clicking SimpleKMeans. The output of the SimpleKMeans algorithm shows two clusters, cluster 0 and cluster 1. The initial starting point of cluster 0 is the message 'A professor in U S is telling Liberians that the Defense Department manufactured Ebola _URL_ via', and that of cluster 1 is the message 'Goodluck Jonathan We Conquered Ebola We ll Crush Boko Haram President says President Goodluck Jonathan sai _URL_'. Each cluster represents a type of behaviour in the provided data file. Evaluation on the training set gave the following results (Veart, 2013).
Clustered Instances
0    30434 (100%)
1    4 (0%)
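The percentages in the result above can be checked with a short calculation. The sketch below takes the two instance counts from the Weka output and computes each cluster's share; the function name is hypothetical.

```python
def cluster_shares(counts):
    """Map each cluster to its percentage of all clustered instances."""
    total = sum(counts.values())
    return {cluster: 100.0 * n / total for cluster, n in counts.items()}

# Instance counts as reported in the clustering output.
counts = {0: 30434, 1: 4}
shares = cluster_shares(counts)
for cluster, share in shares.items():
    print(f"Cluster {cluster}: {counts[cluster]} instances ({share:.2f}%)")
```

Of the 30438 instances, all but 4 fall into cluster 0, so the shares round to 100% and 0% when shown as whole percentages.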
This project successfully analysed the provided data file using data mining tools. The project was divided into five tasks: data acquisition, data pre-processing, mining tool preparation, clustering analysis and visualization. In data acquisition, the user downloaded the project data file, the Ebola Discussion data set. In data pre-processing, the user extracted the relevant substring from each field. This step prepared the data for mining and improved performance through tokenization, stemming, named entity recognition and stop word removal; it also imputed the missing values in each field. In mining tool preparation, the user downloaded and installed Weka, opened the Explorer, loaded the provided data file, and removed the attributes or fields that were not meaningful for pattern analysis. In clustering analysis, the user clustered the provided data file. Finally, the user visualized the clustered data. Each task was discussed and analysed in detail.
Han, J., Kamber, M., & Pei, J. (2012). Data mining. Waltham: Morgan Kaufmann.
Hancock, M. (2012). Practical data mining. Boca Raton, FL: CRC Press.
Mitsa, T. (2010). Temporal Data Mining. Hoboken: CRC Press.
Spendler, L. (2010). Data mining and management. New York: Nova Science Publishers.
Stahlbock, R., Abou-Nasr, M., & Weiss, G. (2018). Data Mining. Bloomfield: C. S. R. E. A.
Veart, D. (2013). First, Catch Your Weka. New York: Auckland University Press.