What is Data Science?

According to IBM's estimate, what percentage of the data in the world today was created in the past two years?

What is the value of petabyte storage?

For each course you find, both foundation and advanced, briefly state (in 2 to 3 lines) what it offers, based on the given course description as well as the video. The purpose of this question is to understand the different streams available in Data Science.

Read the following research paper from IEEE Xplore Digital Library

Ali-ud-din Khan, M., Uddin, M. F., & Gupta, N. (2014). "Seven V's of Big Data: Understanding Big Data to extract value." 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1), pp. 1-5, 3-5 April 2014. Answer the following questions:

Summarise the motivation of the author (in one paragraph)

What are the seven V's mentioned in the paper? Briefly describe each V in one paragraph.

Explore the authors' future work using reference [4] in the research paper. Summarise your understanding of how Big Data can improve the healthcare sector.

To build a big data platform, one has to acquire, organize and analyse big data. Go through the following links and answer the questions that follow them:

Please note: You are encouraged to watch all the videos in the series from Oracle.

How can enterprises acquire big data, and how can it be used?

How should big data be organized and handled?

What analyses can be done using big data?

Part B answers should be based on well-cited articles/videos; name the references used in your answer. For more information, read the guidelines given in Assignment 1.

Google is a master at creating data products. Below are a few examples from Google. Describe these products and explain how large-scale data is used effectively in them.

  1. Google’s PageRank
  2. Google’s Spell Checker
  3. Google’s Flu Trends
  4. Google’s Trends

Like Google, Facebook and LinkedIn also use large-scale data effectively. How?

Briefly explain why a traditional relational database management system (RDBMS) is not used effectively to store big data.

What is a NoSQL database?

Name and briefly describe at least 5 NoSQL databases.

What is MapReduce and how does it work?

Briefly describe some notable MapReduce products (at least 5).

Amazon's S3 service lets users store large chunks of data online. List five features of Amazon's S3 service.

Getting concise, valuable information from a sea of data can be challenging, so we need statistical analysis tools to deal with Big Data. Name and describe at least 3 statistical analysis tools.

Name 3 industries that should use Big Data, and justify your claim for each industry using proper references.

Exercise 1: Data Science 

What is Data Science?

Data science is a relatively new field that uses scientific methods, processes and systems to extract knowledge or insights from data in various forms, whether structured or unstructured (Dhar, 2013). It involves analysis and related procedures aimed at understanding a given phenomenon using the available data (Dhar, 2013). In particular, data science relies on automated procedures to analyse massive volumes of data and extract information from them, with applications ranging from genomics to high-energy physics. The field has also been shaped by the social sciences and the humanities, and it has given rise to roles such as data engineers, statisticians and data analysts.

According to IBM's estimate, what percentage of the data in the world today was created in the past two years?

IBM estimated that about ninety percent of the data in the world today was created in the past two years. Over the same period, new technologies have emerged to organize and make sense of this avalanche of data.

What is the value of petabyte storage?

A petabyte (PB) is a unit of storage equal to 10^15 bytes, which is the same as one thousand terabytes or one million gigabytes. Previously, data storage devices were mainly measured in gigabytes and terabytes.
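To make the arithmetic concrete, here is a small Python sketch that converts between decimal (SI) storage units, where each step is a factor of 1,000. The unit table and the `convert` helper are illustrative assumptions, not part of the original text. Note that the binary pebibyte (2^50 bytes), which some operating systems report, is about 12.6% larger than the decimal petabyte.

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
BYTES_PER_UNIT = {
    "B": 1,
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
}


def convert(value, from_unit, to_unit):
    """Convert a storage size from one decimal unit to another."""
    return value * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


print(convert(1, "PB", "TB"))  # 1000.0
print(convert(1, "PB", "GB"))  # 1000000.0
# A binary pebibyte (2**50 bytes) expressed in decimal petabytes:
print(convert(2**50, "B", "PB"))  # roughly 1.126
```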

For each course you find, both foundation and advanced, briefly state (in 2 to 3 lines) what it offers, based on the given course description as well as the video. The purpose of this question is to understand the different streams available in Data Science.

Foundation courses

Research Design and Applications for Data and Analysis

The course covers how social science questions and data interact, and how to apply data science reasoning methods to them.

Statistics for Data Science

This unit equips learners with skills in research design and statistical analysis, and gives them knowledge of quantitative research techniques.

Storing and Retrieving Data

Students acquire skills in storing, processing and managing datasets. The unit aims to give learners knowledge of data storage, retrieval and processing.

Applied Machine Learning

The course provides a broad introduction to machine learning. Its focus is on practical application and intuition rather than on theory.

Data Visualization and Communication

The course focuses on designing and implementing complementary visual and verbal representations of patterns in data.

Advanced courses

Experiments and Causality

From this course, learners gain skills in experimental design and in mining and exploiting data. It aims to put the science back into social science through practical experiments.

Behind the Data: Humans and Values

The course's three units introduce learners to the legal, policy and ethical implications of data, exploring these issues across the data science life cycle.

Scaling Up! Really Big Data

This course introduces large-scale datasets and the practical issues involved in storing, processing and analysing them. Learners also work with cloud computing systems (Provost & Fawcett, 2013).

Statistical Methods for Discrete Response, Time Series, and Panel Data

Learners study linear and other regression models, and the units provide real-world cases to analyse.

Natural Language Processing

The course helps students grasp the fundamentals of language, which is vital to human interaction. It covers linguistic phenomena and machine learning, and classes are based on practical applications.

Machine Learning at Scale

The course equips learners with the skills to work on problems involving terabytes of data, machine learning and algorithm design.

Read the following research paper from IEEE Xplore Digital Library and answer the following questions:

Summarise the motivation of the author (in one paragraph)

The authors were motivated by the fact that big data has become part of our daily lives (Dhar, 2013). Big data has helped solve many problems across various industries. It also offers insight into how machines, especially robotic technology and the internet, could take over the world. Big data plays an important role in understanding humans as data agents, and the authors emphasize that before looking for big data elsewhere, individuals should start with themselves.

What are the seven V's mentioned in the paper? Briefly describe each V in one paragraph.

The seven V's mentioned in the paper are volume, velocity, variety, veracity, validity, volatility and value.

Volume refers to the size of the data created from different sources, for example audio, text and research studies.

Velocity refers to the speed at which data is generated. Because data arrives at businesses at high velocity, they need technology and database engines capable of processing it (Dhar, 2013). The key aspect of velocity is the speed of the feedback loop that takes data from input through to decision-making.

Variety refers to the forms data takes, for example audio, text and images. Building a system that can integrate such a mix of data can be challenging (Provost & Fawcett, 2013), and the variety of the data affects the integrity of the data transmitted.

Veracity concerns the truthfulness of data, that is, how certain a given data set is. The meaningfulness of the results obtained from a data set is crucial in a given context.

Validity refers to the correctness and accuracy of data for its intended use (Wu, Zhu, Wu & Ding, 2014). A data set may be valid for one application yet invalid for another, so it is important to verify the relationships between its data elements.

Volatility concerns the retention policy for structured data and how it is implemented in a business's daily operations, that is, how long data remains valid and how long it should be stored.

Value refers to the desired outcome of processing big data. There is considerable interest in extracting the optimal value from whatever big data set one is working with (Dhar, 2013). Attention should also be paid to investment in data storage, which should be cost-effective.

Explore the authors' future work using reference [4] in the research paper. Summarise your understanding of how Big Data can improve the healthcare sector in 300 words.

Big data can improve the health sector in several ways. First, special tools and methods can be designed to extract value from large streams of data. Second, algorithms can be developed that exploit the first six V's to achieve the seventh: overall value. When big data is used to develop such algorithms, it supports evidence-based and personalized medicine by drawing on existing scientific evidence as well as test data (Provost & Fawcett, 2013). Big data can also be used to develop intelligent machines based on cognitive science. Beyond health, it has been found to help address inadequate education, especially in rural areas, and to promote renewable energy and a clean environment. Big data supplies the raw data from which intelligent machines are built. In the health sector, understanding and using this data could help solve health problems and tackle some current diseases (Wu, Zhu, Wu & Ding, 2014). Combined with intelligent algorithms, big data can support the growth of predictive and personalized medicine that makes efficient use of case studies.

To build a big data platform, one has to acquire, organize and analyse big data. Go through the following links and answer the questions that follow them:

Please note: You are encouraged to watch all the videos in the series from Oracle.

How can enterprises acquire big data, and how can it be used?

Enterprises can acquire data using Oracle's tools from sources such as social media posts and comments, sensor data and online profiles. Once acquired, the data is analysed and plotted to reveal key trends (Dhar, 2013). Data can also be acquired by creating online profiles and deploying online sensors that send updates (Dhar, 2013). The data can then be stored in Hadoop for batch processing or in systems that support real-time responses; within Hadoop, data is stored in the Hadoop Distributed File System (HDFS).

How should big data be organized and handled?

Big data typically comes in numerous formats, such as web logs, images and social media content, and it should be filtered and aggregated (Mattmann, 2013). For instance, to analyse web log data, the first step is to identify each user's actual sessions and combine that user's clicks through a process called sessionization. This reveals exactly which page each user visited before and where they went next (Dhar, 2013). Sentiment analysis can be applied to user reviews by examining specific comments about a given product. For product sensor data, readings that fall outside the normal range can signal a possible fault (Mattmann, 2013). The infrastructure used to organize big data should be able to process and manipulate data in its original storage location (Dhar, 2013). Hadoop is currently the most widely used technology for organizing and processing large volumes of data while keeping the data in its original storage cluster, and it is commonly used for long-term storage, especially of web logs.
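The step of grouping each user's clicks into sessions (sessionization) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each log record is a hypothetical `(user_id, timestamp_in_seconds, page)` tuple, and a gap of more than 30 minutes between clicks is assumed to start a new session (a common convention, not one the Oracle material prescribes). On a real platform the same logic would typically run as a distributed job, for example over web logs stored in Hadoop.

```python
from collections import defaultdict

# Hypothetical inactivity threshold: a longer gap starts a new session.
SESSION_GAP_SECONDS = 30 * 60


def sessionize(log_records, gap=SESSION_GAP_SECONDS):
    """Group (user_id, timestamp, page) records into per-user sessions.

    Returns {user_id: [[page, page, ...], ...]} with pages in visit order.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, page in log_records:
        by_user[user_id].append((timestamp, page))

    sessions = {}
    for user_id, visits in by_user.items():
        visits.sort()  # order each user's clicks by time
        user_sessions = []
        last_time = None
        for timestamp, page in visits:
            if last_time is None or timestamp - last_time > gap:
                user_sessions.append([])  # start a new session
            user_sessions[-1].append(page)
            last_time = timestamp
        sessions[user_id] = user_sessions
    return sessions


logs = [
    ("alice", 0, "/home"),
    ("alice", 120, "/products"),
    ("bob", 30, "/home"),
    ("alice", 300, "/checkout"),
    ("alice", 7200, "/home"),  # more than 30 minutes later: new session
]
print(sessionize(logs))
# {'alice': [['/home', '/products', '/checkout'], ['/home']], 'bob': [['/home']]}
```

Within each session, consecutive pages show where a user came from and where they went next.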

What analyses can be done using big data?

Big data analysis can be carried out in a distributed environment, where some data remains where it was originally stored and other data is accessed from the data warehouse (Erl, Khattak & Buhler, 2016). The tools used must support deep analytics, such as statistics and data mining, across a variety of data types stored in diverse systems. The infrastructure should also be able to integrate analyses across combined data sources. One example is analysing the sales of a given product (Erl, Khattak & Buhler, 2016): one could break sales down by region, by month or by the sales team involved.
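The sales example above is essentially a group-by aggregation. The following single-machine Python sketch totals hypothetical sales records by any combination of fields such as region, month or team; the record layout and the `total_sales_by` helper are assumptions for illustration. A big data platform would run the same kind of aggregation in parallel across a cluster.

```python
from collections import defaultdict


def total_sales_by(records, *keys):
    """Sum the 'amount' field of each record, grouped by the given fields."""
    totals = defaultdict(float)
    for record in records:
        group = tuple(record[key] for key in keys)
        totals[group] += record["amount"]
    return dict(totals)


# Hypothetical sales records.
sales = [
    {"region": "North", "month": "2024-01", "team": "A", "amount": 1200.0},
    {"region": "North", "month": "2024-02", "team": "B", "amount": 800.0},
    {"region": "South", "month": "2024-01", "team": "A", "amount": 500.0},
    {"region": "North", "month": "2024-01", "team": "B", "amount": 300.0},
]

print(total_sales_by(sales, "region"))
# {('North',): 2300.0, ('South',): 500.0}
print(total_sales_by(sales, "region", "month"))
# {('North', '2024-01'): 1500.0, ('North', '2024-02'): 800.0, ('South', '2024-01'): 500.0}
```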

Part B answers should be based on well cited article/videos – name the references used in your answer. For more information read the guidelines as given in Assignment 1.

Google is a master at creating data products. Below are a few examples from Google. Describe these products and explain how large-scale data is used effectively in them.

  1. Google’s PageRank

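PageRank is an algorithm used by Google Search to rank websites in its search engine results. It treats each hyperlink as a vote for the page it points to and weights each vote by the importance of the linking page, so computing it requires analysing the link structure of a very large crawl of the web.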
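The core idea behind PageRank can be sketched with the textbook power-iteration formulation from Brin and Page's original description, using the commonly cited damping factor of 0.85. This is an illustrative sketch on a hypothetical three-page web, not Google's production ranking, which combines link analysis with many other signals. Spreading the rank of pages with no outlinks evenly over all pages is one standard choice among several.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share (1 - d) / n.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link is a vote: split this page's rank among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no outlinks: spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(toy_web)
print(sorted(scores, key=scores.get, reverse=True))  # ['C', 'A', 'B']
```

Each iteration lets rank flow along links, so pages linked from highly ranked pages end up with higher scores. At web scale the same computation runs over a link graph with billions of pages, which is why it needs distributed processing.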

  2. Google’s Spell Checker

Google uses this program to suggest the correct spelling of words. The feature has existed for a long time, and Google continues to improve it.
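To illustrate how a spelling suggester can learn from a body of text, here is a minimal Python sketch in the style of Peter Norvig's well-known essay "How to Write a Spelling Corrector": it proposes the most frequent known word within one edit (deletion, transposition, replacement or insertion) of the input. This is not Google's implementation. The tiny training corpus is hypothetical; in a real system the word counts would come from a very large corpus, which is where large-scale data makes the suggestions good.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def train(text):
    """Count word frequencies in a (hypothetical) training corpus."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return the most frequent known word within one edit, else the word itself."""
    word = word.lower()
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    if not candidates:
        return word
    # sorted() makes ties deterministic; max() keeps the first maximum it sees.
    return max(sorted(candidates), key=lambda w: counts[w])


corpus = train("the data science team analysed the data and the data pipeline")
print(correct("dta", corpus))    # data
print(correct("teh", corpus))    # the
print(correct("xyzzy", corpus))  # xyzzy (no known word within one edit)
```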

  3. Google’s Flu Trends

Google Flu Trends was a web service that provided estimates of influenza activity for more than 25 countries. During influenza outbreaks, Google used it to predict flu activity (Mattmann, 2013). Rather than monitoring individual users, it aggregated the flu-related search queries entered by millions of users and analysed them to estimate how much flu illness was present in a particular population.
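The model Google published for Flu Trends was, as we understand it, a linear fit between the log-odds of the fraction of doctor visits for influenza-like illness (ILI) and the log-odds of the fraction of searches that were flu-related. The sketch below reproduces that idea with ordinary least squares on a handful of hypothetical weekly values; the numbers and the helpers `fit_line` and `estimate_ili` are illustrative assumptions. The real service fitted its model on aggregated queries from millions of users and on official surveillance data.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical weekly data: share of searches that are flu-related,
# and share of doctor visits for influenza-like illness.
query_fraction = [0.0010, 0.0015, 0.0025, 0.0040, 0.0030]
ili_fraction = [0.010, 0.014, 0.022, 0.035, 0.027]

b0, b1 = fit_line([logit(q) for q in query_fraction],
                  [logit(p) for p in ili_fraction])


def estimate_ili(q):
    """Estimate the ILI visit share from a week's flu-query share."""
    return inv_logit(b0 + b1 * logit(q))


print(round(estimate_ili(0.0020), 4))
```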

  4. Google’s Trends

Google Trends is based on Google Search. It shows how often a given search term is entered relative to the total search volume across various regions of the world and in numerous languages.
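Google's help pages describe Trends data roughly as follows: each data point is divided by the total searches for its region and time range, and the results are then scaled to 0-100 relative to the highest point in the selected period. Here is a minimal Python sketch of that normalization using hypothetical counts; real Trends data is also sampled, which this sketch ignores, and `relative_interest` is an illustrative helper, not a Google API.

```python
def relative_interest(term_counts, total_counts):
    """Scale a term's search share to 0-100, relative to its peak in the period."""
    # Divide by total searches so busy regions or periods do not dominate.
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * share / peak) for share in shares]


# Hypothetical weekly counts for one search term and for all searches.
term_counts = [50, 120, 90]
total_counts = [1000, 2000, 900]
print(relative_interest(term_counts, total_counts))  # [50, 60, 100]
```

Note that the raw count peaks in the second week, but the normalized interest peaks in the third, because the term made up a larger share of that week's searches.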

Like Google, Facebook and LinkedIn also use large-scale data effectively. How?

Unlike Google, which does not ask individuals who they are or where they live and so must often infer this from the sites they visit and what they search for, Facebook and LinkedIn hold profile information that users provide themselves (Erl, Khattak & Buhler, 2016). Both companies follow a data strategy driven by their data science teams, using insights gleaned from analysing the millions of people who browse their sites to shape what is posted and updated.

Briefly explain why a traditional relational database management system (RDBMS) is not used effectively to store big data.

An RDBMS struggles to store big data because data volumes have grown to the scale of petabytes (Erl, Khattak & Buhler, 2016), and a traditional RDBMS has difficulty handling such huge volumes. Another reason is that much of this data arrives in semi-structured or unstructured formats, which do not fit the fixed schemas of relational tables.

What is a NoSQL database?

NoSQL refers to a wide variety of database technologies that were developed in response to the demands of building modern applications.

Name and briefly describe at least 5 NoSQL Databases  

There are various types of NoSQL databases, including the following:

Document databases: these databases pair each key with a complex data structure known as a document. A document can contain many different key-value pairs or key-array pairs.

Graph stores: these databases are used to store information about networks of data, such as social connections.

Key-value stores: these are the simplest NoSQL databases. Every item in the database is stored as an attribute name (or key) together with its value.

Wide-column stores: examples include Cassandra and HBase. They are optimized for queries over large datasets and store columns of data together, rather than rows.

Oracle NoSQL Database: offered as part of Oracle Big Data Appliance, which integrates with Hadoop and the R language.
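To make the four NoSQL data models above more concrete, the following Python sketch mimics how each family lays out data using plain dictionaries and lists. No real database or driver API is used; the records, keys and the helper functions `find_documents` and `within_hops` are hypothetical illustrations of the kind of query each model makes natural.

```python
from collections import deque

# Key-value store: an opaque value looked up by a single key (hypothetical data).
kv_store = {"session:42": "user=alice;cart=3"}

# Document store: each key maps to a nested, self-describing document.
doc_store = {
    "user:alice": {"name": "Alice", "city": "Leeds", "orders": [{"id": 1, "total": 25.0}]},
    "user:bob": {"name": "Bob", "city": "York", "orders": []},
}

# Wide-column store: row key -> column family -> {column: value}.
# Rows in the same family need not have the same columns.
wide_column = {
    "alice": {"profile": {"name": "Alice"}, "activity": {"2024-01-01": "login"}},
    "bob": {"profile": {"name": "Bob", "email": "bob@example.com"}},
}

# Graph store: nodes and "follows" edges, kept as an adjacency list.
graph = {"alice": ["bob"], "bob": ["carol"], "carol": []}


def find_documents(store, field, value):
    """Document-style query: keys whose document has field == value."""
    return [key for key, doc in store.items() if doc.get(field) == value]


def within_hops(graph, start, max_hops):
    """Graph-style query: nodes reachable from start in at most max_hops edges."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return sorted(n for n in depth if n != start)


print(kv_store["session:42"])                      # user=alice;cart=3
print(find_documents(doc_store, "city", "York"))   # ['user:bob']
print(wide_column["bob"]["profile"].get("email"))  # bob@example.com
print(within_hops(graph, "alice", 2))              # ['bob', 'carol']
```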

What is MapReduce and how does it work?

MapReduce is one of the core components of the Apache Hadoop software framework. Hadoop enables resilient, distributed processing of large unstructured data sets across clusters of commodity computers, and MapReduce is the procedure that harnesses the power of many computers working in parallel (Erl, Khattak & Buhler, 2016). In the Map phase, a large data collection is broken into pieces, and each piece is processed in parallel to produce intermediate key-value pairs. These pairs are then shuffled so that all values with the same key end up together, and in the Reduce phase each group is aggregated into a final result.
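A MapReduce job runs in three stages: map, shuffle and reduce. The classic word-count example illustrates them. This is a single-process Python sketch under the assumption that each document is one input split; in Hadoop, map and reduce tasks would run in parallel on different machines and the shuffle would move data across the network. The function names are illustrative, not Hadoop's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data is big", "data is valuable"]
print(map_reduce(docs))
# {'big': 2, 'data': 2, 'is': 2, 'valuable': 1}
```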

Briefly describe some notable MapReduce products (at least 5)  

Notable MapReduce products include Apache Hadoop, Riak, CouchDB, Infinispan and the Disco project.

Disco project: an open-source framework for distributed computing based on the MapReduce paradigm.

Riak: a distributed NoSQL database that is easy to operate. It distributes data across the cluster to deliver fast performance with high fault tolerance.

Infinispan: a distributed cache and key-value NoSQL data store that can be embedded in Java applications as a library.

CouchDB: an open-source database that is easy to use and has an architecture that embraces the web.

Apache Hadoop: an open-source software platform for distributed storage and distributed processing of large data sets on clusters of computers built from commodity hardware (Mattmann, 2013).
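Riak, like the Amazon Dynamo design it draws on, is generally described as spreading data across its cluster with consistent hashing. The following Python sketch shows that general technique, not Riak's code: keys and nodes are hashed onto the same ring, each key belongs to the next node point clockwise, and each node owns several "virtual node" points (64 here, an arbitrary illustrative choice) to even out the load. The `HashRing` class and node names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self._points = []  # sorted (hash, node) pairs around the ring
        for node in nodes:
            for i in range(vnodes):
                self._points.append((self._hash(f"{node}#{i}"), node))
        self._points.sort()
        self._hashes = [h for h, _ in self._points]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Walk clockwise from the key's hash to the first node point."""
        index = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._points[index][1]


keys = [f"user:{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print(len(moved), "of", len(keys), "keys moved")
print({after.node_for(k) for k in moved})  # {'node-d'}
```

The useful property for a growing cluster is that adding a node only moves the keys that now fall to the new node, rather than reshuffling everything.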

Amazon's S3 service lets users store large chunks of data online. List five features of Amazon's S3 service.

The features of Amazon S3 include scalability, storage, simple preference, availability and durability.

Getting concise, valuable information from a sea of data can be challenging, so we need statistical analysis tools to deal with Big Data. Name and describe at least 3 statistical analysis tools.

Various statistical analysis tools can deal with big data, among them Microsoft Excel, IBM's statistical tool (SPSS Statistics) and R. R requires some knowledge to use, because it is operated through a command-line interface in which statisticians enter lines of code that call R functions. Excel, part of the Microsoft Office suite, is very useful for statistical functions (Erl, Khattak & Buhler, 2016). Finally, the IBM tool contains a syntax editor as well as a graphical interface, so it does not require much programming knowledge to operate.

Name 3 industries that should use Big Data – justify your claim in 250 words for each industry using proper references.

Some of the industries that should use big data are sales, manufacturing and finance.

Manufacturing industry

Big data can be incorporated into the manufacturing industry in various ways. One is to gain an edge over competitors: companies can use the data available on various platforms to reach specific segments of the market (Erl, Khattak & Buhler, 2016). Through these platforms, a company can learn about its customers and what its competitors offer, and then try to provide better services.

Sales

Big data can be used for sales purposes. In an industry driven by sales, not using big data does the business a disservice. Teams in the organization can use big data to determine what customers want most and concentrate on it (Wu, Zhu, Wu & Ding, 2014). Big data has also helped people be creative in creating value, which is one component of the sales process.

Financial industry

The financial industry can use big data to understand existing customers and segment them by their characteristics, insurance needs and demand (Mattmann, 2013). This helps organizations offer one-stop solutions for financial matters and thus deliver optimal value to customers.

References

Dhar, V. (2013). Data science and prediction. Communications of the ACM, 56(12), 64-73.

Erl, T., Khattak, W., & Buhler, P. (2016). Big Data Fundamentals: Concepts, Drivers & Techniques. Prentice Hall Press.

Mattmann, C. A. (2013). Computing: A vision for data science. Nature, 493(7433), 473-475.

Provost, F., & Fawcett, T. (2013). Data science and its relationship to big data and data-driven decision making. Big Data, 1(1), 51-59.

Wu, X., Zhu, X., Wu, G. Q., & Ding, W. (2014). Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1), 97-107.

Cite This Work

My Assignment Help. (2021). Introduction To Data Science: Courses, Tools, And Applications Essay. Retrieved from https://myassignmenthelp.com/free-samples/itech2201-cloud-computing/data-visualization-and-communication.html.