Questions

1. What is Data Science?

2. According to IBM's estimate, what percentage of the data in the world today has been created in the past two years?

3. What is the value of petabyte storage?

What is Data Science?

Data science is, at its core, a field concerned with making sense of and organizing the avalanche of data created in recent years. It enables the identification of patterns and trends in data, and allows people with advanced training to improve the ways in which humanity creates social and business value. The emergence of "big data" also lets us understand phenomena more deeply, ranging from biological systems and economic behaviour to human social structures.

According to IBM's estimate, what percentage of the data in the world today has been created in the past two years?

IBM estimates that ninety percent of the world's data has been created in the last two years.

What is the value of petabyte storage?

A petabyte is a million gigabytes, i.e. 10 to the 15th power bytes.
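In standard decimal storage units the conversion chain is:

$1\ \mathrm{PB} = 10^{15}\ \mathrm{bytes} = 10^{3}\ \mathrm{TB} = 10^{6}\ \mathrm{GB}$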

For each course, both foundation and advanced, that you find at https://datascience.berkeley.edu/academics/curriculum/, briefly state (in 2 to 3 lines) what it offers, based on the given course description as well as on the video. The purpose of this question is to understand the different streams available in Data Science.

Foundation course:

The foundation course, or basic curriculum, provides students with the essential skills and knowledge of data science. It covers storing, searching, designing and analysing data in research work, and gives students practical knowledge of data visualization and applied data science (Khan, Fahim Uddin & Gupta, 2014).

Advanced course:

Advanced courses build a deeper understanding of the value and application of data science. They cover complex analytical methods that address big-data problems through experimental design and data visualization, helping students explore data science and understand exactly how it is used.

Exercise 2: Characteristics of Big Data

Read the following research paper from IEEE Xplore Digital Library

Ali-ud-din Khan, M.; Uddin, M.F.; Gupta, N., "Seven V's of Big Data: understanding Big Data to extract value," 2014 Zone 1 Conference of the American Society for Engineering Education (ASEE Zone 1), pp. 1-5, 3-5 April 2014, and answer the following questions: Summarise the motivation of the author (in one paragraph).

As the authors describe, their motivation comes from the fact that big data has become an important part of life and holds solutions to problems in every industry. The main reason for the paper is that they regard big data as a central area of technology, likening the field to a "big data ocean". As billions of records are generated every day, big data has become a defining trend.

What are the 7 V's mentioned in the paper? Briefly describe each V in one paragraph.

  • Velocity: velocity is discussed from two perspectives. The first is the rate at which data arrives, which the enterprise's technology and database engines must be prepared to process. The other is moving big data into mass storage, which must respond quickly as the data arrives.
  • Variety: big data comes in diverse forms, such as video and text, which is a main difference between big data and traditional data. The challenge lies in this complexity, which can lead to erroneous data integration.
  • Volume: volume means the size of the information or data created from many sources, including audio, text, video, research reports, spatial images, social networks, weather forecasts and crime reports, to mention a few.
  • Veracity: compared with traditional data, which can be standardized, big data comes directly from users, and the reliability of those users is low, so veracity concerns how trustworthy the data is. Cleaning the data is therefore an important step in big data.
  • Volatility: in big data, volatility refers to the data retention policy. This is easily enforced in a relational database, but the variety, velocity and volume of data in the big data world complicate it.
  • Value: value is the most significant V, because it is the intended outcome of big data analysis and the payoff of all the preceding analysis.
  • Validity: validity means the data is accurate and used correctly; data that is valid for one situation may not be valid in a different one (Corea, 2016).

Explore the authors' future work by using reference [4] in the research paper. Summarise your understanding of how Big Data can improve the healthcare sector in 300 words.

The referenced work notes that the cost of owning and managing data will keep growing, and that governance mechanisms depend to a large extent on the value of the data. Structures and strategies are needed that define, up front, limits on what project information is extracted. Data can be organized in tiers; in short, data at the higher tiers carries less risk, but the higher storage costs and stronger protection at those tiers must be weighed against the cost.

The advent of digital technology has brought many benefits to healthcare providers. One of the key advances is the use of big data in the medical business. Big data can help participants in the medical industry run more effective operations and gain insight into patients and their well-being. The healthcare business faces a variety of challenges, from new disease outbreaks to maintaining optimal operational efficiency, and big data analytics can help address them. By drawing on the large amounts of information in the healthcare industry, such as clinical, financial, research and development, operational and management data, big data can yield meaningful insight and improve the operational effectiveness of the business, so healthcare companies can lower medical costs and provide better services.

Big data can also help find ways to treat diseases. Some drugs seem to work for some people but not for others, and there are many things to observe in a single genome. It is impossible to study all of this in detail manually, but big data can reveal unknown correlations, hidden patterns and insights by examining huge amounts of information. In future, it could be used to create drugs tailored to a patient's genome to obtain the best therapeutic effect. Combining all patients' electronic health records, dietary information, social factors and so on with DNA sequencing can support customized treatment and personalized medicine. Aurora Health Care has begun a proof of concept for this, and has been able to reduce its readmission rate by 10% and save $6 million annually (Abouelmehdi, Beni-Hessane & Khaloufi, 2018).


Exercise 3: Big Data Platform

In order to build a big data platform, one has to acquire, organize and analyse the big data. Go through the following links, watch the videos, and answer the questions that follow:

  • https://www.infochimps.com/infochimps-cloud/how-it-works/
  • https://www.youtube.com/watch?v=TfuhuA_uaho
  • https://www.youtube.com/watch?v=IC6jVRO2Hq4
  • https://www.youtube.com/watch?v=2yf_jrBhz5w

Please note: You are encouraged to watch all the videos in the series from Oracle.

How can enterprises acquire big data, and how can it be used?

According to the videos and Oracle's material, the main infrastructure change is in the acquisition phase, where two major use cases must be considered. First, for social media updates, forum comments and blogs, companies may simply extract overnight or weekly trend analyses. Second, other applications must update, study and store information for online profiles and continuously monitor sensors. In either case a NoSQL database can be used to store the big data, because it is extensible and flexible, and the Hadoop Distributed File System (HDFS) can likewise be used for batch data. With this approach the system aims to capture all information without parsing it or forcing it into a fixed schema, so the data can later be accessed through simple keys by customer-facing applications.
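As a rough illustration of this "capture everything, parse later" idea, the following Python sketch stores raw events of different shapes under simple keys; the stdlib shelve module stands in for a real key-value or document store, and the event payloads are made up:

import json
import shelve
import time

def capture(store, source, payload):
    # Store the raw record untouched under a simple unique key;
    # no schema is imposed at write time.
    key = f"{source}:{time.time_ns()}"
    store[key] = json.dumps(payload)
    return key

with shelve.open("/tmp/raw_events") as store:
    capture(store, "twitter", {"user": "a", "text": "great product!"})
    capture(store, "sensor", {"id": 17, "temp_c": 21.4})  # different shape, same store
    print(len(store), "raw events captured")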

How is big data organized and handled?

Data stored in HDFS needs to be pre-processed, organized and converted before it can be loaded into a traditional enterprise data warehouse or a NoSQL store, not least because big data arrives in many different formats. A procedure called sessionization is applied to specific kinds of information (a sketch follows below): it translates behaviour patterns and related events into useful records, so that they can then be aggregated and loaded into relational database systems.
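A minimal sketch of that sessionization step, assuming clickstream events arrive as (user_id, unix_timestamp) pairs and that a 30-minute gap ends a session (both assumptions are for illustration only):

from collections import defaultdict

TIMEOUT = 30 * 60  # 30 minutes, in seconds

def sessionize(events):
    # events: iterable of (user_id, unix_timestamp) pairs.
    by_user = defaultdict(list)
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        sessions = by_user[user]
        if sessions and ts - sessions[-1][-1] <= TIMEOUT:
            sessions[-1].append(ts)  # gap small enough: continue current session
        else:
            sessions.append([ts])    # gap too large (or first event): new session
    return dict(by_user)

clicks = [("u1", 0), ("u1", 600), ("u1", 4000), ("u2", 100)]
print(sessionize(clicks))  # u1 gets two sessions: the click at t=4000 starts a new one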

What analyses can be done using big data?

Big data analysis is carried out in a distributed environment, because deeper analysis, such as data mining and statistical analysis, requires infrastructure spanning many systems that store different kinds of data. Analysts can then drill into very large amounts of data, analytical models can make better decisions automatically, and, finally, responses to changing behaviour can be delivered faster (Jee & Kim, 2013).

Part B (4 Marks)

Part B answers should be based on well-cited articles/videos: name the references used in your answer. For more information, read the guidelines given in Assignment 1.

Exercise 4: Big Data Products (1 mark)

Google is a master at creating data products. Below are a few examples from Google. Describe each product and explain how large-scale data is used effectively in it.

  1. Google’s PageRank

PageRank treats hyperlinks as "votes": a hyperlink to a page counts as a single vote for that page, and the weight of the vote depends on the importance of all the pages linking to it, computed over the whole link graph. A page with few or no incoming links therefore ranks low, while a page linked from many important pages ranks high. (In 2005 Google also introduced the nofollow link attribute, which lets webmasters and bloggers mark links that should not count as votes, as a countermeasure against spam.)
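A minimal power-iteration sketch of the idea, on a hypothetical four-page link graph; the damping factor of 0.85 and the graph itself are illustrative assumptions, not Google's production implementation:

damping = 0.85
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}

for _ in range(50):  # iterate until the ranks stabilise
    new_rank = {}
    for p in pages:
        # A page's rank is the damped sum of the ranks of the pages linking
        # to it, each divided by that page's out-degree (one vote per link).
        incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new_rank[p] = (1 - damping) / len(pages) + damping * incoming
    rank = new_rank

print({p: round(r, 3) for p, r in sorted(rank.items())})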

  2. Google’s Spell Checker

Google's spell checker suggests corrections for misspelled words. A spell checker may run as a standalone application or, more commonly, as a component of larger applications such as electronic dictionaries, search engines, word processors and email clients. The spell checker is also used to normalise words when comparing them during stemming analysis.
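A sketch of the statistical approach such spell checkers take: suggest the most frequent known word within one edit of the input. The tiny frequency table here is a made-up stand-in for counts mined from web-scale query logs:

WORD_FREQ = {"the": 500, "there": 120, "their": 110, "hello": 40, "help": 60}

def edits1(word):
    # All strings one insert, delete, replace, or transpose away from word.
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    if word in WORD_FREQ:
        return word  # already a known word
    candidates = {w for w in edits1(word) if w in WORD_FREQ} or {word}
    return max(candidates, key=lambda w: WORD_FREQ.get(w, 0))

print(correct("helo"))  # -> 'help' (frequency 60 beats 'hello' at 40)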

  3. Google’s Flu Trends

Google Flu Trends is a web service operated by Google that provides estimates of influenza activity in more than 25 countries. It makes historical estimates and current research data available for download (Kościelniak & Puto, 2015).

  4. Google’s Trends

Google Trends is a web-based tool built on Google search data. When search terms are entered, it shows how often they are searched for, across different languages and different regions of the world.

Like Google, Facebook and LinkedIn also use large-scale data effectively. How?

It is a well-known fact that these sites generate huge amounts of information, because they are social platforms, and all of this information is matched against users' behaviour patterns to drive recommendations. For example, Facebook uses data about users' activities to suggest things they might want to buy or events they might want to attend, surfaced on their pages and through search.

Exercise 5: Big Data Tools

Briefly explain why a traditional relational database (RDBMS) is not effective for storing big data.

There are three major reasons why an RDBMS is not effective for storing big data. First, the size of data has drastically increased to the petabyte level, and processing such volumes in an RDBMS is a tedious task. Second, most big data is unstructured or semi-structured, frequently coming from social media, texts, videos and emails; such information is beyond the scope of an RDBMS, because a relational database cannot interpret it, having been designed for structured records such as financial data rather than blog posts or sensor feeds. Third, the high velocity of big data is a problem: an RDBMS is built for orderly data retention rather than extremely rapid ingestion (Hoskins, 2014).

What is a NoSQL database?

A NoSQL database is defined as a largely distributed, non-relational database that enables organizations to analyse extremely high volumes of varied data types quickly and ad hoc. It is also known as a cloud database or a big data database, because of the huge amounts of data it is built to generate and store; another name is simply a non-relational database.

Name and briefly describe at least 5 NoSQL Databases  

Cassandra: Facebook originally developed Cassandra and then donated it as an Apache open-source project; it is well suited to social-networking and cloud databases. It is a non-relational database whose data model draws on Google's Bigtable.

Lucene: this is one of the Apache Software Foundation's Jakarta subprojects. It is an open-source full-text search toolkit: not a complete full-text search engine in itself, but a full-text search engine architecture on which search engines can be built.

Oracle NoSQL Database: Oracle's big data machine bundles the NoSQL database with integrated Hadoop, the R language, a Hadoop loader, and adapters between the Oracle database and Hadoop. It was unveiled as a big data appliance at Oracle OpenWorld on 4 October.


HBase: called the Hadoop database, HBase provides a high-performance, column-oriented, highly reliable and scalable distributed storage system. It stores data on clusters of commodity servers. HBase is an open-source implementation of Google's Bigtable and plays the same role that Bigtable's storage system plays at Google.

Bigtable: Bigtable is a non-relational database: a sparse, distributed, persistent, multidimensional sorted map used for storage. Because petabyte-level information is processed across many machines, it is very reliable (Bughin, 2016).

What is MapReduce and how does it work?

MapReduce is a programming model used for parallel computation over multiple data sets. It lets programmers write code for distributed systems much as they would write ordinary parallel programs. A map function is applied to a set of key-value pairs to produce a new set of intermediate pairs; the framework then groups the intermediate pairs so that every value sharing a key is routed to the same reduce function, which merges the values for each key into the final output.
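A single-process Python sketch of the model, using the classic word-count example; a real framework runs the same map, shuffle and reduce steps across many machines:

from itertools import groupby
from operator import itemgetter

def map_phase(document):
    for word in document.split():
        yield (word.lower(), 1)      # emit one count per occurrence

def reduce_phase(key, values):
    return (key, sum(values))        # fold all counts for one word

documents = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = sorted(p for doc in documents for p in map_phase(doc))   # the "shuffle/sort"
counts = [reduce_phase(key, [v for _, v in group])
          for key, group in groupby(pairs, key=itemgetter(0))]
print(counts)  # [('brown', 1), ('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]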

Briefly describe some notable MapReduce products (at least 5)  

CouchDB: an Apache open-source database that focuses on ease of use and a scalable architecture.
Apache Hadoop: open-source big data software implementing the MapReduce programming framework; it scales out across clusters and underpins much cloud computing.
Disco Project: a lightweight, open-source framework for distributed computing.
Riak: a scalable, easy-to-operate NoSQL database that is also distributed.
Infinispan: software developed by Red Hat that provides a key-value NoSQL store and distributed data caching (Vis, 2013).

Amazon's S3 service lets you store large chunks of data on an online service. List 5 features of Amazon's S3 service.

Amazon's S3 service has the following features, as described below:

Versioning: it allows every version of each object in a bucket to be saved and retrieved. This improves the durability of storage and makes it possible to recover deleted or overwritten objects.

Lifecycle: objects governed by a lifecycle rule are automatically deleted, or transitioned to Glacier storage, at a specified time (see the code sketch after this list).

Cost-allocation tagging: bucket tags feed into AWS billing, making it easy to track and organize AWS costs per bucket.

Request pricing: AWS charges for the requests made when storing and accessing objects, such as listing the files in a bucket, so request price is an important factor when dealing with a large number of objects.

Reduced Redundancy Storage (RRS): RRS can be enabled or disabled per object to decrease the cost of storing reproducible, non-critical data.
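As a hedged sketch of two of these features through the AWS SDK for Python (boto3), the snippet below enables versioning and adds a lifecycle rule transitioning objects to Glacier after 30 days; the bucket name and day count are illustrative, and configured AWS credentials are assumed:

import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"   # hypothetical bucket name

# Enable versioning so overwritten/deleted objects remain recoverable.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Lifecycle rule: move every object to Glacier 30 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-objects",
            "Filter": {"Prefix": ""},   # empty prefix = apply to all objects
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
        }]
    },
)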

Getting concise, valuable information from a sea of data can be challenging, so we need statistical analysis tools to deal with big data. Name and describe some (at least 3) statistical analysis tools.

Some statistical analysis tools are (a small Python illustration follows the list):

R: R is a programming language driven from a command-line interface. Its functions and packages support complex statistical computing; it is precise and quick to learn, and it runs on various operating systems.
Excel: part of Microsoft Office, Excel is powerful and widely available. Its tables and charts are easy to operate and manage, and it is commonly used for data analysis and descriptive statistics, although it can become slow on large data sets.
SPSS Statistics: SPSS Statistics does not require extensive programming knowledge; besides a syntax editor, it offers a point-and-click graphical interface. It is IBM's statistical analysis tool and gives fine control over the statistical output.
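To illustrate the kind of descriptive statistics these tools produce, here is the same computation in plain Python using only the standard library (the sample values are invented):

import statistics

sample = [12.1, 13.4, 11.8, 14.2, 12.9]
print("mean:  ", statistics.mean(sample))
print("median:", statistics.median(sample))
print("stdev: ", statistics.stdev(sample))   # sample standard deviation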


Exercise 6: Big Data Application

Name 3 industries that should use Big Data – justify your claim in 250 words for each industry using proper references.

Financial industry: banks can use data on investment characteristics, asset management, banking services and product financial strategies to segment existing customers demographically, and analyse insurance demographics, in order to offer one-stop financial solutions and extract the most value. Big data is also used to manage duplicate transactions in workflows, and blockchains are being combined with big data for security, consistent compliance archiving and blockchain analytics.
Insurance: this industry, which is built on the principle of risk, needs big data services to cut the time to process complex claims to around 10 minutes, and to eliminate millions of dollars of leakage and fraud. Big data also helps insurers become customer-centric and profitable, and to set premiums: profitable premium prices are chosen by weighing the risk covered against the customer's budget.
Retail industry: retailers use data-driven cognitive technology to improve the customer experience, and analyse social media data to improve product design and marketing so as to provide quality services. Big data analytics in the retail process can forecast demand for products, identify interested customers, research forecasting trends, and optimize pricing to keep the products on sale competitive.

Write a paragraph about memory virtualization.

Memory virtualization decouples volatile random access memory (RAM) from individual systems in the data center and aggregates it into a virtualized memory pool available to any computer in the cluster, transparently to the operating system and to applications running on it. The distributed memory pool can then be used as a cache for CPU or GPU applications, as a messaging layer, or as a large shared memory resource. Memory virtualization lets networked and distributed servers share a memory pool to overcome physical memory limits, a common bottleneck for software performance. By integrating this capability into the network, applications can use a large amount of memory to improve overall performance and system utilization, use memory more efficiently, and enable new use cases. Software on each memory-pool node (server) connects the node to the pool, contributing memory and storing and retrieving data.

Based on the video answer the following questions: 

What is RAID 0?

RAID 0, also known as disk striping, is a technique for breaking files up and distributing the data across all the disk drives in a RAID group. One disadvantage of RAID 0 is that it has no parity: if a drive fails there is no redundancy, and all data is lost.

Striping is the RAID concept beginners find most confusing, so it needs to be well understood and explained. A RAID set is a collection of disks, and on each disk a number of consecutively addressable disk blocks is defined; these blocks are called a strip, and the set of aligned strips across the disks is called a stripe.

Mirroring is easy to understand and is one of the most reliable data protection methods. You simply keep a copy of the disk image you want to protect, so that there are two copies of the data.
Parity: mirroring involves high cost, so to protect data more cheaply a newer technique adds parity to the stripe. This is a reliable, low-cost data protection solution: an extra disk is added to the stripe width to hold the parity bits (see the XOR sketch below).
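A toy sketch of the parity idea described above: each parity byte is the XOR of the corresponding bytes on the data disks, so any single failed disk can be rebuilt from the survivors (the disk contents are made up):

disk1 = bytes([10, 20, 30])
disk2 = bytes([1, 2, 3])
disk3 = bytes([7, 7, 7])

# Parity disk: byte-wise XOR across the data disks.
parity = bytes(a ^ b ^ c for a, b, c in zip(disk1, disk2, disk3))

# Simulate losing disk2 and rebuilding it from the other disks plus parity.
rebuilt = bytes(p ^ a ^ c for p, a, c in zip(parity, disk1, disk3))
assert rebuilt == disk2
print("disk2 recovered:", list(rebuilt))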


Exercise 2: Storage Design

Repositories are essentially logical disk space provided by a file system on top of the physical storage hardware. If a repository is created on a file server, such as an NFS share, the file system already exists; if a repository is created on a LUN, an OCFS2 file system is created first. Before you begin the configuration, you must have both an NFS-based repository and a LUN-based repository available.

Based on the watched video answer the following questions:

What is an ISS?

An intelligent storage system (ISS) is a feature-rich RAID array that provides highly optimized I/O processing capabilities. It provides a large amount of cache and multiple I/O paths to improve performance. The ISS operating environment also provides intelligent cache management, array resource administration, and connectivity for heterogeneous hosts. It supports virtual provisioning, flash drives and automated storage tiering.

What are the 4 main components of the ISS?

The video mentions four main components of an ISS: the front end, cache, back end and physical disks.

Based on the watched videos answer the following questions: 

Describe NAS and SAN briefly, using diagrams.

A storage area network (SAN) is a high-performance network whose primary purpose is to enable computer systems to communicate with storage devices.

Network Attached Storage (NAS) is a specialized file storage device that provides file-based shared storage for local area network (LAN) nodes over standard Ethernet connections. (Rouse, 2015) 

The SAN organizes storage resources on a separate, high-performance network. The key difference between NAS and SAN is that network-attached storage handles individual file-level input/output (I/O) requests, while a storage area network manages block-level I/O requests.

What are the advantages of SAN over NAS?

The major advantages of NAS:

  • Supports comprehensive access to data
  • Improved efficiency
  • Improved flexibility
  • Centralized storage
  • Simplified administration
  • Scalability

The major advantages of SAN:

  • Better disk utilization
  • Disaster recovery for various applications
  • Improved application availability
  • Reduced backup time

What are two common NAS file sharing protocols? How are they different from each other?

The two common NAS file sharing protocols are:

  • Common Internet File System (CIFS)
  • Network File System (NFS)

CIFS is implemented in Microsoft environments and is based on the Server Message Block (SMB) protocol, whereas NFS is used in UNIX environments.

Part B

Exercise 3: Storage Design (1 Mark)

Design Storage Solution for New Application

An organization is deploying a new business application in its environment. The new application requires 1 TB of storage for business and application data. During peak workload, the application is expected to generate 4900 IOPS with a typical I/O block size of 4 KB. The vendor-supplied disk drive option is a 15,000 rpm drive with a capacity of 100 GB. The drive specification is: average seek time 5 ms, data transfer rate 40 MB/s. You are required to calculate the number of disk drives that can meet both the capacity and the performance requirements of the application (Stonebraker, 2010).

Hint: In order to calculate the IOPS from average seek time, data transfer rate, disk rpm and data block size refer slide 15 in week 7 lecture slide. Once you have IOPS, refer slide 16 in week 7 to calculate the required number of disks. 

Dc = 1 TB / 100 GB = 10 disks

Ts = 0.005 s + 0.5/(15,000 rpm / 60) + 4 KB / (40 MB/s) = 0.005 + 0.002 + 0.0001 ≈ 0.0071 s

S = 1/0.0071 ≈ 141 IOPS

At 70% utilization: 0.7 × S ≈ 99 IOPS

Dp = 4900/99 ≈ 50 disks

Therefore, the required number of disk drives is max(Dc, Dp) = 50 disks.
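The same sizing arithmetic as a runnable check; all inputs come from the exercise statement, and the 70% figure is the controller-utilisation rule of thumb applied above:

seek_s = 0.005                    # average seek time: 5 ms
rotation_s = 0.5 / (15000 / 60)   # half a rotation at 15,000 rpm = 2 ms
transfer_s = 4 / (40 * 1000)      # 4 KB at 40 MB/s = 0.1 ms
service_time = seek_s + rotation_s + transfer_s      # ~0.0071 s

iops_per_disk = 1 / service_time                     # ~141 IOPS at 100% utilisation
usable_iops = 0.7 * iops_per_disk                    # ~99 IOPS at 70% utilisation

disks_for_capacity = 1000 / 100                      # 1 TB / 100 GB = 10 disks
disks_for_performance = 4900 / usable_iops           # ~50 disks

print(round(service_time, 4), round(iops_per_disk), round(usable_iops))
print("required disks:", max(round(disks_for_capacity), round(disks_for_performance)))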


What is FCoE and why do we need it?

The Fibre Channel over Ethernet (FCoE) storage protocol enables Fibre Channel communications to run directly over Ethernet (Rouse, 2012). FCoE takes advantage of existing high-speed Ethernet infrastructure and converges IP and Fibre Channel storage traffic onto a single cable and interface.
The goal of FCoE is to unify I/O and reduce switch complexity. It also reduces the number of cables and interfaces. In addition, by cutting energy use and cooling requirements, FCoE promotes sustainability and saves users money.

In your opinion, how is FCoE more cost-effective than a traditional connection? Give a brief explanation.

  • Traditional connections require multiple network adapters and multiple cabling systems, but with FCoE each server needs only one adapter and one Ethernet link.
  • Less equipment means lower energy, space, power and cooling requirements, which is more environmentally friendly.
  • Maintenance procedures are simplified because there are fewer systems and devices to manage (Garg, 2016).

You have read and answered about SAN in part A – based on your understanding and with some research effort answers the following questions:

What is a Virtual SAN? 

Virtual SAN is a software-defined storage product from VMware that lets enterprises pool storage capacity and provision virtual machine storage instantly through simple, virtual machine-driven policies.

What is IP SAN protocols and FibreChannel over IP (FCIP)?

IP SAN protocol:

An IP SAN is a dedicated storage area network (SAN) that allows multiple servers to access pools of shared block storage devices, using storage protocols that rely on the Internet Engineering Task Force (IETF) standard Internet protocol suite.
FCIP: Fibre Channel over IP (FCIP) is an important technology for linking Fibre Channel storage area networks (SANs). FCIP and iSCSI are complementary solutions that enable company-wide storage access: FCIP transparently interconnects Fibre Channel (FC) SAN islands over IP networks, while iSCSI allows IP-connected hosts to access iSCSI- or FC-attached storage.

Choose the correct answer from the following questions:

What is an advantage of a flat address space over a hierarchical address space?

  • Highly scalable with minimal impact on performance
  • Provides access to data, based on retention policies
  • Provides access to block, file, and object with same interface
  • Consumes less bandwidth on network while accessing data

What is the role of the metadata service in an OSD node?

  • Responsible for storing data in the form of objects
  • Stores unique IDs generated for objects
  • Stores both objects and object IDs
  • Controls functioning of storage devices

What is used to generate an object ID in a CAS system?

  • File metadata
  • Source and destination address
  • Binary representation of data
  • File system type and ownership

What accurately describes block I/O access in a unified storage?

  • I/O traverses the NAS head and storage controller to disk
  • I/O traverses the OSD node and storage controller to disk
  • I/O traverses the storage controller to disk
  • I/O is sent directly to the disk

What accurately describes unified storage?

  • Provides block, file, and object-based access within one platform
  • Provides block and file storage access using objects
  • Supports block and file access using flat address space
  • Specialized storage device purposely built for archiving

What is the greenhouse effect?

Heating of the earth’s atmosphere due to an increase in gases like carbon dioxide. 

We are legally, ethically, and socially required to green our IT products, applications, services, and practices – is this statement true? Why?


True.

We have a responsibility to maintain a healthy environment for the world in which we live, and to preserve it for our future generations.

What is Green IT and what are the benefits of greening IT?

Green IT is the practice of environmentally sustainable computing. It achieves energy conservation through design, management, control and delivery, while reducing the burden on the environment.

Benefits: energy savings, environmental protection, low radiation and recyclability (Gomes, Tolosana-Calasanz & Agoulmine, 2015).

Exercise 2: Environmental Sustainability   

Read the article in the below link and answer the questions that follow:

https://www.computer.org/csdl/mags/it/2010/02/mit2010020004.html

According to the article how do you build a greener environment?

  1. Coordinate, redesign and optimize supply chains, manufacturing activities and organizational workflows to minimize the impact on the environment;
  2. Make company operations, buildings and other systems energy efficient;
  3. Analyse, model and simulate environmental impact;
  4. Provide a platform for environmental administration and emissions trading;
  5. Audit and report energy utilization and savings;
  6. Provide environmental knowledge management systems, decision support systems and environmental ontologies; and
  7. Integrate and aggregate data from environmental monitoring networks (Gomes, Tolosana-Calasanz & Agoulmine, 2015).

Summarize the article

The article discusses global warming and how it affects the environment, how to establish a green IT environment to mitigate the greenhouse effect, and, finally, the development prospects of the green IT business.

Exercise 3: Environmentally Sound Practices  

The questions in this exercise can be answered by doing internet search. 

Briefly explain the following terms – a paragraph for each term: 

  • Power usage effectiveness (PUE) and its reciprocal: PUE is the standard measure of data centre energy efficiency. It is calculated as the total power delivered to the data centre divided by the power consumed by the IT equipment that actually runs the data centre's workloads. Its reciprocal is DCiE, described below.
  • Data center efficiency (DCE): DCE is about increasing resource utilization and eliminating unused capacity.
  • Data center infrastructure efficiency (DCiE): DCiE is the reciprocal of PUE, i.e. the IT equipment power divided by the total facility power, usually expressed as a percentage.
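In formula form (these are the standard definitions; the example figures are illustrative):

$\mathrm{PUE} = \dfrac{\text{total facility power}}{\text{IT equipment power}}, \qquad \mathrm{DCiE} = \dfrac{1}{\mathrm{PUE}} = \dfrac{\text{IT equipment power}}{\text{total facility power}} \times 100\%$

For example, a facility drawing 2 MW in total to run 1.25 MW of IT equipment has PUE = 2 / 1.25 = 1.6 and DCiE = 1 / 1.6 = 62.5%.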

List 5 universities who offers Green Computing course. You should name the university, the course name and the brief description about the course.

  1. University of Hertfordshire - Green Computing. Covers environmental performance in energy, water, waste, transportation and production, sustainable procurement, environmental awareness and biodiversity management.
  2. University of Sydney - Green IT and Cloud Computing. Addresses the various issues around cloud computing (and data centre) technology.
  3. Australian National University - ICT Sustainability. ICT sustainability is about how to assess and reduce the carbon footprint and the materials used by computers and telecommunications; the strategies taught reduce the impact of computers on the environment and make businesses more energy efficient.
  4. Brandeis University - Green Computing. A learner-led effort to reduce energy use and carbon emissions, and to raise awareness of the environmental impact of computer use at Brandeis University.
  5. Carnegie Mellon University - Green Computing. This course introduces students to the exciting area of green computing and divides them into two tracks: the first is "energy-efficient computing" and the second is "applying computing to sustainability" (Khan, Shah & Nusratullah, 2015).

Exercise 4: Major Cloud APIs

The following companies are the major cloud service provider: Amazon, GoGrid, Google, and Microsoft.

List and briefly describe (3 lines for each company) the Cloud APIs provided by the above major vendors. 

  • Google's AdSense API enables you to integrate AdSense sign-up, ad-unit management, and reporting into your web or blog hosting platform.
  • Google's free AdWords API service allows developers to design computer programs that interact directly with the AdWords servers.
  • Google's Checkout API enables merchants to integrate their existing e-commerce systems with Google Checkout, communicate order status to buyers, and take advantage of the features offered by the service.

Exercise 1: Greening IT Standards and Regulations

To design green computers and other IT hardware, the following standards and regulations are mainly used: EPEAT (www.epeat.net), the Energy Star 4.0 standard, and the Restriction of Hazardous Substances Directive (https://www.gov.uk/guidance/rohs-compliance-and-guidance). Use the links provided, with some internet search, and summarize each standard and regulation in 150 words.

Standards, and the related certification and labelling schemes, are key markers of products and services in a sustainable supply chain. The number of "green" schemes available has grown rapidly in the last few years and now includes some 400 environmental labels alone. Standards and regulations are powerful tools for implementing green-growth strategic frameworks and the sustainable development goals (SDGs), because they encourage improvements in energy efficiency, emission standards, competition in production markets, resource utilization, trade and foreign direct investment, and private-sector voluntary initiatives. By giving consumers information about products and production processes, and by providing clear policy signals for businesses, these tools can be effective in achieving environmental objectives and fostering best practice in the markets for sustainable goods and services.

Exercise 2: Green cloud computing 

Xiong, N.; Han, W.; Vandenberg, A, "Green cloud computing schemes based on networks: a survey," Communications, IET, vol.6, no.18, pp.3294,3300, Dec. 18 2012

Most of the energy consumption in data centres comes from computation, disk storage, networking and cooling systems. Nowadays, new technologies and strategies are being proposed to reduce the energy cost of data centres. From the above paper, outline (in 300 words) the current work done in these fields.

The authors are particularly aware that green cloud computing (GCC) is a broad and active research area. The distinction between the 'user' and the 'cloud-based energy resource provider' can be critical for the production of GCC's worldwide ecosystem. A user submits a service request to a cloud service provider over an Internet connection or a wired/wireless network. The requested service is returned to the user in time, while data storage and processing, interoperating protocols, service architecture, communication and distributed computers interact readily across all the networks involved (Smith, 2014).


Exercise 3: Cloud API Functionalities 

List the functionalities that can be achieved by using the APIs mentioned in the following link:

  • Retrieve account details (currency name, currency value, points balance, account type and exceptions)
  • Retrieve offers (discount ID, offer information, validity period from and to)
  • Choose a discount (redeem an offer)

What API is used in the following link and how is it used?

The link uses the OpenStack Compute (nova) API through its command-line client. Credentials and endpoints are supplied either as command-line flags or as environment variables.

OpenStack username, password and tenant: --os-username, --os-password and --os-tenant-name, or equivalently:

export OS_USERNAME=openstack

export OS_PASSWORD=yadayada

export OS_TENANT_NAME=myproject

Authentication URL and compute API version: --os-auth-url and --os-compute-api-version, or equivalently:

export OS_AUTH_URL=https://example.com:8774/v2/

export OS_COMPUTE_API_VERSION=2

When authenticating through Keystone: export OS_AUTH_URL=https://example.com:5000/v2.0/

Write a report (1 page) about the OpenStack features and functionalities.

The first feature is control. An open-source platform means there is no vendor lock-in and no arbitrary limitations, and the modular design allows traditional and third-party technologies to be integrated to meet company requirements. With OpenStack, IT teams can become their own cloud compute service providers. Building and maintaining an open-source private cloud is not for every business, but for those with the developers and infrastructure, it can be the best choice (Khan, Shah & Nusratullah, 2015).

The second is compatibility. OpenStack's public cloud compatibility allows companies to easily migrate data and applications to public clouds in the future, based on security, economics and other mission-critical business considerations.

The third is scalability. Mainstream Linux distributions, including Fedora and SUSE, support it.

The fourth is flexibility. Flexibility is one of its greatest strengths: users can build up their infrastructure according to their needs and easily grow the cluster size.

The fifth is industry standing. More than 60 leading global companies from over 10 countries, including Dell, Cisco, Microsoft and Intel, participate in the project.

The sixth is practical testing. Practice is the sole criterion for testing truth, and OpenStack is an operating platform that has been verified in the global operation of large public and private clouds ("Green Computing and its Applications in Different Fields", 2017).

References

Abouelmehdi, K., Beni-Hessane, A., & Khaloufi, H. (2018). Big healthcare data: preserving security and privacy. Journal Of Big Data, 5(1). doi: 10.1186/s40537-017-0110-7

Bughin, J. (2016). Big data, Big bang?. Journal Of Big Data, 3(1). doi: 10.1186/s40537-015-0014-3

Corea, F. (2016). Can Twitter Proxy the Investors' Sentiment? The Case for the Technology Sector. Big Data Research, 4, 70-74. doi: 10.1016/j.bdr.2016.05.001

Garg, P. (2016). A green step towards computing: Green cloud computing. International Journal Of Research Studies In Computing, 5(2). doi: 10.5861/ijrsc.2016.1518

Gomes, D., Tolosana-Calasanz, R., & Agoulmine, N. (2015). Introduction to special issue on Green Mobile Cloud Computing (Green MCC). Sustainable Computing: Informatics And Systems, 8, 37. doi: 10.1016/j.suscom.2015.11.002

Green Computing and its Applications in Different Fields. (2017). International Journal Of Recent Trends In Engineering And Research, 3(2), 185-189. doi: 10.23883/ijrter.2017.3023.6yhea

Hoskins, M. (2014). Common Big Data Challenges and How to Overcome Them. Big Data, 2(3), 142-143. doi: 10.1089/big.2014.0030

Jee, K., & Kim, G. (2013). Potentiality of Big Data in the Medical Sector: Focus on How to Reshape the Healthcare System. Healthcare Informatics Research, 19(2), 79. doi: 10.4258/hir.2013.19.2.79

Khan, M., Fahim Uddin, M., & Gupta, N. (2014). Seven V’s of Big Data Understanding Big Data to extract Value. Retrieved from https://asee-ne.org/proceedings/2014/Professional%20Papers/113.pdf

Khan, N., Shah, A., & Nusratullah, K. (2015). Adoption of Virtualization in Cloud Computing. International Journal Of Green Computing, 6(1), 40-47. doi: 10.4018/ijgc.2015010104

Kościelniak, H., & Puto, A. (2015). BIG DATA in Decision Making Processes of Enterprises. Procedia Computer Science, 65, 1052-1058. doi: 10.1016/j.procs.2015.09.053

Smith, B. (2014). Green computing. Boca Raton, Fla: CRC Press.

Stonebraker, M. (2010). SQL databases v. NoSQL databases. Communications Of The ACM, 53(4), 10. doi: 10.1145/1721654.1721659

Vis, F. (2013). A critical reflection on Big Data: Considering APIs, researchers and tools as data makers. First Monday, 18(10). doi: 10.5210/fm.v18i10.4878

Wrox. (2011). Professional NoSQL.

