Data Analysis Essay on East Asia & Pacific Countries.

Assignment Task

You are a member of the team, and need to perform data analysis on countries in the region of East Asia & Pacific.

The team has not set any specific goal for the analysis. Therefore, you have the freedom to explore the data and dig out anything you find interesting or significant.

You have been requested to prepare a data analysis report about your work and explain your findings. The potential audiences include other researchers, business representatives, and government agencies. They may have limited ICT or mathematical knowledge.

To prepare the report, please follow the following outline:

1. Introduction

Provide an introduction to the problem. Include background material as appropriate: who cares about this problem, what impact it has, where does the data come from.

2. Data Setup

Describe how to load the data, and the libraries needed. Provide an overview of the data about its dimensions and structures.

3. Exploratory Data Analysis

Perform 3 one-variable analyses. Plot at least one graph for each variable. Explain why the selected graph is appropriate.

Perform 2 two-variable analyses. Plot at least one graph for each variable. Explain why the selected graph is appropriate.

The analysis can be performed on all years and all countries, or on a subset of your interest.

4. Advanced Analysis

4.1 Clustering

Briefly explain the concept of clustering and k-means.

Try to do a clustering analysis to group countries according to some selected attributes.

4.2 Linear Regression

Briefly explain the concept of linear regression.

Try to do 2 linear regression analyses. Plot the learned models.

The analysis can be performed on all years and all countries, or on a subset of your interest.

5. Conclusion

6. Reflections

In this part, discuss any difficulties you had performing the analysis and how you solved those difficulties. Reflect on how the analysis process went for you, what you learnt, and what you might do differently next time.

1. INTRODUCTION

This report presents the findings of an analysis of the World Bank's Health and Population data for the East Asia and Pacific region. The analysis focuses on two subsets of the data: the Immunization Data and the Alcohol Consumption Data.

Immunization is an important aspect of health care in any country, because it influences the survival rate of infants and children below five years of age. Polio, for instance, can develop into a disease that affects an individual's entire life. Investing in immunization drives is therefore an important step in ensuring both the survival of infants and children aged five and below and a healthy life afterwards. This report compares immunization across countries in East Asia and Pacific to obtain a view of how immunization is distributed among the region's populations. The report also examines the relationship between immunization and the amount of money allocated to health in each country. The aim is to establish whether higher allocations for health go with higher immunization levels and lower allocations with lower immunization levels. The findings will give experts in the health sectors of the various countries information on the impact of health investment on immunization, which can form a reliable basis for health planning and funding.

The report also relates alcohol consumption in East Asia and Pacific to the individual income in each country. This comparison intends to establish whether the level of income has any influence on the amount of alcohol consumption. This analysis first compares the alcohol consumption and the Gross National Income (GNI) of the individual countries in East Asia and Pacific. The comparisons will provide an understanding of the nature of each of the two variables in the region. The outcome from this analysis will be relevant to researchers and professionals interested in social data analysis.

2. DATA SETUP

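The R code below imported and loaded the Health and Population data, HealthAndPopulation.csv, into RStudio for analysis.

#Importing the data for East Asia and Pacific as EAP
#(file assumed to be in the working directory; columns read as factors, as described below)
EAP <- read.csv("HealthAndPopulation.csv", stringsAsFactors = TRUE)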

The structure and dimensions of the Health And Population Data were obtained using the str() and dim() functions in R.

#Data Dimensions

dim(EAP)

#Data Structures

str(EAP)

summary(EAP)

From the structure and dimensions analysis, the Health and Population Data has 967 entries and 19 variables. However, the entries include five extra rows at the end of the data file that give information about the file itself. Therefore, the correct number of entries is 962, with 19 variables. All 19 variables in the dataset are read as factors, and the Country Name variable has 37 levels (excluding the level representing the column name). These 37 levels represent the 37 countries whose data is in the dataset.
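As a minimal sketch of how trailing descriptive rows like these could be dropped before analysis, the example below uses a small made-up data frame. The names toy_data, toy_clean and n_meta are hypothetical and the values are not taken from the World Bank file; the only technique shown is base R's head() with a negative n.

```r
# Toy stand-in for an imported file: 4 data rows followed by 2 metadata rows.
# (Hypothetical values; the report's file has 962 data rows and 5 extra rows.)
toy_data <- data.frame(
  Country.Name = c("A", "B", "C", "D", "Data from database", "Last Updated"),
  Value        = c("1.5", "2.0", "..", "3.1", "", ""),
  stringsAsFactors = FALSE
)

n_meta <- 2                            # number of trailing non-data rows
toy_clean <- head(toy_data, -n_meta)   # keep all rows except the last n_meta

dim(toy_data)    # rows and columns before trimming
dim(toy_clean)   # rows and columns after trimming
```

With the real file, n_meta would be 5, which would leave the 962 data rows described above.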

The preprocessing of the Health and Population Data involved the creation of two subsets from the HealthAndPopulation.csv dataset. The resulting subsets were the Immunization Data.csv and Alcohol Consumption Data.csv. The Alcohol Consumption Data subset contains the Country Name, Alcohol Consumption and GNI variables for the year 2015. The Immunization Data subset contains the Country Name, Immunization BCG and Health Expenditure Total variables for the year 2014.

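The preprocessing also omitted entries with missing values from the subsets. The code below imported and preprocessed the subsets in R.

#Data Preparation
#Specifying The Datasets for Analysis
#(right-hand sides reconstructed from the description in the text)
#Alcohol Consumption Data (TacD)
TacD <- read.csv("Alcohol Consumption Data.csv")
TacD[TacD == ".."] <- NA   #".." marks a missing value
TacD <- na.omit(TacD)
TacD$Alcohol.Consumption <- as.numeric(as.character(TacD$Alcohol.Consumption))
TacD$GNI <- as.numeric(as.character(TacD$GNI))
TacD$Country.Name <- factor(TacD$Country.Name)
#Immunization Data (ImD)
ImD <- read.csv("Immunization Data.csv")
ImD[ImD == ".."] <- NA
ImD <- na.omit(ImD)
ImD$Immunization <- as.numeric(as.character(ImD$Immunization))
ImD$Health.Expenditure <- as.numeric(as.character(ImD$Health.Expenditure))
ImD$Country.Name <- factor(ImD$Country.Name)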

The datasets were stored in the TacD and ImD data frames. The as.numeric function converted the Alcohol Consumption, GNI, Immunization and Health Expenditure variables from factors to numeric values.
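Converting a factor to numbers has a well-known pitfall in R: calling as.numeric() directly on a factor returns its internal level codes rather than the recorded values. The sketch below illustrates this on a made-up vector (x is hypothetical, not a column from the report's data) and shows the usual workaround of converting through as.character() first.

```r
# A factor whose labels are numbers, plus the ".." missing-value marker.
x <- factor(c("10.5", "2.3", "..", "7.0"))

x[x == ".."] <- NA                      # mark the placeholder as missing

codes  <- as.numeric(x)                 # internal level codes, not the values
values <- as.numeric(as.character(x))   # the numbers actually recorded

codes    # small integers that only index the factor levels
values   # the original numbers, with NA for the missing entry
```

If the first form is used by mistake, every later plot and model is built on the level codes, so it is worth checking a few converted values against the source file.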

3. EXPLORATORY DATA ANALYSIS

A. ONE VARIABLE ANALYSIS

I. ALCOHOL CONSUMPTION ANALYSIS FOR 2015

This analysis investigated the rates of alcohol consumption across the East Asia and Pacific region. The R Code below plotted the results for this analysis.

#Total Alcohol Consumption Analysis
plot(TacD$Country.Name,TacD$Alcohol.Consumption,
ylab = "Alcohol Consumption",
main = "Alcohol Consumption in East Asia & Pacific (2015)")

The analysis produced the boxplot graph in Figure 1 below

Figure 1

The results of the analysis in the plot above show Vietnam leading in the alcohol consumption in East Asia and Pacific followed by Thailand, Mongolia and China. Indonesia has the lowest alcohol consumption in the region.

II. GROSS NATIONAL INCOME (GNI) ANALYSIS

This analysis compared the Gross National Income (GNI) levels across the East Asia and Pacific region. The GNI provides the information on the average annual income per person in the country of interest. The R Code below plotted the results for this analysis.

#GNI Analysis
plot(TacD$Country.Name,TacD$GNI,
ylab = "GNI",
main = "GNI in East Asia & Pacific (2015)")

The analysis produced the boxplot graph in Figure 2 below

Figure 2

The results of the GNI analysis in the graph above show China leading in the GNI levels in East Asia and Pacific followed by Tuvalu, Australia and Thailand. The country with the lowest GNI in the region is Cambodia.

III. IMMUNIZATION BCG DATA ANALYSIS

This analysis compared the immunization BCG in the different countries in the East Asia and Pacific region. The R Code below plotted the results for this analysis.

#Immunization Analysis
plot(ImD$Country.Name,ImD$Immunization,
ylab = "Immunization BCG",
main = "Immunization BCG in East Asia & Pacific (2015)")

The analysis produced the boxplot graph in Figure 3 below

Figure 3

The results of the Immunization BCG analysis in the graph above show that up to eleven countries lead in the Immunization BCG in East Asia and Pacific. These countries include China, Thailand, Fiji and Tuvalu. Kiribati has the lowest Immunization BCG in the region.

B. TWO VARIABLE ANALYSIS

I. ALCOHOL CONSUMPTION - GROSS NATIONAL INCOME (GNI) ANALYSIS

This analysis examined the relationship between the alcohol consumption and the Gross National Income in East Asia and Pacific. The R Code below plotted the outcome of the analysis.

#Alcohol Consumption Data Analysis
plot(TacD$GNI,TacD$Alcohol.Consumption,
xlab = "GNI", ylab = "Alcohol Consumption",
main = "Alcohol Consumption - GNI Analysis in East Asia & Pacific")

The analysis produced the scatterplot in Figure 4 below

Figure 4

The results in the plot above do not show any clearly definable relationship between Alcohol Consumption and GNI in the East Asia and Pacific region.

II. IMMUNIZATION BCG – HEALTH EXPENDITURE ANALYSIS

This analysis related the Immunization BCG and Health Expenditure variables to establish the relationship between them. The data on the Health Expenditure consisted of the totals of both government and private health sectors in the region. The R Code below plotted the outcome of the analysis.

#Immunization Data Analysis
plot(ImD$Health.Expenditure,ImD$Immunization,
     xlab = "Health Expenditure",
     ylab = "Immunization BCG",
     main = "Immunization BCG - Health Expenditure Analysis in East Asia & Pacific")

The analysis produced the scatterplot in Figure 5 below

Figure 5

The plot above shows a widely spread pattern, with high BCG immunization occurring across all health expenditure levels. However, the lowest BCG immunization falls in the lower half of health expenditure, and the country with the highest health expenditure also has the highest BCG immunization in the region.

4. ADVANCED ANALYSIS

A. HIERARCHICAL AND K MEANS CLUSTER ANALYSIS

Clustering groups together items in a dataset that are similar according to a predetermined condition or attribute (Galit, et al., 2018). It is used to investigate the relationships and the nature of the data in multivariate datasets (Jon, 2006). This cluster analysis focused on the Gross National Income (GNI) in the East Asia and Pacific region for the year 2015. The analysis aimed at grouping the countries in the region according to their GNI levels. The GNI, as an economic indicator, gives a view of how the economies in the region compare and relate.

The R Code below produced the plot for the hierarchical cluster analysis.

#Hierarchical Cluster plot for GNI in 2015
#(clustering call reconstructed: hclust() on GNI distances is assumed)
Hclust <- hclust(dist(TacD$GNI))
plot(Hclust, labels = TacD$Country.Name,
ylab = "GNI",
main = "GNI EAST ASIA & PACIFIC 2015")

The analysis produced the plot in Figure 6 below

Figure 6

The plot of the hierarchical cluster analysis above shows two main clusters from the top. These two clusters represent the top half and the bottom half of the GNI levels in the region. The top half, on the left, forms two more clusters, with the leftmost country, China, having the highest GNI, followed by Tuvalu, Australia and Thailand. The bottom half is also divided into two more clusters, with the leftmost sub-cluster of Myanmar, Cambodia and Malaysia being the countries with the lowest GNI levels in the region.
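To make the reading of a dendrogram more concrete, the following is a minimal sketch of hierarchical clustering on made-up values. The vector gni and its labels A to F are hypothetical and are not the report's data; the functions dist(), hclust() and cutree() are base R.

```r
# Hypothetical income values for six made-up countries.
gni <- c(A = 1.2, B = 1.5, C = 1.1, D = 9.8, E = 10.4, F = 5.0)

d  <- dist(gni)                # pairwise distances between the values
hc <- hclust(d)                # agglomerative clustering, complete linkage by default
groups <- cutree(hc, k = 2)    # cut the tree into two clusters

plot(hc, main = "Toy hierarchical clustering", ylab = "Height")
groups                         # cluster number assigned to each label
```

The height at which two branches join is the distance between the groups being merged, so the two main clusters described above correspond to the last, highest join in the tree.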

The R Code below produced the plot for the K means cluster analysis

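#K means Cluster plot for GNI in 2015
library(cluster)   #provides clusplot()
set.seed(10)
#(kmeans() call reconstructed: three centres on GNI are assumed from the text)
Kclust <- kmeans(TacD$GNI, centers = 3)
clusplot(TacD, Kclust$cluster, color=TRUE, shade=TRUE,
labels=2, lines=0,
main = "GNI EAST ASIA & PACIFIC 2015")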

The analysis produced the plot below

Figure 7

The K means analysis grouped the East Asia and Pacific countries into three groups depending on their level of GNI. The biggest cluster, labeled 3 in red, consisted of countries such as Malaysia, Myanmar, Nauru and Solomon Islands, indicated as 17, 21, 22 and 31 respectively. These are the countries with the lowest GNI, so we can conclude that a large number of countries in East Asia and Pacific have low GNI. The cluster of countries with the highest GNI, labeled 1, contains countries such as China and Australia, indicated as 2 and 5 respectively.
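K-means works by choosing k cluster centres, assigning each observation to its nearest centre, recomputing each centre as the mean of its members, and repeating until the assignments stop changing. The sketch below shows this on made-up values with three obvious groups. The vector gni is hypothetical, and nstart = 10 is an illustrative choice that reruns the algorithm from several random starts.

```r
set.seed(10)  # k-means starts from random centres, so fix the seed

# Hypothetical income values forming three well-separated groups.
gni <- c(1.0, 1.2, 1.1, 5.0, 5.3, 4.9, 10.0, 10.5, 9.8)

km <- kmeans(gni, centers = 3, nstart = 10)

km$cluster         # cluster label assigned to each value
sort(km$centers)   # the three cluster means, in increasing order
```

The cluster numbers themselves are arbitrary labels; only the grouping and the cluster means carry meaning, which is why the interpretation above relies on the GNI levels in each group and not on the label values.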

B. LINEAR REGRESSION

Linear regression is an analysis method that generates a linear equation to describe the nature of the relationship between two or more variables (Freedman, 2009). For two variables, the relationship is given by the equation

Xᵢ = d₀ + d₁vᵢ + eᵢ, for a random (response) variable X and a controlled (explanatory) variable v, where d₀ is the intercept, d₁ is the slope and eᵢ is the error for observation i (Jorge, et al., 2013). The analysis will involve the two variables in each of the subsets.
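As a minimal sketch of how such an equation is estimated and drawn in R, the example below fits a line to made-up data with a known intercept of 2 and slope of 3. The vectors v, e and X are hypothetical and are not the report's data; lm(), coef() and abline() are base R.

```r
# Made-up data generated from X = 2 + 3 * v plus a small alternating error.
v <- 1:10
e <- rep(c(0.5, -0.5), times = 5)
X <- 2 + 3 * v + e

fit <- lm(X ~ v)        # least-squares estimates of intercept and slope
coef(fit)               # estimated intercept and slope

plot(v, X, main = "Toy linear regression")
abline(fit)             # draw the fitted line over the scatterplot
```

The estimated coefficients land close to, but not exactly on, 2 and 3 because of the error term; the fitted line drawn by abline() is what "plotting the learned model" refers to.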

I. ALCOHOL CONSUMPTION – GROSS NATIONAL INCOME (GNI) MODEL

This model describes the relationship between the alcohol consumption and the Gross National Income (GNI) in East Asia and Pacific Region.

The R Code below generated the Alcohol Consumption – Gross National Income (GNI) Model

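#Alcohol Consumption Data Model
#(lm() call reconstructed from the model equation reported below)
AcModel <- lm(Alcohol.Consumption ~ GNI, data = TacD)
summary(AcModel)
plot(Alcohol.Consumption~GNI, data = TacD,
main = "Alcohol Consumption Data Model")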

The model produced the plot below

Figure 8

Table 1 below represents the output of the Alcohol Consumption – Gross National Income (GNI) Model

Table1

The results from the table above show that the linear equation describing the relationship between the alcohol consumption and the Gross National Income (GNI) in East Asia and Pacific is:

Alcohol Consumption = 11.4760 + [0.1351 * (GNI)]

The R² value for the model equals 0.02535; this means that GNI explains only about 2.5% of the variation in alcohol consumption across the countries.
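For readers unfamiliar with R², it is the share of the variation in the response that the fitted line accounts for: one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on made-up data and compares it with the value reported by summary(). The vectors x and y are hypothetical and are not the report's data.

```r
# Made-up data for illustration only.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 3.8, 6.1, 5.9)

fit <- lm(y ~ x)

ss_res <- sum(residuals(fit)^2)     # variation left unexplained by the line
ss_tot <- sum((y - mean(y))^2)      # total variation in y around its mean

r2_manual <- 1 - ss_res / ss_tot
r2_lm     <- summary(fit)$r.squared

all.equal(r2_manual, r2_lm)         # the two calculations agree
```

An R² near 0 therefore means the line does little better than simply predicting the average for every country.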

II. IMMUNIZATION – HEALTH EXPENDITURE MODEL

This model describes the relationship between the immunization and the health expenditure in East Asia and Pacific Region.

The R Code below generated the Immunization – Health Expenditure Model

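#Immunization Data Model
#(lm() call reconstructed from the model equation reported below)
ImModel <- lm(Immunization ~ Health.Expenditure, data = ImD)
summary(ImModel)
plot(Immunization~Health.Expenditure, data = ImD,
main = "Immunization - Health Expenditure Model")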

The model produced the plot below

Figure 9

Table 2 below represents the output of the Immunization – Health Expenditure Model

Table2

The results from the table above show that the linear equation describing the relationship between immunization and health expenditure in East Asia and Pacific is

Immunization = 6.8402 + [0.2585 * Health Expenditure]

The R² value for the model equals 0.2111; this means that health expenditure explains about 21% of the variation in BCG immunization across the countries in the region.

5. CONCLUSION

The analysis in this report indicates that China and Thailand are among the countries that lead in both Gross National Income (GNI) and alcohol consumption. This suggests that, in these two countries, higher GNI coincides with higher alcohol consumption, although the data cannot show that one causes the other.

The analysis also indicates that China and Tuvalu are among the countries leading in both Gross National Income (GNI) and Immunization BCG, and that the country with the highest health expenditure has the highest Immunization BCG.

China, Tuvalu, Australia and Thailand lead the East Asia and Pacific region in terms of GNI, and from the analysis there could be an economic interrelation between the four nations. Myanmar, Cambodia and Malaysia, on the other hand, have the lowest GNI in the region.

The analysis, however, does not find any strong linear relationship between alcohol consumption and Gross National Income (GNI), or between Immunization and Health Expenditure, for the region. Both linear models explain only a fraction of the variation, with R² values of 2.535% for the Alcohol Consumption Data and 21.11% for the Immunization Data.

6. REFLECTIONS

The Health and Population dataset contained numerous missing entries. This meant data preprocessing had to involve the omitting of these entries. Although this process ensured complete data for analysis, it also limited the accuracy of the analysis.

The alcohol consumption data, for instance, only had values for the year 2015, thereby restricting the analysis to the year 2015. The data on the Health Expenditure had similar limitations, with data available for the year 2015. Therefore, the analysis is not representative of the most recent findings.

The missing entries also meant that the analysis did not cover every country in East Asia and Pacific and is thus not completely representative of the region as a whole.

In conclusion, a more complete dataset would provide reliable analysis for the East Asia and Pacific region.
