This report presents an analysis of the Health and Population data for East Asia and the Pacific. The analysis aims to establish the nature of the variables in the data and to identify relevant relationships between them.
The results presented in this report focus on three key variables: Unemployment Rate, Birth Rate and Tertiary Institution Enrollment Rate. The analysis examines these variables in the East Asia and Pacific region over the period 2001 to 2015 and compares them across countries to support inference. The three variables are also analyzed for relationships, to understand the effect of one variable on another.
The East Asia and Pacific region has, over the years, become one of the leading trade zones of the world. This is partly attributed to the rapid industrialization of the economies in the region, led by China, Australia and Japan. With this rise in trade and economic growth, there is a need to examine the social impact on the populations in the region. This forms the main objective of this report: to establish the social impact of economic growth in the East Asia and Pacific region. A further aim is to compare the region's economies, both large and small, with respect to the three variables of interest.
The results of the analysis should be useful to professionals researching any of the three variables. They should also matter to stakeholders, both public and private, with an interest in any of the three variables. Examples include governments seeking to reduce the unemployment rate and NGOs in the medical field that are interested in the birth rate of a given East Asia and Pacific country.
The output and inferences drawn from this report provide a picture of the Birth Rate, Unemployment Rate and Tertiary Institution Enrollment Rate across the whole region. This makes it possible to identify trends and predict future rates, which is vital when individuals, organizations and governments plan projects centered on one or more of the three variables of interest.
The Health and Population Data used for the analysis in this report was collected from the World Bank data releases for the years 2001 to 2015.
The R Code below was used to import the Health and Population Data into RStudio:
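A minimal sketch of an import step of this kind, not the report's original listing. The file layout, column names and values are placeholders (not World Bank figures), and a temporary file stands in for the real download so that the snippet runs on its own.

```r
# Hypothetical stand-in for the downloaded file: a tiny CSV written to a
# temporary location. Column names and values are placeholders.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country,Series,2001,2002",
  "Country A,Birth rate,15.1,14.9",
  "Country B,Birth rate,9.6,9.4"
), csv_path)

# check.names = FALSE keeps the year headers as "2001" instead of "X2001".
HnPData <- read.csv(csv_path, check.names = FALSE, stringsAsFactors = FALSE)

head(HnPData)
```

With the real export, `csv_path` would simply point at the downloaded file.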
To explore the data for attributes, structure and dimensions, the following R Code was used:
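As an illustration only, an exploration step of this kind might look like the sketch below. The data frame is a toy stand-in with placeholder values, and the ".." missing-value marker is an assumption about how the export encodes gaps.

```r
# Toy stand-in for the imported data; values are placeholders.
HnPData <- data.frame(
  Country = c("Country A", "Country B", "Country C"),
  Series  = c("Birth rate", "Birth rate", "Unemployment rate"),
  Y2001   = c("15.1", "..", "4.2"),
  Y2002   = c("..", "9.4", "4.0"),
  stringsAsFactors = FALSE
)

# Structure, dimensions and column names.
str(HnPData)
dim(HnPData)
names(HnPData)

# Replace the assumed ".." marker with NA, then convert the year columns
# from character to numeric.
year_cols <- grep("^Y[0-9]{4}$", names(HnPData), value = TRUE)
HnPData[year_cols] <- lapply(HnPData[year_cols], function(x) {
  x[x == ".."] <- NA
  as.numeric(x)
})

# Number of missing values in each column.
missing_per_column <- colSums(is.na(HnPData))
missing_per_column
```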
The code above reported the type and number of variables present in the data. Missing values were also replaced with NA for easier identification and analysis.
The complete cases of the dataset (rows with no missing values) were extracted and stored in the data frame HnPDataN. This data frame contained only three variables: Adolescent Fertility Rate, Immunization and GNI per Capita, all of which fall outside the scope of this report. Removing rows with missing values would therefore leave no data relevant to the analysis in this report.
In preprocessing the dataset, preference was given to the subsets containing the three variables of interest. These subsets were filtered from the main Health and Population dataset and stored as BirthRate.csv, UnemploymentRate.csv and TER.csv for the Birth Rate, Unemployment Rate and Tertiary Institution Enrollment Rate variables, respectively. An additional dataset, ChinaData.csv, was filtered from the main dataset for the analysis focused on China.
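The effect of keeping only complete rows can be sketched as follows. Only the name HnPDataN comes from the report; the series labels and values are placeholders, and `na.omit()` is one plausible way to do this rather than necessarily the call the report used.

```r
# Toy data: one row per series, with gaps in two of the three rows.
HnPData <- data.frame(
  Series = c("Series A", "Series B", "Series C"),
  Y2001  = c(1.0, NA, 3.0),
  Y2002  = c(1.5, 2.5, NA),
  stringsAsFactors = FALSE
)

# na.omit() drops every row that contains at least one NA.
HnPDataN <- na.omit(HnPData)

# How much of the data survives the filter.
rows_lost <- nrow(HnPData) - nrow(HnPDataN)
rows_lost
```

When most rows contain at least one gap, as in the report's data, this filter discards nearly everything, which is why the analysis works from per-variable subsets instead.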
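A sketch of how such subsets could be filtered and saved is shown below. The output file names come from the report. The series labels, column names and values are assumptions, and the files are written to a temporary directory here.

```r
# Toy long-format data; labels and values are placeholders.
HnPData <- data.frame(
  Country = c("Country A", "Country A", "Country A", "Country B"),
  Series  = c("Birth rate", "Unemployment rate",
              "Tertiary enrollment rate", "Birth rate"),
  Y2001   = c(15.1, 4.2, 20.3, 9.4),
  stringsAsFactors = FALSE
)

# Map each (assumed) series label to the file name used in the report.
series_files <- c(
  "Birth rate"               = "BirthRate.csv",
  "Unemployment rate"        = "UnemploymentRate.csv",
  "Tertiary enrollment rate" = "TER.csv"
)

out_dir <- tempdir()  # temporary directory, for this sketch only
for (series in names(series_files)) {
  part <- HnPData[HnPData$Series == series, ]
  write.csv(part, file.path(out_dir, series_files[[series]]),
            row.names = FALSE)
}

# Read one subset back to confirm the filter worked.
BirthRate <- read.csv(file.path(out_dir, "BirthRate.csv"),
                      stringsAsFactors = FALSE)
```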
The Amelia and cluster R packages are required for the missingness analysis and the clustering analysis, respectively.
The analysis involved two stages. The first was a comparison between the four countries with the highest GDP in the region, i.e. China, Australia, Japan and South Korea. The second was a comparison between the country with the highest GDP and the country with the lowest GDP that had sufficient birth rate data, namely China and Kiribati. The following code produced the analysis of the Birth Rate in East Asia and the Pacific region.
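A country-by-country trend comparison of this kind can be sketched in base R as below. This is an illustration under stated assumptions, not the report's listing: the wide layout (one row per country, one column per year), the country names and the values are all placeholders.

```r
# Toy wide-format table; names and values are placeholders.
BirthRate <- data.frame(
  Country = c("Country A", "Country B"),
  Y2001 = c(14.0, 9.5),
  Y2002 = c(13.8, 9.4),
  Y2003 = c(13.5, 9.2),
  stringsAsFactors = FALSE
)

year_cols <- grep("^Y[0-9]{4}$", names(BirthRate), value = TRUE)
years <- as.integer(sub("^Y", "", year_cols))

# matplot() draws one line per matrix column, so transpose the data to a
# years x countries matrix.
rates <- t(as.matrix(BirthRate[year_cols]))
colnames(rates) <- BirthRate$Country

# Write the figure to a temporary PDF; drop pdf()/dev.off() to plot on screen.
plot_path <- file.path(tempdir(), "birth_rate_trends.pdf")
pdf(plot_path)
matplot(years, rates, type = "l", lty = 1, col = seq_len(ncol(rates)),
        xlab = "Year", ylab = "Birth rate")
legend("topright", legend = colnames(rates), lty = 1,
       col = seq_len(ncol(rates)))
invisible(dev.off())
```

The same pattern applies to any of the three variables once the data is in the same wide layout.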
From the plot in figure 2 below, we observe that as of 2014 China had the highest birth rate, followed by Australia and South Korea, while Japan had the lowest. Each of the four countries had the lowest birth rate of the group on at least one occasion between 2001 and 2015.
From figure 3 below, we observe that the birth rate in Kiribati declined more sharply between 2001 and 2015. In contrast, China showed no clearly defined trend, with an overall gentle decline over the same period.
The analysis involved a comparison between the four countries with the highest GDP in the region, i.e. China, Australia, Japan and South Korea. The following code produced the analysis of the Unemployment Rate in East Asia and the Pacific region.
From the plot in figure 4 below, as of 2014 Australia had the highest unemployment rate among the four countries with the highest GDP in the region. It was followed by China and South Korea, while Japan had the lowest unemployment rate of the four. Australia and China showed similar unemployment trends over the period 2001 to 2015.
The analysis involved a comparison between the country with the highest GDP and the country with the lowest GDP that had sufficient data on the tertiary institution enrollment rate, namely China and Brunei Darussalam. The following code produced the analysis of the tertiary institution enrollment rate in East Asia and the Pacific region.
From figure 5 below, we observe that enrollment into tertiary institutions followed a generally rising trend in both China and Brunei Darussalam from 2002 to 2015.
Two analyses were conducted on the Health and Population data for China, ChinaData.csv. They captured the interaction between the tertiary institution enrollment rate and the unemployment rate, and between the unemployment rate and the birth rate, in China. The following R Code produced the analysis.
From figure 6 below, no clear linear pattern is apparent between the enrollment rate into tertiary institutions and the unemployment rate in China. However, points with a high enrollment rate tend to coincide with points with a high unemployment rate, and vice versa, which hints at a positive association.
From figure 7 below, no clear linear pattern is apparent between the unemployment rate and the birth rate in China. However, points with the highest unemployment rate tend to coincide with points with a high birth rate, and vice versa, which again hints at a positive association.
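A visual impression of this kind can be checked numerically with a correlation coefficient. The sketch below uses toy numbers, not the report's China data, and Pearson correlation is an assumed choice of measure.

```r
# Toy paired observations; values are placeholders.
toy <- data.frame(
  unemployment = c(1, 2, 3, 4, 5),
  birth_rate   = c(2, 1, 4, 3, 5)
)

# Pearson correlation coefficient (the default method of cor()).
r <- cor(toy$unemployment, toy$birth_rate)

# cor.test() adds a significance test and a confidence interval.
ct <- cor.test(toy$unemployment, toy$birth_rate)

r
ct$p.value
```

A coefficient near zero would support the reading that there is no linear relationship. A clearly positive value would support the observation that high values tend to occur together.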
Clustering groups the observations in a dataset (here, countries) that have similar attributes. This technique makes it simpler to observe the characteristics of the data. Observations that fall in the same cluster can be treated as similar, which allows links to be drawn when interpreting the analysis.
This report makes use of two clustering techniques, hierarchical and k-means clustering, applied to the birth rate dataset. The R Code that generated the clusters for the two techniques is as below.
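The two techniques can be sketched with base R's `hclust()` and `kmeans()`. The data below is a toy matrix built to form two obvious groups, and the Euclidean distance and complete linkage are assumptions. The toy example uses 2 clusters because it has only six rows, whereas the report's k-means analysis used 5.

```r
# Toy matrix: rows are countries, columns are yearly birth rates.
# Names and values are placeholders chosen to form two obvious groups.
rates <- rbind(
  c(30, 29, 28), c(31, 30, 29), c(29, 28, 27),
  c(10, 10,  9), c(11, 10, 10), c( 9,  9,  8)
)
rownames(rates) <- paste("Country", LETTERS[1:6])
colnames(rates) <- c("Y2001", "Y2002", "Y2003")

# Hierarchical clustering on Euclidean distances between countries.
hc <- hclust(dist(rates), method = "complete")
hc_groups <- cutree(hc, k = 2)  # cut the dendrogram into 2 groups
# plot(hc) would draw the dendrogram.

# k-means with the same number of groups; nstart reruns the algorithm
# from several random starts and keeps the best solution.
set.seed(42)
km <- kmeans(rates, centers = 2, nstart = 10)

# Cross-tabulate the two assignments.
table(hierarchical = hc_groups, kmeans = km$cluster)
```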
From figure 8, the hierarchical clustering, we observe that the countries in the region are grouped in line with their GDPs. Those with the highest GDPs, such as China and Australia, lie in the same cluster, as do South Korea and Japan. The countries with the lowest GDPs, for example Kiribati and Solomon Islands, are also grouped in the same cluster.
Figure 9 shows the five k-means clusters obtained from the birth rate dataset. One cluster (labelled 1) appears larger than the other four.
- LINEAR REGRESSION
Linear regression describes the relationship between two or more variables using an equation of the form y = mx + c (Freedman, 2009), where x is the independent variable and y is the dependent variable. The assumption is that the relationship between x and y can be represented by a straight line with gradient m and intercept c, where c is the value of y when x = 0.
The code below produced the regression model for the relationships between:
- Unemployment Rate and Tertiary Institution Enrollment Rate in China.
- Unemployment Rate and Birth Rate in China.
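A simple regression of this form can be fitted with `lm()`. The sketch below uses toy numbers rather than the report's China data, so its coefficients are illustrative and unrelated to the values reported for China.

```r
# Toy paired observations; values are placeholders.
toy <- data.frame(
  unemployment = c(1, 2, 3, 4, 5),
  birth_rate   = c(2, 1, 4, 3, 5)
)

# Fit birth_rate = m * unemployment + c by ordinary least squares.
fit <- lm(birth_rate ~ unemployment, data = toy)

coef(fit)     # intercept c = 0.6 and slope m = 0.8 for these toy numbers
summary(fit)  # standard errors, p-values and R-squared

# Predicted value of y for a new x.
new_rate <- predict(fit, newdata = data.frame(unemployment = 6))
```

The sign of the slope gives the direction of the fitted relationship. The p-value and R-squared in `summary(fit)` indicate how much weight the fit can bear.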
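In the model of birth rate on unemployment rate for China, the intercept is 4.6434, the predicted birth rate at an unemployment rate of zero. The positive coefficient of the unemployment rate indicates a positive linear association between the unemployment rate and the birth rate.
In the model of unemployment rate on tertiary institution enrollment rate (TER) for China, the intercept is 4.264, the predicted unemployment rate at a TER of zero. The positive coefficient of the TER indicates a positive linear association between the TER and the unemployment rate.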
The main challenge in analyzing the Health and Population Data for East Asia and the Pacific was the incomplete nature of the data. This restricted the analysis to countries for which the data was complete. In most cases, the smallest countries and those with the lowest GDPs had no data. The analysis may therefore cover a large share of the region's population, but it is not spread widely enough to represent all the countries in the region.
We can conclude that the birth rate in the region is related to the GDP of the country in question, with wealthier countries having higher birth rates. The unemployment rate is also related to GDP: the higher the GDP of a country, the higher its unemployment rate. The enrollment rate into tertiary institutions, however, follows a similar trend across the region regardless of GDP. We can also conclude that, in China, the regression models indicate positive linear relationships between the unemployment rate and the birth rate, and between the unemployment rate and the enrollment rate into tertiary institutions.
Freedman, D. A., 2009. Statistical Models: Theory and Practice. 1st ed. London: Cambridge University Press.
Shmueli, G. et al., 2018. Data Mining for Business Analytics. 1st ed. New Delhi: John Wiley & Sons, Inc.