Objectives
This assignment aims to assess students’ theoretical understanding and practical knowledge of concepts covered in independent learning and practical sessions.
Background
The dataset provided in this assignment was created by researchers at the Peterson Institute for International Economics (PIIE) using data from the 1996 to 2014 Forbes World’s Billionaires lists. It includes the name, country of citizenship, net worth (in current US dollars) and other variables for each of the world’s billionaires.
Using data and analytics, we can analyse outcomes and look at trends in the number, geographical locations, sources of wealth and/or net worth of the world’s richest people. The dataset provides insights into the sources of wealth of the world’s wealthiest, changes in their gender and age distribution, and the regions in which they are based. Hence, we can learn more about changes in extreme wealth, inequalities in wealth distribution and related economic problems.
You are challenged to analyse trends and insights about the world’s wealthiest, or to carry out any other relevant study, using this publicly available dataset. The following are required for the assignment:
- Business understanding
- Data understanding
- Data profiling
- Use of data cleaning techniques
- Data integration
- Data transformation
- Exploratory data analysis
Datasets
You are given three datasets derived from the dataset created by researchers at the Peterson Institute for International Economics (PIIE). To facilitate the assignment, some modifications have been made where appropriate.
Together, these datasets describe the billionaires’ demographics, company information and sources of wealth:
- Wealthiest Demographics: detailed data, including demographics, for each billionaire;
- Company Info: detailed data on the company primarily associated with each billionaire’s wealth;
- Countrycode: country codes and their corresponding countries.
The data dictionary below describes each column in each dataset.
Wealthiest Demographics Dataset
| Name | Description |
|---|---|
| age | Age of the billionaire |
| citizenship | Billionaire’s country of citizenship |
| gender | Gender identity: female or male |
| name | Name of the individual or family on the billionaires list |
| was political | True if the billionaire is linked to a politician or a questionable license |
| wealth.type | Source of the wealth |
| worth in billions | Net worth of the billionaire, in billions of current US dollars |
| country code | 3-digit ISO country code |
| region | Location classification of the billionaire |
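A first profiling pass can simply count empty values per column and flag implausible entries. The sketch below assumes the Wealthiest Demographics data has been exported to CSV with the column names from the dictionary above. The three rows are hypothetical placeholders, not records from the dataset. Treating a non-positive or non-numeric age as a stand-in for "unknown" is also an assumption, to be checked against the real data.

```python
import csv
import io

# Hypothetical placeholder rows in the shape of the Wealthiest Demographics
# dataset (column names from the data dictionary; values are invented).
SAMPLE = """name,age,gender,citizenship,worth in billions,country code,region
Person A,57,male,United States,3.2,USA,North America
Person B,,female,Germany,1.5,DEU,Europe
Person C,-1,,Japan,,JPN,East Asia
"""


def count_missing(rows, columns):
    """Return {column: number of rows where the value is empty or absent}."""
    missing = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if (row.get(col) or "").strip() == "":
                missing[col] += 1
    return missing


def implausible_ages(rows):
    """Return names whose age is present but not a positive whole number.

    Assumption: a non-positive or non-numeric age is a placeholder for
    'unknown' rather than a real value, so it should be treated as missing.
    """
    flagged = []
    for row in rows:
        raw = (row.get("age") or "").strip()
        if not raw:
            continue  # empty ages are already reported by count_missing
        try:
            age = int(raw)
        except ValueError:
            age = None
        if age is None or age <= 0:
            flagged.append(row["name"])
    return flagged


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(count_missing(rows, ["age", "gender", "worth in billions"]))
# {'age': 1, 'gender': 1, 'worth in billions': 1}
print(implausible_ages(rows))  # ['Person C']
```

Reporting empty cells and implausible values separately keeps the two kinds of data quality issue distinct when they are discussed.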
Company Info Dataset
| Name | Description |
|---|---|
| category | Broad industry category |
| company.name | Company primarily associated with the billionaire’s wealth |
| company.type | Indicates whether the company was new, acquired or privatized when the billionaire or family members were first associated with it |
| founded | Founding date of the company associated with the billionaire’s wealth |
| gdp | GDP of the country, in current US dollars |
| country code | 3-digit country code |
| name | Name of the individual or family on the billionaires list |
| industry | Industry label, based on Kaplan and Rauh (2013), of the company primarily associated with the billionaire’s wealth |
| relationship | The billionaire’s relationship to their company |
| sector | Sector of the company primarily associated with the billionaire’s wealth |
| year | Year in which the billionaire or family appears on the wealthiest list |
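Categorical columns such as company.type often need their spelling and case normalised before they can be grouped. In addition, founded and year can be combined into a derived numeric column. The sketch below is illustrative only. The label variants in the hypothetical COMPANY_TYPES mapping are invented. It also assumes founded holds a four-digit year; unknown founding dates may be coded differently in the actual data.

```python
# Hypothetical mapping from raw spellings to canonical company.type labels;
# build the real mapping from the distinct values found during profiling.
COMPANY_TYPES = {
    "new": "new",
    "acquired": "acquired",
    "privatized": "privatized",
    "privatised": "privatized",
}


def clean_company_type(raw):
    """Trim and lowercase a company.type value, then map it to a canonical label.

    Returns None for unrecognised values so they can be reviewed manually.
    """
    key = (raw or "").strip().lower()
    return COMPANY_TYPES.get(key)


def years_since_founding(founded, year):
    """Derived column: years between founding and the listing year.

    Returns None when either value is non-numeric, the founding year is a
    non-positive placeholder, or the company appears to have been founded
    after the year it was listed (an impossible combination).
    """
    try:
        founded_year, list_year = int(founded), int(year)
    except (TypeError, ValueError):
        return None
    if founded_year <= 0 or founded_year > list_year:
        return None
    return list_year - founded_year


print(clean_company_type("  New "))          # new
print(clean_company_type("Privatised"))      # privatized
print(years_since_founding("1975", "2014"))  # 39
print(years_since_founding("2020", "2014"))  # None
```

Returning None for unmapped or impossible values, rather than guessing, keeps the cleaning step auditable.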
Countrycode Dataset
| Name | Description |
|---|---|
| citizenship | Country of citizenship |
| country code | 3-digit country code |
| region | Location classification of the billionaire |
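All three datasets have a country code column, and Wealthiest Demographics and Company Info also share name, so these are natural join keys. The sketch below shows a left join from Wealthiest Demographics to Countrycode. It fills missing citizenship or region values from the lookup table and normalises the key's case and whitespace first. The rows are hypothetical placeholders. A join on name to Company Info may be one-to-many if a billionaire appears in several years, so duplicates need checking there too. In pandas, the equivalent would typically be DataFrame.merge with how="left".

```python
def build_lookup(rows, key):
    """Index rows by a normalised (trimmed, upper-case) key.

    If a key occurs more than once, the last row wins, so duplicate keys
    should be checked for during profiling.
    """
    return {row[key].strip().upper(): row for row in rows if row.get(key)}


def left_join_fill(left_rows, lookup, key, fields):
    """Left join: keep every left row and fill empty `fields` from the match.

    Non-empty values in the left row are kept as they are; rows without a
    match are returned unchanged. The input rows are not modified.
    """
    joined = []
    for row in left_rows:
        out = dict(row)
        match = lookup.get((row.get(key) or "").strip().upper())
        if match is not None:
            for field in fields:
                if not (out.get(field) or "").strip():
                    out[field] = match.get(field, "")
        joined.append(out)
    return joined


# Hypothetical placeholder rows, not records from the real datasets.
demographics = [
    {"name": "Person A", "country code": "usa", "region": ""},
    {"name": "Person B", "country code": "DEU", "region": "Europe"},
    {"name": "Person C", "country code": "XXX", "region": ""},
]
countrycode = [
    {"country code": "USA", "citizenship": "United States", "region": "North America"},
    {"country code": "DEU", "citizenship": "Germany", "region": "Europe"},
]

merged = left_join_fill(demographics, build_lookup(countrycode, "country code"),
                        "country code", ["citizenship", "region"])
for row in merged:
    print(row)
```

Unmatched codes (like "XXX" above) are left untouched. Listing them afterwards shows which country codes need manual correction.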
Task – Case Study
What can we learn about changes in extreme wealth and inequalities in wealth distribution in the United States, Europe and other advanced countries using the dataset on the sources of billionaire wealth? Is the wealth mostly self-made or inherited, and can we identify the company and industry from which it comes?
Among self-made billionaires, are the individuals founders, executives or politically connected? Based on the data, do certain industries, sectors or regions lead to wealth being generated faster?
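One common way to quantify the inequalities of wealth distribution raised above is the Gini coefficient. It is 0 when every value is equal and approaches 1 when a single value holds everything. This is offered as one possible measure, not one the assignment prescribes. For values sorted in ascending order x_1, x_2, ..., x_n, it can be computed as G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n. The Forbes lists include only people worth at least US$1 billion. A Gini computed on worth in billions therefore describes inequality within the top of the distribution, not across a whole population.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfect equality.

    Uses the sorted-values form G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with i running from 1 to n over the values in ascending order.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Illustrative inputs only, not figures from the dataset.
print(gini([5, 5, 5, 5]))   # 0.0 (everyone holds the same amount)
print(gini([0, 0, 0, 10]))  # 0.75, the maximum (n - 1) / n for n = 4
```

Applying gini to the worth in billions values for each region or each year gives one number per group that can be compared.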
What You Need To Do
Using data and analytics, we can research the data, analyse outcomes and look at trends in the number, geographical locations, sources of wealth and/or net worth of the world’s richest people.
You are to explore the data and provide insights into the number, geographical locations, sources of wealth and/or net worth of the world’s wealthiest people. Before you can visualize and analyse the data, you need to understand the business requirements, understand the data, and perform data cleaning and exploration.
Data Profiling and Data Preparation
- Discuss the data quality issues found in the datasets provided.
- What method(s) will you use to clean the data? Explain in detail.
- How will you integrate the datasets?
- How will you derive any categorical columns? Document and explain in detail.
- What imputation methods are required? Explain the rationale behind each method you apply.
- Apply data transformations to columns as appropriate.
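The sketch below illustrates three of the preparation steps above on rows that have already been parsed into Python dictionaries:

- median imputation of missing ages within each region, falling back to the overall median;
- a derived categorical age-group column;
- a log10 transform of net worth, which is typically heavily right-skewed.

The median is used because it is less sensitive than the mean to the extreme values found in wealth data. The bin edges, the choice of region as the grouping column and the placeholder rows are all assumptions for illustration, not requirements of the assignment.

```python
import math
from statistics import median

# Hypothetical age bins: lower bound inclusive, upper bound exclusive.
AGE_BINS = [(0, 40, "under 40"), (40, 60, "40-59"), (60, 80, "60-79"), (80, math.inf, "80+")]


def impute_median_by_group(rows, value_col, group_col):
    """Fill None values with the median of the row's group.

    Groups with no observed values fall back to the overall median.
    Rows are updated in place and also returned for convenience.
    """
    by_group = {}
    observed = []
    for row in rows:
        value = row.get(value_col)
        if value is not None:
            by_group.setdefault(row.get(group_col), []).append(value)
            observed.append(value)
    overall = median(observed)  # raises StatisticsError if nothing is observed
    for row in rows:
        if row.get(value_col) is None:
            pool = by_group.get(row.get(group_col))
            row[value_col] = median(pool) if pool else overall
    return rows


def age_group(age):
    """Derive a categorical age group from a numeric age."""
    for low, high, label in AGE_BINS:
        if low <= age < high:
            return label
    return "unknown"


# Hypothetical placeholder rows, not records from the dataset.
rows = [
    {"name": "P1", "region": "Europe", "age": 50, "worth in billions": 2.0},
    {"name": "P2", "region": "Europe", "age": 70, "worth in billions": 10.0},
    {"name": "P3", "region": "Europe", "age": None, "worth in billions": 1.0},
    {"name": "P4", "region": "North America", "age": 30, "worth in billions": 100.0},
    {"name": "P5", "region": "Middle East", "age": None, "worth in billions": 1.5},
]

impute_median_by_group(rows, "age", "region")
for row in rows:
    row["age group"] = age_group(row["age"])
    row["log10 worth"] = math.log10(row["worth in billions"])
    print(row["name"], row["age"], row["age group"], round(row["log10 worth"], 3))
```

Here P3 receives the Europe median (60.0) and P5, whose region has no observed ages, receives the overall median (50). Half-open bins mean an imputed value such as 59.5 still falls into exactly one group.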
Below are some of the fundamental questions about inequality in wealth distribution:
- Are the wealthiest distributed equally across geographical locations?
- Are the demographics of the wealthiest (median age, citizenship, net worth) evenly distributed?
- Which billionaires or families appear on the list in multiple years, and where are they based?
- Find out whether there is any relationship between the wealthiest and:
  - Sector of the company primarily linked to the billionaire or family
  - Industry primarily linked to the billionaire or family
  - Sources of wealth
  - Geographical locations
- Find out whether there are any similarities or differences between billionaires across:
  - Regions
  - Age groups
  - Industries
  - Sectors
- Analyse the growth of the world’s wealthiest over time, highlighting any notable changes (e.g. by country or region).
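Several of the questions above come down to grouping the integrated data and comparing summary statistics. The sketch below groups rows by one or more columns and reports three figures per group: the count, the median net worth and the share of inherited fortunes. Grouping by year and region gives a simple timeline view. It assumes worth in billions is already numeric. It also assumes, hypothetically, that wealth.type uses the label "inherited" for inherited wealth; check the actual category values before relying on this. The rows, including the "founder" label, are invented placeholders.

```python
from collections import defaultdict
from statistics import median


def summarise(rows, keys):
    """Group rows by the `keys` columns and summarise each group.

    Assumption: wealth.type == "inherited" marks inherited wealth; every
    other value is counted as not inherited.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    summary = {}
    for group_key in sorted(groups):
        members = groups[group_key]
        inherited = sum(1 for m in members if m["wealth.type"] == "inherited")
        summary[group_key] = {
            "count": len(members),
            "median_worth": median([m["worth in billions"] for m in members]),
            "share_inherited": inherited / len(members),
        }
    return summary


# Hypothetical placeholder rows, not records from the dataset.
rows = [
    {"year": 1996, "region": "Europe", "worth in billions": 2.0, "wealth.type": "inherited"},
    {"year": 1996, "region": "Europe", "worth in billions": 4.0, "wealth.type": "founder"},
    {"year": 2014, "region": "Europe", "worth in billions": 3.0, "wealth.type": "founder"},
    {"year": 2014, "region": "East Asia", "worth in billions": 5.0, "wealth.type": "inherited"},
]

by_region = summarise(rows, ["region"])
by_year_region = summarise(rows, ["year", "region"])
for key, stats in by_year_region.items():
    print(key, stats)
```

The same function answers the region, age group, industry and sector comparisons by changing the keys, provided the corresponding columns have been cleaned and derived first.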