Data Set Description
The data used for this coursework is a marketing campaign dataset based on the case of a retail company. The primary objective of your work is to prepare the data for further data mining and analysis.
A detailed description of the data file is given below:
Marketing campaign data.csv
The data set contains 1500 customer records. Each record consists of 19 variables, which include socio-demographic and product ownership information.
Requirements specification
1. Data understanding
· Understand and describe the data set and the characteristics of each attribute.
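A minimal sketch of an attribute overview for the data-understanding step, assuming pandas is used. The function name `describe_attributes` is hypothetical. It reports, for each column, its type, its number of missing values, its number of distinct values and one example value.

```python
import pandas as pd


def describe_attributes(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per attribute with its dtype, missing count,
    number of distinct values and one example value."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "distinct": df.nunique(dropna=True),
        # First non-missing value of each column, or None if the column is empty
        "example": df.apply(lambda col: col.dropna().iloc[0] if col.notna().any() else None),
    })


# Usage (file name taken from the data description above):
# df = pd.read_csv("Marketing campaign data.csv")
# print(df.shape)              # the description states 1500 records x 19 variables
# print(describe_attributes(df))
# print(df.describe(include="all"))
```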
2. Data preparation
· Write Python programs to reduce variables (e.g., remove variables that have no influence on the target variable, and COMMENTS, which requires dedicated text mining tools).
· Write Python programs to clean the data (e.g., remove records with missing values).
· Write Python programs to transform variables as follows:
a) CUST_GENDER into binary codes: F – 2, M – 1
b) COUNTRY_NAME into ordinal numbers based on how often each country occurs in the data set, in ascending order.
c) CUST_INCOME_LEVEL into 4 ordinal numbers: 1 – low income (below 50,000), 2 – middle income (50,000 - 149,999), 3 – high income (150,000 - 299,999), 4 – super rich (300,000 and above).
d) EDUCATION into ordinal numbers based on US education levels, in ascending order.
e) HOUSEHOLD_SIZE into ordinal numbers based on the number of rooms.
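The preparation steps above can be sketched with pandas as follows. This is not a definitive implementation, and several parts rest on assumptions. `CUST_ID` is an assumed identifier column. The labels in `EDUCATION_ORDER` are hypothetical placeholders. The income labels are assumed to spell out their range as text, for example "Below 30,000" or "300,000 and above". HOUSEHOLD_SIZE values are assumed to begin with an integer, for example "4-5" or "9+". Country codes count frequency in ascending order, so the least frequent country gets 1.

```python
import re

import pandas as pd

# Hypothetical: COMMENTS is named in the brief; CUST_ID is an assumed identifier.
# Extend this list with any variables your own analysis shows have no
# influence on AFFINITY_CARD.
DROP_COLUMNS = ["CUST_ID", "COMMENTS"]

# Hypothetical US education ladder, lowest to highest. Replace these labels
# with the exact values found in the EDUCATION column.
EDUCATION_ORDER = ["Primary", "High School", "Associate",
                   "Bachelor", "Master", "Doctorate"]


def income_band(label) -> int:
    """Map an income label to 1-4, assuming the label spells out its range,
    e.g. 'Below 30,000', '50,000 - 69,999' or '300,000 and above'."""
    text = str(label).lower()
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", text)]
    if not numbers:
        raise ValueError(f"Unrecognised income label: {label!r}")
    lower = 0 if "below" in text else numbers[0]  # lower bound of the range
    if lower < 50_000:
        return 1  # low income
    if lower < 150_000:
        return 2  # middle income
    if lower < 300_000:
        return 3  # high income
    return 4      # super rich


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    # Reduce: drop the assumed non-influential and free-text columns, if present
    out = df.drop(columns=[c for c in DROP_COLUMNS if c in df.columns])
    # Clean: remove every record that has at least one missing value
    out = out.dropna().copy()

    # a) Gender to binary codes as specified: F -> 2, M -> 1
    out["CUST_GENDER"] = out["CUST_GENDER"].map({"F": 2, "M": 1})

    # b) Country ranked by frequency, least frequent = 1 (ties in arbitrary order)
    counts = out["COUNTRY_NAME"].value_counts(ascending=True)
    ranks = {country: i for i, country in enumerate(counts.index, start=1)}
    out["COUNTRY_NAME"] = out["COUNTRY_NAME"].map(ranks)

    # c) Income into the four bands defined in the brief
    out["CUST_INCOME_LEVEL"] = out["CUST_INCOME_LEVEL"].map(income_band)

    # d) Education via the hypothetical ladder; unknown labels become NaN
    levels = {label: i for i, label in enumerate(EDUCATION_ORDER, start=1)}
    out["EDUCATION"] = out["EDUCATION"].map(levels)

    # e) Household size: order distinct values by their leading integer
    #    (assumes values such as '1', '2', '4-5', '9+')
    leading = (out["HOUSEHOLD_SIZE"].astype(str)
               .str.extract(r"(\d+)", expand=False).astype(int))
    out["HOUSEHOLD_SIZE"] = leading.rank(method="dense").astype(int)
    return out
```

After calling `prepare`, run `out.isna().sum()` as a quick check. Any non-zero count shows that some label fell outside the assumed mappings.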
3. Data analysis
· Write a Python program to show summary statistics (sum, mean, standard deviation, skewness and kurtosis) for all variables.
· Write a Python program to calculate and show the correlation of each variable with the target variable.
· Write a Python program to calculate the Euclidean distance between two user-chosen customers.
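A minimal sketch of the three analysis tasks above, assuming the data has already been prepared into numeric columns. It also assumes that AFFINITY_CARD is the target, as stated in the data-mining section. The function names are hypothetical. The correlation is Pearson. The distance uses every numeric variable except the target, which is an assumed design choice.

```python
import numpy as np
import pandas as pd


def summary_statistics(df: pd.DataFrame) -> pd.DataFrame:
    """Sum, mean, standard deviation, skewness and kurtosis of each numeric variable."""
    numeric = df.select_dtypes(include="number")
    # pandas' std uses ddof=1 and kurt returns excess (Fisher) kurtosis
    return numeric.agg(["sum", "mean", "std", "skew", "kurt"]).T


def target_correlation(df: pd.DataFrame, target: str = "AFFINITY_CARD") -> pd.Series:
    """Pearson correlation of every other numeric variable with the target."""
    numeric = df.select_dtypes(include="number")
    return numeric.corr()[target].drop(target).sort_values(ascending=False)


def euclidean_distance(df: pd.DataFrame, i, j, exclude=("AFFINITY_CARD",)) -> float:
    """Euclidean distance between the rows labelled i and j, using the numeric
    variables except those in `exclude` (the target by default)."""
    features = df.select_dtypes(include="number").drop(columns=list(exclude), errors="ignore")
    a = features.loc[i].to_numpy(dtype=float)
    b = features.loc[j].to_numpy(dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


# Usage with prepared data (row labels are chosen by the user at runtime):
# i = int(input("First customer row: "))
# j = int(input("Second customer row: "))
# print(euclidean_distance(prepared, i, j))
```

The variables have different ranges, so the variables with the largest ranges dominate the raw distance. Standardising the columns first is one option. The brief does not specify this choice.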
4. Data exploration
· Write a Python program to show a histogram plot, with multiple subplots, of any two user-chosen variables.
· Write a Python program to show an “X and Y keywords” plot for any two user-chosen variables.
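A sketch of both exploration plots, assuming matplotlib and pandas. The brief does not define an “X and Y keywords” plot. Here it is read as a plot drawn with pandas' `DataFrame.plot(x=..., y=...)` keyword arguments, shown as a scatter plot. That reading is an assumption. The function names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def histogram_subplots(df: pd.DataFrame, var1: str, var2: str, bins: int = 10):
    """Draw side-by-side histograms of two user-chosen variables."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, col in zip(axes, (var1, var2)):
        ax.hist(df[col].dropna(), bins=bins, edgecolor="black")
        ax.set_title(f"Histogram of {col}")
        ax.set_xlabel(col)
        ax.set_ylabel("Frequency")
    fig.tight_layout()
    return fig


def xy_keyword_plot(df: pd.DataFrame, x: str, y: str):
    """Scatter plot of y against x, passing the columns through pandas'
    x= and y= plotting keywords (one interpretation of the brief)."""
    return df.plot(x=x, y=y, kind="scatter", title=f"{y} vs {x}")


# Usage:
# v1, v2 = input("First variable: "), input("Second variable: ")
# histogram_subplots(prepared, v1, v2)
# xy_keyword_plot(prepared, v1, v2)
# plt.show()
```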
5. Data mining
Using the prepared data, build two predictive models in Python to predict AFFINITY_CARD uptake in the marketing campaign.
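A minimal sketch of two candidate models, assuming scikit-learn is available. It also assumes that every feature is numeric after preparation and that AFFINITY_CARD is coded 0/1. Logistic regression and a depth-limited decision tree are illustrative choices, not ones named in the brief. The split ratio, `max_depth` and `random_state` values are arbitrary.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def build_models(df: pd.DataFrame, target: str = "AFFINITY_CARD",
                 test_size: float = 0.3, random_state: int = 42):
    """Fit two classifiers on a stratified train/test split and return
    the fitted models with their test-set accuracy."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y)

    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=random_state),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return models, scores


# Usage:
# models, scores = build_models(prepared)
# print(scores)
```

Logistic regression gives an interpretable linear baseline. A decision tree can capture non-linear splits. If AFFINITY_CARD is imbalanced, accuracy alone may mislead, so a confusion matrix may also be worth reporting.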
6. Discussion of and reflection on the work
7. Professional document organisation and presentation
All Python programs should be written in Python 3.x. The technical report should include screenshots of testing and results, with brief discussion and justification. Python code should include adequate comments and be saved in .py file(s).