You are working as a Data Scientist in an ecommerce company. The company is a market leader with a range of product segments. The main product segments are the latest gadgets, books, toys, household items and clothes.
However, the company’s performance, especially its profitability, is decreasing due to increased competition and changing customer expectations. The company has a well-qualified board of directors who believe in the power of data analytics to solve business problems and inform important business decisions. Because of your data analytics skills and data science capability, the company executives have given you a special task: analysing the company’s sales data for each product segment and each geographic region. The executives have requested the following:
To write a report outlining the changes that have occurred in a business due to digital innovation, using a work-centred analysis approach.
Background
Today we purchase most items online. We can have the desired product delivered to our home within a stipulated time. Online shopping offers more variety and quality in the desired products than offline shopping. Names such as Amazon and Flipkart are now commonplace. Many factors drive the shift from offline to online shopping. Wolfinbarger and Gilly (2001) reported that "consumers report that shopping online results in a substantially increased sense of freedom and control as compared to offline shopping." Lee and Lin (2005) developed a research model for examining the relationship among e-service quality dimensions and overall service quality, customer satisfaction and purchase intentions.
eCommerce is becoming a fast-growing and popular business in every corner of the world. Online shopping is the most popular of all eCommerce businesses. According to an online survey, “Australians spent a total of $1.95 billion per month on online shopping alone”. Online shopping for electronic devices is very common all over the world.
The exponential growth of eCommerce in the recent decade brings new challenges to service providers. Business competition and customer satisfaction are the most important factors in the eCommerce business.
About Data:
For this case study, we have developed the data. The data cover the sale of laptops (1390 products) in one month. We assume that the data come from the company Computer World. We considered the following attributes: Product Name, Product Price (in $), Profit (in $), Sale Price (in $), Number of customers who bought the product, Shipping Type (Free or Paid), Customer Type (New or Existing), Region (NSW, QLD, WA, VIC, TAS, SA), Product Brand (Acer, Asus, Haier and HP), CPU Type (i3, i5 and i7), Operating System (Windows 10 Home and Windows 10 Pro) and Screen Size in inches (11.1, 14, 15.6 and 17.1).
We define the following variables for the study objectives:
Total Monthly sale amount (in $) = Sale Price (in $) × Number of customers
Total monthly profit (in $) = Profit (in $) × Number of customers
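The two derived variables can be sketched in Python as follows. This is a minimal illustration only: the sample rows and the field names (`sale_price`, `profit`, `customers`) are hypothetical and are not taken from the Computer World dataset.

```python
# Hypothetical sample rows; field names are assumptions made for this sketch.
laptops = [
    {"product": "Laptop A", "sale_price": 800.0, "profit": 40.0, "customers": 5},
    {"product": "Laptop B", "sale_price": 1200.0, "profit": 65.0, "customers": 8},
]


def add_monthly_totals(rows):
    """Return copies of the rows with the two derived variables added."""
    result = []
    for row in rows:
        enriched = dict(row)
        # Total monthly sale amount = sale price x number of customers
        enriched["total_sale"] = row["sale_price"] * row["customers"]
        # Total monthly profit = profit x number of customers
        enriched["total_profit"] = row["profit"] * row["customers"]
        result.append(enriched)
    return result


totals = add_monthly_totals(laptops)
for row in totals:
    print(row["product"], row["total_sale"], row["total_profit"])
```

With a pandas DataFrame the same step would be a column-wise multiplication; plain dictionaries are used here only to keep the sketch dependency-free.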
Project Problem:
We concentrate on the following
- Profit analysis by shipping type, customer type, region, brand, CPU type, operating system and screen size.
- Whether the mean number of customers differs significantly across shipping type, customer type, region, brand, CPU type, operating system and screen size.
- Correlation analysis of variables
- Regression analysis for total monthly profit
Research Methodology
Statistical tools and techniques are an important aspect of data analysis. Many statistical tools and techniques are available in the literature, but choosing the proper ones is an important part of the analysis.
For the profit analysis, we present the total monthly sale amount (in $) and total monthly profit (in $) by shipping type, customer type, region, brand, CPU type, operating system and screen size, together with the profit percentage. We present descriptive statistics for the number of customers by the same attributes. We compare the mean number of customers across the levels of each attribute using the two sample t-test (for attributes with two levels) and one way ANOVA (for attributes with more than two levels). We study the correlation between product price, profit, sale price and number of customers. Finally, we fit a regression model for the total monthly profit. We used Python 3.6.5 and MS-Excel for the data analysis. Sample code is given in the appendices. We referred to Grus (2015), McKinney (2012), Pedregosa et al. (2011) and Schutt and O'Neil (2013).
Assessment of Various Data Analytical Methods
Analytical Findings
Profit Analysis:
For the profit analysis, we give the total monthly sale amount (in $), total monthly profit (in $) and profit percentage by shipping type, customer type, region, brand, CPU type, operating system and screen size. We referred to Berenson et al. (2012), Black (2009), Groebner et al. (2008), Kvanli et al. (2000) and Mendenhall and Sincich (1993). Results are presented in Table 1.
Table 1: Profit analysis according to different attributes
| Attributes | Level | Total Monthly Sale (in $) | Total Monthly Profit (in $) | Profit % |
| --- | --- | --- | --- | --- |
| Shipping Type | Free | 2916380 | 149853.5 | 5.14% |
| | Paid | 4528836 | 232483.1 | 5.13% |
| Customer Type | Existing | 3011425 | 153767.4 | 5.11% |
| | New | 4433790 | 228569.2 | 5.16% |
| Region | NSW | 1360477 | 69336.3 | 5.10% |
| | QLD | 1420248 | 72803.5 | 5.13% |
| | SA | 1467640 | 76496.2 | 5.21% |
| | TAS | 411042.6 | 21301.6 | 5.18% |
| | VIC | 1444593 | 74270.5 | 5.14% |
| | WA | 1341216 | 68128.5 | 5.08% |
| Brand | Acer | 1658146 | 86376.4 | 5.21% |
| | Asus | 1604424 | 97642.3 | 6.09% |
| | Haier | 1830076 | 86780.7 | 4.74% |
| | HP | 2352569 | 111537.2 | 4.74% |
| CPU Type | i3 | 2127357 | 108809.4 | 5.11% |
| | i5 | 2898768 | 149383.6 | 5.15% |
| | i7 | 2419091 | 124143.6 | 5.13% |
| Operating System | Windows 10 Home | 4008167 | 206626.8 | 5.16% |
| | Windows 10 Pro | 3437049 | 175709.8 | 5.11% |
| Screen Size (in Inches) | 11.1 | 1579945 | 80963.0 | 5.12% |
| | 14 | 1760526 | 91088.5 | 5.17% |
| | 15.6 | 2010451 | 102975.4 | 5.12% |
| | 17.1 | 2094294 | 107309.7 | 5.12% |
| Total | | 7445216 | 382336.6 | 5.14% |
From Table 1, Computer World earns an overall profit of 5.14% on its laptop sales. The profit percentage varies mainly by brand and, to a smaller extent, by region. On Asus laptops the company earns the highest profit, about 6.09%, whereas on HP and Haier laptops it earns only 4.74%. In the SA region the company earns 5.21% profit, compared with only 5.08% in the WA region. The profit percentage is nearly the same across the levels of the other attributes (shipping type, customer type, CPU type, operating system and screen size), ranging from 5.11% to 5.17%.
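The profit percentage in Table 1 is total monthly profit divided by total monthly sale. As a small check, the sketch below recomputes it for the four brands from the totals printed in Table 1; the function and variable names are illustrative only.

```python
# (total monthly sale, total monthly profit) in $ per brand, copied from Table 1.
brand_totals = {
    "Acer": (1658146, 86376.4),
    "Asus": (1604424, 97642.3),
    "Haier": (1830076, 86780.7),
    "HP": (2352569, 111537.2),
}


def profit_percentage(sale, profit):
    """Profit as a percentage of the sale amount."""
    return 100.0 * profit / sale


brand_pct = {
    brand: round(profit_percentage(sale, profit), 2)
    for brand, (sale, profit) in brand_totals.items()
}

# Overall percentage across all brands (sum of profits over sum of sales).
overall_sale = sum(sale for sale, _ in brand_totals.values())
overall_profit = sum(profit for _, profit in brand_totals.values())
overall_pct = round(profit_percentage(overall_sale, overall_profit), 2)

print(brand_pct)
print(overall_pct)
```

The overall figure is computed from summed totals rather than by averaging the brand percentages, because brands with larger sales should carry more weight.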
Descriptive statistics for number of customers:
Customers are the pillar of any business. Total monthly sale and profit are proportional to the number of customers. In this subsection, we present summary statistics for the number of customers by shipping type, customer type, region, brand, CPU type, operating system and screen size. We drew on well-known books for this section, such as Bickel and Doksum (2015), Casella and Berger (2002), DeGroot and Schervish (2012), Hodges Jr and Lehmann (2005), Papoulis (1990), Pillers (2002) and Ross (2014). Table 2 presents the size, mean, standard deviation, minimum and maximum of the number of customers.
From Table 2, we can observe the following:
- i) Computer World gets on average 6.6 customers for each laptop
- ii) the mean number of customers for laptops with free shipping is higher than for laptops with paid shipping
- iii) the mean number of customers for laptops in the TAS region is higher than in the other regions
- iv) the mean number of customers for HP laptops is higher than for the other brands
- v) the mean number of customers for laptops with the i5 CPU type is higher than for the other CPU types
- vi) the mean number of customers for laptops with Windows 10 Home is higher than for laptops with Windows 10 Pro
- vii) the mean number of customers for 15.6 inch laptops is higher than for laptops with other screen sizes.
Table 2: Summary statistics for numbers of customer for shipping type, customer type, region, brand, CPU type, operating system and screen size.
| Attributes | Level | Size | Mean | SD | Min | Max |
| --- | --- | --- | --- | --- | --- | --- |
| Shipping Type | Free | 539 | 6.72 | 2.857 | 1 | 16 |
| | Paid | 851 | 6.53 | 2.844 | 1 | 18 |
| Customer Type | Existing | 561 | 6.61 | 2.792 | 1 | 16 |
| | New | 829 | 6.60 | 2.889 | 1 | 18 |
| Region | NSW | 264 | 6.38 | 3.011 | 1 | 18 |
| | QLD | 267 | 6.55 | 2.594 | 1 | 15 |
| | SA | 270 | 6.66 | 2.981 | 1 | 17 |
| | TAS | 71 | 7.17 | 3.247 | 1 | 14 |
| | VIC | 271 | 6.67 | 2.800 | 1 | 16 |
| | WA | 247 | 6.62 | 2.719 | 1 | 14 |
| Brand | Acer | 340 | 6.15 | 2.607 | 1 | 16 |
| | Asus | 349 | 6.11 | 2.654 | 1 | 15 |
| | Haier | 347 | 6.27 | 2.647 | 1 | 16 |
| | HP | 354 | 7.86 | 3.080 | 1 | 18 |
| CPU Type | i3 | 460 | 6.10 | 2.529 | 1 | 14 |
| | i5 | 470 | 7.61 | 3.045 | 1 | 18 |
| | i7 | 460 | 6.09 | 2.673 | 1 | 17 |
| Operating System | Windows 10 Home | 701 | 7.15 | 2.851 | 1 | 18 |
| | Windows 10 Pro | 689 | 6.05 | 2.741 | 1 | 17 |
| Screen Size (in Inches) | 11.1 | 342 | 6.31 | 2.840 | 1 | 17 |
| | 14 | 350 | 6.42 | 2.740 | 1 | 16 |
| | 15.6 | 338 | 7.12 | 2.996 | 1 | 18 |
| | 17.1 | 360 | 6.59 | 2.766 | 1 | 14 |
| Total | | 1390 | 6.60 | 2.849 | 1 | 18 |
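A summary of the kind shown in Table 2 (size, mean, sample standard deviation, minimum and maximum per level) could be produced along the following lines. The records below are hypothetical and serve only to illustrate the grouping step; they are not rows of the original dataset.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (level, number_of_customers) records for a single attribute.
records = [
    ("Free", 7), ("Free", 5), ("Free", 9),
    ("Paid", 6), ("Paid", 4), ("Paid", 8), ("Paid", 6),
]


def summarise(records):
    """Group the customer counts by level and compute summary statistics."""
    groups = defaultdict(list)
    for level, customers in records:
        groups[level].append(customers)
    return {
        level: {
            "size": len(values),
            "mean": mean(values),
            "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
            "min": min(values),
            "max": max(values),
        }
        for level, values in groups.items()
    }


summary = summarise(records)
for level, stats in summary.items():
    print(level, stats)
```

With pandas, the equivalent would be a `groupby` on the attribute column followed by an aggregation; the standard library is used here to keep the sketch self-contained.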
Two Sample t-test:
Here we want to know whether the mean number of customers differs significantly between the levels of shipping type, customer type and operating system. The null hypothesis is that there is no difference between the mean number of customers for the two levels of an attribute; the alternative hypothesis is that the means differ. We used the independent two sample t-test assuming unequal variances. Table 4 presents the test statistic and p-value of this test for each attribute.
Table 4: Two sample independent test for shipping type, customer type and Operating system
| Attributes | Levels | Test Statistic | p-value |
| --- | --- | --- | --- |
| Shipping Type | Free and Paid | 1.16 | 0.245 |
| Customer Type | New and Existing | 0.02 | 0.985 |
| Operating System | Windows 10 Home and Windows 10 Pro | 7.30 | 0.000 |
From Table 4, we conclude that there is no significant difference in the mean number of customers between Free and Paid shipping (p = 0.245), and likewise between New and Existing customers (p = 0.985). We observed a significant difference in the mean number of customers between laptops with Windows 10 Home and laptops with Windows 10 Pro (p < 0.001): the mean number of customers for Windows 10 Home laptops is significantly higher than for Windows 10 Pro laptops.
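The two sample t-test with unequal variances (Welch's test) can be approximately reproduced from the summary statistics in Table 2. The sketch below is an illustration under stated assumptions: it uses the rounded means and standard deviations from Table 2, so its results need not match Table 4 exactly, and it approximates the p-value with the normal distribution, which is reasonable here only because the degrees of freedom are above 1000. On raw data, `scipy.stats.ttest_ind` with `equal_var=False` performs the same test.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic, Welch-Satterthwaite degrees of freedom and an
    approximate two-sided p-value (normal approximation, large samples only)."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|t|))
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return t_stat, df, p_value


# Inputs taken from Table 2 (mean, SD, size).
t_os, df_os, p_os = welch_t_from_summary(7.15, 2.851, 701, 6.05, 2.741, 689)
t_ship, df_ship, p_ship = welch_t_from_summary(6.72, 2.857, 539, 6.53, 2.844, 851)

print(f"Operating system: t = {t_os:.2f}, df = {df_os:.0f}, p = {p_os:.3g}")
print(f"Shipping type:    t = {t_ship:.2f}, df = {df_ship:.0f}, p = {p_ship:.3g}")
```

Both comparisons lead to the same decisions as Table 4 at the 5% level: a clear difference for operating system and none for shipping type.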
Good Use of Python to Develop a Predictive Model of Monthly Sales
One way ANOVA:
Here we want to know whether the mean number of customers differs significantly across the levels of region, brand, CPU type and screen size. We used one way ANOVA to test the null hypothesis that the mean number of customers is the same for all levels of an attribute against the alternative hypothesis that at least one level has a different mean. Table 5 presents the F statistic and p-value for each attribute.
Table 5: Output of one way ANOVA for region, brand, CPU type and Screen size
| Attributes | Level | F Statistic | P Value |
| --- | --- | --- | --- |
| Region | NSW, QLD, WA, VIC, TAS and SA | 0.96 | 0.440 |
| Brand | Acer, Asus, Haier and HP | 32.87 | 0.000 |
| CPU Type | i3, i5 and i7 | 46.81 | 0.000 |
| Screen Size (in inches) | 11.1, 14, 15.6 and 17.1 | 5.47 | 0.001 |
From Table 5, we observed no significant difference in the mean number of customers across regions, whereas the mean number of customers differs significantly across brands, CPU types and screen sizes (in inches). The group means in Table 2 show where the differences lie: the mean number of customers is highest for HP laptops among brands, for i5 laptops among CPU types and for 15.6 inch laptops among screen sizes. Note that the ANOVA itself only establishes that the means are not all equal; it does not test which specific pairs of levels differ.
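The one way ANOVA F statistic can be approximately reconstructed from the group sizes, means and standard deviations in Table 2. The following is a sketch under the assumption that the rounded values in Table 2 are accurate enough; its results are close to, but not identical with, the F statistics in Table 5. On raw data, `scipy.stats.f_oneway` computes the same test.

```python
def anova_f_from_summary(groups):
    """One way ANOVA from summary statistics.

    groups: list of (size, mean, sample_sd) tuples, one per level.
    Returns (F statistic, between-groups df, within-groups df).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares: recovered from each group's sample variance.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# (size, mean, SD) per level, copied from Table 2.
brand = [(340, 6.15, 2.607), (349, 6.11, 2.654), (347, 6.27, 2.647), (354, 7.86, 3.080)]
cpu = [(460, 6.10, 2.529), (470, 7.61, 3.045), (460, 6.09, 2.673)]

f_brand, dfb_brand, dfw_brand = anova_f_from_summary(brand)
f_cpu, dfb_cpu, dfw_cpu = anova_f_from_summary(cpu)

print(f"Brand:    F({dfb_brand}, {dfw_brand}) = {f_brand:.2f}")
print(f"CPU type: F({dfb_cpu}, {dfw_cpu}) = {f_cpu:.2f}")
```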
Correlation Analysis:
In this subsection, we study the correlation between product price, sale price, profit and number of customers. Table 6 presents the correlation coefficients between these variables.
Table 6: Pearson’s correlation coefficient for Product Price, Sale Price, Profit and Numbers of customers
| | Product Price | Sale Price | Profit | Numbers of customer |
| --- | --- | --- | --- | --- |
| Product Price | 1 | 0.999 | 0.683 | 0.043 |
| Sale Price | 0.999 | 1 | 0.713 | 0.038 |
| Profit | 0.683 | 0.713 | 1 | -0.057 |
| Numbers of customer | 0.043 | 0.038 | -0.057 | 1 |
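From Table 6, we observed that
- i) Product price is positively correlated with sale price (0.999, almost perfectly) and with profit (0.683), but only very weakly with number of customers (0.043).
- ii) Sale price is positively correlated with profit (0.713) and, very weakly, with number of customers (0.038).
- iii) Profit is negatively but very weakly correlated with number of customers (-0.057).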
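Pearson's correlation coefficient, as reported in Table 6, is the covariance of two variables divided by the product of their standard deviations. A minimal sketch of the computation is given below; the data values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five laptops (not taken from the original data).
product_price = [500.0, 750.0, 900.0, 1200.0, 1500.0]
sale_price = [525.0, 790.0, 945.0, 1262.0, 1575.0]
customers = [6, 9, 4, 8, 5]

r_price_sale = pearson_r(product_price, sale_price)
r_price_customers = pearson_r(product_price, customers)

print(f"product price vs sale price: {r_price_sale:.3f}")
print(f"product price vs customers:  {r_price_customers:.3f}")
```

A sale price that is roughly a fixed markup on the product price yields a coefficient close to 1, which mirrors the 0.999 reported in Table 6.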
Regression analysis:
In this subsection, we fit a simple linear regression model to the total monthly profit, using the number of customers as the predictor variable. Table 7 presents the output of the regression analysis.
Table 7: Output of Regression Analysis
Regression Statistics
| R Square | 0.862291 |

ANOVA
| | df | SS | MS | F | Significance F |
| --- | --- | --- | --- | --- | --- |
| Regression | 1 | 18906460 | 18906460 | 8691.225 | 0 |
| Residual | 1388 | 3019386 | 2175.35 | | |
| Total | 1389 | 21925846 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | 4.636421 | 3.158995 | 1.467688 | 0.142415 | -1.5605 | 10.83334 |
| Number of Customer | 40.94684 | 0.439218 | 93.22674 | 0 | 40.08524 | 41.80844 |
From Table 7, we observed a significant relationship between total monthly profit and number of customers. The R² of 0.86 suggests that the model fits adequately. Each additional customer brings a profit of about $40.95 (the slope coefficient, 40.94684).
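A simple linear regression of this kind can be fitted with the closed-form least squares formulas. The sketch below is illustrative: the five data points are hypothetical, and the function name `fit_simple_ols` is an assumption of this example. The last lines use only the figures printed in Table 7 to show how R², F and the t statistic follow from the sums of squares, the coefficient and its standard error.

```python
def fit_simple_ols(x, y):
    """Least squares fit of y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, 1.0 - ss_resid / ss_total


# Hypothetical data: number of customers and total monthly profit (in $).
customers = [2, 4, 6, 8, 10]
total_profit = [90.0, 165.0, 250.0, 330.0, 415.0]
slope, intercept, r_squared = fit_simple_ols(customers, total_profit)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.4f}")

# Consistency check using the values reported in Table 7.
r2_table = 18906460 / 21925846          # SS regression / SS total
f_table = 18906460 / (3019386 / 1388)   # MS regression / MS residual
t_table = 40.94684 / 0.439218           # coefficient / standard error
print(f"R^2 = {r2_table:.4f}, F = {f_table:.1f}, t = {t_table:.2f}")
```

For a model with a single predictor, F equals the square of the slope's t statistic, which is why 93.23 squared is close to 8691.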
Recommendations to the company
- i) The company needs to concentrate on the products that earn more profit.
- ii) The company needs to attract more customers so that profit is maximised.
- iii) The company should give offers so that more customers are attracted.
- iv) There is more demand for HP laptops, laptops with the Windows 10 Home operating system, laptops with the i5 CPU type and laptops with a 15.6 inch screen.
An implementation plan based on the recommendations you have provided
- i) The company can use more marketing techniques for the brands that earn more profit.
- ii) The company can give customers offers such as money back, free delivery, certain discounts and pay later.
- iii) The company can use more advertising hoardings to make people aware of the company and its products.
- iv) The company should keep stock of laptops with the specifications that are in higher demand.
Conclusions
From the profit analysis, the profit percentage varies mainly by brand and, to a smaller extent, by region. On Asus laptops the company earns the highest profit, about 6.09%, whereas on HP and Haier laptops it earns only 4.74%. In the SA region the company earns 5.21% profit, compared with only 5.08% in the WA region.
From the two sample t-test, we observed no significant difference in the mean number of customers between Free and Paid shipping, and likewise between New and Existing customers. We observed a significant difference in the mean number of customers between laptops with Windows 10 Home and laptops with Windows 10 Pro: the mean for Windows 10 Home laptops is significantly higher than for Windows 10 Pro laptops.
From the one way ANOVA, we observed no significant difference in the mean number of customers across regions, whereas the mean number of customers differs significantly across brands, CPU types and screen sizes (in inches). The mean number of customers is highest for HP laptops among brands, for i5 laptops among CPU types and for 15.6 inch laptops among screen sizes.
From the correlation analysis, we observed that (i) product price is positively correlated with sale price and profit, and only very weakly with number of customers, (ii) sale price is positively correlated with profit and, very weakly, with number of customers and (iii) profit is negatively but very weakly correlated with number of customers. From the regression analysis, we observed a significant relationship between total monthly profit and number of customers. The R² of 0.86 suggests that the model fits adequately. Each additional customer brings a profit of about $40.95.
We have also provided recommendations and an implementation plan for the company.
References
Berenson, M., Levine, D., Szabat, K.A. and Krehbiel, T.C., (2012). Basic business statistics: Concepts and applications. Pearson higher education AU.
Bickel, P.J. and Doksum, K.A., (2015). Mathematical statistics: basic ideas and selected topics, volume I (Vol. 117). CRC Press.
Black, K., (2009). Business statistics: Contemporary decision making. John Wiley & Sons.
Casella, G. and Berger, R.L., (2002). Statistical inference (Vol. 2). Pacific Grove, CA: Duxbury.
DeGroot, M.H. and Schervish, M.J., (2012). Probability and statistics. Pearson Education.
Groebner, D.F., Shannon, P.W., Fry, P.C. and Smith, K.D., (2008). Business statistics. Pearson Education.
Grus, J., (2015). Data science from scratch: First principles with Python. O'Reilly Media, Inc.
Hodges Jr, J.L. and Lehmann, E.L., (2005). Basic concepts of probability and statistics. Society for Industrial and Applied Mathematics.
Kvanli, A.H., Pavur, R.J. and Guynes, C.S., (2000). Introduction to business statistics. Cincinnati, OH: South-Western.
Lee, G.G. and Lin, H.F., (2005). Customer perceptions of e-service quality in online shopping. International Journal of Retail & Distribution Management, 33(2), pp.161-176.
McKinney, W., (2012). Python for data analysis: Data wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.
Mendenhall, W. and Sincich, T., (1993). A second course in business statistics: Regression analysis. San Francisco: Dellen.
Papoulis, A., (1990). Probability & statistics (Vol. 2). Englewood Cliffs: Prentice-Hall.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. and Vanderplas, J., (2011). Scikit-learn: Machine learning in Python. Journal of machine learning research, 12(Oct), pp.2825-2830.
Pillers Dobler, C., (2002). Mathematical statistics: Basic ideas and selected topics. pp.332-332.
Ross, S.M., (2014). Introduction to probability and statistics for engineers and scientists. Academic Press.
Schutt, R. and O'Neil, C., (2013). Doing data science: Straight talk from the frontline. O'Reilly Media, Inc.
Wolfinbarger, M. and Gilly, M.C., (2001). Shopping online for freedom, control, and fun. California Management Review, 43(2), pp.34-55.