CallCentre data analysis essay: Stats - Regression

Descriptive Statistics

Research Report

Please use the dataset CallCentre.dta and associated information file CC_DEFINITIONS_.XLSX to answer these questions. Use the software program STATA 15 available through RMIT MyDesktop for all data analysis. This is a group assignment where you can work alone or with up to three other students (a maximum group size of four). All group members will receive the same marks for the assignment. You must submit an electronic copy of your assignment in Canvas in pdf, doc or docx format. Hard copies will not be accepted. Show your tables and calculations as well as answering the questions in full sentences. You should write no more than 1000 words (not including tables/calculations) in total for this assignment. The number of words, tables, graphs, calculations given in parentheses after each question are a guide.

1.Calculate descriptive statistics using the ‘summarize’ command for the variables net_promoter_score, total_silence, total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted and present the results in a table. Comment on what we learn about these variables from the descriptives. Graph a scatter plot of net_promoter_score against agent_crosstalk_weighted and describe the relationship between these two variables.

2.Estimate a multiple linear regression with net_promoter_score as the dependent variable and total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted as the explanatory (independent) variables. Predict the change in net_promoter_score associated with a 0.1 increase in total_silence_weighted and a 0.01 increase in agent_crosstalk_weighted. Assuming this is the correct model specification, are we sure that total_silence_weighted has a negative effect? [Hint: consider the t-statistic and p-value]

3.Add dummy variables to the regression to control for all of the potential effects of State and Package. Make sure the base category is customers with the “HOSPITAL AND EXTRAS” package in NSW. Carefully interpret the estimated coefficient on the package1 dummy variable you have included. Why is this NOT a very important result?

4.Include a quadratic specification of the variable “sentiment_score_cust” in the model along with the existing explanatory variables. Calculate and interpret the marginal effect of a 1 point change in “sentiment_score_cust” when sentiment_score_cust = 1 and when sentiment_score_cust=4.

5.Explain the conditional mean independence assumption and assess its relevance with respect to the explanatory variable “sentiment_score_cust”.

6.Explore the data with descriptive statistics and/or preliminary regressions, then design a regression model to best predict the binary outcome variable nps_group_3. Choose the explanatory variables to include, and whether to include them as dummies/ logs/ polynomials/ interactions as you feel appropriate. Present the results of the descriptive statistics and your final regression model in tables. Discuss the statistical significance of the explanatory variables in your model. Discuss how you have designed your model with reference to the “Gauss Markov” assumptions and whether these assumptions are likely to be met. Interpret the results of THREE of your explanatory variables, which you consider to be the key drivers of nps_group_3 (i.e. being a promoter). Do NOT include the variables net_promoter_score or sentiment_score_cust in your model.

7.Write an executive summary of the findings in questions 2 to 6 on what variables are likely and are not likely to be important drivers of net promoter score.

Descriptive Statistics

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

net_promot~e | 1945 8.567095 1.970395 0 10

total_sile~e | 1945 43.89775 72.24571 0 518.5

total_sile~d | 1945 .0985188 .1252939 0 .665

agent_to_c~x | 1945 2.061445 1.50401 .142 14.674

agent_cros~d | 1944 .0195041 .0140015 0 .092

Table 1: Descriptive Statistics

(Source: Created using STATA)

The descriptive statistics give a first picture of the variables considered in the study. Table 1 shows that net_promoter_score has a mean of 8.57 and ranges from 0 to 10. Scores are therefore concentrated at the upper end of the scale, and most customers are quite likely to recommend ABC to friends and colleagues. The standard deviation of 1.97 is small relative to the mean, which indicates comparatively low dispersion. The mean of total_silence is 43.90, which means that the average call contains about 43.90 seconds of silence. Its standard deviation of 72.25 is larger than the mean and the values range from 0 to 518.5, which points to a widely dispersed, right-skewed variable in which a small number of calls contain very long silences. total_silence_weighted has a mean of 0.099 and a standard deviation of 0.125, with values between 0 and 0.665. agent_to_cust_index has a mean of 2.06 and a standard deviation of 1.50, with values between 0.142 and 14.674, so a few calls lie far above the typical value. agent_crosstalk_weighted has a mean of 0.020 and a standard deviation of 0.014, with values between 0 and 0.092, and has 1944 observations because one value is missing.
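The dispersion of variables measured on different scales can be compared with the coefficient of variation (standard deviation divided by mean). The following is a minimal sketch in Python rather than STATA; the means and standard deviations are copied from Table 1, and the function name coefficient_of_variation is only illustrative.

```python
# Means and standard deviations copied from Table 1 (STATA summarize output).
table1 = {
    "net_promoter_score": (8.567095, 1.970395),
    "total_silence": (43.89775, 72.24571),
    "total_silence_weighted": (0.0985188, 0.1252939),
    "agent_to_cust_index": (2.061445, 1.50401),
    "agent_crosstalk_weighted": (0.0195041, 0.0140015),
}


def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a multiple of the mean."""
    return sd / mean


cv = {name: coefficient_of_variation(mean, sd) for name, (mean, sd) in table1.items()}

for name, value in cv.items():
    print(f"{name:<26} CV = {value:.2f}")
```

A value above 1 means the standard deviation exceeds the mean, which is the case for total_silence and total_silence_weighted but not for net_promoter_score.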

Figure 1: Two Way Scatter Plot

(Source: Created using STATA)

The two-way scatter plot suggests that as agent_crosstalk_weighted increases, the net promoter score tends to increase. Hence there is a positive relationship between these two variables.

Source | SS df MS Number of obs = 1944

-------------+------------------------------ F( 3, 1940) = 2.07

Model | 24.0485783 3 8.01619278 Prob > F = 0.1026

Residual | 7523.12375 1940 3.87789884 R-squared = 0.0032

-------------+------------------------------ Adj R-squared = 0.0016

Total | 7547.17233 1943 3.88428838 Root MSE = 1.9692

------------------------------------------------------------------------------------------

net_promoter_score | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------------------+----------------------------------------------------------------

total_silence_weighted | -.058195 .3722264 -0.16 0.876 -.7882008 .6718109

agent_to_cust_index | -.0089455 .030321 -0.30 0.768 -.0684107 .0505197

agent_crosstalk_weighted | 7.556478 3.389054 2.23 0.026 .9099085 14.20305

_cons | 8.444175 .1226135 68.87 0.000 8.203707 8.684643

------------------------------------------------------------------------------------------

Table 2: Multiple Linear Regression

(Source: Created using STATA)

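The regression results in Table 2 imply that a 0.1 increase in total_silence_weighted is associated with a change in the net promoter score of -0.058195 × 0.1 ≈ -0.006 points, holding the other variables constant, as the coefficient is negative. On the other hand, a 0.01 increase in agent_crosstalk_weighted is associated with a change of 7.556478 × 0.01 ≈ +0.076 points. The coefficients are measured in points of net promoter score per unit of each explanatory variable, not in percentages.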
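The predicted changes follow directly from the coefficients in Table 2: each one is the coefficient multiplied by the size of the increase. A minimal sketch of that arithmetic in Python (the coefficients are copied from Table 2; the variable names in the code are illustrative):

```python
# Coefficients copied from Table 2.
coef = {
    "total_silence_weighted": -0.058195,
    "agent_crosstalk_weighted": 7.556478,
}

# Increases specified in question 2.
increase = {
    "total_silence_weighted": 0.1,
    "agent_crosstalk_weighted": 0.01,
}

# Predicted change in net_promoter_score (in points), other variables held constant.
predicted_change = {name: coef[name] * increase[name] for name in coef}
combined_change = sum(predicted_change.values())

for name, change in predicted_change.items():
    print(f"{name:<26} {change:+.4f} points")
print(f"{'both together':<26} {combined_change:+.4f} points")
```

Both changes are small in absolute terms compared with the 0 to 10 range of the net promoter score.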

Linear Regression Analysis

We cannot be sure that total_silence_weighted has a negative effect on the net promoter score. Its t-statistic is -0.16 and its p-value is 0.876, which is far above the 5% significance level, and the 95% confidence interval (-0.788 to 0.672) includes zero. We therefore fail to reject the null hypothesis that the coefficient is zero: the data are consistent with a negative, a zero or a positive relationship between total_silence_weighted and the net promoter score.
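The t-statistic reported in Table 2 can be reproduced by dividing the coefficient by its standard error. The sketch below does this in Python and checks whether the reported confidence interval contains zero. The critical value of 1.96 is the large-sample normal approximation and is an assumption of this sketch, not a figure taken from the STATA output.

```python
# Values for total_silence_weighted copied from Table 2.
coef = -0.058195
std_err = 0.3722264
ci_low, ci_high = -0.7882008, 0.6718109

# t-statistic for the null hypothesis that the coefficient is zero.
t_stat = coef / std_err

# If zero lies inside the 95% interval, the sign of the effect is not established.
ci_contains_zero = ci_low <= 0.0 <= ci_high

# 1.96 is the approximate two-sided 5% critical value for a large sample (assumption).
significant_at_5pct = abs(t_stat) > 1.96

print(f"t = {t_stat:.2f}")
print(f"95% CI contains zero: {ci_contains_zero}")
print(f"significant at 5%: {significant_at_5pct}")
```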

Source | SS df MS Number of obs = 1944

-------------+------------------------------ F( 4, 1939) = 2.50

Model | 38.7896588 4 9.69741471 Prob > F = 0.0405

Residual | 7508.38267 1939 3.87229637 R-squared = 0.0051

-------------+------------------------------ Adj R-squared = 0.0031

Total | 7547.17233 1943 3.88428838 Root MSE = 1.9678

------------------------------------------------------------------------------------------

net_promoter_score | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------------------+----------------------------------------------------------------

package1 | .6849752 .3510706 1.95 0.051 -.0035403 1.373491

total_silence_weighted | -.0829211 .3721733 -0.22 0.824 -.8128229 .6469807

agent_to_cust_index | -.008715 .0302993 -0.29 0.774 -.0681376 .0507077

agent_crosstalk_weighted | 7.645688 3.386913 2.26 0.024 1.003313 14.28806

_cons | 8.43312 .1226558 68.75 0.000 8.192569 8.673671

------------------------------------------------------------------------------------------

Table 3: Multiple Linear Regression

(Source: Created using STATA)

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

package1 | 1945 .0164524 .1272403 0 1

Table 4: Descriptive Statistics

(Source: Created using STATA)

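The coefficient on package1 in Table 3 is 0.685, which means that customers on package 1 are predicted to have a net promoter score 0.685 points higher than customers in the omitted category, holding total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted constant. Because package1 is the only dummy included in this regression, the omitted category is all other customers rather than only those with the “HOSPITAL AND EXTRAS” package in NSW. This is not a very important result for two reasons. First, the p-value of 0.051 is just above the 5% significance level and the 95% confidence interval (-0.004 to 1.373) includes zero, so the coefficient is not statistically significant at the 5% level. Second, Table 4 shows that package1 has a mean of 0.0165, so only about 1.6% of the 1945 customers (roughly 32) hold this package. The estimate therefore rests on very few observations and applies to a very small part of the customer base.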
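Since package1 is a 0/1 dummy, its mean in Table 4 is the share of customers who hold that package, and the number of such customers can be recovered from it. A minimal sketch in Python, using only figures from Tables 3 and 4 (the variable names are illustrative):

```python
# From Table 4: number of observations and mean of the package1 dummy.
obs = 1945
share_package1 = 0.0164524

# The mean of a 0/1 dummy is the share of ones, so share x obs gives the count.
n_package1 = round(obs * share_package1)

# From Table 3: coefficient and standard error of package1.
coef, std_err = 0.6849752, 0.3510706
t_stat = coef / std_err

print(f"customers with package1: {n_package1} of {obs}")
print(f"t = {t_stat:.2f}")
```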

Source | SS df MS Number of obs = 1944

-------------+------------------------------ F( 4, 1939) = 4.66

Model | 71.792336 4 17.948084 Prob > F = 0.0010

Residual | 7475.37999 1939 3.85527591 R-squared = 0.0095

-------------+------------------------------ Adj R-squared = 0.0075

Total | 7547.17233 1943 3.88428838 Root MSE = 1.9635

------------------------------------------------------------------------------------------

net_promoter_score | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------------------+----------------------------------------------------------------

sentiment_score_custsq | .0151579 .0043073 3.52 0.000 .0067104 .0236054

total_silence_weighted | .0427023 .3722449 0.11 0.909 -.6873401 .7727446

agent_to_cust_index | -.0127483 .0302517 -0.42 0.674 -.0720777 .0465811

agent_crosstalk_weighted | 7.172991 3.38091 2.12 0.034 .5423899 13.80359

_cons | 8.295593 .1293408 64.14 0.000 8.041931 8.549255

Table 5: Multiple Linear Regression

(Source: Created using STATA)

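The results in Table 5 show that the squared sentiment term is statistically significant (t = 3.52, p < 0.001). Because sentiment_score_cust enters the model in squared form, its marginal effect is not constant but depends on the level of the variable. With only the squared term included, the marginal effect equals 2 × 0.0151579 × sentiment_score_cust. When sentiment_score_cust = 1 the marginal effect is about 0.030 points of net promoter score per one point increase, and when sentiment_score_cust = 4 it is about 0.121 points. An increase in customer sentiment is therefore associated with a larger rise in the net promoter score at higher levels of sentiment. A full quadratic specification would also include the linear term sentiment_score_cust, which the regression in Table 5 omits.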
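The marginal effect at each value can be worked out from the coefficient in Table 5. The sketch below assumes that sentiment_score_custsq is the square of sentiment_score_cust and that no linear term is in the model, as Table 5 suggests. It reports both the derivative (2 × b × x) and the exact change from moving one whole point (from x to x + 1). The function names are illustrative.

```python
# Coefficient on sentiment_score_custsq copied from Table 5.
B_SQ = 0.0151579


def marginal_effect(x, b_sq=B_SQ):
    """Derivative of b_sq * x**2 with respect to x (slope at the point x)."""
    return 2 * b_sq * x


def one_point_change(x, b_sq=B_SQ):
    """Exact change in the prediction when x rises from x to x + 1."""
    return b_sq * ((x + 1) ** 2 - x ** 2)


for x in (1, 4):
    print(
        f"sentiment_score_cust = {x}: "
        f"marginal effect = {marginal_effect(x):.4f}, "
        f"one-point change = {one_point_change(x):.4f}"
    )
```

The two measures differ slightly because the slope keeps rising over the one point interval, but both show a larger effect at a sentiment score of 4 than at a score of 1.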

Predictive Modelling

5.The conditional mean independence assumption can be expressed as follows, where U is the error term, X is the explanatory variable of interest and Z denotes the control variables:

E (U|X, Z) = E (U|Z)

The assumption states that, once the control variables Z are held constant, the mean of the error term does not depend on X. For sentiment_score_cust this is unlikely to hold fully. The sentiment score can be influenced by several factors that are not in the model and that also affect the net promoter score. For example, the customer’s previous experience may raise the sentiment score, and the customer’s general attitude towards call centre conversations is likely to be correlated with it: customers who already have a positive attitude towards call centre calls are easier to convince. If such factors remain in the error term and are not captured by the control variables, the assumption is violated and the estimated coefficient on the sentiment score cannot be given a causal interpretation.

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

total_sile~d | 1945 .0985188 .1252939 0 .665

call_durat~n | 1945 402.0766 262.4619 80 1605

agent_to_c~x | 1945 2.061445 1.50401 .142 14.674

Table 6: Descriptive Statistics

(Source: Created using STATA)

Source | SS df MS Number of obs = 1945

-------------+------------------------------ F( 3, 1941) = 1.40

Model | .960443967 3 .320147989 Prob > F = 0.2403

Residual | 443.087885 1941 .228278148 R-squared = 0.0022

-------------+------------------------------ Adj R-squared = 0.0006

Total | 444.048329 1944 .228419922 Root MSE = .47778

----------------------------------------------------------------------------------------

nps_group3 | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-----------------------+----------------------------------------------------------------

total_silence_weighted | -.1100792 .0873117 -1.26 0.208 -.2813137 .0611553

call_duration | -.0000583 .000042 -1.39 0.165 -.0001406 .000024

agent_to_cust_index | .0036755 .0072638 0.51 0.613 -.0105702 .0179211

_cons | .6740259 .0246866 27.30 0.000 .6256109 .7224409

Table 7: Regression Analysis

(Source: Created using STATA)

The model takes nps_group_3, that is, being a promoter, as the dependent variable and total_silence_weighted, call_duration and agent_to_cust_index as the independent variables. Because the dependent variable is binary and the model is estimated by ordinary least squares, it is a linear probability model, and each coefficient is read as a change in the probability of being a promoter. The descriptive statistics in Table 6 show that total_silence_weighted has a mean of 0.0985, which is close to its minimum of 0 and may indicate that the representatives communicate with customers with few long pauses. call_duration has a mean of 402.08 and a standard deviation of 262.46, with values between 80 and 1605, so call length varies considerably. agent_to_cust_index has a mean of 2.06 and a standard deviation of 1.50.

In the model in Table 7, none of the independent variables is statistically significant at the 5% level. total_silence_weighted has a negative coefficient of -0.110, but its p-value is 0.208, so the null hypothesis that the coefficient is zero cannot be rejected. Taken at face value, the point estimate suggests that an increase in total_silence_weighted reduces the chance of a customer being a promoter, but the evidence for this is weak.

call_duration also has a negative coefficient (-0.0000583), with a p-value of 0.165, and agent_to_cust_index has a positive coefficient (0.0037), with a p-value of 0.613, so neither variable is significant. The F-test for the model as a whole has a p-value of 0.2403 and the R-squared is 0.0022, so the selected variables jointly explain almost none of the variation in nps_group_3. With respect to the Gauss Markov assumptions, the errors of a linear probability model are heteroskedastic by construction, so the homoskedasticity assumption is not met and robust standard errors would be more appropriate.
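Because the outcome is binary, the fitted value of the regression in Table 7 can be read as a predicted probability of being a promoter. The sketch below, written in Python with the coefficients from Table 7 and the sample means from Table 6, evaluates that probability for a call with average characteristics and the effect of a 0.1 increase in total_silence_weighted. The function name predict_probability is illustrative.

```python
# Intercept and coefficients copied from Table 7.
intercept = 0.6740259
coefs = {
    "total_silence_weighted": -0.1100792,
    "call_duration": -0.0000583,
    "agent_to_cust_index": 0.0036755,
}

# Sample means copied from Table 6.
means = {
    "total_silence_weighted": 0.0985188,
    "call_duration": 402.0766,
    "agent_to_cust_index": 2.061445,
}


def predict_probability(values):
    """Fitted value of the linear probability model for the given inputs."""
    return intercept + sum(coefs[name] * values[name] for name in coefs)


p_at_means = predict_probability(means)

# Change in predicted probability for a 0.1 increase in total_silence_weighted.
delta_silence = coefs["total_silence_weighted"] * 0.1

print(f"predicted probability at sample means: {p_at_means:.3f}")
print(f"effect of +0.1 total_silence_weighted: {delta_silence:+.4f}")
```

A 0.1 increase in total_silence_weighted lowers the predicted probability by about one percentage point, which is small and, given the p-value of 0.208, not statistically distinguishable from zero.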

7.This report analyses the call centre data. The dataset contains a large number of variables, of which the report focuses on a subset. It first summarises net_promoter_score, total_silence, total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted. Net promoter scores are high on average (mean 8.57 out of 10), while total_silence is widely dispersed. A multiple linear regression was then estimated with net_promoter_score as the dependent variable and total_silence_weighted, agent_to_cust_index and agent_crosstalk_weighted as the explanatory variables. total_silence_weighted has a negative coefficient, but it is not statistically significant (p = 0.876), so the null hypothesis of no effect cannot be rejected. The same holds for agent_to_cust_index (p = 0.768). agent_crosstalk_weighted, in contrast, has a positive coefficient that is significant at the 5% level (p = 0.026), which makes it the most likely driver among these three variables. The package1 dummy is just short of significance at the 5% level (p = 0.051) and applies to under 2% of customers, so it is not an important driver. The squared customer sentiment score is positive and highly significant (p < 0.001), although its coefficient may partly reflect omitted factors such as the customers’ prior attitudes. Finally, a model of nps_group_3, that is, being a promoter, with total_silence_weighted, call_duration and agent_to_cust_index as explanatory variables found none of them to be significant. All of the models explain less than 1% of the variation in the outcome, so most of the variation in the net promoter score remains unexplained.
