Prepare a report in a document file (.doc or .docx) which includes all relevant tables and figures, using the following structure:
1. Section 1: Introduction
a. Give a brief introduction to the assignment, search for a related article, and write a one-paragraph summary of it that supports your assignment. Give the full citation of the article.
b. Dataset 1: Give a short description of this dataset. Is this primary or secondary data? What types of variables are involved? Explain briefly what the possible cases used in this study are.
c. Dataset 2: Explain how you collected the data and discuss its limitations (e.g. whether your sample is biased). Is this primary or secondary data? What is/are the type(s) of variable(s) involved? Give a description of the cases you consider for this dataset.
2. Section 2: Analysis of single variable in Dataset 1
a. To answer the research question “Which type of public transport was most used by the NSW people during 8th to 14th of August 2016?”, provide a suitable numerical summary and graphical display for the variable mode of Dataset 1. Give a detailed comment to answer the research question.
b. Now, to answer the research question “Do more than 50% of public transport users in NSW use the particular mode of transport found in Part a?”, set up appropriate hypotheses, perform the hypothesis test, and answer the research question by writing the conclusion of the test.
3. Section 3: Analysis of two variables in Dataset 1
The NSW Government needs to decide whether to build an underground railway line from Parramatta, Bankstown or Gosford to Central. To prepare a recommendation for this:
a. Give a numerical summary and an appropriate graphical display for the variable location, considering only those three stations, and for the variable count, considering only the train data.
b. Perform a suitable hypothesis test at a 5% level of significance to test whether there is a difference between the mean counts of taps on and taps off.
c. Use the conclusion of the test in part b and the outputs in part a to write a recommendation to the NSW Government.
4. Section 4: Collect and analyse Dataset 2
You are interested in finding whether there is a difference in preference between genders in terms of their transport mode (Bus, Train, Ferry and Light Rail). By considering an appropriate number of cases and variables, give a proper graphical display and use it to write a comment.
5. Section 5: Discussion & Conclusion
Write an executive summary combining all your findings in the previous sections, which must form a valuable recommendation for NSW Transport. Give a suggestion for further research.
The aim of this assignment is to test skills in collecting and analyzing data to answer a specific business problem. The assignment also presents an opportunity to apply the theories learnt during the course, such as finding numerical summaries, displaying data with appropriate graphs, and using statistical inference to solve business problems, including constructing hypotheses, testing them and interpreting the findings (Ryabko, Stognienko, & Shokin, 2004).
We are presented with data on the NSW transport system in order to develop decision-oriented recommendations aimed at improving the public transport system. The project presents a series of research questions to be answered using the knowledge gained during the course.
- Dataset 1:
The first dataset (Dataset 1) is secondary data provided by the NSW transport system. It has a total of 1000 observations with six variables. The variables are described below:
Table 1: Description of the variables
Variable | Description | Values | Variable Type
mode | Type of public transport | Bus, Train, Ferry and Light Rail | Nominal variable (qualitative)
date | Date on which the tap on/off occurred | Day/month/year | Nominal variable (qualitative)
tap | Whether it is a tap on or a tap off | On and Off | Nominal variable (qualitative)
loc | Location of the stop: postcodes for buses, station names for other modes | Postcodes and names of the stations | Nominal variable (qualitative)
count | Total number of taps on or off at the given location on the given date | Number | Scale variable (quantitative)
Each case is one observation, that is, a record of the tap-on or tap-off count for one mode at one location on one date. The study uses 1000 such cases.
- Dataset 2:
The second dataset (Dataset 2) is primary data collected by the researcher. A random sample of 50 individuals was selected, and each person was interviewed about their gender, age and the mode of transport they prefer to use most. The data has a total of 50 observations with three variables; each case is one respondent.
For Dataset 2, random sampling was used to collect data from individuals in order to understand which mode of transport they use most frequently. This is primary data because it was collected directly from the subjects. The main limitation of this data is its small sample size of only 50 cases. The variables are described below:
Table 2: Description of the variables
Variable | Description | Values | Variable Type
Mode | Type of public transport | Bus, Train, Ferry and Light Rail | Nominal variable (qualitative)
Age | Age of the respondent | Number | Scale variable (quantitative)
Gender | Gender of the respondent | Male and Female | Nominal variable (qualitative)
- Section 2: Analysis of single variable in Dataset 1
In this section, we attempt to answer the research questions posed. To answer the research questions, we use dataset 1.
- Which type of public transport was most used by the NSW people during 8th to 14th of August 2016?
To answer this research question, we produced a frequency distribution of the variable mode. Table 3 below gives the results.
Table 3: Frequency table for the mode of transport used
Mode | Count of mode | Percent
Bus | 467 | 46.7%
Ferry | 25 | 2.5%
Light-rail | 24 | 2.4%
Train | 484 | 48.4%
Grand Total | 1000 | 100.0%
As can be seen, bus and train were by far the most used modes. Train was the most frequently used, accounting for 48.4% (n = 484) of the observations for the week, followed by bus with 46.7% (n = 467). Ferry and light rail were the least used, with 2.5% (n = 25) and 2.4% (n = 24) of the observations respectively.
Figure 1: Bar chart on mode of transport used
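The percentages in the frequency table can be reproduced directly from the counts. The following is a minimal sketch that assumes only the four counts reported in Table 3; the variable names are illustrative and not part of the original analysis.

```python
# Counts copied from Table 3 (frequency of each transport mode in Dataset 1).
mode_counts = {"Bus": 467, "Ferry": 25, "Light-rail": 24, "Train": 484}

total = sum(mode_counts.values())

# Percentage share of each mode, rounded to one decimal place as in the table.
mode_percent = {mode: round(100 * n / total, 1) for mode, n in mode_counts.items()}

# The most used mode is the one with the largest count.
most_used = max(mode_counts, key=mode_counts.get)

for mode, n in mode_counts.items():
    print(f"{mode:<10} {n:>5} {mode_percent[mode]:>5.1f}%")
print(f"Most used mode: {most_used}")
```

The same dictionary could be passed to any plotting library to draw the bar chart; that step is omitted here to keep the sketch dependency-free.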
- Now, to answer the research question of whether the proportion of those using the train is greater than 50%, the following hypotheses were set up and tested.
H0: The proportion of transport users who use train is not significantly different from 50%.
HA: The proportion of transport users who use train is significantly different from 50%.
To test this, a one-sample t-test was used at the 5% level of significance. The results are given below:
Table 4: One-Sample Statistics
Variable | N | Mean | Std. Deviation | Std. Error Mean
Train | 1000 | .4840 | .49999 | .01581
Table 5: One-Sample Test
Test Value = 0.5
Variable | t | df | Sig. (2-tailed) | Mean Difference | 95% CI of the Difference (Lower) | 95% CI of the Difference (Upper)
Train | -1.012 | 999 | .312 | -.01600 | -.0470 | .0150
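A one-sample t-test was run to determine whether the proportion of NSW transport users who rely on the train is more than 50%. The sample proportion using the train (M = 0.484, SD = 0.50) was not significantly different from 50% (mean difference = -0.016, 95% CI of the difference, -0.047 to 0.015), t(999) = -1.012, p = .312. Since the sample proportion is itself below 50% and the difference is not significant, there is no evidence that more than 50% of public transport users use the train.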
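The test statistic above can be checked from the summary figures alone. The sketch below assumes only the values in Table 4 (N = 1000, sample proportion 0.484) and the test value 0.5. It uses a normal approximation for the p-value, which is an assumption of this sketch (reasonable with 999 degrees of freedom) rather than the exact t distribution behind the reported output.

```python
import math

n = 1000        # number of observations (Table 4)
p_hat = 0.484   # sample proportion of train observations (Table 4, "Mean")
p0 = 0.5        # hypothesised proportion (test value)

# Sample standard deviation of a 0/1 indicator, with n - 1 in the denominator.
sd = math.sqrt(p_hat * (1 - p_hat) * n / (n - 1))
se = sd / math.sqrt(n)
t_stat = (p_hat - p0) / se


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Normal approximation to the t distribution (assumption: df = 999 is large).
p_two_sided = 2 * (1 - normal_cdf(abs(t_stat)))
# One-sided p-value for the alternative "proportion is greater than 50%".
p_one_sided_greater = 1 - normal_cdf(t_stat)

# Approximate 95% confidence interval for the difference from 0.5.
ci_low = (p_hat - p0) - 1.96 * se
ci_high = (p_hat - p0) + 1.96 * se

print(f"SD = {sd:.5f}, SE = {se:.5f}, t = {t_stat:.3f}")
print(f"two-sided p ~ {p_two_sided:.3f}, one-sided p ~ {p_one_sided_greater:.3f}")
print(f"95% CI of difference ~ ({ci_low:.4f}, {ci_high:.4f})")
```

Because the research question is directional ("more than 50%"), the one-sided p-value is the more direct answer; it is well above 0.05, which agrees with the conclusion drawn from the two-sided test.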
- Section 3: Analysis of two variables in Dataset 1
The NSW Government needs to decide whether to build an underground railway line from Parramatta, Bankstown or Gosford to Central. To prepare a recommendation for this:
- Give a numerical summary and an appropriate graphical display for the variable location, considering only those three stations, and for the variable count, considering only the train data.
In this section we first consider the number of times the train left the three locations mentioned. This information is given in the table below:
Table 6: Frequency of train from the three locations
Station | Count | Percent
Parramatta Station | 7 | 53.8%
Gosford Station | 2 | 15.4%
Bankstown Station | 4 | 30.8%
Figure 2: Bar chart for the count of times the train leaves the stations
Considering the data with trains only, it was established that the average count was 103.38, with a standard deviation of 226.14.
Table 7: Descriptive statistics for the variable count
Statistic | count
Mean | 103.379
Standard Error | 7.151282
Median | 53
Mode | 18
Standard Deviation | 226.1434
Sample Variance | 51140.84
Kurtosis | 238.9731
Skewness | 13.04214
Range | 4955
Minimum | 18
Maximum | 4973
Sum | 103379
Count | 1000
The mode of the counts was 18 and the median was 53, well below the mean of 103.38. The skewness value (13.04) indicates that the data are heavily skewed. This is consistent with the minimum count of 18 and the maximum of 4973: the very wide range (4955) suggests the presence of outliers, which would account for the observed skewness.
The histogram presented below further shows that the data is skewed. The shape of the histogram indicates that the data is skewed to the right (longer tail to the right).
Figure 3: Histogram of the variable count
- Perform a suitable hypothesis test at a 5% level of significance to test whether there is difference between mean counts of taps on and off.
To answer this, the following hypotheses were tested at the 5% level of significance.
H0: There is no significant difference in the mean counts of taps on and taps off.
HA: There is a significant difference in the mean counts of taps on and taps off.
To test this, an independent samples t-test was used. The results are given below:
Table 8: Group Statistics
Variable | Tap | N | Mean | Std. Deviation | Std. Error Mean
count | On | 481 | 106.65 | 269.081 | 12.269
count | Off | 519 | 100.35 | 177.530 | 7.793
Table 9: Independent Samples Test
Columns F and Sig. report Levene's Test for Equality of Variances; the remaining columns report the t-test for Equality of Means.
count | F | Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference (Lower) | 95% CI of the Difference (Upper)
Equal variances assumed | .083 | .774 | .440 | 998 | .660 | 6.296 | 14.319 | -21.802 | 34.394
Equal variances not assumed | | | .433 | 821.5 | .665 | 6.296 | 14.535 | -22.233 | 34.825
An independent samples t-test was performed to compare the mean counts for taps on and taps off. The mean count for taps on (M = 106.65, SD = 269.08, N = 481) did not differ significantly from the mean count for taps off (M = 100.35, SD = 177.53, N = 519), t(998) = 0.440, p = .660, two-tailed. The observed mean difference of 6.30 was not significant at the 5% level of significance. In other words, there is no evidence that the mean count depends on whether the tap is on or off.
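Both rows of the independent samples test can be recomputed from the group statistics in Table 8. The sketch below assumes only those summary values (N, mean and standard deviation per group); because the means are rounded to two decimals, the results agree with the reported output only to about three decimal places.

```python
import math

# Group statistics copied from Table 8.
n_on, mean_on, sd_on = 481, 106.65, 269.081
n_off, mean_off, sd_off = 519, 100.35, 177.530

diff = mean_on - mean_off

# Equal variances assumed: pooled-variance t-test.
df_pooled = n_on + n_off - 2
pooled_var = ((n_on - 1) * sd_on**2 + (n_off - 1) * sd_off**2) / df_pooled
se_pooled = math.sqrt(pooled_var * (1 / n_on + 1 / n_off))
t_pooled = diff / se_pooled

# Equal variances not assumed: Welch's t-test with Welch-Satterthwaite df.
v_on = sd_on**2 / n_on
v_off = sd_off**2 / n_off
se_welch = math.sqrt(v_on + v_off)
t_welch = diff / se_welch
df_welch = (v_on + v_off) ** 2 / (v_on**2 / (n_on - 1) + v_off**2 / (n_off - 1))

print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}, SE = {se_pooled:.3f}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.1f}, SE = {se_welch:.3f}")
```

Both versions give a t statistic of roughly 0.44, so the conclusion does not depend on whether equal variances are assumed.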
- Use the conclusion of the test in part b and the outputs in part a to write a recommendation to NSW government.
We concluded that there is no significant difference between the mean counts for taps off and taps on. The three chosen stations also did not show much traffic in the data. It is therefore recommended that the government reconsider the plan to build an underground railway line from Parramatta, Bankstown or Gosford to Central, as the data do not show it to be as well justified as would be required.
- Section 4: Collect and analyse Dataset 2
You are interested in finding whether there is a difference in preference between genders in terms of their transport mode (Bus, Train, Ferry and Light Rail). By considering an appropriate number of cases and variables, give a proper graphical display and use it to write a comment.
The results for this section are presented below:
Count of Gender (column percentages)
Mode | Female | Male | Grand Total
Bus | 16.7% | 42.3% | 30.0%
Ferry | 20.8% | 7.7% | 14.0%
Light Rail | 8.3% | 11.5% | 10.0%
Train | 54.2% | 38.5% | 46.0%
Grand Total | 100.00% | 100.00% | 100.00%
As can be seen, the largest share of male commuters (42.3%, n = 11) reported using the bus, while most female commuters (54.2%, n = 13) reported using the train.
Chi-Square test
A Chi-square test was performed to determine whether there is significant association between gender and the preferred mode of transport (Bagdonavicius & Nikulin, 2011). The hypothesis tested is given below;
H0: There is no significant association between gender and preferred mode of transport
HA: There is significant association between gender and preferred mode of transport
This was tested at 5% level of significance and the results are given below;
Table 10: Chi-Square Tests
Test | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | 5.072 (a) | 3 | .167
Likelihood Ratio | 5.239 | 3 | .155
N of Valid Cases | 50 | |
(a) 4 cells (50.0%) have expected count less than 5. The minimum expected count is 2.40.
The p-value for the test is 0.167, which is greater than the 5% level of significance. We therefore fail to reject the null hypothesis and conclude that there is no evidence of a significant association between gender and preferred mode of transport. Because half of the cells have an expected count below 5, the chi-square approximation should be interpreted with caution.
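The Pearson chi-square statistic can be recomputed by hand from the cross-tabulation. The cell counts in the sketch below are an assumption: only n = 11 (male, bus) and n = 13 (female, train) are stated in the text, and the remaining counts were reconstructed from the column percentages under the implied group sizes of 24 females and 26 males. The closed-form survival function used for the p-value is specific to 3 degrees of freedom.

```python
import math

# Reconstructed counts per mode as (female, male). ASSUMPTION: derived from
# the column percentages; only 11 (male, bus) and 13 (female, train) are
# stated explicitly in the text.
observed = {
    "Bus": (4, 11),
    "Ferry": (5, 2),
    "Light Rail": (2, 3),
    "Train": (13, 10),
}

col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

chi2 = 0.0
min_expected = float("inf")
for row in observed.values():
    row_total = sum(row)
    for j, obs in enumerate(row):
        # Expected count under independence: row total * column total / N.
        expected = row_total * col_totals[j] / grand_total
        min_expected = min(min_expected, expected)
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)


def chi2_sf_df3(x):
    """Upper-tail probability of the chi-square distribution with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_df3(chi2)

print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
print(f"minimum expected count = {min_expected:.2f}")
```

With the reconstructed counts, the statistic, degrees of freedom and minimum expected count match the values reported in Table 10, which supports the reconstruction but does not prove it.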
Section 5: Discussion & Conclusion
The main purpose of this study was to present an analysis of the NSW transport system. We were provided with a secondary dataset (Dataset 1) comprising 1000 cases with six variables. In addition to this secondary data, we surveyed 50 individuals. We sought to find out the most commonly used mode of transport. Results showed that the most commonly used mode was the train, followed by the bus; although people also used the ferry and light rail, their usage was minimal compared with bus and train. In the comparison of transport mode between males and females using Dataset 2, we noted that the majority of female respondents preferred the train, while the largest share of male commuters preferred the bus. In light of these findings, we make the following recommendation to the NSW Government:
- Train use is very common among commuters; it would therefore be prudent to improve this mode of transport to make it more effective. Building an underground railway line from Parramatta, Bankstown or Gosford to Central would indeed benefit commuters.
Future research should be broad enough to examine the motivation behind the preference for the various modes of transport. This would help management and the government to fully understand the needs and desires of the people.
References
Bagdonavicius, V., & Nikulin, M. S. (2011). Chi-squared goodness-of-fit test for right censored data. The International Journal of Applied Mathematics and Statistics, 30–50.
Ryabko, B. Y., Stognienko, V. S., & Shokin, Y. I. (2004). A new test for randomness and its application to some cryptographic problems. Journal of Statistical Planning and Inference, 123, 365–376. doi:10.1016/s0378-3758(03)00149-6