Summary Statistics and Frequency Distribution
The objective of this memorandum is to respond to the queries raised about the CO2 emissions data provided. The data used to answer these queries and draw insights covers 1,082 vehicles that were tested for emissions in Canada in 2015. Although the dataset contains a range of variables, arguably the most critical is CO2 emissions, measured in g/km. The dataset also records each vehicle's engine size, number of cylinders and fuel type. This information makes it possible to explore how emissions relate to these vehicle attributes. The analysis could also inform policies aimed at curbing the vehicles responsible for the highest levels of pollution. The key observations for each query are set out below.
1. The attached Excel sheet contains summary statistics for CO2 emissions, covering measures of central tendency and measures of dispersion. In addition, a frequency distribution of emissions has been derived and used to draw a histogram that illustrates the observations graphically.
The analysis first indicates that the emissions data are not normally distributed but positively skewed, which points to a small number of vehicles with very high CO2 emissions. This is confirmed by the mean emissions level (244.69 g/km) being higher than the median (239 g/km): a long right tail pulls the mean above the median. It makes sense to target these high-emitting vehicles. Further, the dispersion of emissions is only low to moderate, which implies that relatively few vehicles produce very low emissions.
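The summary measures and the frequency distribution behind the histogram can be sketched in a few lines. This is a minimal illustration on made-up emissions values, not the memo's 1,082 vehicles or its Excel workbook; the bin start and width are arbitrary assumptions. It shows how a right tail produces a positive skewness and a mean above the median.

```python
import math
import statistics


def summarize(values):
    """Central tendency, dispersion and skewness of a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor), like Excel's STDEV.S
    # Adjusted Fisher-Pearson coefficient, the definition used by Excel's SKEW
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {"n": n, "mean": mean, "median": median, "sd": sd,
            "min": min(values), "max": max(values), "skew": skew}


def frequency_table(values, start, width):
    """Count values in bins [start + k*width, start + (k+1)*width).

    Excel's FREQUENCY counts values <= each upper bin limit instead, so values
    lying exactly on a boundary may be binned differently.
    """
    counts = {}
    for x in values:
        lower = start + math.floor((x - start) / width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Made-up emissions (g/km) for illustration only; not the memo's data.
sample = [180, 196, 221, 230, 239, 239, 255, 264, 300, 355, 412]
stats = summarize(sample)
table = frequency_table(sample, start=150, width=50)
# A long right tail pulls the mean above the median and makes the skewness positive
print(stats["mean"] > stats["median"], stats["skew"] > 0)
```

The resulting `table` maps each bin's lower limit to its count, which is exactly what a histogram plots.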
2. To identify any possible association, a scatter plot of the two variables of interest should be drawn to reveal any visual pattern, and the correlation coefficient should be computed to quantify it. The relevant plot is included in the attached Excel file.
The scatter plot indicates that the association between the variables of interest, fuel type and CO2 emissions, is quite weak, as no clear pattern can be discerned. This is supported by the low correlation coefficient (0.167) between the two variables. Although weak, the relationship is positive, so higher-numbered fuel types tend to be associated with higher emissions. Regular petrol is coded as 1, while premium petrol, diesel and ethanol are coded as 2, 3 and 4 respectively. Because these codes are labels rather than measured quantities, the coefficient depends on the order in which the fuel types are numbered and should be interpreted with care.
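Pearson's r (what Excel's CORREL returns) treats the fuel codes as if they were measured quantities. The minimal sketch below, on made-up records rather than the memo's data, computes r and then shows that merely renumbering the same four fuel types changes the coefficient, in this example even flipping its sign.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (the same quantity as Excel's CORREL)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical records, NOT the memo's data.
# Codes follow the memo: 1 = regular petrol, 2 = premium petrol, 3 = diesel, 4 = ethanol.
fuel = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
co2 = [200, 230, 250, 240, 260, 300, 220, 280, 270, 310]
r_original = pearson_r(fuel, co2)

# Same vehicles, same categories, different (equally arbitrary) numbering
relabel = {1: 4, 2: 1, 3: 3, 4: 2}
r_relabelled = pearson_r([relabel[f] for f in fuel], co2)
print(round(r_original, 3), round(r_relabelled, 3))
```

Because the result hinges on an arbitrary coding, comparing average emissions per fuel type is a more robust way to describe a categorical predictor.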
Association between CO2 Emissions and Fuel Type
Diesel is widely regarded as a highly polluting fuel, so higher CO2 emissions from diesel vehicles are expected. Ethanol, by contrast, would be expected to produce lower emissions, yet the relationship suggests that ethanol-fuelled vehicles have the highest emissions level. It is also interesting that, of all the fuel types available, vehicles running on regular petrol produce the lowest CO2 emissions. These results must be viewed with caution: they may reflect factors other than fuel type, especially given the low correlation coefficient, which hints that other factors are at play and may be producing this counter-intuitive observation.
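Statements about which fuel type has the highest or lowest emissions are most directly checked by comparing the average CO2 emissions of each fuel group, a swapped-in descriptive technique that does not depend on the numeric codes. The sketch below uses hypothetical records, not the memo's data, and like the correlation it does not control for engine size or cylinder count.

```python
from collections import defaultdict
from statistics import fmean

# Fuel codes as used in the memo
FUEL_LABELS = {1: "regular petrol", 2: "premium petrol", 3: "diesel", 4: "ethanol"}


def mean_co2_by_fuel(codes, co2):
    """Average CO2 (g/km) for each fuel type, keyed by its label."""
    groups = defaultdict(list)
    for code, value in zip(codes, co2):
        groups[code].append(value)
    return {FUEL_LABELS[code]: fmean(vals) for code, vals in sorted(groups.items())}


# Hypothetical records, not the memo's data
fuel = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
co2 = [200, 230, 250, 240, 260, 300, 220, 280, 270, 310]
means = mean_co2_by_fuel(fuel, co2)
lowest = min(means, key=means.get)   # fuel type with the lowest average emissions
highest = max(means, key=means.get)  # fuel type with the highest average emissions
print(means, lowest, highest)
```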
3. To draw conclusions about average CO2 emissions at the population level from the sample data, a confidence interval is a suitable technique that provides a reasonable estimate. Vehicles have been grouped according to the number of cylinders in the engine.
Using Excel, 95% confidence intervals for mean CO2 emissions have been computed separately for engines with 4, 6 and 8 cylinders, since the expected emissions level differs in each case. The output indicates, with 95% confidence, that the population mean CO2 emissions of vehicles with 4-cylinder engines lies between 198.38 g/km and 203.46 g/km. Similarly, the population mean for vehicles with 6-cylinder engines lies between 255.39 g/km and 260.97 g/km, and for vehicles with 8-cylinder engines between 316.39 g/km and 328.23 g/km. Here, 95% confidence means that intervals constructed in this way would capture the true population mean in about 95% of repeated samples; it is not the probability that an individual engine's emissions fall within the range.
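A confidence interval for a group mean can be sketched as below. This is a normal-approximation version under stated assumptions: Excel's CONFIDENCE.T uses a t critical value instead of z, and the two agree closely only when each group has hundreds of observations. The tiny groups here are made up purely for illustration (with so few points a t critical value would be the better choice); they are not the memo's data.

```python
import math
from statistics import NormalDist, fmean, stdev


def mean_ci(values, confidence=0.95):
    """Normal-approximation confidence interval for a population mean.

    Excel's CONFIDENCE.T uses a t critical value instead of z; with hundreds
    of observations per group the two intervals are almost identical.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(values) / math.sqrt(len(values))
    centre = fmean(values)
    return centre - half_width, centre + half_width


# Hypothetical CO2 readings (g/km) grouped by cylinder count; not the memo's data.
by_cylinders = {
    4: [190, 195, 200, 205, 210, 198, 202],
    6: [250, 255, 258, 262, 265, 260],
    8: [310, 318, 325, 330, 322],
}
intervals = {cyl: mean_ci(vals) for cyl, vals in by_cylinders.items()}
for cyl, (lo, hi) in intervals.items():
    print(f"{cyl} cylinders: {lo:.2f} to {hi:.2f} g/km")
```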
The three 95% confidence intervals do not overlap, which indicates a significant difference in mean CO2 emissions between vehicles whose engines have different numbers of cylinders. The general pattern is that engines with more cylinders emit more CO2, so policy should focus on such vehicles in order to address global warming and pollution more generally.
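Non-overlapping 95% intervals are a conservative sign of a difference at the 5% level, whereas overlapping intervals would not by themselves rule a difference out. The quick check below applies this to the three intervals reported above; a formal two-sample test on the raw data would be the more rigorous route.

```python
# 95% confidence intervals for mean CO2 (g/km), as reported in the memo
reported = {4: (198.38, 203.46), 6: (255.39, 260.97), 8: (316.39, 328.23)}


def overlaps(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


pairs = [(4, 6), (4, 8), (6, 8)]
overlap_found = {pair: overlaps(reported[pair[0]], reported[pair[1]]) for pair in pairs}
print(overlap_found)
```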
The confidence interval for a proportion can be computed using the formula indicated below.
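The formula itself is missing from this copy. A reasonable assumption is the standard normal-approximation (Wald) interval for a proportion, where x is the number of vehicles in a cylinder group and n = 1,082; with z ≈ 1.96 it reproduces the bounds reported for the 4-, 6- and 8-cylinder groups:

```latex
\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}},
\qquad \hat{p} = \frac{x}{n},
\qquad z_{0.025} \approx 1.96
```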
Confidence Intervals for Mean and Proportion
Using the above formula, 95% confidence intervals for the proportion of vehicles with each number of cylinders have been computed in Excel. With 95% confidence, the proportion of vehicles with a 4-cylinder engine lies between 42.23% and 48.16%, the proportion with a 6-cylinder engine lies between 32.64% and 38.34%, and the proportion with an 8-cylinder engine lies between 16.96% and 21.67%. The relevant computations are presented in the attached Excel sheet. These intervals indicate that 4-cylinder engines are the most common type, although not an outright majority (the upper bound of 48.16% is below 50%), while vehicles with 8-cylinder engines are comparatively rare. This is favorable for keeping overall emissions down, since engines with more cylinders emit more CO2, as highlighted above.
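The reported proportion intervals can be reproduced with a Wald interval in a few lines. The group counts are an assumption: the memo does not state them, so the values below are inferred from the midpoints of the reported intervals (489, 384 and 209 vehicles, which also sum to 1,082). The function name `proportion_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


N = 1082
# Counts NOT stated in the memo; inferred from the midpoints of the reported intervals.
inferred_counts = {4: 489, 6: 384, 8: 209}

intervals = {cyl: proportion_ci(k, N) for cyl, k in inferred_counts.items()}
for cyl, (lo, hi) in intervals.items():
    print(f"{cyl} cylinders: {lo:.2%} to {hi:.2%}")
```

Rounded to two decimal places, these bounds match the Excel results quoted above, which supports the assumption that a Wald interval was used.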
4. According to a claim, at least 5% of vehicles on the road have CO2 emissions above 350 g/km. To determine whether this claim holds, hypothesis testing on the sample data is the appropriate technique for drawing inferences about the population.
To test this claim, two hypotheses have been formed: the null hypothesis negates the claim, while the alternative hypothesis supports it. Based on the computations summarized in the attached Excel sheet, the evidence is not sufficient to reject the null hypothesis, so the alternative hypothesis cannot be accepted. In other words, the sample does not support the claim that at least 5% of vehicles emit more than 350 g/km. Strictly, failing to reject the null hypothesis does not prove that the proportion is below 5%; it means the claim is not established by the data. Consequently, if the government wants an emissions limit that covers at least 5% of vehicles, it should consider setting the limit below 350 g/km.
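A one-sample test of a proportion can be sketched as follows, assuming the hypotheses take the form H0: p ≤ 0.05 against H1: p > 0.05 and using the normal approximation. The memo does not report how many sampled vehicles exceed 350 g/km, so the count of 50 below is purely hypothetical, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Upper-tailed z-test of H0: p <= p0 against H1: p > p0 (normal approximation)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability
    return z, p_value


# Hypothetical count of vehicles above 350 g/km; the memo does not report it.
z, p_value = one_proportion_z_test(successes=50, n=1082, p0=0.05)
reject_h0 = p_value < 0.05  # 5% significance level
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

When `reject_h0` is False, the correct reading is "the claim is not supported", not "the proportion is proven to be below 5%".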
5. Linear regression has been used to quantify the relationship between engine size and CO2 emissions; on observational data it shows association rather than establishing causation. The model has been estimated from the sample data in the attached Excel file, and the resulting equation is summarized below.
Hypothesis Testing for CO2 Emissions of Vehicles
CO2 emissions (g/km) = 130.319 + 36.594 × Engine size (liters)
A key statistic associated with this output is the coefficient of determination (R²), which is 0.7017. This means that variation in engine size explains 70.17% of the variation observed in CO2 emissions. With an R² this high and a sample of 1,082 vehicles, the slope is clearly statistically significant and cannot be ignored; formally, this would be confirmed by the slope's t-statistic and p-value in the regression output. The slope of 36.594 implies that each 1-liter increase in engine size is associated with an increase of about 36.59 g/km in CO2 emissions. Vehicles with larger engines therefore contribute more to CO2 emissions than those with smaller engines.
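For a simple regression, the overall F statistic equals the square of the slope's t statistic, and F = R²/(1 − R²) × (n − 2). Assuming the model was fitted on all 1,082 vehicles, the reported R² = 0.7017 gives F ≈ 2,540 and |t| ≈ 50, far beyond any conventional critical value. The sketch below reproduces that arithmetic and the per-liter interpretation of the slope using only the coefficients quoted above.

```python
import math

# Values reported in the memo
intercept, slope = 130.319, 36.594
r_squared, n = 0.7017, 1082

# Simple regression: F = R^2 / (1 - R^2) * (n - 2), and F = t^2 for the slope
f_stat = r_squared / (1 - r_squared) * (n - 2)
t_slope_magnitude = math.sqrt(f_stat)


def predicted_co2(engine_size_litres):
    """CO2 (g/km) predicted by the fitted line."""
    return intercept + slope * engine_size_litres


# A 1-liter larger engine raises predicted CO2 by exactly the slope
difference = predicted_co2(3.0) - predicted_co2(2.0)
print(f"F = {f_stat:.1f}, |t| = {t_slope_magnitude:.1f}, change per liter = {difference:.3f} g/km")
```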
- It is appropriate to use the above regression model to estimate the emissions of an engine with a size of 1000 cc (1 liter) because of the range of engine sizes used to fit the model. The sample includes engine sizes both above and below 1 liter, so estimating emissions at 1 liter is an interpolation within the observed data. Had all the engine sizes used to fit the model been either lower or higher than 1 liter, the estimate would have been an extrapolation beyond the data and therefore potentially unreliable.
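The interpolation argument can be made explicit with a guard that refuses to predict outside the observed engine-size range. The coefficients are the memo's; the list of observed engine sizes is hypothetical, since the memo states only that sizes both above and below 1 liter were present. The function name is illustrative.

```python
intercept, slope = 130.319, 36.594  # coefficients reported in the memo


def predict_within_range(x, observed_sizes):
    """Predict CO2 (g/km) only when x lies inside the observed engine-size range."""
    lo, hi = min(observed_sizes), max(observed_sizes)
    if not lo <= x <= hi:
        raise ValueError(f"{x} L is outside the observed range {lo}-{hi} L (extrapolation)")
    return intercept + slope * x


# Hypothetical engine sizes in liters; the memo only says sizes above and below 1 L were used.
observed = [0.9, 1.4, 2.0, 3.5, 5.7]
estimate_1l = predict_within_range(1.0, observed)
print(f"Estimated CO2 for a 1 L engine: {estimate_1l:.3f} g/km")
```

For a 1-liter engine the fitted line gives 130.319 + 36.594 × 1 = 166.913 g/km.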
6. The formula for determining the minimum sample size for estimating the proportion of vehicles is highlighted below.
Applying this formula in the attached Excel file, the minimum sample size required for the given estimate of the vehicle proportion is 152. This is well below the current sample size of 1,082.
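The standard minimum-sample-size formula for estimating a proportion is n = z²·p(1 − p)/E², rounded up, where p is a planning value for the proportion and E the desired margin of error. The memo does not show the p and E it used to obtain 152, so the inputs below are illustrative only; p = 0.5 is the conservative choice that maximizes n.

```python
import math
from statistics import NormalDist


def min_sample_size_proportion(p, margin, confidence=0.95):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin, i.e. z^2 p (1-p) / E^2 rounded up."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Illustrative inputs, not the memo's: worst-case p = 0.5 and a 5-percentage-point margin
print(min_sample_size_proportion(p=0.5, margin=0.05))
```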
The formula for determining the minimum sample size for estimating mean fuel consumption is highlighted below.
Applying this formula in the attached Excel file, the minimum sample size required for the given estimate of fuel consumption is 129, again well below the current sample size of 1,082.
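For a mean, the usual formula is n = (z·σ/E)², rounded up, where σ is an assumed standard deviation of fuel consumption (for example, from a pilot sample) and E is the desired margin of error. The σ and E behind the memo's figure of 129 are not shown, so the values below are hypothetical.

```python
import math
from statistics import NormalDist


def min_sample_size_mean(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin, i.e. (z * sigma / E)^2 rounded up."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical inputs: sigma = 10 units of fuel consumption, margin = 2 units
print(min_sample_size_mean(sigma=10, margin=2))
```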
In conclusion, the CO2 emissions in the sample data indicate the presence of some vehicles with very high emissions levels, which pull the mean above the median. The association between fuel type and CO2 emissions, examined through the scatter plot and the correlation coefficient, was found to be weak but positive. The analysis also suggested that alternative fuels may produce more CO2 than traditional fossil fuels, but this needs to be examined further in future studies. Finally, vehicles with more cylinders typically have higher emissions levels, but reassuringly the vehicle fleet is dominated by engines with fewer cylinders, and engines with more cylinders are comparatively less common.
Hence, going forward, the government should consider a policy under which levies are charged to owners of vehicles with more cylinders, since these cause more emissions and greater harm to the environment. Such levies are particularly warranted for vehicles with 8-cylinder engines. Further, because the claim underlying a limit of 350 g/km could not be established through hypothesis testing, the proposed policy should be amended so that the emissions limit is set below 350 g/km, in order to maximize coverage. In addition, since CO2 emissions depend on engine size and larger engines emit more, there is a case for policy measures that promote the purchase of vehicles with smaller engines. Finally, the current sample of 1,082 vehicles is larger than required: a sample of about 152 observations would suffice for the proportion estimate and, since this exceeds the 129 required for fuel consumption, for that estimate as well.