Stats on Australia's LGA Pop Sample

Task 1: Pie chart showing transport type of LGA sample

Class Participation and Attendance (10%) During the trimester students will discuss various Statistics issues and cases, which are noted in the prescribed textbook. Active student participation in the lectures and tutorials is essential to the learning process. All students are expected to be prepared for and contribute to the lecture and tutorial discussions. To assist in your understanding of Statistics issues, you are encouraged to contribute to in-class discussions. Your participation and interaction with other students will be a major component of the class instruction. Part of your class participation will also be based upon your ability to recognise contemporary statistical issues. As this is a graduate Bachelor of Business unit, regular and active participation in lecture and tutorial discussions is expected from every student each week.

Participation in this unit will be compiled and noted for each week. Please contact your Lecturer/Tutor as soon as possible if you are unable to participate in class discussions for any length of time (e.g., more than two “working” days in a row, weekends not included). Students are expected to ask and answer questions in all discussions. Points for class participation and completion of other class assignments will be recorded. Attendance at 10 lectures and tutorials as a minimum per student, per trimester is expected. Students are expected to participate throughout the entire trimester, including during the training session presentations made by other groups. Please note: Your attendance in class is a necessary, but not sufficient, condition for good participation. (Just showing up for work, but not contributing anything to the organisation, would not generally be considered acceptable behaviour in the workplace.) Please participate, and feel free to share your experiences as long as they are relevant.

The assignment has a total of six tasks to complete. For the assignment you need to create your own data set using the data provided as the LGAData.xls Excel file.

Presentation

Your answers must be presented in task number order and be clearly labelled with the appropriate task number. Your assignment must be presented in Microsoft (MS) Word. Copy and paste any relevant Excel outputs to this document immediately before (above) any relevant written answers to each task. If you are unfamiliar with the use of the MS Word Equations Editor, you may write algebraic, mathematical or statistical symbols and notation in neat handwritten form. Your answers must be clear. You must highlight relevant items on any required Excel outputs and make reference to them in your written answers. When asked to perform a manual calculation (i.e. the use of MS Excel is not specified) you must show all working.


This must include intermediate steps where relevant. Failure to do so will result in a loss of marks. Do not include the assignment questions nor the data with your submitted assignment.

Introduction

The Assignment Data (LGAData.xls) file contains LGA data (sourced from ABS) for a population of 400 LGA in Australia. You are required to select a random sample of 50 LGA from this population. Variables in the data set are as below:

Column	Variable	Description / coding
PN	ID number	ID
V1	LGA name	Name
V2	Gender	Male=1; Female=0
V3	Age Group	30-34=1; 35-39=2; 39-44=3; 45-50=4
V4	Age1	number of people 30-34 years old
V5	Age2	number of people 35-39 years old
V6	Age3	number of people 39-44 years old
V7	Age4	number of people 45-50 years old
V8	Income category	$650-$800=0; $801-$850=1
V9	Income1	number of people with $650-$800 weekly income
V10	Income2	number of people with $801-$850 weekly income
V11	Occupation type	Managers=1; Professionals=2; Sales workers=3; Administration=4
V12	Occupation1	number of Managers
V13	Occupation2	number of Professionals
V14	Occupation3	number of Sales workers
V15	Occupation4	number of Administration
V16	Industry of occupation	Mining=1; Agriculture=2; Manufacturing=3; Education=4
V17	Industry1	number of people in Mining
V18	Industry2	number of people in Agriculture
V19	Industry3	number of people in Manufacturing
V20	Industry4	number of people in Education
V21	Transport	0=car; 1=public transport
V22	Living with children	0=with; 1=without

Column A (PN) contains the LGA identification numbers from 001 to 400.

Selecting Random Sample and Creating your Sample Data File

To select your random sample, you need to open the PopulationLGAData.xls file and create a SampleLGAData Excel file of 50 samples (rows) which will include 50 LGA. To select these samples randomly, one member of the group starts with the last 2 digits of their student ID and adds 50 to it to create the data set. For example, a student with ID of MIT172256 will start with row 56 and add 50 rows to it, which will include rows 56-106. Then the data set for this group will include LGA 56-106. You will need to submit a SOFT copy of your sample LGA data. You will also need to report what student ID was used for random data selection. This sample data set will form the basis of the statistical presentation and analysis tasks of the assignment.

Assignment requirements

Answers to the assignment tasks must be based on the sample data file that you have created. All tasks in this assignment require you to obtain an Excel output prior to performing some analysis.


Copy and paste these outputs to your assignment MS Word document immediately preceding any subsequent analysis. Charts and tables must have appropriate titles, and numerical values must be rounded to an appropriate number of decimal places and accompanied by the correct units of measure.

Task 1 (10 marks)

Use Excel to produce a Frequency Column Chart and a Relative Frequency Pie-Chart for your sample to show the number and proportion, respectively, of people under each age group. Use these graphical summaries to answer the following questions:

(a) How many LGA in your sample indicate use of public transport?

(b) Which occupation type occurs most frequently in your sample?

(c) What proportion of LGA in your sample consists of people within the 30-34 years old age group?

Task 2 (10 marks)

(a) Use Excel to sort your sample “occupation 4” data. Use the percentile location formula, \( L_P = (n + 1)\,\frac{P}{100} \), and the three associated rules to determine:

(i) The 70th percentile.

(ii) The first and third quartiles.

Remember to show all working!

(b) Briefly explain what the 70th percentile that you have determined informs you about your sample “occupation 4” data.

(c) Determine the Inter-Quartile Range of your sample “occupation 4” data and provide a brief explanation of what information this statistic provides about your sample data.

Task 3 (15 marks)

(a) Use Excel to produce a Descriptive Statistics table for your sample “occupation 4” data.

(b) Use results from Task 3 to determine manually for this data the upper and lower inner fence limits:

IFUL = Q3 + 1.5 x IQR

IFLL = Q1 - 1.5 x IQR

(c) Based on the limits calculated in (b), choose from the numerical summary measures provided in the Descriptive Statistics table, and/or measures calculated previously in Task 4; (i) an appropriate measure of central tendency, and (ii) an appropriate measure of dispersion for your sample “occupation 4” data. Remember to show all working! Provide a brief explanation of the reasoning behind your choice in both cases.

(d) Write a brief report on the “occupation 4” data, paying particular attention to the mean, median, quartiles and measures of variation.

Task 4 (15 marks)

Remember to show all working! Failure to do so will result in the loss of marks.

(a) From the Descriptive Statistics table obtained in Task 4, identify three pieces of evidence that indicate whether your sample “occupation 4” data has been obtained from a normally distributed population or not. What is your conclusion? Note: Make sure only one piece of evidence relates to the shape of the sample data.

(b) Regardless of your conclusion above, assume the “occupation 4” data is normally distributed. Applying the Standard Normal tables, calculate how many “occupation 4” observations in your sample you would expect to lie within 1.5 standard deviations of the mean (i.e. between z = -1.5 and z = +1.5).

(c) Use the mean and standard deviation from the Descriptive Statistics table of Task 4 to calculate the bound for a 1.5 standard deviation spread from the mean. Using the “occupation 4” sample data, manually count the number of observations that fall within the bound. State whether this count matches, approximately, your answer to (b) and hence whether this result confirms (or not) your conclusion in (a).

Task 5 (15 marks)

(a) Use Excel to produce a Descriptive Statistics table for the “Occupation 4” variable in your sample suitable for constructing an interval estimate of the population mean “Occupation 4”. Hence determine:

(i) A point estimate of the mean “Occupation 4” of the population.

(ii) A 90% confidence interval estimate of the mean “Occupation 4” of the population.

(iii) Make a brief verbal statement explaining the meaning of the confidence interval estimate obtained in (ii) in the context of the variable in this task.

(b) If the population mean “Occupation 4” is actually 59, would you consider the interval estimate obtained in (a) to be satisfactory? Explain why or why not.

Task 6 (15 marks)

(a) Use Excel to produce a Descriptive Statistics table for the $650-$800 income earners in your sample suitable for constructing an interval estimate of the population proportion of income earners. Hence determine:

(i) A point estimate of the proportion of $650-$800 income earners in the population.

(ii) A 99% confidence interval estimate of the $650-$800 income earners.

(b) Using the following formula: (sample statistic) ± (critical z or t) × (standard error of the sample statistic), use the Empirical Rule for a Normal distribution to determine a 95% confidence interval estimate of the $650-$800 income earners in the population.

(c) Compare, in terms of precision, the interval manually calculated in (b) with the interval obtained from the Descriptive Statistics table in (a). Explain why the direction of the change in precision is expected.

Task 1: Pie chart showing transport type of LGA sample

The Assignment Data (LGAData.xls) file contains, LGA data (sourced from ABS) for a population of 400 LGA in Australia. We are required to select a random sample of 50 LGA from this population.

To create our random sample of 50 LGA from the population, we use the student ID of one group member, Kapil Yogi: MIT170802. The last two digits of this ID are 02, so our data set contains LGA 02-51 (there are 50 LGA between 02 and 51, counting both ends).

The soft copy of our sample LGA data is named LGA_sample.
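The selection rule can be sketched in Python as follows. This is only an illustration: the function name `sample_ids` is hypothetical, and it assumes the sample consists of 50 consecutive LGA IDs beginning at the last two digits of the student ID, with the starting row counted as part of the sample.

```python
def sample_ids(student_id: str, size: int = 50) -> list[int]:
    """Return `size` consecutive LGA IDs starting at the ID's last two digits.

    Hypothetical helper: assumes the student ID ends in two digits and that
    the starting row itself belongs to the sample.
    """
    start = int(student_id[-2:])
    return list(range(start, start + size))


ids = sample_ids("MIT170802")
print(ids[0], ids[-1], len(ids))  # prints: 2 51 50
```

Counting both ends is what keeps the sample at exactly 50 rows; adding 50 to the start row and including both ends would give 51.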

Task 1

Figure 1: Histogram showing type of transport of the 50 sampled LGAs

The variable “V21” gives the type of transport for each LGA, where “0” denotes car and “1” denotes public transport.

Using Excel, we find (from the frequency chart) that 28 LGA in our sample indicate use of public transport.

We store the values of “V21” in column B.

[ Excel Formula: =COUNTIF(B2:B51,1) ]

Figure 2: Pie diagram showing type of occupation of the 50 sampled LGAs

The variable “V11” gives the occupation type for each LGA, coded as follows: Managers=1; Professionals=2; Sales workers=3; Administration=4.

Using Excel, we find that the occupation type that occurs most frequently in our sample is Administration, coded “4” (this is also evident from the pie diagram).

We store the values of “V11” in column C.

[ Excel Formula: =MODE(C2:C51) ]

Figure 3: Pie diagram showing age group of the 50 sampled LGAs

The variable “V3” gives the age group for each LGA, coded as follows: 30-34=1; 35-39=2; 39-44=3; 45-50=4.

Out of the 50 sampled LGA, 8 fall in the 30-34 years age group.

Hence, the required proportion = 8/50 = 0.16 (from the relative frequency pie diagram).

We store the values of “V3” in column E.

[ Excel formula: =COUNTIF(E2:E51,1) ]
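The three Excel steps used in Task 1 (a count, a mode and a proportion) can be mirrored in a short Python sketch. The lists below are made-up coded values for illustration only, not the assignment data, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical coded values for illustration only (not the assignment data).
transport = [0, 1, 1, 0, 1, 1, 0, 1]    # V21: 0 = car, 1 = public transport
occupation = [4, 2, 4, 1, 3, 4, 2, 4]   # V11: 1 = Managers ... 4 = Administration
age_group = [1, 2, 3, 1, 4, 2, 3, 3]    # V3: 1 = 30-34 ... 4 = 45-50


def frequency(values):
    """Frequency of each code, as plotted in a frequency column chart."""
    return dict(Counter(values))


def relative_frequency(values):
    """Share of each code, as plotted in a relative frequency pie chart."""
    n = len(values)
    return {code: count / n for code, count in Counter(values).items()}


# Same role as COUNTIF(range, 1): how many entries are coded 1.
public_transport_count = frequency(transport)[1]
# Same role as MODE(range): the most frequent code.
most_common_occupation = Counter(occupation).most_common(1)[0][0]
# Proportion of entries in age group 1 (30-34).
share_30_34 = relative_frequency(age_group)[1]

print(public_transport_count, most_common_occupation, share_30_34)
```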

Task 2

In our sample, V15 gives the information about Occupation4, i.e. the number of people in Administration.


We first have to sort our sample “occupation 4” data.

Using Excel, we sorted the data using the following path: Home->Sort & Filter

The percentile location formula is given by

\[ L_P = (n + 1)\,\frac{P}{100} \]

Here, P represents the percentile rank and n denotes the number of observations under consideration.

Here n = 50.

Hence, using the above formula, for the 70th percentile, P = 70, so we get \( L_{70} = 51 \times 70/100 = 35.7 \).

So the integer portion is 35 and the fractional part is 0.7. The 35th and 36th observations of the ordered data are 11 and 13 respectively. So the 70th percentile is 11+(13-11)*0.7 = 12.4

For the first quartile, P = 25, so we get \( L_{25} = 51 \times 25/100 = 12.75 \).

So the integer portion is 12 and the fractional part is 0.75. The 12th and 13th observations of the ordered data are 0 and 0 respectively. So the first quartile is 0+(0-0)*0.75 = 0

For the third quartile, P = 75, so we get \( L_{75} = 51 \times 75/100 = 38.25 \). So the integer portion is 38 and the fractional part is 0.25. The 38th and 39th observations of the ordered data are 17 and 18 respectively. So the third quartile is 17+(18-17)*0.25 = 17.25
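The location rule and the interpolation step can be written as a small Python sketch. The function names are hypothetical, the example list is made up, and the code assumes the location falls between 1 and n, as it does for the percentiles used here.

```python
def percentile_location(n: int, p: float) -> float:
    """Location L_P = (n + 1) * P / 100 in the sorted data (1-based)."""
    return (n + 1) * p / 100


def percentile(sorted_values, p):
    """Percentile by linear interpolation at the (n + 1) * P / 100 location.

    Assumes `sorted_values` is in ascending order and that the location
    lies between 1 and n.
    """
    location = percentile_location(len(sorted_values), p)
    k = int(location)             # integer part of the location
    fraction = location - k       # fractional part of the location
    lower = sorted_values[k - 1]  # k-th observation (1-based)
    if fraction == 0:
        return lower
    upper = sorted_values[k]      # (k + 1)-th observation
    return lower + (upper - lower) * fraction


# Small made-up example, not the assignment data.
values = [2, 4, 4, 5, 7, 9, 10, 12, 15]
print(percentile_location(50, 70))  # location of the 70th percentile when n = 50
print(percentile(values, 25), percentile(values, 50), percentile(values, 75))
```

With n = 50 and the 35th and 36th ordered values equal to 11 and 13, `percentile` reproduces the 12.4 obtained by hand, up to floating-point rounding.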

The 70th percentile that we have determined tells us that, in 70% of the LGAs in our sample, the number of people in occupation “4” (Administration) is 12.4 or less (12 if rounded).

The Inter-Quartile Range is given by IQR = Q3 - Q1 = 17.25-0 = 17.25; it gives the spread of the middle 50% of our data.

So the middle 50% of the LGAs have between 0 and 17.25 people in Administration. This measure of spread is not influenced by extremely small or large values.

We store the values of V15 in column A.

[ Excel Formula: =QUARTILE.EXC(A2:A51,3)-QUARTILE.EXC(A2:A51,1) ]

Task 3

Using Excel we find out the following descriptive statistics table of our sample “occupation 4” data.

Mean	8.46
Standard Error	1.190167
Median	6.5
Mode	0
Standard Deviation	8.415753
Sample Variance	70.8249
Kurtosis	-0.94489
Skewness	0.666345
Range	25
Minimum	0
Maximum	25
Sum	423
Count	50

Table 1: Descriptive Statistics of Occupation 4 sample data

[Excel Path: Data->Data Analysis->Descriptive Statistics ]

From Task 2 we have IQR = 17.25, Q3 = 17.25 and Q1 = 0.

Now,

The upper inner fence limit: IFUL = Q3 + 1.5 x IQR = 17.25+1.5*17.25 = 43.125

The lower inner fence limit: IFLL = Q1 - 1.5 x IQR = 0-1.5*17.25 = -25.875

Considering all the measures obtained previously for our sample “occupation 4” data:

The minimum and maximum values in the data are 0 and 25 respectively. Both lie within the limits IFLL and IFUL, so there are no outliers.
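As a check on the fence arithmetic, here is a minimal Python sketch. The function names are hypothetical; the quartiles, minimum and maximum are the values reported above.

```python
def inner_fences(q1: float, q3: float) -> tuple[float, float]:
    """Return (IFLL, IFUL) = (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values, q1, q3):
    """Values lying outside the inner fences."""
    lower, upper = inner_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


lower, upper = inner_fences(0, 17.25)
print(lower, upper)                 # prints: -25.875 43.125
print(outliers([0, 25], 0, 17.25))  # sample minimum and maximum; prints: []
```

Checking only the minimum and maximum is enough here: if neither extreme lies outside the fences, no other observation can.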

An appropriate measure of central tendency is the median, because the skewness of 0.666345 shows that the data are skewed. Here the median = 6.5.

An appropriate measure of dispersion is the interquartile range, again because the data are skewed. Here IQR = 17.25.


The variable under consideration is the number of people in Administration. Here the mean is 8.46, the median is 6.5, the first quartile is 0, the third quartile is 17.25, the standard deviation is approximately 8.4, and the IQR is 17.25.

Here Q3 - median = 17.25-6.5 = 10.75 and median - Q1 = 6.5-0 = 6.5.

Because the upper part of the middle 50% is wider than the lower part, the data are positively skewed, i.e. the longer tail is towards larger values of the variable.

The mean of 8.46 indicates that an LGA picked at random from the sample has, on average, about 8.46 (roughly 8) people in Administration.

The standard deviation of 8.4 is a measure of spread that takes all values of the variable into account; it measures how far the data typically deviate from the mean.

The interquartile range gives the range containing the middle 50% of the values. Here that range is [0, 17.25].

Task 4

From Table 1, the kurtosis is -0.94489, so the data are platykurtic, i.e. the distribution is flatter, with lighter tails, than the normal distribution.

The skewness is 0.666345, so the data are positively skewed.

For a normal distribution, mean = median = mode, but here mean = 8.46, median = 6.5 and mode = 0.

According to these three pieces of evidence, our sample “occupation 4” data has not been obtained from a normally distributed population.

According to the Standard Normal table, P(Z<1.5) = 0.9332, where Z follows the standard normal distribution.

Hence, P(-1.5<Z<1.5) = 2(0.9332-0.5) = 0.8664 (as Z is symmetric about 0).

So the expected count is 50*0.8664 = 43.32, i.e. about 43.

So approximately 43 values out of 50 should lie within 1.5 standard deviations of the mean.
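The table lookup can be cross-checked numerically. The sketch below uses the error function from Python's standard library rather than a printed table, so it agrees with the table values only to about four decimal places; `normal_cdf` is a hypothetical helper name.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Probability of lying within 1.5 standard deviations of the mean.
prob_within = normal_cdf(1.5) - normal_cdf(-1.5)
# Expected number of such observations in a sample of 50.
expected_count = 50 * prob_within

print(round(normal_cdf(1.5), 4), round(prob_within, 4), round(expected_count, 2))
```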

According to the descriptive statistics table,

Mean = 8.46 and standard deviation = 8.415753

The bound for a 1.5 standard deviation spread from the mean is 8.46 ± 1.5*8.415753, i.e. [-4.1636295, 21.0836295]

Going through the data, we observe that 44 observations out of 50 lie within this interval, which approximately matches the result in (b) (a difference of only one observation can be disregarded). On this check the sample behaves as a normal sample would, so the result does not confirm our conclusion in (a).

[Sheldon, Ross (2010). Introductory Statistics, Academic Press,USA.]
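The bound and the manual count can be expressed as a short Python sketch. The mean and standard deviation are those from Table 1; the list of observations is made up for illustration, and `within_bounds` is a hypothetical helper name.

```python
def within_bounds(values, mean, sd, k=1.5):
    """Return (lower, upper, count) for the interval mean ± k * sd."""
    lower, upper = mean - k * sd, mean + k * sd
    count = sum(1 for v in values if lower <= v <= upper)
    return lower, upper, count


# Made-up observations for illustration only (not the assignment data).
observations = [0, 5, 12, 21, 22, 25]
lower, upper, count = within_bounds(observations, 8.46, 8.415753)
print(round(lower, 7), round(upper, 7), count)
```

Applied to the full sorted “occupation 4” column, the same function would return the count that is compared with the expected 43.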

Task 5

Using Excel we find out the following descriptive statistics table of our sample “occupation 4” data.

Here we show only the statistics that are required to compute the confidence interval.

Mean	8.46
Standard Error	1.190167
Standard Deviation	8.415753
Sample Variance	70.8249
Count	50

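Hence:

(i) A point estimate of the mean “Occupation 4” of the population is given by the sample mean, i.e. 8.46.

(ii) A 90% confidence interval estimate of the mean “Occupation 4” of the population is given by

\[ \left[\, \bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},\ \ \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \,\right] \]

where \( \bar{x} = 8.46 \), n = 50, s = 8.415753 and \( \alpha = 0.10 \).

\( t_{\alpha/2,\,n-1} \) = upper \( 100(\alpha/2)\% \) point of a t distribution with (n-1) degrees of freedom.

\( t_{0.05,\,49} \) = 1.676551 [ Excel Formula: =T.INV(0.95,49) ]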


Here, using the standard error 1.190167 from the table above,

Upper CI = 8.46+1.676551*1.190167 = 10.45537601

Lower CI = 8.46-1.676551*1.190167 = 6.464623986

Hence the 90% confidence interval is [6.464623986, 10.45537601], i.e. [6.46, 10.46] (to 2 decimal places)
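The interval arithmetic can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the critical value 1.676551 is copied from the Excel result quoted above rather than computed, because the Python standard library has no inverse t distribution.

```python
import math

n = 50
mean = 8.46
sd = 8.415753
t_crit = 1.676551  # assumption: taken from the text, T.INV(0.95, 49)

standard_error = sd / math.sqrt(n)  # matches the table's Standard Error
margin = t_crit * standard_error    # half-width of the interval
ci_90 = (mean - margin, mean + margin)

print(round(standard_error, 6), round(ci_90[0], 2), round(ci_90[1], 2))
```

The margin of error is the critical value times the standard error, not the critical value alone, which is why the limits are 8.46 ± 1.995 rather than 8.46 ± 1.677.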

(iii) In the context of the variable in this task: if we repeatedly drew samples from the population and constructed an interval in this way each time, about 90% of those intervals would contain the population mean number of people in Administration. Our interval is [6.46, 10.46], i.e. roughly [6, 10].

(b) The 90% confidence interval for the mean number of people in Administration is [6.46, 10.46], i.e. roughly [6, 10]. It does not contain the value 59, so we would not consider the interval estimate obtained in (a) to be satisfactory.

Task 6

(a)

Here we are interested in the values of “V8”, i.e. Income category. According to our data, the coding is as follows: $650-$800=0; $801-$850=1

In this case we are focusing on the $650-$800 income earners.

Using Excel, we find that 24 out of the 50 LGAs are $650-$800 income earners.

We store the data of “V8” in column F.

[ Excel Formula: =COUNTIF(F2:F51,0) ]

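(i)

A point estimate of the proportion of $650-$800 income earners in the population is given by the sample proportion,

\[ \hat{p} = \frac{24}{50} = 0.48 \]

(ii)

A 99% confidence interval estimate of the proportion of $650-$800 income earners in the population is given by

\[ \left[\, \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \,\right] \]

Here \( \hat{p} \) is the observed proportion of $650-$800 income earners in our sample.

\( z_{\alpha/2} \) is the upper \( 100(\alpha/2)\% \) point of a standard normal distribution. n is the sample size, i.e. 50.

For a 99% confidence interval, α = 0.01, n = 50, \( \hat{p} = 0.48 \) and \( z_{0.005} = 2.575829 \).

The value of \( z_{0.005} \) is obtained using the following [ Excel Formula: =NORM.INV(0.995,0,1) ]

So, with standard error \( \sqrt{0.48 \times 0.52 / 50} = 0.070654086 \),

Upper CI = 0.48+2.575829*0.070654086 = 0.661992846

Lower CI = 0.48-2.575829*0.070654086 = 0.298007153

Hence the 99% confidence interval is given by

[0.298007153, 0.661992846], i.e. [0.30, 0.66] (to 2 decimal places)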
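The same calculation can be sketched in Python using `statistics.NormalDist` from the standard library (Python 3.8 or later) for the critical value. The function name `proportion_ci` is hypothetical, and the sketch assumes the usual normal approximation for a sample proportion.

```python
import math
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)             # standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # upper alpha/2 point
    return p_hat - z * se, p_hat + z * se


lower, upper = proportion_ci(24, 50, 0.99)
print(round(lower, 2), round(upper, 2))
```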

(b)

Let the sample proportion of $650-$800 income earners be denoted by \( \hat{p} \).

\( \hat{p} \) approximately follows a normal distribution, with estimated mean \( \hat{p} = 0.48 \) and standard deviation (standard error) \( \sqrt{\hat{p}(1-\hat{p})/n} = 0.070654086 \).

By the Empirical Rule for a normal distribution, about 95% of values lie within 2 standard deviations of the mean.

So the 95% confidence interval (based on the Empirical Rule) of the $650-$800 income earners in the population is [0.48-2*0.070654086, 0.48+2*0.070654086], i.e. [0.338692, 0.621308], i.e. [0.34, 0.62] (to 2 decimal places)

[Akobeng AK. Confidence intervals and p-values in clinical decision making. Acta Paediatr. 2008;97:1004–1007]

(c)

The 99% confidence interval of the $650-$800 income earners in the population is [0.30, 0.66] (to 2 decimal places), whereas the 95% confidence interval (based on the Empirical Rule) is [0.34, 0.62] (to 2 decimal places).

Hence the width of the 95% confidence interval is 0.62-0.34 = 0.28 and the width of the 99% confidence interval is 0.66-0.30 = 0.36. The 99% interval is wider than the 95% interval, as expected: under repeated sampling, 99% intervals must contain the population proportion 99% of the time, whereas 95% intervals need to contain it only 95% of the time. So the higher the confidence level, the wider, and therefore the less precise, the interval. This direction of change is expected because raising the confidence level requires a larger critical value, and the margin of error grows with it.

[Altman D, Bland JM. Confidence intervals illuminate absence of evidence. BMJ. 2004;328:1016–1017]
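The comparison of the two intervals comes down to their critical values, since both use the same standard error. A minimal Python sketch, with the 99% critical value copied from the Excel result quoted earlier and the multiplier 2 taken from the Empirical Rule:

```python
import math

p_hat = 24 / 50
se = math.sqrt(p_hat * (1 - p_hat) / 50)  # shared standard error

z_99 = 2.575829        # from the text: NORM.INV(0.995, 0, 1)
z_95_empirical = 2.0   # Empirical Rule: about 95% within 2 standard errors

width_99 = 2 * z_99 * se            # full width of the 99% interval
width_95 = 2 * z_95_empirical * se  # full width of the Empirical Rule interval

print(round(width_99, 4), round(width_95, 4))
```

The unrounded widths are slightly different from the 0.36 and 0.28 quoted above, which were computed from limits already rounded to 2 decimal places.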

References

Sheldon, Ross (2010). Introductory Statistics. Academic Press, USA.
Hoel, P. G. (1971). Introduction to Mathematical Statistics, Fourth Edition. USA.
Feller, William (2013). An Introduction to Probability Theory and Its Applications, Volume I, Third Edition. U.K.
Du Prel, J.-B., Hommel, G., Röhrig, B., & Blettner, M. (2009). Confidence Interval or P-Value?: Part 4 of a Series on Evaluation of Scientific Publications. Deutsches Ärzteblatt International, 106(19), 335–339.
Akobeng, A. K. (2008). Confidence intervals and p-values in clinical decision making. Acta Paediatr., 97, 1004–1007.
Altman, D., & Bland, J. M. (2004). Confidence intervals illuminate absence of evidence. BMJ, 328, 1016–1017.

Cite This Work

My Assignment Help. (2020). Statistical Analysis Of LGA Data For Australia Population Sample. Retrieved from https://myassignmenthelp.com/free-samples/bb-108-business-statistics/percentile-location-formula.html.