Household Data Analysis Essay: Sampling and Stats.

A. Draw a random sample of two hundred and fifty (250) households as per the sample selection procedure. What sampling method have you used to select your sample data? In your opinion, is this the best method of sampling particularly when one is interested in characteristics like the gender of the household head, education levels etc., why or why not?
B. Compute the descriptive statistics and draw a Box-Whisker plot of Expenditures on the following variables (all series in one graph!):
(i) Alcohol (ii) Meals (iii) Fuel (iv) Phone
C. Use information from the descriptive statistics and the boxplots in part (B) above to present a summary of your findings by contrasting different features of these distributions.

A. Construct a frequency distribution of the expenditures on Utilities, using the following classification (11 classes).
Class 1: 0 - 300; Class 2: 300 - 600; ...; Class 10: 2700 - 3000; Class 11: More than 3000

B. Using the frequency distribution of Utilities above, what percentage of households spend on Utilities:
a. at most $900 per annum,
b. between $1500 and $2700 per annum, and
c. more than $3000 per annum?

A. Find the top 5% value and the bottom 5% value of household’s annual after-tax income (AtaxInc). What do these two values imply?
B. The series OwnHouse represents whether a household owns a house or not. Let X be a random variable such that X = Number of households who own a house.
(i) Is this a quantitative or a qualitative variable?
(ii) What would be the probability distribution of this random variable if we choose randomly (a) Only 1 household? (b) 250 households? Provide any relevant condition(s) to justify your answer.
C. Draw a scatter plot of natural log of total expenditures against natural log of after-tax income, that is, ln(texp) against ln(ataxinc), and compute the coefficient of correlation. Express your finding of the relationship between the two variables.

A. Construct a contingency table between the gender and the level of education.
B. What is the probability that the head of household is a male and his highest level of education is Intermediate?
C. What is the probability that the head of household is a female and has the Bachelor degree?
D. What is the proportion of having the Secondary as the highest degree from among males?
E. Do you think that the events "gender of household head is female" and "having the Master Degree" are independent?

Use of stratified random sampling over simple random sampling for representative sample

In order to analyse the household data, a simple random sample of 250 households was drawn. In simple random sampling every household has the same probability of being selected, so by chance the sample can under- or over-represent key attributes of the population, such as the gender of the household head or the level of education. For characteristics like these, stratified random sampling is recommended because it gives a more representative sample (Flick, 2015). The population is first divided into strata according to the key attribute, and a random sample is then drawn from each stratum in proportion to its share of the population. The resulting sample therefore contains the same proportions of the stratifying attribute as the population. Hence, to obtain a sample that represents the population on these characteristics, stratified random sampling should be used in this case (Hastie, Tibshirani & Friedman, 2011).
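Proportional stratified sampling, as described above, can be sketched in a few lines of Python. This is a hedged illustration, not the procedure used to draw the essay's sample: the function name proportional_stratified_sample and the population of 600 male-headed and 400 female-headed households are hypothetical, and the allocation simply rounds each stratum's share, so in general the total can differ from n by a record or two.

```python
import random


def proportional_stratified_sample(records, stratum_of, n, seed=None):
    """Draw about n records, allocating the sample to each stratum in
    proportion to that stratum's share of the population."""
    rng = random.Random(seed)

    # Group the population by stratum label (e.g. gender of household head).
    strata = {}
    for record in records:
        strata.setdefault(stratum_of(record), []).append(record)

    total = len(records)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can make the total differ slightly from n.
        k = round(n * len(members) / total)
        # Simple random sample without replacement inside the stratum.
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 600 male-headed and 400 female-headed households.
population = [("Male", i) for i in range(600)] + [("Female", i) for i in range(400)]
sample = proportional_stratified_sample(population, lambda r: r[0], 250, seed=1)
males = sum(1 for gender, _ in sample if gender == "Male")
print(len(sample), males)  # 250 150
```

With 60% male-headed households in this made-up population, the stratified sample always contains exactly 150 male-headed households, whereas a simple random sample of 250 would vary around 150 from draw to draw.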

Descriptive statistics and box-whisker plot for variables of interest

The variables of interest are Alcohol, Meals, Fuel and Phone. The descriptive statistics (numerical summary) and the box-and-whisker plot for these variables are shown below:

Box-and-whisker plot of Alcohol, Meals, Fuel and Phone expenditures

For a normal distribution, the mean, median and mode are equal and the distribution is symmetric, with zero skewness. If the measures of central tendency differ noticeably and the data are clearly skewed, the data cannot be treated as normally distributed. The numerical summary tables show that the mean, median and mode differ for all four variables, and each variable has non-zero skewness. Hence, the data for each variable are not normally distributed (Hair et al., 2015).
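The comparison of mean, median, mode and skewness described above can be reproduced outside Excel. The sketch below is a hedged example using Python's standard statistics module; the skewness line uses the adjusted Fisher-Pearson formula that Excel's SKEW function applies, and the spending figures and the name describe are hypothetical, not the essay's data.

```python
import statistics


def describe(values):
    """Mean, median, mode and sample skewness for one expenditure series."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    # Adjusted Fisher-Pearson skewness, the formula used by Excel's SKEW().
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "skewness": skew,
    }


# Hypothetical right-skewed spending figures (not the essay's data).
summary = describe([0, 10, 10, 20, 30, 40, 200])
print(summary["mean"] > summary["median"] > summary["mode"], summary["skewness"] > 0)
```

One large value (200) is enough to push the mean above the median and make the skewness positive, which is the pattern the essay reports for all four expenditure series.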

Further, the high skewness values are consistent with outliers in the data. The box-and-whisker plot confirms that outliers are present at the upper (high-expenditure) end of the data. These extreme values pull the mean upward, so the median is the more suitable measure of central tendency for these variables (Flick, 2015).
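The outliers shown in a box-and-whisker plot are usually the points beyond Tukey's fences, 1.5 times the interquartile range below the first quartile or above the third. The following is a hedged sketch of that rule; it assumes the inclusive quartile definition (the one Excel's QUARTILE.INC uses), and the plotting tool behind the essay's figure may compute quartiles slightly differently. The function name box_plot_summary and the data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Quartiles and Tukey's 1.5 x IQR outlier fences for one series."""
    # method="inclusive" interpolates at position (len - 1) * p, like Excel's QUARTILE.INC.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Points beyond the fences are drawn individually as outliers.
        "outliers": [x for x in values if x < lower_fence or x > upper_fence],
    }


print(box_plot_summary([0, 10, 10, 20, 30, 40, 200]))
```

For this made-up series only the value 200 lies above the upper fence, so it would be plotted as a single outlier at the high end, as in the essay's box plot.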

The variable Utilities is used to prepare the frequency distribution table.

The frequency, relative frequency, cumulative frequency and cumulative relative frequency are computed in Excel and shown below:
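The same four columns can be built with a short Python sketch, offered as an illustration rather than the essay's actual workbook. It assumes each class includes its upper boundary (a value of exactly 300 falls in 0 - 300), which is how Excel's FREQUENCY function treats bins; the essay does not state which convention it used. The names UPPER_BOUNDS, LABELS and frequency_table and the sample values are hypothetical.

```python
from bisect import bisect_left

# Upper bounds 300, 600, ..., 3000 define classes 1 to 10; class 11 is open-ended.
UPPER_BOUNDS = list(range(300, 3001, 300))
LABELS = [f"{u - 300} - {u}" for u in UPPER_BOUNDS] + ["More than 3000"]


def frequency_table(values):
    """Rows of (class, frequency, relative freq., cumulative freq., cumulative relative freq.)."""
    counts = [0] * (len(UPPER_BOUNDS) + 1)
    for x in values:
        # bisect_left picks the first class whose upper bound is >= x, so classes
        # are (lower, upper]; values above 3000 fall into the last class.
        counts[bisect_left(UPPER_BOUNDS, x)] += 1

    n = len(values)
    rows, cumulative = [], 0
    for label, f in zip(LABELS, counts):
        cumulative += f
        rows.append((label, f, f / n, cumulative, cumulative / n))
    return rows


# Hypothetical utility spending values, not the essay's sample.
for label, f, rel, cum, cum_rel in frequency_table([120, 450, 450, 980, 3200]):
    print(f"{label:>15} {f:3d} {rel:7.2%} {cum:3d} {cum_rel:7.2%}")
```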

The percentages of households for the given scenarios are calculated below:

At most $900 per year

Sample size (number of household heads) = 250

Number of household heads spending at most $900 per year on utilities = 0 + 102 = 102


% of household heads spending at most $900 per year on utilities = number of household heads spending at most $900 per year / sample size

= 102 / 250 = 0.408 or 40.8%

Thus, 40.8% of the sampled households spend at most $900 per year on utilities.

Between $1500 and $2700 per year

Sample size (number of household heads) = 250

Number of household heads spending between $1500 and $2700 per year on utilities = 72 + 51 = 123

% of household heads spending between $1500 and $2700 per year on utilities = number of household heads spending between $1500 and $2700 per year / sample size

= 123 / 250 = 0.492 or 49.2%

Thus, 49.2% of the sampled households spend between $1500 and $2700 per year on utilities.

More than $3000 per year

Sample size (number of household heads) = 250

Number of household heads spending more than $3000 per year on utilities = 8

% of household heads spending more than $3000 per year on utilities = number of household heads spending more than $3000 per year / sample size = 8 / 250 = 0.032 or 3.2%

Thus, 3.2% of the sampled households spend more than $3000 per year on utilities.
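The three percentages can be checked with a few lines of Python. The counts are the ones reported in the calculations above (0 + 102, 72 + 51 and 8 out of a sample of 250); the helper name percentage is hypothetical and rounds to one decimal place.

```python
def percentage(count, sample_size):
    """Share of the sample in percent, rounded to one decimal place."""
    return round(100 * count / sample_size, 1)


SAMPLE_SIZE = 250

# Class counts as reported in the utilities calculations.
at_most_900 = percentage(0 + 102, SAMPLE_SIZE)
between_1500_and_2700 = percentage(72 + 51, SAMPLE_SIZE)
more_than_3000 = percentage(8, SAMPLE_SIZE)

print(at_most_900, between_1500_and_2700, more_than_3000)  # 40.8 49.2 3.2
```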

The top and bottom 5% values of annual after-tax income (AtaxInc) are computed in Excel and shown below:

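Top 5% value of annual after-tax income: $143,023.30. This is the 95th percentile: about 95% of the sampled households have an annual after-tax income below $143,023.30, and the top 5% have an income above it.

Bottom 5% value of annual after-tax income: $46,958.00. This is the 5th percentile: about 95% of the sampled households have an annual after-tax income above $46,958.00, and the bottom 5% have an income below it.

Together, the two values imply that the middle 90% of households have an annual after-tax income between $46,958.00 and $143,023.30.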
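The two cut-off values are the 5th and 95th percentiles. A hedged Python sketch is below; it assumes the essay's Excel figures come from PERCENTILE.INC, which, like Python's method="inclusive", interpolates linearly at position (n - 1)p of the sorted data. The function name and the incomes 1 to 21 are hypothetical.

```python
import statistics


def bottom_and_top_5_percent(incomes):
    """Return the 5th and 95th percentiles of after-tax income."""
    # n=20 produces 19 cut points at 5%, 10%, ..., 95%.
    # method="inclusive" interpolates at position (len - 1) * p, as PERCENTILE.INC does.
    cuts = statistics.quantiles(incomes, n=20, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical incomes 1..21 (in $1000s), not the essay's data.
print(bottom_and_top_5_percent(list(range(1, 22))))  # (2.0, 20.0)
```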

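X counts the number of households that own a house, so X is a quantitative (discrete) variable. The underlying OwnHouse attribute (0 = rented/otherwise, 1 = owned) is itself a qualitative, binary characteristic that has simply been coded with numbers (Eriksson & Kovalainen, 2015).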

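When only one household is chosen at random, X can take only two values, 0 or 1, so X follows a Bernoulli distribution with P(X = 1) = p, where p is the proportion of households that own a house. When 250 households are chosen, X can take any integer value from 0 to 250 and follows a binomial distribution, X ~ Binomial(250, p), provided that: (1) each household either owns a house or does not; (2) the selections are independent, which holds when sampling with replacement or, approximately, when the population is large relative to the sample; and (3) the probability p of owning a house is the same for every selected household. Because n = 250 is large, this binomial distribution can be approximated by a normal distribution when np and n(1 - p) are both reasonably large (Hastie, Tibshirani & Friedman, 2011).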
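The Bernoulli and binomial cases can be illustrated with the binomial probability formula P(X = k) = C(n, k) p^k (1 - p)^(n - k), where n = 1 gives the Bernoulli distribution. This is a sketch only: the ownership probability P_OWN = 0.6 is a hypothetical value, since the essay does not report p.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) when X ~ Binomial(n, p); with n = 1 this is the Bernoulli distribution."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


P_OWN = 0.6  # hypothetical ownership probability; the essay does not report p

# One household: only X = 0 or X = 1 is possible.
print(binomial_pmf(1, 1, P_OWN), binomial_pmf(0, 1, P_OWN))

# 250 households: X ranges over 0..250 and its mean should equal n * p = 150.
mean_250 = sum(k * binomial_pmf(k, 250, P_OWN) for k in range(251))
print(round(mean_250, 6))
```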

To examine the relationship between after-tax income and total expenditure, a scatter plot of ln(texp) against ln(ataxinc) is drawn and the coefficient of correlation is computed.

The correlation coefficient is greater than 0.5, which indicates a positive relationship of moderate to high strength between the two variables: households with higher after-tax income tend to have higher total expenditure. Because both variables are in natural logs, the relationship concerns relative (percentage) differences rather than dollar amounts. Correlation alone does not show that higher income causes higher expenditure. Some observations deviate from the overall pattern, and a few outliers are visible (Eriksson & Kovalainen, 2015).
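The correlation in the log-log scatter plot is the ordinary Pearson coefficient computed on the logged values. The sketch below is a hedged illustration that computes it directly, which should match what Excel's CORREL returns; the income and spending figures and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_log_correlation(ataxinc, texp):
    # Natural logs are only defined for strictly positive incomes and expenditures.
    return pearson_r([math.log(v) for v in ataxinc], [math.log(v) for v in texp])


income = [40000, 60000, 90000, 150000]    # hypothetical after-tax incomes
spending = [30000, 41000, 52000, 70000]   # hypothetical total expenditures
print(round(log_log_correlation(income, spending), 3))
```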

Contingency table between highest level of education and gender of household head

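Probability that the household head is male and his highest level of education is Intermediate:

P(Male and Intermediate) = 32 / 250 = 0.128

Probability that the household head is female and has a Bachelor degree:

P(Female and Bachelor) = 26 / 250 = 0.104

Proportion of male household heads whose highest level of education is Secondary (131 household heads are male):

P(Secondary | Male) = 14 / 131 ≈ 0.107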
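Joint probabilities divide a cell of the contingency table by the grand total, while a conditional probability divides it by the row total for the given gender. A minimal Python sketch of both follows; the records are a small hypothetical set, not the essay's data, and the function names are illustrative.

```python
from collections import Counter


def contingency_table(records):
    """Count households by (gender, highest education) pair."""
    return Counter(records)


def joint_probability(table, gender, education):
    # P(gender and education) = cell count / total households
    return table[(gender, education)] / sum(table.values())


def conditional_probability(table, education, given_gender):
    # P(education | gender) = cell count / number of households with that gender
    gender_total = sum(c for (g, _), c in table.items() if g == given_gender)
    return table[(given_gender, education)] / gender_total


# Hypothetical records, not the essay's 250 households.
records = ([("Male", "Intermediate")] * 3 + [("Male", "Secondary")]
           + [("Female", "Bachelor")] * 2 + [("Female", "Master")] * 2)
table = contingency_table(records)
print(joint_probability(table, "Male", "Intermediate"))     # 0.375
print(conditional_probability(table, "Secondary", "Male"))  # 0.25
```

Applied to the essay's table, the same formulas give 32 / 250, 26 / 250 and 14 / 131.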

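The events "gender of household head is female" and "having the Master Degree" are independent if P(Female and Master) = P(Female) × P(Master) (Fehr & Grossman, 2013).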

From the contingency table, this condition is not satisfied; hence, the events are not independent.
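The independence condition can be checked by computing both sides from a contingency table. The following is a hedged sketch: both tables use invented counts and a hypothetical "Other" education category, and math.isclose only absorbs floating-point error. With real sample data the two sides rarely match exactly, and the usual formal approach is a chi-square test of independence.

```python
import math


def independence_check(table, gender, education):
    """Compare P(gender and education) with P(gender) * P(education).

    table maps (gender, education) pairs to household counts.
    """
    total = sum(table.values())
    p_joint = table.get((gender, education), 0) / total
    p_gender = sum(c for (g, _), c in table.items() if g == gender) / total
    p_education = sum(c for (_, e), c in table.items() if e == education) / total
    product = p_gender * p_education
    # isclose only tolerates rounding error; it is not a statistical test.
    return p_joint, product, math.isclose(p_joint, product)


# Hypothetical tables with made-up counts.
independent_table = {("Male", "Master"): 2, ("Male", "Other"): 8,
                     ("Female", "Master"): 1, ("Female", "Other"): 4}
dependent_table = {("Male", "Master"): 5, ("Male", "Other"): 5,
                   ("Female", "Master"): 0, ("Female", "Other"): 5}

print(independence_check(independent_table, "Female", "Master"))
print(independence_check(dependent_table, "Female", "Master"))
```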

References

Eriksson, P. & Kovalainen, A. (2015) Quantitative methods in business research (3rd ed.). London: Sage Publications.

Fehr, F. H. & Grossman, G. (2013) An introduction to sets, probability and hypothesis testing (3rd ed.). Ohio: Heath.

Flick, U. (2015) Introducing research methodology: A beginner's guide to doing a research project (4th ed.). New York: Sage Publications.

Hair, J. F., Wolfinbarger, M., Money, A. H., Samouel, P. & Page, M. J. (2015) Essentials of business research methods (2nd ed.). New York: Routledge.

Hastie, T., Tibshirani, R. & Friedman, J. (2011) The elements of statistical learning (2nd ed.). New York: Springer.

Cite This Work

My Assignment Help. (2020). Analyzing Household Data: Sampling Technique, Descriptive Statistics, And Essay.. Retrieved from https://myassignmenthelp.com/free-samples/bus5sbf-statistics-for-business-and-finance-for-analyzing-household-data.
