Regression Analysis and Hypothesis Testing

Question:

Independent variables in a regression equation should NOT be mutually exclusive.
The p-value is the smallest level of α at which H₀ can be rejected.
The usual objective of regression analysis is to predict/estimate the value of one variable when the value of another variable is known.
Correlation analysis is concerned with measuring the strength of the relationship between two variables.
In the least squares model, the explained sum of squares is always smaller than the regression sum of squares.
The sample correlation coefficient and the sample slope will always have the same sign.
An important relationship in regression analysis is = .
If in a regression analysis the explained sum of squares is 75 and the unexplained sum of squares is 25, r² = 0.33.
When small values of Y tend to be paired with small values of X, the relationship between X and Y is said to be inverse.
The probability that the test statistic will fall in the critical region, given that H0 is true, represents the probability of making a type II error.
When we reject a true null hypothesis, we commit a Type I error.

Part B: Multiple Choice (12–21)

A computer statistical package has included the following quantities in its output: SST = 50, SSR = 35, and SSE = 15. How much of the variation in y is explained by the regression equation?

a. 49% b. 70% c. 35% d. 15%

In testing the significance of b₁, the null hypothesis is generally that

a. β = b₁ b. β ≠ 0 c. β = 0 d. β = r

Testing whether the slope of the population regression line could be zero is equivalent to testing whether the population ______ could be zero.

standard error of estimate y-intercept
prediction interval coefficient of correlation

A multiple regression equation includes 4 independent variables, and the coefficient of multiple determination is 0.64. How much of the variation in y is explained by the regression equation?

a. 80% b. 16% c. 32% d. 64%

A multiple regression analysis results in the following values for the sum-of-squares terms: SST = 50.0, SSR = 35.0, and SSE = 15.0. The coefficient of multiple determination will be

a. R² = 0.35 b. R² = 0.30 c. R² = 0.70 d. R² = 0.50

In testing the overall significance of a multiple regression equation in which there are three independent variables, the null hypothesis is

In a multiple regression analysis involving 25 data points and 4 independent variables, the sum-of-squares terms are calculated as SSR = 120, SSE = 80, and SST = 200. In testing the overall significance of the regression equation, the calculated value of the test statistic will be

F = 1.5 F = 5.5
F = 2.5 F = 7.5

For a set of 15 data points, a computer statistical package has found the multiple regression equation to be ŷ = -23 + 20x₁ + 5x₂ + 25x₃ and has listed the t-ratio for testing the significance of each partial regression coefficient. Using the 0.05 level in testing whether b₁ = 20 differs significantly from zero, the critical t values will be

t = -1.960 and t= +1.960
t = -2.132 and t = +2.132
t = -2.201 and t = +2.201
t = -1.796 and t = +1.796

Computer analyses typically provide a p-value for each partial regression coefficient. In the case of b₁, this is the probability that

β₁ = 0
b₁ = β₁
the absolute value of b₁ could be this large if β₁ = 0
the absolute value of b₁ could be this large if β₁ ≠ 1

In the multiple regression equation ŷ = 20,000 + 0.05x₁ + 4500x₂, ŷ is the estimated household income, x₁ is the amount of life insurance held by the head of the household, and x₂ is a dummy variable (x₂ = 1 if the family owns mutual funds, 0 if it doesn’t). The interpretation of b₂ = 4500 is that

owning mutual funds increases the estimated income by $4500
the average value of a mutual funds portfolio is $4500
45% of the persons in the sample own mutual funds
the sample size must have been at least n = 4500

Part C: Please fill in the blank, circle your decision or answer the following questions (22-24).

State whether you would reject or fail to reject the null hypothesis in each of the following cases (two-tailed): Make a decision.

P = 0.12 ;	Decision
	Reject / Fail to reject

P = 0.03 ;	Decision
	Reject / Fail to reject

P = 0.001 ;	Decision
	Reject / Fail to reject

Given the following, complete the ANOVA table and make the correct inference. Using F-value to make a decision.

Source	SS	df	MS	F
Treatments		3
Error	88.8
Total	435	19

	ANSWER
a) What is the hypothesis being tested in this problem?	H₀: All treatment means are equal. H_a: At least one treatment mean differs from the others.
b) In the above ANOVA table, is the factor significant at the 5% level?
c) What is the number of observations?

State whether H₀ should be accepted or rejected at significance level α, given the following (fill in the blank and circle your decision):
a) F = 2.34; k = 3, n = 14

Computed F	Critical F	Decision
2.34	________	Reject / Fail to reject

b) F = 2.52; k = 5, n = 25

Computed F	Critical F	Decision
2.52	________	Reject / Fail to reject

c) F = 4.29; k = 4, n = 28

Computed F	Critical F	Decision
4.29	________	Reject / Fail to reject

Part D: Must show all your work step by step in order to receive the full credit; Excel is not allowed. (25-35)

Consider the following hypothesis test.

H_o: µ = 17

H_a: µ ≠ 17

A sample of 25 gives a sample mean of 14.2 and sample variance of 25.

a)	At 5% should the null be rejected?	b)	Compute the value of the test statistic
c)	What is the p-value?	d)	What is your conclusion? Explain.

Consider the following hypothesis test

H_o: µ ≥ 10

H_a: µ < 10

A sample of 50 provides a sample mean of 9.46 and sample variance of 4.

a)	At 5% should the null be rejected?	b)	Compute the value of the test statistic

c)	What is the p-value?	d)	What is your conclusion?

Use problem 13 on page 9-28 to answer the following questions.

A bath soap manufacturing process is designed to produce a mean of 120 bars of soap per batch. Quantities over or under the standard are undesirable. A sample of ten batches shows the following numbers of bars of soap.

108 118 120 122 119 113 124 122 120 123

Using a 0.05 level of significance, test to see whether the sample results indicate that the manufacturing process is functioning properly.

a) What is the sample mean

b) What is the sample standard deviation

c) Use Z or T test? And why?

d) What is your hypothesis test

e) At α = 0.05, what is the rejection rule?

f) Compute the value of the test statistic.

g) What is the p-value?

h) What is your conclusion?

Fill in the table below and answer the given questions.

Months on job (x)	Monthly sales (y) thousands of dollars	X²	Y²	XY
1	0.80
2	2.40
4	7.00
5	3.70
8	11.30
9	12.00
12	15.00
Total	52.2

a)Find	b)Find
c) Write the equation and interpret	d) Compute R² and explain how it differs from adjusted R².
e) Compute the estimated variance of the regression.	f) Find
g) Compute the estimated variance of b₁	h) Compute the standard error of b₁

Please fill in the computer printout and answer the following questions. Given that

SUMMARY OUTPUT
Regression Statistics
Multiple R	0.9037
R Square	______
Adjusted R Square	______
Standard Error	______
Observations	5
ANOVA
	df	SS	MS	F	Significance F
Regression	1	4.9	_____	_____	0.03535
Residual	3	1.1	_____
Total	_____	_____

	Coefficients	Standard Error	t Stat	P-value	Lower95%	Upper 95%	Lower 95.0%	Upper 95.0%
Intercept	-0.1	______	-0.15746	0.88488	-2.12112	1.92112	-2.12112	1.92112
X1	0.7	______	3.65563	0.03535	0.09061	1.30939	______	______

a) What percent of the variation is explained by the regression equation?

b) What is the standard error of regression?

c) What is the critical value of the F-statistic?

d) What sample size is used in the print out

The following regression equation was obtained using the five independent variables. Given that

a) What percent of the variation is explained by the regression equation?	b) What is the standard error of regression?
c) Write the estimated equation.	d) What is the critical value of the F-statistic?
e) What sample size is used in the print out?	f) What is the variance of the slope coefficient of income?
g) Assuming that you are using a two-tailed test make a decision using the computed P-value.

A large hotel purchased 200 new color televisions several months ago: 80 of one brand and 60 of each of two other brands. Records were kept for each set as to how many service calls were required, resulting in the table that follows.

Number of Service Calls	Sony	Toshiba	Sanyo	Total
None	8	15	18	41
One	30	55	12	97
Two or more	22	10	30	62
Total	60	80	60	200

Assume the TV sets are random samples of their brands. With a 5% risk of Type I error, test for an association between TV brand and the number of service calls.

Is the χ² value significant at the 5% level of significance? Write the conclusion for this question.

A metropolitan bus system samples rider counts on one of its express commuter routes for a week. Use the following data to establish whether ridership is evenly balanced by day of the week. Let α = 0.05.

Day	Monday	Tuesday	Wednesday	Thursday	Friday
Rider Count	10	34	21	57	44

Is the χ² value significant at the 5% level of significance? Write the conclusion for this question.

Explain the major difference between chi-square and ANOVA in terms of type of data (no calculation required).
Explain why a 95% confidence interval estimate for the mean value of y at a particular x is narrower than a 95% confidence interval for an individual y value at the same value of x.
If R² = 0.95, n = 11, and SST = 100, what is ? (Simple regression model)
You are given the following information from fitting a multiple regression with three variables to 30 sample data points:

Answer:

PART A: True or False

1) F

2) T

3) T

4) T

5) F

6) T

7) T

8) F

9) F

10) F

11) T

PART B: Multiple Choice

12) Option B

13) Option C

14) Option D

15) Option D

16) Option C

17) Option A

18) Option D

19) Option C

20) Option C

21) Option A
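
The arithmetic behind answers 12, 16, 18 and 19 can be checked with a short sketch. It assumes the usual definitions R² = SSR/SST and F = (SSR/k) / (SSE/(n − k − 1)); the function name `overall_f` is a hypothetical helper, not part of the original key.

```python
# Sums of squares quoted in questions 12 and 16.
sst, ssr, sse = 50.0, 35.0, 15.0
r_squared = ssr / sst  # share of the variation in y explained by the regression


def overall_f(ssr, sse, n, k):
    """F statistic for overall significance (n observations, k predictors)."""
    msr = ssr / k            # mean square due to regression
    mse = sse / (n - k - 1)  # mean square error
    return msr / mse


# Question 18: 25 data points, 4 independent variables.
f_q18 = overall_f(ssr=120.0, sse=80.0, n=25, k=4)

# Question 19: 15 data points, 3 slope coefficients in the fitted equation.
df_q19 = 15 - 3 - 1  # error degrees of freedom used to look up the critical t

print(f"R^2 = {r_squared:.2f}")
print(f"F   = {f_q18:.1f}")
print(f"df  = {df_q19}")
```

With 11 error degrees of freedom, the two-tailed 0.05 critical value read from a t table is ±2.201, which matches the option chosen for question 19.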

PART C: Fill in the blanks

22) a) Fail to reject

b) Reject
c) Reject

23) a) H₀: All treatment means are equal.

H_a: At least one treatment mean differs from the others.

b) Yes, the factor is significant: F = 115.4/5.55 = 20.79 exceeds the critical value F(0.05; 3, 16) = 3.24.
c) 20

24) a) Fail to reject

b) Fail to reject
c) Reject
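
The ANOVA table in Part C can be completed from the three given entries, because sums of squares and degrees of freedom are additive. The sketch below is one way to do the bookkeeping; the critical value 3.24 is an assumed table lookup for F(0.05; 3, 16), not something computed by the code.

```python
# Entries given in the ANOVA table of the question.
ss_total, ss_error = 435.0, 88.8
df_treat, df_total = 3, 19

ss_treat = ss_total - ss_error  # treatment sum of squares
df_error = df_total - df_treat  # error degrees of freedom
ms_treat = ss_treat / df_treat  # mean square, treatments
ms_error = ss_error / df_error  # mean square, error
f_stat = ms_treat / ms_error
n_obs = df_total + 1            # total df is always n - 1

F_CRIT = 3.24  # assumption: F table value for alpha = 0.05, df = (3, 16)

print(f"Treatments  SS={ss_treat:.1f}  df={df_treat}  MS={ms_treat:.2f}  F={f_stat:.2f}")
print(f"Error       SS={ss_error:.1f}  df={df_error}  MS={ms_error:.2f}")
print(f"Total       SS={ss_total:.1f}  df={df_total}")
print("Reject H0" if f_stat > F_CRIT else "Fail to reject H0", f"(n = {n_obs})")
```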

Question 25

H_o: µ = 17

H_a: µ ≠ 17

Sample size = 25

Sample mean = 14.2

Sample variance = 25

Standard deviation = sqrt(25) = 5

a) At 5% should the null be rejected?

H₀ is rejected when the p-value is lower than the significance level (5%). Here the p-value of 0.0099 is below 0.05, so the null hypothesis is rejected.

b) Compute the value of the test statistic.

t = (x̄ − µ₀) / (s/√n) = (14.2 − 17) / (5/√25) = −2.8

c) What is the p-value?

The p-value (for 24 degrees of freedom and t = −2.8, two-tailed) = 0.0099

d) What is your conclusion? Explain.

The p-value is lower than the significance level (0.05), so the null hypothesis is rejected in favor of the alternative hypothesis. Hence, it can be said that the population mean differs from 17.

Question 26

H_o: µ >= 10

H_a: µ < 10

Sample size = 50

Sample mean = 9.46

Sample variance = 4

Standard deviation = sqrt(4) = 2

a) At 5% should the null be rejected?

H₀ is rejected when the p-value is lower than the significance level (5%). Here the p-value of 0.031 is below 0.05, so the null hypothesis is rejected.

b) Compute the value of the test statistic.

t = (x̄ − µ₀) / (s/√n) = (9.46 − 10) / (2/√50) = −1.91

c) What is the p-value?

The p-value (for 49 degrees of freedom and t = −1.91, one-tailed) = 0.031

d) What is your conclusion?

The p-value is lower than the significance level (0.05), so the null hypothesis is rejected in favor of the alternative hypothesis. Hence, it can be said that the population mean is less than 10.
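
The t values quoted for Questions 25 and 26 (−2.8 and −1.91) follow from the one-sample formula t = (x̄ − µ₀) / (s/√n). A minimal sketch, assuming the sample variance is the usual n − 1 version; `t_statistic` is a hypothetical helper name.

```python
import math


def t_statistic(sample_mean, mu0, sample_var, n):
    """One-sample t statistic from summary data."""
    standard_error = math.sqrt(sample_var / n)  # s / sqrt(n)
    return (sample_mean - mu0) / standard_error


# Question 25: n = 25, sample mean 14.2, sample variance 25, H0: mu = 17.
t_q25 = t_statistic(sample_mean=14.2, mu0=17, sample_var=25, n=25)

# Question 26: n = 50, sample mean 9.46, sample variance 4, H0: mu >= 10.
t_q26 = t_statistic(sample_mean=9.46, mu0=10, sample_var=4, n=50)

print(f"Question 25: t = {t_q25:.2f} with {25 - 1} degrees of freedom")
print(f"Question 26: t = {t_q26:.2f} with {50 - 1} degrees of freedom")
```

The p-values themselves still come from a t table or statistical software; the sketch only reproduces the test statistics.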

Question 27

What is the sample mean

Sample mean = (108+118+120+122+119+113+124+122+120+123)/10 = 1189/10 = 118.9

What is the sample standard deviation

Standard deviation = sqrt(Σ(x − x̄)²/(n − 1)) = sqrt(218.9/9) = 4.93

Use Z or T test? And why?

Population standard deviation is unknown and also, the sample size is lower than 30 and hence, t test would be taken into consideration.

What is your hypothesis test

Null hypothesis H₀: µ = 120

Alternative hypothesis H_a: µ ≠ 120

At α = 0.05, what is the rejection rule?

At significance level 0.05, the null hypothesis would be rejected when p value is lower than 0.05.

Compute the value of the test statistic.

t = (x̄ − µ₀) / (s/√n) = (118.9 − 120) / (4.93/√10) = −0.70

What is the p-value?

The p-value (for 10 − 1 = 9 degrees of freedom and t = −0.70, two-tailed) = 0.5016

What is your conclusion?

The p-value is higher than the significance level (0.05), so the null hypothesis is not rejected. Hence, it cannot be said that the mean number of bars differs from 120.
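
The summary figures for Question 27 can be reproduced directly from the ten batch counts. This is a sketch using only the Python standard library; the critical value 2.262 is an assumed t-table lookup for a two-tailed test at α = 0.05 with 9 degrees of freedom.

```python
import math
import statistics

bars = [108, 118, 120, 122, 119, 113, 124, 122, 120, 123]
mu0 = 120  # target mean under H0

n = len(bars)
mean = statistics.mean(bars)           # sample mean
s = statistics.stdev(bars)             # sample standard deviation (n - 1 divisor)
t = (mean - mu0) / (s / math.sqrt(n))  # one-sample t statistic

T_CRIT = 2.262  # assumption: t table value for alpha/2 = 0.025, df = 9

print(f"mean = {mean:.1f}, s = {s:.2f}, t = {t:.2f}")
print("Reject H0" if abs(t) > T_CRIT else "Fail to reject H0")
```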

Question 28

a) Find	b) Find

c) Write the equation and interpret.

y = 1.3152x − 0.2462. Each additional month on the job is associated with an estimated increase of about 1.3152 thousand dollars in monthly sales.

d) Compute R² and explain how it differs from adjusted R².

e) Compute the estimated variance of the regression.

Estimated variance of the regression (MSE) = 2.487

f) Find

g) Compute the estimated variance of b₁.

Variance of b₁ = MSE / Sxx, where Sxx = Σ(x − x̄)² = 94.86

Variance of b₁ = 2.487/94.86 = 0.026

h) Compute the standard error of b₁.

Standard error of b₁ = sqrt(2.487/94.86) = 0.162
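
The figures for Question 28 can be reproduced from the data table with the standard least-squares formulas b₁ = Sxy/Sxx and b₀ = ȳ − b₁x̄. The sketch below is one way to organize the computation; the variable names are illustrative.

```python
import math

x = [1, 2, 4, 5, 8, 9, 12]                           # months on job
y = [0.80, 2.40, 7.00, 3.70, 11.30, 12.00, 15.00]    # monthly sales, $ thousands

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx           # slope
b0 = y_bar - b1 * x_bar  # intercept

sse = syy - b1 * sxy     # unexplained sum of squares
mse = sse / (n - 2)      # estimated variance of the regression
r2 = 1 - sse / syy       # coefficient of determination

var_b1 = mse / sxx       # estimated variance of the slope
se_b1 = math.sqrt(var_b1)

print(f"y = {b1:.4f}x + ({b0:.4f})")
print(f"Sxx = {sxx:.2f}, MSE = {mse:.3f}, R^2 = {r2:.4f}")
print(f"Var(b1) = {var_b1:.3f}, SE(b1) = {se_b1:.3f}")
```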

Question 29

a) What percent of the variation is explained by the regression equation?

R^2 = SSR/SST = 4.9/6.0 = 0.8167

Required percentage = 81.67%

b) What is the standard error of regression?

Standard error = sqrt(MSE) = sqrt(SSE/(n − 2)) = sqrt(1.1/3) = 0.6055

c) What is the critical value of the F-statistic?

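Computed F = MSR/MSE = 4.9/0.3667 = 13.36; at the 5% level with (1, 3) degrees of freedom, the critical value of F is 10.13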

d) What sample size is used in the print out?

Sample size = 5
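
The blanks in the Question 29 printout can be filled from the entries that are given (SSR = 4.9, SSE = 1.1, five observations, one predictor, plus each coefficient and its t statistic). The sketch below assumes the standard definitions; in particular it takes the standard error of regression to be sqrt(MSE) and recovers each coefficient's standard error as coefficient / t.

```python
import math

ssr, sse = 4.9, 1.1   # regression and residual sums of squares
n, k = 5, 1           # observations and predictors

sst = ssr + sse                                 # total sum of squares
r2 = ssr / sst                                  # R Square
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # Adjusted R Square
msr = ssr / k                                   # regression mean square
mse = sse / (n - k - 1)                         # residual mean square
std_err = math.sqrt(mse)                        # standard error of regression
f_stat = msr / mse                              # F

# Standard error of each coefficient = coefficient / t statistic.
se_intercept = -0.1 / -0.15746
se_x1 = 0.7 / 3.65563

print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}, standard error = {std_err:.4f}")
print(f"MSR = {msr:.4f}, MSE = {mse:.4f}, F = {f_stat:.2f}, total df = {n - 1}")
print(f"SE(intercept) = {se_intercept:.4f}, SE(X1) = {se_x1:.4f}")
```

As a consistency check, the square root of R² is 0.9037, the Multiple R shown in the printout, and F equals the square of the slope's t statistic.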

Question 30

a) What percent of the variation is explained by the regression equation?

99.4% of variation is explained by the regression equation.

b) What is the standard error of regression?

Standard error of regression is 1.507.

c) Write the estimated equation.

Sales = -19.7 - 0.00063 outlets + 1.74 cars + 0.410 income + 2.04 age - 0.034 bosses

d) What is the critical value of the F-statistic?

The critical value is 6.256, based on the given significance level and the degrees of freedom.

e) What sample size is used in the print out?

Sample size = 10

f) What is the variance of the slope coefficient of income?

Variance = (standard error)² = (0.04385)² ≈ 0.00192

g) Assuming that you are using a two-tailed test make a decision using the computed P-value.

The null hypothesis would be rejected since the p-value is approximately zero, which is lower than the significance level. As a result, the alternative hypothesis would be accepted.

Question 31

Null and alternative hypothesis

H_o: TV brand and number of service calls are independent.

H_a: TV brand and number of service calls are dependent.

χ² = Σ(O − E)²/E = 1.5033 + 0.1195 + 2.6415 + 0.0278 + 6.7639 + 10.0485 + 0.6215 + 8.8323 + 6.9871 = 37.55

Degrees of freedom = (3 − 1)(3 − 1) = 4

The p-value for χ² = 37.55 with 4 degrees of freedom is less than 0.001

Significance level = 5%

It can be seen from the above that p value is lower than level of significance and hence, sufficient evidence is present to reject the null hypothesis and to accept the alternative hypothesis. Hence, it can be concluded that TV brand and number of service calls are dependent.
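
The nine terms of the chi-square sum in Question 31 come from comparing each observed count with its expected count, (row total × column total) / grand total. A minimal sketch of that calculation; the critical value 9.488 is an assumed table lookup for χ²(0.05; 4).

```python
# Rows: None, One, Two or more service calls. Columns: Sony, Toshiba, Sanyo.
observed = [
    [8, 15, 18],
    [30, 55, 12],
    [22, 10, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for row, r_tot in zip(observed, row_totals):
    for obs, c_tot in zip(row, col_totals):
        expected = r_tot * c_tot / grand_total  # expected count under independence
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

CHI2_CRIT = 9.488  # assumption: chi-square table value for alpha = 0.05, df = 4

print(f"chi-square = {chi2:.2f} with {df} degrees of freedom")
print("Reject H0" if chi2 > CHI2_CRIT else "Fail to reject H0")
```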

Question 32

Null and alternative hypothesis

H_o: Ridership is equally balanced.

H_a: Ridership is not equally balanced.

Expected frequency = (10+34+21+57+44)/5 = 33.2

χ² = Σ(O − E)²/E = 1370.8/33.2 = 41.2892

Degree of freedom = 5-1 = 4

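The p-value for χ² = 41.29 with 4 degrees of freedom is less than 0.001

Significance level = 5%

The p-value is lower than the level of significance, so the null hypothesis is rejected. Hence, it can be concluded that ridership is not evenly balanced by day of the week.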
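
The goodness-of-fit statistic for Question 32 uses the same expected count for every day, since H₀ says ridership is evenly balanced. A short sketch of the calculation; the critical value 9.488 is an assumed table lookup for χ²(0.05; 4).

```python
riders = {"Monday": 10, "Tuesday": 34, "Wednesday": 21, "Thursday": 57, "Friday": 44}

# Under H0 every day has the same expected count.
expected = sum(riders.values()) / len(riders)

chi2 = sum((count - expected) ** 2 / expected for count in riders.values())
df = len(riders) - 1

CHI2_CRIT = 9.488  # assumption: chi-square table value for alpha = 0.05, df = 4

print(f"expected per day = {expected:.1f}")
print(f"chi-square = {chi2:.4f} with {df} degrees of freedom")
print("Reject H0" if chi2 > CHI2_CRIT else "Fail to reject H0")
```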

b) The major difference between chi-square and ANOVA is that the chi-square is used when the data is of categorical type and non-numerical in nature. On the contrary, ANOVA is used when the underlying data is numerical with the level of measurement being interval or ratio.

Question 33

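When the mean value of y is estimated at a given x₀, the only uncertainty comes from estimating the regression line, so the standard error is s_e × sqrt(1/n + (x₀ − x̄)²/Sxx), where s_e is the standard error of regression.

Thus, the confidence interval is given by (ŷ – margin of error, ŷ + margin of error)

Thus, the width of the confidence interval is 2*margin of error = 2*critical value * s_e × sqrt(1/n + (x₀ − x̄)²/Sxx)

When an individual value of y is estimated at the same x₀, the observation also scatters around the line with standard deviation s_e, so the standard error becomes s_e × sqrt(1 + 1/n + (x₀ − x̄)²/Sxx)

Thus, the width of the interval for an individual y is 2*critical value * s_e × sqrt(1 + 1/n + (x₀ − x̄)²/Sxx)

As a result, the interval for an individual y is always wider than the interval for the mean value of y at the same x, which supports the assertion in the question.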
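
The width comparison can also be checked numerically. In the standard simple-regression formulas, both intervals share the same critical value and the same s_e, so their widths differ only through the factor sqrt(1/n + (x₀ − x̄)²/Sxx) for the mean of y versus sqrt(1 + 1/n + (x₀ − x̄)²/Sxx) for an individual y. The sketch below reuses the x values from Question 28 purely as an illustration; the point x₀ = 6 is a hypothetical choice.

```python
import math

x = [1, 2, 4, 5, 8, 9, 12]  # months on job, reused from Question 28
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)


def interval_factors(x0):
    """Return (mean-response factor, individual-response factor) at x0."""
    core = 1 / n + (x0 - x_bar) ** 2 / sxx
    return math.sqrt(core), math.sqrt(1 + core)


mean_factor, indiv_factor = interval_factors(6)  # x0 = 6 is illustrative only

print(f"mean of y:    half-width = t * s_e * {mean_factor:.3f}")
print(f"individual y: half-width = t * s_e * {indiv_factor:.3f}")
```

Because of the extra 1 under the square root, the second factor exceeds the first at every x₀, so the interval for an individual y is always the wider one.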

Question 34

R^2 =0.95

N = 11

SST= 100

Now,

Question 35

Null and alternative hypothesis

H_o: β = 0 (the population slope coefficient is zero)

H_a: β ≠ 0

Test stat (t value) = (25.2-0)/15 =1.68

Critical value of t = 2.05

It can be seen that t stat is lower than critical value of t and hence, cannot reject null hypothesis. Therefore, it can be concluded that slope is insignificant and can be assumed to be zero at 5% significance level.
