Exercises in Statistics and Regression Analysis

1. Suppose that a rare disease has an incidence of 1 in 1000, assuming that all members of a population are equally likely to become infected and that their probabilities of infection are independent. Find the probability of k cases in a population of n = 100,000 for k = 0, 1, 2.
(Hint: recall the example in lecture.)

2. Suppose that the lifetime of an electrical component follows an exponential distribution with λ = 0.1.
• Find the probability that the lifetime is less than 10.
• Find the probability that the lifetime is between 5 and 15.

3. Let X be a Gaussian random variable with µ = 5 and σ = 10. Find:
• Pr(X > 10).
• Pr(−20 < X < 15).
• the value of x such that Pr(X > x) = 0.95.

4. Suppose that N, the number of insurance claims filed per year, follows a Poisson distribution with E(N) = 10,000. Use the Central Limit Theorem and the Gaussian approximation to the Poisson to approximate Pr(N > 10,200). Check your answer with the exact result obtained using MATLAB.


When two six-sided dice are thrown sequentially, the sample space is the following.

S = {(1,1)(1,2)(1,3)(1,4)(1,5)(1,6)(2,1)(2,2)(2,3)(2,4)(2,5)(2,6)(3,1)(3,2)(3,3)(3,4)(3,5)(3,6)(4,1)(4,2)(4,3)(4,4)(4,5)(4,6)(5,1)(5,2)(5,3)(5,4)(5,5)(5,6)(6,1)(6,2)(6,3)(6,4)(6,5)(6,6)}

b) Event A = the sum of the two values is at least 5 = {(1,4)(1,5)(1,6)(2,3)(2,4)(2,5)(2,6)(3,2)(3,3)(3,4)(3,5)(3,6)(4,1)(4,2)(4,3)(4,4)(4,5)(4,6)(5,1)(5,2)(5,3)(5,4)(5,5)(5,6)(6,1)(6,2)(6,3)(6,4)(6,5)(6,6)}

Event B = the value of the first die is higher than the value of the second = {(2,1)(3,1)(3,2)(4,1)(4,2)(4,3)(5,1)(5,2)(5,3)(5,4)(6,1)(6,2)(6,3)(6,4)(6,5)}

Event C = The first value is 4 = {(4,1)(4,2)(4,3)(4,4)(4,5)(4,6)}

c) Pr(A) = n(A)/n(S) = (number of elements in event A)/(total elements in sample space) = 30/36 = 5/6.

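Pr(B | the first die was a 3) = P(B ∩ first die 3)/P(first die 3) = (2/36)/(6/36) = 1/3, since the only outcomes with a first die of 3 and a lower second die are (3,1) and (3,2).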

Pr(C) = n(C)/n(S) = 6/36 = 1/6.
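
The counting above can be checked by brute-force enumeration. The following is a minimal sketch in base MATLAB; the variable names (d1, d2, first3 and so on) are illustrative choices, not taken from the original solution.

```matlab
% Enumerate all 36 ordered outcomes of two six-sided dice.
[d1, d2] = ndgrid(1:6, 1:6);   % d1 = first die, d2 = second die
d1 = d1(:);
d2 = d2(:);

A = (d1 + d2) >= 5;            % event A: sum of the two values is at least 5
B = d1 > d2;                   % event B: first die higher than the second
C = d1 == 4;                   % event C: first value is 4
first3 = d1 == 3;              % conditioning event: first die shows 3

prA = mean(A);                 % fraction of outcomes in A
prC = mean(C);                 % fraction of outcomes in C
% Conditional probability as a ratio of counts within the conditioning event.
prB_given_first3 = sum(B & first3) / sum(first3);

fprintf('Pr(A) = %.4f, Pr(B | first die 3) = %.4f, Pr(C) = %.4f\n', ...
        prA, prB_given_first3, prC);
```

Because all 36 outcomes are equally likely, each probability is simply a ratio of counts, which is what mean and sum compute here.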

The number of cases follows a binomial distribution with n = 100,000 trials, probability of success (infection) p = 1/1000 = 0.001 and probability of failure q = 1 - 0.001 = 0.999.

Hence, the probability mass function of the distribution is

P(k) = 100000Ck * (0.001)^k * (0.999)^(100000-k)

P(0) = 0.999^(100000)

P(1) = 100000 * (0.001) * (0.999)^(99999)

P(2) = 100000C2 * (0.001)^2 * (0.999)^(99998) = (100000 * 99999 / 2) * (0.001)^2 * (0.999)^(99998)
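
These probabilities are far too small to read off a table, so a numerical evaluation helps. The sketch below uses only base MATLAB and works on the log scale, which is an implementation choice made here to avoid overflow in the binomial coefficient; it also evaluates the Poisson approximation with mean np = 100 for comparison. With the Statistics and Machine Learning Toolbox, binopdf(k,n,p) and poisspdf(k,lambda) should return the same quantities.

```matlab
n = 100000;        % population size
p = 0.001;         % incidence of the disease
k = 0:2;           % numbers of cases of interest

% Exact binomial pmf on the log scale:
% log C(n,k) = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1)
logC   = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1);
pBinom = exp(logC + k .* log(p) + (n - k) .* log1p(-p));

% Poisson approximation with mean lambda = n*p
lambda = n * p;
pPois  = exp(-lambda + k .* log(lambda) - gammaln(k + 1));

for i = 1:numel(k)
    fprintf('k = %d: binomial %.4e, Poisson approx %.4e\n', ...
            k(i), pBinom(i), pPois(i));
end
```

All three probabilities are extremely small because the expected number of cases is np = 100, so observing 0, 1 or 2 cases is very unlikely.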

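Let the random variable X, the lifetime of an electrical component, follow an exponential distribution with λ = 0.1.

Hence, the probability density function is f(x) = λ e^(-λx) = 0.1 e^(-0.1x), for x ≥ 0,

and f(x) = 0, for x < 0.

The cumulative distribution function is F(x) = 1 - e^(-0.1x), for x ≥ 0.

So, P(X<=10) = F(10) = 1 - e^(-1) = 0.63212.

P(5<=X<=15) = F(15) – F(5) = P(X<=15) – P(X<=5) = 0.77687 – 0.39347 = 0.3834.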
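
The two probabilities can be reproduced directly from the CDF F(x) = 1 - e^(-λx). This is a minimal sketch in base MATLAB; the anonymous function F is a helper defined here, not a built-in.

```matlab
lambda = 0.1;                          % rate parameter from the exercise
F = @(x) 1 - exp(-lambda .* x);        % exponential CDF, valid for x >= 0

pLess10  = F(10);                      % P(X <= 10)
pBetween = F(15) - F(5);               % P(5 <= X <= 15)

fprintf('P(X <= 10) = %.5f\n', pLess10);
fprintf('P(5 <= X <= 15) = %.5f\n', pBetween);
```

If the toolbox function expcdf is used instead, note that it is parameterized by the mean, so the call would be expcdf(10, 1/lambda) rather than expcdf(10, lambda).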

Given that,

X ~ N(µ = 5, σ = 10)

Hence, P(X > 10) = 1 - P(X<=10) = 1 - P(Z<=(10-5)/10) = 1 - 0.6915 = 0.3085. (from standard normal table)

P(−20 < X < 15) = P(X<=15) – P(X<=-20) = P(Z<=(15-5)/10) – P(Z<=(-20-5)/10) = 0.8413 – 0.0062 = 0.8351. (from standard normal table)

Now, P(X > x) = 0.95 => P(X<=x) = 1-0.95 = 0.05 (as the total probability is 1).

Now, from the standard normal table, Z = -1.65 corresponds to a score below which the area under the normal curve is approximately 0.05 (the more precise value is Z = -1.645).

Hence, (x-5)/10 = -1.65 => x = -16.5 + 5 = -11.5

Hence, the value of x is approximately -11.5 (or -11.45 using Z = -1.645), for which the area in the right tail of the normal curve is 0.95.
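
The table lookups can be cross-checked numerically. The sketch below builds the standard normal CDF and its inverse from the base MATLAB functions erfc and erfcinv; the helper names Phi and PhiInv are introduced here for illustration. With the Statistics and Machine Learning Toolbox, normcdf and norminv serve the same purpose.

```matlab
mu    = 5;     % mean of X
sigma = 10;    % standard deviation of X

% Standard normal CDF and quantile function built from erfc / erfcinv.
Phi    = @(z) 0.5 .* erfc(-z ./ sqrt(2));
PhiInv = @(p) -sqrt(2) .* erfcinv(2 .* p);

pGreater10 = 1 - Phi((10 - mu) / sigma);                     % Pr(X > 10)
pBetween   = Phi((15 - mu) / sigma) - Phi((-20 - mu) / sigma); % Pr(-20 < X < 15)
x95        = mu + sigma * PhiInv(0.05);                      % Pr(X > x95) = 0.95

fprintf('Pr(X > 10) = %.4f\n', pGreater10);
fprintf('Pr(-20 < X < 15) = %.4f\n', pBetween);
fprintf('x with Pr(X > x) = 0.95: %.3f\n', x95);
```

The quantile comes out slightly above -11.5 because the table value Z = -1.65 is a rounded version of the 5% point of the standard normal.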

Given that,

N ~ Poisson(µ=10000)

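Now, by the central limit theorem, the sum of a large number of independent random variables with a well-defined mean and variance is approximately normally distributed. A Poisson(10000) variable can be written as the sum of 10,000 independent Poisson(1) variables, so the Gaussian approximation applies, with mean and variance both equal to 10000.

In this case the approximation is N ~ normal(mean = 10000, standard deviation = sqrt(10000)) = normal(10000, 100)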

Hence, by the approximation the value of P(N > 10,200) = 1 – P(N<=10200)

= 1- P(Z<=(10200-10000)/100) = 1 – 0.9772 = 0.0228.

Now, evaluating the exact Poisson tail probability P(N > 10,200) = 1 – P(N<=10200) in MATLAB with the Poisson CDF gives the result 0.0227.

MATLAB code:

1 - cdf('Poisson',10200,10000)

0.0227

Hence, the error of the normal approximation to the Poisson is |0.0228 – 0.0227| = 0.0001.
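
For readers without the Statistics and Machine Learning Toolbox, the same comparison can be sketched in base MATLAB. The exact tail is obtained here by summing the Poisson pmf on the log scale, which is a choice made for this illustration rather than the method used above; Phi is a helper defined in the snippet.

```matlab
lambda    = 10000;     % E(N)
threshold = 10200;

% Gaussian approximation: N ~ normal(lambda, sqrt(lambda))
Phi     = @(z) 0.5 .* erfc(-z ./ sqrt(2));
pNormal = 1 - Phi((threshold - lambda) / sqrt(lambda));

% Exact tail: 1 - sum_{k=0}^{threshold} e^(-lambda) lambda^k / k!
% Each term is evaluated through its logarithm to avoid overflow.
k      = 0:threshold;
logPmf = -lambda + k .* log(lambda) - gammaln(k + 1);
pExact = 1 - sum(exp(logPmf));

fprintf('Normal approximation: %.4f\n', pNormal);
fprintf('Exact Poisson tail:   %.4f\n', pExact);
fprintf('Absolute difference:  %.1e\n', abs(pNormal - pExact));
```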

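The MLE (maximum likelihood estimate) of the parameter θ of the above function is the value of θ which maximizes the likelihood function L(θ) = f(x1,x2,x3,...,xn | θ). Here, f is the probability density function.

So, for independent observations, L(θ) = f(x1|θ) * f(x2|θ) * f(x3|θ) * ... * f(xn|θ).

Now, taking the logarithm of both sides,

log L(θ) = log f(x1|θ) + log f(x2|θ) + ... + log f(xn|θ).

Now, maximizing: at the maximum of log L(θ),

d log L(θ)/dθ = 0.

So, for this pdf, the value of θ that solves this equation gives the maximum likelihood estimate.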

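Given that the sample of data x = x1,x2,…,xn follows a Poisson distribution with mean λ, and that λ follows an exponential distribution with parameter θ.

So, the likelihood is P(x | λ) = Π e^(-λ) λ^(xi) / xi! = e^(-nλ) λ^(Σxi) / Π xi!

and the prior is P(λ) = θ e^(-θλ), for λ ≥ 0.

Hence, posterior probability ∝ likelihood * prior probability

P(λ | x) ∝ e^(-nλ) λ^(Σxi) * θ e^(-θλ) ∝ λ^(Σxi) e^(-(θ + n)λ)

Hence, this is a Gamma distribution with parameters

β = θ + n (rate), α = Σxi + 1 (shape). (Proved)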
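
The conjugacy result can be checked numerically. In the sketch below, the data vector x and the prior parameter theta are hypothetical values chosen only for illustration; the code normalizes likelihood * prior on a grid and compares it with the Gamma density with shape Σxi + 1 and rate θ + n.

```matlab
x     = [3 1 4 2 0];     % hypothetical Poisson counts (illustration only)
theta = 0.5;             % hypothetical parameter of the exponential prior
n     = numel(x);

shapePost = sum(x) + 1;  % alpha = sum(xi) + 1
ratePost  = theta + n;   % beta  = theta + n

lam = linspace(1e-6, 15, 20001);   % grid over lambda

% Unnormalized posterior: likelihood * prior (factors free of lambda dropped)
unnorm      = exp(-n .* lam) .* lam.^sum(x) .* theta .* exp(-theta .* lam);
postNumeric = unnorm ./ trapz(lam, unnorm);

% Closed-form Gamma density with shape alpha and rate beta
postGamma = ratePost^shapePost / gamma(shapePost) ...
            .* lam.^(shapePost - 1) .* exp(-ratePost .* lam);

maxAbsDiff = max(abs(postNumeric - postGamma));
fprintf('shape = %g, rate = %g, max abs difference = %.2e\n', ...
        shapePost, ratePost, maxAbsDiff);
```

If the toolbox function gampdf is used for the comparison instead, keep in mind that it takes a scale parameter, so the call would be gampdf(lam, shapePost, 1/ratePost).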

Question 8:

The variables of the yacht.dat file, named as in the MATLAB table used below, are the following.

X7 Residuary resistance per unit weight of displacement, adimensional (response)

V1 Longitudinal position of the center of buoyancy, adimensional

V2 Prismatic coefficient, adimensional

V3 Length-displacement ratio, adimensional

V4 Beam-draught ratio, adimensional

V5 Length-beam ratio, adimensional

V6 Froude number, adimensional

Now, using the fitlm command in MATLAB, the variable X7 is fitted with respect to the independent variables V1, V2, V3, V4, V5 and V6.

MATLAB command:

% manually load yacht.dat by selecting it from folder

bfm = fitlm(yacht,'X7~V1+V2+V3+V4+V5+V6')

bfm =

Linear regression model:
    X7 ~ 1 + V1 + V2 + V3 + V4 + V5 + V6

Estimated Coefficients:
                    Estimate          SE       tStat      pValue
    (Intercept)       154.51      32.359       4.775  2.8055e-06
    V1              0.018076     0.44595    0.040534     0.96769
    V2               -301.54      52.185     -5.7783  1.8779e-08
    V3               -9.8484      18.656    -0.52791     0.59795
    V4                7.0168      7.2464     0.96832     0.33366
    V5                7.6548      18.712     0.40908     0.68277
    V6                73.168      5.1483      14.212  1.8803e-35

Number of observations: 309, Error degrees of freedom: 302
Root Mean Squared Error: 11.8
R-squared: 0.402, Adjusted R-Squared 0.39
F-statistic vs. constant model: 33.9, p-value = 3.44e-31

Hence, the linear regression model is,

X7 = 154.51 + 0.018V1 - 301.54V2 - 9.848V3 + 7.017V4 + 7.655V5 + 73.168V6.

The feval function in MATLAB evaluates the predicted value of the response (residuary resistance per unit weight of displacement) for different values of the variables V1 to V6.
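
As a rough illustration of such an evaluation, the sketch below applies the fitted equation by hand to one new design point. The coefficients are copied from the fitlm output above; the predictor values in vNew are hypothetical and chosen only for illustration. With the fitted model object bfm in the workspace, a call of the form feval(bfm, v1, v2, v3, v4, v5, v6) would be the equivalent toolbox route.

```matlab
% Coefficients from the fitlm output: intercept first, then V1..V6.
b = [154.51 0.018076 -301.54 -9.8484 7.0168 7.6548 73.168];

% Hypothetical predictor values V1..V6 for a new hull (illustration only).
vNew = [-2.3 0.55 4.8 4.0 3.2 0.30];

% Linear predictor: intercept + sum of coefficient * predictor value.
yHat = b(1) + vNew * b(2:end).';

fprintf('Predicted residuary resistance: %.3f\n', yHat);
```

Since R-squared is only about 0.40 and most coefficients are not significant, predictions from this linear model should be read with caution.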

Here, a generalized linear model is fitted to the ‘quality’ variable of the red wine data and of the white wine data, assuming a Poisson distribution.

Model fitting for red wine:

MATLAB code with output:

% manually load winequalityred.csv from folder

modelspec = 'quality~fixedacidity +volatileacidity + citricacid + residualsugar + chlorides + freesulfurdioxide + totalsulfurdioxide + density + pH + sulphates + alcohol';

lm1 = fitglm(winequalityred,modelspec,'Distribution','poisson')

Generalized linear regression model:
    quality ~ [Linear formula with 12 terms in 11 predictors]
    Distribution = Poisson

Estimated Coefficients:
                             Estimate           SE        tStat       pValue
    (Intercept)                3.6538        13.67      0.26728      0.78925
    fixedacidity            0.0036583     0.016633      0.21994      0.82592
    volatileacidity           -0.1977      0.08039      -2.4593     0.013921
    citricacid              -0.035923     0.096141     -0.37365      0.70866
    residualsugar           0.0026177     0.009736      0.26887      0.78803
    chlorides                -0.33176      0.27688      -1.1982      0.23084
    freesulfurdioxide      0.00082523    0.0014126      0.58418       0.5591
    totalsulfurdioxide    -0.00061063   0.00047979      -1.2727      0.20312
    density                   -2.1729       13.953     -0.15573      0.87624
    pH                      -0.074826      0.12406     -0.60317       0.5464
    sulphates                 0.15912     0.072618       2.1912     0.028434
    alcohol                   0.04815     0.016999       2.8325    0.0046188

1599 observations, 1587 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 66.1, p-value = 6.81e-10

White wine model:

MATLAB code with output:

% manually load winequalitywhite.csv from folder

modelspec = 'quality~fixedacidity +volatileacidity + citricacid + residualsugar + chlorides + freesulfurdioxide + totalsulfurdioxide + density + pH + sulphates + alcohol';

lm2 = fitglm(winequalitywhite,modelspec,'Distribution','poisson')

Generalized linear regression model:
    quality ~ [Linear formula with 12 terms in 11 predictors]
    Distribution = Poisson

Estimated Coefficients:
                             Estimate           SE        tStat       pValue
    (Intercept)                28.094       11.144       2.5211     0.011698
    fixedacidity             0.012809     0.011881       1.0781        0.281
    volatileacidity          -0.33456     0.064234      -5.2085   1.9041e-07
    citricacid              0.0025292     0.053278     0.047471      0.96214
    residualsugar            0.014557    0.0043653       3.3347   0.00085393
    chlorides               -0.062667      0.31275     -0.20037      0.84119
    freesulfurdioxide      0.00062244   0.00046312        1.344      0.17894
    totalsulfurdioxide    -3.6945e-05   0.00021042     -0.17558      0.86063
    density                   -27.359       11.298      -2.4215     0.015457
    pH                         0.1235     0.059026       2.0922     0.036417
    sulphates                 0.10875     0.054501       1.9953     0.046011
    alcohol                   0.03036     0.014207        2.137     0.032594

4898 observations, 4886 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 185, p-value = 1.04e-33

Now, in the white wine model the overall p-value (1.04e-33) is very close to zero, which is less than the chosen significance level of 0.05. Hence, the model fits significantly better than the constant model. The independent variables whose p-values are significant are volatileacidity, residualsugar, density, pH, sulphates and alcohol.

Similarly, the red wine model fits significantly better than the constant model, as its overall p-value is 6.81e-10, which is less than the chosen significance level of 0.05.

Here, the independent variables whose p-values are significant are volatileacidity, sulphates and alcohol.

So, in the white wine model there are more significant predictor variables (six) than in the red wine model (three).
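
The screening of predictors at the 0.05 level can be sketched as follows. The p-values are copied from the two fitglm outputs above (intercept excluded); the variable names in the snippet are illustrative. With the model objects in the workspace, the same p-values should be available as lm1.Coefficients.pValue and lm2.Coefficients.pValue.

```matlab
names = {'fixedacidity', 'volatileacidity', 'citricacid', 'residualsugar', ...
         'chlorides', 'freesulfurdioxide', 'totalsulfurdioxide', ...
         'density', 'pH', 'sulphates', 'alcohol'};

% p-values from the red wine and white wine outputs, in the same order.
pRed   = [0.82592 0.013921 0.70866 0.78803 0.23084 0.5591 ...
          0.20312 0.87624 0.5464 0.028434 0.0046188];
pWhite = [0.281 1.9041e-07 0.96214 0.00085393 0.84119 0.17894 ...
          0.86063 0.015457 0.036417 0.046011 0.032594];

alphaLevel = 0.05;                       % significance level used in the text
sigRed   = names(pRed   < alphaLevel);   % significant predictors, red wine
sigWhite = names(pWhite < alphaLevel);   % significant predictors, white wine

fprintf('Red wine:   %s\n', strjoin(sigRed, ', '));
fprintf('White wine: %s\n', strjoin(sigWhite, ', '));
```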
