1. Suppose that a rare disease has an incidence of 1 in 1000, that all members of a population are equally likely to become infected, and that their infections are independent. Find the probability of k cases in a population of n = 100,000 for k = 0, 1, 2.
(Hint: recall the example in lecture.)
2. Suppose that the lifetime of an electrical component follows an exponential distribution with λ = 0.1.
• Find the probability that the lifetime is less than 10.
• Find the probability that the lifetime is between 5 and 15.
3. Let X be a Gaussian random variable with µ = 5 and σ = 10. Find:
• Pr(X > 10).
• Pr(−20 < X < 15).
• the value of x such that Pr(X > x) = 0.95.
4. Suppose that N, the number of insurance claims filed per year, follows a Poisson distribution with E(N) = 10,000. Use the Central Limit Theorem and the Gaussian approximation to the Poisson to approximate Pr(N > 10,200). Check your answer with the exact result obtained using MATLAB.
a) When two six-sided dice are thrown sequentially, the sample space is the following.
S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
b) Event A = the sum of the two values is at least 5 = {(1,4), (1,5), (1,6), (2,3), (2,4), (2,5), (2,6), (3,2), (3,3), (3,4), (3,5), (3,6), (4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (5,1), (5,2), (5,3), (5,4), (5,5), (5,6), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
Event B = the value of the first die is higher than the value of the second = {(2,1), (3,1), (3,2), (4,1), (4,2), (4,3), (5,1), (5,2), (5,3), (5,4), (6,1), (6,2), (6,3), (6,4), (6,5)}
Event C = the first value is 4 = {(4,1), (4,2), (4,3), (4,4), (4,5), (4,6)}
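c) Pr(A) = n(A)/n(S) = (number of elements in event A)/(total elements in sample space) = 30/36 = 5/6.
Pr(B | first die is 3) = Pr(B ∩ first die is 3) / Pr(first die is 3) = (2/36)/(6/36) = 1/3, since (3,1) and (3,2) are the only outcomes with a first value of 3 that exceeds the second value.
Pr(C) = n(C)/n(S) = 6/36 = 1/6.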
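The counting argument for the two dice can be cross-checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original MATLAB work; the names S, A, B, C and pr are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first die, second die)
S = list(product(range(1, 7), repeat=2))

A = [(a, b) for a, b in S if a + b >= 5]       # sum is at least 5
B = [(a, b) for a, b in S if a > b]            # first die higher than second
C = [(a, b) for a, b in S if a == 4]           # first die shows 4
first_is_3 = [(a, b) for a, b in S if a == 3]  # conditioning event

def pr(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(S))

pr_A = pr(A)
pr_C = pr(C)
# Conditional probability Pr(B | first die is 3) = n(B and first=3) / n(first=3)
pr_B_given_3 = Fraction(len(set(B) & set(first_is_3)), len(first_is_3))

print(pr_A, pr_B_given_3, pr_C)  # 5/6 1/3 1/6
```

Exact fractions are used instead of floats so the results can be compared directly with the hand-computed values.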
The number of cases follows a binomial distribution with n = 100,000 trials, probability of success p = 1/1000 = 0.001 and probability of failure q = 1-0.001 = 0.999.
Hence, the probability mass function of the distribution is
P(k) = C(100000, k) * (0.001)^k * (0.999)^(100000-k)
P(0) = (0.999)^(100000) ≈ 3.54e-44
P(1) = 100000 * (0.001) * (0.999)^(99999) ≈ 3.54e-42
P(2) = C(100000, 2) * (0.001)^2 * (0.999)^(99998) = 4999950000 * (0.001)^2 * (0.999)^(99998) ≈ 1.77e-40
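The binomial probabilities can be evaluated numerically. The sketch below uses only the Python standard library, as an independent check rather than the original MATLAB. It also evaluates the Poisson approximation with mean np = 100, which is presumably what the lecture hint refers to; that reading of the hint is an assumption. The function names binom_pmf and poisson_pmf are illustrative.

```python
import math

n, p = 100_000, 0.001

def binom_pmf(k):
    """Exact binomial probability of k cases among n people."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam=n * p):
    """Poisson approximation with mean lam = n * p."""
    return math.exp(-lam) * lam**k / math.factorial(k)

for k in range(3):
    # Columns: k, exact binomial value, Poisson approximation
    print(k, binom_pmf(k), poisson_pmf(k))
```

The probabilities are extremely small because the expected number of cases is 100, so seeing only 0, 1 or 2 cases lies far in the lower tail.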
Let the random variable X, the lifetime of the electrical component, follow an exponential distribution with λ = 0.1.
Hence, the density is f(x) = λ e^(-λx) = 0.1 e^(-0.1x), for x ≥ 0,
= 0, for x < 0,
and the cumulative distribution function is F(x) = 1 - e^(-0.1x) for x ≥ 0.
So, P(X<=10) = F(10) = 1 - e^(-1) = 0.63212.
P(5<=X<=15) = F(15) – F(5) = P(X<=15) – P(X<=5) = 0.77687 – 0.39347 = 0.3834.
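The two exponential probabilities follow directly from the CDF F(x) = 1 - e^(-λx). A short Python sketch of that calculation (the helper name exp_cdf is chosen here for illustration):

```python
import math

lam = 0.1  # rate parameter of the exponential lifetime

def exp_cdf(x, rate=lam):
    """CDF of the exponential distribution: 1 - exp(-rate * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

p_less_10 = exp_cdf(10)               # P(X <= 10)
p_5_to_15 = exp_cdf(15) - exp_cdf(5)  # P(5 <= X <= 15)

print(round(p_less_10, 5), round(p_5_to_15, 5))  # 0.63212 0.3834
```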
Given that,
X ~ N(µ = 5, σ = 10)
Hence, P(X > 10) = 1 - P(X<=10) = 1 - P(Z<=(10-5)/10) = 1 - P(Z<=0.5) = 1 - 0.6915 = 0.3085 (from the standard normal table).
P(−20 < X < 15) = P(X<=15) – P(X<=-20) = P(Z<=(15-5)/10) – P(Z<=(-20-5)/10) = P(Z<=1) – P(Z<=-2.5) = 0.8413 – 0.0062 = 0.8351 (from the standard normal table).
Now, P(X > x) = 0.95 => P(X<=x) = 1-0.95 = 0.05, as the total probability is 1.
From the standard normal table, Z = -1.65 (more precisely, -1.645) is the score below which the area under the normal curve is equal to 0.05.
Hence, (x-5)/10 = -1.65 => x = -16.5 + 5 = -11.5.
Hence, x = -11.5 is the value for which the area to its right under the normal curve is 0.95.
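The table lookups can be checked without a printed table. The sketch below uses Python's standard-library statistics.NormalDist, not the original MATLAB; the variable names are illustrative. The quantile it computes uses the unrounded z-score, so it differs slightly from the table-based -11.5.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=10)

p_gt_10 = 1 - X.cdf(10)              # P(X > 10)
p_between = X.cdf(15) - X.cdf(-20)   # P(-20 < X < 15)
x_95 = X.inv_cdf(0.05)               # x with P(X <= x) = 0.05, i.e. P(X > x) = 0.95

print(round(p_gt_10, 4), round(p_between, 4), round(x_95, 2))
```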
Given that,
N ~ Poisson(µ=10000)
A Poisson variable with mean 10,000 can be written as the sum of 10,000 independent Poisson variables with mean 1, so by the Central Limit Theorem its distribution is approximately normal with the same mean and variance, both equal to 10,000.
In this case the approximation is N ~ normal(mean = 10000, standard deviation = sqrt(10000) = 100).
Hence, by the approximation the value of P(N > 10,200) = 1 – P(N<=10200)
= 1- P(Z<=(10200-10000)/100) = 1 – 0.9772 = 0.0228.
Evaluating the exact Poisson probability P(N > 10,200) = 1 – P(N<=10200) with the Poisson CDF in MATLAB gives the result 0.0227.
MATLAB code:
% exact upper-tail probability P(N > 10200) for N ~ Poisson(10000)
1 - cdf('Poisson',10200,10000)
0.0227
Hence, the error of the normal approximation to the Poisson is |0.0228 – 0.0227| = 0.0001.
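The comparison between the exact Poisson tail and the normal approximation can also be reproduced outside MATLAB. The following standard-library Python sketch is an independent check under the same parameters. The helper poisson_cdf is written here for illustration and sums the probability mass function in log space, because terms such as 10000^k and k! overflow ordinary floats.

```python
import math
from statistics import NormalDist

lam, threshold = 10_000, 10_200

def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam), summing pmf terms computed in log space."""
    log_lam = math.log(lam)
    return math.fsum(
        math.exp(-lam + j * log_lam - math.lgamma(j + 1)) for j in range(k + 1)
    )

exact_tail = 1 - poisson_cdf(threshold, lam)
# Normal approximation with matching mean and standard deviation sqrt(lam)
normal_tail = 1 - NormalDist(mu=lam, sigma=math.sqrt(lam)).cdf(threshold)

print(exact_tail, normal_tail, abs(exact_tail - normal_tail))
```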
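The MLE (maximum likelihood estimate) of the parameter (written θ here) is the value of θ which maximizes the likelihood function L(θ) = f(x1, x2, x3, ..., xn | θ). Here, f is the probability density function.
So, for independent observations, L(θ) = f(x1 | θ) * f(x2 | θ) * f(x3 | θ) * ... * f(xn | θ).
Now, taking the logarithm of both sides,
ln L(θ) = ln f(x1 | θ) + ln f(x2 | θ) + ... + ln f(xn | θ).
Now, maximizing the log-likelihood: at the maximum of ln L(θ),
d ln L(θ)/dθ = 0.
So, solving this equation for θ with this pdf gives the maximum likelihood estimate.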
Given that the sample of data x = x1, x2, ..., xn follows a Poisson distribution with mean λ, and that λ follows an exponential distribution with parameter θ.
So, P(x | λ) = Π_i e^(-λ) λ^(xi) / xi! = e^(-nλ) λ^(Σxi) / Π_i xi!
P(λ) = θ e^(-θλ)
Hence, the posterior is proportional to likelihood * prior:
P(λ | x) ∝ e^(-nλ) λ^(Σxi) * θ e^(-θλ) ∝ λ^(Σxi) e^(-(θ + n)λ)
Hence, this is a Gamma distribution with rate β and shape α given by
β = θ + n, α = Σxi + 1 (Proved)
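The conjugacy result can be sanity-checked numerically: if the posterior really is Gamma(α = Σxi + 1, β = θ + n), then likelihood * prior divided by that Gamma density must be the same constant at every λ. The Python sketch below does this for made-up data. The values of theta and x are hypothetical and chosen only for illustration.

```python
import math

theta = 2.0           # hypothetical rate of the exponential prior
x = [3, 1, 4, 2, 0]   # hypothetical Poisson counts
n, s = len(x), sum(x)

alpha, beta = s + 1, theta + n  # claimed Gamma posterior: shape alpha, rate beta

def unnormalised_posterior(lam):
    """Poisson likelihood times exponential prior, not normalised."""
    likelihood = math.prod(
        math.exp(-lam) * lam**xi / math.factorial(xi) for xi in x
    )
    prior = theta * math.exp(-theta * lam)
    return likelihood * prior

def gamma_pdf(lam):
    """Gamma density with shape alpha and rate beta."""
    return beta**alpha * lam**(alpha - 1) * math.exp(-beta * lam) / math.gamma(alpha)

# A constant ratio across lambda values means the two curves have the same shape
ratios = [unnormalised_posterior(l) / gamma_pdf(l) for l in (0.5, 1.0, 2.0, 3.5)]
print(ratios)
```

The constant itself is the marginal likelihood of the data, which is why it does not depend on λ.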
Question 8:
The variables of the yacht.dat file are the following, in column order.
V1 Longitudinal position of the center of buoyancy, dimensionless
V2 Prismatic coefficient, dimensionless
V3 Length-displacement ratio, dimensionless
V4 Beam-draught ratio, dimensionless
V5 Length-beam ratio, dimensionless
V6 Froude number, dimensionless
X7 Residuary resistance per unit weight of displacement, dimensionless
Now, using the fitlm command in MATLAB, the response X7 is fitted against the independent variables V1, V2, V3, V4, V5 and V6.
MATLAB command:
% manually load yacht.dat by selecting it from folder
bfm = fitlm(yacht,'X7~V1+V2+V3+V4+V5+V6')
bfm =
Linear regression model:
X7 ~ 1 + V1 + V2 + V3 + V4 + V5 + V6
Estimated Coefficients:
Estimate SE tStat pValue
(Intercept) 154.51 32.359 4.775 2.8055e-06
V1 0.018076 0.44595 0.040534 0.96769
V2 -301.54 52.185 -5.7783 1.8779e-08
V3 -9.8484 18.656 -0.52791 0.59795
V4 7.0168 7.2464 0.96832 0.33366
V5 7.6548 18.712 0.40908 0.68277
V6 73.168 5.1483 14.212 1.8803e-35
Number of observations: 309, Error degrees of freedom: 302
Root Mean Squared Error: 11.8
R-squared: 0.402, Adjusted R-Squared 0.39
F-statistic vs. constant model: 33.9, p-value = 3.44e-31
Hence, the linear regression model is
X7 = 154.51 + 0.018V1 - 301.54V2 - 9.848V3 + 7.017V4 + 7.655V5 + 73.168V6.
The feval function in MATLAB evaluates the fitted value of X7, the residuary resistance per unit weight of displacement, for given values of the variables V1 to V6.
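To illustrate what evaluating the fitted model amounts to, the sketch below applies the estimated coefficients by hand in Python. It is not a call to MATLAB's feval. The coefficients are copied from the fitlm output, while the predictor values in hull are hypothetical and made up for illustration.

```python
# Coefficients copied from the fitlm output in the text
intercept = 154.51
coef = {
    "V1": 0.018076,
    "V2": -301.54,
    "V3": -9.8484,
    "V4": 7.0168,
    "V5": 7.6548,
    "V6": 73.168,
}

def predict_x7(v):
    """Fitted response: intercept plus the sum of coefficient * predictor value."""
    return intercept + sum(coef[name] * v[name] for name in coef)

# Hypothetical hull: these predictor values are invented for illustration only
hull = {"V1": -2.3, "V2": 0.57, "V3": 4.8, "V4": 4.0, "V5": 3.2, "V6": 0.30}
slower = dict(hull, V6=0.20)  # same hull with V6 lowered by 0.1

print(predict_x7(hull), predict_x7(slower))
```

Because the model is linear, lowering V6 by 0.1 changes the fitted value by exactly 0.1 * 73.168, whatever the other inputs are.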
Here, a generalized linear model assuming a Poisson distribution is fitted to the 'quality' variable of the red wine data and to the 'quality' variable of the white wine data.
- Model fitting for red wine:
MATLAB code with output:
% manually load winequalityred.csv from folder
modelspec = 'quality~fixedacidity +volatileacidity + citricacid + residualsugar + chlorides + freesulfurdioxide + totalsulfurdioxide + density + pH + sulphates + alcohol';
lm1 = fitglm(winequalityred,modelspec,'Distribution','poisson')
Generalized linear regression model:
quality ~ [Linear formula with 12 terms in 11 predictors]
Distribution = Poisson
Estimated Coefficients:
Estimate SE tStat pValue
(Intercept) 3.6538 13.67 0.26728 0.78925
fixedacidity 0.0036583 0.016633 0.21994 0.82592
volatileacidity -0.1977 0.08039 -2.4593 0.013921
citricacid -0.035923 0.096141 -0.37365 0.70866
residualsugar 0.0026177 0.009736 0.26887 0.78803
chlorides -0.33176 0.27688 -1.1982 0.23084
freesulfurdioxide 0.00082523 0.0014126 0.58418 0.5591
totalsulfurdioxide -0.00061063 0.00047979 -1.2727 0.20312
density -2.1729 13.953 -0.15573 0.87624
pH -0.074826 0.12406 -0.60317 0.5464
sulphates 0.15912 0.072618 2.1912 0.028434
alcohol 0.04815 0.016999 2.8325 0.0046188
1599 observations, 1587 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 66.1, p-value = 6.81e-10
White wine model:
MATLAB code with output:
% manually load winequalitywhite.csv from folder
modelspec = 'quality~fixedacidity +volatileacidity + citricacid + residualsugar + chlorides + freesulfurdioxide + totalsulfurdioxide + density + pH + sulphates + alcohol';
lm2 = fitglm(winequalitywhite,modelspec,'Distribution','poisson')
Generalized linear regression model:
quality ~ [Linear formula with 12 terms in 11 predictors]
Distribution = Poisson
Estimated Coefficients:
Estimate SE tStat pValue
(Intercept) 28.094 11.144 2.5211 0.011698
fixedacidity 0.012809 0.011881 1.0781 0.281
volatileacidity -0.33456 0.064234 -5.2085 1.9041e-07
citricacid 0.0025292 0.053278 0.047471 0.96214
residualsugar 0.014557 0.0043653 3.3347 0.00085393
chlorides -0.062667 0.31275 -0.20037 0.84119
freesulfurdioxide 0.00062244 0.00046312 1.344 0.17894
totalsulfurdioxide -3.6945e-05 0.00021042 -0.17558 0.86063
density -27.359 11.298 -2.4215 0.015457
pH 0.1235 0.059026 2.0922 0.036417
sulphates 0.10875 0.054501 1.9953 0.046011
alcohol 0.03036 0.014207 2.137 0.032594
4898 observations, 4886 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 185, p-value = 1.04e-33
- In the white wine model, the chi-square test against the constant model has a p-value of 1.04e-33, which is far below the significance level of 0.05, so the predictors jointly improve significantly on the constant model. The individually significant predictors (p < 0.05) are volatileacidity, residualsugar, density, pH, sulphates and alcohol.
Similarly, the red wine model improves significantly on the constant model, as its overall p-value of 6.81e-10 is below the significance level of 0.05.
Here, the individually significant predictors are volatileacidity, sulphates and alcohol.
So, the white wine model has more significant predictors (six) than the red wine model (three).
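The reading of the two coefficient tables can be made mechanical. The Python sketch below copies the (estimate, p-value) pairs from the fitglm outputs, lists the predictors with p < 0.05, and converts one estimate to a multiplicative effect. The conversion assumes the default log link of a Poisson GLM, under which exp(estimate) is the factor by which expected quality changes per unit increase in the predictor. The names red, white and significant are illustrative.

```python
import math

# (estimate, p-value) per predictor, copied from the fitglm outputs; intercepts omitted
red = {
    "fixedacidity": (0.0036583, 0.82592), "volatileacidity": (-0.1977, 0.013921),
    "citricacid": (-0.035923, 0.70866), "residualsugar": (0.0026177, 0.78803),
    "chlorides": (-0.33176, 0.23084), "freesulfurdioxide": (0.00082523, 0.5591),
    "totalsulfurdioxide": (-0.00061063, 0.20312), "density": (-2.1729, 0.87624),
    "pH": (-0.074826, 0.5464), "sulphates": (0.15912, 0.028434),
    "alcohol": (0.04815, 0.0046188),
}
white = {
    "fixedacidity": (0.012809, 0.281), "volatileacidity": (-0.33456, 1.9041e-07),
    "citricacid": (0.0025292, 0.96214), "residualsugar": (0.014557, 0.00085393),
    "chlorides": (-0.062667, 0.84119), "freesulfurdioxide": (0.00062244, 0.17894),
    "totalsulfurdioxide": (-3.6945e-05, 0.86063), "density": (-27.359, 0.015457),
    "pH": (0.1235, 0.036417), "sulphates": (0.10875, 0.046011),
    "alcohol": (0.03036, 0.032594),
}

def significant(model, level=0.05):
    """Names of predictors whose p-value is below the significance level."""
    return [name for name, (_, p) in model.items() if p < level]

sig_red = significant(red)
sig_white = significant(white)
# Under a log link, exp(estimate) is the multiplicative change in expected quality
alcohol_effect_red = math.exp(red["alcohol"][0])

print(sig_red)
print(sig_white)
print(alcohol_effect_red)
```

For red wine, the alcohol estimate corresponds to an increase in expected quality of roughly 5% per additional unit of alcohol, holding the other predictors fixed.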