1. Suppose that a rare disease has an incidence of 1 in 1000 assuming that all members of a population are equally likely to become infected and that their probabilities of infection are independent. Find the probability of k cases in a population of n = 100,000 for k = 0, 1, 2.
(Hint: recall the example in lecture.)

2. Suppose that the lifetime of an electrical component follows an exponential distribution with λ = 0.1.
• Find the probability that the lifetime is less than 10.
• Find the probability that the lifetime is between 5 and 15.

3. Let X be a Gaussian random variable with µ = 5 and σ = 10. Find:
• Pr(X > 10).
• Pr(−20 < X < 15).
• the value of x such that Pr(X > x) = 0.95.

4. Suppose that N, the number of insurance claims filed per year, follows a Poisson distribution with E(N) = 10,000. Use the Central Limit Theorem and the Gaussian approximation to the Poisson to approximate Pr(N > 10,200). Check your answer with the exact result obtained using MATLAB.

Probability of rare disease incidence in a population

When two six-sided dice are thrown sequentially, the sample space is the following.

S = {(1,1)(1,2)(1,3)(1,4)(1,5)(1,6)(2,1)(2,2)(2,3)(2,4)(2,5)(2,6)(3,1)(3,2)(3,3)(3,4)(3,5)(3,6)(4,1)(4,2)(4,3)(4,4)(4,5)(4,6)(5,1)(5,2)(5,3)(5,4)(5,5)(5,6)(6,1)(6,2)(6,3)(6,4)(6,5)(6,6)}

b)Event A = the sum of two values is at least 5 = {(1,4)(1,5)(1,6)(2,3)(2,4)(2,5)(2,6)(3,2)(3,3)(3,4)(3,5)(3,6)(4,1)(4,2)(4,3)(4,4)(4,5)(4,6)(5,1)(5,2)(5,3)(5,4)(5,5)(5,6)(6,1)(6,2)(6,3)(6,4)(6,5)(6,6)}

Event B = the value of the first die is higher than the value of the second = {(2,1)(3,1)(3,2)(4,1)(4,2)(4,3)(5,1)(5,2)(5,3)(5,4)(6,1)(6,2)(6,3)(6,4)(6,5)}

Event C = The first value is 4 = {(4,1)(4,2)(4,3)(4,4)(4,5)(4,6)}

c) Pr(A) = n(A)/n(S)  = (number of elements in event A)/(total elements in sample space) = 30/36 = 5/6.

Pr(B | the first die was a 3) = P(B ∩ first die 3)/P(first die 3) = (2/36)/(6/36) = 1/3.

Pr(C) = n(C)/n(S) = 6/36 = 1/6. 
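
The counts for the dice events can be cross-checked by brute-force enumeration. The following is a minimal Python sketch (Python is used here only as an independent check, it is not part of the original MATLAB solution); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (first die, second die).
sample_space = list(product(range(1, 7), repeat=2))

event_a = [(i, j) for i, j in sample_space if i + j >= 5]  # sum is at least 5
event_b = [(i, j) for i, j in sample_space if i > j]       # first die is higher
event_c = [(i, j) for i, j in sample_space if i == 4]      # first die is 4
first_is_3 = [(i, j) for i, j in sample_space if i == 3]   # conditioning event

pr_a = Fraction(len(event_a), len(sample_space))
pr_c = Fraction(len(event_c), len(sample_space))

# Conditional probability: |B and first die 3| / |first die 3|
b_and_3 = [outcome for outcome in event_b if outcome in first_is_3]
pr_b_given_3 = Fraction(len(b_and_3), len(first_is_3))

print(pr_a, pr_b_given_3, pr_c)  # prints: 5/6 1/3 1/6
```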

The number of cases of the disease follows a binomial distribution with n = 100,000, probability of success p = 1/1000 = 0.001 and probability of failure q = 1 - 0.001 = 0.999.

Hence, the probability mass function of the distribution is

P(k) = 100000Ck * (0.001)^k * (0.999)^(100000-k)

P(0) = 0.999^(100000)

P(1) = 100000 * (0.001) * (0.999)^(99999)

P(2) = 100000C2 * (0.001)^2 * (0.999)^(99998) = (100000 * 99999 / 2) * (0.001)^2 * (0.999)^(99998)
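
These probabilities are extremely small, so it helps to evaluate them numerically. The sketch below computes the binomial probabilities in log space and compares them with a Poisson approximation with rate n*p = 100. The Poisson comparison is an assumption about what the "example in lecture" refers to; it is not stated in the solution itself.

```python
import math

n, p = 100_000, 0.001
lam = n * p  # Poisson rate for the approximation (assumed: lambda = n * p)

def binom_pmf(k):
    """Binomial probability of exactly k cases, computed via logarithms."""
    log_pmf = (math.log(math.comb(n, k))
               + k * math.log(p)
               + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k):
    """Poisson(lam) probability of exactly k cases."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

for k in range(3):
    print(k, binom_pmf(k), poisson_pmf(k))
```

Both columns are on the order of 10^-44 to 10^-40: with an expected count of 100 cases, seeing 0, 1 or 2 cases is practically impossible.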

Let the random variable X, the lifetime of the electrical component, follow an exponential distribution with λ = 0.1.

Hence, the probability density function is f(x) = 0.1 e^(-0.1x) for x ≥ 0,

and f(x) = 0 for x < 0.

The cumulative distribution function is F(x) = 1 - e^(-0.1x) for x ≥ 0.

So, P(X<=10) = F(10) = 1 - e^(-1) = 0.63212.

P(5<=X<=15) = F(15) – F(5) = P(X<=15) – P(X<=5) = 0.77687 – 0.39347 = 0.3834.
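
The two exponential probabilities can be reproduced with a few lines of Python. This is a minimal sketch using only the closed-form CDF F(x) = 1 - e^(-λx); the function name exp_cdf is illustrative.

```python
import math

lam = 0.1  # rate parameter of the exponential distribution

def exp_cdf(x):
    """CDF of the exponential distribution: 0 for x < 0, else 1 - exp(-lam * x)."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

p_less_10 = exp_cdf(10)
p_5_to_15 = exp_cdf(15) - exp_cdf(5)

print(round(p_less_10, 5), round(p_5_to_15, 4))  # prints: 0.63212 0.3834
```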

Given that,

X ~ N(µ = 5, σ = 10)

Hence, P(X > 10) = 1 - P(X<=10) = 1 - P(Z<=(10-5)/10) = 1 - P(Z<=0.5) = 1 - 0.6915 = 0.3085 (from the standard normal table).

P(−20 < X < 15) = P(X<=15) – P(X<=-20) = P(Z<=(15-5)/10) – P(Z<=(-20-5)/10) = P(Z<=1) – P(Z<=-2.5) = 0.8413 – 0.0062 = 0.8351 (from the standard normal table).

Now, P(X > x) = 0.95 => P(X<=x) = 1 - 0.95 = 0.05 (as the total probability is 1).

From the standard normal table, Z = -1.65 is the score below which the area under the normal curve is approximately 0.05 (the more precise value is -1.645).

Hence, (x-5)/10 = -1.65 => x = -16.5 + 5 = -11.5

Hence, x = -11.5 is the value for which the area to its right under the normal curve is 0.95.
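
The table lookups can be checked without a table by using Python's standard-library normal distribution. This is a sketch for verification only. The quantile it returns is based on z = -1.645 rather than the rounded table value -1.65, so it differs slightly from -11.5.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=10)

p_gt_10 = 1 - X.cdf(10)              # Pr(X > 10)
p_between = X.cdf(15) - X.cdf(-20)   # Pr(-20 < X < 15)
x_95 = X.inv_cdf(0.05)               # x such that Pr(X > x) = 0.95

print(round(p_gt_10, 4), round(p_between, 4), round(x_95, 2))
# prints: 0.3085 0.8351 -11.45
```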

Given that,

N ~ Poisson(µ=10000)

Now, a Poisson random variable with mean 10,000 can be written as the sum of 10,000 independent Poisson random variables with mean 1, so by the central limit theorem its distribution is approximately normal with the same mean and variance. For a Poisson distribution the mean and the variance are both equal to µ.

In this case the approximation is N ~ normal(mean = 10000, standard deviation = sqrt(10000) = 100).

Hence, by the approximation, P(N > 10,200) = 1 – P(N<=10200)

= 1 - P(Z<=(10200-10000)/100) = 1 - P(Z<=2) = 1 – 0.9772 = 0.0228.

The exact probability, computed in MATLAB from the Poisson CDF as 1 – P(N<=10200), is 0.0227.

MATLAB code:

1 - cdf('Poisson',10200,10000)  

0.0227

Hence, the error of the normal approximation to the Poisson is |0.0228 – 0.0227| = 0.0001.
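
For readers without MATLAB, the same comparison can be sketched in Python using only the standard library. The exact tail is summed directly from the Poisson probabilities in log space. The continuity-corrected variant is an addition for comparison and is not part of the original solution.

```python
import math
from statistics import NormalDist

lam, threshold = 10_000, 10_200

def poisson_log_pmf(k):
    """Logarithm of the Poisson(lam) probability of exactly k events."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

# Sum the upper tail P(N > threshold) directly. Terms more than
# 20 standard deviations above the mean are negligible and are cut off.
upper = lam + 20 * int(math.sqrt(lam))
exact_tail = math.fsum(
    math.exp(poisson_log_pmf(k)) for k in range(threshold + 1, upper + 1)
)

approx = NormalDist(mu=lam, sigma=math.sqrt(lam))
normal_tail = 1 - approx.cdf(threshold)
normal_tail_cc = 1 - approx.cdf(threshold + 0.5)  # with continuity correction

print(exact_tail, normal_tail, normal_tail_cc)
```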

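The MLE (maximum likelihood estimate) of the parameter of the above function is the parameter value that maximizes the likelihood function L = f(x1, x2, x3, ... | parameter), where f is the probability density function.

For independent observations, the likelihood is the product of the individual densities: L = f(x1 | parameter) * f(x2 | parameter) * f(x3 | parameter) * ...

Now, taking the logarithm of both sides turns this product into a sum, the log-likelihood.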

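Now, the log-likelihood is maximized.

At the maximum, the derivative of the log-likelihood with respect to the parameter is 0.

So, for this pdf, the solution of that equation gives the maximum likelihood estimate.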

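Given that the sample x = x1, x2, ..., xn follows a Poisson distribution with mean λ, and that λ follows an exponential distribution with parameter θ.

So, the likelihood is P(x | λ) = Π e^(-λ) λ^(xi) / xi! = e^(-nλ) λ^(Σxi) / Π xi!

and the prior is P(λ) = θ e^(-θλ).

Hence, posterior probability ∝ likelihood * prior probability

P(λ | x) ∝ e^(-nλ) λ^(Σxi) * θ e^(-θλ) ∝ λ^(Σxi) e^(-(θ + n)λ)

Hence, this is a Gamma distribution (shape α, rate β) with parameters

β = θ + n,  α = Σxi + 1 (Proved)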
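
The conjugacy result can be checked numerically. The sketch below assumes the shape-rate parameterization of the Gamma distribution with shape α = Σxi + 1 and rate β = θ + n. The sample x and the value of θ are hypothetical and chosen only for illustration. If the posterior really is that Gamma distribution, the ratio of likelihood * prior to the Gamma density is the same constant for every λ.

```python
import math

x = [2, 0, 3, 1, 4]  # hypothetical Poisson sample, for illustration only
theta = 0.5          # hypothetical parameter of the exponential prior
n, s = len(x), sum(x)

def unnormalized_posterior(lam):
    """Likelihood times prior, without the normalizing constant."""
    likelihood = math.prod(
        math.exp(-lam) * lam**xi / math.factorial(xi) for xi in x
    )
    prior = theta * math.exp(-theta * lam)
    return likelihood * prior

alpha, beta = s + 1, theta + n  # Gamma shape and rate

def gamma_pdf(lam):
    """Gamma(alpha, beta) density in the shape-rate parameterization."""
    return beta**alpha * lam**(alpha - 1) * math.exp(-beta * lam) / math.gamma(alpha)

# The ratio should not depend on lam if the posterior is Gamma(alpha, beta).
ratios = [unnormalized_posterior(lam) / gamma_pdf(lam) for lam in (0.5, 1.0, 2.0, 3.5)]
print(ratios)
```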

Question 8:

The variables of the yacht.dat file are the following, in order.

V1 Longitudinal position of the center of buoyancy, dimensionless

V2 Prismatic coefficient, dimensionless

V3 Length-displacement ratio, dimensionless

V4 Beam-draught ratio, dimensionless

V5 Length-beam ratio, dimensionless

V6 Froude number, dimensionless

X7 Residuary resistance per unit weight of displacement, dimensionless

Now, using the fitlm command in MATLAB, the response variable X7 is fitted against the independent variables V1, V2, V3, V4, V5 and V6.

MATLAB command:

% manually load yacht.dat by selecting it from folder

bfm = fitlm(yacht,'X7~V1+V2+V3+V4+V5+V6')

bfm =

Linear regression model:

    X7 ~ 1 + V1 + V2 + V3 + V4 + V5 + V6

Estimated Coefficients:

                   Estimate      SE        tStat        pValue   

    (Intercept)      154.51     32.359       4.775    2.8055e-06

    V1             0.018076    0.44595    0.040534       0.96769

    V2              -301.54     52.185     -5.7783    1.8779e-08

    V3              -9.8484     18.656    -0.52791       0.59795

    V4               7.0168     7.2464     0.96832       0.33366

    V5               7.6548     18.712     0.40908       0.68277

    V6               73.168     5.1483      14.212    1.8803e-35

Number of observations: 309, Error degrees of freedom: 302

Root Mean Squared Error: 11.8

R-squared: 0.402,  Adjusted R-Squared 0.39

F-statistic vs. constant model: 33.9, p-value = 3.44e-31

Hence, the linear regression model is,

X7 = 154.51 + 0.018V1 - 301.54V2 - 9.848V3 + 7.017V4 + 7.655V5 + 73.168V6.

The feval function in MATLAB evaluates the fitted model, returning the predicted value of X7 (residuary resistance per unit weight of displacement) for given values of the variables V1 to V6.
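
What evaluating the fitted model amounts to can be sketched as plugging predictor values into the estimated equation. The Python snippet below uses the coefficients from the fitlm output. The function name predict and the example predictor values are hypothetical, chosen only to illustrate the call.

```python
# Coefficients copied from the fitlm output (intercept, then V1..V6).
intercept = 154.51
coefs = [0.018076, -301.54, -9.8484, 7.0168, 7.6548, 73.168]

def predict(v):
    """Return the fitted response X7 for predictor values v = [V1, ..., V6]."""
    if len(v) != len(coefs):
        raise ValueError("expected six predictor values")
    return intercept + sum(c * vi for c, vi in zip(coefs, v))

# Hypothetical predictor values, for illustration only.
example = [-2.3, 0.568, 4.78, 3.99, 3.17, 0.30]
print(predict(example))
```

Because the model is linear, raising the Froude number V6 by 0.1 while holding the other predictors fixed changes the prediction by about 7.3.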

Here, a generalized linear model is fitted to the 'quality' variable of both the red wine data and the white wine data, assuming a Poisson distribution.

Red wine model:

MATLAB code with output:

% manually load winequalityred.csv from folder

modelspec = 'quality~fixedacidity +volatileacidity + citricacid + residualsugar + chlorides + freesulfurdioxide + totalsulfurdioxide + density + pH + sulphates + alcohol';

lm1 = fitglm(winequalityred,modelspec,'Distribution','poisson')  

Generalized linear regression model:

    quality ~ [Linear formula with 12 terms in 11 predictors]

    Distribution = Poisson 

Estimated Coefficients:

                           Estimate          SE         tStat       pValue   

    (Intercept)                3.6538         13.67     0.26728      0.78925

    fixedacidity            0.0036583      0.016633     0.21994      0.82592

    volatileacidity           -0.1977       0.08039     -2.4593     0.013921

    citricacid              -0.035923      0.096141    -0.37365      0.70866

    residualsugar           0.0026177      0.009736     0.26887      0.78803

    chlorides                -0.33176       0.27688     -1.1982      0.23084

    freesulfurdioxide      0.00082523     0.0014126     0.58418       0.5591

    totalsulfurdioxide    -0.00061063    0.00047979     -1.2727      0.20312

    density                   -2.1729        13.953    -0.15573      0.87624

    pH                      -0.074826       0.12406    -0.60317       0.5464

    sulphates                 0.15912      0.072618      2.1912     0.028434

    alcohol                   0.04815      0.016999      2.8325    0.0046188  

1599 observations, 1587 error degrees of freedom

Dispersion: 1

Chi^2-statistic vs. constant model: 66.1, p-value = 6.81e-10

White wine model:

MATLAB code with output:

% manually load winequalitywhite.csv from folder

modelspec = 'quality~fixedacidity +volatileacidity + citricacid + residualsugar + chlorides + freesulfurdioxide + totalsulfurdioxide + density + pH + sulphates + alcohol';

lm2 = fitglm(winequalitywhite,modelspec,'Distribution','poisson') 

Generalized linear regression model:

    quality ~ [Linear formula with 12 terms in 11 predictors]

    Distribution = Poisson 

Estimated Coefficients:

                           Estimate          SE         tStat        pValue   

    (Intercept)                28.094        11.144      2.5211      0.011698

    fixedacidity             0.012809      0.011881      1.0781         0.281

    volatileacidity          -0.33456      0.064234     -5.2085    1.9041e-07

    citricacid              0.0025292      0.053278    0.047471       0.96214

    residualsugar            0.014557     0.0043653      3.3347    0.00085393

    chlorides               -0.062667       0.31275    -0.20037       0.84119

    freesulfurdioxide      0.00062244    0.00046312       1.344       0.17894

    totalsulfurdioxide    -3.6945e-05    0.00021042    -0.17558       0.86063

    density                   -27.359        11.298     -2.4215      0.015457

    pH                         0.1235      0.059026      2.0922      0.036417

    sulphates                 0.10875      0.054501      1.9953      0.046011

    alcohol                   0.03036      0.014207       2.137      0.032594  

4898 observations, 4886 error degrees of freedom

Dispersion: 1

Chi^2-statistic vs. constant model: 185, p-value = 1.04e-33

In the white wine model the overall p value (1.04e-33) is very close to zero, which is less than the chosen significance level of 0.05. Hence, the model fits significantly better than the constant model. The independent variables with significant p values are volatileacidity, residualsugar, density, pH, sulphates and alcohol.

Similarly, the red wine model fits significantly better than the constant model, as its overall p value is 6.81e-10, which is less than the chosen significance level of 0.05.

Here, the independent variables with significant p values are volatileacidity, sulphates and alcohol.

So, the white wine model has more significant predictor variables than the red wine model.
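
The selection of significant predictors can be made mechanical by filtering the reported p values against the 0.05 level. This is a small Python sketch; the p values are copied from the two fitglm outputs (intercepts omitted), and the helper name significant is illustrative.

```python
ALPHA = 0.05  # significance level used in the text

# p values copied from the fitglm output for each model.
red_p = {
    "fixedacidity": 0.82592, "volatileacidity": 0.013921, "citricacid": 0.70866,
    "residualsugar": 0.78803, "chlorides": 0.23084, "freesulfurdioxide": 0.5591,
    "totalsulfurdioxide": 0.20312, "density": 0.87624, "pH": 0.5464,
    "sulphates": 0.028434, "alcohol": 0.0046188,
}
white_p = {
    "fixedacidity": 0.281, "volatileacidity": 1.9041e-07, "citricacid": 0.96214,
    "residualsugar": 0.00085393, "chlorides": 0.84119, "freesulfurdioxide": 0.17894,
    "totalsulfurdioxide": 0.86063, "density": 0.015457, "pH": 0.036417,
    "sulphates": 0.046011, "alcohol": 0.032594,
}

def significant(pvalues, alpha=ALPHA):
    """Return the predictor names whose p value is below alpha."""
    return [name for name, p in pvalues.items() if p < alpha]

print("red:  ", significant(red_p))
print("white:", significant(white_p))
```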

Cite This Work

My Assignment Help. (2021). Solved Exercises In Probability, Statistics, And Regression Analysis Essay.. Retrieved from https://myassignmenthelp.com/free-samples/enn543-data-analytics-and-optimisation/poisson-distribution.html.