# FIT5197 Statistical Data Modelling

• Course Code: FIT5197
• University: Monash University
• Country: Australia

## Question:

Build a linear regression model using the “auto mpg train.csv” file provided with the assignment to predict mpg (miles per gallon). The second file, “auto mpg test.csv”, will be used for evaluation.

There are some missing values listed as “?”. Describe your strategy for treating missing values and update (edit by hand) the file accordingly.
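One possible strategy is to replace each “?” in a numeric column with the median of the observed values in that column. The sketch below only illustrates that idea: the column names (`mpg`, `horsepower`) and the rows are made up for the example and are not taken from the assignment file.

```python
import statistics

MISSING = "?"  # marker the question says is used for missing values


def impute_median(rows, column):
    """Return copies of rows with '?' in `column` replaced by the column median."""
    observed = [float(r[column]) for r in rows if r[column] != MISSING]
    median = statistics.median(observed)
    return [
        {**r, column: median if r[column] == MISSING else float(r[column])}
        for r in rows
    ]


# Made-up rows for illustration only (hypothetical column names).
rows = [
    {"mpg": "18.0", "horsepower": "130"},
    {"mpg": "25.0", "horsepower": "?"},
    {"mpg": "32.0", "horsepower": "70"},
    {"mpg": "21.0", "horsepower": "100"},
]
cleaned = impute_median(rows, "horsepower")
print(cleaned[1])
```

The median is less sensitive to extreme values than the mean. Dropping the affected rows is a simpler alternative when only a few values are missing.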

Can you improve your model with different predictors?

Try out some different ratios or products of the better predictor variables. How will you evaluate the different alternative predictors on your existing model (not using the test set)?
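One way to compare candidate predictors without touching the test set is k-fold cross-validation on the training data. The sketch below is a simplified illustration: it fits a one-predictor least-squares line per candidate, whereas the actual model will likely use several predictors. The helper names (`fit_simple`, `cv_rmse`) and all data values are made up for the example.

```python
import statistics


def fit_simple(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cv_rmse(xs, ys, k=3):
    """Cross-validated RMSE; fold i holds out indices i, i + k, i + 2k, ..."""
    sq_errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_simple([xs[i] for i in train], [ys[i] for i in train])
        sq_errors += [(ys[i] - (a + b * xs[i])) ** 2 for i in held]
    return (sum(sq_errors) / len(sq_errors)) ** 0.5


# Made-up training values, not taken from the assignment data.
weight = [3504, 3693, 3436, 2372, 2130, 1835, 2672, 2430, 4354, 2234, 1985, 3139]
horsepower = [130, 165, 150, 95, 88, 60, 90, 90, 215, 95, 70, 105]
mpg = [18, 15, 18, 24, 27, 31, 22, 25, 14, 26, 30, 19]

candidates = {
    "weight": weight,
    "horsepower / weight": [h / w for h, w in zip(horsepower, weight)],
    "horsepower * weight": [h * w for h, w in zip(horsepower, weight)],
}
scores = {name: cv_rmse(x, mpg) for name, x in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

A lower cross-validated RMSE suggests the candidate generalises better. Adjusted R² or AIC computed on the training set are alternative criteria that also avoid the test file.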

There are some missing values listed as “?”. Describe your strategy for treating missing values, but note that it is sometimes OK to leave a missing value as a separate categorical value (we call this “missing informative”).
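As an illustration of the “missing informative” idea, a “?” in a categorical column can be kept as its own level before the column is turned into indicator variables. The function name `one_hot` and the example values are hypothetical.

```python
def one_hot(values, missing_token="?"):
    """Encode categories as 0/1 indicators, keeping missing as its own level."""
    labels = ["missing" if v == missing_token else v for v in values]
    levels = sorted(set(labels))
    encoded = [[1 if lab == lvl else 0 for lvl in levels] for lab in labels]
    return levels, encoded


# Made-up categorical values for illustration only.
levels, encoded = one_hot(["a", "?", "b"])
print(levels)
print(encoded)
```

The model can then estimate a separate coefficient for the "missing" level, which only helps if the fact that a value is missing carries information about the response.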

Consider a binomial distribution with n=500 and θ=0.001. Use the appropriate CDF functions in R to compute p(k<10 | n=500, θ=0.001):

(a) the exact value

(b) the value according to the Gaussian approximation in lectures

(c) the value according to the Poisson approximation in lectures

(d) write down a concise formula for the exact value.
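The three quantities can be cross-checked numerically. The sketch below uses only Python's standard library, with the corresponding R calls noted in comments. Since k is an integer, k<10 is the same as k≤9, so the exact value is the sum over k=0..9 of C(500,k) θ^k (1−θ)^(500−k). The continuity correction in the Gaussian step is an assumption, because the lecture version of that approximation is not shown here.

```python
import math

n, theta = 500, 0.001
k_max = 9  # k < 10 is equivalent to k <= 9 for an integer count

# (a) Exact binomial CDF; in R: pbinom(9, 500, 0.001)
exact = sum(
    math.comb(n, k) * theta**k * (1 - theta) ** (n - k)
    for k in range(k_max + 1)
)


def normal_cdf(x, mean, sd):
    """Gaussian CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))


# (b) Gaussian with matching mean and variance; in R: pnorm(9.5, mu, sigma)
mu = n * theta
sigma = math.sqrt(n * theta * (1 - theta))
gauss = normal_cdf(k_max + 0.5, mu, sigma)  # +0.5 continuity correction (assumption)

# (c) Poisson with rate n * theta; in R: ppois(9, 0.5)
lam = n * theta
poisson = sum(
    math.exp(-lam) * lam**k / math.factorial(k) for k in range(k_max + 1)
)

print(exact, gauss, poisson)
```

With nθ = 0.5 the expected count is well below 10, so all three values are very close to 1. Looking at the complements (1 minus each value) makes the differences between the approximations easier to see.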

IQ is supposed to be Gaussian with a mean of 100 and a standard deviation of 15. At a high-school reunion, where everyone attends, 2 of your classmates out of 40 claim to have IQs greater than 150. What is the probability that 2 or more would have an IQ greater than 140? Represent your solution as an expression in θ=p(IQ>140) and give θ.
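Assuming the 40 classmates are independent draws from the stated Gaussian, the number with IQ above 140 is binomial with n=40 and success probability θ, so p(K≥2) = 1 − (1−θ)^40 − 40θ(1−θ)^39. The sketch below evaluates θ and this expression with Python's standard library. The independence assumption and the helper name `normal_sf` are not from the question.

```python
import math


def normal_sf(x, mean, sd):
    """Upper-tail Gaussian probability P(X > x); in R: 1 - pnorm(x, mean, sd)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


theta = normal_sf(140, 100, 15)  # theta = p(IQ > 140)
n = 40

# P(K >= 2) = 1 - P(K = 0) - P(K = 1) for K ~ Binomial(n, theta)
p_two_or_more = 1 - (1 - theta) ** n - n * theta * (1 - theta) ** (n - 1)

print(theta, p_two_or_more)
```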

