Problem 1 (12 pts) – R is required. Consider the data in exam_data_SP20.csv. Load the data with the read.csv() function, using the correct path to the file.
The data concern fruit flies. The variables are thorax, a measurement of thorax length; longevity, the lifetime of the fly in days; and activity, the activity group of the fly. The groups are: isolated = fly kept solitary, one = fly kept with one pregnant fruit fly, many = fly kept with eight pregnant fruit flies, low = fly kept with one virgin fruit fly, high = fly kept with eight virgin fruit flies.
i. Determine the average lifespan for flies in the different activity groups.
ii. Make a plot of longevity versus thorax length. Use different symbols and colors for the different activity groups (whatever you like; just make sure they are visible).
An experiment was performed in Sweden in 1962 to assess the effect of a speed limit on the motorway accident rate. The experiment was conducted on 92 days in each year, matched so that day j in 1962 was comparable to day j in 1961. On some days the speed limit was in effect and enforced, while on other days there was no speed limit and cars tended to be driven faster. The speed limit days tended to be in contiguous blocks.
Load the MASS library in R.
Within this library is a dataset called Traffic (type help(Traffic) to read more about it). Typing data(Traffic) makes it show up in your environment, but it is always available simply by typing Traffic once the MASS library has been loaded.
From the Traffic dataset, make a side-by-side boxplot comparing the traffic accident counts on days when a speed limit was enforced versus days when it was not. Overlay side-by-side stripcharts (with jitter) of the counts for each limit case on top of the boxplot. Use different colors for the boxplots and stripcharts (whatever you like, so long as we can see the plot elements).
Does enforcing a speed limit appear to reduce the number of traffic accidents?
A survey of engineering firms reveals that 80% have their own mainframe computer (M), 10% anticipate purchasing a mainframe computer in the near future (B), and 5% have a mainframe computer and anticipate buying another in the near future.
Find the probability that a randomly selected firm has a mainframe computer or anticipates purchasing one in the near future.
Find the probability that a randomly selected firm does not have a mainframe computer and does not anticipate purchasing one in the near future.
Find the probability that a randomly selected firm anticipates purchasing a mainframe computer given that it does not currently have one.
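A minimal Python sketch for checking the three survey answers, assuming the stated percentages are exact probabilities. It uses inclusion-exclusion for the union, the complement for "neither", and P(B and not M) = P(B) − P(M and B) for the conditional. The variable names are illustrative only.

```python
# Event probabilities from the survey: M = has a mainframe, B = plans to buy one.
p_m = 0.80    # P(M)
p_b = 0.10    # P(B)
p_mb = 0.05   # P(M and B)

p_union = p_m + p_b - p_mb                   # P(M or B), inclusion-exclusion
p_neither = 1 - p_union                      # P(not M and not B), by De Morgan's law
p_b_given_not_m = (p_b - p_mb) / (1 - p_m)   # P(B | not M) = P(B and not M) / P(not M)

print(p_union, p_neither, p_b_given_not_m)
```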
In a medical clinic, doctors A, B, and C receive 20%, 30%, and 50% of incoming new patients, respectively. If a new patient sees doctor A, the doctor correctly diagnoses the patient's problem 50% of the time. For doctors B and C, the correct-diagnosis rates are 60% and 70%, respectively.
Given that a patient received a correct diagnosis, what is the probability that the patient saw doctor C?
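One way to check this is with the law of total probability followed by Bayes' rule. The sketch below is a Python cross-check, not an official solution; the dictionary names are hypothetical.

```python
# Hypothetical names: prior = share of new patients, p_correct = diagnosis accuracy.
prior = {"A": 0.20, "B": 0.30, "C": 0.50}
p_correct = {"A": 0.50, "B": 0.60, "C": 0.70}

# Law of total probability: P(correct) = sum over doctors of P(doctor) * P(correct | doctor)
p_total = sum(prior[d] * p_correct[d] for d in prior)

# Bayes' rule: P(C | correct) = P(C) * P(correct | C) / P(correct)
posterior_c = prior["C"] * p_correct["C"] / p_total
print(round(p_total, 4), round(posterior_c, 4))
```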
In an inspection of automobiles in Los Angeles, 60% of all automobiles had emissions that did not meet EPA regulations.
Suppose that we sample 10 automobiles at random. Answer the following:
What is the probability that all 10 failed the inspection? What is the probability that all 10 passed the inspection?
What is the probability that six or more failed the inspection?
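Assuming the 10 cars fail independently, each with probability 0.6, the number of failures Y is Binomial(10, 0.6). A minimal Python sketch that writes out the binomial pmf with `math.comb` and sums it:

```python
from math import comb

N, P_FAIL = 10, 0.6  # sample size and per-car failure probability

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_all_fail = binom_pmf(N, N, P_FAIL)   # all 10 fail: 0.6^10
p_all_pass = binom_pmf(0, N, P_FAIL)   # none fail, i.e. all 10 pass: 0.4^10
p_six_or_more = sum(binom_pmf(k, N, P_FAIL) for k in range(6, N + 1))
print(p_all_fail, p_all_pass, round(p_six_or_more, 4))
```

In R the same quantities come from dbinom() and pbinom(); the explicit pmf here just makes the formula visible.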
Suppose that, instead of 60% of automobiles failing to meet EPA standards, the rate were 30%. For a larger sample of n = 1000, determine P(Y ≤ 100), where Y is the number of automobiles that fail. You MAY NOT use pbinom() to answer this question.
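Since pbinom() is ruled out, one common route is the normal approximation to the binomial with a continuity correction. The Python sketch below shows that approach under the usual independence assumption and, only as a cross-check, also sums the binomial pmf term by term. Both give a probability that is essentially zero, because 100 lies about 14 standard deviations below the mean of 300.

```python
import math

n, p = 1000, 0.3
mu = n * p                              # mean of Y: 300
sigma = math.sqrt(n * p * (1 - p))      # sd of Y: sqrt(210), about 14.49

# Normal approximation with a continuity correction: P(Y <= 100) ~ Phi((100.5 - mu) / sigma)
z = (100 + 0.5 - mu) / sigma
# Phi(z) = 0.5 * erfc(-z / sqrt(2)); erfc keeps precision far in the lower tail,
# where 0.5 * (1 + erf(...)) would round to exactly 0.
approx = 0.5 * math.erfc(-z / math.sqrt(2))

# Cross-check by summing the binomial pmf directly (no pbinom-style CDF call).
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(101))
print(round(z, 2), approx, exact)
```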
Once again, assume the failure rate is 30%.
What is the probability that it takes more than 4 cars until we find an automobile that fails inspection? On average, how many automobiles do you expect to inspect before the first automobile fails inspection?
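The waiting time to the first failing car is geometric, but the answer depends on which convention you use for counting. The Python sketch below computes both, assuming independent cars with failure probability 0.3: convention A counts the failing car itself, and convention B counts only the passing cars before it (the convention R's dgeom()/pgeom() use).

```python
p = 0.3  # probability that a given car fails inspection

# Convention A: X = number of cars inspected up to and including the first failure.
# X > 4 means the first 4 cars all pass.
p_more_than_4_a = (1 - p) ** 4
mean_a = 1 / p                      # expected number of inspections, including the failing car

# Convention B (R's dgeom/pgeom): X = number of passing cars before the first failure.
# X > 4 means the first 5 cars all pass.
p_more_than_4_b = (1 - p) ** 5
mean_b = (1 - p) / p                # expected number of passing cars before the first failure

print(p_more_than_4_a, mean_a, p_more_than_4_b, mean_b)
```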
The pH level, a measure of acidity, is important in studies of acid rain. For a certain Florida lake, baseline measurements of acidity are made so that any changes caused by acid rain can be noted. The pH for water samples from the lake is a random variable X, with probability density function
$$f(x) = \begin{cases} k(7 - x)^2, & 5 \le x \le 7 \\ 0, & \text{elsewhere} \end{cases}$$
Find the value of k such that f(x) is a valid probability density function for X, then find the cumulative distribution function F(x) for X. Clearly show both the final f(x) and F(x) in your final answer.
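A worked derivation of the normalizing constant and the CDF, offered as one way to check the integration rather than as an official solution. Setting the total integral to 1 gives k, and integrating the density from 5 to x gives F:

```latex
\int_5^7 k(7-x)^2\,dx = k\left[-\frac{(7-x)^3}{3}\right]_5^7 = \frac{8k}{3} = 1
\quad\Longrightarrow\quad k = \frac{3}{8}

f(x) = \begin{cases} \tfrac{3}{8}(7-x)^2, & 5 \le x \le 7 \\ 0, & \text{elsewhere} \end{cases}

F(x) = \int_5^x \tfrac{3}{8}(7-t)^2\,dt = \begin{cases}
0, & x < 5 \\
1 - \dfrac{(7-x)^3}{8}, & 5 \le x \le 7 \\
1, & x > 7
\end{cases}
```

As a quick check, F(5) = 1 − 8/8 = 0 and F(7) = 1.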
Find the probability that the pH of a water sample from this lake will be less than 5.5 given that it is known to be less than 6.
What is the 70th percentile of the pH distribution?
Suppose $H(X) = 3X^2 - X + 2$. Find $E[H(X)]$.
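A Python sketch for checking the remaining pH answers numerically, assuming k = 3/8 (from the normalization 8k/3 = 1). The conditional probability is F(5.5)/F(6), the 70th percentile solves F(x) = 0.7, and E[H(X)] is approximated with a hand-written composite Simpson's rule. The helper names `pdf`, `cdf`, and `integrate` are illustrative, not from the exam.

```python
K = 3 / 8  # assumed from the normalization: integral of k(7 - x)^2 over [5, 7] = 8k/3 = 1

def pdf(x):
    """Density f(x) = K (7 - x)^2 on [5, 7], 0 elsewhere."""
    return K * (7 - x) ** 2 if 5 <= x <= 7 else 0.0

def cdf(x):
    """F(x) = 1 - (7 - x)^3 / 8 on [5, 7]."""
    if x < 5:
        return 0.0
    if x > 7:
        return 1.0
    return 1 - (7 - x) ** 3 / 8

def integrate(g, a, b, n=10_000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

p_cond = cdf(5.5) / cdf(6)              # P(X < 5.5 | X < 6) = F(5.5) / F(6)
p70 = 7 - (8 * (1 - 0.7)) ** (1 / 3)    # solve 1 - (7 - x)^3 / 8 = 0.7 for x
e_h = integrate(lambda x: (3 * x**2 - x + 2) * pdf(x), 5, 7)  # E[3X^2 - X + 2]
print(round(p_cond, 4), round(p70, 4), round(e_h, 4))
```

By hand, E[X] = 5.5 and E[X²] = 30.4, so E[H(X)] = 3(30.4) − 5.5 + 2, which the numerical integral should reproduce.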
Let X denote the lag time in the manufacturing queue at a particular factory. That is, X is the difference between the time at which a person requests a certain product and the time at which the manufacturing process for that product begins. Assume that X is normally distributed with mean 15 hours and variance 25.
Find the probability that the lag time for a randomly selected product is less than 3 hours. What proportion of orders would have a lag time exceeding 30 hours?
The factory is very unhappy with “excessive lag times”, that is, lag times greater than the 82nd percentile. Which lag time does this correspond to?
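A Python cross-check for the normal-distribution parts, using the standard library's `statistics.NormalDist` (the analogue of R's pnorm()/qnorm()). Note that the variance is 25, so the standard deviation passed in is 5.

```python
from statistics import NormalDist

lag = NormalDist(mu=15, sigma=5)        # variance 25, so sd = 5 hours

p_under_3 = lag.cdf(3)                  # P(X < 3) = Phi(-2.4)
p_over_30 = 1 - lag.cdf(30)             # P(X > 30) = 1 - Phi(3)
excessive_cutoff = lag.inv_cdf(0.82)    # 82nd percentile of the lag time
print(round(p_under_3, 4), round(p_over_30, 5), round(excessive_cutoff, 2))
```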
What is the probability that the number of orders it takes to observe two excessive lag times is 10 or fewer?
Suppose the factory is striving for lag times of less than 12 hours. What is the probability that the lag time is less than 12 hours for 5 of the next 12 orders?
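Because "excessive" means above the 82nd percentile, each order is excessive with probability 0.18. Assuming orders are independent, the order on which the second excessive lag appears is negative binomial. The Python sketch below sums that pmf and checks it against the equivalent binomial event "at least 2 excessive lags in the first 10 orders".

```python
from math import comb

p = 0.18  # P(excessive lag) = 1 - 0.82, by definition of the 82nd percentile

# Negative binomial: P(second excessive lag occurs on order n) = C(n-1, 1) p^2 (1-p)^(n-2)
p_within_10 = sum(comb(n - 1, 1) * p**2 * (1 - p) ** (n - 2) for n in range(2, 11))

# Equivalent view: at least 2 excessive lags among the first 10 orders (binomial).
p_at_least_2_of_10 = 1 - (1 - p) ** 10 - 10 * p * (1 - p) ** 9
print(round(p_within_10, 4), round(p_at_least_2_of_10, 4))
```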
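A Python sketch for the last lag-time part, assuming the 12 orders are independent and reading "5 of the next 12" as exactly 5. First the per-order probability P(X < 12) comes from the normal model, then that probability feeds a Binomial(12, p) pmf.

```python
from math import comb
from statistics import NormalDist

p = NormalDist(mu=15, sigma=5).cdf(12)   # P(lag < 12) = Phi(-0.6), about 0.274

# Binomial(12, p): probability that exactly 5 of the 12 orders have lag < 12 hours
p_exactly_5 = comb(12, 5) * p**5 * (1 - p) ** 7
print(round(p, 4), round(p_exactly_5, 4))
```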
The number of traffic accidents at a certain intersection is thought to be well modeled by a Poisson process with a mean of 3 accidents per year.
What is the probability that the number of accidents is greater than 4 in a given year?
Find the mean waiting time between accidents. Find the standard deviation of the waiting times between accidents.
If no accidents have occurred within the last seven months, what is the probability that an accident will occur within the next month?
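A Python sketch for the Poisson-process questions, assuming a rate of 3 accidents per year. The count in one year is Poisson(3), the waiting time between accidents is exponential with rate 3 per year, and memorylessness means the seven accident-free months do not change the next-month probability.

```python
import math

LAM = 3.0  # mean accidents per year

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

p_more_than_4 = 1 - sum(poisson_pmf(k, LAM) for k in range(5))  # 1 - P(N <= 4)

# Waiting times between events of a Poisson process are Exponential(rate = LAM),
# whose mean and standard deviation are both 1 / LAM.
mean_wait_years = 1 / LAM
sd_wait_years = 1 / LAM

# Memorylessness: the seven accident-free months do not matter.
# P(at least one accident in the next month) = 1 - exp(-LAM * 1/12)
p_next_month = 1 - math.exp(-LAM / 12)
print(round(p_more_than_4, 4), mean_wait_years * 12, round(p_next_month, 4))
```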
Let X denote the air temperature (°C) and let Y denote the time in minutes, after the engine is turned on, until the diesel engine on a Semi-Truck is ready to use. (Diesel engines, particularly in large trucks, work better when they are “warmed up,” unlike gasoline/petrol engines, which require no warm-up. Seriously, there is no need to pre-warm a regular gasoline-engine car, even in winter.)
Assume that the joint density of (X, Y) is given by $f_{XY}(x, y) = c(4x + 2y + 1)$ for $0 \le x \le 40$, $0 \le y \le 2$.
Find the value of c that makes this a valid probability density. Then find the marginal densities for X and Y.
Are the variables independent? This is not a simple yes-or-no question; you must prove why or why not using the statistical definition of independence.
Find the probability that on a randomly selected day the air temperature will exceed 20 °C and it will take at least 1 minute for the Semi-Truck to be ready to use.
Find the probability that on a randomly selected day it will take at least 30 seconds for the Semi-Truck to be ready to use.
What is the average air temperature?
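A Python sketch that checks the joint-density answers with exact rational arithmetic (`fractions.Fraction`), assuming the density is zero outside the stated rectangle. The hypothetical helper `box_integral` integrates 4x + 2y + 1 over a rectangle in closed form; the marginals come from integrating out the other variable, and a single point where the joint density differs from the product of marginals is enough to show dependence.

```python
from fractions import Fraction

def box_integral(x0, x1, y0, y1):
    """Exact integral of 4x + 2y + 1 over the rectangle [x0, x1] x [y0, y1]."""
    dx, dy = x1 - x0, y1 - y0
    return 2 * (x1**2 - x0**2) * dy + (y1**2 - y0**2) * dx + dx * dy

# Normalizing constant: c times the integral over the full support must equal 1.
c = Fraction(1, box_integral(0, 40, 0, 2))

def f_joint(x, y):
    return c * (4 * x + 2 * y + 1)

def f_x(x):
    return c * (8 * x + 6)        # integrate out y over [0, 2]

def f_y(y):
    return c * (80 * y + 3240)    # integrate out x over [0, 40]

# Independence requires f_joint = f_x * f_y everywhere; one counterexample suffices.
independent_at_origin = f_joint(0, 0) == f_x(0) * f_y(0)

p_hot_and_slow = c * box_integral(20, 40, 1, 2)              # P(X > 20, Y >= 1)
p_at_least_30s = c * box_integral(0, 40, Fraction(1, 2), 2)  # P(Y >= 0.5 min)
mean_x = c * (Fraction(8, 3) * 40**3 + 3 * 40**2)            # E[X] = integral of x * f_x(x)

print(c, independent_at_origin, p_hot_and_slow, p_at_least_30s, float(mean_x))
```

Note that 30 seconds is entered as Y ≥ 1/2, since Y is measured in minutes.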