74 Dance of the p-values
A researcher is performing a behavioral assay in which she measures the time it takes a mouse to traverse a tube and tap a lever to get a sugar pellet. She does this for wild type mice and for mutant mice lacking a gene for an olfactory receptor.
She obtains the following results, with all times in seconds.
```python
import numpy as np

# Time (s) to traverse the tube and tap the lever, wild type mice
wt = np.array([12.1,  9.8, 15.2, 11.5,  7.6,
                6.9, 10.9,  8.4, 10.1, 13.2,
               10.2, 10.8,  8.0, 11.3,  9.3,
                8.8,  8.9, 10.7, 10.5, 16.6])

# Time (s) for mutant mice lacking the olfactory receptor gene
mut = np.array([ 9.9,  9.2, 11.7, 11.5, 17.6,
                17.1,  8.0, 12.3, 12.3, 14.6,
                15.2,  7.4, 10.0, 13.0, 13.3,
                10.6, 12.2, 14.3, 14.5, 15.4])
```
a) Perform whatever exploratory data analysis (EDA) you think appropriate. (Remember, always start with EDA!)
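Part (a) leaves the choice of EDA open. One common choice for small samples like these is to look at the empirical cumulative distribution function (ECDF) of each data set, which shows every measurement without binning. The following is a minimal NumPy-only sketch; the helper name `ecdf` is hypothetical and not from the source, and the actual plotting is left to whatever library you prefer.

```python
import numpy as np

# Data from the problem statement (times in seconds)
wt = np.array([12.1, 9.8, 15.2, 11.5, 7.6, 6.9, 10.9, 8.4, 10.1, 13.2,
               10.2, 10.8, 8.0, 11.3, 9.3, 8.8, 8.9, 10.7, 10.5, 16.6])
mut = np.array([9.9, 9.2, 11.7, 11.5, 17.6, 17.1, 8.0, 12.3, 12.3, 14.6,
                15.2, 7.4, 10.0, 13.0, 13.3, 10.6, 12.2, 14.3, 14.5, 15.4])


def ecdf(data):
    """Hypothetical helper: x and y values of the ECDF of `data`.

    x holds the sorted measurements; y[k] is the fraction of
    measurements less than or equal to x[k].
    """
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Quick numerical summary alongside the ECDF values
for label, data in (("wild type", wt), ("mutant", mut)):
    x, y = ecdf(data)
    print(f"{label}: n = {len(data)}, median = {np.median(data):.2f} s, "
          f"range = [{x[0]:.1f}, {x[-1]:.1f}] s")
```

With Matplotlib, for instance, `plt.plot(x, y, marker='.', linestyle='none')` would draw each ECDF as a set of dots, and overlaying the wild type and mutant ECDFs makes any shift between them easy to see.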
b) Compute an estimate of the mean time for the mutant and wild type mice, and compute a confidence interval for each. Make a display of the results.
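The problem does not specify how to build the confidence interval. One option that makes no assumption about the shape of the distribution is a percentile bootstrap: resample each data set with replacement many times, compute the mean of every resample, and take the middle 95% of those means. The sketch below assumes that method; the helper name `bootstrap_reps_mean`, the number of replicates, and the seed are illustrative choices, not from the source.

```python
import numpy as np

wt = np.array([12.1, 9.8, 15.2, 11.5, 7.6, 6.9, 10.9, 8.4, 10.1, 13.2,
               10.2, 10.8, 8.0, 11.3, 9.3, 8.8, 8.9, 10.7, 10.5, 16.6])
mut = np.array([9.9, 9.2, 11.7, 11.5, 17.6, 17.1, 8.0, 12.3, 12.3, 14.6,
                15.2, 7.4, 10.0, 13.0, 13.3, 10.6, 12.2, 14.3, 14.5, 15.4])


def bootstrap_reps_mean(data, rng, n_reps=10_000):
    """Hypothetical helper: means of `n_reps` bootstrap samples of `data`.

    Each row of `samples` is one resample of len(data) values drawn
    with replacement from the original measurements.
    """
    samples = rng.choice(data, size=(n_reps, len(data)), replace=True)
    return samples.mean(axis=1)


rng = np.random.default_rng(3252)  # arbitrary seed, for reproducibility

for label, data in (("wild type", wt), ("mutant", mut)):
    reps = bootstrap_reps_mean(data, rng)
    # Percentile bootstrap: middle 95% of the bootstrap means
    low, high = np.percentile(reps, [2.5, 97.5])
    print(f"{label}: mean = {data.mean():.2f} s, "
          f"95% CI = [{low:.2f}, {high:.2f}] s")
```

The point estimates are the plain sample means (10.54 s for wild type and 12.505 s for mutant). The intervals can then be displayed, for example, as error bars with Matplotlib's `plt.errorbar`.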
c) Perform a null hypothesis significance test (NHST) of your choice and report a p-value.
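As one possible choice of test (the problem leaves it open), here is a minimal sketch of a two-sided permutation test. The null hypothesis is that wild type and mutant times come from the same distribution, and the test statistic is the absolute difference of the sample means. Under the null, the group labels are arbitrary, so we shuffle the pooled data many times and ask how often a random relabeling gives a statistic at least as extreme as the observed one. The function name `perm_test_diff_means` and its defaults are hypothetical.

```python
import numpy as np

wt = np.array([12.1, 9.8, 15.2, 11.5, 7.6, 6.9, 10.9, 8.4, 10.1, 13.2,
               10.2, 10.8, 8.0, 11.3, 9.3, 8.8, 8.9, 10.7, 10.5, 16.6])
mut = np.array([9.9, 9.2, 11.7, 11.5, 17.6, 17.1, 8.0, 12.3, 12.3, 14.6,
                15.2, 7.4, 10.0, 13.0, 13.3, 10.6, 12.2, 14.3, 14.5, 15.4])


def perm_test_diff_means(x, y, n_perm=10_000, rng=None):
    """Hypothetical helper: two-sided permutation test p-value.

    Null hypothesis: x and y are drawn from the same distribution.
    Test statistic: |mean(y) - mean(x)|.
    """
    rng = np.random.default_rng() if rng is None else rng
    observed = abs(y.mean() - x.mean())
    pooled = np.concatenate((x, y))
    # Each row is an independent random relabeling of the pooled data
    perms = rng.permuted(np.tile(pooled, (n_perm, 1)), axis=1)
    perm_stats = np.abs(perms[:, len(x):].mean(axis=1)
                        - perms[:, :len(x)].mean(axis=1))
    # Fraction of relabelings at least as extreme as the observed data
    return np.mean(perm_stats >= observed)


rng = np.random.default_rng(3252)  # arbitrary seed, for reproducibility
p = perm_test_diff_means(wt, mut, rng=rng)
print(f"observed difference of means = {mut.mean() - wt.mean():.3f} s")
print(f"permutation test p-value = {p:.4f}")
```

Because the p-value is estimated from a finite number of permutations, its resolution is about 1/`n_perm`; rerunning with a different seed will change it slightly.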
d) You are momentarily omniscient. Let \(x_i^\mathrm{wt}\) be the measured time it takes for wild type mouse \(i\) to complete the task and \(x_i^\mathrm{mut}\) be the measured time it takes for mutant mouse \(i\) to complete the task. You know that the true generative distributions are
\[\begin{aligned} &x_i^\mathrm{wt} \sim \text{Norm}(10, 3),\\[1em] &x_i^\mathrm{mut} \sim \text{Norm}(13, 4). \end{aligned}\]
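Assuming the second parameter of Norm is the standard deviation (the problem does not say so explicitly), the difference between the sample means of 20 mutant and 20 wild type mice drawn from these distributions is itself normally distributed, with the two variances of the means adding:

\[\bar{x}^\mathrm{mut} - \bar{x}^\mathrm{wt} \sim \text{Norm}\left(13 - 10,\ \sqrt{\frac{3^2}{20} + \frac{4^2}{20}}\right) = \text{Norm}\left(3,\ \sqrt{1.25}\right) \approx \text{Norm}(3,\ 1.12).\]

So the observed difference in means can easily wander by a second or more from one data set to the next, even though the true difference is fixed at 3 s.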
Generate 1,000 new data sets from the true generative distributions. That is, do the following 1,000 times: create an array of 20 wild type values drawn from Norm(10, 3) and an array of 20 mutant values drawn from Norm(13, 4). For each data set, perform the same hypothesis test you did in part (c). Make a plot of all the p-values you get. What do you think about the result?
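A minimal sketch of the simulation, assuming, purely for illustration, that the test chosen in part (c) is a two-sided permutation test on the difference of means, and that Norm(μ, σ) is parameterized by mean and standard deviation (matching NumPy's `rng.normal(loc, scale, size)`). The helper `perm_test_diff_means` is hypothetical, and only 1,000 permutations per data set are used to keep the run time short, so each p-value has a resolution of about 0.001.

```python
import numpy as np


def perm_test_diff_means(x, y, n_perm=1000, rng=None):
    """Hypothetical helper: two-sided permutation test on |mean(y) - mean(x)|."""
    rng = np.random.default_rng() if rng is None else rng
    observed = abs(y.mean() - x.mean())
    pooled = np.concatenate((x, y))
    # Each row is an independent random relabeling of the pooled data
    perms = rng.permuted(np.tile(pooled, (n_perm, 1)), axis=1)
    perm_stats = np.abs(perms[:, len(x):].mean(axis=1)
                        - perms[:, :len(x)].mean(axis=1))
    return np.mean(perm_stats >= observed)


rng = np.random.default_rng(3252)  # arbitrary seed, for reproducibility
n_datasets = 1000
n_mice = 20

p_vals = np.empty(n_datasets)
for i in range(n_datasets):
    # Draw a fresh data set from the true generative distributions
    wt_sim = rng.normal(10, 3, size=n_mice)
    mut_sim = rng.normal(13, 4, size=n_mice)
    p_vals[i] = perm_test_diff_means(wt_sim, mut_sim, rng=rng)

# ECDF of the p-values, ready for plotting
p_sorted = np.sort(p_vals)
ecdf_y = np.arange(1, n_datasets + 1) / n_datasets

print(f"median p-value: {np.median(p_vals):.4f}")
print(f"fraction of p-values below 0.05: {np.mean(p_vals < 0.05):.3f}")
```

Plotting `ecdf_y` against `p_sorted` (for example with `plt.plot`, and `plt.xscale('log')` since the p-values span several orders of magnitude) shows the full spread of p-values obtained from data sets that all share the same true difference in means.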