Each line on this chart shows the probability that we would find our treatment had an effect significant at a given level, depending on the size of our treatment group. But why are there sudden drops in the probability that we'd find a statistically significant effect? And why, for so many sample sizes, does it not appear to matter whether we're seeking an effect significant at the 5% or 10% level?
The answer is that this is the wrong chart to draw in this situation. If we expected only 1 in 2000 observations from the control group to be a success, and 1 in 500 observations from the treatment group to be a success, a simple application of the definition of significance levels gives the graph above. But besides looking weird, it is misleading. Notice where it implies that we'd have about a 30% chance of concluding our treatment works with a sample size of about 225? That's just the probability we'd get at least one success in 225 trials.
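For concreteness, here is a rough sketch of that "simple application" (in Python with SciPy, which may or may not be what produced the original charts; the function names are my own): for each treatment-group size n, find the smallest success count that an exact one-sided binomial test against the control rate of 1 in 2000 would call significant at a given level, then ask how likely we are to see at least that many successes when the true treatment rate is 1 in 500.

```python
from scipy.stats import binom

P_CONTROL = 1 / 2000   # assumed success rate under the null (control group)
P_TREATMENT = 1 / 500  # assumed success rate in the treatment group

def successes_needed(n, alpha, p0=P_CONTROL):
    """Smallest k such that P(Binomial(n, p0) >= k) <= alpha."""
    k = 1
    while binom.sf(k - 1, n, p0) > alpha:  # sf(k-1) = P(X >= k)
        k += 1
    return k

def prob_significant(n, alpha, p1=P_TREATMENT):
    """Chance the treatment group yields a result significant at level alpha."""
    k = successes_needed(n, alpha)
    return binom.sf(k - 1, n, p1)

for n in (100, 225, 500, 1000, 2000):
    print(n, round(prob_significant(n, 0.05), 3), round(prob_significant(n, 0.10), 3))
```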
This chart, where each line shows the probability of getting at least a given number of successes, depending on the size of the treatment group, is much easier to read and much more illuminating. Even if a single success in the treatment group would technically be statistically significant, our study would be much more persuasive if we chose a sample size that let us expect to find at least two or three successes.
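The curves on this second chart are just upper tails of a binomial distribution. A minimal sketch, again assuming a 1-in-500 treatment success rate:

```python
from scipy.stats import binom

P_TREATMENT = 1 / 500  # assumed treatment-group success rate

def prob_at_least(k, n, p=P_TREATMENT):
    """P(at least k successes in n trials) = binom.sf(k - 1, n, p)."""
    return binom.sf(k - 1, n, p)

for n in (500, 1000, 1500, 2000, 3000):
    print(n, [round(prob_at_least(k, n), 3) for k in (1, 2, 3)])
```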
Overlaid, the charts are even more useful. They show the probability of getting a certain number of successes at a given sample size, together with the level of statistical significance that would imply. And they answer our questions about the first chart: the drops come when a particular significance level starts requiring us to find an additional success, and the stretches where two significance levels look identical are the sample sizes at which both require the same number of successes.
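One way to see this mechanically, under the same assumptions as the sketches above, is to print the success count each significance level demands as the sample size grows: every time a required count ticks up by one, that curve drops, and wherever the 5% and 10% requirements coincide, the two curves coincide too.

```python
from scipy.stats import binom

P_CONTROL, P_TREATMENT = 1 / 2000, 1 / 500  # assumed success rates

def threshold(n, alpha):
    """Smallest success count significant at level alpha for sample size n."""
    k = 1
    while binom.sf(k - 1, n, P_CONTROL) > alpha:
        k += 1
    return k

prev = None
for n in range(50, 3001, 25):
    ks = (threshold(n, 0.05), threshold(n, 0.10))
    if ks != prev:  # report only where a curve's required count changes
        probs = tuple(round(binom.sf(k - 1, n, P_TREATMENT), 3) for k in ks)
        print(f"n={n}: need {ks[0]} (5%) / {ks[1]} (10%) successes; "
              f"chance of seeing that many: {probs[0]} / {probs[1]}")
        prev = ks
```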
It's weird that you can have statistical significance with only one success!