Confidence Intervals and Hypothesis Testing: Wording on page 175 of the text

umakegoodcookies
Aug 03, 2016
First, I accidentally posted this in the updates section and if someone who has more privileges than me could delete that it would be very nice.

On page 175 the book goes into what is meant by "95% confidence" implying that the "95%" and "confidence" in the name "95% confidence interval" describe a state of mind regarding a specific interval. It's a very problematic area. I'd prefer an alternate wording.

This is the statement I'm concerned about, "If the interval spreads out 2 standard errors from the point estimate, we can be roughly 95% confident that we have captured the true parameter." I'd prefer, "If the interval spreads out 2 standard errors from the point estimate, we can expect that approximately 95% of the time we will be correct when we state that we have captured the true parameter in that interval" (an amazingly high level of accuracy for a decision in statistics). The latter statement is both correct and needs no further definition. It also makes the following paragraph easier to write.
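The long-run reading proposed above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is taken to be normal, and the true mean, standard deviation, sample size, and number of trials are made-up illustration values, not anything from the book. It counts how often the statement "the interval point estimate ± 2 standard errors captured the true parameter" turns out to be correct.

```python
import random
import statistics


def coverage(true_mean=10.0, sd=3.0, n=50, trials=10_000, seed=0):
    """Fraction of 2-standard-error intervals that contain true_mean.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one sample from the assumed normal population.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        point = statistics.fmean(sample)
        # Estimated standard error of the sample mean.
        se = statistics.stdev(sample) / n ** 0.5
        # Each time, we "state" that this interval captured the parameter;
        # count how often that statement is correct.
        if point - 2 * se <= true_mean <= point + 2 * se:
            hits += 1
    return hits / trials


rate = coverage()
print(f"share of intervals capturing the true mean: {rate:.3f}")
```

Under these assumptions the printed share should land close to 0.95. Note that the figure describes the procedure across many repetitions; any single interval either contains the true mean or it does not.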

I do like very much that you go into things the CI does not mean later on but you're also opening a bit of a can of worms because it depends on a very specific interpretation of the statement that you're XX% confident and many statisticians do not endorse it. I always have my students avoid saying such things because you really only obtain that 95% confidence if, each time, you decide the interval contains the value. This is similar to how you only get XX% Type II errors when you decide effects are significant or not based on a cutoff. You don't then turn those errors into statements of confidence, or lack thereof.

The "95% confidence" is connected only to the name of the interval... not your personal feelings of confidence in the interval. *It might be allowed to be connected to your personal feeling about the interval generating procedure but that's not the same as your confidence in a particular interval.

David
Aug 04, 2016
Per your request, I've removed your two comments from the other thread.

I agree your statement is correct and accurate, and indeed we follow up with a similar explanation immediately after the quoted sentence. We also provide a figure to help further clarify.

My concern about eliminating that first sentence about "95% confident" is that I think it has practical value in allowing a reader's mind to momentarily wander to get some intuitive guess at what 95% confident means. Basically, I think that the content leading up to that sentence (in conjunction with the sentence itself) ensures a reader's mind will wander in an appropriate direction for interpreting the *practical* meaning of 95% confident, even if they won't immediately know the technical explanation. That's where the content immediately following that sentence comes in: it firms up the technical side, which then also (should) support the reader's original leanings on the meaning of "95% confident".

If we led with the definitional explanation, then my belief is that we're not leveraging the reader's thinking and are instead only instilling facts without practical understanding. An immediate definitional approach certainly works for some readers, and so I'm just affirming my view that the current approach will have a wider reach. You certainly may disagree!

Best,

David

umakegoodcookies
Aug 04, 2016
I'm glad we've opened this discussion. (in what's below I'm using "you" to refer to authors of the book)

It's of course perfectly appropriate to discuss what the 95% confidence interval means. Further, I think it's handled very well in the text (better than most). I'm most concerned about your discussion of how one reports CI statistics, and this is the seed of that. I believe in that case you do make an error. It looks like you're suggesting in 4.2.5 to report an individual CI in a paper and equating confidence in the CI with confidence in the procedure. Those are most definitely not the same thing. Unless I report each time that the CI captures the true value, I don't get the opportunity to have 95% confidence, because my long-run frequency can't come out. If I'm reporting each individual CI as something I'm only 95% confident in, I'm not sure what that can possibly mean, because that seems to reduce my confidence in the procedure. Don't I now have to randomly say that some intervals don't contain the true value? It's just not a sensible thing to say unless someone has been given a very strict definition of what those words mean. So the only people who would correctly interpret that statement are the ones who read the book, or something similar. That's not something I'm happy with as a general CI reporting recommendation.

If the phrase "95% confident" is used to really explain my confidence in the procedure and I'm concerned my audience needs a stats lesson in the results section (something you do not go into at all for hypothesis testing) then the correct reporting would not be "I'm 95% confident that the population parameter is between...", but you could say, "The population parameter is between X1 and X2 based on a procedure that yields a correct result 95% of the time." Or just leave that last part off if you feel the results section stats lesson is unnecessary. In the general case, the most appropriate thing would be to say, "The population parameter is in the 95% confidence interval [X1, X2]."

David
Aug 05, 2016
The goal in defining what "confident" means in statistical terminology is to ensure that readers can learn to use that type of language. I wouldn't introduce a term if I thought its use should be forbidden. It's essentially an expansion of the reader's vocabulary, like other terms introduced in the textbook, e.g. p-value, and we expect readers to make use of that new vocabulary.

Note that I'd distinguish the discussion here from questions where a reader was asked to explain what "confident" means. Or if a reader was asked to provide an explanation of a confidence interval that anyone can understand, then I'd expect an answer that doesn't use statistical terminology.

umakegoodcookies
Aug 05, 2016
It sounds to me like we're pretty much in agreement then. So perhaps there should be a caution on how you report a CI in a results section. A naive reader would take your use of "95% confident" and describe their results that way for publication.

David
Aug 06, 2016
There may be room for exercises or a classroom worksheet on this topic. I think the concern is about basic communication skills: use terms the intended audience can understand, which applies to any context, not just statistics. This could be reinforced through an exercise or in-class activity that touches on the importance of communication skills without taking the text itself down a tangent.

