Statistics Public Forums

Confidence Intervals and Hypothesis Testing: How can I tell if the data are too skewed for a t test?

YourUseridAppearHere
Mar 15, 2013

The guidelines for what is too strongly skewed are pretty confusing.

David
Mar 15, 2013

First, this is not an easy topic, but it is still very important to think about this type of question! I'm also going to add one related topic to consider: outliers. Outliers are the "canary" most useful for identifying problematic skew.

For a t test, some rough rules of thumb:

Sample Size < 30

If you have no obvious outliers AND your distribution has no observations further than about 3 standard deviations from the mean (if your sample is under 15, then 2.5 standard deviations), then you can move forward with a t test (or t confidence interval).

30 ≤ Sample Size < 100

If there are some outliers but they aren't more than about 4 standard deviations from the mean, you're good to go. (If on the lower end of the sample size range, be a little more strict, e.g. 3.5 standard deviations; if on the upper side, be a little more liberal, e.g. 4.5 standard deviations.)

100 ≤ Sample Size < 500

As long as the observations are within about 5-7 standard deviations of the mean, you should be okay.

Sample Size ≥ 500

You're probably fine unless you've got some very wild outliers.

As you might imagine, the suggested guidelines above are an oversimplification. That said, they are still a useful reference.
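For anyone who wants to apply the rules above to a sample, here is a minimal Python sketch. The function and class names (`sd_limit`, `check_rule_of_thumb`, `RuleCheck`) are made up for illustration. Where the rules give a range, picking a single cutoff is my assumption: 4 for the 30 to 99 bracket and 5 (the strict end of 5-7) for the 100 to 499 bracket. For 500 or more observations no number is given, so the sketch reports the largest deviation and leaves the judgment to you.

```python
import statistics
from typing import NamedTuple, Optional, Sequence


class RuleCheck(NamedTuple):
    n: int
    max_abs_z: float         # largest |x - mean| / sd in the sample
    limit: Optional[float]   # None when the rules give no numeric cutoff
    ok: Optional[bool]       # None when there is no cutoff to compare against


def sd_limit(n: int) -> Optional[float]:
    """Cutoff in standard deviations, read off the rough rules of thumb."""
    if n < 15:
        return 2.5
    if n < 30:
        return 3.0
    if n < 100:
        return 4.0   # assumption: rules suggest ~3.5 near n=30, ~4.5 near n=100
    if n < 500:
        return 5.0   # assumption: strict end of the 5-7 range
    return None      # n >= 500: no number given, judge wild outliers by eye


def check_rule_of_thumb(data: Sequence[float]) -> RuleCheck:
    """Compare the most extreme observation against the cutoff for this n."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation, needs n >= 2
    max_abs_z = max(abs(x - mean) for x in data) / sd if sd > 0 else 0.0
    limit = sd_limit(n)
    ok = None if limit is None else max_abs_z <= limit
    return RuleCheck(n, max_abs_z, limit, ok)
```

For example, `check_rule_of_thumb([0] * 19 + [100])` flags the sample (`ok` is `False`), since the single large value sits more than 3 standard deviations from the mean with n = 20. Treat the result as a prompt to plot the data, not as a verdict.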

David
Mar 15, 2013

Common follow-up question: if my data are not reasonable for the t test, what should I do?

If you are not using the results for anything too important and the rules of thumb are just violated "a little", then move forward, but report the problems with the test's underlying assumptions alongside your results.

If what you're doing is important, e.g. making a business decision based on the outcome, then don't move forward with using the t distribution for your confidence interval or hypothesis test. Instead, consider one of the following options:

1. Collect more data!

2. Learn a new statistical method that would be appropriate. The standard "go to" method here is the bootstrap. However, you'll need an advanced book to do this right... lots of people (and books) don't recognize the underlying assumptions for the method. Reference book I'd recommend: All of Nonparametric Statistics, though it is not for the mathematically squeamish... this is far from easy stuff. You'll want to look for the pivotal interval (or something analogous to it).

3. Talk to a statistician who can utilize a more advanced technique that would be appropriate for your data. If you're at a university, there might be free or subsidized consulting services on campus.
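To give a feel for option 2, here is a rough Python sketch of a bootstrap pivotal interval for the mean, using only the standard library. It is an illustration under assumptions, not a substitute for the book: the function name `bootstrap_pivotal_ci`, the default of 2000 resamples, the fixed seed, and the simple sorted-list quantiles are all my choices. The interval is formed as (2·estimate − upper bootstrap quantile, 2·estimate − lower bootstrap quantile), and it still assumes the observations are independent draws from the population of interest.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def bootstrap_pivotal_ci(
    data: Sequence[float],
    stat: Callable[[Sequence[float]], float] = statistics.mean,
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Pivotal bootstrap interval: (2*est - q_hi, 2*est - q_lo)."""
    rng = random.Random(seed)          # fixed seed so results are repeatable
    n = len(data)
    estimate = stat(data)

    # Resample with replacement and recompute the statistic each time.
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))

    # Simple empirical quantiles of the bootstrap statistics.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    q_lo, q_hi = boot[lo_idx], boot[hi_idx]

    return 2 * estimate - q_hi, 2 * estimate - q_lo


# Hypothetical right-skewed sample, for illustration only.
sample = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]
low, high = bootstrap_pivotal_ci(sample)
print(f"95% pivotal interval for the mean: ({low:.2f}, {high:.2f})")
```

Note that with a sample this small and this skewed, the bootstrap itself can behave poorly, which is exactly why the assumptions deserve a careful read before relying on the numbers.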

umakegoodcookies
Aug 03, 2016

The most important thing to note is that you're using the data to tell you something about the population. The assumptions are assumptions about the population, not about the data, so there's more to it than just looking at the data. Perhaps you have a small sample and can't tell whether this kind of data meets the assumptions. If you know of other sources with larger samples, or published papers describing the distribution of this kind of data, then you can (and should) use them to judge the assumptions, regardless of what your own data look like.

Once you know what the data should look like, you can also discuss how representative your data are. That goes toward interpretation, and representativeness is the most fundamental assumption. If you have information that says the data should have a certain property, like equal variance, you can assess that property in your sample and say something about how representative the sample is.
