Regression: Adjusted R Squared Extremes
dgnscr Jan 10
I am reading the section on adjusted R squared, and I think I understand the spirit of how to interpret it.

However, when I look at the formula I can't help but extrapolate it to extremes.

What happens when I have more predictors than samples? It seems like adjusted R squared starts to grow uncharacteristically. Are there unspoken assumptions about suitable limits on n and k?
bahhmmbgg Jan 10
Indeed, adjusted-r2 behaves unusually at the extremes of n and k, where
adjusted-r2 = 1 - (1-r2) * (n-1) / (n-k-1)
bahhmmbgg Jan 10
The n-k-1 term -- think of it as n-(k+1) -- is the residual degrees of freedom:
n = sample size
k = number of predictors in the model (not counting the intercept); the +1 accounts for the intercept parameter
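In case it helps to plug in numbers, here is a minimal Python sketch of the formula above. The function name adjusted_r2 and the choice to return None when n-k-1 is zero are my own assumptions for illustration, not taken from any particular library.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - r2) * (n - 1) / (n - k - 1).

    r2: ordinary R^2 of the fitted model
    n:  sample size
    k:  number of predictors (intercept not counted)
    Returns None when n - k - 1 == 0, where the formula divides by zero.
    """
    dof = n - k - 1  # residual degrees of freedom
    if dof == 0:
        return None  # assumption: report "undefined" rather than pick a value
    return 1 - (1 - r2) * (n - 1) / dof


print(adjusted_r2(0.5, 11, 2))  # 0.375
```

With r2 = 0.5, n = 11 and k = 2, the penalty factor is 10/8, so adjusted-r2 is 1 - 0.5*1.25 = 0.375.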
bahhmmbgg Jan 10
"Usual"/"Positive" Case: The degrees of freedom n-(k+1) is positive
Adjusted-r2 has its usual interpretation: when a parameter is added, it increases only if the increase in r2 is greater than that expected by chance.
bahhmmbgg Jan 10
"Zero" Case: The degrees of freedom n-(k+1) is zero
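In this case you can always model the samples perfectly using the parameters plus the intercept (assuming none of the predictors is redundant, e.g. an exact linear combination of the others), meaning r2 = 1 and our equation becomes adjusted-r2 = 1 - 0*(n-1)/0.
Strictly speaking, 0/0 is undefined, so you could interpret this as adjusted-r2 = undefined, or, better I think, interpret it as adjusted-r2 = 1.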
bahhmmbgg Jan 10
"Negative" Case: The degrees of freedom n-(k+1) is negative
The model has more parameters than samples, and this means two things:
1) There are multiple solutions (parameter coefficients) possible
2) Always r2 = 1 (perfect fit)
This means that (1-r2) in the calculation of adjusted-r2 is 0, so adjusted-r2 = 1. Adjusted-r2 will never become larger than 1, because by the time the (n-k-1) term becomes negative, the (1-r2) term is already 0. I think this addresses the main part of your question: yes, taken on its own the equation for adjusted-r2 would seem to allow values well above 1, but when used for multiple regression it never goes above 1.
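To make that concrete, here is a small Python sketch that evaluates the formula for n = 3 and k = 3 through 8. The r2 = 0.9 value in the second part is purely hypothetical: it shows what the formula alone would do if r2 could stay below 1 with negative degrees of freedom, which a least-squares fit does not allow.

```python
def adjusted_r2(r2, n, k):
    # Plain formula; only called here with n - k - 1 != 0.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 3

# Negative degrees of freedom with a perfect fit (r2 = 1):
# the (1 - r2) factor is 0, so the result is 1 for every k.
perfect = [adjusted_r2(1.0, n, k) for k in range(3, 9)]
print(perfect)

# Hypothetical only: r2 = 0.9 with n = 3, k = 8 pushes the formula above 1.
hypothetical = adjusted_r2(0.9, 3, 8)
print(hypothetical > 1)
```

The second print is the behavior the original question was worried about; it only appears if you feed the formula an r2 that regression would not actually produce in that situation.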
bahhmmbgg Jan 10
An example: You want to estimate height and have 3 people (n=3) for your model. Your other measurements are weight, right arm length, left arm length, right leg length, left leg length, right thumb length, left thumb length, and heart rate (k=8). If we use all the parameters for the model, we have n-k-1 = 3-8-1 = -6.
You could achieve a perfect fit (a plane connecting these 3 points) by plotting height on the z-axis, weight on the y-axis, right arm length on the x-axis, and using your intercept parameter. This is equivalent to having all the coefficients be zero except for the intercept and the coefficients for weight and right arm length. Or you could have your y-axis and x-axis be left thumb length and heart rate. Or right thumb length and left arm length. Multiple solutions, and always r2=1.
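A rough sketch of the "multiple solutions" point, in Python with exact fractions. All the measurements below are made-up numbers for illustration, and solve and fit_pair are hypothetical helper names. Each pair of predictors plus an intercept gives a 3x3 system, which is solved exactly.

```python
from fractions import Fraction


def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination.

    Raises StopIteration if the system is singular (no nonzero pivot).
    """
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[-1] for row in M]


# Made-up data for 3 people (n = 3); only 4 of the 8 predictors are shown.
height = [170, 180, 165]
predictors = {
    "weight": [65, 80, 60],
    "right_arm": [72, 78, 71],
    "left_thumb": [6, 7, 6],
    "heart_rate": [70, 62, 75],
}


def fit_pair(name_a, name_b):
    """Fit height = b0 + b1*a + b2*b through all 3 points; return (coef, fitted)."""
    design = [[1, a, b] for a, b in zip(predictors[name_a], predictors[name_b])]
    coef = solve(design, height)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    return coef, fitted


coef_1, fitted_1 = fit_pair("weight", "right_arm")
coef_2, fitted_2 = fit_pair("left_thumb", "heart_rate")
print(fitted_1 == height, fitted_2 == height)  # both fits reproduce the heights
print(coef_1, coef_2)                          # but with different coefficients
```

Both planes pass through all three heights exactly, so both have r2 = 1, even though they use different predictors and different coefficients.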
As this model is built up, while the degrees of freedom are positive (3 people, just 1 parameter like weight, and the intercept: n-k-1 = 3-1-1 = 1), r2 will start off somewhere less than 1, and adjusted-r2 will be a bit less than that. As soon as you hit zero degrees of freedom (3 people, 2 parameters like weight and heart rate, and the intercept: n-k-1 = 3-2-1 = 0), r2 will become 1, and adjusted-r2 (taking the 0/0 case as 1) will now become 1 and stay there, no matter how many more parameters you add. Thus, based on the principle of optimizing adjusted-r2, we should make our model using 2 parameters and the intercept (when adjusted-r2 first hits that plateau at 1), although there are multiple solutions.
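Here is a quick Python sketch of the first step in that progression (3 people, weight only, plus the intercept). The weights and heights are made-up numbers; for a single predictor, r2 is just the squared correlation, so no regression library is needed.

```python
# Made-up data for 3 people.
weight = [65, 80, 60]
height = [170, 180, 165]

n, k = len(height), 1  # one predictor (weight) plus the intercept
mean_w = sum(weight) / n
mean_h = sum(height) / n

# Sums of squares and cross-products around the means.
sxy = sum((w - mean_w) * (h - mean_h) for w, h in zip(weight, height))
sxx = sum((w - mean_w) ** 2 for w in weight)
syy = sum((h - mean_h) ** 2 for h in height)

r2 = sxy ** 2 / (sxx * syy)                   # squared correlation
adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # n - k - 1 = 1 here

print(r2, adj)  # roughly 0.992 and 0.984
```

With one degree of freedom left, r2 is a little under 1 and adjusted-r2 is a little under that. Adding a second predictor uses up the last degree of freedom and the fit becomes exact.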
