Statistics Public Forums

Regression: Adjusted R Squared Extremes

dgnscr
Jan 10, 2017

I am reading the section on adjusted R squared, and I think I understand the spirit of how to interpret it.

However, when I look at the formula I can't help but extrapolate it to extremes.

What happens when I have more predictors than samples? It seems like adjusted R squared starts to grow uncharacteristically. Are there unspoken assumptions about suitable limits on n and k?

bahhmmbgg
Jan 10, 2017

Indeed there is unusual behavior for the extremes of n and k regarding adjusted-r2, where

adjusted-r2 = 1 - (1-r2) * (n-1) / (n-k-1)
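
If it helps to play with the numbers, here is a minimal Python sketch of that formula. The function name adjusted_r2 and the sample values are my own choices for illustration, not from any particular library, and k counts predictors only (the intercept is the extra 1).

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n samples and k predictors (intercept not counted in k)."""
    df = n - k - 1  # residual degrees of freedom
    if df == 0:
        # The formula divides by zero here, so it gives no answer on its own.
        raise ZeroDivisionError("n - k - 1 is zero; the formula is undefined")
    return 1 - (1 - r2) * (n - 1) / df


# Hypothetical values: r2 = 0.80 from a model with 3 predictors and 50 samples.
print(adjusted_r2(0.80, n=50, k=3))
```

With plenty of samples relative to predictors, the result sits just below the plain r2.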

bahhmmbgg
Jan 10, 2017

The n-k-1 term -- think of it as n-(k+1) -- is the degrees of freedom:

n = sample size

k = number of predictors in the model; the +1 accounts for the intercept parameter

bahhmmbgg
Jan 10, 2017

"Usual"/"Positive" Case: The degrees of freedom n-(k+1) is positive

Adjusted-r2 has its usual interpretation: it increases with an added parameter only when the increase in r2 is greater than that expected by chance
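
To make the usual case concrete, here is a small sketch with made-up r2 values (n = 20 and the three r2 figures are assumptions chosen for illustration). Going from 2 to 3 predictors, a tiny gain in r2 lowers adjusted-r2, while a larger gain raises it.

```python
def adj(r2, n, k):
    # Same formula as above; k counts predictors, the intercept is the extra 1.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 20  # hypothetical sample size

base = adj(0.50, n, k=2)        # starting model with 2 predictors
small_gain = adj(0.51, n, k=3)  # third predictor barely improves r2
large_gain = adj(0.60, n, k=3)  # third predictor improves r2 a lot

print(base, small_gain, large_gain)
```

The middle value comes out below the first and the last comes out above it, which is the penalty for the extra parameter at work.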

bahhmmbgg
Jan 10, 2017

"Zero" Case: The degrees of freedom n-(k+1) is zero

In this case you can always perfectly model the samples using the parameters plus the intercept (assuming the predictors are not redundant, i.e. they are linearly independent), meaning r2 = 1, and our equation becomes adjusted-r2 = 1 - 0*(n-1)/0

You could interpret this as adjusted-r2 = undefined or, better I think, interpret it as adjusted-r2 = 1
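
The smallest instance of the zero case is a straight line through two points (n = 2, k = 1). A quick sketch with made-up data shows r2 coming out as exactly 1 and the formula hitting 0/0; recording that as NaN is my own choice here, not a convention from any library.

```python
# Hypothetical data: n = 2 samples, k = 1 predictor, so n - k - 1 = 0.
x = [1.0, 3.0]
y = [2.0, 5.0]

# A line through two points fits them exactly.
slope = (y[1] - y[0]) / (x[1] - x[0])
intercept = y[0] - slope * x[0]
fitted = [intercept + slope * xi for xi in x]

mean_y = sum(y) / len(y)
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
r2 = 1 - ss_res / ss_tot

n, k = len(y), 1
try:
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
except ZeroDivisionError:
    adj = float("nan")  # 0/0: the formula alone gives no value

print(r2, adj)
```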

bahhmmbgg
Jan 10, 2017

"Negative" Case: The degrees of freedom n-(k+1) is negative

The model has more parameters than samples, and this means two things:

1) There are multiple solutions (parameter coefficients) possible

2) Always r2 = 1 (perfect fit)

This means that (1-r2) in the calculation of adjusted-r2 is 0, so adjusted-r2 = 1. Adjusted-r2 never becomes larger than 1 because, whenever the (n-k-1) term is negative, the (1-r2) term is 0. I think this addresses the main part of your question: yes, the equation for adjusted-r2 would seem to allow it to exceed 1, but when used for multiple regression it never goes above 1.
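
Plugging numbers into the formula shows both halves of that argument. In this sketch n = 3 and k = 8 are just illustrative values, and the r2 = 0.9 input is deliberately hypothetical: a least-squares fit with negative degrees of freedom would not produce it, but it is what makes the bare formula look like it can exceed 1.

```python
def adj(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n, k = 3, 8  # n - k - 1 = -6

formula_only = adj(0.9, n, k)  # hypothetical r2 < 1: the formula alone exceeds 1
actual_fit = adj(1.0, n, k)    # what a perfect fit gives: (1 - r2) = 0

print(formula_only, actual_fit)
```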

bahhmmbgg
Jan 10, 2017

An example: You want to estimate height and have 3 people (n=3) for your model. Your other measurements are weight, right arm length, left arm length, right leg length, left leg length, right thumb length, left thumb length, and heart rate (k=8). If we use all the parameters for the model, we have n-k-1 = 3-8-1 = -6.

You could achieve a perfect fit (a plane connecting these 3 points) by plotting height on the z-axis, weight on the y-axis, right arm length on the x-axis, and using your intercept parameter. This is equivalent to having all the coefficients be zero except for the intercept and the coefficients for weight and right arm length. Or you could have your y-axis and x-axis be left thumb length and heart rate. Or right thumb length and left arm length. Multiple solutions, and always r2=1.

As this model is built up, while the degrees of freedom are positive (3 people, just 1 predictor such as weight, and the intercept: n-k-1 = 3-1-1 = 1), r2 will start off somewhere less than 1, and adjusted-r2 will be lower than that. As soon as you hit zero degrees of freedom (3 people, 2 predictors such as weight and heart rate, and the intercept: 3-2-1 = 0), r2 becomes 1, and adjusted-r2 (reading the 0/0 case as 1) becomes 1 and stays there no matter how many more predictors you add. Thus, based on the principle of optimizing adjusted-r2, we should build our model using 2 predictors and the intercept (the point where adjusted-r2 first hits that plateau at 1), although there are multiple solutions.
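
Here is a sketch of the 3-person example in plain Python, using exact fractions so the perfect fits come out as exactly 1. All the measurements are invented for illustration, and solve, r_squared, r2_one_predictor and fit_two are hypothetical helper names of my own.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def r_squared(y, fitted):
    mean_y = Fraction(sum(y), len(y))
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def r2_one_predictor(x, y):
    # For a single predictor, r2 is the squared correlation.
    mx, my = Fraction(sum(x), len(x)), Fraction(sum(y), len(y))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical measurements for 3 people
height = [170, 180, 165]
weight = [65, 80, 60]
heart_rate = [70, 62, 75]
right_arm = [60, 67, 57]


def fit_two(x1, x2):
    # Intercept plus 2 predictors: 3 unknowns for 3 people, so the fit is exact.
    A = [[1, a, b] for a, b in zip(x1, x2)]
    coef = solve(A, height)
    fitted = [coef[0] + coef[1] * a + coef[2] * b for a, b in zip(x1, x2)]
    return coef, r_squared(height, fitted)


r2_one = r2_one_predictor(weight, height)      # k = 1, df = 1
coef_a, r2_a = fit_two(weight, heart_rate)     # k = 2, df = 0
coef_b, r2_b = fit_two(weight, right_arm)      # k = 2, df = 0, different pair

print(r2_one, r2_a, r2_b)
print(coef_a, coef_b)
```

With one predictor r2 is below 1; with either pair of predictors it is exactly 1, and the two pairs give different coefficients, which is the "multiple solutions" point.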
