Exercise - The Effect of Collinear Variables on MLR Models
Strongly correlated (collinear) variables make MLR models unstable. To demonstrate this, introduce some collinear variables into the data set BOILPTS and calculate multiple regression models with and without them. Specifically:
- Create two copies of the variable "RandicToz" and add a small amount of noise (2 %) to both copies.
- Calculate an MLR model for the boiling point, using the original "RandicToz" variable and one of the noisy RandicToz copies. Save the protocol to a file.
- Calculate a second MLR model, using the other noisy RandicToz copy instead of the first one. Again, save the protocol to a file.
- Create two copies of the variable "JHET" and add a small amount of noise (2 %) to both copies.
- Calculate a third MLR model for the boiling point, using the original "RandicToz" variable and one of the noisy JHET copies. Save the protocol to a file.
- Finally, calculate a fourth MLR model for the boiling point, using the original "RandicToz" variable and the other noisy JHET copy. Save the protocol to a file.
- Compare the four protocols. Look at the goodness of fit, the F values and the regression coefficients. What difference do you see between models 1 and 2 on the one hand and models 3 and 4 on the other?
Now go to the DataLab and carry out the steps listed above. In addition to comparing the regression parameters, have a close look at the estimated values: they differ considerably between models 1 and 2, but only slightly between models 3 and 4.
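The different behaviour of the two pairs of models comes from how strongly the two predictors in each model are correlated. One common way to put a number on this is the condition number of the design matrix; for two centred and scaled predictors with correlation r it equals sqrt((1 + |r|) / (1 - |r|)), so it grows without bound as |r| approaches 1. The sketch below again uses hypothetical synthetic predictors in place of the BOILPTS descriptors, and omitting the intercept column and standardizing the predictors is an illustrative choice, not necessarily what the DataLab does.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical synthetic predictors standing in for the BOILPTS descriptors.
n = 50
randic = rng.uniform(2.0, 8.0, n)  # role of "RandicToz"
jhet = rng.uniform(1.0, 4.0, n)    # role of "JHET"
# Noisy copies; "2 %" taken as 2 % of each variable's standard deviation (assumption).
randic_a = randic + rng.normal(0.0, 0.02 * randic.std(), n)
jhet_a = jhet + rng.normal(0.0, 0.02 * jhet.std(), n)


def collinearity(u, v):
    """Return the correlation of u and v and the condition number of their standardized design matrix."""
    X = np.column_stack([u, v])
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # centre and scale each column
    r = np.corrcoef(u, v)[0, 1]
    return r, np.linalg.cond(Z)  # ratio of largest to smallest singular value


r_col, cond_col = collinearity(randic, randic_a)  # predictor pair of models 1 and 2
r_ok, cond_ok = collinearity(randic, jhet_a)      # predictor pair of models 3 and 4
print(f"models 1/2: r = {r_col:.5f}, condition number = {cond_col:.1f}")
print(f"models 3/4: r = {r_ok:.5f}, condition number = {cond_ok:.1f}")
```

A large condition number means that small changes in the data (such as swapping one noisy copy for another) can produce large changes in the fitted coefficients.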