
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html
> > I recently ran into this statement
< not very informative nor necessary statement snipped to answer
the stated question>
> > I am not a statistician but this struck me as a bit strange. Can anyone
> > comment on the idea of keeping a non-significant variable in a model in
> > order to match another model?
Variables that are not "statistically significant" are, in most
common usage, NOT kept for the sake of matching anything.
Statistically REDUNDANT (superfluous, unnecessary) variables
are dropped because they not only add nothing to the model
but they may in fact make the model worse, much worse, in
terms of precision and stability.
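
One way to see the precision cost of a redundant variable is the variance inflation factor (VIF). The sketch below is an illustration added here, not something from the thread: the data are made up and the function names are hypothetical. With two predictors, VIF = 1 / (1 - r^2), where r is their correlation, and it is the factor by which the sampling variance of each coefficient grows relative to the uncorrelated case.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)

def vif_two_predictors(a, b):
    """VIF for either predictor in a two-predictor regression."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

# Made-up illustrative data (an assumption, not from the study discussed).
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
# Nearly a copy of x1: statistically redundant once x1 is in the model.
x_redundant = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9]
# Constructed to be exactly uncorrelated with x1.
x_unrelated = [1, -1, -1, 1, 1, -1, -1, 1]

vif_redundant = vif_two_predictors(x1, x_redundant)
vif_unrelated = vif_two_predictors(x1, x_unrelated)

print("VIF with a near-duplicate predictor:", round(vif_redundant, 1))
print("VIF with an uncorrelated predictor: ", round(vif_unrelated, 3))
```

A VIF near 1 means the added variable costs almost nothing in precision; a very large VIF means the coefficient estimates become unstable, which is the "much worse" case described above.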
Once you get away from those redundant variable cases, the
simplest answer to WHY you keep statistically not-significant
variables is that for many problems, while they are not
statistically significant, they are much better than nothing. :-)
If you drop variables because they are statistically NOT
significant, then you may find, especially for sociological
data, that you often end up with NO VARIABLE in a
regression equation because you have dropped everything. :)
> What makes sense is to keep variables in models because you
> expect them to be meaningful.
That may be true sometimes, but often NOT true for seeking
only FITTING or PREDICTION models.
The "meaningful" idea is one of the common abuses by
social scientists in their misapplication of regression methods.
Variables do not have their unique meanings. In a multiple
regression, the meaning of a variable is its effect IN THE
PRESENCE OF ALL OTHER VARIABLES in the equation.
Therefore, the same variable may have thousands of
different meanings, all depending on which are the OTHER
variables in the equation.
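
A small numerical sketch of that point, added here for illustration (the data and the helper name `ols` are assumptions, not from the thread): the coefficient on x1 changes when a correlated x2 enters the equation, even though y and x1 themselves are unchanged.

```python
def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Made-up data: y is built exactly as 1*x1 + 2*x2, and x2 is correlated with x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [a + 2 * b for a, b in zip(x1, x2)]

b_alone = ols([x1], y)      # y regressed on x1 only
b_both = ols([x1, x2], y)   # y regressed on x1 and x2

print("coefficient on x1, alone:  ", round(b_alone[1], 3))
print("coefficient on x1, with x2:", round(b_both[1], 3))
```

Fitted alone, x1 also absorbs part of the effect of the omitted x2, so its coefficient is larger than the 1.0 it gets once x2 is in the model. Neither number is "the" meaning of x1; each is its effect given the other variables present.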
> The too-frequent, common mistake is to drop variables from an
> equation merely because they fail to be "statistically significant"
> in a particular case.
That statement is clearly UNTRUE.
The dropping of variables is when the variables are "statistically
redundant" (or unnecessary) in the presence of other variables
already in the regression model.
> The question of stepwise regression has comments from years
> ago collected in my stats-FAQ.
Most comments I've seen are irrelevant, impertinent, or
technically flawed.
The problem in question, and the approach to the solution
of the problem, have very little, if anything, to do with
stepwise regressions.
-- Reef Fish Bob.
John Kane - 29 Sep 2006 16:07 GMT
> > > I recently ran into this statement
>
> < not very informative nor necessary statement snipped to answer
> the stated question>
Nonsense. Context is important :)
> > > I am not a statistician but this struck me as a bit strange. Can anyone
> > > comment on the idea of keeping a non-significant variable in a model in
[quoted text clipped - 20 lines]
> > What makes sense is to keep variables in models because you
> > expect them to be meaningful.
But in the context (that Bob clipped) the intent seems to be to make
the model somehow comparable to another model. This does not seem to
make sense. I can see keeping the variables if you expect them to be
useful when examining another data set, particularly if there is a
theoretical reason.
> That may be true sometimes, but often NOT true for seeking
> only FITTING or PREDICTION models.
That was my thought and this was clearly an engineering study intended
for this purpose.
> The "meaningful" idea is one of the common abuses by
> social scientists in their misapplication of regression methods.
And those pesky traffic engineers it appears :)
> Variables do not have their unique meanings. In a multiple
> regression, the meaning of a variable is its effect IN THE
[quoted text clipped - 13 lines]
> redundant" (or unnecessary) in the presence of other variables
> already in the regression model.
My problem is that I cannot see what gain there is to retaining the
variables just to make a comparison against another model. Somehow I
seem to see it as soaking up a bit of variance that might be better
explained by the other variables.
If nothing else, leaving a redundant variable in the regression seems to
me to be irresponsible, given that the target audience is not likely to
be researchers but either practicing traffic/civil engineers or policy
makers who may not understand the "significance" of an insignificant
variable in a model.
> > The question of stepwise regression has comments from years
> > ago collected in my stats-FAQ.
[quoted text clipped - 7 lines]
>
> -- Reef Fish Bob.
Thanks to both of you for the comments. They have been helpful
John Kane, Kingston ON Canada
dave@autobox.com - 29 Sep 2006 17:08 GMT
> > > > I recently ran into this statement
> >
[quoted text clipped - 87 lines]
> Thanks to both of you for the comments. They have been helpful
> John Kane, Kingston ON Canada
Hello John ...
Oftentimes one is interested in testing the hypothesis that the
coefficients (collectively) are homogeneous across groups ... leading to
the Chow Test (Gregory Chow, Princeton University).
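
For readers who have not met it, here is a minimal sketch of the Chow F statistic for a straight-line model fitted to two groups. It is an illustration added here under stated assumptions: the data are made up, the function names are hypothetical, and this is not the Autobox implementation. The statistic compares the residual sum of squares (RSS) of one pooled fit with the combined RSS of separate fits, using k = 2 parameters per line.

```python
def line_rss(x, y):
    """Residual sum of squares from a simple least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def chow_f(x_a, y_a, x_b, y_b, k=2):
    """Chow F statistic; refer it to an F(k, n_a + n_b - 2k) distribution."""
    rss_pooled = line_rss(x_a + x_b, y_a + y_b)
    rss_split = line_rss(x_a, y_a) + line_rss(x_b, y_b)
    df_resid = len(x_a) + len(x_b) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / df_resid)

# Made-up data: small alternating noise around known lines.
x = [1, 2, 3, 4, 5, 6]
noise_a = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
noise_b = [-0.1, 0.1, -0.1, 0.1, -0.1, 0.1]

y_a = [1 + 2 * xi + e for xi, e in zip(x, noise_a)]           # line 1 + 2x
y_same = [1 + 2 * xi + e for xi, e in zip(x, noise_b)]        # same line
y_shifted = [5 + 0.5 * xi + e for xi, e in zip(x, noise_b)]   # different line

f_same = chow_f(x, y_a, x, y_same)
f_shifted = chow_f(x, y_a, x, y_shifted)

print("F when both groups share one line: ", round(f_same, 3))
print("F when the groups follow different lines:", round(f_shifted, 1))
```

A small F gives no evidence against common coefficients; a large F suggests the groups need separate equations. The sketch handles one predictor only; a multiple-regression version would replace `line_rss` with a general least-squares fit and set k to the number of coefficients.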
A similar problem in Time Series is to test for break points in
parameters, i.e. is there a point in time at which the coefficients for an
ARIMA process change significantly?
We have implemented that test in order to test the idea of
non-transient structure ... which leads directly to segmenting the time
series at the identified break point(s).
Regards
Dave Reilly
http://www.autobox.com
John Kane - 29 Sep 2006 17:25 GMT
> > > > > I recently ran into this statement
> > >
[quoted text clipped - 106 lines]
> Dave Reilly
> http://www.autobox.com
Thanks Dave.
I see what you mean there and that makes sense. However, the
researchers seem to have some idea of comparing two models, developed
on the same data set but, if my cursory reading is correct, predicting
different driver behaviour, and apparently left the redundant variables
in to 'facilitate' comparisons.
The study was a very applied one, apparently intended to provide input
to government policy on road design.
Maybe I am suspicious of the faux-3D spreadsheet barplots they used :)
They also seemed to be using stepwise regression to establish the
models, which struck me as a bit dubious.
John Kane, Kingston ON Canada