Hi folks,
I'm conducting a cross-validation study in which I need to select one
model among a few based on selection criteria.
The selection criteria I can think of are
1) Root Mean Squared Error of Difference (RMSD) between predictor and
observation, and
2) R-squared value between predictor and observation.
Would any of you give me some guidance on what I should choose as the
selection criterion, and what the consequences of choosing one over the
other would be?
Thank you in advance.
Yoon
RichUlrich - 03 Oct 2008 00:07 GMT
>Hi folks,
>
>Would any of you give me some guidance on what I should choose as the
>selection criterion, and
Assuming that you are fitting to sample B an
equation that is derived from sample A, are
those ever going to be different?
>what the consequences of choosing one over the other would be?
>
>Thank you in advance.
>
>Yoon

Rich Ulrich
Ray Koopman - 03 Oct 2008 23:14 GMT
On Oct 1, 11:18 am, "yoon...@gmail.com" <yoon...@gmail.com> wrote:
> Hi folks,
>
>
> Yoon
Certainly *not* r^2, ever, because it treats positive and negative
correlations as equally good. Think r, not r^2.
In general, you should always use RMSD unless you are willing to
ignore bias and scale errors in the predicted values, which is what
r does.
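A small Python sketch of Ray's point, using made-up numbers (not from the thread): a prediction that is a rescaled and shifted copy of y has r = 1 but a large RMSD, and a mirrored prediction has r = -1, hence r^2 = 1, while RMSD still flags it as poor. The helper names `rmse` and `pearson_r` are my own, purely illustrative.

```python
import math

def rmse(y, yhat):
    """Root mean squared difference between observations and predictions."""
    n = len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / n)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

y = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical observations
good = [1.1, 1.9, 3.2, 3.8, 5.1]     # hypothetical predictions close to y
biased = [2 * v + 5 for v in y]      # perfectly correlated, wrong scale and offset
flipped = [6.0 - v for v in y]       # perfectly anti-correlated

for name, yhat in [("good", good), ("biased", biased), ("flipped", flipped)]:
    r = pearson_r(y, yhat)
    print(f"{name:8s} r = {r:+.3f}  r^2 = {r * r:.3f}  RMSE = {rmse(y, yhat):.3f}")
```

Only RMSE separates "good" from "biased"; r^2 cannot even tell "biased" from "flipped".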
Greg Heath - 06 Oct 2008 12:21 GMT
> On Oct 1, 11:18 am, "yoon...@gmail.com" <yoon...@gmail.com> wrote:
> > I'm conducting a cross-validation study in which I need to select one
> > 1) Root Mean Squared Error of Difference (RMSD) between predictor and
> > observation and
also known as just RMSE
> > 2) R-squared value between predictor and observation.
>
> Certainly *not* r^2, ever, because it treats positive and negative
> correlations as equally good. Think r, not r^2.
Usually R^2 is interpreted in terms of explained variance and
the sign of R is ignored. Clearly, the sign doesn't help in the OP's
task of model selection.
> In general, you should always use RMSD unless you are willing to
> ignore bias and scale errors in the predicted values, which is what
> r does.
I don't see the big deal. Given the variance of y (the usual sample
variance with N-1 in the denominator, so that TSS = (N-1)*var(y)),
R^2 = 1 - SSE/TSS = 1 - (N*MSE)/((N-1)*var(y))
Therefore,
RMSE = sqrt( (1-R^2)*(N-1)*var(y)/N ).
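A quick numerical check of the identity above, as a Python sketch with invented data. It assumes var(y) is the N-1 sample variance (`statistics.variance`), which is what makes TSS = (N-1)*var(y); the function names are illustrative only.

```python
import math
import statistics

def r_squared(y, yhat):
    """Coefficient of determination R^2 = 1 - SSE/TSS."""
    ybar = statistics.fmean(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))
    tss = sum((a - ybar) ** 2 for a in y)
    return 1.0 - sse / tss

def rmse(y, yhat):
    """Root mean squared error, dividing by N."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def rmse_from_r_squared(r2, y):
    """Invert R^2 = 1 - (N*MSE)/((N-1)*var(y)) for RMSE, using the sample variance."""
    n = len(y)
    return math.sqrt((1.0 - r2) * (n - 1) * statistics.variance(y) / n)

y = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0]      # hypothetical observations
yhat = [2.5, 3.5, 4.5, 4.5, 5.5, 6.5]   # hypothetical predictions
r2 = r_squared(y, yhat)
print(rmse(y, yhat), rmse_from_r_squared(r2, y))  # equal up to floating-point rounding
```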
My regressions are generally nonlinear. My summary statistics
of choice are the normalized mean-square error
NMSE = MSE/MSE0, where MSE0 = (N-1)*var(y)/N,
the coefficient of determination R^2 = 1 - NMSE, and the
correlation coefficient r, which, for nonlinear models,
is not the same as R.
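To make Greg's last remark concrete, here is a Python sketch computing his three summaries for invented predictions from some unspecified nonlinear model (the numbers and the name `summary_stats` are hypothetical). Because these predictions are not a least-squares linear fit of y, r^2 comes out noticeably larger than R^2 = 1 - NMSE.

```python
import math
import statistics

def summary_stats(y, yhat):
    """Return (NMSE, R^2, r) as defined in Greg's post."""
    n = len(y)
    mse = sum((a - b) ** 2 for a, b in zip(y, yhat)) / n
    mse0 = (n - 1) * statistics.variance(y) / n   # MSE of the constant model yhat = mean(y)
    nmse = mse / mse0
    r2 = 1.0 - nmse
    # Pearson correlation between observations and predictions
    my, mh = statistics.fmean(y), statistics.fmean(yhat)
    sxy = sum((a - my) * (b - mh) for a, b in zip(y, yhat))
    syy = sum((a - my) ** 2 for a in y)
    shh = sum((b - mh) ** 2 for b in yhat)
    r = sxy / math.sqrt(syy * shh)
    return nmse, r2, r

y = [0.1, 0.9, 4.2, 8.8, 16.1]      # hypothetical observations
yhat = [0.0, 1.2, 3.6, 8.0, 14.5]   # hypothetical output of some nonlinear model
nmse, r2, r = summary_stats(y, yhat)
print(f"NMSE = {nmse:.4f}  R^2 = {r2:.4f}  r = {r:.4f}  r^2 = {r * r:.4f}")
```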
As far as selecting variables, either NMSE or R^2
can be used.
Hope this helps.
Greg
Greg Heath - 05 Oct 2008 21:20 GMT
On Oct 1, 2:18 pm, "yoon...@gmail.com" <yoon...@gmail.com> wrote:
> Hi folks,
>
> 1) Root Mean Squared Error of Difference (RMSD) between predictor and
> observation and
Since error is the difference, this is RMSE = sqrt(MSE).
> 2) R-squared value between predictor and observation.
R^2 = 1 - (SSE/TSS) = 1 - (MSE/MSE0),
where MSE0 = (N-1)*var(y)/N is the MSE of the model yhat = mean(y).
> Would any of you give me some guidance on what I should choose as the
> selection criterion, and
>
> what the consequences of choosing one over the other would be?
Take your pick. Given var(y), the transformation is one-to-one.
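Greg's "one-to-one" point means that on a fixed test set, ranking candidate models by smallest RMSE and by largest R^2 gives the same order. A Python sketch with three made-up candidate models A, B and C (hypothetical names and numbers), computing R^2 through MSE0, the MSE of the mean-only model:

```python
import math
import statistics

y = [3.1, 4.8, 6.2, 7.9, 9.6, 11.0]   # hypothetical test-set observations
preds = {                             # hypothetical predictions from three candidate models
    "A": [3.0, 5.0, 6.0, 8.0, 10.0, 11.0],
    "B": [2.5, 4.0, 6.5, 8.5, 9.0, 12.0],
    "C": [4.0, 4.0, 7.0, 7.0, 10.0, 10.0],
}

n = len(y)
ybar = statistics.fmean(y)
mse0 = sum((v - ybar) ** 2 for v in y) / n   # MSE of the model yhat = mean(y)
print(mse0, (n - 1) * statistics.variance(y) / n)  # same value, up to rounding

def mse(yhat):
    """Mean squared error of one model's predictions on the test set."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / n

rmse = {k: math.sqrt(mse(p)) for k, p in preds.items()}
r2 = {k: 1.0 - mse(p) / mse0 for k, p in preds.items()}

by_rmse = sorted(preds, key=lambda k: rmse[k])             # smallest RMSE first
by_r2 = sorted(preds, key=lambda k: r2[k], reverse=True)   # largest R^2 first
print(by_rmse, by_r2)
```

Since R^2 = 1 - RMSE^2/MSE0 with MSE0 fixed by the test set, the two orderings can only disagree when comparing across different test sets.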
Hope this helps.
Greg
RichUlrich - 05 Oct 2008 21:52 GMT
>On Oct 1, 2:18 pm, "yoon...@gmail.com" <yoon...@gmail.com> wrote:
>> Hi folks,
>
>Hope this helps.
I think that the language of the question is seductively misleading.
"RMSD" is what you would refer to as the product of a regression, and
yet "cross-validation" is better used as a term for checking the
fit of one equation in an independent sample.
As Ray has posted, the Differences are better. For one thing, you
can use them in either case. For another, they account for bias (for a
continuous prediction). Also, for fitting, you don't have to worry
about the d.f. lost in fitting, which can cause the "fitted" RMSE
to worsen with an extra predictor even though R-squared never
decreases when a predictor is added.
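A Python sketch of the d.f. effect Rich describes, with invented numbers. Here "fitted" RMSE is taken to be sqrt(SSE/(N - p - 1)) for p predictors plus an intercept; that reading is my assumption. Going from the intercept-only model to one extra predictor, the fitted RMSE gets worse exactly when the new R^2 is below 1/(N-1) (0.2 for N = 6), even though R^2 itself goes up.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    b = sxy / sxx
    return my - b * mx, b

def summaries(y, yhat, p):
    """Return (R^2, d.f.-adjusted 'fitted' RMSE) for p predictors plus an intercept."""
    n = len(y)
    ybar = sum(y) / n
    sse = sum((u - v) ** 2 for u, v in zip(y, yhat))
    tss = sum((u - ybar) ** 2 for u in y)
    return 1.0 - sse / tss, math.sqrt(sse / (n - p - 1))

y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]   # hypothetical response
x = [2.0, 5.0, 1.0, 6.0, 3.0, 4.0]   # hypothetical, nearly useless predictor

# Model 0: intercept only (yhat = mean(y)), p = 0
ybar = sum(y) / len(y)
r2_0, frmse_0 = summaries(y, [ybar] * len(y), p=0)

# Model 1: intercept + x, p = 1
a, b = fit_simple_ols(x, y)
r2_1, frmse_1 = summaries(y, [a + b * v for v in x], p=1)

print(f"intercept only: R^2 = {r2_0:.3f}, fitted RMSE = {frmse_0:.3f}")
print(f"intercept + x : R^2 = {r2_1:.3f}, fitted RMSE = {frmse_1:.3f}")
```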

Rich Ulrich
Greg Heath - 06 Oct 2008 11:29 GMT
On Oct 1, 2:18 pm, "yoon...@gmail.com" <yoon...@gmail.com> wrote:
> Hi folks,
>
> I'm conducting a cross-validation study in which I need to select one
> model among a few based on selection criteria.
Cross-validation implies averaging over repeated sample splits into
training and testing subsets. Is this what you mean? Or are you
just emphasizing that the test set is an independent sample?
Either way, consideration of adjusted R^2 is unnecessary.
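For the "repeated sample splits" reading, a minimal k-fold cross-validation sketch in Python. The data, the two candidate models (`fit_mean`, `fit_line`) and the choice of averaging per-fold test RMSE are all hypothetical illustrations, not anything the OP described. Each model is refit on the training folds and scored only on the held-out fold, which is why no d.f. adjustment is needed.

```python
import math
import random

def fit_mean(xs, ys):
    """Candidate model A (hypothetical): always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate model B (hypothetical): least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def kfold_rmse(xs, ys, fit, k=5, seed=0):
    """Average test-fold RMSE over k folds; the same seed gives every model the same folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sse = sum((ys[i] - model(xs[i])) ** 2 for i in test)
        scores.append(math.sqrt(sse / len(test)))
    return sum(scores) / k

# Hypothetical data: y = 1 + 2x + Gaussian noise
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [1 + 2 * x + rng.gauss(0, 0.5) for x in xs]

for name, fit in [("mean", fit_mean), ("line", fit_line)]:
    print(name, round(kfold_rmse(xs, ys, fit), 3))
```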
Hope this helps.
Greg
> The selection criteria I can think of are
>
>
> Yoon