Math Forum / Mathematics / Statistics / October 2008

Root Mean Squared Difference (RMSD) vs R-squared in a cross-validation

yoonsup@gmail.com - 01 Oct 2008 19:18 GMT
Hi folks,

I'm conducting a cross-validation study in which I need to select one
model among a few based on selection criteria.

The selection criteria that I can think of are

1) Root Mean Squared Error of Difference (RMSD) between predictor and
observation and

2) R-squared value between predictor and observation.

Could any of you give me guidance on which of these I should choose as
the selection criterion, and what the consequences of choosing one over
the other would be?

Thank you in advance.

Yoon
RichUlrich - 03 Oct 2008 00:07 GMT
>Hi folks,
>
[quoted text clipped - 10 lines]
>Would any of you give me a guidance on what I can choose as selection
>criterion and

Assuming that you are fitting to sample B an
equation that is derived from sample A, are
those ever going to be different?

>what would be the consequence for choosing one over the other.
>
>Thank you in advance.
>
>Yoon

Rich Ulrich

Ray Koopman - 03 Oct 2008 23:14 GMT
On Oct 1, 11:18 am, "yoon...@gmail.com" <yoon...@gmail.com> wrote:
> Hi folks,
>
[quoted text clipped - 16 lines]
>
> Yoon

Certainly *not* r^2, ever, because it treats positive and negative
correlations as equally good. Think r, not r^2.
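To see Ray's point about the sign, here is a minimal Python sketch with made-up data (the names `pearson_r`, `good` and `flipped` are hypothetical, not from the thread). Negating a set of predictions flips the sign of r but leaves r^2 unchanged, so r^2 alone would rate a model that points the wrong way as highly as one that tracks the data.

```python
import math
import statistics


def pearson_r(pred, obs):
    """Pearson correlation between predictions and observations."""
    mp, mo = statistics.fmean(pred), statistics.fmean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


obs = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical observations
good = [1.1, 1.9, 3.2, 3.8, 5.0]       # predictions that track the observations
flipped = [-v for v in good]           # same pattern, wrong direction

r_good = pearson_r(good, obs)
r_flip = pearson_r(flipped, obs)
print(f"r:   {r_good:+.3f} vs {r_flip:+.3f}")
print(f"r^2: {r_good ** 2:.3f} vs {r_flip ** 2:.3f}")
```

Both lines of predictions get the same r^2, while r separates them by sign.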

In general, you should always use RMSD unless you are willing to
ignore bias and scale errors in the predicted values, which is what
r does.
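The bias-and-scale point can be illustrated with a second small sketch (hypothetical data; `model_a` and `model_b` are illustrative names). Model A is an exact linear rescaling of the observations, so its r is 1, but its predictions are far off in absolute terms; model B is roughly unbiased with small errors. Correlation prefers A and RMSD prefers B, which is one concrete way the two criteria can disagree on an independent sample.

```python
import math
import statistics


def pearson_r(pred, obs):
    """Pearson correlation between predictions and observations."""
    mp, mo = statistics.fmean(pred), statistics.fmean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)


def rmsd(pred, obs):
    """Root mean squared difference between predictions and observations."""
    return math.sqrt(statistics.fmean([(p - o) ** 2 for p, o in zip(pred, obs)]))


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
model_a = [2.0 * o + 3.0 for o in obs]   # perfectly correlated, but biased and mis-scaled
model_b = [1.3, 1.8, 3.3, 3.7, 5.2]      # roughly unbiased, small scatter

r_a, r_b = pearson_r(model_a, obs), pearson_r(model_b, obs)
rmsd_a, rmsd_b = rmsd(model_a, obs), rmsd(model_b, obs)
print(f"model A: r = {r_a:.3f}, RMSD = {rmsd_a:.3f}")
print(f"model B: r = {r_b:.3f}, RMSD = {rmsd_b:.3f}")
```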
Greg Heath - 06 Oct 2008 12:21 GMT
> On Oct 1, 11:18 am, "yoon...@gmail.com" <yoon...@gmail.com> wrote:
> > I'm conducting a cross-validation study in which I need to select one
[quoted text clipped - 4 lines]
> > 1) Root Mean Squared Error of Diffrence (RMSD) between predictor and
> > observation and

also known as just RMSE

> > 2) R-squared value between predictor and observation.
>
[quoted text clipped - 5 lines]
> Certainly *not* r^2, ever, because it treats positive and negative
> correlations as equally good. Think r, not r^2.

Usually R^2 is interpreted in terms of explained variance, and
the sign of R is ignored. Clearly, the sign doesn't help in the OP's
task of model selection.

> In general, you should always use RMSD unless you are willing to
> ignore bias and scale errors in the predicted values, which is what
> r does

I don't see the big deal. Given the variance of y,

R^2 = 1 - SSE/TSS = 1 - (N*MSE)/((N-1)*var(y))

Therefore,

RMSE = sqrt( (1 - R^2)*(N-1)*var(y)/N )
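Greg's identity can be checked numerically. The sketch below assumes var(y) is the usual sample variance with an N-1 denominator (Python's `statistics.variance`) and uses made-up observations and predictions; it computes RMSE directly and again from R^2.

```python
import math
import statistics

obs = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0]      # hypothetical observations y
pred = [2.5, 3.5, 4.8, 4.4, 5.5, 6.6]     # hypothetical predictions yhat
n = len(obs)

sse = sum((y - yh) ** 2 for y, yh in zip(obs, pred))
mse = sse / n
var_y = statistics.variance(obs)          # sample variance, N-1 denominator
tss = (n - 1) * var_y                     # total sum of squares about mean(y)

r2 = 1 - sse / tss                                   # R^2 = 1 - SSE/TSS
r2_alt = 1 - (n * mse) / ((n - 1) * var_y)           # same thing written with MSE

rmse_direct = math.sqrt(mse)
rmse_from_r2 = math.sqrt((1 - r2) * (n - 1) * var_y / n)
print(rmse_direct, rmse_from_r2)
```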

My regressions are generally nonlinear. My summary statistics
of choice are the normalized mean-square error
NMSE = MSE/MSE0, where MSE0 = (N-1)*var(y)/N; the
coefficient of determination R^2 = 1 - NMSE; and the
correlation coefficient r, which for nonlinear models
is not the same as R.
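A small sketch of these three summaries, assuming the same definitions as above (the function name `summary` and the data are hypothetical). To keep it tiny it uses a predictor with a constant bias rather than a genuinely nonlinear model; the effect is the same in kind: r^2 matches 1 - NMSE only for an in-sample least-squares linear fit with an intercept, and otherwise the two can differ.

```python
import math
import statistics


def summary(pred, obs):
    """Return (NMSE, R^2 = 1 - NMSE, Pearson r) for predictions vs observations."""
    n = len(obs)
    mse = statistics.fmean([(p - o) ** 2 for p, o in zip(pred, obs)])
    mse0 = (n - 1) * statistics.variance(obs) / n   # MSE of the model yhat = mean(y)
    nmse = mse / mse0
    r2 = 1 - nmse
    mp, mo = statistics.fmean(pred), statistics.fmean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    r = cov / math.sqrt(sum((p - mp) ** 2 for p in pred) * sum((o - mo) ** 2 for o in obs))
    return nmse, r2, r


obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [o + 1.0 for o in obs]   # hypothetical predictor with a constant bias
nmse, r2, r = summary(pred, obs)
print(f"NMSE = {nmse:.3f}, R^2 = {r2:.3f}, r = {r:.3f}, r^2 = {r * r:.3f}")
```

Here r^2 is 1 while R^2 is only 0.5, because R^2 = 1 - NMSE charges the model for its bias and r does not.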

As far as selecting variables, either NMSE or R^2
can be used.

Hope this helps.

Greg
Greg Heath - 05 Oct 2008 21:20 GMT
On Oct 1, 2:18 pm, "yoon...@gmail.com" <yoon...@gmail.com> wrote:
> Hi folks,
>
[quoted text clipped - 5 lines]
> 1) Root Mean Squared Error of Diffrence (RMSD) between predictor and
> observation and

Since error is the difference, this is RMSE = sqrt(MSE)

> 2) R-squared value between predictor and observation.

R^2 = 1 - (SSE/TSS) = 1 - (MSE/MSE0)

where MSE0 = (N-1)*var(y)/N is MSE for the model yhat = mean(y).
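As a quick check of the baseline interpretation, the sketch below (made-up data) computes the MSE of the constant predictor yhat = mean(y) and compares it with (N-1)*var(y)/N; the resulting R^2 for that baseline is 0. Note that (N-1)*var(y)/N is simply the population variance, which `statistics.pvariance` would also return.

```python
import statistics

obs = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]   # hypothetical observations
n = len(obs)
mean_y = statistics.fmean(obs)

# MSE of the trivial model yhat = mean(y)
mse_baseline = statistics.fmean([(y - mean_y) ** 2 for y in obs])
# Closed form from the post: MSE0 = (N-1)*var(y)/N, var(y) with an N-1 denominator
mse0 = (n - 1) * statistics.variance(obs) / n

r2_baseline = 1 - mse_baseline / mse0
print(mse_baseline, mse0, r2_baseline)
```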

> Would any of you give me a guidance on what I can choose as selection
> criterion and
>
> what would be the consequence for choosing one over the other.

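Take your pick. Given var(y), the transformation is one-to-one (and decreasing: lower RMSE means higher R^2), so on a common test set both criteria rank the models identically.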
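Because var(y), and hence TSS, is fixed once the test set is fixed, R^2 = 1 - SSE/TSS is a decreasing function of RMSE, so the two criteria produce the same ranking. A minimal sketch with three hypothetical candidate models (names and numbers are illustrative only). This applies to R^2 defined as 1 - SSE/TSS, not to the squared correlation r^2 discussed earlier in the thread.

```python
import math
import statistics

obs = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0]    # hypothetical test-set observations
candidates = {                          # hypothetical predictions from three models
    "model_a": [2.5, 3.5, 4.8, 4.4, 5.5, 6.6],
    "model_b": [3.0, 3.0, 5.0, 5.0, 5.0, 6.0],
    "model_c": [4.5, 4.5, 4.5, 4.5, 4.5, 4.5],
}

n = len(obs)
tss = (n - 1) * statistics.variance(obs)   # fixed for this test set


def rmse(pred):
    """Root mean squared error of one candidate on the test set."""
    return math.sqrt(statistics.fmean([(y - p) ** 2 for y, p in zip(obs, pred)]))


def r_squared(pred):
    """R^2 = 1 - SSE/TSS on the same test set."""
    sse = sum((y - p) ** 2 for y, p in zip(obs, pred))
    return 1 - sse / tss


by_rmse = sorted(candidates, key=lambda k: rmse(candidates[k]))
by_r2 = sorted(candidates, key=lambda k: r_squared(candidates[k]), reverse=True)
print(by_rmse, by_r2)
```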

Hope this helps.

Greg
RichUlrich - 05 Oct 2008 21:52 GMT
>On Oct 1, 2:18 pm, "yoon...@gmail.com" <yoon...@gmail.com> wrote:
>> Hi folks,
[quoted text clipped - 23 lines]
>
>Hope this helps.

I think that the language of the question is seductively misleading.
"RMSD" is what you would call the product of a regression, and
yet "cross-validation" is better used as a term for checking the
fit of one equation in an independent sample.

As Ray has posted, the Differences are better. For one thing, you
can use them in either case. For another, they account for bias (for a
continuous prediction). Also, for fitting, you don't have to worry
about the d.f. loss in fitting, which can cause the "fitted" RMSE
to worsen with an extra predictor even though the R-squared increases
(or at least never decreases) with every predictor.
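Rich's d.f. remark can be reproduced with a tiny hand-rolled least-squares fit in pure Python. `ols_sse` is a hypothetical helper, and the data and the extra predictor `z` are constructed for illustration. Adding the nearly useless predictor `z` nudges R^2 up from 0.64 to 0.65, but the d.f.-corrected RMSE, sqrt(SSE/(N - p - 1)), gets worse because one more degree of freedom is spent.

```python
import math


def ols_sse(columns, y):
    """Least-squares fit of y on the given predictor columns plus an intercept.
    Solves the normal equations by Gauss-Jordan elimination; returns the SSE."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))   # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * xv for b, xv in zip(beta, row)) for row in X]
    return sum((yv - fv) ** 2 for yv, fv in zip(y, fitted))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
z = [1.0, -2.0, 0.0, 2.0, -1.0]   # hypothetical extra predictor, nearly useless

n = len(y)
mean_y = sum(y) / n
tss = sum((v - mean_y) ** 2 for v in y)

results = {}
for name, cols in {"x": [x], "x+z": [x, z]}.items():
    sse = ols_sse(cols, y)
    p = len(cols)
    r2 = 1 - sse / tss
    s = math.sqrt(sse / (n - p - 1))   # "fitted" RMSE with d.f. correction
    results[name] = (r2, s)
    print(f"{name:4s} R^2 = {r2:.3f}  d.f.-corrected RMSE = {s:.3f}")
```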

Rich Ulrich

Greg Heath - 06 Oct 2008 11:29 GMT
On Oct 1, 2:18 pm, "yoon...@gmail.com" <yoon...@gmail.com> wrote:
> Hi folks,
>
> I'm conducting a cross-validation study in which I need to select one
> model among a few based on selection criteria.

Cross-validation implies averaging over repeated sample splits into
training and testing subsets. Is this what you mean? Or are you
just emphasizing that the test set is an independent sample?

Either way, consideration of adjusted R^2 is unnecessary.
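A minimal k-fold sketch of the averaging Greg describes, assuming simple interleaved folds (`i % k`) and two hypothetical candidate models, a constant mean and a straight line. Each point is held out exactly once and the held-out squared errors are pooled into one RMSE per model. Because every error is measured on data the model did not see, no adjusted-R^2 style penalty is needed. Repeating this over several random fold assignments and averaging would give the repeated-split version.

```python
import math


def fit_mean(xs, ys):
    """Hypothetical model 1: predict the training mean everywhere."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Hypothetical model 2: ordinary least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_rmse(fit, xs, ys, k=5):
    """k-fold CV: fit on k-1 folds, predict the held-out fold, pool the squared errors."""
    sq_errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errors += [(ys[i] - model(xs[i])) ** 2 for i in test]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


xs = [float(i) for i in range(20)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2] * 4          # hypothetical, deterministic noise
ys = [1.0 + 2.0 * x + e for x, e in zip(xs, noise)]

scores = {"mean only": cv_rmse(fit_mean, xs, ys), "straight line": cv_rmse(fit_line, xs, ys)}
best = min(scores, key=scores.get)
print(scores, "->", best)
```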

Hope this helps.

Greg

> The selection criteria that I can think of is
>
[quoted text clipped - 11 lines]
>
> Yoon
 
©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.