Math Forum / Math Software / SAS / July 2009



Include Predictor that was used to derive the outcome?

paul wilson - 03 Jul 2009 15:58 GMT
Hi everyone,

I am wondering if there are any issues in terms of biasing my model if I
want to include a predictor that was used to derive my dependent variable.

Here is more detail:

The dependent variable is "change in sales volume per customer", meaning
"sales per customer in period 1" minus "sales per customer in period 2".

Amongst other things, I'd like to investigate whether the "amount of sales
per customer in period 1" predicts the change in sales. In other words,
perhaps customers who used to spend a lot are the ones who had the highest
decline in sales when compared to period 2.

I realize that one needs to be careful not to include a predictor that is
really the outcome in disguise, but I'm not sure if that kind of logic
applies to this situation.

Thanks a lot!
Arthur Tabachneck - 03 Jul 2009 18:20 GMT
Paul,

Couldn't you use the delta (i.e., "sales per customer in period 2" / "sales
per customer in period 1" - 1) as the DV?

Art
paul wilson - 03 Jul 2009 20:18 GMT
Hi Art,

Very interesting point.

What would be the advantage of defining the DV as "sales in period 2"
divided by "sales in period 1" as opposed to "sales in period 2" minus
"sales in period 1"?

More importantly, do you see any issues in using "sales in period 1" as a
predictor in this sort of model, knowing that it was one of the components
used to derive the DV?

Thanks a lot!
Arthur Tabachneck - 03 Jul 2009 21:48 GMT
Paul,

My statistical training was quite a few years ago, thus I'd wait to see what
some of the more recently taught folk have to say.

I would calculate the delta, as it computes the percentage change from time1
to time2, irrespective of the actual amounts.

For example, which would you consider a bigger change: (1) from 10 to 100
or (2) from 1,000,000 to 1,001,000?

The delta of the former would be greater, while the subtraction calculation
would show the latter to be greater.
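
A minimal sketch of that comparison, using only the two pairs of numbers from the example above. The dataset and variable names (delta_demo, sales1, sales2, difference, delta) are made up for illustration, and the ratio assumes sales1 is greater than zero.

```sas
/* Compare the subtraction and the delta for the two example cases */
data delta_demo;
 input sales1 sales2;
 difference = sales2 - sales1;   /* absolute change */
 delta = sales2 / sales1 - 1;    /* relative change; assumes sales1 > 0 */
 format delta percent10.1;
 datalines;
10 100
1000000 1001000
;
run;

proc print data=delta_demo noobs;
run;
```

The first case has a difference of 90 and a delta of 900%; the second has a difference of 1000 and a delta of 0.1%.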

Using the delta, I don't see any conflict in using "sales in period 1" as a
predictor as it represents the hypothesis you want to test.

Art
Sigurd Hermansen - 05 Jul 2009 23:18 GMT
However one defines a change in a time series from one time to the next, "errors" in observed values (measurement, effects of extraneous events, etc.) tend to carry over relatively short (daily, weekly, monthly, yearly, for instance) time intervals. This violates the usual assumption in statistical models of independence of residual errors. In practical terms, predictions of values at t when one knows the value at t-1 tend to be much more accurate than predictions in another sample of a time series where neither t nor t-1 is known. Time series models adjust for serial correlation of errors by modeling the error term at t as, say, a linear function of the errors at t-1, t-2, and so on.
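
As a rough sketch of that kind of adjustment, PROC AUTOREG (part of SAS/ETS) can fit a regression whose error term follows a first-order autoregressive process. The dataset SalesSeries and the variables sales and period are hypothetical, and this assumes one series observed over many periods rather than the two periods per customer in the original question.

```sas
/* Regression with AR(1) errors; SalesSeries, sales and period are assumed names */
proc autoreg data=SalesSeries;
 model sales = period / nlag=1   /* error at t modeled on the error at t-1 */
                        dwprob;  /* p-value for the Durbin-Watson statistic */
run;
```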

In 1967, Longley illustrated serial correlation of different time series:
http://lib.stat.cmu.edu/datasets/longley

Predicting time series values using another serially correlated time series works great, but how does one predict the values of the correlated time series in advance?
S

Søren Lassen - 06 Jul 2009 17:44 GMT
Paul,
I would consider analysing the logarithms of sales, rather than the sales
amounts themselves.  The real question is: what do you want to do with
customers who were not with your company in one of the periods? My
suggestion is that you lump all such customers together with the ones that
bought for less than one dollar (or whatever you consider an insignificant
amount - are you selling paper napkins or fighter planes?):
/* make a format for the logs */
Proc format;
 value logsales
 0='<=1'
 0<-1='1-10'
 /* ... continue with further ranges as needed ... */
 ;
run;

data ToBeAnalysed;
 set RawData;
 /* lump low sales and new/lost customers together */
 sales1=max(sales1,1); /* Period 1 */
 sales2=max(sales2,1); /* Period 2 */
 /* calculate logarithms (base 10) */
 logsales1=log10(sales1);
 logsales2=log10(sales2);
 /* change on the log scale: log10 of the ratio sales2/sales1 */
 logchange=logsales2-logsales1;
run;
I would then do the analysis with logchange as the dependent variable.
You could try using logsales1 or logsales2 as the independent variable,
but I would try just printing the data, using the format suggested above for
the logsales* variables - it may tell you a lot more than a regression
analysis.
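
One possible way to carry out both suggestions, offered as a sketch rather than the exact steps intended above: a cross-tabulation of the two periods using the logsales format, followed by a simple regression. It assumes the logsales format has been completed with the remaining ranges and that ToBeAnalysed was built as shown; the choice of PROC FREQ and PROC REG is an assumption.

```sas
/* Tabulate period 1 against period 2 in the formatted sales bands */
proc freq data=ToBeAnalysed;
 tables logsales1*logsales2 / norow nocol nopercent;
 format logsales1 logsales2 logsales.;
run;

/* Regress the log change on the period 1 level */
proc reg data=ToBeAnalysed;
 model logchange = logsales1;
run;
quit;
```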

But returning to your original question - no, I do not think it is a great
problem to analyse with a variable that was derived the way that you
suggest. The only question is how to categorize new and lost customers;
it may be a good idea to give them separate categories in the analysis.
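
When reading the slope from such a model, it may help to keep one piece of algebra in mind: if a regression of sales2 on sales1 has slope b, then a regression of (sales2 - sales1) on sales1 has slope b - 1 with the same residuals. The simulation below is only an illustration under assumed values (1000 customers, normally distributed sales with mean 100 and standard deviation 20, all names hypothetical). The two periods are generated independently, so b is about 0 and the slope for the change should come out near -1. With the change defined as period 1 minus period 2, as in the original question, the sign flips.

```sas
/* Simulated customers whose period 2 sales are unrelated to period 1 */
data sim;
 call streaminit(2009);               /* fixed seed for reproducibility */
 do customer = 1 to 1000;
  sales1 = rand('normal', 100, 20);
  sales2 = rand('normal', 100, 20);   /* independent of sales1 by construction */
  change = sales2 - sales1;
  output;
 end;
run;

/* The slope on sales1 reflects the subtraction, not a real relationship */
proc reg data=sim;
 model change = sales1;
run;
quit;
```

A slope clearly different from -1 in the real data (equivalently, b different from 0) is what would indicate that period 1 sales carry information about period 2.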

At the same time it may be relevant to consider your time scale - it
should be pertinent to the typical buying pattern of your customers - you
do not want to consider someone a loss because he does not buy
a new car every 3 months, but someone who has not bought any groceries
in 3 months is either deceased, or has moved (to another city/state/country,
or to another supplier).

Regards,
Søren

 