I have a 12 GB (SAS compressed) SAS file that I need to process and add some computed variables to. The resulting file is 15 GB (SAS compressed, 40 million records). For a variety of reasons, I had to re-run the process. Both times the job finished in approximately 1/2 hour.
Then I wanted to read just 3 variables (about 150 bytes per record) from the 15 GB file using a simple DATA step:
data svclib.service_codes (compress=yes);
   /* keep only the 3 variables needed from the 15 GB data set */
   set svclib.svc_span_transactions(keep=service_code tie_breaker source_system_id);
run;
The resulting file will be about 1 GB (SAS compressed). The job has been running for nearly 3 hours and hasn't finished yet.
Does anyone have any ideas about what might be going on? It seems to me that reading 3 variables from the 15 GB file shouldn't take 6+ times longer than creating the file in the first place. I am about to contact my IT people to check for disk problems, I/O contention, etc., but wanted to verify that there is no SAS file I/O reason for the above results. I would be happy to provide any other info that might be helpful.
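A minimal diagnostic sketch, not part of the original post, assuming the library and variable names above: read a slice of the input into DATA _NULL_ with FULLSTIMER turned on, so nothing is written and the log timing reflects only the read side. KEEP= does not shrink the physical read; the base engine still reads every page of the input file and simply leaves the other variables out of the program data vector, so the full 15 GB has to come off the SAN either way. If the log's "real time" is far larger than "user cpu time" plus "system cpu time", the step is mostly waiting on I/O rather than on decompression. The 1,000,000-observation slice is an arbitrary choice, and the first rows may not be representative if part of the file is cached.

```sas
/* Log detailed timing and memory statistics for each step */
options fullstimer;

/* Read-only pass over the first 1,000,000 of ~40 million observations.
   DATA _NULL_ creates no output data set, so the timing covers only
   reading (and decompressing) the input. */
data _null_;
   set svclib.svc_span_transactions(keep=service_code tie_breaker source_system_id
                                    obs=1000000);
run;
```

Multiplying that step's real time by about 40 gives a rough estimate of one full pass; comparing it with the half-hour build time shows whether the read itself has slowed down.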
Puzzled near Seattle,
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
Joe Matise - 29 Jan 2010 17:31 GMT
Is it on a server? If so, I'd blame the server; it probably has usage issues
(more people hitting it now). Or perhaps you have (nearly) run out of
storage space and it's scrambling to find space for the last bit? Can you
find the temporary work location and see how big the file being created is?
That should at least give you an idea of how far along it is.
I can't think of a SAS I/O reason, but I'm not an expert on that side either.
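A small sketch of how to find those folders from inside SAS (an illustration, not part of Joe's reply): the PATHNAME function returns the physical path behind a libref. In Dan's step the output data set is written to SVCLIB rather than WORK, so that is the folder to watch for a growing file.

```sas
/* Write the physical folders behind two librefs to the log */
%put NOTE: WORK library folder:   %sysfunc(pathname(work));
%put NOTE: SVCLIB library folder: %sysfunc(pathname(svclib));
```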
-Joe
Nordlund, Dan (DSHS/RDA) - 29 Jan 2010 17:47 GMT
Yes, it is on a Windows server, and the files are on an attached high-speed
storage area network. The usual explanation for slow processing that I see
is in fact I/O contention with other large jobs that are running
concurrently. But I have never experienced a slowdown of this magnitude.
Dan
Terjeson, Mark - 29 Jan 2010 18:09 GMT
Hi Dan,
You could certainly use the traditional process of elimination to pin
down where the problem is. For example, most PCs nowadays have
multi-gigabyte hard drives. If I write 40M obs with 3 vars to a local
drive, it completes in about 1m13s. Your incoming data set buffer and
your DATA step processing may be more involved, but you could certainly
run a test with your program repointed to write its output to your local
drive. If it does run faster, you have determined that the slowdown is
remote from your local box, which rules your own processing out as a
contributing factor.
You could also reconfirm the SAN issue by writing a very stripped-down
test with the necessary volume of records and seeing whether running
with and without the SAN shows the difference.
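Mark's two tests might look something like the sketch below. The folder C:\temp\sastest, the libref LOCTEST, the macro name WRITE_TEST, and the dummy variable values are all hypothetical; the real variables' types and lengths are not given in the thread. FULLSTIMER makes the log report real time for every step, so the local and SAN runs can be compared side by side. Note that the second call writes a scratch data set of roughly 1 GB into SVCLIB before deleting it.

```sas
options fullstimer;

/* Hypothetical local folder; it must already exist */
libname loctest "C:\temp\sastest";

/* Test 1: the original subset step with only the OUTPUT repointed to
   the local drive. The input is still read from the SAN. */
data loctest.service_codes (compress=yes);
   set svclib.svc_span_transactions(keep=service_code tie_breaker source_system_id);
run;

/* Test 2: stripped-down write of 40 million observations with three
   dummy variables, run once per library so only the target differs. */
%macro write_test(lib);
   data &lib..write_test;
      length service_code $8;
      do i = 1 to 40000000;
         service_code     = put(mod(i, 1000), z8.);
         tie_breaker      = i;
         source_system_id = mod(i, 7);
         output;
      end;
      drop i;
   run;

   /* Remove the scratch data set once it has been timed */
   proc datasets library=&lib nolist;
      delete write_test;
   quit;
%mend write_test;

%write_test(loctest)
%write_test(svclib)
```

If the LOCTEST runs are much faster than the SVCLIB runs on the same box, the bottleneck is on the SAN or network side rather than in the DATA step itself.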
Hope this is helpful.
Mark Terjeson
Investment Business Intelligence
Investment Management & Research
Russell Investments
253-439-2367
Russell
Global Leaders in Multi-Manager Investing
Arthur Tabachneck - 29 Jan 2010 22:17 GMT
Dan,
In addition to what others have mentioned, I've run into similar problems
using a Windows-based network to analyze large files. When I've confronted
such problems, they haven't been limited to SAS but have affected all
programs trying to do analyses across servers. That is, in my cases,
network activity per se was the culprit.
I ended up moving all of our large files onto the same server where we
keep SAS and, lo and behold, all of the problems went away.
FWIW,
Art
Terjeson, Mark - 29 Jan 2010 22:26 GMT
Hi,
To second Art's motion: when dealing
with large files, I/O across the wire
between boxes is typically the slowest
link in the chain, as the old saying goes.
Mark
Proc Me - 29 Jan 2010 23:05 GMT
If you've got files over 2GB in Windows, the recommendation from SAS is to
use the SGIO=yes option. This instructs SAS to bypass the Windows file cache
and read data directly from disk. It would also be worth looking at the
amount of memory you are committing to the I/O task. This can be set using
the bufno and bufsize system options at runtime (although they have to be
kept within the maxmemsize system option setting specified at invocation).
This paper may help:
http://support.sas.com/resources/papers/IOthruSGIO.pdf
Whilst I was looking this up, I also spotted:
http://support.sas.com/resources/papers/proceedings09/334-2009.pdf
Gold dust: I've got a new paper to read. Thank you for giving me an excuse
to look!
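A hedged sketch of where these settings live, using Dan's step as the example. SGIO (Windows only) and the session memory limit can only be set when SAS starts, in the configuration file or on the command line, so they appear below as a comment; the post's "maxmemsize" is assumed here to mean the MEMSIZE option. PROC OPTIONS reports what the running session actually picked up. The values BUFNO=10, BUFSIZE=64k, and MEMSIZE 2G are arbitrary illustrations, not recommendations taken from the papers above.

```sas
/* Invocation-only options, e.g. in sasv9.cfg or on the sas.exe command
   line (they cannot be changed later with an OPTIONS statement):
      -SGIO
      -MEMSIZE 2G
*/

/* Confirm what the current session is actually using */
proc options option=sgio;    run;
proc options option=memsize; run;

/* BUFNO can be changed at run time: number of page buffers per open
   data set. FULLSTIMER logs memory use so the effect can be seen. */
options bufno=10 fullstimer;

/* BUFSIZE is the page size, fixed when a data set is created, so here
   it only affects the new output data set, not the existing input. */
data svclib.service_codes (compress=yes bufsize=64k);
   set svclib.svc_span_transactions(keep=service_code tie_breaker source_system_id);
run;
```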
Proc Me
Michael Raithel - 29 Jan 2010 23:22 GMT
Dear SAS-L-ers,
Dan posted the following clarification:
> Yes it is on a Windows Server and the files are on an attached high
> speed storage area network. The usual explanation for slow processing
> that I see is in fact I/O contention with other large jobs that are
> running concurrently. But I have never experienced this magnitude of
> slow down.
Dan, I am going to jump onto the dogpile of responses from Joe, Art, and Mark and also opine that it is a network I/O issue. Arf!
As you know, one of the unsung benefits of compressing SAS data sets is fewer overall I/Os between disk storage and computer memory, because SAS transfers more observations per I/O. So compression results in fewer overall I/Os when performing a sequential read of a compressed SAS data set than if the same data set were uncompressed. Fewer I/Os mean fewer seconds, minutes, hours, or days spent waiting for a program to run. Since your I/O times went precipitously upward on simple DATA step tasks, it is unlikely that SAS's decompressing of incoming observations and compressing of outgoing observations is the culprit. It is more in line with the type of server I/O delays that all of us have experienced at one time or another. That is, the trip to and from the server happened during a computer rush hour.
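To see the page-level picture behind Michael's point, PROC CONTENTS reports a data set's page size, its number of pages, and whether it is compressed. A sequential pass like Dan's subset step reads every page regardless of KEEP=, so page count times page size approximates the volume that has to cross the network. This is a sketch assuming Dan's libref and data set name; Attributes and EngineHost are the ODS table names PROC CONTENTS uses for its general and engine/host sections.

```sas
/* Show only the general attributes (including "Compressed") and the
   engine/host section (including "Data Set Page Size" and
   "Number of Data Set Pages") for the large data set */
ods select Attributes EngineHost;
proc contents data=svclib.svc_span_transactions;
run;
ods select all;
```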
You know, your systems folks should have been able to run a monitor on the server and tell you whether there was heavy I/O activity, or some other marker of poor performance. Hopefully, they are approachable enough and open enough to share that information with a SAS programmer!
Dan, best of luck in all of your SAS endeavors!
I hope that this suggestion proves helpful now, and in the future!
Of course, all of these opinions and insights are my own, and do not reflect those of my organization or my associates. All SAS code and/or methodologies specified in this posting are for illustrative purposes only and no warranty is stated or implied as to their accuracy or applicability. People deciding to use information in this posting do so at their own risk.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael A. Raithel
"The man who wrote the book on performance"
E-mail: MichaelRaithel@westat.com
Author: Tuning SAS Applications in the MVS Environment
Author: Tuning SAS Applications in the OS/390 and z/OS Environments, Second Edition
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172
Author: The Complete Guide to SAS Indexes
http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=60409
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Never insult an alligator until after you have crossed the river. - Cordell Hull
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael Raithel - 29 Jan 2010 23:35 GMT
Dear SAS-L-ers,
Proc Me posted the following interesting contribution to this thread:
> If you've got files over 2GB in Windows, the recommendation from SAS is
> to use the SGIO=yes option. [...]
> Gold dust: I've got a new paper to read, thank you for giving me an
> excuse to look!
Proc Me, nice contribution! Not to burst anybody's bubble, but I was pretty excited about the SGIO=yes option when I first heard about it some years ago. So excited, in fact, that I asked one of my staff to read the paper (I handed him my annotated copy) and perform benchmarks using our real data on SAS 9.1.3 on Windows XP. We got very mediocre results. I was crestfallen, because I really enjoyed both the paper and the concept and wanted to bring something new and beneficial to each and every SAS-hosting desktop at SAS Mecca.
I would be _VERY_ interested if any 'L-ers get meaningful reductions in I/Os on SAS 9.2 (TS2M0 or TS2M2) under Windows XP. Not on the pretty plaything data sets generated with multiple DO loops for proof-of-concept SAS-L postings, but on real-life large-sized data sets. Inquiring minds want to know!
BTW, both Tony Brown and Margaret Crevar have written other stellar papers; I'm a big fan of both of them... but let's just keep that between the two of us.
Proc Me, best of luck in all your SAS endeavors!
I hope that this suggestion proves helpful now, and in the future!
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Michael A. Raithel
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A conclusion is simply the place where someone got tired of thinking. - Arthur Block
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Nordlund, Dan (DSHS/RDA) - 29 Jan 2010 23:59 GMT
Thanks to all who have responded to my plea for help. I am trying to do some testing and will definitely look at the papers mentioned. Unfortunately, I am up against a deadline and won't be able to do much testing until I deliver my deliverables (and today is [supposed to be] a day off). As an aside, since I was having a problem with the file I created, I decided to blow it away and recreate it. As before, the process finished in just over 1/2 hour. Now I am going to try to subset it again. (What is it they say about doing the same thing and expecting different results? :-)
Dan
Nordlund, Dan (DSHS/RDA) - 30 Jan 2010 07:25 GMT
Well, this round of the saga is coming to a close. I don't know what has changed, because I can't monitor performance on the server myself, and I did recreate the large 15 GB file before re-running the program to subset the data. This time around, the subset finished in about 17 minutes (instead of 3 hrs), and sorting the subset took only 1 minute (instead of 47 min.). Resource contention on the server probably had an effect, but I find it hard to believe that performance would degrade that much for that reason alone (47 to 1 on the sort), especially since my jobs use a separate work "drive" from everyone else in the office, one that is also separate from the drive for reading and writing the final datasets.
So, I will try to find some time to read the papers referenced in some of the other posts in this thread and try out some of the suggested I/O performance enhancements. Thanks again everyone for all the suggestions.
Dan
Proc Me - 01 Feb 2010 22:37 GMT
Michael,
Thank you for your kind words. I'm writing this intro after the content, so
I can see that this post may be of more interest to those beginning their
SAS journeys.
>I would be _VERY_ interested if any 'L-ers get meaningful reductions in
>I/O's on SAS 9.2 (TS2M0 or TS2M2) under Windows XP. Not on the pretty
>plaything data sets generated with multiple DO loops for proof-of-concept
>SAS-L postings, but on real-life large-sized data sets. Inquiring minds
>want to know!
I've done some testing and the results are better, but we're running
9.2 TS2M2 against a Win2k3 Enterprise Edition server, rather than XP, with
data on a SAN, so the results may not be comparable. I'd be happy to share
what we have if I can dig out, or regenerate, some meaningful evidence.
One thing that definitely paid off is monitoring resource usage: if you have
memory you're not using, use it. You can do this by increasing the memory
buffer product (bufsize*bufno). This has made a difference, up to a ceiling
of 2GB, which is now explained by the paper from Tony Brown and Margaret
Crevar. Setting the bufsize small also appears to help with sorting data.
Intuitively this makes a certain sort of sense: it's easier to get to the
bits you want when you've got lots of small chunks.
>BTW, both Tony Brown and Margaret Crevar have written other stellar
>papers; I'm a big fan of both of them... but let's just keep that between
>the two of us.
I agree: stars in the SAS firmament! I'll pop over to lexjansen.com to make
sure I've not missed anything, and would welcome any pointers.
When we relocated our server recently we did it by commissioning a new
machine. I wrote the spec based on their best practices, and I'm confident
that we've got a better machine for it. Our IS department really responded
when they saw "business users" who cared passionately and who had answers
for many of the design questions they had - the post-project debrief was a
bit of a love-in.
Proc Me