If you specify *NONE you will get the best performance; the reorganize is also
done "in-place" and will not require extra DASD to complete.
If you specify *FILE the records will be physically sorted according to the
key fields specified in the DDS for the file.
You can also specify the name of a logical file that is built over the
physical file. In this case, the records will be physically sorted according
to the key fields specified in that logical file.
If you do specify *FILE or a file name, the reorganize will take considerably
longer and require additional DASD to complete (up to the size of the file
being reorganized); this space is returned when the reorg completes.
There is no particularly compelling reason to sort on a reorg, unless you
have applications that rely on the physical sequence of records.
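To make the two options concrete, here is a toy Python sketch of what a
reorganize does: KEYFILE(*NONE) just compacts out the deleted record slots,
while KEYFILE(*FILE) (or a logical file name) compacts and sorts. The record
layout here is invented for illustration; this is not DB2/400 code.

```python
# Toy model of RGZPFM: None marks a deleted record slot (illustrative
# layout only, not the real database format).

def reorganize(records, keyfile=None):
    """Return a compacted copy of the member; sort by `keyfile`
    (a key-extraction function) when one is given."""
    live = [r for r in records if r is not None]  # drop deleted slots
    if keyfile is not None:                       # KEYFILE(*FILE)-style reorg
        live.sort(key=keyfile)
    return live

# A member with two deleted slots and records that arrived out of key order.
member = [{"ssn": "222"}, None, {"ssn": "111"}, None, {"ssn": "333"}]

compacted = reorganize(member)                                # like KEYFILE(*NONE)
resequenced = reorganize(member, keyfile=lambda r: r["ssn"])  # like KEYFILE(*FILE)
```

Either way the deleted slots are reclaimed; only the sorted variant pays the
extra time and DASD.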
"Martijn Ruissen" <mar...@ruissen.tmfweb.nl> wrote in message
news:87q2ac$1uk$1...@azure.nl.gxn.net...
> Hi,
>
> I need some help with the parameters of the RGZPFM command, particularly
> the KEYFILE parameter.
> What should be used here?
> I'm planning to reorganize some large history files. One file is 4GB with
> 4.2 million records and 2.8 million deleted records and a maximum record
> length of 561.
>
> The other file is about 5GB, with about 9 million records, about 10
> million deleted records, and a maximum record length of 233.
> So this will definitely free up some disk space, I think.
>
> I read that if you use the KEYFILE *NONE parameter, this will cut back on
> the time needed to reorganize.
> If these jobs are submitted over the weekend, they'll have about 52 hours
> to complete.
> No other relevant system activity will take place during that time.
> I plan to run these jobs over separate weekends, so I guess there is
> enough time.
>
> It's a 620 model with a 51GB ASP and 640MB RAM; the ASP% is about 71% at
> the moment.
>
> So which value should be used for the KEYFILE parameter? And what
> exactly does it do?
> I guess it relates to the access path associated with the physical file?
>
> I'm struggling with this a little bit, and the AS/400 help didn't really
> offer that much help :)
>
> Thanks in advance!!
> Regards,
> Martijn Ruissen
>
>
>
> There is no particularly compelling reason to sort on a reorg, unless you
> have applications that rely on the physical sequence of records.
>
I have an application that uses a person's SS# as the key. I have a customer
master file, a demographics file, an event file, an account file, and many
others. We load data to the database monthly. So, for example, SS# 111-11-1111
may not exist in the database this month; however, it comes in next month. When
it does come in next month, its RRN will be at the end of the customer master
file. The key will sort it lower than SS# 222-22-2222, but its RRN is higher.
Then all related records in all the other files will be scattered around in
RRN order.

I reorg this database every quarter, just before we are to print statements.
This improves the processing time for the batch program that pulls the info to
be printed, as well as all the screen programs that access this data daily.
The reason is that the physical order of the records in the file matches the
keyed order. Obviously there are records in my files that are added between
reorg time and processing time; however, 95% of the records are in the same
physical sequence as keyed sequence. So I can read my customer master file in
RRN order and block as high as possible. For example, I read a record from the
customer master for SS# 111-11-1111. I need to SETLL & READE on 6 files to get
all the information about this customer in order to decide if they get a
statement or not. With the "child" files physically sorted into the same
sequence as the "parent" file, there is less disk access than if the data was
all scattered around the file. I still read with the key; however, all the
records are next to each other in physical sequence. I was able to save about
40% processing time on one batch program.

There are 6 quarter-end programs we must run on this database, and all were
taking about 12 hours to complete. Needless to say, I needed a solution. By
reorging the file with the KEYFILE parameter the weekend before the end of the
quarter, I reduced the average run time to 8-9 hours apiece.

Also, we have telephone service representatives taking phone calls from
customers all day long. Immediately after the reorg there is a noticeable
difference in speed when you need to view all the accounts a customer has and
they have 7 or 8 different accounts; I store one record per account per month
for 24 months. With the records in the same physical sequence as the key, the
screen is faster. BTW, the reorg takes 12 hours or so to run.

So I guess what I am trying to say is that sometimes using the *FILE
parameter on the reorg is a useful thing. You just have to have the time to do
it.
Just my .02 worth
Oscar
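The locality effect described above can be sketched abstractly: if reads
arrive in key order and key order matches RRN order, consecutive reads keep
landing on the same page. A toy Python model follows; the page size and the
one-page buffer are illustrative assumptions, not AS/400 internals.

```python
import random

RECORDS_PER_PAGE = 8  # illustrative assumption, not a real page size

def page_faults(rrn_sequence):
    """Count page loads for reads issued in the given RRN order,
    assuming a worst-case one-page buffer."""
    faults, current_page = 0, None
    for rrn in rrn_sequence:
        page = rrn // RECORDS_PER_PAGE
        if page != current_page:
            faults += 1
            current_page = page
    return faults

random.seed(1)
n = 1000
key_order_reads = list(range(n))        # physical order matches key order
scattered_reads = random.sample(range(n), n)  # key order unrelated to RRN
```

With these toy numbers the in-order scan loads each page exactly once, while
the scattered reads reload a page on almost every access.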
Granted that RRN access is quicker than keyed access and granted that
blocked I/O typically performs better than 1 record at a time, the
description of your processing implies that you are doing neither. When you
access a file by its index you essentially lose the main benefit of a
physical sorting of records. It takes the system the same amount of work to
find a record when accessed through an index regardless of the physical
sequence.
Furthermore, just because two records have adjacent RRNs does not imply that
they are "close" together on disk. The system places data where it will, and
you have very little control over this. In fact, the system is "designed" to
scatter data across disk to improve performance, so in actual fact any given
file is virtually guaranteed to be scattered across all drives and in no
semblance of a physical ordering.
I will go even so far as to assert that the term "physical sequence" is an
illusion. It is shorthand for saying that each record in a file has an RRN
and that records can be retrieved in RRN sequence. When a file is
"physically sorted" you are simply saying that a file's RRN sequence of
records happens to match the sequence you would obtain had you used such and
such a keyed sequence. So your assertion that physically sorted files require
less disk access is unlikely to hold.
Furthermore, predictions of performance you make based on this illusion of
physical ordering are dubious. For whatever reason, the OS designers may
decide to completely revamp their strategy and this strategy may alter
performance characteristics drastically. It is even possible that the OS
designers could in a new release optimize their code with the assumption
that records will be accessed in an essentially random manner, and such a
tweak could make a "physically sequenced" file perform less well.
I can only think of two cases when "physically sorted" records can improve
performance. One case relies on a little-remembered parameter on the OVRDBF
command, NBRRCDS; this tells the system, for each record read, how many
additional records to read (in RRN sequence). This parameter only provides
an advantage if records are "physically sorted". The other case is that
database paging will be improved if multiple records fit into a single page
and those records are accessed in "physical sequence". This advantage has
actually improved with the larger page size available with RISC, but it is
essentially meaningless for large records. The database paging advantage of
"physical sorting" is also significantly reduced when varying-length data is
included in a record (especially for CLOB, BLOB, and large VARCHAR fields).
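For a rough feel of that records-per-page point, here is back-of-the-envelope
arithmetic using the two record lengths quoted earlier in the thread (561 and
233 bytes); the 4K and 16K page sizes are illustrative assumptions, and
per-record overhead is ignored.

```python
# Records that fit in one memory page, ignoring per-record overhead
# (a simplifying assumption; real layouts add headers and padding).

def records_per_page(page_bytes, record_bytes):
    return page_bytes // record_bytes

small_page = {rec: records_per_page(4096, rec) for rec in (233, 561)}
large_page = {rec: records_per_page(16384, rec) for rec in (233, 561)}
```

The short 233-byte records pack 17 to a 4K page (70 to a 16K page), so
sequential access saves many loads; the 561-byte records pack only 7, so the
advantage shrinks as records grow.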
I also find it interesting to note that you saved 3-4 hours in your
quarterly run time at the cost of a 12-hour dedicated reorganize. It makes
me wonder if you have actually improved your throughput when you consider
the down time to reorganize the file.
"Oscar" <o...@home.xxx.com> wrote in message
news:38A0DDD7...@home.xxx.com...
Oscar's update <it was a quick read...> effectively suggests actual
benefits that can be achieved from physical ordering of the data to
match an index which is used for keyed retrieval. If your retrieval
is typically via this key, and the data is physically ordered by this
key, then the time/work it takes to "find" a record when accessed via
an index is reduced by reductions in database paging. This is a
valid use for this option via KEYFILE on RGZPFM.
Regards, Chuck
All comments provided "as is" with no warranties of any kind whatsoever.
It depends on the sequence you retrieve records from the database. A random
"seek" or "chain" to a logical requires two steps. The first is to find the
key in the index, the second step is to go and get the physical record.
Obviously, if the physical record is already in the buffer, then the second
step will be quicker.
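Those two steps can be sketched as an index probe followed by a buffered
fetch; all the names below are illustrative Python, not database internals,
and the one-page buffer is deliberately tiny to make the effect visible.

```python
import bisect

PAGE_SIZE = 4  # records per page; deliberately tiny for illustration

class KeyedFile:
    """Step 1: binary-search a (key -> RRN) index. Step 2: fetch the
    physical record, counting a page load whenever the buffer misses."""

    def __init__(self, records, key_field):
        self.records = records
        self.index = sorted((r[key_field], rrn) for rrn, r in enumerate(records))
        self.keys = [k for k, _ in self.index]
        self.buffered_page = None
        self.page_loads = 0

    def chain(self, key):
        i = bisect.bisect_left(self.keys, key)        # step 1: find the key
        if i == len(self.keys) or self.keys[i] != key:
            return None
        rrn = self.index[i][1]                        # step 2: get the record
        page = rrn // PAGE_SIZE
        if page != self.buffered_page:                # miss: load the page
            self.buffered_page = page
            self.page_loads += 1
        return self.records[rrn]

# Same keys read in key order: once physically sorted, once scattered.
in_order = KeyedFile([{"k": "%03d" % i} for i in range(8)], "k")
shuffled = KeyedFile([{"k": k} for k in
                      ["007", "003", "005", "001", "006", "000", "002", "004"]], "k")
for i in range(8):
    in_order.chain("%03d" % i)
    shuffled.chain("%03d" % i)
```

With these toy numbers the sorted file loads each of its two pages once,
while the scattered file reloads a page on every chain.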
>
> Furthermore, just because two records have adjacent RRNs does not imply that
> they are "close" together on disk. The system places data where it will, and
> you have very little control over this. In fact, the system is "designed" to
> scatter data across disk to improve performance, so in actual fact any given
> file is virtually guaranteed to be scattered across all drives and in no
> semblance of a physical ordering.
The allocations are large, otherwise OS/400 would spend an awful lot of time
housekeeping and keeping track of numerous "chunks" of objects.
> I will go even so far as to assert that the term "physical sequence" is an
> illusion. It is shorthand for saying that each record in a file has an RRN
> and that records can be retrieved in RRN sequence. When a file is
> "physically sorted" you are simply saying that a file's RRN sequence of
> records happens to match the sequence you would obtain had you used such and
> such a keyed sequence. So your assertion that physically sorted files require
> less disk access is unlikely to hold.
This is an old S/36 trick and it is perfectly valid on the AS/400. If you
read a file in a keyed sequence and the physical file is (perhaps mostly) in
that order, then the chances are that the next record you read will be in
the buffer, saving a lot of random I/O.
> Furthermore, predictions of performance you make based on this illusion of
> physical ordering are dubious. For whatever reason, the OS designers may
> decide to completely revamp their strategy and this strategy may alter
> performance characteristics drastically. It is even possible that the OS
> designers could in a new release optimize their code with the assumption
> that records will be accessed in an essentially random manner, and such a
> tweak could make a "physically sequenced" file perform less well.
Random I/O is the main performance problem of a relational database. Other
database models, such as hierarchical and network, store related records from
different tables together in the same area. This is the performance price
you pay for the flexibility of a relational database: you can join any table
to any other table --- this is very difficult with other database models.
The only development that I can think of that would make physical reordering
obsolete would be some kind of storage device that can do random I/O as
quickly as sequential I/O (solid-state disks, perhaps?), and I don't think
this will happen for a long time.
Please also bear in mind, that the disk controllers and disk units
themselves have read-ahead buffers. With a lot of AS/400 disk units, if you
request data from an area of the disk, the disk unit/controller will read in
the whole track or a good 64k chunk at the same time.
>
> I can only think of two cases when "physically sorted" records can improve
> performance. One case relies on a little-remembered parameter on the OVRDBF
> command, NBRRCDS; this tells the system, for each record read, how many
> additional records to read (in RRN sequence). This parameter only provides
> an advantage if records are "physically sorted". The other case is that
> database paging will be improved if multiple records fit into a single page
> and those records are accessed in "physical sequence". This advantage has
> actually improved with the larger page size available with RISC, but it is
> essentially meaningless for large records. The database paging advantage of
> "physical sorting" is also significantly reduced when varying-length data
> is included in a record (especially for CLOB, BLOB, and large VARCHAR
> fields).
>
> I also find it interesting to note that you saved 3-4 hours in your
> quarterly run time at the cost of a 12-hour dedicated reorganize. It makes
> me wonder if you have actually improved your throughput when you consider
> the down time to reorganize the file.
This is where you have to be careful. But I have seen cases where the end of
period reports and updates need to be run as quickly as possible as they
require exclusive access. In preparation for these jobs, you can sort tables
over the preceding days/weeks during spare overnight time (perhaps one
table per day). When you come to run the jobs, most of the records will be
in physical sequence.
The actual data records can be ordered physically the same as an
existing key/index. The key can be either one defined on the physical
file or one from a <not select/omit or join> logical file which specifies
the reorganized member as its based-on file. The decision to order the
data by a key depends on how the files are typically accessed by the
programs. If the physical file is neither keyed nor has keyed logical
files which reference it, this parameter has no meaning.
Important information missing from the already provided details is the
typical access method of the files and the physical/logical key info.
There is often a big benefit in removing all logical key maintenance
for the reorganize request, and rebuilding keys in parallel after the
reorg request completes; so investigate the logical file keys.
Given that both files have as much data deleted as existing, there is
also the possibility that copying the data to a new file <choosing to
not include deleted records> and then building the logicals over the
new file is faster -- again, in parallel. Given sufficient storage,
this also allows for immediate 'recovery' from a situation where the
reorganize request does not complete in the allotted time.
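A sketch of that copy-then-rebuild approach, with Python dicts standing in for
access paths and a thread pool standing in for the parallel index builds (all
names are invented; this is the shape of the idea, not CPYF/DB2 behavior):

```python
# Copy live records to a new file (skipping deleted slots), then
# rebuild each "logical" (index) over the new data in parallel.
from concurrent.futures import ThreadPoolExecutor

def copy_compress(records):
    """The copy step: keep only live records (None marks a deleted slot)."""
    return [r for r in records if r is not None]

def build_index(records, field):
    """The rebuild step: map each key value to its RRNs in the new file."""
    idx = {}
    for rrn, r in enumerate(records):
        idx.setdefault(r[field], []).append(rrn)
    return idx

old = [{"acct": "A", "ssn": "1"}, None, {"acct": "B", "ssn": "2"}, None]
new = copy_compress(old)

# Rebuild the two "logical files" concurrently, one task per index.
with ThreadPoolExecutor() as pool:
    by_acct, by_ssn = pool.map(lambda f: build_index(new, f), ["acct", "ssn"])
```

Since the old file is untouched until the copy succeeds, a run that overruns
the window can simply be abandoned, which is the 'recovery' point above.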
Additionally there is a user-tool RCLSPACE which may be suited to
your files. You can read about the command in source member TGPARCSI
in file QUSRTOOL/QATTINFO.
I can't say I know the particulars of the AS/400's internal workings; you may
indeed be correct in your response. I could be full of hot air.
I do know that 3-4 hours cut off of 6 batch jobs is 18-24 hours of processing
time saved. The 12 hours it takes to run the weekend before is meaningless,
since it's on a weekend and no one is using the database. The following
weekend, though, all the processing had better be done by Monday morning, or
else. This wasn't happening. I remember reading somewhere the technique I
described, and it worked well for me. So I assumed that the technique I
described was valid and shared it with this group.
Oscar
I also just remembered a system where we had to read a record, take the VIN
for that record, SETLL and READE through an event file, get all the events for
this customer, and do some date math and other logic to see if it was
eligible. Daily volume was around 20,000 records. The process was taking 12
hours to run. There were 30,000,000+ records in the event file, keyed by VIN,
Event Type, and Event Sub Type (key length of 23 bytes, total record length of
43 bytes), and it had never been reorged. We bit the bullet one weekend and
reorged it. It took about 4 days to reorg, and our customer wasn't very happy,
but when it was finished, the daily jobs ran in about 2 hours. After this, we
scheduled a reorg once a month for this file. I also must note that this job
was on our old AS/400, a model 510 I believe, so given today's faster machines
these numbers would probably go down. So again, what I am trying to say is
that my theory seemed plausible to me based on my experiences. Whether or not
the actual techniques I described to achieve the results are correct is
another story.
Oscar
> I basically disagree with your assessment in almost every particular.
>
> Granted that RRN access is quicker than keyed access and granted that
> blocked I/O typically performs better than 1 record at a time, the
> description of your processing implies that you are doing neither. When you
> access a file by its index you essentially lose the main benefit of a
> physical sorting of records. It takes the system the same amount of work to
> find a record when accessed through an index regardless of the physical
> sequence.
Not necessarily. If the system has just read a page containing the next 10
records to be processed, chances are that it won't be paged out before you've
finished processing them ...
> Furthermore just because two records have adjacent RRNs does not imply that
> they are "close" together on disk. The system places data where it will, and
> you have very little control over this. In fact, the system is "designed" to
> scatter data across disk to improve performance, so in actual fact any given
> file is virtually guaranteed to be scattered across all drives and in no
> semblance of a physical ordering.
Adjacent RRNs don't guarantee that the records are physically adjacent nor on
the same device, agreed, but the chances are high. It's a numbers game. I
doubt that the system fragments data to the extent you're implying. I tend to
think of it as emphasising *distributing* it rather than *fragmenting* it.
> I will go even so far as to assert that the term "physical sequence" is an
> illusion. It is shorthand for saying that each record in a file has an RRN
> and that records can be retrieved in RRN sequence. When a file is
> "physically sorted" you are simply saying that a file's RRN sequence of
> records happens to match the sequence you would obtain had you used such and
> such a keyed sequence. So your assertion that physically sorted files require
> less disk access is unlikely to hold.
Well, illusion or not, it seems to work in practice :-)
[snip]
> I also find it interesting to note that you saved 3-4 hours in your
> quarterly run time at the cost of a 12-hour dedicated reorganize. It makes
> me wonder if you have actually improved your throughput when you consider
> the down time to reorganize the file.
If he runs the 12-hour reorganise at some "dead" time like overnight on a
weekend, the cost is probably "zero lost production hours". OTOH, the 3-4 hours
saved could be 3-4 hours that interactive users can process transactions in the
morning following the batch process ... likely to be a *good* trade-off.
-Ian.
--
Ian Stewart
i...@incognito.co.nz
I note that you mentioned using READE loops. There's a *potentially* dramatic
improvement in performance available by specifying BLOCK(*YES) on your file in
RPG IV. For this to work, you have to avoid READE (replace it with READ, and
test manually to see if you've gone past the record you want), and the file
would have to be input only.
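The substitution can be sketched like this, with Python list scans standing in
for the RPG ops; the early exit is the manual "gone past the key" test that
replaces READE's built-in compare (illustrative only):

```python
# READE-style: the database compares the key on every read, which
# disables record blocking.
def reade_style(records, key):
    return [r for r in records if r[0] == key]

# READ-with-manual-test: read sequentially (blockable) and stop once
# past the matching group; assumes the file is in key order.
def read_with_manual_test(records, key):
    matches = []
    for rec in records:
        if rec[0] == key:
            matches.append(rec)
        elif rec[0] > key:   # gone past the wanted key: stop reading
            break
    return matches

# (key, payload) pairs, sorted by key as the technique requires.
data = [("111", "a"), ("111", "b"), ("222", "c"), ("333", "d")]
```

Both loops return the same records; only the second one lets the runtime
fetch records in blocks.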
BTW, this would be independent of any gains provided by the physical sequence, I
presume you would get compound benefits by combining the two techniques.
YMMV warning: I would expect that the improvement would depend on what
percentage of the records you processed from each block. Conceptually, I'd
expect it could do worse than not blocking if you're only processing a handful
of records for each reposition operation ... but it might be worth a trial if
you have a really long-running job. I got one report to run in literally a
fraction of the time (we're talking *dramatic* improvement here!). However, it
read significant blocks of records between each reposition, so it was probably
the optimal candidate for this treatment ... it's probably a case of
experimenting to see if you can improve any particular process this way.
Yes, I have read about that. I haven't used it in practice, though I remember
reading something about the "C" function that READE uses versus READ: when you
use READ, blocking works, and when you use READE, it doesn't. As I said, I
read it somewhere but don't remember where. For that matter, I can't remember
where I learned the technique I described in my first post.
The system I mentioned began to run fine within the allotted time after I made
the improvements I explained. (I have from 8:00 PM Friday until 11:30 PM
Sunday.) Now we have a faster production machine: at the end of last quarter,
the operator fired it up around 8:30 PM and it was complete by 11:30 AM. This
was the first run on our new box. I believe it's a 720 model with dual
processors. (I really don't know much about all the different model numbers,
etc.) Wow, what a difference!
Thanks for reminding me about the BLOCK(*YES) feature though. I had forgotten about
that. I'll try it when I get a chance.
Oscar