
Unisys DMSII - AreasInUse attribute


Domagoj

Mar 19, 2009, 8:35:12 AM
Hello all,

In a DMSII database we have a dataset whose AreasInUse attribute reached
around 920 last week. As we didn't need the older records, we deleted them
and initiated garbage collection on the sets. Now the sets, which previously
had around the same AreasInUse value as the dataset, have only about half of
that value - just as we expected.

But for the dataset, the AreasInUse attribute has continued to grow and is
now 945, although I expected it to stop growing until all of the previously
occupied space had been reused.

So I have two questions I can't answer myself:
1. Why is it still growing?
2. What will happen when it reaches the limit of 1000 areas? (This should be
fairly obvious. :)

I would like to hear opinions and suggestions. Reorganization is not
planned for another ten days, but that might be too late.

Thanks,
Domagoj

LeifJ

Mar 20, 2009, 8:09:12 AM
Do a GC on the dataset as well.
New records can occupy the place of deleted records if the new records
have the same size; otherwise record spaces are 'added'.
Obviously you haven't got POPULATIONINCR & POPULATIONWARN set, because
then you would get warnings that the limits are being reached.
/LeifJ

Paul Kimpel

Mar 20, 2009, 9:42:06 AM

Assuming that the data set you mentioned is Standard (fixed format), I
suspect you will find that the number of areas has stopped growing.
DMSII keeps track of deleted record space in "DKTABLE" blocks, which are
usually stored at the end of the file. Because you deleted a lot of
records, the structure needed a lot of DKTABLE space, which in turn
probably required all of those additional rows.

As Leif mentioned, newly inserted records will use up the space of
deleted records before DMSII expands the number of rows in the file.
He's also correct that a full garbage collect on the data set should
reduce the number of rows. It will also get rid of all of the DKTABLE
blocks, at least until you start deleting records again.

If the data set is Standard (variable format), then it's possible that
you are inserting records of a different type than the ones you deleted.
Once record space for a certain record type is allocated, it will not be
used for another record type.

If the data set is Compact, it's possible that the records you are
adding are all too large for the available space in any block. DMSII
consolidates available space in a Compact block each time a record in
the block is updated. If you deleted about half your old records, you
would normally expect that the Compact blocks would now be about
half-full on average, so this scenario seems unlikely.

A full garbage collection on the data set will also resolve both of
these situations.

In any case, when your table reaches the limit of 1000 rows (or 1024, I
forget which it is), your applications will probably start receiving
LIMITERROR exceptions whenever they attempt to insert a new record into
that data set.

--
Paul

Domagoj

Mar 21, 2009, 7:18:26 AM
On Fri, 20 Mar 2009 06:42:06 -0700, Paul Kimpel wrote:

> On 3/19/2009 5:35 AM, Domagoj wrote:
>> Hello all,
>>
>> In a DMSII database we have a dataset whose AreasInUse attribute reached
>> around 920 last week. As we didn't need older records, we deleted them and

>> ...

>> So I have two questions I can't answer myself:
>> 1. Why is it still growing.
>> 2. What will happen when it reaches a limit of 1000 areas (this should be
>> fairly obvious :).
>

Thank you and LeifJ for the answers. The dataset is standard, fixed format,
and it has stopped growing during the last day or two. I wasn't aware of the
DKTABLE blocks which keep track of deleted record space. A GC on the dataset
will be done when possible.

BR,
Domagoj

Andy Mountford

Mar 23, 2009, 9:36:40 AM
On Sat, 21 Mar 2009 12:18:26 +0100, Domagoj <si.e...@domagoj.inverse>
wrote:

Hi Domagoj,

There's no particular need to garbage collect just to get rid of a
DKTABLE. Doing so won't do much else but give you back some disk
space. Whenever you add records to a standard dataset, any
entries in the DKTABLE are reused in preference to writing a "new"
record.

It is entirely possible to have a standard dataset with 1000 areas
allocated and for that dataset to be logically empty because *all*
records are deleted. You can store just as many records in that
dataset as you could if you garbage collected it and threw away the
DKTABLE. The one disadvantage of a dataset in that state is you no
longer have the "quick check" on the number of areas to warn you of
the structure getting full...

Regards,

Andy

Alin

Mar 23, 2009, 5:36:40 PM
On Mar 23, 6:36 am, Andy Mountford
<andrew.mountf...@minusthis.gb.unisys.com> wrote:
> On Sat, 21 Mar 2009 12:18:26 +0100, Domagoj <si.em...@domagoj.inverse>

Also you slow down reads by having "empty" records in each block.

Regards - Alin

Andy Mountford

Mar 24, 2009, 7:55:59 AM
On Mon, 23 Mar 2009 14:36:40 -0700 (PDT), Alin <alin...@gmail.com>
wrote:

Well....

That becomes complicated. Nothing is really slowed down as long as you
need only the same number of physical IOs to get to the record you want.
Granted, if the structure is so sparsely populated that you're ploughing
through blocks of deleted records during sequential access, that's not
optimal. However, in these days of metered machines, "curing" that (by a
garbage collect) might be much more expensive than letting it "cure
itself" (by reuse of DKTABLE entries).

Andy

Domagoj

Mar 26, 2009, 1:41:00 PM
On Mon, 23 Mar 2009 13:36:40 +0000, Andy Mountford wrote:


> Hi Domagoj,
>
> There's no particular need to garbage collect just to get rid of a
> DKTABLE. Doing so won't do much else but give you back some disk
> space back. Whenever you add records to a standard dataset, any
> entries in the DKTABLE are reused in preference to writing a "new"
> record.
>
> It is entirely possible to have a standard dataset with 1000 areas
> allocated and for that dataset to be logically empty because *all*
> records are deleted. You can store just as many records in that
> dataset as you could if you garbage collected it and threw away the
> DKTABLE. The one disadvantage of a dataset in that state is you no
> longer have the "quick check" on the number of areas to warn you of
> the structure getting full...
>
> Regards,
>
> Andy

I understand that, but even just for those quick checks I think it's worth
doing a GC on the dataset.

Another interesting question: if I set the POPULATION parameter of a dataset
to MAX_VALUE, does it affect anything other than that I will never get a
LIMITERROR?

Thanks.

Paul Kimpel

Mar 27, 2009, 12:02:24 AM

Another consideration is the amount of time that is required to back up
the data set. All of the deleted record space, plus the DKTABLE, will be
copied when you back up the data set. If the file is large, that can
take a significant amount of time. In addition, all of the checksumming
that takes place in DMUTILITY can be a significant consumer of metered
resources. You might find that any resources you spend doing the GC can
be reclaimed by shortening the backup process.

--
Paul

Paul Kimpel

Mar 27, 2009, 12:19:03 AM

You'll still get a LIMITERROR eventually. POPULATION does not directly
control the maximum number of records in a data set. It may be used in
calculating the physical attributes of a file. For example, I usually
specify POPULATION and AREASIZE, allowing DASDL to compute the
appropriate number of areas. If you specify AREAS and AREASIZE,
POPULATION is ignored for the data set (although it may still have an
impact on the sizing of the index structures for that data set).

It is really AREAS*AREASIZE that determines when you get a LIMITERROR --
when you've exceeded the specified file size, you can't add any more
records.

When you specify POPULATIONINCR, that simply defers the LIMITERROR by
automatically increasing the number of areas in the file. Once you hit
the maximum of 1000 areas, however, you will get a LIMITERROR regardless
of what was specified for POPULATION in DASDL.
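
As a back-of-the-envelope sketch in Python (the figures are made up, and
DKTABLE/system-area overhead is ignored, so treat it only as an
approximation of the arithmetic above):

AREA_LIMIT = 1000   # the per-file limit discussed in this thread

def declared_capacity(areas, areasize):
    # approximate record capacity as the file is currently declared
    return areas * areasize

def populationincr_ceiling(areasize):
    # best case if POPULATIONINCR keeps adding areas up to the limit
    return AREA_LIMIT * areasize

# hypothetical figures, not from any real DASDL
areas, areasize = 945, 30000
print("declared capacity:", declared_capacity(areas, areasize))  # 28,350,000
print("absolute ceiling :", populationincr_ceiling(areasize))    # 30,000,000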

The situation is a little more complex with sectioned data sets, since
each section is represented by a file, which in turn is limited to 1000
areas. Once you fill all of the areas in all of the sections, however,
you'll still get a LIMITERROR.

Andy makes a good point, but my sense is that you've deleted about half
the records from a large data set, so without any more quantitative
information to go on, I would agree with you that the GC is worthwhile.

--
Paul

Andy Mountford

Mar 27, 2009, 6:02:35 AM

On Thu, 26 Mar 2009 21:02:24 -0700, Paul Kimpel <paul....@digm.com>
wrote:

Paul,

It's funny you should mention that. We're in the middle of a discussion on
that very point: what is the "payback time" for the MIPS expended on a GC
vs. the reduced cost of dumping the structure?

That can be quite hard to answer, especially when testing any theory costs
you MIPS! There are downsides to metered machines :-(

Andy

Paul Kimpel

Mar 28, 2009, 1:03:41 AM

I agree that question can be difficult to answer, mostly because there
are so many variables that would need to be specified. To me the major
considerations would be:

* What is the magnitude of the file size? Small files probably are
not worth it; larger files might be.

* What percentage of that file size is currently occupied by deleted
records or DKTABLE blocks? If the file is mostly full, it's not worth
it; if the file is mostly empty, it probably is worth it.

* How soon do you anticipate consuming some significant portion of
the deleted record space with new records? If this is a permanent or
long-term reduction in the number of records, a GC is probably worth the
cost; if you are going to repopulate the deleted space fairly soon, it's
probably not worth it.

* How many index structures are there for the data set? Reorganizing
the data set itself is hardly more work than backing it up (assuming the
data will not be sorted). Most of the cost will probably be in fixing up
the index structures with new record addresses.

* What kind of a reorganization run can you afford to do? Off-line
reorgs are generally much more efficient than on-line reorgs.

Another thing to consider is that at some point the cost of having
system administrators sitting around worrying about the cost of MIPS
will exceed the cost of those MIPS.

--
Paul

Andy Mountford

Mar 30, 2009, 8:31:10 AM

On Fri, 27 Mar 2009 22:03:41 -0700, Paul Kimpel <paul....@digm.com>
wrote:

I agree with all your considerations and I'd add one more:

Is the whole dataset read sequentially as part of daily batch
processing? If so, that's an additional weight on the reorg side of
the scales.

As to the kind of reorg: if the structure is extended, then REORGDB can be
used, giving you a winning combination of OFFLINE reorg speed and even
better availability than the original online reorg.

Andy

Domagoj

Apr 1, 2009, 9:28:15 AM
On Thu, 26 Mar 2009 21:19:03 -0700, Paul Kimpel wrote:

> It is really AREAS*AREASIZE that determines when you get a LIMITERROR --
> when you've exceeded the specified file size, you can't add any more
> records.

As AREASIZE (by default) determines the number of records in an AREA, the
maximum number of records = 1000 * AREASIZE, while the current number of
records in a dataset = AREASINUSE * AREASIZE.

POPULATION doesn't define the maximum population. If I define only
POPULATION = 10M, the system will choose the AREAS and AREASIZE parameters,
and they could initially be set to 750 and 30000 respectively, meaning there
is room for 750 * 30000 records at the start - is that right?

A record is an entry in a dataset containing exactly one value (unless null)
for every field, so a record is what is known as a row in (let's say)
Oracle? Is there any other interpretation of a record?

Paul Kimpel

Apr 1, 2009, 11:54:18 AM

In the following, I am going to assume we are talking only about
fixed-length records in DMSII data sets, since that is the most common
case. You can declare a data set to have physically variable-length
records, in which case there is not a direct relationship between
AREASIZE (as specified in records) and the number of records in an area.

AREASIZE, if specified in DASDL, absolutely determines the number of
records in an area of a file. You are correct, though, that the maximum
number of records in a DMSII file is 1000*AREASIZE, since DMSII
currently supports a maximum of only 1000 areas/file. Note, however,
that "DMSII file" and "DMSII data set" are not necessarily synonymous,
since a data set can be "sectioned" and thus consist of multiple
physical files.

The current number of records in a file for a data set is usually less
than AREASINUSE*AREASIZE, since (a) the last area may not be full, (b)
some records in allocated [in-use] areas may have been deleted, and (c)
DMSII uses some space at the end of the file to hold the DKTABLE, which
maps deleted record space within the file. It's possible that the size
of the DKTABLE could require additional rows of the file to be
allocated, but those rows would not necessarily contain data records.

You are correct that POPULATION does not define the maximum number of
records. It is used as a factor to calculate physical file attributes if
those attributes are not specified explicitly. If you choose a
POPULATION of 10M (assuming a non-sectioned data set), DMSII would
probably not choose AREAS=750 and AREASIZE=30000, since the product of
those is 22.5M. A more likely result might be AREAS=335 and
AREASIZE=30000. DMSII normally rounds up the number of areas and then
adds a few more for potential DKTABLE space. A lot depends on the
BLOCKSIZE, since areas are really constructed of multiple whole blocks,
not records (i.e., AREASIZE will be a multiple of BLOCKSIZE).
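
In rough Python terms (the DKTABLE allowance below is only a guessed
placeholder; the real figure, and the rounding to whole blocks, are
computed internally by DASDL):

import math

def estimate_areas(population, areasize, dktable_allowance=2):
    # areasize is assumed to already be a whole number of blocks;
    # dktable_allowance is a stand-in for DASDL's own calculation
    data_areas = math.ceil(population / areasize)
    return data_areas + dktable_allowance

# POPULATION = 10M with AREASIZE = 30000 records:
print(estimate_areas(10000000, 30000))  # 336 -- in the neighborhood of 335, nowhere near 750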

Both blocks and areas will always represent some pre-determined number
of 180-byte logical sectors on the physical disk unit. Blocks are the
unit of physical data transfer for a file between the disk and memory.
Areas are the unit of contiguous space allocation for a file on the disk.

"Record" is a file concept and "row" is a relational table concept, but
for the purpose of this discussion the two are essentially equivalent.
The real difference comes in how the two are accessed. In DMSII, you
retrieve whole records, while with a relational data base you retrieve a
projection of columns from a query.

Note, however, that DMSII was designed (for better or for worse) from
the network/hierarchical data model, not the relational model, so
conceptually a DMSII record can contain an embedded data set. Physically
it doesn't work like that, of course, but the idea that records can
contain non-atomic fields is (for better or for worse) something that is
foreign to the relational model.

DMSII records can also contain arrays and variant sub-records, so while
you can say that a relational row is conceptually similar to a DMSII
record, you can't always say that a DMSII record is conceptually similar
to a relational row.

--
Paul

Domagoj

Apr 2, 2009, 5:52:54 AM

Thanks, this is helpful.

Domagoj

Apr 2, 2009, 8:08:19 AM
On Mon, 23 Mar 2009 13:36:40 +0000, Andy Mountford wrote:

Andy,

There is an update procedure described in 'Enterprise Database Server ...
DASDL Programming RM', section 13. It says a population change can be
achieved by compiling the DASDL source and the DMSUPPORT library and
updating the CONTROL file.

I would like to know if there are arguments against doing this online, while
programs run against the database.

This is not for a specific problem, I'm just interested.

Thanks.

Andy Mountford

Apr 3, 2009, 8:41:24 AM
On Thu, 2 Apr 2009 14:08:19 +0200, Domagoj <si.e...@domagoj.inverse>
wrote:

<snip>

>
>Andy,
>
>There is an update procedure described in 'Enterprise Database Server ...
>DASDL Programming RM', section 13. It says population change can be
>achieved by compiling DASDL source, DMSUPPORT library and updating CONTROL
>file.
>
>I would like to know if there are arguments not to do this online, while
>programs run against a database.
>
>This is not for a specific problem, I'm just interested.
>
>Thanks.

Domagoj,

Assuming the structure is EXTENDED, I can think of no reason why you
couldn't do that using REORGDB.

I think you'd probably have to GENERATE the dataset so it would become
a combined {population change/garbage collect}. (Of course, if you
changed the AREASIZE then you'd have to GENERATE the structure
anyway.)

Andy

sbr...@uspto.gov

Apr 7, 2009, 11:14:36 AM
> In a DMSII database we have a dataset whose AreasInUse attribute reached
> around 920 last week.
> Domagoj

Stop right there!! Metered system or not, as a matter of policy in
most cases no DMSII structure should be permitted to have more than
900 areas in use for any significant period of time. The only good
way to eliminate those infernal limiterrors is to immediately reorg
any structure that approaches the red zone; that is, 900 areas.

Also, note that for standard data sets the practical limit is neither
1,000 nor 1,024 as some have stated above. Note this passage in the
DASDL manual: "For standard data sets, the DASDL compiler allocates
system areas for available space tables. Enough available space is
allocated to avoid run-time limit errors even if all records in the
data set are deleted. A syntax error is returned if the combination of
system and user areas exceeds 1000 areas."

Our experience has been that the Accessroutines--apparently because it
never wishes to run out of areas on a delete, only on a store--enforces two
separate limits for each structure, one for system areas and one for data
areas. So if the DASDL compiler has computed a need for 100 system areas
for a structure, then any attempt to have the system areas exceed 100 - OR -
the data areas exceed 900 will result in a limiterror. In the limiterrors
that this site has experienced, not one has been with the areas in use at or
very near 1,000. (If anyone can inform me how to programmatically determine
the quantity of system areas NOT yet allocated, I would really appreciate
it.)

Domagoj

Apr 7, 2009, 12:30:08 PM

Thanks for joining the discussion. We monitor the AREASINUSE parameter of
all datasets and, when it's somewhere around 850 - 900, decide whether to
clean up or increase the dataset's population parameter. Sometimes, though,
an unusual amount of data enters the datasets, so the value can get over 900
(which we also consider the red zone).

I can't tell exactly when a limit error usually occurs, but we usually know
that it will happen if we don't react. I have, however, seen datasets with
AREASINUSE over 950 running without problems. Anyway, your remark is worth
considering.
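
The check itself is nothing more elaborate than something like this
(a Python sketch; the dataset names and values are invented):

WARN, RED = 850, 900   # our thresholds

def check_areas(datasets):
    # datasets: mapping of dataset name -> current AREASINUSE value
    for name, in_use in sorted(datasets.items()):
        if in_use >= RED:
            print(name, in_use, "areas in use - red zone, act now")
        elif in_use >= WARN:
            print(name, in_use, "areas in use - plan a cleanup or a population increase")

check_areas({"DS-ORDERS": 945, "DS-HISTORY": 872, "DS-SMALL": 310})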

Andy Mountford

Apr 8, 2009, 6:12:43 AM
On Tue, 7 Apr 2009 08:14:36 -0700 (PDT), sbr...@uspto.gov wrote:

I agree that ACR's "handling of areas" is not well documented. If
we're considering a standard dataset then it is true to say that the
sum of the user areas and the system areas cannot exceed 1000.

It is a requirement that the entire population of the structure can be
deleted and the dataset stay intact, so the user areas cannot exceed a
certain limit.

That limit is calculated by the following formula:

((29000 * segments_per_area) - 57)
DIV
((segments_per_area * 29) + (area_size_in_blocks * records_per_block))
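
Or, as a small Python function (integer division, which is what DIV means
here; the parameter names are taken straight from the formula):

def max_user_areas(segments_per_area, area_size_in_blocks, records_per_block):
    # maximum number of user (data) areas in a standard dataset, leaving
    # enough system areas for the case where every record is deleted
    return (29000 * segments_per_area - 57) // (
        segments_per_area * 29 + area_size_in_blocks * records_per_block)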

The DASDL compiler can give you all the relevant values (or, for those
old-timers who remember it, DBANALYZER). DBAtools used not to be able
to give you the info but things may have moved on.

(I have supplied several sites with 'modern' versions of DBANALYZER if
that would be of any assistance...)

Andy

Vern

Apr 9, 2009, 11:04:24 AM

Does this make y'all nervous?


LFILE *TUPHIS/UBHSX/UBHSEDLG ON DBASE1 : AREAS AREASIZE AREASINUSE LASTRECORD
#RUNNING 7963
#?
ON DBASE1
*TUPHIS : DIRECTORY
. UBHSX : DIRECTORY
. . UBHSEDLG : DBDATA AREALENGTH=24000 (800 RECORDS) AREAS=1000
               AREASINUSE=998 LASTRECORD=797624 (798400 SECTORS)

For the past 3 months LASTRECORD and AREASINUSE have stayed constant.
The users have purged a lot of entries, so I assume there are a lot of
DKTABLE entries.

Another case here that hasn't changed either:


LFILE *TUPPRD/UDGRX ON DBASE1 : AREAS AREASIZE AREASINUSE LASTRECORD
#RUNNING 8412
#?
ON DBASE1
*TUPPRD : DIRECTORY
. UDGRX : DIRECTORY
. . DATA : DBDATA (IN USE) AREALENGTH=35910 (1197 RECORDS) AREAS=1000
           AREASINUSE=969 LASTRECORD=1158709 (1159893 SECTORS)

I'm rolling the dice, as we are a manufacturing company with 7x24 plant
operations and it takes an act of GOD to get down time. Users expect
100% uptime on our Libra.

They sure don't get it on the Winders and Unix platforms!

I get the operators to open an incident whenever a new area is allocated,
and I do weekly reports on areas in use and keep a watchlist in a
spreadsheet.

- Vern

nicolas.oc...@gmail.com

Apr 15, 2009, 1:07:40 AM
Hello, everyone.

I need help.

Where can I find documentation for analyzing the population of a DATASET?

Thanks

Domagoj

Apr 15, 2009, 9:53:25 AM
On Wed, 08 Apr 2009 11:12:43 +0100, Andy Mountford wrote:

> I agree that ACR's "handling of areas" is not well documented. If
> we're considering a standard dataset then it is true to say that the
> sum of the user areas and the system areas cannot exceed 1000.
>
> It is a requirement that the entire population of the structure can be
> deleted and the dataset stay intact, so the user areas cannot exceed a
> certain limit.
>
> That limit is calcuated by the following formula
>
> ((29000 * segments_per_area) - 57)
> DIV
> ((segments_per_area * 29) + (area_size_in_blocks * records_per_block))
>
> The DASDL compiler can give you all the relevant values (or, for those
> old-timers who remember it, DBANALYZER). DBAtools used not to be able
> to give you the info but things may have moved on.
>
> (I have supplied several sites with 'modern' versions of DBANALYZER if
> that would be of any assistance...)
>
> Andy

When choosing the size of an area (records per area), what would one need
to consider? How does this choice affect performance (if at all)?

Regarding the POPULATION/AREASIZE parameters, how do they affect
performance? If I know a dataset will just keep growing and I don't want to
limit it, can I set some really big numbers so that I won't have to keep
checking it over the next year or more?

SteveT

Apr 15, 2009, 2:01:38 PM
> <nicolas.oc...@gmail.com> wrote in message
> news:ea0e2313-2ca1-46b3...@k2g2000yql.googlegroups.com...
> Hello, everyone.
>
> I need help.
>
> Where can I find documentation for analyzing the population of a DATASET?
>
> Thanks

(For those of you unfamiliar with Spanish, Nicolas is asking for help in
finding documentation regarding analyzing the population of a Dataset.)

Good morning, Nicolas,

There is a web site where you can find documentation on Unisys products:
http://epas1.rsvl.unisys.com/common/epa/home.aspx. I regret that I cannot
tell you specifically which document has the information you want.
Hopefully someone more knowledgeable than I will come along to help you
further.

Good luck!

Translation services provided by: WorldLingo
(http://www2.worldlingo.com/microsoft/computer_translation.html).


Paul Kimpel

Apr 15, 2009, 8:47:14 PM

The answer depends quite a bit on the type of population analysis that
is desired. The simplest analysis is to determine the number of
non-deleted records in a data set, and that is what I will reply to here.

In the absence of any other tools (such as dbaTools, OLE DB, or DMSQL)
the easiest way to determine the current population of a data set is to
write a small COBOL or Algol program to read through the records and
count them.

If you have an OLE DB interface configured for the DMSII data base, you
can use Microsoft SQL Server (via a linked server to the MCP OLE DB
provider) to count the records using a SQL "select count(*) from..."
statement.

Similarly, if you have the relatively new DMSQL interface configured for
the DMSII data base, you can use DMSQL to do a "select count(*) from..."
statement.
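
From a client script, the linked-server route might look something like the
sketch below. It is only an illustration: it assumes a SQL Server instance
that already has a linked server (arbitrarily named MCPLINK here) configured
against the MCP OLE DB provider, and a data set arbitrarily named CUSTOMER.

import pyodbc

# connect to the SQL Server instance that hosts the linked server
conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=sqlhost;Trusted_Connection=yes")

# OPENQUERY passes the count(*) through to the linked server (the MCP
# OLE DB provider) and returns a one-row, one-column result
row = conn.execute(
    "SELECT * FROM OPENQUERY(MCPLINK, 'SELECT COUNT(*) FROM CUSTOMER')"
).fetchone()
print("current population:", row[0])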

Deeper analysis of data set population requires knowing the type of data
set and something about how DMSII data sets are structured internally.
Appendix C of the DASDL manual describes DMSII file structures in some
detail. The latest version of that manual can be found here:

http://public.support.unisys.com/aseries/docs/ClearPath-MCP-12.0/PDF/86000213-411.pdf

Unisys sells a database monitoring and analysis package named "dbaTools"
which has extensive features for analyzing DMSII data sets and other
structures. It is not inexpensive, but arguably is the best tool for
this sort of analysis.

If this information does not answer your question, please post a more
specific question to the newsgroup and we will try to help you.

--
Paul

Paul Kimpel

Apr 15, 2009, 10:11:28 PM

I think there are three main things about file areas that you need to
keep in mind when determining the trade-off between areas and areasize,
at least in the context of DMSII data sets.

1. Regardless of what you specify for an areasize, areas for DMSII
tables are always composed of whole blocks. I think DASDL rounds
areasize up if necessary to accommodate this.

2. Areas are composed of a contiguous number of logical sectors. In
the old days, when a disk unit was a physical box on the floor and
sectors were 180 bytes, that meant that an area consisted of some number
of physically-contiguous sectors. With smarter disk architectures like
RAID and SAN, it's now more difficult to say exactly what is where on
which disk, but in the context of the logical sectoring as it is
presented to the MCP I/O subsystem, areas still consist of some number
of contiguous sectors. For version 6 and 7 disk file headers, an area is
limited to 2^33-1 logical sectors (about 1.5 TB), or the size of the
logical disk unit, whichever is smaller.

3. There is a fixed upper limit on the number of areas a file can
have. Currently for DMSII, that limit is 1000 areas per file (for other
files it is 15,000 areas). Note that this limit is PER PHYSICAL FILE,
not necessarily PER DATA SET, as extended (DMSII-XE) data sets can be
sectioned and thus consist of multiple physical files.

My general preference is to have a smaller number of larger areas,
especially for large data sets, and especially for large data sets that
grow. The advantages are:

* You need fewer areas, and you run out of them at a slower rate.

* Long sequential scans on the data set are somewhat more efficient
(fewer discontinuities in sequential reads due to area switches). Any
advantage here, though, usually disappears if the data set it being hit
by multiple tasks at the same time.

* Disk file headers are smaller (each area requires one word in the
disk file header structure, which is resident in memory when a file is
open). Hundreds of files times thousands of areas is a lot of memory,
even on today's systems.

The major disadvantage is that larger areas can be more difficult to
allocate, especially if the available space on the family is heavily
checkerboarded. For this reason, I like to design areasizes so that they
are a near-multiple of some fairly large number of sectors (1000, 2000,
5000, 10000 -- something like that). Checkerboarding is caused by
deleting (or sometimes moving) files. With areasizes composed of a
standard multiple number of sectors, it's more likely that the areas for
deleted/moved files will coalesce into ones that will be usable by new
area allocation activity. With today's large disks, however,
checkerboarding is less of a problem than it was years ago.

In following this thread over the past few weeks, my conclusion is that
the areasizes for the data sets under discussion are way, way too small.
What often happens is that the data set was designed many years ago for
an expected population, and that design decision was never revisited as
it (inevitably) became obsolete. Another common situation is that the
data set was designed when disks were smaller and the potential for
space allocation problems with larger areas was more significant, so
smaller areas were chosen. A third (and all too common) situation is
that the file attributes were never designed in the first place, and the
sucker was just allowed to grow until there was a problem.

POPULATIONWARN and POPULATIONINCR are very nice features to have, but
they are not a substitute for considering the expected active population
and growth rate for a data set, and planning accordingly. Of course,
times change, and along with them requirements placed on the data base
and its structures. In those cases, you simply need to take the time to
reorg the files and apply areas/areasize values that can be expected to
have some longevity.

This thread got started on the subject of garbage collection. A
properly-designed DMSII table or index structure should not need to be
garbage collected on a regular basis. If you need to do a GC, you
probably need to reassess the physical attributes of the tables and
indexes involved and do a file format reorganization instead. One
significant case where a GC can be beneficial is when a large number
of records have been purged from the data set, but in most other cases
doing just a GC is a lot like raking leaves into the wind.

While you're at it when doing a reorg, set the EXTENDED attribute for
the data set. With that attribute set, you will be able to do "reorg DB"
reorgs in the future, which can dramatically reduce (and perhaps
eliminate) the need for downtime.

So, to try to answer Domagoj's question directly, for tables that you
expect to grow, I would design a large enough areasize that you give
yourself at least a few years' worth of capacity. As long as you are
essentially adding rows to the disk family, you should not need to worry
about checkerboarding, so crank up the areasize.

In the absence of specific requirements, I would pick a "design limit"
population for the table that you know will be good for some number of
years (at least five, if possible), and then pick an areasize that
yields no more than 200 areas, and preferably no more than 100 areas.
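
A sketch of that sizing arithmetic in Python (the population, blocking, and
target figures are invented, and the DKTABLE/system-area allowance is
ignored):

import math

def suggest_areasize(design_population, records_per_block, target_areas=100):
    # pick an AREASIZE (in records) so the design-limit population fits in
    # roughly target_areas areas, rounded up to whole blocks, since
    # AREASIZE is always a multiple of BLOCKSIZE
    records_per_area = math.ceil(design_population / target_areas)
    blocks_per_area = math.ceil(records_per_area / records_per_block)
    return blocks_per_area * records_per_block

# e.g. plan for 50 million records, 1200 records per block, at most 100 areas
print(suggest_areasize(50000000, 1200))   # 500400 records per area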

Getting concerned when the number of allocated areas exceeds 900 is
getting concerned too late. You need to start getting concerned when the
active population of your data sets exceeds your design limit, even if
the population could more than quadruple before you would run out of areas.

If you have a data set that is expected to grow without limit, you
probably need to find out why. It usually means your application needs a
data retention and archiving policy, not more disk space.

--
Paul

Domagoj

Apr 20, 2009, 3:35:34 AM
On Wed, 15 Apr 2009 19:11:28 -0700, Paul Kimpel wrote:

> So, to try to answer Domagoj's question directly, for tables that you
> expect to grow, I would design a large enough areasize that you give
> yourself at least a few years' of capacity. As long as you are
> essentially adding rows to the disk family, you should not need to worry
> about checkerboarding, so crank up the areasize.
>
> In the absence of specific requirements, I would pick a "design limit"
> population for the table that you know will be good for some number of
> years (at least five, if possible), and then pick an areasize that
> yields no more than 200 areas, and preferably no more than 100 areas.
>
> Getting concerned when the number of allocated areas exceeds 900 is
> getting concerned too late. You need to start getting concerned when the
> active population of your data sets exceeds your design limit, even if
> the population could more than quadruple before you would run out of areas.
>
> If you have a data set that is expected to grow without limit, you
> probably need to find out why. It usually means your application needs a
> data retention and archiving policy, not more disk space.

Thanks, I find your answer very helpful.
