Currently, when expanding a "-V" archive, UnZip fills in the FAB/RAB
data for each output data file from the RMS attributes stored in its PK
(modern) or IM (old) extra field in the archive. These data include
fab$l_alq, the "allocation quantity", and using this value causes disk
space for the entire file to be allocated at once when the output file
is created.
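Roughly, the pattern in question looks like this (a minimal sketch only,
not the actual UnZip source; the names are illustrative):

#include <rms.h>
#include <starlet.h>
#include <string.h>

/* Create the output file with the allocation quantity recovered from
 * the archive's extra field, so all the space is claimed up front. */
static int create_with_saved_attrs(char *name, unsigned int saved_alq)
{
    struct FAB fab = cc$rms_fab;

    fab.fab$l_fna = name;
    fab.fab$b_fns = (unsigned char) strlen(name);
    fab.fab$l_alq = saved_alq;      /* fab$l_alq from the PK/IM extra field */
    fab.fab$b_fac = FAB$M_PUT;

    return sys$create(&fab);        /* the whole allocation happens here */
}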
This is probably most efficient, as no file extension will ever be
needed, but when allocating a large file, some unpleasant things happen.
The allocation seems to monopolize the destination disk drive for some
considerable time (for example, about 11.5 minutes for a 5GB file on an
otherwise idle PWS 500a[u], QLogic ISP1040B KZPBA-CX, FUJITSU MAF3364L
SUN36G (wide), VMS V7.3-1). Also, once the allocation has begun,
interrupting the UnZip program apparently does not interrupt the
allocation, so the disk may be tied up for a long time, regardless.
My questions are:
1. Has this bothered anyone else?
2. Does anyone else expect to be bothered by it?
3. Does anyone, using any software, do such large file allocations
this way? (My _page_ files are not this big, and SYSGEN CREATE
is probably as close as I've previously come to making anything
so big at one time. I use UnZip considerably more often than
SYSGEN CREATE, too, though, like everyone else, I haven't been
doing files so big.)
Currently, when expanding a non-V archive, default RMS parameters are
used, and this makes for relatively slow extraction/creation (extension)
of a large data file. Raising the initial allocation and default
extension quantity seems to help considerably (about 2X), so this will
probably be included in the final UnZip 6.0 release.
Limiting the initial allocation for data files extracted from a -V
archive involves more work and more risk of undesired side-effects, so
I'm reluctant to dive in without some justification (such as whining
complaints from users).
And this time, please, if you don't know anything, don't tell me how
it works now, how it must work in the future, or how it's too dangerous
to touch the code. In hopeful anticipation, I offer my thanks.
------------------------------------------------------------------------
Steven M. Schweda (+1) 651-699-9818
382 South Warwick Street sms@antinode-org
Saint Paul MN 55105-2547
I have noticed the same behavior with my DVDwrite-program:
DVDwrite allows me to copy a complete disk/CD/DVD into a file:
$ dvdwrite/dump disk: disk1:[dir]dumped.dsk
If the source disk is large (I can burn 8 GB on a DVD+R DL), it takes
an incredibly long time to allocate the complete file of, again, 8 GB.
There is no disk access possible by other processes until the
allocation is finished.
eberhard
Highwater-marking enabled on the output volume ?
> Highwater-marking enabled on the output volume ?
Yes, as it's the default. ("Affects Files-11 On-Disk Structure Level
2 disks only" What's true on ODS5 disks?)
From: vax...@chclu.chemie.uni-konstanz.de (Eberhard Heuser-Hofmann)
> I have noticed the same behavior with my DVDwrite-program:
>
> DVDwrite allows me to copy a complete disk/CD/DVD into a file:
> $ dvdwrite/dump disk: disk1:[dir]dumped.dsk
>
> If the source disk is large (I can burn 8 GB on a DVD+R DL), it takes
> an incredibly long time to allocate the complete file of, again, 8 GB.
> There is no disk access possible by other processes until the allocation
> is finished.
That's what I saw. I could tell when the allocation was finished, as
that's when a DIR would un-hang itself.
In today's trials (on the PWS 500a[u]), UnZip of a 4.2GB archive
containing two 5GB data files took about 57:45 when it allocated the
whole thing, about 58:15 with deq = 64K, and about 62:30 with deq = 16K.
With all default parameters, the same job took about 103:45.
Given the major annoyance with and the small benefit from the
full-allocation method, my current plan is to use a default deq = 64K,
and do full allocation up to twice the deq. (Unlike Zip creating the
archive, when UnZip creates a data file, it knows the size before it
starts.)
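In code, that plan amounts to something like this (a sketch only; the FAB
fields are real, but the policy and the numbers are just the ones described
above):

#include <rms.h>

/* Choose initial allocation and default extension for an output file
 * whose final size (in blocks) is already known from the archive. */
static void set_allocation_policy(struct FAB *fab, unsigned int final_blocks)
{
    fab->fab$w_deq = 65535;                 /* default extension, ~64K blocks */
    if (final_blocks <= 2U * fab->fab$w_deq)
        fab->fab$l_alq = final_blocks;      /* small file: allocate it all */
    else
        fab->fab$l_alq = fab->fab$w_deq;    /* big file: start small, extend */
}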
Thanks for the comments.
I'm starting to think that reforming the -V methods is sounding more
and more like a good idea (although not any easier). I suppose that I
could save the original parameters, change them, write the file, restore
the original parameters, and then close the file. If that actually
worked, it might not be too difficult, but there is the question of the
over-allocated (past EOF) files, which would need to be checked
carefully.
> From: Roy Omond <Roy....@BlueBubble.UK.Com>
>
>>Highwater-marking enabled on the output volume ?
>
> Yes, as it's the default. ("Affects Files-11 On-Disk Structure Level
> 2 disks only" What's true on ODS5 disks?)
Try switching it off if you don't "need" it and repeat your tests.
$ Set Volume/NoHigh xxxx:
Please tell us the results ...
> From: Roy Omond <Roy....@BlueBubble.UK.Com>
>
>>Highwater-marking enabled on the output volume ?
>
> Yes, as it's the default. ("Affects Files-11 On-Disk Structure Level
> 2 disks only" What's true on ODS5 disks?)
From VMS 7.3-1:
$ help set volume /high
SET
VOLUME
/HIGHWATER_MARKING
/HIGHWATER_MARKING
/NOHIGHWATER_MARKING
Determines whether the file highwater mark (FHM) volume attribute
is set. The FHM attribute guarantees that a user cannot read data
that was not written by the user. Applies to Files-11 On-Disk
Structure Level 2 (ODS-2) and 5 (ODS-5) volumes only.
So that's without a doubt the explanation for your observed behaviour.
Switch it off, and your UNZIP will fly ;-)
Speed vs. Security - That's a perfect Un*x philosophy.
eberhard
> In article <30gfu2F...@uni-berlin.de>, Roy Omond
> <Roy....@BlueBubble.UK.Com> writes:
>
> [...snip...]
>
>>So that's without a doubt the explanation for your observed behaviour.
>>
>>Switch it off, and your UNZIP will fly ;-)
>
> Speed vs. Security - That's a perfect Un*x philosophy.
Dog forbid that *I* ever be accused of Un*x philosophy ...
That's why I originally mentioned: "if you don't *need* it".
If that level of security is *required*, then you have to
expect the observed behaviour (and live with it).
If applicable, have a single volume with high-water marking
switched off for such situations. The test is still worth
doing, and the results will hopefully prove, ahem, "interesting".
> Speed vs. Security - That's a perfect Un*x philosophy.
With it on, you double the time it takes to extend a file since you have a
first pass to write 0s over all new blocks, and then your application starts
to fill those blocks one by one.
You can compensate with SET VOLUME /ERASE_ON_DELETE which is the mirror image
for /HIGHWATER_MARKING (zaps data on delete instead of on allocate).
I'm sorry, but it reminded me of the famous sentence:
"Sure the file system isn't save at all but look who fast it is."
>That's why I originally mentioned: "if you don't *need* it".
>If that level of security is *required*, then you have to
>expect the observed behaviour (and live with it).
>
>If applicable, have a single volume with high-water marking
>switched off for such situations. The test is still worth
>doing, and the results will hopefully prove, ahem, "interesting".
>
This leads me to the question:
Is there any hope of a speed improvement when allocating a big file on
highwater-marked disks?
Eberhard
> Currently, when expanding a "-V" archive, UnZip fills in the FAB/RAB
> data for each output data file from the RMS attributes stored in its PK
> (modern) or IM (old) extra field in the archive. These data include
> fab$l_alq, the "allocation quantity", and using this value causes disk
> space for the entire file to be allocated at once when the output file
> is created.
>
> This is probably most efficient, as no file extension will ever be
> needed, but when allocating a large file, some unpleasant things happen.
> The allocation seems to monopolize the destination disk drive for some
> considerable time (for example, about 11.5 minutes for a 5GB file on an
> otherwise idle PWS 500a[u], QLogic ISP1040B KZPBA-CX, FUJITSU MAF3364L
> SUN36G (wide), VMS V7.3-1). Also, once the allocation has begun,
> interrupting the UnZip program apparently does not interrupt the
> allocation, so the disk may be tied up for a long time, regardless.
>
> My questions are:
>
> 1. Has this bothered anyone else?
It has bothered me. However, I think it would bother me more if I
didn't find out until after the file was half written that I couldn't
allocate sufficient space. I regard the current behavior as a design
to preserve data integrity by decreasing the chances that an incomplete
file will be left on disk after a failed unzip operation.
One of us has misunderstood how high water marking is implemented.
It is not "erase on extend". It is "erase when the highwater mark
moves". For ordinary sequential files, the highwater mark is the
highest block within the file that has ever been accessed. It is
not the same as the end-of-file block. It is not the same as the
last allocated block.
You can extend a 10,000 block file by 90,000 blocks and incur
no overhead. You can (I believe), move the end-of-file pointer
out to block 100,000 in the resulting file and still incur no
overhead. But if you try to read from block 100,000, you'd better
sit back and wait as the file system erases 90,000 blocks for you.
> You can compensate with SET VOLUME /ERASE_ON_DELETE which is the mirror image
> for /HIGHWATER_MARKING (zaps data on delete instead of on allocate).
Both techniques are intended to prevent "disk scavenging" -- a user
allocating free space that has not been overwritten and reading
sensitive data.
Erase on delete takes the obvious approach. Erase the data when files
are deleted and you've blocked the attack. Your free space is
clean. The downside is that it slows down file deletion.
Highwater marking is intended to optimize away much of the erasure
overhead. Instead of erasing the data right away, the system waits
and erases it just before the next user tries to read it. As long
as you have good control over the volume (nobody dismounts the
pack and walks away with it and nobody turns off highwater marking),
you can get equivalent security with much less overhead.
For each file on disk, the system maintains a highwater mark. Data
below the mark is clean. It has either been written by the user or
has already been erased by the system. Data above the mark is dirty.
It still contains stale and potentially scavengable information.
If a user attempts to read above the highwater mark, the high
water mark is moved and both the accessed block and the intervening
blocks are erased before the read is allowed to proceed.
If a user attempts to write above the highwater mark, the
intervening blocks are erased and the write is allowed to proceed.
In the typical case of a sequential file that is written in
sequential order, this algorithm completely avoids the need for
any disk erasure.
That's the way it's supposed to work anyway.
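As a toy model only (nothing remotely like the real XQP code, just the
rule described above, with the highwater mark counted in blocks):

#include <stdio.h>
#include <string.h>

#define NBLOCKS 16
static char disk[NBLOCKS + 1] = "STALESTALESTALE!";  /* leftover data */
static int hwm = 0;                     /* blocks 0 .. hwm-1 are clean */

static void access_block(int vbn, int is_write)
{
    if (vbn >= hwm)
    {
        int first = hwm;
        int last = is_write ? vbn - 1 : vbn;   /* a write fills vbn itself */

        if (last >= first)
            memset(disk + first, '0', (size_t) (last - first + 1));
        hwm = vbn + 1;
    }
}

int main(void)
{
    access_block(2, 1);     /* write block 2: blocks 0..1 erased first */
    access_block(9, 0);     /* read block 9: blocks 3..9 erased first */
    printf("hwm=%d disk=%.*s\n", hwm, NBLOCKS, disk);
    return 0;
}

A file written strictly in order never trips the erasure at all, which is
the point of the optimization.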
If you change from a highwater marking anti-scavenging strategy to
an erase-on-delete anti-scavenging strategy, you are vulnerable
to scavenging on the space that was free when you did the cutover.
If you are paranoid, you might want to allocate all that space
and then do a $ DELETE /ERASE.
In the case at hand (5-minute delay waiting for a humongous file
to be created), I would speculate that we're looking at a volume
lock while the file is being extended (contiguous best try?) and
the file system is going crazy trying to come up with extents
in the volume bit map. I can't see any reason for the file system
to hold the volume locked while doing a highwater erasure.
John Briggs
The difference is HUGE. The time can come down from many minutes to
a matter of seconds. I usually get caught with this when I set up
a new system and forget to set NOHIGHWATER_MARKING before
creating pagefiles... just go and get a coffee.
zen_FTA11> show time
23-NOV-2004 13:59:53
zen_FTA11> sysgen create rubbish.bin/size=5000000
%SYSGEN-I-CREATED, DBS0:[SCRATCH]RUBBISH.BIN;1 created
zen_FTA11> show time
23-NOV-2004 14:00:02
zen_FTA11> del rubbish.bin;
zen_FTA11> set volume dka100/highwater_marking
zen_FTA11> show time
23-NOV-2004 14:00:27
zen_FTA11> sysgen create rubbish.bin/size=5000000
%SYSGEN-I-CREATED, DBS0:[SCRATCH]RUBBISH.BIN;1 created
zen_FTA11> show time
23-NOV-2004 14:10:39
zen_FTA11>
9 sec as opposed to 10 min 12 sec
Regards,
Dave.
--
David B Sneddon (dbs) VMS Systems Programmer dbsn...@bigpond.com
Sneddo's quick guide ... http://www.users.bigpond.com/dbsneddon/
DBS freeware http://www.users.bigpond.com/dbsneddon/software.htm
> [...] However, I think it would bother me more if I didn't find out
> until after the file was half written that I couldn't allocate
> sufficient space.
Well, you could allocate space in chunks (to allow other things to
run), but check SYS$GETDVI after each to make sure there's still
space. Sure, you still run the risk of running out at the last
minute due to other processes taking up space. That's a "false
negative".
Of course, if you detect that you're not going to have room, then
what? Abort the program? Might be a "false positive".
Guess you could sum this up as "between a rock and a hard place".
--Stan Quayle
Quayle Consulting Inc.
----------
Stanley F. Quayle, P.E. N8SQ +1 614-868-1363
8572 North Spring Ct., Pickerington, OH 43147 USA
stan-at-stanq-dot-com http://www.stanq.com
Where does the "highwater" mark reside ? Is it in the file header ?
If one had read-only access to the file, and one attempts to read past the
high water mark, does this mean that one magically gets write access to the
file while the system is busy raising the highwater mark and then rewrites the
header to reflect the new location?
>One of us has misunderstood how high water marking is implemented.
>
>It is not "erase on extend". It is "erase when the highwater mark
>moves". For ordinary sequential files, the highwater mark is the
>highest block within the file that has ever been accessed. It is
>not the same as the end-of-file block. It is not the same as the
>last allocated block.
>
>You can extend a 10,000 block file by 90,000 blocks and incur
>no overhead. You can (I believe), move the end-of-file pointer
>out to block 100,000 in the resulting file and still incur no
>overhead. But if you try to read from block 100,000, you'd better
>sit back and wait as the file system erases 90,000 blocks for you.
And if you don't read but instead your process dies, then what is left in
the file to indicate to the next reader that those blocks were never written
or safely zeroed ? The only clue in the file header is the EOF position.
My *guess* would be that when EOF is updated, either all file extents or all
file blocks from the current EOF block to the new one are initialised. The
file creation could trigger this, I suppose. One huge extent could = very
long creation time.
The OP could investigate if the SQO bit is set on his file - I found some
Google refs suggesting that if this is set, then the unwanted delays are
eliminated (which kinda blows holes in the initialize-by-extent theory).
SQO won't work if UNZIP writes randomly around the file, however, so look
out for fseek and fpos type calls.
--
We die only once, and for such a long time.
Mail john rather than nospam...
Yes.
From SYS$LIBRARY:LIB.MLB in module $FH2DEF:
$EQU FH2$L_HIGHWATER 76
> If one had read-only access to the file, and one attempts to read past the
> high water mark, does this mean that one magically gets write access to the
> file while the system is busy raising the highwater mark and then rewrites the
> header to reflect the new location?
The "writing" is done automatically by the file system and is not
affected by the user's file permissions.
I don't see any opportunity for exploitation here. The virtual picture
of the file as presented to the users is that of a pre-zeroed array
of blocks. The time at which the physical zeroing takes place is virtually
irrelevant.
Hmmm. I wonder how things are handled on read-only volumes.
One would hope that the read is optimized away and a zeroed buffer
returned.
John Briggs
>In article <41A3516C...@teksavvy.com>, JF Mezei <jfmezei...@teksavvy.com> writes:
>> bri...@encompasserve.org wrote:
>>> moves". For ordinary sequential files, the highwater mark is the
>>> highest block within the file that has ever been accessed. It is
>>> not the same as the end-of-file block. It is not the same as the
>>> last allocated block.
>>
>> Where does the "highwater" mark reside ? Is it in the file header ?
>
>Yes.
>
>From SYS$LIBRARY:LIB.MLB in module $FH2DEF:
>
>$EQU FH2$L_HIGHWATER 76
Please ignore the gibberish in my earlier reply ;-)
(I thought I'd scanned all the likely macro files. Is this ever reported by
dump/header ?)
--
Stop the world! I want to get off!
>In article <k1k6q05fb6b8k5b1b...@4ax.com>, John Laird <nos...@laird-towers.org.uk> writes:
>
>>The only clue in the file header is the EOF position.
>
>Or the high water mark in the file header.
Noted, thanks.
--
Yesterday's flower children are today's blooming idiots.
Thanks. Finally got to the VMS documentation. Not sure if there is a more
complete one.
Here is the relevant text:
For nonshared sequential files, the performance impact of high-water
marking is minimal. However, for files of nonsequential format, high-water
marking creates some overhead; the system erases the previous
contents of the disk blocks allocated every time a file is created or
extended.
So, if one were to create a huge empty sequential file, the HW mark would be
at block 0, with the remainder containing confidential payroll data formerly
belonging to another user, right?
Now, if you were to use SET FILE/ATTRIB=ORG=REL, the file would now be in a
state where normally all blocks allocated would have been zeroed at
allocation. Would the system still see that the HW mark is still at block 0
and then zap blocks 0 to 20 when you tried to read block 20?
> >>Highwater-marking enabled on the output volume ?
> >
> > Yes, as it's the default. ("Affects Files-11 On-Disk Structure Level
> > 2 disks only" What's true on ODS5 disks?)
>
> Try switching it off if you don't "need" it and repeat your tests.
From: John Laird <nos...@laird-towers.org.uk>
> The OP could investigate if the SQO bit is set on his file - I found some
> Google refs suggesting that if this is set, then the unwanted delays are
> eliminated (which kinda blows holes in the initialize-by-extent theory).
> SQO won't work if UNZIP writes randomly around the file, however, so look
> out for fseek and fpos type calls.
With no highwater marking, the time to unpack the test archive
dropped from about 58 minutes to about 36 minutes. Restoring highwater
marking and setting fab$v_sqo gave about 36.5 minutes, which is close
enough for me.
From: vax...@chclu.chemie.uni-konstanz.de (Eberhard Heuser-Hofmann)
> This leads me to the question:
>
> Is there any hope of a speed improvement when allocating a big file on
> highwater-marked disks?
Apparently the answer is to set fab$v_sqo.
So far as I can tell (DIR /FULL), fab$v_sqo is not a durable file
attribute, so I don't need to worry about setting it and then not
restoring the original value. Confirmation and/or dire warnings to the
contrary would be appreciated.
Zip does some dancing around in the archive file when it writes it,
and UnZip does some when it reads it, but that allows UnZip to be very
sequential when it writes the extracted data files.
This means that Zip can't use all these tricks when creating the
archive, but because it doesn't know the archive size in advance, it was
doomed anyway. Larger-chunk allocation there is still a winner,
however.
From: "Craig A. Berry" <craig...@mac.com.spamfooler>
> It has bothered me. However, I think it would bother me more if I
> didn't find out until after the file was half written that I couldn't
> allocate sufficient space. I regard the current behavior as a design
> to preserve data integrity by decreasing the chances that an incomplete
> file will be left on disk after a failed unzip operation.
I'd check the status of the UnZip operation, and not trust any of the
results if it failed. Allocating all the space does not ensure that all
the data get written.
Well. This has certainly been productive. Unless someone discloses
a good reason to do otherwise, I suspect that the next UnZip release
will always be doing the full initial allocation, and setting fab$v_sqo
to avoid the extended periods of paralysis.
> Apparently the answer is to set fab$v_sqo.
Google to the rescue.
> So far as I can tell (DIR /FULL), fab$v_sqo is not a durable file
>attribute, so I don't need to worry about setting it and then not
>restoring the original value. Confirmation and/or dire warnings to the
>contrary would be appreciated.
RMS manual:
"The FAB$V_SQO option is input to the Create and Open services."
It is not a file attribute, just a declaration of your intents for the
current file processing.
My RMS memory is very rusty tonight, but essentially you are restricted to
sequential reads and writes. Any attempts at random file access will fail.
Once this is ensured, you can see that the highwater marking may be delayed
until the eof moves along, or possibly even avoided altogether if RMS never
subsequently reads any existing data into its buffers before overwriting
them. As you found run-times almost exactly equivalent to having no
highwater marking, the latter is a distinct possibility.
The Google reference I came across (from the famous CJL) was referring to
append access, and querying why the C run-time library failed to set this
bit. I would have hoped UNZIP put enough clues in the create option for SQO
to be set. But clearly not, or no amount of clue makes this work. I
presume you merely needed to put something like "fop=sqo" in a creat or
fopen call ?
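Something along these lines, presumably (untested; the DEC C RTL does
accept RMS keyword strings as extra fopen()/creat() arguments, though
whether "fop=sqo" by itself does the trick here is exactly the question):

#include <stdio.h>

int main(void)
{
    /* Extra string arguments after the mode are passed through to RMS. */
    FILE *fp = fopen("big.dat", "w", "fop=sqo", "deq=65535");

    if (fp == NULL)
        return 1;
    /* ... strictly sequential fwrite() calls; no fseek()/fsetpos() ... */
    fclose(fp);
    return 0;
}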
--
The current death rate? One per person, of course.
> > Apparently the answer is to set fab$v_sqo.
>
> Google to the rescue.
Only if you have a clue as to for what to search.
> RMS manual:
> "The FAB$V_SQO option is input to the Create and Open services."
>
> It is not a file attribute, just a declaration of your intents for the
> current file processing.
Sounds good (and reasonable).
> [...]
> I would have hoped UNZIP put enough clues in the create option for SQO
> to be set. But clearly not, or no amount of clue makes this work. I
> presume you merely needed to put something like "fop=sqo" in a creat or
> fopen call ?
Hope all you wish, but clearly you haven't seen this code. There are
three distinct, medium-to-low-level I/O paths: one for non-V archives,
one for (the default) PK-style -V archives, and one for (the older)
IM-style -V archives.
That translates into two instances of "outfab-> fab$v_sqo = 1;", and
one of something else to be fed into a "sys$qiow( ...
IO$_CREATE|IO$M_CREATE|IO$M_ACCESS ...)".
Setting the FAB bit before a sys$create() call seems to work fine,
as does or-ing FIB$M_SEQONLY into xxx.FIB$L_ACCTL (along with
FIB$M_NOREAD ("no other readers"), without which it's apparently
useless) before the qio().
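For the record, the two settings boil down to something like this (a
sketch only, with the field spellings from above; the real create paths
are messier):

#include <rms.h>
#include <fibdef.h>

/* RMS path: mark the output FAB sequential-only before sys$create(). */
static void set_sqo_rms(struct FAB *fab)
{
    fab->fab$v_sqo = 1;
}

/* ACP-QIO path: OR the equivalent bits into the FIB access-control
 * field before the IO$_CREATE|IO$M_CREATE|IO$M_ACCESS qio.  NOREAD
 * ("no other readers") is apparently needed for SEQONLY to matter. */
static void set_sqo_acp(struct fibdef *fib)
{
    fib->fib$l_acctl |= FIB$M_SEQONLY | FIB$M_NOREAD;
}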
Thanks again for the help.
I do always enjoy telling an HP Web page to search "OpenVMS systems
sites" for something like "FIB$M_SEQONLY", and getting "No results were
found for your search." Does anyone _ever_ get any useful results from
this search "feature"? I can't remember an instance.
>From: John Laird <nos...@laird-towers.org.uk>
>
>> > Apparently the answer is to set fab$v_sqo.
>>
>> Google to the rescue.
>
> Only if you have a clue as to for what to search.
It looks like I used "highwater marking eof".
>> RMS manual:
>> "The FAB$V_SQO option is input to the Create and Open services."
>>
>> It is not a file attribute, just a declaration of your intents for the
>> current file processing.
>
> Sounds good (and reasonable).
>
>> [...]
>> I would have hoped UNZIP put enough clues in the create option for SQO
>> to be set. But clearly not, or no amount of clue makes this work. I
>> presume you merely needed to put something like "fop=sqo" in a creat or
>> fopen call ?
>
> Hope all you wish, but clearly you haven't seen this code. There are
>three distinct, medium-to-low-level I/O paths: one for non-V archives,
>one for (the default) PK-style -V archives, and one for (the older)
>IM-style -V archives.
>
> That translates into two instances of "outfab-> fab$v_sqo = 1;", and
>one of something else to be fed into a "sys$qiow( ...
>IO$_CREATE|IO$M_CREATE|IO$M_ACCESS ...)".
Argh, it's talking to the disk ACP directly :-( One would have hoped all the
-V restore functionality could have been done with higher-level routines and
filling in FAB and XAB blocks, but it appears not. Coping with new and
future extensions is always going to be tricky for a 3rd-party utility, and
it is possible that more can be done at lower levels. I dunno. I do know
that if I planned to zip files for posterity and wanted to be absolutely and
utterly sure, then creating a single BACKUP saveset and zip'ing that is
probably the safest bet. Hides all the VMS-ness inside and compresses a
file with a simple structure.
> Setting the FAB bit before a sys$create() call seems to work fine,
>as does or-ing FIB$M_SEQONLY into xxx.FIB$L_ACCTL (along with
>FIB$M_NOREAD ("no other readers"), without which it's apparently
>useless) before the qio().
>
> Thanks again for the help.
>
> I do always enjoy telling an HP Web page to search "OpenVMS systems
>sites" for something like "FIB$M_SEQONLY", and getting "No results were
>found for your search." Does anyone _ever_ get any useful results from
>this search "feature"? I can't remember an instance.
I've sometimes had better results at many a site by using Google's "search
this site" option rather than the site's own search facility...
--
When people are free to do as they please, they usually imitate each other.
> From: "Craig A. Berry" <craig...@mac.com.spamfooler>
>
> > It has bothered me. However, I think it would bother me more if I
> > didn't find out until after the file was half written that I couldn't
> > allocate sufficient space. I regard the current behavior as a design
> > to preserve data integrity by decreasing the chances that an incomplete
> > file will be left on disk after a failed unzip operation.
>
> I'd check the status of the UnZip operation, and not trust any of the
> results if it failed. Allocating all the space does not ensure that all
> the data get written.
Obviously. But what you or I would do isn't the point. There are
many, many poor man's transaction processing systems out there where
somebody downloads a file, unzips it, and processes the results with
little or no error checking. If the unzipped file simply isn't there,
the error is much more likely to be caught and dealt with than if it is
there but is incomplete. I simply wanted to caution you against
introducing a new scenario that would exercise this pathology. Quite
possibly unzip wouldn't do this anyway since I think it only renames a
temp file as the last step in unpacking.
Actually, what I do is exactly what I'm worried most about. It's
what got me involved in this stuff to begin with.
Zip creates the archive as "ZIxxxxxx", and renames it when it's
complete. UnZip extracts files directly to their ultimate destinations.
If someone whacks the program after it allocates the space and before it
fills in the data, then he's out of luck. If not all the space is
allocated first, the result is similar, just smaller.
Zip includes checksums for integrity checking, and UnZip recognizes a
truncated archive as an error. If you wish to rewrite this part of the
code, I'm sure the Zip developers would be willing to consider your
offering. So far, Zip and UnZip lack a BACKUP-like /VERIFY feature, and
I'd not expect that to change soon.
If you actually care about the results but can't be bothered to check
the status of the UnZip operation, then I'm not very sympathetic.
I'm not bothered by it, since I expect it. I'd like to see it be
otherwise, but I understand the constraints.
> 2. Does anyone else expect to be bothered by it?
No, unless it suddenly takes much longer in a future VMS version.
> 3. Does anyone, using any software, do such large file allocations
> this way? (My _page_ files are not this big, and SYSGEN CREATE
> is probably as close as I've previously come to making anything
> so big at one time. I use UnZip considerably more often than
> SYSGEN CREATE, too, though, like everyone else, I haven't been
> doing files so big.)
As rarely as I do, I either use:
$ MC SYSGEN CREATE
...or...
$ COPY NLA0: filespec/ALLOC=n
...followed by SET FILE/END or SET FILE/ATTR to set the end of file
byte/block manually.
--
David J Dachtera
dba DJE Systems
http://www.djesys.com/
Unofficial OpenVMS Hobbyist Support Page:
http://www.djesys.com/vms/support/
Unofficial Affordable OpenVMS Home Page:
http://www.djesys.com/vms/soho/
Unofficial OpenVMS-IA32 Home Page:
http://www.djesys.com/vms/ia32/
> > 1. Has this bothered anyone else?
>
> I'm not bothered by it, since I expect it. I'd like to see it be
> otherwise, but I understand the constraints.
Well, you should have been. Anyway, now that I've discovered the
miracle bits (fab$v_sqo, FIB$M_SEQONLY, and FIB$M_NOREAD), it should be
all better in the next release.
> > 2. Does anyone else expect to be bothered by it?
>
> No, unless it suddenly takes much longer in a future VMS version.
Try using Zip 2.3 on larger files. I can say with some confidence
that it'll be plenty bothersome.
> From: "Craig A. Berry" <craig...@mac.com.spamfooler>
>
>> There are
> > many, many poor man's transaction processing systems out there where
> > somebody downloads a file, unzips it, and processes the results with
> > little or no error checking. If the unzipped file simply isn't there,
> > the error is much more likely to be caught and dealt with than if it is
> > there but is incomplete. I simply wanted to caution you against
> > introducing a new scenario that would exercise this pathology. Quite
> > possibly unzip wouldn't do this anyway since I think it only renames a
> > temp file as the last step in unpacking.
<snip>
> Zip creates the archive as "ZIxxxxxx", and renames it when it's
> complete. UnZip extracts files directly to their ultimate destinations.
OK, then you're proposing to remove a defensive programming practice
that's probably been there for years. It looks like BACKUP also
pre-allocates space for the entire file on a restore, which to me is
yet another reason to think it's the right thing to do.
> If someone whacks the program after it allocates the space and before it
> fills in the data, then he's out of luck. If not all the space is
> allocated first, the result is similar, just smaller.
The fact that you can't protect from something external whacking the
program doesn't mean you shouldn't preserve the existing modest
precautions against exceeding disk quota or filling up the disk. Maybe
the exception handling is good enough that it will delete an incomplete
file regardless of the reason for incompleteness, but that's something
worth testing for if you go ahead and give it a couple more possible
reasons.
<snip>
> If you actually care about the results but can't be bothered to check
> the status of the UnZip operation, then I'm not very sympathetic.
Assuming the exception handling is rock solid and there aren't any
latent exit() calls lying around with odd numbers being passed to them
to indicate a POSIX-style error, then this is merely uncharitable.
> > Zip creates the archive as "ZIxxxxxx", and renames it when it's
> > complete. UnZip extracts files directly to their ultimate destinations.
>
> OK, then you're proposing to remove a defensive programming practice
> that's probably been there for years. It looks like BACKUP also
> pre-allocates space for the entire file on a restore, which to me is
> yet another reason to think it's the right thing to do.
Those statements describe Zip and UnZip behavior before and after my
changes. What do you think that I'm removing?
With my changes, files extracted from non-V archives will, for the
first time, be fully allocated at once, instead of incrementally. I
fail to see how this wrecks anything.
> > If someone whacks the program after it allocates the space and before it
> > fills in the data, then he's out of luck. If not all the space is
> > allocated first, the result is similar, just smaller.
>
> The fact that you can't protect from something external whacking the
> program doesn't mean you shouldn't preserve the existing modest
> precautions against exceeding disk quota or filling up the disk. Maybe
> the exception handling is good enough that it will delete an incomplete
> file regardless of the reason for incompleteness, but that's something
> worth testing for if you go ahead and give it a couple more possible
> reasons.
Other than the initial larger allocation causing the error sooner, I
fail to see any difference from the previous behavior. I believe that
Zip tries to delete a bad temporary ("ZIxxxxxx") before it renames it,
but I doubt that UnZip does anything except fail on a particular file,
and try to continue (or not). Just as they did before.
To what "existing modest precautions against exceeding disk quota or
filling up the disk" do you refer? So far as I know, these programs try
to make files and either succeed or fail. We ain't got no precautions.
We don't need no precautions. I don't have to show you any _stinking_
precautions!
> > If you actually care about the results but can't be bothered to check
> > the status of the UnZip operation, then I'm not very sympathetic.
>
> Assuming the exception handling is rock solid and there aren't any
> latent exit() calls lying around with odd numbers being passed to them
> to indicate a POSIX-style error, then this is merely uncharitable.
You lost me there. I think you may be straying beyond the boundaries
of the "Non-stupid opinions" region.
One caveat. You need to look into cluster size as well. If the original disk
had a cluster size of 100, and the file was 101 blocks, it occupied 200 blocks
on disk. If you move it to a disk with a cluster size of 1, you want to make
sure you allocate 101 blocks and not 200 (i.e., use space used, not space allocated).
It becomes a bit harder to determine if the owner of the file had intended to
leave some free space for growth, or if the free space at the end was just the
result of cluster size.
> This is probably most efficient, as no file extension will ever
> be needed, but when allocating a large file, some unpleasant things
> happen. The allocation seems to monopolize the destination disk
> drive for some considerable time (for example, about 11.5 minutes
> for a 5GB file on an otherwise idle PWS 500a[u], QLogic ISP1040B
> KZPBA-CX, FUJITSU MAF3364L SUN36G (wide), VMS V7.3-1). Also, once
> the allocation has begun, interrupting the UnZip program apparently
> does not interrupt the allocation, so the disk may be tied up for a
> long time, regardless.
11 min seems a bit long. Have you looked at your SYSGEN XQP values?
Try doing a couple of big files, then run AUTOGEN so you can see what
it thinks is the right thing. The cluster factor of the drive will also
matter lots for this.
> My questions are:
> 1. Has this bothered anyone else?
It can be an annoyance if you are not expecting it.
> 2. Does anyone else expect to be bothered by it?
> 3. Does anyone, using any software, do such large file allocations
> this way? (My _page_ files are not this big, and SYSGEN CREATE
> is probably as close as I've previously come to making anything
> so big at one time. I use UnZip considerably more often than
> SYSGEN CREATE, too, though, like everyone else, I haven't been
> doing files so big.)
You will have to pay the XQP time to find and mark the bitmap for the
clusters you allocate, plus possibly zeroing the blocks. No way around
that, but doing it in one efficient lump does make it stand out some!
Doing the allocation in one hit also gives a BIG reduction in
fragmentation, usually.
Pay now, pay later, but you will pay!
--
Paul Repacholi 1 Crescent Rd.,
+61 (08) 9257-1001 Kalamunda.
West Australia 6076
comp.os.vms,- The Older, Grumpier Slashdot
Raw, Cooked or Well-done, it's all half baked.
EPIC, The Architecture of the future, always has been, always will be.
> > With my changes, files extracted from non-V archives will, for the
> > first time, be fully allocated at once, instead of incrementally. I
> > fail to see how this wrecks anything.
>
> One caveat. You need to look into cluster size as well.
No, I don't.
> If the original disk
> had a cluster size of 100, and the file was 101 blocks, it occupied 200 blocks
> on disk. If you move it to a disk with a cluster size of 1, you want to make
> sure you allocate 101 blocks and not 200 (i.e., use space used, not space allocated).
With a non-V archive, the size used is derived from the original byte
count. With a -V[V] archive, the VMS attributes will be used, (more or
less) as before. That is, the allocation will be preserved, as it's one
of those preserved attributes. If you don't like the results, you'll
have to go in and truncate the files after they've been extracted. (The
OS appears to handle the case where the allocation must be larger, due
to a different cluster size.)
> It becomes a bit harder to determine if the owner of the file had intended to
> leave some free space for growth, or if the free space at the end was just the
> result of cluster size.
Impossible, I'd say, and, as getting -V to work right in the first
place triggered such a load of hissy fits and threats to go postal (from
the ill-informed) in this forum, I'm reluctant to suggest any kind of
automatic truncation at extraction.
However, as must be obvious by now, the Info-ZIP folks are willing to
take suggestions from anyone. If you submit better code, I'm sure
someone will consider it.
> With my changes, files extracted from non-V archives will, for the
> first time, be fully allocated at once, instead of incrementally. I
> fail to see how this wrecks anything.
It doesn't, and I'm glad to hear that's what we're talking about. Your
original post was mostly about how bothersome pre-allocation was when
unzipping -V archives for large files, and since most of the thread has
been about that case, I thought the discussion had moved toward
changing that behavior rather than adding it for the non-V case.
> > The fact that you can't protect from something external whacking the
> > program doesn't mean you shouldn't preserve the existing modest
> > precautions against exceeding disk quota or filling up the disk. Maybe
> > the exception handling is good enough that it will delete an incomplete
> > file regardless of the reason for incompleteness, but that's something
> > worth testing for if you go ahead and give it a couple more possible
> > reasons.
>
> Other than the initial larger allocation causing the error sooner, I
> fail to see any difference from the previous behavior.
> To what "existing modest precautions against exceeding disk quota or
> filling up the disk" do you refer? So far as I know, these programs try
> to make files and either succeed or fail. We ain't got no precautions.
> We don't need no precautions. I don't have to show you any _stinking_
> precautions!
It's pretty simple. An unexpected failure that leaves a partial file
on disk is more dangerous than an unexpected failure that leaves no
file at all on disk. If the exception handling and clean-up code are
good enough, the former case never happens. It's just simpler and more
robust to get the bad news at file creation time rather than depending
on clean-up code later on. But since you're keeping pre-allocation for
the -V case and adding it for the others, you're improving things by
this measure, and I have no argument with you.
Or the high water mark in the file header.
John Briggs
Quantify "larger" (in blocks, GB or whatever unit of measure is most
comfortable for you). I'll try to test it here on my little Alpha (AS200
4/233).
> > Try using Zip 2.3 on larger files. I can say with some confidence
> > that it'll be plenty bothersome.
>
> Quantify "larger" (in blocks, GB or whatever unit of measure is most
> comfortable for you). I'll try to test it here on my little Alpha (AS200
> 4/233).
The bigger the files, the more bothersome. You can't expect current
[Un]Zip to work on anything over about 2GB. My more recent tests used a
CD-ROM image, so about 650MB for the data file, which compressed down to
about 270MB, and I found that plenty bothersome.
As the AlpSta 200 internal SCSI is narrow, and my PWS test disk was
wide, I'd expect it to be worse for you.
These changes (mostly the "sqo") cut about 30% off the total time,
and eliminated those annoying three-second pauses when the CD-ROM image
file was extended (on an ODS2 disk with highwater marking enabled, of
course).
As before, the latest QREADCD.C source is available near
"http://www.antinode.org/dec/sw/qreadcd.html".
I claim that fab$v_sqo definitely deserves more publicity.