
MAIL$nnnnnnnnnnnnnnnn.MAI

Eschew Obfuscation

Jan 2, 1991, 4:14:41 PM
Simon Szeto, sz...@star.enet.dec.com indicates:

>Unfortunately, the 16 digits in the MAIL$ssn.MAI file name do not represent
>the system time. The system time is shifted over by 16 bits. As a result,
>the "V5" filenames have a period 2**16 times 7 minutes (approx.) or about
>325 days. (The most recent rollover occurred on 20-Nov-1990.) This is
>practically no better than the "V4" name format. If it's any consolation,
>it's no worse.
>
>VMS Engineering is now aware of this embarrassing blunder.

Geez, these DEC guys are really hard on DEC aren't they?

The information I have about the reason for spreading the filenames out is to
help with the file search algorithm when trying to find a file in a directory.
Finding a file in a directory is 'essentially' a binary search based mostly on
the high order characters in the name (binary search on disk, but I believe it
is linear through the file system caches in memory). Having filenames
'scattered' throughout the alphabet is good for the on-disk search process.

BTW, note however that if the directory file 'x.DIR;1' exceeds 127 blocks,
then the search process becomes strictly uncached, and I think the search
process is also linear, not binary. The size of the directory file depends
on the number of files in the directory and the length of the filenames
(i.e., the number of characters in the filename and extension, not the size
of the files themselves).

In the real life example here, I guess there is poor scattering, although 325
days isn't as bad as no scattering. I don't profess to be a typical user, but
I have numerous mail files going back several years. Scattering these
filenames with granularity of 325 days on the 6th character (VMS V5.2+) for
sure is better than the pre-v5.2 method (but only marginally). Also, note that
it's too late for pre-existing messages; the name is derived from the system
time when the external message file is created.
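
The "7 minutes" and "325 days" figures quoted above can be checked with a few
lines of arithmetic. This is only a sketch: it assumes the VMS system time is
a 64-bit count of 100-nanosecond units, and it takes from the quoted post that
the name effectively repeats after 2**16 intervals of roughly 7 minutes, i.e.
after 2**48 ticks. The function name is made up for illustration.

```python
TICK_SECONDS = 100e-9  # assumption: one system-time unit is 100 nanoseconds


def period_seconds(bits):
    """Time for a counter of 100 ns ticks to step through 2**bits values."""
    return (2 ** bits) * TICK_SECONDS


# The low longword (32 bits) wraps in roughly seven minutes.
low_longword_minutes = period_seconds(32) / 60

# 2**16 of those intervals is the rollover period quoted in the post.
rollover_days = period_seconds(48) / 86400

print(f"{low_longword_minutes:.2f} minutes, {rollover_days:.1f} days")
```

The second figure is the scattering granularity discussed above: names created
within one such period differ only in the lower-order digits.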

The bigger performance problem in my opinion is that MAIL.MAI, which is
searched for every time you enter mail, is towards the end of the collating
sequence. This doesn't make much difference with a binary search, but is a
killer in a sequential search. Every time you enter the mail utility, it has
to find and open MAIL.MAI before you get the 'MAIL>' prompt. Even making the
names 'MAIL_nnn.MAI' instead of 'MAIL$nnn.MAI' would improve matters ('$'
collates before letters of the alphabet, '_' comes after). My MAIL.DIR file
is currently 254/280 disk blocks in size, containing 1457 MAIL$nnnnnn.MAI
files. When I type '$ MAIL', it takes about 6 seconds before I get the
'MAIL>' prompt (on a 16 Meg VAXstation 2000, yucko).
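
The collating argument can be illustrated with a small sketch. It assumes the
directory is ordered like a plain ASCII sort of the full "name.type" string,
which matches the description above but is not taken from any VMS source; the
16-digit names are generated counters, not real MAIL$ filenames, and the
function name is hypothetical.

```python
def position_of_mailbox(prefix, count=1457):
    """Return (index, total) of MAIL.MAI among `count` generated names."""
    names = [f"{prefix}{n:016d}.MAI" for n in range(count)]
    names.append("MAIL.MAI")
    names.sort()  # assumption: plain ASCII ordering of the whole string
    return names.index("MAIL.MAI"), len(names)


# '$' sorts before '.', so every MAIL$ name precedes MAIL.MAI.
print(position_of_mailbox("MAIL$"))

# '_' sorts after '.', so MAIL.MAI would come first instead.
print(position_of_mailbox("MAIL_"))
```

With the '$' prefix a sequential scan has to pass every message file before it
reaches MAIL.MAI; with '_' it would find MAIL.MAI immediately.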

IMHO a better way to do it would have been (1) to make the 'randomized'
filenames shorter so that more names fit into a given size of directory
file and (2) to choose the names such that "MAIL.MAI" is no further than
midway into the binary search of filenames (Best: exactly at the middle for
binary search, at the beginning for linear search).

____________________________________________________________________________
Harvey Brydon | SLB DECnet: ASL::BRYDON
Schlumberger Anadrill | Internet: bry...@asl.sinet.slb.com
200 Macco Blvd | X.400: %RMS-E-RTB, Data Exceeds 65536 byte buffer
Sugar Land, TX | FAX: (713)240-6546
77478 USA | P.O.T.S.: (713)274-8000 x8281 or 274-8281 (d.i.d.)
"Live free or die" - on license plates made by convicts

Simon Szeto

Jan 3, 1991, 8:17:39 AM
In a previous posting I wrote:
>I looked at the code. I infer that the value 4 or 5 indicates a "version
>number." Although this is not necessarily identical to the VMS version
>number, the external mail files were introduced with VMS V4.0, and the code
>for swapping the longwords was, judging from the change history, in fact
>made for V5.0. For some reason unknown to me, the one line of code that
>actually caused the "V5" filename format to be produced was not inserted
>until V5.2.

I was reminded that V5.0 through V5.1-1 supported a rolling upgrade from
V4.7. A cluster that included at least one V4.7 node couldn't have handled
the V5 filename format. With V5.2 and later, it was safe to generate the
V5 format filenames.

Simon Szeto (Internet: sz...@star.enet.dec.com)
International Systems Engineering
Digital Equipment Corporation
Nashua, New Hampshire, USA

Jerry Leichter

Jan 3, 1991, 10:39:00 PM

[Miscellaneous comments on why MAIL's use of MAIL$nnn files that
are in alphabetical order is a problem.]

There's another issue here: If MAIL only CREATED such files, then creating
them in alphabetical order would be neither better nor worse than creating
them in any other order. But MAIL also DELETES most of these files, and
that's where the problem comes from.

A directory consists of a series of blocks, each containing a number of
entries. The entries within a block are sorted, and the blocks themselves
are in sorted order (i.e., if block A precedes block B in the directory, then
every entry in block A precedes every entry in block B in sorted order).
Each block may also have some free space. When an entry needs to be added,
if there is room in the appropriate block, it is just stuck in; if not, the
file is extended and entries "slide up" to make room. (I don't recall
whether entire blocks are slid up, leaving an empty new block, or whether
the entries are split.)

When entries are deleted, the space they used is added to the free space in
their block. However, blocks are never reclaimed - even if all the entries
in a block are deleted, there is no mechanism to "slide down" subsequent
entries.

Now consider MAIL's files. They are created in sorted order, so always get
added at or near the end of the directory. Great: No need to slide entries
over to make room.

If ALL the new files are quickly deleted, the result is a bunch of free space
at the end of the directory, just where more files will go. Again, great -
that space will be re-used quickly.

In practice, however, SOME MAIL files get kept around. Once a "retained"
MAIL file entry ends up anywhere but in the last block of the directory -
that is, as soon as enough new entries are created after it to require another
block - that entry will effectively block off a range of empty space: It
will no longer be available at the end of the directory, which is where all
MAIL file creations take place.

The result is that a MAIL directory keeps growing: It's internally fragmented
and while full of empty space, often has no space at exactly the point where
it will be needed.
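
A toy simulation of the growth described above, as a rough sketch only. It
models just what this post describes: names are created in increasing order
and land in the last block, deleting an entry frees space in its block, and
blocks are never reclaimed. The block capacity, the message lifetime, and the
"keep every Nth message" rule are invented numbers, and a real block holds
variable-length records rather than a fixed count of entries.

```python
def simulate(n_created=1000, capacity=10, lifetime=50, keep_every=10):
    """Return (blocks_in_use, live_entries, blocks_if_rebuilt)."""
    blocks = [[]]   # each block holds entry numbers in creation order
    where = {}      # entry number -> index of the block holding it
    for n in range(n_created):
        # New names sort last, so they go in the final block or a new one.
        if len(blocks[-1]) >= capacity:
            blocks.append([])
        blocks[-1].append(n)
        where[n] = len(blocks) - 1
        # An older message expires now unless it is one of the retained ones.
        old = n - lifetime
        if old >= 0 and old % keep_every != 0:
            blocks[where.pop(old)].remove(old)
    live = sum(len(block) for block in blocks)
    rebuilt = -(-live // capacity)  # ceiling division
    return len(blocks), live, rebuilt


print(simulate())
```

Under these made-up parameters the directory ends up several times larger than
a freshly rebuilt one holding the same entries, because each retained entry
pins an otherwise empty block.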

Once the directory grows beyond the magic limit of 127 blocks, RMS will no
longer be able to cache it. Then access to it will become very slow.

The fix is simple: Create a new directory, rename everything in the MAIL
directory to the new one, then get rid of the old MAIL directory and rename
the new one to be the new MAIL directory. The directory created as a result
of this operation will be as small as it can be.

The following is an extract from a command file I run as a batch job on a
weekly basis to clean up my mail. My WASTEBASKET folder is DELETED - I keep
things in there for 30 days. Stuff in my DEAD folder is purged every week.
My mail subdirectory is [.MAIL].

This procedure can fail if mail arrives while it is running - a surrounding
command procedure tells me (by mail) if this happens. (Since newly-arrived
mail is created with no DELETE access to OWNER, the attempt to rename it to
the [.WEEKLY_TEMP] directory will fail.) If this is the cause, just running
the procedure again will fix the problem (though you might want to have it
skip the PURGE/RECLAIM). The complexity - the special handling of MAIL.MAI,
the business with [.WEEKLY_TEMP1] - is all there to minimize the time in
which the mail configuration is "screwed up".

-- Jerry

$ verify = 'f$verify(1)'
$ write SYS$ERROR "WEEKLY.COM running"
$ set default SYS$LOGIN
$!
$! Clean up mail - purge all deleted messages that are more than 1 month old;
$! leave the others in DELETED
$!
$ on error then exit 40 !If we fail, things blow up. Get help!
$ mail
select deleted/since:today-31- !Stuff less than one month old
file/all/noconfirm __WEEKLY_SAVE__
$ mail
select MAIL
1
copy/noconfirm DEAD !Make sure DEAD exists
copy/noconfirm __WEEKLY_SAVE__ !__WEEKLY_SAVE__, too
select DEAD
delete/all
$ mail
purge/stat/reclaim
compress
select __WEEKLY_SAVE__
delete/all
$ create/dir/prot=(S:RWE,O:RWED,G,W) [.WEEKLY_TEMP]
$ set file/prot:O:REWD [.MAIL]*.*;*
$ rename/nolog [.MAIL]*.*;*/exclude=mail.mai;* [.WEEKLY_TEMP]
$ set file/prot:O:REWD mail.dir
$ rename mail.dir weekly_temp1.dir;1
$ rename weekly_temp.dir mail.dir;1
$ rename [.weekly_temp1]*.*;* [.mail]
$ set file/prot:O:REW [.MAIL]*.*;*/exclude:mail.old;*
$ set file/prot=O:REWD weekly_temp1.dir
$ delete weekly_temp1.dir;
$ mail/subj:"Check and delete MAIL.OLD" _NL: 'f$getjpi("","USERNAME")
$ verify = f$verify(verify)

David L. Cathey

Jan 7, 1991, 6:05:12 PM
In article <910104043...@BULLDOG.CS.YALE.EDU>, leic...@LRW.COM (Jerry Leichter) writes:
> [Miscellaneous comments on why MAIL's use of MAIL$nnn files that
> are in alphabetical order is a problem.]
>
> A directory consists of a series of blocks, each containing a number of
> entries. The entries within a block are sorted, and the blocks themselves
> are in sorted order (i.e., if block A precedes block B in the directory, then
> every entry in block A precedes every entry in block B in sorted order).
> Each block may also have some free space. When an entry needs to be added,
> if there is room in the appropriate block, it is just stuck in; if not, the
> file is extended and entries "slide up" to make room. (I don't recall
> whether entire blocks are slid up, leaving an empty new block, or whether
> the entries are split.)
>
> When entries are deleted, the space they used is added to the free space in
> their block. However, blocks are never reclaimed - even if all the entries
> in a block are deleted, there is no mechanism to "slide down" subsequent
> entries.
>
I looked into this once. The records in the directory blocks do not
span block boundaries. This ensures that the very beginning of every block
is the first byte of some record.

Some parts of his comment are true. By experimentation, I found
out the following:

1) A record is added to a block by inserting into the existing records,
moving all records in the block if required to maintain alpha order.

2) If a block fills up, all blocks following the current block are
"moved down" a block. The current block is split (about in half) between the
current block and the subsequent block.

3) As files are deleted, blocks are not collapsed unless ALL records in
the block are deleted, then all subsequent blocks are "moved up" a block.
The only empty blocks in a directory are at the end, and the EOF is moved back
a block (to help in file searches). The file allocation is not changed.
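
The three observations above can be sketched as a small model. This is an
illustration under simplifying assumptions, not the file system's actual
code: a block is a Python list with a fixed entry count rather than 512 bytes
of variable-length records, and the class and method names are invented.

```python
import bisect


class Directory:
    def __init__(self, capacity=4):
        self.capacity = capacity   # hypothetical entries-per-block limit
        self.blocks = [[]]         # blocks in sorted order, each sorted

    def _find_block(self, name):
        # First block whose last entry is >= name, otherwise the last block.
        for i, block in enumerate(self.blocks):
            if block and name <= block[-1]:
                return i
        return len(self.blocks) - 1

    def insert(self, name):
        i = self._find_block(name)
        block = self.blocks[i]
        bisect.insort(block, name)            # (1) keep alpha order in block
        if len(block) > self.capacity:        # (2) full: split about in half
            half = len(block) // 2
            self.blocks[i:i + 1] = [block[:half], block[half:]]

    def delete(self, name):
        for i, block in enumerate(self.blocks):
            if name in block:
                block.remove(name)
                # (3) collapse only when the whole block has been emptied
                if not block and len(self.blocks) > 1:
                    del self.blocks[i]
                return
        raise KeyError(name)


d = Directory(capacity=4)
for name in "ABCDE":
    d.insert(name)
print(d.blocks)   # the fifth insert forces a split
d.delete("A")
d.delete("B")
print(d.blocks)   # the emptied first block is removed
```

Both the split in insert() and the collapse in delete() shift every later
block, which is the overhead mentioned in this post.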

I believe I have seen it "reclaim" the free space. Please note that
the block splits and collapses can be rather expensive in overhead. I'm sure
part of the changes to mail are to reduce the occurrences of this, as well as
speed file searching.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
David L. Cathey |INET: dav...@montagar.lonestar.org
Don't blame me! I voted for Bill and |UUCP: ...!texsun!montagar!davidc
Opus for President! Ack! Thhrrptt! |Fone: (214)-618-2117
