Have an application which has been generating files (approx. 62,000)
where the first 22 characters are always the same letters and numbers.
A year down the line it is taking ages just doing a simple OS command
like '$ dire'. You can forget something like '$ dire/before=today'.
1. How many files should you keep in a directory?
2. Why should you keep the .dir file (which contains my files) to
below 128 Blocks?
3. Does having the first 22 characters the same cause any problem to
OpenVMS?
4. Why does renaming rebuild the directory index?
5. Should you rename the file to same directory, a different
directory, or another disk?
6. How about a FAQ on the matter?
What are HP recommendations?
> 2. Why should you keep the .dir file (which contains my files) to
> below 128 Blocks?
It is/was a caching issue. Not sure it is still valid. Also, you have not
mentioned whether your disk is formatted as ODS-2 or ODS-5.
> 3. Does having the first 22 characters the same cause any problem to
> OpenVMS?
I *believe* it does. I remember the ALL-IN-1 folks mentioning how they had
tried very hard to have their "random" file names be totally random to help
with hashing techniques used to access a directory. But that was years ago,
this may have changed. Reversing your name to have ddmmyyyy instead of
yyyymmdd might give better hashing behaviour.
> 4. Why does renaming rebuild the directory index?
Directory files are sequential files. Any change requires not only that the
record be rewritten, but also all records after it (which is why, when you
do a mass delete, it is faster to delete from the last file in the directory
to the first: a whole lot less rewriting is needed).
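A toy model (plain Python, nothing VMS actually runs) makes the asymmetry concrete: treat the directory as a sequential list where deleting entry i forces every record after it to be rewritten. The real XQP only rewrites from the affected block onward, so the absolute numbers here are exaggerated, but the direction holds.

```python
# Toy model of a sequential directory file: deleting entry i forces a
# rewrite of every record after it, so total rewrites depend on order.

def rewrite_cost(n_entries, order):
    """Count record rewrites for deleting all entries in the given order.
    order: 'first_to_last' or 'last_to_first'."""
    cost = 0
    remaining = n_entries
    while remaining:
        if order == "first_to_last":
            cost += remaining - 1   # every record after the deleted one shifts up
        # last_to_first: the deleted entry is the final record, nothing shifts
        remaining -= 1
    return cost

print(rewrite_cost(1000, "first_to_last"))   # 499500 rewrites, ~n*(n-1)/2
print(rewrite_cost(1000, "last_to_first"))   # 0 rewrites in this model
```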
> 5. Should you rename the file to same directory, a different
> directory, or another disk?
Not sure. If you $RENAME RECIPE.TXT CHOCOLATE.TXT, does this involve two
updates to the directory (one from RECIPE.TXT to the end, the other from
CHOCOLATE.TXT to the end), or does it rewrite the file just once, adding
CHOCOLATE and removing RECIPE in the same pass?
> What are HP recommendations?
You want recommendations from VMS engineers, not HP.
You may wish to distribute files amongst multiple directories.
You can define 2 logicals:
$define READ_CHOCOLATE [dir1],[dir2],[dir3],[dir4],[dir5],[dir6]
$define WRITE_CHOCOLATE [dir3]
and change WRITE_CHOCOLATE at regular intervals to point to a different directory.
You create files in WRITE_CHOCOLATE: (which will distribute files amongst many
directories) and then use READ_CHOCOLATE logical to access existing files, and
being a searchlist, it will automatically locate your file.
This will help keep each directory file at a more manageable size.
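The searchlist behaviour can be sketched outside DCL (a toy Python model, not any VMS API; the directory names are made up): writes go to one rotating directory, and a read logical is an ordered list where the first directory containing the file wins.

```python
# Sketch of the READ_/WRITE_ searchlist idea (plain Python, no VMS API):
# write into one rotating directory, read via an ordered search list.

READ_LIST = ["dir1", "dir2", "dir3"]    # like READ_CHOCOLATE
write_dir = "dir3"                      # like WRITE_CHOCOLATE, rotated periodically

contents = {"dir1": {"a.dat"}, "dir2": {"b.dat"}, "dir3": set()}

def create(name):
    contents[write_dir].add(name)       # new files land in the write directory

def lookup(name):
    # Searchlist semantics: first directory in the list that has the file wins.
    for d in READ_LIST:
        if name in contents[d]:
            return d
    return None

create("c.dat")
print(lookup("b.dat"))   # dir2
print(lookup("c.dat"))   # dir3
```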
Can't remember in which version the caching issue was alleviated, but I
think it was 7.3, so you have it.
Suggest you use a directory a month, or whatever's easiest/appropriate
to keep the directory size within bounds.
>>3. Does having the first 22 characters the same cause any problem to
>>OpenVMS?
>
> I *believe* it does. I remember the ALL-IN-1 folks mentioning how they had
> tried very hard to have their "random" file names be totally random to help
> with hashing techniques used to access a directory. But that was years ago,
> this may have changed. Reversing your name to have ddmmyyyy instead of
> yyyymmdd might give better hashing behaviour.
No - this will make it worse, as I understand it - as per the next
answer, the file is sequential - edits nearer the end will incur less
overhead, so if you can append at the end, you'll do ok.
When you do clean up, use DFU (from the OpenVMS Freeware), and compress your directories.
Chris
> I am urgently seeking some information on how files are stored and
> indexed by OpenVMS V7.2-1
> Have an application which has been generating files (62,000 approx)
> where the first 22 characters are always the same letters and
> numbers, a year down the line it is taking ages just doing a simple OS
> command like '$ dire'. You can forget something like '$ dire/before=today'.
> 1. How many files should you keep in a directory?
However many you need to.
> 2. Why should you keep the .dir file (which contains my files) to
> below 128 Blocks?
see below
> 3. Does having the first 22 characters the same cause any problem to
> OpenVMS?
The XQP keeps a hint to the directory. This used to be a partial key
to each of (at most) 128 blocks in the directory. If you exceeded that,
directory lookup time went way up. There is no longer a 128-block
limit, but there is a maximum of 128 key hint pointers.
The key comes from the file name, so if the names are almost identical,
the scheme fails, as the keys all have the same value. So the XQP has to
do a full search of the .DIR file.
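The effect of identical leading characters on such a partial-key scheme can be illustrated with a toy model (assumed mechanics, not the actual XQP code): if the hint kept per block is just the first few characters of a name, identical prefixes give every block the same key, and the hint no longer narrows the search.

```python
# Toy model of a partial-key directory hint: one short key per block.
# Identical prefixes make all keys equal, so the hint narrows nothing.

KEY_LEN = 3  # characters of the name kept as the hint key (assumption)

def hint_keys(sorted_names, per_block=8):
    """First KEY_LEN chars of the first name in each 'block' of entries."""
    return [sorted_names[i][:KEY_LEN] for i in range(0, len(sorted_names), per_block)]

random_names = sorted(f"{i:04d}FILE.DAT" for i in range(64))
prefixed = sorted(f"SAMEPREFIXREPORT2004{i:04d}.DAT" for i in range(64))

print(len(set(hint_keys(random_names))))  # several distinct keys: hint is useful
print(len(set(hint_keys(prefixed))))      # 1 distinct key: full scan needed
```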
> 4. Why does renaming rebuild the directory index?
???
> 5. Should you rename the file to same directory, a different
> directory, or another disk?
You CAN'T rename to another disk. You have to COPY. The rest depends
on the exact details of what you are doing.
--
Paul Repacholi 1 Crescent Rd.,
+61 (08) 9257-1001 Kalamunda.
West Australia 6076
comp.os.vms,- The Older, Grumpier Slashdot
Raw, Cooked or Well-done, it's all half baked.
EPIC, The Architecture of the future, always has been, always will be.
Upgrade to V7.3-2, and your problems will vanish. Prior versions
only cached 128 blocks of the directory file, regardless of how big
it was.
--Stan Quayle
Quayle Consulting Inc.
----------
Stanley F. Quayle, P.E. N8SQ +1 614-868-1363
8572 North Spring Ct., Pickerington, OH 43147 USA
stan-at-stanq-dot-com http://www.stanq.com
Wasn't this 128-block limit lifted for some version of VMS or at least
some Alpha version of VMS?
>
> > 3. Does having the first 22 characters the same cause any problem to
> > OpenVMS?
>
> I *believe* it does. I remember the ALL-IN-1 folks mentioning how they had
> tried very hard to have their "random" file names be totally random to help
> with hashing techniques used to access a directory. But that was years ago,
> this may have changed. Reversing your name to have ddmmyyyy instead of
> yyyymmdd might give better hashing behaviour.
>
>
> > 4. Why does renaming rebuild the directory index?
>
> Directory files are sequential files. Any changes to it require not only the
> record be rewritten, but also all records after that one. (which is why when
Are you sure about that? I think that it only rewrites records in the
same directory block, *unless* that block overflows to the next one,
in which case you'd have to rewrite at least one more block.
> you do a mass delete, it is faster to delete the directory from last file to
> the first file since a whole lot less rewriting will be needed).
This is true.
[...]
The 127-block limit was removed before V7.3, perhaps as far back
as V7.1.
I suspect that other things, such as XFC, have also improved large
directory file performance.
--
Rob Brooks VMS Engineering -- I/O Exec Group brooks!cuebid.zko.dec.com
I'm not sure I really understand this 128 block limit. Back when
AUTOGEN was new I found ACP_DIRCACHE was 128 pages and ignored by
AUTOGEN even though I was getting tons of cache misses. I raised
it myself after looking at the sizes of directories and performance
and cache hit rates both went up.
Was I seeing the results of caching multiple directories, or did this
actually cause larger directories to be cached?
I would recommend increasing ACP_DIRCACHE to anyone who is seeing
low hit rates in "monitor file_system_cache"'s corresponding columns.
Which makes it interesting that a VMS DELETE *.*.* deletes files first to
last instead of last to first. I wonder if this will ever be fixed?
Bob Kaplow NAR # 18L TRA # "Impeach the TRA BoD"
>>> To reply, remove the TRABoD! <<<
Kaplow Klips & Baffle: http://nira-rocketry.org/LeadingEdge/Phantom4000.pdf
www.encompasserve.org/~kaplow_r/ www.nira-rocketry.org www.nar.org
We have awakened a sleeping giant and instilled in it a terrible
resolve. -- Admiral Isoroku Yamamoto, WWII.
STOP THAT!!!!
> where the first 22 characters are always the same letters and numbers,
So they can be eliminated, right?
> a year down the line it is taking ages just doing a simple OS command
> like '$ dire'. You can forget something like '$ dire/before=today'.
Two main reasons:
1. Directory caching is likely an issue (even the recent changes have
their limits).
2. To check for dates, the system must retrieve the (first) file header
for each file. That's a *LOT* of thrashing! (Read the directory, read
INDEXF, read the directory, read INDEXF, ...)
> 1. How many files should you keep in a directory?
Depends.
How long is the file name? (Determines how many directory blocks will be
needed to store pointers to all the versions.)
How many versions of each? (Each new version eats up a few longwords,
until the current directory record is full (max 512 bytes, records do not
span blocks), then a new one is started.)
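As a back-of-envelope (the entry layout used here, roughly 6 bytes of fixed header per record, the name padded to an even length, and 8 bytes per version for the version word plus file ID, is my recollection of the ODS-2 format and should be treated as an assumption; real records also cannot span blocks, which makes actual packing slightly worse):

```python
# Rough count of how many single-version entries fit in one 512-byte
# directory block, under an assumed ODS-2-style entry layout:
# ~6 bytes fixed header, name padded to even length, 8 bytes/version.

BLOCK = 512

def entry_bytes(name_len, versions=1):
    padded = name_len + (name_len % 2)      # names padded to an even length
    return 6 + padded + 8 * versions

def entries_per_block(name_len, versions=1):
    return BLOCK // entry_bytes(name_len, versions)

print(entries_per_block(10))   # short name like RECIPE.TXT (10 chars)
print(entries_per_block(40))   # ~40-char name: 22-char common prefix plus the rest
```

Longer names mean fewer entries per block, hence more blocks to scan for the same number of files.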
> 2. Why should you keep the .dir file (which contains
...pointers to...
> my files) to below 128 Blocks?
Depends on the OS version, but again, it's a caching issue. Also, .DIR
files are sequential, not indexed (ISAM), so they have to be searched
sequentially. That said, the .DIR is kept sorted in ascending order by
filename.ext, so some binary search is possible, though I don't know if
the system actually does that.
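For what it's worth, a sorted file does admit a binary search in principle; a toy in-memory sketch (not what the XQP actually does):

```python
# Because a directory is kept sorted by name, a lookup need not be a
# linear scan in principle: a binary search finds a name in O(log n)
# comparisons (toy in-memory sketch, not actual XQP behaviour).

import bisect

names = sorted(f"FILE{i:05d}.DAT" for i in range(62000))

def present(name):
    i = bisect.bisect_left(names, name)
    return i < len(names) and names[i] == name

print(present("FILE31415.DAT"))   # True
print(present("MISSING.DAT"))     # False
```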
> 3. Does having the first 22 characters the same cause any problem to
> OpenVMS?
Other than wasting CPU cycles and I/O bandwidth on meaningless information,
no.
> 4. Why does renaming rebuild the directory index?
Well, first off, there is no directory "index". In a way, the directory
*IS* the "index".
RENAME is little more than the combination of removing an entry for a
file and making a new one. That said, it's roughly equivalent to "half"
a delete and "half" a create. You could emulate it like so in DCL:
$ SET FILE/ENTER=new_filespec filespec
$ SET FILE/REMOVE filespec
> 5. Should you rename the file to same directory
Normal.
>, a different directory
Also, normal.
>, or another disk?
Not possible. You must COPY to another disk, then (optionally) delete
the original.
> 6. How about a FAQ on the matter?
Did you check the FAQ?
> What are HP recommendations?
See the online documentation.
--
David J Dachtera
dba DJE Systems
http://www.djesys.com/
Unofficial OpenVMS Hobbyist Support Page:
http://www.djesys.com/vms/support/
Unofficial Affordable OpenVMS Home Page:
http://www.djesys.com/vms/soho/
Unofficial OpenVMS-IA32 Home Page:
http://www.djesys.com/vms/ia32/
One screenful!
> 2. Why should you keep the .dir file (which contains my files) to
> below 128 Blocks?
> 3. Does having the first 22 characters the same cause any problem to
> OpenVMS?
Yes.
> 4. Why does renaming rebuild the directory index?
> 5. Should you rename the file to same directory, a different
> directory, or another disk?
> 6. How about a FAQ on the matter?
"VMS File System Internals" by Kirby McCoy
ISBN 1-55558-056-4
Digital Press 1990
As much as this book needs updating, we REALLY need updates to Ken Bates
"VAX I/O Subsystem: Optimizing Performance" [set in the days of the HSC50]
and Roy Davis "VAXCluster Principles" [pre Alpha].
Propose a solution (how-to) at the next Engineering Panel.
Note to Sue, Hoff, whoever...
How receptive are the publishers to new/updated book concepts?
Exactly my feeling. I've been using that rule of thumb for decades.
IIRC both DFU and backup/delete will do last file first. Get DFU if
you need to delete a lot of files and let VMS Engineering work on
select() and fork().
No, that's not how DFU does it. It's much cleverer than that,
and the speedup is orders of magnitude better than doing
last-file-first.
Basically DFU marks the directory file as no-directory, having
noted the FIDs of all files in the directory. Deletion of the
files by FID no longer has any of the directory updating overhead
after that.
Remember: DFU is your friend, and *every* system manager should
make use of such a good friend.
Well, I certainly qualify as "whoever" and I have heard the publishers
speak to this issue.
> How receptive are the publishers to new/updated book concepts?
Regarding suggestions for new publications I would classify the
Digital Press (a division of ...) folk as "polite".
For suggestions from a would-be _author_ I would classify them as
"enthusiastic". It is not the case that Digital Press has a vast
staff of authors waiting in the wings for something to write.
Volume type: ODS2
>Bob Koehler wrote:
>
>> In article <HTieN5...@eisner.encompasserve.org>,
> > kapl...@encompasserve.org.TRABoD (Bob Kaplow) writes:
>>
>>>Which makes it interesting that a VMS DELETE *.*.* deletes files first to
>>>last instead of last to first. I wonder if this will ever be fixed?
>>
>>
>> IIRC both DFU and backup/delete will do last file first. Get DFU if
>> you need to delete a lot of files and let VMS Engineering work on
>> select() and fork().
>
>No, that's not how DFU does it. It's much cleverer than that,
>and the speedup is orders of magnitude better than doing
>last-file-first.
To the PP, I've never noticed backup/delete operating in any other way than
the normal directory-tree, alphabetical order that it uses to backup the
files, to be honest.
>Basically DFU marks the directory file as no-directory, having
>noted the FIDs of all files in the directory. Deletion of the
>files by FID no longer has any of the directory updating overhead
>after that.
I guess it *might* note the FIDs, but as long as it prevents deletion of the
(now non-) directory file, it could of course simply scan that, deleting by
FID as it goes. That technique could get a little messy within a tree, I
guess.
>Remember: DFU is your friend, and *every* system manager should
>make use of such a good friend.
I couldn't agree more. I just feel sorry for those VMS system managers who
have to work under "no unsupported or third-party software" edicts and do
without it.
--
Your call will be answered in the order it was ignored.
Mail john rather than nospam...
Out of curiosity, when a file has multiple entries in different directories,
at what level is there a marker to define how many entries exist for that
file? Is it in INDEXF.SYS or in the actual file header itself?
Does DFU manually handle those cases in its own code, or does it call a system
service that either deletes a file or decrements the entry count ?
None.
John Briggs
There is no such count kept.
The only indication of aliasing is the directory back link, but that's
just one link in the header.
Only the leading characters are used. No real hashing, just a range index
into the sorted directory. Right now, the minimum hash length is 3
characters; the maximum is 15 characters. There is only one block of
directory index cache (512 bytes) for a directory of any size. The more
entries needed, the lower the number of leading characters used.
The slowdown on larger directories (when using the XQP, not RMS) is thus
because:
- The directory index cache entries are not large enough to map a small
range of file names. Past a few hundred blocks, the index maps only 3
characters of the file name. So when you need it most, you have it the
least :-(.
- The number of available ACP_DIRCACHE blocks is too small for a given
index range. The blocks must be reused frequently in order to read the
blocks to know where to put the next entry.
Hein.
> Have an application which has been generating files (62,000 approx)
> where the first 22 characters are always the same letters and numbers,
> a year down the line it taking ages just doing a simple OS command
> like '$ dire'. You can forget something like '$ dire/before=today'.
>
> 3. Does having the first 22 characters the same cause any problem to
OpenVMS?
- Yes, it disables any and all starting-block lookup by the XQP, forcing a
linear read.
- Yes, it causes directory blocks to fill up prematurely, requiring splits
and condenses much (2x?) more often.
- Yes, it will be hard to see the 'real name' in the crud.
If speed is critical, please consider removing all redundant data from file
names.
Change files like:
[ORDERS]MYSOOPERDOOPERSTORE20041111092304.DAT
To
[ORDERS_MY.200411]11092304.D;
[Yes, the D is likely to be redundant also, or at least can become part of
the directory naming.]
[I suppose that applications which are really desperate for directory
efficiency, and can trust their programs and operators, could use file
versions as the final name piece: [200411]1109.D;2304
File versions just occupy a 16-bit word for the version, and the regular 6
bytes for the file ID. And those are at no extra cost, as they would have
been part of a normal entry anyway.
Thus in the crazy example each day+hour would have just one directory entry
and each minute+second would become a specific version. Or the version could
be an order number or whatever other convoluted scheme you can dream up.]
Cheers,
Hein.
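The version-number trick can be put in rough numbers (a toy size comparison assuming roughly 6 bytes of record header per entry, the name padded to an even length, and 8 bytes per version for the version word plus file ID; records cannot span blocks, which this sketch ignores):

```python
# Rough directory-size comparison: one long name per file versus one
# short name per day+hour with many versions packed under it.
# Layout assumptions: ~6-byte record header, name padded to even
# length, 8 bytes (version word + file ID) per version.

def dir_bytes(n_files, name_len, versions_per_entry):
    padded = name_len + (name_len % 2)
    entries = -(-n_files // versions_per_entry)   # ceiling division
    return entries * (6 + padded) + n_files * 8

conventional = dir_bytes(32000, 40, 1)    # one ~40-char name per file
packed = dir_bytes(32000, 6, 3600)        # one 6-char name per day+hour

print(conventional // 512)   # 3375 directory blocks, conventional naming
print(packed // 512)         # 500 directory blocks, versions as final piece
```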
Neat. So one could have a directory with some 32,000 files but with a very
small directory file, since it would contain essentially one logical record
(possibly split into a couple of physical records).
What is needed now is some flag in a directory file to prevent the use of the
Purge command.
Then, you could have some applications such as NNTP/news servers that create
large amounts of small files run extremely fast on VMS.
Yeah. If one were to seriously consider that though, then I would probably
design towards avoiding/minimizing splitting those records over directory
blocks. That would mean roughly 60 files per name = per block.
If I were really serious about directory performance I might choose NOT
to use a directory. Just create the files with 'tmd' and use an indexed file
to maintain name-to-FID info, or use such an indexed file in addition to the
regular directory.
> What is needed now is some flag in a directory file to prevent the use of
the Purge command.
:-)
That's precisely why I added "can trust their programs and operators" in my
original suggestion
Hein.
I think you still end up with 8 bytes per file: a 6-byte file ID plus a
2-byte version number. So you get a maximum of around 30 files per
block. By comparison, a typical file name will give you somewhere in
the area of 7 files per block.
A factor of four to one is nothing to sneeze at, to be sure. But it's
not going to make a 32,000-file directory useful.
> Then, you could have some applications such as NNTP/news servers that create
> large amounts of small files run extremely fast on VMS.
If you want performance out of NNTP, you want to use container files,
not VMS directories.
John Briggs
OK, so 32K files would still represent some 500 blocks in the directory
file. Not quite fully cached.
> If I was really realy serious about directory performance I might choose NOT
> to use a directory. Just create the files with 'tmd' and use an indexed file
> to maintain 'names' to fid info, or use such indexed file in addition to the
> regular directory.
Problem with this is that there are issues with system management, since you
have so many files that are "unmanaged" by DCL utilities such as DIR, and if
someone does an ANA/DISK/REPAIR, it will add all those files to SYSLOST (or
is it SYS$LOST?).
ALL-IN-1 used 9-character file names (+4 for the extension), as well as
indexed files to map the document names to the obscure VMS file names. It
also split files amongst a number of directories to spread the load.
In the current implementation aliases are distinct from original
entries. Original entries can be determined by the content of the
file's header.
For COE links things may be different.
To do otherwise would be difficult in the face of the ODS-2 philosophy
of atomic disk updates. "Rebuilding" an ODS-2 disk after a crash can
be deferred, since the only inaccuracy is some missing free space.
By using DFU to delete a directory tree and aborting it midway with
^Y, you can see many "no such file" entries. This means the file
headers of these files have been recycled for new files while the
corresponding .DIR;1 files are still functional. When all the FIDs
listed in a directory have been marked for reuse, the .DIR;1 file is
deleted. This avoids the need for continually updating the .DIR;1 file,
at the cost of not being able to abort cleanly.
I should think that goes without saying.
The idea was to look for opportunities to update the existing library,
then perhaps extend it as VMS has advanced beyond the existing
publications.
The "Who" would be me and others like me (actually, I would hope for
folks better than me, but we're all so busy these days...).
I'd actually like to collaborate with Steve Schweda on a "ZIP and UNZIP
for VMS" book; but, given his opinion of me, that's not very likely.
I'd like to collaborate with Hein on an RMS book.
If there were a TPU guru around, I'd like to collaborate with that
person on an "Applied TPU" book, complete with a CD containing a
complete emulation of EDT (when you press CTRL+K, it responds exactly
the way EDT does - no EVE/TPU syntax).
I'd like to collaborate with Guy on a DCL book.
I'd like to collaborate with Jeff and Saul on a DCSC / SLS book focusing
on implementing the STK L700e and other tape libraries (that SLS was
never intended to support).
Then, of course, there are all the new features like PIPE, support for
extending volume sizes, etc., that need a straightforward explanation,
with examples.
So much to do, so little time before Itanic heads to Davy Jones's locker
and VMS's time runs out.
D'ya ever have to deal with All-in-1?
[SYSLOST]
I thought directory access was via the XQP, not RMS. Global buffers
only affect RMS access.
Regards,
Dave
--
David B Sneddon (dbs) VMS Systems Programmer dbsn...@bigpond.com
Sneddo's quick guide ... http://www.users.bigpond.com/dbsneddon/
DBS freeware http://www.users.bigpond.com/dbsneddon/software.htm
I believe that it's a mix. Both components have the ability to do
directory lookup by looking into .DIR file contents and both components
have the ability to cache relevant information.
That said, I have a hard time believing that RMS level directory access
is performed in such a way that a "global buffers" setting would be
respected.
John Briggs
IIRC only RMS knows how to use directory files. At the XQP level
you must know the DID to open a file by name.
> D'ya ever have to deal with All-in-1?
No. Never used it, never missed it.
And then the XQP reads from the pointed-to directory file, looking
for file names and determining FIDs.
I think what you're trying to say is that only RMS knows how
to deal with file names in the "[dir1.dir2.dir3]filename.dat;123" syntax
that the end users are familiar with.
Unfortunately, that statement does not bear on the question of
which component(s) have the ability to do directory lookups in
.DIR files.
John Briggs
You're lucky. It was a MOTHER! ...but it did some useful things.
All I heard was hog-in-1, due to its large memory requirements
(for the time), about 1 MB per user. Then DECwindows shipped and
our 4MB VAXstation 2000 couldn't handle one user, so we got the 12MB
upgrade.
I still remember doing Fortran IV-Plus applications running VMS 1.x
on an 11/780 with 1/2 MB, the minimum DEC recommended just to boot.
And compiling source files from labeled tape because I didn't have
enough disk space for sources and objects.