
Which delete statement is faster?


Mike Minor

Oct 16, 2007, 2:07:14 PM
Is there any difference in the speed at which the command is executed in the
following examples?

del a*.*;*
del a*.txt;*
del a*.txt;1

Thank you for your help

Mike


Bob Gezelter

Oct 16, 2007, 2:28:35 PM

Mike,

The answer depends on what else is in the directory. In the degenerate
case (only a.txt;1 is present), the differences are minuscule. If
there are thousands of files in the directory, the answer is
different.

There is also a semantic problem. Specifying ";1" will not have the
desired result if a command procedure is interrupted, leaving the file
behind. Specifying the wildcard (e.g., ";*") deletes ALL versions.
(Specifying ";0" deletes only the highest-numbered version.)

In general, I advise my clients to exercise EXTREME CAUTION with this
type of coding, as it is quite easy to create mayhem if two different
processes are executing the code in the same directory.

- Bob Gezelter, http://www.rlgsc.com

bri...@encompasserve.org

Oct 16, 2007, 2:44:37 PM
In article <13h9vel...@corp.supernews.com>, "Mike Minor" <mmin...@earthlink.net> writes:
> Is there any difference in the speed at which the command is executed in the
> following examples?
>
> del a*.*;*
> del a*.txt;*
> del a*.txt;1

All things being equal (i.e. the only files in the directory that
match the a*.*;* wildcard also match a*.txt;1), I'd expect no significant
performance difference.

The real work is going to be the disk I/O writing directory contents
back to disk. Reading directory entries into cache and parsing
and searching directory entries from cache is unlikely to be the
bottleneck.

Why do you ask?

Historically, the thing that absolutely kills delete performance is
the "bubble down" that can take place if you delete the last directory
entry in a block near the front end of a _HUGE_ .DIR file.

Various tweaks over the years have improved this behavior by orders
of magnitude. If it's still an issue for you, a reverse-alphabetical-order
delete is one thing that can sometimes be of use.

JF Mezei

Oct 16, 2007, 3:20:53 PM
bri...@encompasserve.org wrote:
> In article <13h9vel...@corp.supernews.com>, "Mike Minor" <mmin...@earthlink.net> writes:
>> Is there any difference in the speed at which the command is executed in the
>> following examples?
>>
>> del a*.*;*
>> del a*.txt;*
>> del a*.txt;1

I think that a non-wildcarded delete will be the fastest, since it can do
a direct lookup (but that is not among your examples).

In the above cases, the delete command still has to scan sequentially
through all files in the directory beginning with "a" and then see whether
each matches the mask. Obviously, the more "*" you have in a mask, the more
CPU is needed to decide whether a full file spec matches the wildcard,
but unless you are running an almighty MicroVAX II, you might not see
any difference, since the delete command will spend most of its time in I/O,
and the CPU time needed to check a string against a wildcard
specification is fairly trivial.

Mike Minor

Oct 16, 2007, 3:23:26 PM
I have a directory with 200,000+ files, all in the a*.txt;1 range. I need to
ftp these files to another server. After sending 30,000+ files via FTP I
realized the magnitude of the ftp process and interrupted it. I want to
delete the 30,000+ files already ftp'ed before going back and looking at
continuing the ftp process in a different manner, and it just seems to be
taking an extremely long time to perform the delete. I think the hang-up is
the re-write of the directory contents back to disk after a few files are
deleted. I did the delete with a /log to watch how long it took to delete a
file. I noticed a pause of a few seconds after it listed 15 to 20 files that
were deleted.


Thank you for your help....

Mike

<bri...@encompasserve.org> wrote in message
news:iO4j52...@eisner.encompasserve.org...

bri...@encompasserve.org

Oct 16, 2007, 3:38:13 PM
In article <13ha3ti...@corp.supernews.com>, "Mike Minor" <mmin...@earthlink.net> writes:
> I have a directory with 200000+files, all in the a*.txt;1 range. I need to
> ftp these files to another server. After sending 30,000+ files via FTP I
> realized the magnatude of the ftp process, and interrupted it. I want to
> delete the 30,000+ file already ftp'ed before going back and looking at
> continueing the ftp process in a different manner and it just seems to be
> taking an extremely long time to perform the delete. I think the hang up is
> the re-write of the directory contents back to disk after a few files are
> deleted. I did the delete with a /log to watch how long it took to delete a
> file. I noticed a pause of a few seconds after it listed 15 to 20 files that
> were deleted.....

Top posting. *sigh*.

In any case, you've hit the "bubble down" problem that I alluded to.
And, because you want to delete the first 30,000 files in a 200,000
file directory, reverse alphabetical order isn't going to do
much for you.

Hmmmm...

There _is_ a sneaky approach.

How about if instead of

$ delete a*.txt;1

you

$ rename a*.txt;1 *.*;2 /log

That should run pretty fast because the directory entries can be
modified in place. There won't be any "bubble down".

You can press control-Y when you come to the 30,000th file.

Then you can go back in and ftp all of your remaining version 1 files.

ftp> mput a*.txt;1

When you're finished you can do a reverse alphabetical order delete
on the whole directory.

Jan-Erik Söderholm

Oct 16, 2007, 3:45:39 PM

Note also that if you'd like to delete *all* files in the
directory, the DELETE option in DFU is (claimed to be) much
faster than DCL DELETE.

Mike Minor

Oct 16, 2007, 3:50:01 PM

<bri...@encompasserve.org> wrote in message
news:HmX+K+...@eisner.encompasserve.org...

Sorry for the top posting... the group I spend most of my time in prefers
it that way.

Anyway,

BRILLIANT! The rename makes a lot of sense and I will give that a try.

Thanks!

Mike


Hein RMS van den Heuvel

Oct 16, 2007, 3:58:14 PM
On Oct 16, 3:23 pm, "Mike Minor" <mminor...@earthlink.net> wrote:
> I have a directory with 200000+files, all in the a*.txt;1 range. I need to
> ftp these files to another server. After sending 30,000+ files via FTP I
> realized the magnatude of the ftp process, and interrupted it. I want to
> delete the 30,000+ file already ftp'ed before going back and looking at

So those files are 'early on' in the directory.

'In place' renaming to ;2 for easy exclusion with FTP is not a bad
thought!

What OpenVMS version?

The problem is somewhat similar to one discussed in:

http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=1165943

You may want to check out my rename suggestion there.
It does a double rename to allow the system to always take from and
add to the end.

This avoids the expensive 1-block shuffle up to make room when
inserting an 'early' file, and the
equally expensive shuffle down when the last entry from VBN 1 is
removed.

You may want to adjust it to pre-establish 5K-25K chunks of files
to be dealt with.

Cheers,
Hein.


Bob Gezelter

Oct 16, 2007, 4:39:40 PM

Mike,

Richard B. Gilbert

Oct 16, 2007, 4:45:41 PM
Mike Minor wrote:
> I have a directory with 200000+files, all in the a*.txt;1 range. I need to
> ftp these files to another server. After sending 30,000+ files via FTP I
> realized the magnatude of the ftp process, and interrupted it. I want to
> delete the 30,000+ file already ftp'ed before going back and looking at
> continueing the ftp process in a different manner and it just seems to be
> taking an extremely long time to perform the delete. I think the hang up is
> the re-write of the directory contents back to disk after a few files are
> deleted. I did the delete with a /log to watch how long it took to delete a
> file. I noticed a pause of a few seconds after it listed 15 to 20 files that
> were deleted.....
>
>
> Thank you for your help....
>
> Mike

200,000+ files in one directory is so ridiculous I can scarcely imagine
anyone doing it. You are now finding out just one of the reasons why
it's ridiculous.

Can you just INIT the disk and start over? Or back up everything else,
INIT and restore?

The last time I had to cope with something like this was eight or nine
years ago at McGraw-Hill. Some woman had gone on maternity leave. She
had some sort of self-resubmitting job that just went right on creating
these files.... ISTR that, by the time we discovered it, the directory
was well over 2000 blocks in size! It took DAYS to delete all those files!


Jan-Erik Söderholm

Oct 16, 2007, 5:29:04 PM

VMS's handling of .DIR files is much better today than 8-9 years ago,
if I'm not wrong. And, as I wrote in another post, DFU has some tools,
such as bulk erase of a dir or dir-tree and compress/defrag of .DIR
files, to make the "cleaning job" easier. Re-init of the disk should
not be needed on a reasonably new VMS version with reasonably
modern disk subsystems.

Ron Johnson

Oct 16, 2007, 5:58:06 PM

Reverse sorting makes DELETE run oodles faster.

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day.
Hit him with a fish, and he goes away for good!

David J Dachtera

Oct 16, 2007, 8:32:37 PM

...; however, be VERY CAREFUL which DFU version you use.

V2.7-1 can't handle big trees. It will crash and leave a mess to clean up,
although ANALYZE/DISK/REPAIR handles it nicely.

--
David J Dachtera
dba DJE Systems
http://www.djesys.com/

Unofficial OpenVMS Marketing Home Page
http://www.djesys.com/vms/market/

Unofficial Affordable OpenVMS Home Page:
http://www.djesys.com/vms/soho/

Unofficial OpenVMS-IA32 Home Page:
http://www.djesys.com/vms/ia32/

Unofficial OpenVMS Hobbyist Support Page:
http://www.djesys.com/vms/support/

David J Dachtera

Oct 16, 2007, 8:34:19 PM

Really??!! Which group is that? Even the UN*X groups soundly thrash people for
top posting.

David J Dachtera

Oct 16, 2007, 8:59:51 PM
Mike Minor wrote:
>
> Is there any difference in the speed at which the command is executed in the
> following examples?
>
> del a*.*;*
> del a*.txt;*
> del a*.txt;1

As I think you've discovered, the answer is, "it depends". Lots of ;1 files
means lots of directory entries - not necessarily a good or bad thing, just
something to consider, in light of what it takes to delete a single version file
at the beginning of a large directory.

An unusual aspect of VMS directories is that a single directory entry can represent
multiple versions of a "name.ext". Pull a small directory into EDT sometime
(*PLEASE* use /READ!!!) and check it out.

Each directory record begins with a length attribute (but you won't see that
because EDT does RECORD I/O!), then a version limit, some binary fields before
the "name.ext", then the version numbers and FIDs of the various versions out to
the end of the record.

I don't have the code at hand just now, but I wrote a FILCNT.COM to count the
number of files in a directory simply by reading the directory. It never hits
the file headers, and so is a bit faster than trying to use the DIRECTORY
command using only the default qualifiers (/HEADING, /TRAILING). It does only
one READ for each directory entry, then calculates the number of versions
represented by the entry.

The code *IS* in http://www.djesys.com/freeware/vms/4038_freeware.zip

I went ahead and extracted the code:

FILCNT.COM

$ open/read/share=write dir &p1                    ! P1 is the .DIR file to count
$ filcnt_l = 0
$read_loop:
$ read/end=eof_dir dir p9                          ! one directory record per read
$ namlen_s = f$extr( 3, 1, p9 )                    ! byte 3 of the record is the name length
$ namlen_l = f$cvui( 0, 8, namlen_s )
$ versns_l = (f$length( p9 ) - 4 - namlen_l) / 8   ! each version entry is 8 bytes
$ filcnt_l = filcnt_l + versns_l
$ goto read_loop
$eof_dir:
$ close dir
$ show symbol filcnt_l
$ exit

I suppose you could make "filcnt_l" (file count, long) a global symbol and use
it for another purpose.

Example usage:

$ @filcnt mydir.dir

I was just sitting here watching defrag run on my W98-SE machine, thinking about
various - totally UNSUPPORTED!!! - ways to accomplish such things as deleting
files the way you said you needed to in your response to Mr. Briggs. Creative,
but not recommendable. Involves doing things folks here would scoff at.

You could, of course, employ the RENAME to ;2 strategy, then start up another
FTP and kick off a delete of the ;2 files in batch. That's fully
supported/-able.
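(A minimal sketch of that supported route, with hypothetical directory, queue and procedure names. DELETE_V2.COM might contain just:)

$ SET DEFAULT DISK$DATA:[BIGDIR]   ! hypothetical location of the big directory
$ DELETE/LOG A*.TXT;2
$ EXIT

(and then:)

$ SUBMIT/QUEUE=SYS$BATCH DELETE_V2.COM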

Rudolf Wingert

Oct 17, 2007, 1:54:43 AM
Hello,

the search for a*.*;* should be the fastest way. If the first letter
matches, the rest can be ignored and the file can be deleted. For
a*.txt;* the file extension must match too, and in the third case OpenVMS
has to check all three components.

Best regards Rudolf Wingert.

Richard Brodie

Oct 17, 2007, 5:54:51 AM

"Richard B. Gilbert" <rgilb...@comcast.net> wrote in message
news:471522F5...@comcast.net...

> 200,000+ files in one directory is so ridiculous I can scarcely imagine anyone doing it.
> You are now finding out just one of the reasons why it's ridiculous.
>
> Can you just INIT the disk and start over? Or back up everything else, INIT and
> restore?

A bit drastic, perhaps. I would make a load of directories
(say taking the last 2-3 digits of the version number). Then just
SET FILE/NODIR and DELETE the original. Making sure I
had a good backup just in case.


Mike Minor

Oct 17, 2007, 8:39:37 AM

intersystems.public.cache at news.intersystems.com

Mike Minor
"David J Dachtera" <djes...@spam.comcast.net> wrote in message
news:4715588B...@spam.comcast.net...

Mike Minor

Oct 17, 2007, 8:46:20 AM
Thanks to everyone that gave me a hand here. I really appreciate all of the
information and suggestions.

Thank you,

Mike Minor


AEF

Oct 17, 2007, 8:48:12 AM
On Oct 16, 8:59 pm, David J Dachtera <djesys...@spam.comcast.net>
wrote:

> Mike Minor wrote:
>
> > Is there any difference in the speed at which the command is executed in the
> > following examples?
>
> > del a*.*;*
> > del a*.txt;*
> > del a*.txt;1
>
> As I think you've discovered, the answer is, "it depends". Lots of ;1 files
> means lots of directory entries - not necessarily a good or bad thing, just
> something to consider, in light of what it takes to delete a single version file
> at the beginning of a large directory.
>
> Unusual part of VMS directories is that a single directory entry can represent
> multiple versions of a "name.ext". Pull a small directory into EDT sometime
> (*PLEASE* use /READ!!!) and check it out.
>
> Each directory record begins with a length attribute (but you won't see that
> because EDT does RECORD I/O!), then a version limit, some binary fields before
> the "name.ext", then the version numbers and FIDs of the various versions out to
> the end of the record.
>
> I don't have the code at hand just now, but I wrote a FILCNT.COM to count the
> number of files in a directory simply by reading the directory. It never hits
> the file headers, and so is a bit faster than trying to use the DIRECTORY
> command using only the default qualifiers (/HEADING, /TRAILING). It does only
> one READ for each directory entry, then calculates the number of versions
> represented by the entry.

David,

Are you sure you ran DIR without file-header-hitting quals? I find
DIRECTORY/TOTAL works much faster than DIR/TOTAL where DIR is a symbol
defined to be something like the typical DIRECTORY/SIZE/DATE/PROTECTION.
Apparently, DIR/SIZE/DATE/PROT/TOTAL and similar commands
hit the file headers just as if /TOTAL weren't there. So try it, being
sure you use JUST DIRECTORY/TOTAL.

I just tried your program on a huge directory and DIRECTORY/TOTAL runs
a little faster.

[...]

AEF

AEF

Oct 17, 2007, 8:50:52 AM
On Oct 17, 5:54 am, "Richard Brodie" <R.Bro...@rl.ac.uk> wrote:
> "Richard B. Gilbert" <rgilber...@comcast.net> wrote in messagenews:471522F5...@comcast.net...

>
> > 200,000+ files in one directory is so ridiculous I can scarcely imagine anyone doing it.
> > You are now finding out just one of the reasons why it's ridiculous.
>
> > Can you just INIT the disk and start over? Or back up everything else, INIT and
> > restore?
>
> A bit drastic, perhaps. I would make a load of directories
> (say taking the last 2-3 digits of the version number). Then just
> SET FILE/NODIR and DELETE the original. Making sure I
> had a good backup just in case.

Can you please explain what this means? Make a load of directories and
do what?

If you SET FILE/NODIR on a directory file that has files in it, the
files will still be on the disk and have to be recovered via
ANAL/DISK/REPAIR and then deleted from [SYSLOST].

AEF

Richard Brodie

Oct 17, 2007, 8:56:09 AM

"AEF" <spamsi...@yahoo.com> wrote in message
news:1192625452.9...@y27g2000pre.googlegroups.com...

>> A bit drastic, perhaps. I would make a load of directories
>> (say taking the last 2-3 digits of the version number). Then just
>> SET FILE/NODIR and DELETE the original. Making sure I
>> had a good backup just in case.
>
> Can you please explain what this means? Make a load of directories and
> do what?

Make alias entries for the files. Sorry, I rather overedited the previous post.


AEF

Oct 17, 2007, 9:08:43 AM

Sorry, I meant something like DIR/DATE/TOTAL, or with any other
qualifiers that don't produce output with /TOTAL.

AEF

bri...@encompasserve.org

Oct 17, 2007, 9:08:32 AM

Reading between the lines I took his approach to be:

Hash the file names from the original directory to split them into
a bunch of separate buckets. Suggested hash functions are
version number mod 1000 or version number mod 100.

Create a directory for each such bucket

$ SET FILE /ENTER each file from the original directory so it has
a new entry in the chosen target directory.

Nuke the original directory rather than deleting from it piecemeal.
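(A minimal DCL sketch of the SET FILE/ENTER step above, with hypothetical names and, for brevity, a single bucket directory rather than a real hash:)

$ CREATE/DIRECTORY DISK$DATA:[BUCKET1]
$LOOP:
$ F = F$SEARCH("DISK$DATA:[BIGDIR]A*.TXT;1")
$ IF F .EQS. "" THEN GOTO DONE
$ NAME = F$PARSE(F,,,"NAME") + F$PARSE(F,,,"TYPE") + F$PARSE(F,,,"VERSION")
$ SET FILE/ENTER=DISK$DATA:[BUCKET1]'NAME' 'F'
$ GOTO LOOP
$DONE: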


My impression is that the files in the case at hand are all version 1,
so hashing them based on version number is a poor idea.

I also get the impression that the original poster is trying to get
his 200,000 files migrated to another system, probably in preparation
for deleting them all anyway.

Bob Koehler

Oct 17, 2007, 9:12:13 AM
In article <13h9vel...@corp.supernews.com>, "Mike Minor" <mmin...@earthlink.net> writes:
> Is there any difference in the speed at which the command is executed in the
> following examples?
>
> del a*.*;*
> del a*.txt;*
> del a*.txt;1
>

That depends on how many files match the different patterns, of course,
but I assume that's not your point.

There is some work needed to process the wildcard and look for all
the possible matches. If this actually becomes measurable, it's
probably being much more heavily influenced by other issues, such as
a directory file larger than the directory cache.

Bob Koehler

Oct 17, 2007, 9:14:10 AM
In article <13ha3ti...@corp.supernews.com>, "Mike Minor" <mmin...@earthlink.net> writes:
> I have a directory with 200000+files, all in the a*.txt;1 range. I need to
> ftp these files to another server. After sending 30,000+ files via FTP I
> realized the magnatude of the ftp process, and interrupted it. I want to
> delete the 30,000+ file already ftp'ed before going back and looking at
> continueing the ftp process in a different manner and it just seems to be
> taking an extremely long time to perform the delete. I think the hang up is
> the re-write of the directory contents back to disk after a few files are
> deleted. I did the delete with a /log to watch how long it took to delete a
> file. I noticed a pause of a few seconds after it listed 15 to 20 files that
> were deleted.....
>

This is a good reason to get and use DFU, but in the meantime you
may get faster results via BACKUP/DELETE to the null device.
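(For example, hedged, with an arbitrary save-set name on the null device:)

$ BACKUP/DELETE/LOG A*.TXT;1 NLA0:JUNK.BCK/SAVE_SET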


Bob Koehler

Oct 17, 2007, 9:18:03 AM
In article <ff4m5b$l3l$1...@south.jnrs.ja.net>, "Richard Brodie" <R.Br...@rl.ac.uk> writes:
>
> A bit drastic, perhaps. I would make a load of directories
> (say taking the last 2-3 digits of the version number). Then just
> SET FILE/NODIR and DELETE the original. Making sure I
> had a good backup just in case.

That's a good way to:

a) temporarily lose disk space, since all the files in the directory
are still on the disk, just not entered into a directory

b) make the next anal/disk/repair really slow as it enters all those
lost files in [syslost]

c) move the problem to having to delete all those files from
[syslost] instead of their original directory

JF Mezei

Oct 17, 2007, 2:06:07 PM
Have you considered renaming the files needing to be deleted to a
different directory, and then deleting them in that directory where the
size will be more manageable ?

Also,

delete az*.*;*
delete ay*.*;*
delete ax*.*;*
...
delete ac*.*;*
delete ab*.*;*
delete aa*.*;*

would be faster since it would begin the deletes further down the list
and while not a full "reverse order" delete, it would reduce the amount
of shuffling it needs to do for each delete.

Also, make sure your volume is not set to "erase on delete", as this will
greatly slow down deletes. SHOW DEV <disk>/FULL will tell you whether it is
set. (SET VOLUME is the command to set/unset that feature.)
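(For instance, with a hypothetical device name:)

$ SHOW DEVICE DKA100:/FULL                ! look for "erase on delete" among the volume status items
$ SET VOLUME/NOERASE_ON_DELETE DKA100:    ! turn it off if it is set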

Phillip Helbig---remove CLOTHES to reply

Oct 17, 2007, 3:49:22 PM
In article <13ha3ti...@corp.supernews.com>, "Mike Minor"
<mmin...@earthlink.net> writes:

> I have a directory with 200000+files, all in the a*.txt;1 range. I need to
> ftp these files to another server.

Why not put them in a backup saveset or zip archive before the transfer?
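(A hedged sketch with hypothetical names; the save set then travels as one large binary file:)

$ BACKUP DISK$DATA:[BIGDIR]A*.TXT;1 DISK$SCRATCH:[XFER]ATXT.BCK/SAVE_SET

ftp> binary
ftp> put ATXT.BCK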

Hein RMS van den Heuvel

Oct 17, 2007, 5:21:42 PM
On Oct 17, 3:49 pm, hel...@astro.multiCLOTHESvax.de (Phillip Helbig---
remove CLOTHES to reply) wrote:
> In article <13ha3tiakjbl...@corp.supernews.com>, "Mike Minor"

>
> <mminor...@earthlink.net> writes:
> > I have a directory with 200000+files, all in the a*.txt;1 range. I need to
> > ftp these files to another server.
>
> Why not put them in a backup saveset or zip archive before the transfer?

Because once you are in this situation it is too late?
You can put them in an archive, but the removal will still cost as
much.

JF wrote...


> Have you considered renaming the files needing to be deleted to a
> different directory, and then deleting them in that directory where the
> size will be more manageable ?

That's what I suggested early on, along with the hint to do a double
rename, making sure only to take from the end and add to the end.
I even included a pointer to a working example in perl.

But in the meantime I hacked up something cute...

Attached is a tool which can split a directory into two parts in units of
disk clusters.

It takes the time of a file create and a handful of IOs for ANY
number of files.
10 files or 10,000 files move literally just as quickly, with a sleight
of hand.
Perfect scaling! :-)

It's all done with smoke and mirrors involving file headers and
mapping pointers.
Very minimal testing to date... just with empty files on a small LD
device.
I did test the cluster size code, but I really only tested one Retrieval
Pointer format for now.


Check this out though... :-)

$ ld create sys$login:lda4.disk /size=10000
$ ld connec sys$login:lda4.disk lda4:
$ init lda4: lda4
$ moun lda4: lda4
$ create/dir lda4:[A]
$ perl -e "foreach $i (1..1000) { open X,"">lda4:[A]$
{i}_blah_blah_blah_${i}""}" ! A little random order
$ dir LDA4:[A]
Directory LDA4:[A]

1000_BLAH_BLAH_BLAH_1000.;1 100_BLAH_BLAH_BLAH_100.;1
101_BLAH_BLAH_BLAH_101.;1 102_BLAH_BLAH_BLAH_102.;1
103_BLAH_BLAH_BLAH_103.;1 104_BLAH_BLAH_BLAH_104.;1
:
998_BLAH_BLAH_BLAH_998.;1 999_BLAH_BLAH_BLAH_999.;1
99_BLAH_BLAH_BLAH_99.;1 9_BLAH_BLAH_BLAH_9.;1

Total of 1000 files.

$ mcr dev:[disk]SPLIT_DIRECTORY lda4:[000000]a.dir lda4:[000000]b.dir 9
! First filename after split: 161_BLAH_BLAH_BLAH_161.;1

$ dir lda4:[a]

Directory LDA4:[A]

161_BLAH_BLAH_BLAH_161.;1 162_BLAH_BLAH_BLAH_162.;1
163_BLAH_BLAH_BLAH_163.;1 164_BLAH_BLAH_BLAH_164.;1
:
99_BLAH_BLAH_BLAH_99.;1 9_BLAH_BLAH_BLAH_9.;1

Total of 932 files.
$ dir lda4:[b]

Directory LDA4:[B]

1000_BLAH_BLAH_BLAH_1000.;1 100_BLAH_BLAH_BLAH_100.;1
:
158_BLAH_BLAH_BLAH_158.;1 159_BLAH_BLAH_BLAH_159.;1
15_BLAH_BLAH_BLAH_15.;1 160_BLAH_BLAH_BLAH_160.;1

Total of 68 files.
$ mcr dev:[disk]SPLIT_DIRECTORY lda4:[000000]a.dir lda4:[000000]c.dir 20
! First filename after split: 291_BLAH_BLAH_BLAH_291.;1

$ dir/total lda4:[*...]
Directory LDA4:[A]
Total of 788 files.
Directory LDA4:[B]
Total of 68 files.
Directory LDA4:[C]
Total of 144 files.
Grand total of 3 directories, 1000 files.

I'll check more configurations if there (ever) is a business
justification.
In the meantime, if I were in a crunch and needed a tool like this,
then I would:
- mark the directory no-dir
- take a copy
- re-mark as directory

Hope this helps someone, somewhere, somehow
Please let me know if it does!

Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting


/*
** split_directory.c
** Copyright ... Hein van den Heuvel, Oct 2007
**
** This program can be used to split a directory into two, for faster
** deletes and renames. This is mostly just a fun exercise, but it could
** come in handy some day. The program's workings are relatively simple
** because directory files are contiguous. So there is just one mapping
** pointer and extension headers are not likely (!-). We can take a number
** of blocks (a multiple of the cluster size) from the bottom of the mapped
** area and bequeath them to another file. Then adjust that mapping pointer
** and the EOF and such, and be done with just a file create and a handful
** of IOs.
**
** Method:
** 1) Create an empty normal file on the selected disk.
**    The main data in the file header for this file will be replaced
**    by adjusted data from the source directory.
** 2) Copy the file header attributes from the source directory
**    over the target file header.
**    - set eof-block to the selected high-block
**    - adjust mapping pointer (just one... directories are contiguous)
**    - adjust file-id to clone file id
**    - re-calculate checksum and write out
** 3) Write out the fresh split.
** 4) On success, touch up the original and write it out.
** 5) According to the very last page in Kirby McCoy's VMS File System
**    Internals, the vbn write to indexf.sys will trigger a flush of the
**    associated caches: 8.6.7 "User Invalidation of Cached Buffers".
**    Seems almost too easy. No poking of the volume lock needed to cause
**    the RM$DIRCACHE_BLKAST, no poking of the file serialization lock
**    for the (source) directory?!
**
** Enjoy!
** Hein, HvdH Performance Consulting
**
*/

/*
** cc SPLIT_DIRECTORY.C+SYS$COMMON:[SYSLIB]SYS$LIB_C.TLB/lib
**
** libr/extr=fatdef/out=fatdef.h sys$library:sys$lib_c.tlb
*/

#include fh2def
#include fatdef
#include fm2def

#include rms
#include stdio
#include stdlib
#include string
#include dvidef
#include ssdef

typedef struct { short len, cod; void *address; int *retlen; } item;

int sys$open(), sys$connect(), sys$read(), sys$write(), sys$close();
int sys$create(), sys$parse(), sys$search(), sys$erase();
int sys$getdvi(), lib$spawn();

main(argc,argv)
int argc;
char *argv[];
{
int checksum, i, spawn_status;
FAT *source_fat, *target_fat;
unsigned char *p;
union {
unsigned int ebk;
struct {
unsigned short int lo; /* high order word */
unsigned short int hi; /* low order word */
} words;
} ebk;

static unsigned short source_header[256], target_header[256];
static char *usage = "Usage: $ split_directory old_name new_name <blocks_to_split>\n";
static char esa[256], rsa[256], command[256];
static int status, channel, bytes, blocks_to_split=0, vbn=1;
static int file_hbk, file_nbytes, spec_nbytes;
static int index_file_id_offset, index_file_bitmap_size,
index_file_bitmap_vbn;
static int maxfiles, cluster, source_fid, target_fid;
static struct FAB fab;
static struct RAB rab;
static struct NAM nam;
// static struct XABFHC fhc;
FH2 *source_fh2, *target_fh2;
FM2 *source_fm2, *target_fm2;


item getdvi_items[] = { 4, DVI$_MAXFILES, &maxfiles, 0,
4, DVI$_CLUSTER, &cluster, 0,
0, 0, 0, 0 } ;

struct { int len; char *addr; } devnam_desc, command_desc;

/******************************************************************************/

/* Verify that we've been properly invoked */

if (argc != 4) printf("%s",usage), exit(1);

/* Use RMS to parse the file so that we get a FID for the header
clone */

fab = cc$rms_fab;
fab.fab$b_shr = FAB$M_NIL; /* want to be alone for this */
fab.fab$b_fac = FAB$M_PUT | FAB$M_GET | FAB$M_BIO; /* not really... */
fab.fab$l_fna = argv[1];
fab.fab$b_fns = strlen (argv[1]);
fab.fab$l_nam = &nam;
// fab.fab$l_xab = &fhc;

// fhc = cc$rms_xabfhc;

nam = cc$rms_nam;
nam.nam$l_esa = esa;
nam.nam$b_ess = sizeof (esa) - 1;
nam.nam$l_rsa = rsa;
nam.nam$b_rss = sizeof (rsa) - 1;

rab = cc$rms_rab;
rab.rab$l_fab = &fab;
rab.rab$w_usz = 512;

/*
** Pick up the file ID for the source file...
** re-use the FAB and NAM for target later
*/
status=sys$parse(&fab);
if (status & 1 ) status=sys$search(&fab);
if (status & 1 ) status=sys$open(&fab);
if (!(status & 1 )) return status;
source_fid = nam.nam$b_fid_nmx << 16;
source_fid += nam.nam$w_fid_num;

/*
** Get maxfile and cluster size from GETDVI, in order to calculate
** the offset to apply to the file ID to get the VBN in indexf.sys
*/
devnam_desc.addr = nam.nam$l_dev;
devnam_desc.len = nam.nam$b_dev;
status = sys$getdvi ( 0, 0, &devnam_desc, getdvi_items,0,0,0,0);
index_file_id_offset = 4 * cluster + ( maxfiles/4096 ) + 1;


blocks_to_split = atoi(argv[3]);
if (!blocks_to_split || blocks_to_split % cluster ) {
printf ("blocks_to_split (%d) must be a multiple of the
device"
" clustersize (%d).\n", blocks_to_split, cluster);
printf ("(Yeah, I could round up for you, but this needs to
be"
" a concious choice.\n");
return (16);
}
/*
** EBK check replace by reading beyond split point.
** if (fhc.xab$l_ebk < blocks_to_split) return ( RMS$_EOF );
*/

if (!fab.fab$v_ctg) return ( SS$_FILNOTCNTG );

rab.rab$l_bkt = blocks_to_split + 1;
p = (void *) source_header;
rab.rab$l_ubf = (void *) p;
status = sys$connect(&rab);
if (status & 1 ) status = sys$read(&rab);
if (status & 1 ) status = sys$close(&fab);
if (!(status & 1 )) return status;
i = p[5]; // DIR$B_NAME_COUNT
printf ("! First filename after split: %*s;%d\n",
i, &p[6], source_header[(6+i+1)/2] );

/*
** Re-use the FAB and NAM to create a target file.
** Must be on the same disk.
** We'll use this header to clone the target header into.
** Close it and stash away its file ID.
*/
nam.nam$w_fid_num = 0;
nam.nam$w_fid_seq = 0;
nam.nam$b_fid_nmx = 0;
fab.fab$l_fna = argv[2];
fab.fab$b_fns = strlen (argv[2]);
fab.fab$l_dna = nam.nam$l_dev;
fab.fab$b_dns = nam.nam$b_dev;
fab.fab$l_alq = 0;
status = sys$create(&fab);
if (status & 1) status = sys$close(&fab);
if (!(status & 1 )) return status;
target_fid = nam.nam$b_fid_nmx << 16;
target_fid += nam.nam$w_fid_num;


/*
** re-use the FAB and NAM again to open INDEXF.SYS (id=1,1)
*/
nam.nam$w_fid_num = 1;
nam.nam$w_fid_seq = 1;
nam.nam$b_fid_nmx = 0;
fab.fab$l_fop = FAB$M_NAM;
fab.fab$b_shr = FAB$M_UPI | FAB$M_SHRPUT | FAB$M_SHRGET;
status = sys$open(&fab);
if (status & 1 ) status = sys$connect(&rab);
if (!(status & 1 )) return status;

/*
** Read original header. UBF already set up.
*/
rab.rab$l_bkt = source_fid + index_file_id_offset;
status = sys$read(&rab);
if (!(status & 1 )) return status;
/*
** Read target header.
*/
rab.rab$l_bkt = target_fid + index_file_id_offset;
rab.rab$l_ubf = (void *) target_header;
status = sys$read(&rab);
if (!(status & 1 )) return status;
/*
** Copy record attribute area
*/
source_fh2 = (void *) source_header;
target_fh2 = (void *) target_header;
source_fat = (void *) &source_fh2->fh2$w_recattr;
target_fat = (void *) &target_fh2->fh2$w_recattr;
for (i = 10; i<(sizeof (FAT) / 2); i++) {
target_header[i] = source_header[i];
}
target_fh2->fh2$l_filechar = source_fh2->fh2$l_filechar;

/*
** Set the adjusted, word swapped, End-Of-File-Blocks.
*/

ebk.words.lo = source_fat->fat$w_efblkl;
ebk.words.hi = source_fat->fat$w_efblkh;
ebk.ebk -= blocks_to_split;
source_fat->fat$w_efblkl = ebk.words.lo;
source_fat->fat$w_efblkh = ebk.words.hi;

ebk.words.lo = source_fat->fat$w_hiblkl;
ebk.words.hi = source_fat->fat$w_hiblkh;
ebk.ebk -= blocks_to_split;
source_fat->fat$w_hiblkl = ebk.words.lo;
source_fat->fat$w_hiblkh = ebk.words.hi;

ebk.ebk = blocks_to_split + 1;
target_fat->fat$w_efblkl = ebk.words.lo;
target_fat->fat$w_efblkh = ebk.words.hi;
target_fat->fat$w_hiblkl = ebk.words.lo;
target_fat->fat$w_hiblkh = ebk.words.hi;
target_fh2->fh2$l_highwater = ebk.ebk;
/*
** Now for the tricky part... the mapping pointer.
*/
int mpoffset, map_inuse, lbn, count;
mpoffset = source_fh2->fh2$b_mpoffset;
map_inuse = source_fh2->fh2$b_map_inuse;
source_fm2 = (void *) &source_header[mpoffset];
target_fm2 = (void *) &target_header[mpoffset];
if ( target_fh2->fh2$b_map_inuse ) return (SS$_BADFILEHDR);
target_fh2->fh2$b_map_inuse = map_inuse;

for (i = mpoffset; i < (mpoffset + map_inuse); i++) {
target_header[i] = source_header[i];
}
target_fh2->fh2$l_filechar = source_fh2->fh2$l_filechar;

switch (source_fm2->fm2$v_format) {
case FM2$C_FORMAT1:
lbn = source_fm2->fm2$w_lowlbn + (source_fm2->fm2$v_highlbn << 16);
lbn += blocks_to_split; // That had better fit!
source_fm2->fm2$w_lowlbn = lbn & 0xFFFF;
source_fm2->fm2$v_highlbn = lbn >> 16;

source_fm2->fm2$b_count1 -= blocks_to_split;

target_fm2->fm2$b_count1 = blocks_to_split - 1;
break;

case FM2$C_FORMAT2:
((FM2_1 *) source_fm2)->fm2$l_lbn2 += blocks_to_split;

source_fm2->fm2$v_count2 -= blocks_to_split;

target_fm2->fm2$v_count2 = blocks_to_split - 1;
break;

case FM2$C_FORMAT3:
((FM2_2 *) source_fm2)->fm2$l_lbn3 += blocks_to_split;

count = ((FM2_2 *) source_fm2)->fm2$w_lowcount + (source_fm2->fm2$v_count2 << 16);
count -= blocks_to_split;
((FM2_2 *) source_fm2)->fm2$w_lowcount = count & 0xFFFF;
source_fm2->fm2$v_count2 = count >> 16;

count = blocks_to_split - 1;
((FM2_2 *) target_fm2)->fm2$w_lowcount = count & 0xFFFF;
target_fm2->fm2$v_count2 = count >> 16;
break;

case FM2$C_PLACEMENT:
printf ("Don't want to deal with placement headers.\n");
return SS$_BADFILEHDR;
break;
}

/*
** Write out target header first, in case that is a problem.
** It was the last read, RAB still set up for BKT, RBF, RSZ.
*/
checksum = 0;
for (i = 0; i<255; i++) {
checksum += target_header[i];
}
target_header[i] = checksum & 0xFFFF;
status = sys$write(&rab);
if (!(status & 1)) return status;

/*
** Now write out the adjusted source header.
*/
checksum = 0;
for (i = 0; i<255; i++) {
checksum += source_header[i];
}
source_header[i] = checksum & 0xFFFF;
rab.rab$l_bkt = source_fid + index_file_id_offset;
rab.rab$l_rbf = (void *) source_header;
status = sys$write(&rab);

if (status & 1 ) status = sys$close(&fab); /* close indexf.sys */
return status;
}

Jim Duff

Oct 17, 2007, 8:24:53 PM
> [code snipped]

Scary cute ;-) I must say I'd be tempted to take out the volume
blocking lock, no matter what McCoy says. I tend to be "belt and
suspenders" when it comes to stuff like this.

Minor nit: sys$getdvi seems to be missing a "W" on the end, an IOSB, and
return status checks.

Cheers,
Jim.
--
www.eight-cubed.com

David J Dachtera

Oct 17, 2007, 9:27:05 PM
Mike Minor wrote:
>
> intersystems.public.cache at news.intersystems.com

AH! Ever seen MUMPS code?

Are there any other questions?

Rudolf Wingert

Oct 18, 2007, 2:00:41 AM
Hello,
I think that the fastest delete is the following:
BACKUP/DELETE A*.*;* NL:T/SAVE/NOCRC/GROUP=0
AFAIK this will do everything you want in the right way (reverse order of
delete).

Best regards Rudolf Wingert

JF Mezei

Oct 18, 2007, 3:24:15 AM
Rudolf Wingert wrote:
> BACKUP/DELETE A*.*;* NL:T/SAVE/NOCRC/GROUP=0
> AFAIK this will do all what you want in the right way (reverse order of
> delete).

Out of curiosity, how does BACKUP achieve this reverse delete?

Does it build an in-memory list of files processed and, once the backup
has been done (and the optional verification pass), parse that in-memory
list backwards to delete the files?

Ron Johnson

Oct 18, 2007, 3:47:18 AM
On 10/18/07 01:00, Rudolf Wingert wrote:
> Hello,
> I think, that the fastest delete is the following:
> BACKUP/DELETE A*.*;* NL:T/SAVE/NOCRC/GROUP=0
> AFAIK this will do all what you want in the right way (reverse order of
> delete).

But it wastes so much CPU & IO reading thru all the files.

$ PIPE DIRE/COL=1/NOHEAD/NOTRAIL DISK$FOO:[BAR]*.* | -
SORT/KEY=(POS:1,SIZE:,DESC) SYS$PIPE FOO_BAR.TXT

Then DCL to delete them:
$ SET NOVER
$ ON ERROR THEN $GOTO ERR_RTN
$ OPEN/READ IFILE FOO_BAR.TXT
$LTOP:
$ READ/END=LEND IFILE IREC
$ DEL/LOG 'IREC'
$ GOTO LTOP
$LEND:
$ERR_RTN:
$ CLOSE IFILE
$ EXIT

Hein RMS van den Heuvel

Oct 18, 2007, 7:55:14 AM
On Oct 18, 3:47 am, Ron Johnson <ron.l.john...@cox.net> wrote:
> On 10/18/07 01:00, Rudolf Wingert wrote:
>
> > Hello,
> > I think, that the fastest delete is the following:
> > BACKUP/DELETE A*.*;* NL:T/SAVE/NOCRC/GROUP=0
> > AFAIK this will do all what you want in the right way (reverse order of
> > delete).
>
> But it wastes so much CPU & IO reading thru all the files.
>
> $ PIPE DIRE/COL=1/NOHEAD/NOTRAIL DISK$FOO:[BAR]*.* | -
> SORT/KEY=(POS:1,SIZE:,DESC) SYS$PIPE FOO_BAR.TXT
> Then DCL to delete them:
:

> $LTOP:
> $ READ/END=LEND IFILE IREC
> $ DEL/LOG 'IREC'

The cure is worse than the disease?

Speaking of wasteful... what about that image activation for each
delete. Yikes!

I published a simple DCL script for reverse delete a few times in the
past.
For example in:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=625667
It combines a few deletes per image activation.

These days I'd use PERL:

$ perl -le "foreach (reverse sort glob q(test*.*;1)){ print; unlink }"

Looks tight, is faster.

Cheers,

Hein.

Bob Koehler

Oct 18, 2007, 8:54:34 AM
In article <c8994$47164f11$cef8887a$32...@TEKSAVVY.COM>, JF Mezei <jfmezei...@vaxination.ca> writes:
> Have you considered renaming the files needing to be deleted to a
> different directory, and then deleting them in that directory where the
> size will be more manageable ?
>

I would think that a rename removing a file from a directory would
have the same issues processing the directory file that delete has.

set file/enter doesn't have that problem, but deleting the alias
does not delete the file, so it also doesn't accomplish anything.

Bob Koehler

Oct 18, 2007, 8:55:54 AM
In article <dcERi.19163$UN.1...@newsfe24.lga>, Ron Johnson <ron.l....@cox.net> writes:
>
> But it wastes so much CPU & IO reading thru all the files.
>

Which is OK if the files are small and the real overhead is in
directory file processing. But DFU doesn't have that problem,
so it's a better solution.

The OP may not have DFU. He may have to jump through hoops to get
it.

bri...@encompasserve.org

Oct 18, 2007, 9:24:21 AM
In article <c8994$47164f11$cef8887a$32...@TEKSAVVY.COM>, JF Mezei <jfmezei...@vaxination.ca> writes:
> Have you considered renaming the files needing to be deleted to a
> different directory, and then deleting them in that directory where the
> size will be more manageable ?

Alas, that approach doesn't help at all. The problem at hand was that
the directory was populated with 200,000 files and the original poster
needed to delete the first 30,000 of these.

Whether you delete those files or rename them to another directory
you still end up removing their directory entries. That leaves you
with empty blocks at the front end of a 200,000 file directory.
And that means that you need to shift the remaining data down to fill
in the vacated blocks.


If one was absolutely determined to use such an approach, it would be
possible to use a scheme in which _all_ the files are renamed to another
directory in reverse alphabetical order and the file names themselves
are inverted in lexicographic order -- e.g. A becomes Z, B becomes Y, etc.

That way you'd be updating both directories at the tail end.

> Also,
>
> delete az*.*;*
> delete ay*.*;*
> delete ax*.*;*
> ...
> delete ac*.*;*
> delete ab*.*;*
> delete aa*.*;*
>
> would be faster since it would begin the deletes further down the list
> and while not a full "reverse order" delete, it would reduce the amount
> of shuffling it needs to do for each delete.

If you're deleting the first 30,000 files from a 200,000 file directory,
any such optimization can only shave something like 8% off your total
elapsed time.

You can save yourself from shuffling the first 29,999 directory
entries (average 15,000) down, but there are still 170,000 that you
can't do anything about with this scheme.

Hein RMS van den Heuvel

Oct 18, 2007, 10:00:54 AM
On Oct 16, 3:58 pm, Hein RMS van den Heuvel
<heinvandenheu...@gmail.com> wrote:

> On Oct 16, 3:23 pm, "Mike Minor" <mminor...@earthlink.net> wrote:
>
> > I have a directory with 200000+files, all in the a*.txt;1 range. I need to
:
> The problem is somewhat similar to one discussed in:
>
> http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=...
>
> You may want to check out my rename suggestion there.
> It does a double rename to allow the system to allways take from and
> add to the end.

Actually... I seem to have posted an intermediate version.
Almost right, but not taking from the end and working forward,
which was the whole point! Oops.

Here is the correct example.
It's just an example, with some debugging lines still there to help
understand it.
Adapt it to individual needs and to Perl's quirks in handling files
and filenames.
Or re-write it as something similar in DCL.

use strict;
#use warnings;

my $HELPER = "[-.tmp_helper]";
my $TARGET = "[-.tmp_renamed]";
my $i = 0;
my @files;
$_ = shift or die "Please provide double quoted wildcard filespec";

print "wild: $_\n";
s/"//g;
my $wild = $_;
foreach (qx(DIRECTORY/COLU=1 $wild)) {
chomp;
$files[$i++] = $_ if /;/;
}
die "Please provide double quoted wildcard filespec" if @files < 2;

# phase 1

$i = @files;

print "Moving $i files to $HELPER\n";

while ($i-- > 0) {
my $name = $files[$i];
my $new = sprintf("%s%06d%s",$HELPER,999999-$i,$name);
print "$name --> $new\n";
rename $name, $new;
}

system ("DIRECTORY $HELPER");
# phase 2

print "Renaming from $HELPER to $TARGET...\n";

$i = 0;   # restart at the first (alphabetically lowest) name
while ($i < @files) {
my $name = $files[$i];
rename sprintf("%s%06d%s",$HELPER,999999-$i,$name), $TARGET.$name;
$i++;
}


Hope this helps better :-)

Hein.


Ron Johnson

Oct 18, 2007, 11:45:25 AM
On 10/18/07 06:55, Hein RMS van den Heuvel wrote:
> On Oct 18, 3:47 am, Ron Johnson <ron.l.john...@cox.net> wrote:
>> On 10/18/07 01:00, Rudolf Wingert wrote:
>>
>>> Hello,
>>> I think, that the fastest delete is the following:
>>> BACKUP/DELETE A*.*;* NL:T/SAVE/NOCRC/GROUP=0
>>> AFAIK this will do all what you want in the right way (reverse order of
>>> delete).
>> But it wastes so much CPU & IO reading thru all the files.
>>
>> $ PIPE DIRE/COL=1/NOHEAD/NOTRAIL DISK$FOO:[BAR]*.* | -
>> SORT/KEY=(POS:1,SIZE:,DESC) SYS$PIPE FOO_BAR.TXT
>> Then DCL to delete them:
> :
>> $LTOP:
>> $ READ/END=LEND IFILE IREC
>> $ DEL/LOG 'IREC'
>
> Het middle is erger dan de kwaal?
>
> Speaking of wasteful... what about that image activation for each
> delete. Yikes!

Good point. Then read 3-4 records, concatenating them into one
larger string, and delete that. Damn DCL for still having a
240-byte max record size in 2007!
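(A minimal DCL sketch of that batching idea, reusing the FOO_BAR.TXT list from above; four names per DELETE is an arbitrary choice:)

$ OPEN/READ IFILE FOO_BAR.TXT
$OUTER:
$ LIST = ""
$ N = 0
$INNER:
$ READ/END_OF_FILE=FLUSH IFILE REC
$ LIST = LIST + "," + REC
$ N = N + 1
$ IF N .LT. 4 THEN GOTO INNER
$ LIST = F$EXTRACT(1, F$LENGTH(LIST), LIST)   ! drop the leading comma
$ DELETE/LOG 'LIST'
$ GOTO OUTER
$FLUSH:
$ IF LIST .EQS. "" THEN GOTO DONE
$ LIST = F$EXTRACT(1, F$LENGTH(LIST), LIST)
$ DELETE/LOG 'LIST'
$DONE:
$ CLOSE IFILE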

> I published a simple DCL script for reverse delete a few time in the
> past.
> For example in:
> http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=625667
> It combines a few deletes per image activation.
>
> These days I'd use PERL:
>
> $ perl -le "foreach (reverse sort glob q(test*.*;1)){ print; unlink }"

Interesting.

> Looks tight, is faster.
>
> Cheers,

--

JF Mezei

Oct 18, 2007, 2:47:38 PM
Bob Koehler wrote:
> Which is OK if the files are small and the real overhead is in
> directory file processing. But DFU doesn't have that problem,
> so it's a better solution.


Does DFU have the ability to selectively delete files from a directory ?
From what help says, it can delete whole directories, or delete by file-id.

The original poster needs to selectively delete a whole bunch of files
in a huge directory.

JF Mezei

Oct 18, 2007, 2:49:54 PM
A different twist to the original problem:


Do the FTP in reverse order. For instance, do all the Z*.*;* files, then
delete all the Z*.*;* files. Transfer the Y*.*;* files, then delete all
the Y*.*;* files.

The current batch of undeleted files could stay there until you've done
all the B files, at which point you can delete the A*.*;* files which
you originally transferred first.

This will make the deletes at each stage much faster since you will be
working with files that are towards the end of the directory.
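(A hedged DCL sketch of that idea; the FTP step is only indicated by a comment, and the file spec is the one from this thread:)

$ SET NOON                                   ! ignore letters with no matching files
$ LETTERS = "ZYXWVUTSRQPONMLKJIHGFEDCBA"
$ N = 0
$LOOP:
$ L = F$EXTRACT(N, 1, LETTERS)
$ ! ... FTP the A'L'*.*;* files for this letter here ...
$ DELETE A'L'*.*;*
$ N = N + 1
$ IF N .LT. F$LENGTH(LETTERS) THEN GOTO LOOP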

Jan-Erik Söderholm

Oct 18, 2007, 3:13:41 PM
JF Mezei wrote:
> Bob Koehler wrote:
>> Which is OK if the files are small and the real overhead is in
>> directory file processing. But DFU doesn't have that problem,
>> so it's a better solution.
>
>
> Does DFU have the ability to selectively delete files from a directory ?

No, I don't think so, and when *I* originally mentioned DFU in this
thread it was about "emptying" or removing a whole DIR.

> From what help says, it can delete whole directories, or delete by file-id.
>
> The original poster needs to selectively delete a whole bunch of files
> in a huge directory.

As this, as far as I understand, is a one-time effort to clean
up after a system or user error, I think that the easiest route
is to just take the time to copy and delete the files using
regular DCL. If one uses DFU at some interval to compress the
DIR file, one will get better performance after a while...

Note, if there have been some creates/deletes in this DIR over
time, it could be a good idea to run a DFU "directory compress"
right from the beginning to get the smallest possible DIR file
with the current files in it. The smaller the DIR file is, the
faster the deletes run.
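(Assuming DFU is installed, a hedged example; the foreign-command setup and the exact qualifier spelling should be checked against the DFU help on your version:)

$ DFU :== $DFU$DIR:DFU.EXE                   ! hypothetical foreign-command definition
$ DFU DIRECTORY/COMPRESS DISK$DATA:[000000]BIGDIR.DIR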

Jan-Erik.

Bob Koehler

Oct 18, 2007, 5:18:53 PM
In article <1c72b$4717aa4b$cef8887a$13...@TEKSAVVY.COM>, JF Mezei <jfmezei...@vaxination.ca> writes:
> Bob Koehler wrote:
>> Which is OK if the files are small and the real overhead is in
>> directory file processing. But DFU doesn't have that problem,
>> so it's a better solution.
>
>
> Does DFU have the ability to selectively delete files from a directory ?
> From what help says, it can delete whole directories, or delete by file-id.

Oops, you're right. The OP would have to get the FID first, such as
by dir/file_id or f$file_attributes().

David J Dachtera

Oct 18, 2007, 10:44:55 PM
Ron Johnson wrote:
>
> On 10/18/07 06:55, Hein RMS van den Heuvel wrote:
> > On Oct 18, 3:47 am, Ron Johnson <ron.l.john...@cox.net> wrote:
> >> On 10/18/07 01:00, Rudolf Wingert wrote:
> >>
> >>> Hello,
> >>> I think, that the fastest delete is the following:
> >>> BACKUP/DELETE A*.*;* NL:T/SAVE/NOCRC/GROUP=0
> >>> AFAIK this will do all what you want in the right way (reverse order of
> >>> delete).
> >> But it wastes so much CPU & IO reading thru all the files.
> >>
> >> $ PIPE DIRE/COL=1/NOHEAD/NOTRAIL DISK$FOO:[BAR]*.* | -
> >> SORT/KEY=(POS:1,SIZE:,DESC) SYS$PIPE FOO_BAR.TXT
> >> Then DCL to delete them:
> > :
> >> $LTOP:
> >> $ READ/END=LEND IFILE IREC
> >> $ DEL/LOG 'IREC'
> >
> > Het middle is erger dan de kwaal?
> >
> > Speaking of wasteful... what about that image activation for each
> > delete. Yikes!
>
> Good point. Then read 3-4 records, concatenating them into one
> larger string and then delete that. Damn DCL for having in 2007 a
> 240 byte max record size!

As of V7.3-2 and later, DCL's "record" (max cmd) length is 4KB (4095, actually).

Glenn Everhart

Oct 19, 2007, 10:36:48 PM
A directory listing including file IDs could conceivably be used, followed
by a short program that would issue io$_delete on each file by ID. Mark the
directory as /NODIR first if you want to avoid nonsense.

This is approximately what DFU does. Since it never re-sorts directories
in doing this, it will run fairly fast. Once you have the file IDs,
after all, the files are all just files and it makes no difference whether
they are in one directory or separate directories.

The ACP interface is described in the first chapters of the RMS manual,
as I recall. Note that it (using io$_delete) is not the same as lib$delete, which
opens the file and closes it with delete as the disposition.

Glenn Everhart

0 new messages