>> Currently, this directory is hosting around 80,000 files which are of past 60 days.
>> 11725/21845
That is well beyond the point of noticeably diminishing performance.
The math is really simple....
If a file whose name sorts early is added, requiring the directory to make room, or if such a file is deleted, leaving an empty (512-byte) block behind, then the rest of the directory is shuffled up or moved down, respectively, in ACP_MAXREAD-sized chunks.
ACP_MAXREAD is typically 32 blocks, so that one file will cause 11725/32 = roughly 366 read and write IOs, during which the directory is locked and the directory cache on all other cluster nodes is invalidated. The reads are likely satisfied from cache. Verify the setting with MCR SYSGEN SHOW /ACP.
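To put numbers on that yourself (a rough sketch; the directory file spec shown is just a guess for your setup), check the setting and the directory's used size, then do the arithmetic in DCL:

$ ! ACP_MAXREAD = number of blocks the XQP moves per shuffle IO
$ MCR SYSGEN SHOW /ACP
$ DIRECTORY/SIZE=USED DSA1240:[000000]SCRATCH.DIR
$ WRITE SYS$OUTPUT 11725/32    ! worst case: ~366 reads plus as many writes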
> > Today I created a blank file and did a test backup and please find the output below for the time taken.
I do not understand that test. It is unlikely to depend much on the directory size, since it did a simple direct lookup. Is that BACKUP command similar to how the directory is typically used? What is a blank file? Zero blocks, or many zero-filled blocks? How big?
So you are indicating it took almost 10 minutes to copy one file from the problem directory with backup?
Now BACKUP takes a remarkably wasteful approach... it reads the directory in chunks, doing an access and de-access as it goes. It can easily require 400+ accesses for a 10,000+ block directory, and if that directory is write-accessed on other nodes I could see it taking 'forever'. Still... 10 minutes?
How does 'COPY' perform for you?
It does a single lookup and lets the XQP deal with it.
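If you want to see the difference for yourself, a rough comparison along these lines shows the direct-IO cost of each approach (LOGICAL_ROOT:[SCRATCH]BLANK.TMP is a placeholder for your test file in the problem directory; the counts include the IO for the file data itself):

$ dirio0 = f$getjpi("","DIRIO")
$ COPY LOGICAL_ROOT:[SCRATCH]BLANK.TMP SYS$SCRATCH:*.*
$ WRITE SYS$OUTPUT "COPY   DirIO: ", f$getjpi("","DIRIO") - dirio0
$ dirio0 = f$getjpi("","DIRIO")
$ BACKUP LOGICAL_ROOT:[SCRATCH]BLANK.TMP SYS$SCRATCH:BLANK.BCK/SAVE_SET
$ WRITE SYS$OUTPUT "BACKUP DirIO: ", f$getjpi("","DIRIO") - dirio0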
Anyway... JFM has the right idea for now... turn it into a search list.
How do applications find/use the directory?
Can you create a fresh directory
$ CREATE/DIRECTORY/ALLOCATION=2000 LOGICAL_ROOT:[SCRATCH1]
and next re-define the logical as
LOGICAL_ROOT:[SCRATCH1], LOGICAL_ROOT:[SCRATCH]
Existing files can still be found, but new ones will be created in the (pre-allocated) new directory.
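A minimal sketch of that redefinition, assuming the application reaches the area through a logical name (SCRATCH_DIR here is a made-up name, not necessarily yours):

$ DEFINE/SYSTEM SCRATCH_DIR LOGICAL_ROOT:[SCRATCH1], LOGICAL_ROOT:[SCRATCH]
$ SHOW LOGICAL SCRATCH_DIR    ! new, pre-allocated directory listed first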
Now start cleaning [SCRATCH] ... from HI to LO, Z to A
Repeat monthly, weekly or daily as needed.
I created a test directory with 55,000 files in 11,000 blocks with the scripts below, so 5 files per block. You'll see just a few IOs to delete files 1..4 in a block, but delete file 5 and the now-empty block needs 690 IOs to get expunged when it is early in the sort order, yet just a few IOs when it is late in the sort order.
More importantly... how does the application use the scratch directory?
Are files handed over from one process to the next with a strict design,
or is it perhaps just a 'free-for-all' zone out of the quota / backup realm?
Can you not just give each process its own scratch directory based on username or UIC or some modest grouping based on that?
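For example (purely illustrative; MY_SCRATCH and the [SCRATCH.username] layout are assumptions, not your application's design), each process could be pointed at a per-user subdirectory:

$ user = f$edit(f$getjpi("","USERNAME"),"TRIM")
$ if f$parse("LOGICAL_ROOT:[SCRATCH.''user']") .eqs. "" then -
      CREATE/DIRECTORY LOGICAL_ROOT:[SCRATCH.'user']
$ DEFINE MY_SCRATCH LOGICAL_ROOT:[SCRATCH.'user']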
> Now, Im in process of moving more files out of this directory and putting in sub Directories.
ok...
> If I move the files and free up the directory, will the response time decrease automatically? Or Anyfurther action needs to be taken to improve this?
Yes. There is no need to shrink the directory's allocated size; only the used (EOF) size is considered, and that shrinks as directory blocks empty out.
The better the files are 'spread' across those subdirectories by the first few bytes of their names, the better the system works. The worst thing you can do is have all names share a common prefix like MAIL$xxx... a sin OpenVMS itself committed. You may want to consider per-node subdirectories, or a search list where the 'home node' directory comes first yet the others are there for incidental access as needed.
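A per-node variant might look like this sketch (the node and directory names are placeholders):

$ node = f$edit(f$getsyi("NODENAME"),"TRIM")
$ ! home node's directory first; the others follow for incidental access
$ DEFINE/SYSTEM SCRATCH_DIR LOGICAL_ROOT:[SCRATCH_'node'], -
      LOGICAL_ROOT:[SCRATCH_NODEA], LOGICAL_ROOT:[SCRATCH_NODEB]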
> Is there a tool available to measure the response time of a directory file?
Just do it and measure?
It would vary too much due to unpredictable shuffles and cross-cluster-node directory data block cache invalidations.
But be sure to do something similar to what the application uses.
Don't use BACKUP if the application uses COPY or DELETE.
Use Wildcards or not as the application would.
Keep context, as the application would (F$SEARCH ?)
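The @time wrapper used in the transcript further down is not reproduced here; as a stand-in, a simple hypothetical TIME.COM along these lines reports per-command deltas from F$GETJPI (it does not split the CPU time into Kernel/RMS/DCL/User the way the original output does):

$ ! TIME.COM -- usage: @TIME <verb> <parameters...>
$ dirio0 = f$getjpi("","DIRIO")
$ bufio0 = f$getjpi("","BUFIO")
$ cpu0   = f$getjpi("","CPUTIM")    ! CPU time, 10-ms ticks
$ t0     = f$time()
$ 'p1' 'p2' 'p3' 'p4' 'p5' 'p6' 'p7' 'p8'
$ write sys$output "Dirio=", f$getjpi("","DIRIO") - dirio0, -
      " Bufio=", f$getjpi("","BUFIO") - bufio0, -
      " Cputim=", f$getjpi("","CPUTIM") - cpu0, -
      " Start=", t0, "  End=", f$time()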
> How do I identify that a directory is taking a long time to respond?
(say if I'm clearing the files today and after a week if the same issue occurs, I want to put some script to notify me)
If it is big then it is bad. Simple.
> Disk DSA1240:, device type Generic SCSI disk ------ Its a two member shadow
ok... so worst-case writes. Is one of the members geographically distant?
Then 300 IOs could start to take 300 * 50 ms = 15 seconds... or worse.
>> Its a mixed cluster with 5 Alpha and 4 Itanium - One Alpha is just a quorum
> %ANALDISK-W-BAD_NAMEORDER, filename ordering incorrect in VBN 1489 -- There are multiple entries similar to this
Please urgently report to OpenVMS Engineering via your support channels.
There are currently a few customers with up-to-date patches who are experiencing directory corruptions, and we have little to go on except high-activity and mixed-cluster usage. Serious stuff. Really bad if true.
> If I just give $ dire/Grand , I'm getting response in around 2 seconds. But If I give any additional operation like /Since or /before, then it takes a lot of time to respond around 30 seconds.
That is absolutely normal and predictable. Learn to use DFU!
If you do a DIRECTORY requiring anything other than the name, then the system has to access the file header. The header is a block in INDEXF.SYS where the dates, the size, and the actual mapping are recorded.
So a DIR/SINCE looking at 50,000 files will require 50,000 IOs. No ifs or buts.
If that is done in 30 seconds, then your IO subsystem is performing roughly 1,700 read IOs per second. Not bad.
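You can watch that header-IO cost directly with a quick before/after check (the directory spec is illustrative):

$ dirio0 = f$getjpi("","DIRIO")
$ DIRECTORY/SINCE=TODAY/OUTPUT=NL: LOGICAL_ROOT:[SCRATCH]
$ WRITE SYS$OUTPUT "DirIO for that DIR/SINCE: ", f$getjpi("","DIRIO") - dirio0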
Cheers,
Hein
ps... when testing this stuff, consider a RAM disk:
$ mcr sysman io connect mda1:/driver=sys$mddriver/noadap
$ init mda1:/size=50000 ram
$ moun /sys mda1: ram ram
%MOUNT-I-MOUNTED, RAM mounted on _EISNER$MDA1:
$ cre/dir ram:[test]
$ cre ram:[000000]tmp.tmp
$
$ create fill.com
$ i = 1
$ max = p1
$ if f$type(max).nes."INTEGER" then exit 16
$ loop:
$ SET FILE/ENT=RAM:[TEST]'f$fao("!6ZL!33*x.!39*y",i)' RAM:[000000]tmp.tmp
$ i = i + 1
$ if i .le. max then goto loop
$
$ @fill 55000 ! 10,000,000+ IO's
$ dir/size=all ram:[000000]test.dir
$ @time delete ram:[test]000011*.*;* ! yes, SET FILE/REMOVE is quicker.
Dirio= 9 Bufio= 18 Kernel= 5 RMS= 11 DCL=0 User= 0 Elapsed= 17
$ @time delete ram:[test]000012*.*;*
Dirio= 3 Bufio= 18 Kernel= 5 RMS= 11 DCL=0 User= 0 Elapsed= 16
$ @time delete ram:[test]000013*.*;*
Dirio= 3 Bufio= 18 Kernel= 5 RMS= 11 DCL=0 User= 0 Elapsed= 16
$ @time delete ram:[test]000014*.*;*
Dirio= 3 Bufio= 18 Kernel= 4 RMS= 10 DCL=0 User= 0 Elapsed= 16
$ @time delete ram:[test]000015*.*;*/lo
Dirio= 691 Bufio= 18 Kernel= 10 RMS= 10 DCL=0 User= 0 Elapsed= 22
:
$ @time delete ram:[test]054011*.*;*
Dirio= 22 Bufio= 18 Kernel= 7 RMS= 18 DCL=0 User= 0 Elapsed= 25
$ @time delete ram:[test]054012*.*;*
Dirio= 16 Bufio= 18 Kernel= 7 RMS= 19 DCL=0 User= 0 Elapsed= 26
$ @time delete ram:[test]054013*.*;*
Dirio= 15 Bufio= 18 Kernel= 4 RMS= 18 DCL=0 User= 1 Elapsed= 26
$ @time delete ram:[test]054014*.*;*
Dirio= 14 Bufio= 18 Kernel= 6 RMS= 20 DCL=0 User= 0 Elapsed= 26
$ @time delete ram:[test]054015*.*;*
Dirio= 27 Bufio= 18 Kernel= 6 RMS= 18 DCL=0 User= 0 Elapsed= 26