
RMS indexed file structure questions


JF Mezei

Mar 13, 2007, 2:32:34 AM
Some months ago, I suffered a disk corruption. I tried many tools to try to
rescue a few indexed files, but to no avail.

The solution I have selected is to use a program to scan the raw data in the
file. Each record starts with a filename beginning with "OA$SHAR", and should be
preceded by 2 bytes indicating a record length that should be greater than 64 and
not 2 blanks.
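In C terms the scan is roughly this (just a sketch; the "OA$SHAR" signature, the
little-endian length word in front of it, and the 64-byte floor are all specific
to this file, and the whole file is slurped into memory for simplicity):

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;

    static unsigned char buf[32 * 1024 * 1024];      /* toy: whole file in one gulp */
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);

    for (size_t i = 2; i + 7 <= n; i++) {
        if (memcmp(buf + i, "OA$SHAR", 7) != 0) continue;
        /* the 2 bytes in front should be a plausible record length */
        unsigned len = buf[i - 2] | (buf[i - 1] << 8);        /* little-endian word */
        if (len <= 64) continue;                               /* must be > 64      */
        if (buf[i - 2] == ' ' && buf[i - 1] == ' ') continue;  /* not 2 blanks      */
        printf("candidate record at offset %lu, length %u\n",
               (unsigned long)(i - 2), len);
    }
    return 0;
}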

This **SEEMS** to work. However, there appears to be duplicate keys.

So I have questions:

When a record is deleted from a bucket, is the space occupied by the record
zeroed or does the record remain there intact, with only the index areas updated ?

When a record is deleted from a bucket, would records stored after the deleted
record in a bucket be shifted to the "left" by the length of the deleted record
(zapping the deleted record, but leaving a ghost image of the last record in
the bucket) ?

When a record is updated with a longer length, does it get put at the end of
that bucket with its original location left intact ? Or do all records after the
updated record get shifted to the "right" to leave enough space to rewrite
in-situ the updated record with the few extra bytes in it ?


In other words, when I see multiple instances of a key in a file, should the
record with the highest relative position in the file be considered the most
recent ? Or is there no way to know ?

If I have a bucket size of 30, does this mean that all structures (except the
first 2 blocks) in the file would be 30 blocks long ? Or would index blocks be
allocated with different granularity ?

I have found one area of what appears to be the index. It consists of the 65 byte
keys with no separators between the keys. Is there some "signature" I can look
for at the start of a block to identify this block (or series of 30 blocks) as
an index block/bucket ?

Hein RMS van den Heuvel

Mar 13, 2007, 8:08:27 PM
On Mar 13, 2:32 am, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
> Some months ago, I suffered a disk corruption. I tried many tools to try to
> rescue a few indexed files, but to no avail.

Read the second part of my RMS_TUNING.PPT on the OpenVMS V6.0 Freeware
[RMS_TOOLS]
Read it again.

Now. Since you had disk corruption we may assume that entire blocks,
more specifically entire clusters of data, are wiped out. What we need
to do is 'connect' all seemingly valid data buckets together with the NEXT
pointers, skipping over bad clusters.
- Optionally do so in primary key order.
- Optionally accept any bucket which starts out OK, but set the NEXT-
FREE-BYTE to just beyond the last whole data record.

Valid data buckets are easy to recognize by the 14 bucket header bytes
and the 11 or 13 bytes of the first record header, with its very recognizable
RRV and record flag byte (2 or 6).
Suggestions for a next valid data bucket are:
- Whatever the next bucket pointer points to
- The next adjacent block after the bucket (rounded up to cluster
boundary for poorly pre-allocated files)
- The next pointer from the index bucket above it.
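Roughly, in C, the sanity check boils down to something like this (a sketch,
untested; the byte offsets follow the 14-byte bucket header, and the 02/06 test
on the first record flag is only a loose heuristic):

#include <stdint.h>
#include <string.h>

/* Does this look like a valid prologue 3 DATA bucket?
 * buf holds 'bks' blocks read from virtual block 'vbn'.
 * Assumed header offsets: check byte at 0, address sample word at 2,
 * free-space word at 4, level byte at 12; the first record starts at 14. */
static int looks_like_data_bucket(const unsigned char *buf,
                                  uint32_t vbn, int bks)
{
    uint32_t size = (uint32_t)bks * 512;
    uint16_t adrsample, freespace;

    memcpy(&adrsample, buf + 2, 2);               /* low word of the bucket VBN */
    memcpy(&freespace, buf + 4, 2);               /* first free byte in bucket  */

    if (adrsample != (uint16_t)vbn)   return 0;   /* address sample mismatch    */
    if (buf[0] != buf[size - 1])      return 0;   /* check bytes out of phase   */
    if (buf[12] != 0)                 return 0;   /* level 0 = data level       */
    if (freespace <= 14 || freespace > size) return 0;  /* empty or implausible */
    if (buf[14] != 0x02 && buf[14] != 0x06)  return 0;  /* loose: record flag   */
    return 1;
}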


> The solution I have selected is to use a program to scan the raw data in the
> file. Each record starts with a filename beginning with "OA$SHAR", and should be
> preceded by 2 bytes indicating a record length that should be greater than 64 and
> not 2 blanks.

Maybe... RECORD / KEY COMPRESSION might make this tricky.
The preceding RFA + a 02 byte for 'valid record flag' is more
critical.


> This **SEEMS** to work. However, there appears to be duplicate keys.
>
> So I have questions:
>
> When a record is deleted from a bucket, is the space occupied by the records
> zeroed or does the record remain there intact, with only the index areas updated ?

It depends on Prologue 1 vs 3 and whether duplicates are allowed.
For most files any deleted record is expunged, by shuffling down the
rest of the bucket and adjusting the next free byte. That means
however that if the bucket held records A,B,C and B is deleted, you
might actually 'see' A,C,C' where C' is the old C bytes still there,
but no longer valid.
The index records are never updated for a delete, as they point to a
bucket, not to a record.

> When a record is deleted from a bucket, would records stored after the deleted
> record in a bucket be shifted to the "left" by the length of the deleted record
> ? (zapping the deleted record, but leaving a ghost image of the last record in
> the bucket) ?

Yes.

> When a record is updated with a longer length, does it get put at the end of
> that bucket with its original location left intact ? Or do all records after the
> updated record getting shifted to the "right" to leave enough space to rewrite
> in-situ the updated record with the few extra bytes in it ?

Right shift to make room.

> In other words, when I see multiple instances of a key in a file, should the
> record with the highest relative position in the file be considered the most
> recent ? Or is there no way to know ?

Look at the first free byte.
Use ANAL/RMS ... POSI /BUCK

> If I have a bucket size of 30, does this mean that all structures (except the
> first 2 blocks) in the file would be 30 blocks long ? Or would index blocks be
> allocated with different granularity ?

Each key CAN have up to 3 zones with unique bucket sizes, but rarely
does.
Assuming 30 blocks everywhere is safe. Assuming a 2-block header is not.
Use ANAL/RMS/INT !

> I have found one area of what appear to be the index. It consists of the 65 byte
> keys with no separators between the keys. Is there some "signature" I can look
> for at the start of a block to identify this block (or series of 30 blocks) and
> an index block/bucket ?

Use ANAL/RMS/INT !
For UNCOMPRESSED KEYS the index buckets are filled from low to high
with raw key values, and the corresponding pointers are in an array
running from high to low, with free-byte pointers for both.
COMPRESSED index records look like 'bytes-to-clone byte, fresh-bytes
count, fresh data bytes, pointer'.
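For the uncompressed case, walking one index bucket in C would go roughly like
this (a sketch; the offsets, the pointer-size bits and the 4-byte end-of-bucket
overhead are assumptions to verify against ANAL/RMS/INT on a good file):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* List the (key, VBN) pairs of one UNCOMPRESSED index bucket.
 * Raw keys start at offset 14 and grow upward until the free-space word
 * (offset 4); the matching VBN pointers sit at the high end of the bucket
 * and grow downward, so the FIRST key pairs with the LAST pointer.
 * Pointer size is taken from bits 3-4 of the control byte at offset 13
 * (0 -> 2 bytes, 1 -> 3, 2 -> 4); the last 4 bytes of the bucket are
 * assumed to be overhead (check byte + pointer-array free-space word). */
static void list_index_bucket(const unsigned char *buf, int bks, int key_size)
{
    uint32_t size = (uint32_t)bks * 512;
    uint16_t key_free;
    memcpy(&key_free, buf + 4, 2);

    int ptr_size = 2 + ((buf[13] >> 3) & 3);
    int nkeys    = (key_free - 14) / key_size;

    const unsigned char *key = buf + 14;
    const unsigned char *ptr = buf + size - 4 - ptr_size;     /* last pointer  */

    for (int i = 0; i < nkeys; i++, key += key_size, ptr -= ptr_size) {
        uint32_t vbn = 0;
        memcpy(&vbn, ptr, ptr_size);                          /* little-endian */
        printf("entry %3d: high key %.8s...  ->  VBN %u\n",
               i, (const char *)key, (unsigned)vbn);
    }
}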

Use ANAL/RMS/INT
use my ZAP tool

Too bad this is not a commercially critical file.
I love to do this stuff for real money.
Fame alone just doesn't cut it. :-)

Regards,
Hein van den Heuvel
HvdH Performance Consulting.

JF Mezei

Mar 14, 2007, 4:58:56 AM
Hein RMS van den Heuvel wrote:
> Read the second part of my RMS_TUNING.PPT on the OpenVMS V6.0 Freeware
> [RMS_TOOLS]
> Read it again.

Reading it again, after my twiddling around in the file, was good.

> more specifically entire clusters of data are wiped out.

Actually, it can be portions of a cluster. The LD driver wrote over "random"
sequences of blocks; it didn't know to align to 30 block boundaries (the bucket
size for that file). So I may have a bucket that starts off valid but only has
10 valid blocks with the remainder containing junk. And conversely, I may have a
cluster which has only its first block invalid, but still contains valid records
after it. Your presentation does not cover the type of failure I experienced :-(


The primary and only index is uncompressed. It appears to be in 2 levels. In
ana/rms/interactive, the first index level appears to have 8 entries (last one
has a key filled with %xFF).

I was able to DOWN DOWN from each of the 8 entries, and then "REST" to display
all keys in that branch. Does this mean that my index is actually intact ?


Is it correct to state that traversing the index will only give me the highest
key value of each data bucket ? (but would give me a list of VBNs where each
bucket starts) ?

------------

In my case, the DATA_KEY_COMPRESSION is off, but DATA_RECORD_COMPRESSION is on.

So, when I end up extracting the variable length record from that raw file, it
will be compressed. Is the compression logic publicly documented ? complex ? (I
looked at your record_compression.c example, and it wasn't obvious what logic is
used to uncompress the data).


If I write the compressed raw records to a sequential file, then use CONVERT to
load it to a new indexed file (defined with record-compression:OFF), would I be
able to tweak a bit to enable record compression (similar to a SET FILE/ATTRIB)
so that RMS would then successfully decompress each record ?


I have seen mention of RRV a few times. What does this mean ?

> Valid data bucket are easy to recognize by the 14 bucket header bytes

what are the check bytes ? Is this a checksum ?


In your sample programs, I noticed that you calculate total record length by
adding key length to the record length.

You have the following structure:
#pragma nomember_alignment
struct record {unsigned char flag;
short id, rrv_id;
int vbn;
short length;
unsigned char key_length, key_count, key[]; };


What is the key count ?

In my case however, taking the 2 bytes preceding the start of the key seems to give me
a record length which approaches what DUMP/RECORD gives me (but off by a number
of bytes, I assume due to record decompression). If I choose the length that is 4
bytes before the key data, I get some "20300" integer number for each record.

Are there different record header layouts for different types of indexed files ?
(this is a prolog 3, single key file).


Also, is it possible that with record compression on, the stored record will be
longer by a couple of bytes than the actual record ?

> The index records are never updated for a delete, as they point to a
> bucket, not to a record.

But if you delete the rightmost record in a bucket, wouldn't the index be
updated to contain the key of the new rightmost record ? Or does the key of the
deleted record remain in the index ?


> Too bad this is not a commercially critical file.
> I love to do this stuff for real money.

Yeah, but it is a learning experience for me :-) :-( :-(


Since it was a bug in the LD driver provided with 8.3 that caused this
corruption, perhaps HP might pay you to fix this hobbyist system's indexed file
???? :-) :-) :-)

Hein RMS van den Heuvel

Mar 14, 2007, 8:03:27 AM
On Mar 14, 4:58 am, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
> Hein RMS van den Heuvel wrote:
>
> > more specifically entire clusters of data are wiped out.
> Actually, it can be portions of a cluster. The LD drive wrote over "random"
> sequences of blocks,

If I recall correctly it might have used LBN -> VBN mapping from one
disk onto another.
So the cluster size may have been different, and as you say it might
have only written a few blocks, implying an update or append. The most
likely case though is a series of blocks starting at a cluster
boundary on the real target disk. No, indeed it would not have known
about the bucket size.

> So I may have a bucket that starts off valid but only has
> 10 valid blocks with the remainder containing junk.

That's what I was saying, and for those you patch the NEXT FREE byte
in the header.

> And conversely, I may have a
> cluster which has only its first block invalid, but still contains valid records
> after it. Your presentation does not cover the type of failure I experienced :-(

That's the hardest to recover from. I do believe I mentioned that.
You have to reconstruct a bucket header. And then you have to fill the
space up to the start of the good record with a deleted record (with
the right key in case of compression).
I only needed to do that once ever... for Andy Goldstein's MAIL.MAI
file.

> The primary and only index is uncompressed. It appears to be in 2 levels.

It will tell you. But 2 is most likely.

> In ana/rms/interactive, the first index level appears to have 8 entries (last one
> has a key filled with %xFF.

Sure.


> I was able to DOWN DOWN from each of the 8 entries, and then "REST" to display
> all keys in that branch. Does this mean that my index is actually intact ?

Sounds like it. So now you know the VBN of each potential data bucket.

> Is it correct to state that traversing the index will only give me the highest
> key value of each data bucket ?

Yes.

> (but would give me a list of VBNs where each bucket starts) ?

Yes.

> So, when I end up extracting the variable length record from that raw file, it
> will be compressed. Is the compression logic publicly documented ? complex ? (I
> looked at your record_compression.c example, and it wasn't obvious what logic is
> used to uncompress the data).

yes, no, no, right tool.

The compression is just a repeating-byte count for repeats > 5 (or so).
Each chunk then is: length, data bytes, last byte, repeat count for the
last byte's value.
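In C, expanding the data portion of one record would go roughly like this (a
sketch, untested; it assumes each segment is a little-endian word with the number
of plain data bytes, the data bytes themselves, then a byte saying how many extra
copies of the last data byte to append):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expand the compressed data portion of one record into 'out'.
 * Segments repeat until out_len bytes have been produced.
 * Returns the number of compressed bytes consumed, or -1 on trouble. */
static int expand_record_data(const unsigned char *in, size_t in_len,
                              unsigned char *out, size_t out_len)
{
    size_t ip = 0, op = 0;

    while (op < out_len) {
        uint16_t ndata;
        if (ip + 3 > in_len) return -1;
        memcpy(&ndata, in + ip, 2);                  /* little-endian word */
        ip += 2;
        if (ndata == 0 || ip + ndata + 1 > in_len || op + ndata > out_len)
            return -1;
        memcpy(out + op, in + ip, ndata);            /* the plain bytes    */
        ip += ndata;
        op += ndata;

        unsigned repeat = in[ip++];                  /* truncated repeats  */
        if (op + repeat > out_len) return -1;
        memset(out + op, out[op - 1], repeat);       /* re-append them     */
        op += repeat;
    }
    return (int)ip;
}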

> If I write the compressed raw records to a sequential file, then use CONVERT to
> load it to a new indexed file (defined with record-compression:OFF), would I be
> able to tweak a bit to enable record compression (similar to a SET FILE/ATRRIB)
> and RMS would then succesfully decompress each record ?

Yuck. It would be easier to patch up the data buckets in the broken
file. IMHO.
Or try the -b option for http://h71000.www7.hp.com/freeware/freeware60/rms_tools/bonus/buckets.c

> I have seen mention of RRV a few times. What does this mean ?

It's a 4-byte VBN + 2 byte ID. It's the RFA for where the Record
started out. Not to worry.

> > Valid data bucket are easy to recognize by the 14 bucket header bytes
> what are the check bytes ? Is this a checksum ?

Just a simple, wrapping, modification counter. Starts out 0 on a
convert.
Often corresponds to the number of records in the bucket (no updates,
no deletes, just $PUTs).

> In your sample programs, I noticed that you calculate total record length by
> adding key length to the record length.

The primary key is extracted from the record and moved to the front of
the rest of the record data, which is a no-op for most files.

> You have the following structure:
> #pragma nomember_alignment
> struct record {unsigned char flag;
> short id, rrv_id;
> int vbn;
> short length;
> unsigned char key_length, key_count, key[]; };
>
> What is the key count ?

The number of bytes to re-use from the key value from the PRIOR
record.
Only present with KEY (or INDEX) compression enabled. Not present in
your case.
My record definition is sloppy. I adjust at runtime based on the
flags.

> Also, is it possible that with record compression on, the stored record will be
> longer by a couple of bytes than the actual record ?

Yes, if no sufficiently long sequence of repeating bytes was detected,
then having compression adds 3 bytes. (length word + repeat count
byte).

> But if you delete the rightmost record in a bucket, wouldn't the index be
> updated to contain the key of the new rightmost record ? Or does the key or the
> deleted record remain in the index ?

The bucket remains permanently associated with the high key value it
obtained on the bucket split (which may have been the simple
transition from the all-FF high key for the last bucket to the real key).

Hein.

JF Mezei

Mar 18, 2007, 6:15:02 PM
Hein RMS van den Heuvel wrote:
(for situation where the leftmost blocks of a bucket have been zapped, but
rightmost contain valid records)

> You have to reconstruct a bucket header. And then you have to fill the
> space up to the start of teh good record with a deleted record (with
> the right key in case of compression).

Would it not be simpler for me to simply shift the valid right end of the bucket
to the left and update the bucket's first free byte ?

If the file is defined as having 2000 max rec length (variable), can I cheat and
create a 5000 byte deleted record to fill a void ?


> Yuck. It would be easier to patch up the data buckets in the broken
> file. IMHO.

I have come to the same conclusion. However, whichever way I choose, I still need
to be able to parse the file.

The bucket structure seems simple enough and the _bkt structure in bktdef seems
to map well to reality inside the file.


Question: is there a way to tell from the bucket header what type of bucket I am
dealing with ? (index of various shapes/forms, data etc ?)

I have learned about the check byte. First byte of the bucket must match the
last byte of the bucket. And for my purposes, the value of that byte is meaningless.

However, there is mention of "end of bucket overhead" in the include files.
There are 3 constants defined, with values of 2 or 4. Apart from the last byte
of a bucket, are there preceding bytes that need to be set when creating a
bucket ? (I will likely have to synthesize bona fide buckets.)

Is it legal to have a bucket without records in it (aka: the first free byte is at
offset 14, right after the end of the bucket header) ? Or would I need to create
at least one dummy deleted record ?


---

When using CONVERT, does it just find the first data bucket and then walk through the
data buckets using the "next bucket VBN" field in the bucket header ? Or
will it be parsing the index ?

The reason I ask:

say bucket 27 points to bucket 28 as the next bucket. But buckets 28 through 32
are corrupt. I could then just patch bucket 27 to point to bucket 33 as the next
one. Walking the buckets would avoid bad buckets. But the index would still have
entries pointing to bad buckets. Would convert complain ?

(I realise buckets are numbered with VBNs).

================

I am having problems parsing the records within a bucket.
IRCDEF defines 6 bytes of bitflags, 1 filler byte.

For variable length files of prologue 3, it defines an overhead of 11 bytes for
the record header. And this matches what I am seeing in the file. And I know
that bytes 10-11 represent the record length. But that still leaves 3
undefined bytes. How can I find out what they are ?

Also, if the 2 bytes containing the length say the record is 100 bytes, does
this mean that the next record header begins at the 101st byte ? Or is there an
"end of record" overhead with some control bytes etc ?

Is there a way to find a clear and complete description of the record header
that is guaranteed to fit what I actually have in the file ?


Now, I was thrown off by the compression business. Turns out that the
record key is not compressed but the rest of the record is. So the 2 bytes
before the start of the record (key is at offset 0).

-------------------

OK, I have looked at the data for an index bucket. (uncompressed index in my case)

It appears to have a 14 byte bucket header.

Followed immediately by a raw sequence of fixed length key values in ascending
order with no separator between each key value.

The bucket header's "first free byte" points to the first byte after the list of
key values.

At the end of the bucket, going backwards, there is the bucket's check byte, 3
unknown bytes, and then a sequence of 2 byte VBNs that correspond, in reverse
order, to the key values at the start of the bucket. So the first key value at the
start of the bucket goes with the VBN that is last in the list of VBNs.

Is this correct ?

Questions:

Is there documentation for those last 3 bytes in the bucket ?
Is there a count of keys in that bucket ? Or must I calculate it myself as
((first free byte - 14) / key size) ?


It is pretty interesting to see the amount of redundancy and checks that were
designed into the indexed files.

Hein RMS van den Heuvel

Mar 18, 2007, 9:30:00 PM
On Mar 18, 6:15 pm, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
> Hein RMS van den Heuvel wrote:

This is turning into quite the lesson!

> (for situation where the leftmost blocks of a bucket have been zapped, but
> rightmost contain valid records)

> > You have to reconstruct a bucket header. And then you have to fill the
> > space up to the start of the good record with a deleted record

> Would it not be simpler for me to simply shift the valid right end of the bucket


> to the left and update the bucket's first free byte ?

Yes, with no compression and if you want to write a program.
With my little ZAP tool and a couple of well aimed deposits faking a
deleted record is not too hard.

> If the file is defined as having 2000 max rec length (variable), can I cheat and
> create a 5000 byte deleted record to fill a void ?

Yes.

> I still be to be able to parse the file.

Several of my tools implement parts of that. ANAL/RMS does it all.

> Question: is there a way to tell from the bucket header what type of bucket I am
> dealing with ? (index of various shapes/forms, data etc ?)

Just close your eyes and let the bits speak to you.
Seriously, you quickly derive a 'feel' for what you are looking at.
Biggest helpers for your specific question are: BKT$B_INDEXNO and
BKT$B_LEVEL.

> I have learned about the check byte. First byte of the bucket must match the
> last byte of the bucket. And for my purposes, the value of that byte is meaningless.

That's just ignorance.
For you it _might_ mean whether the same bucket from a backup was
changed since that backup.
You do have some backup right?

> However, there is mention of "end of bucket overhead" in the include files.
> there are 3 constants defined, with values of 2 or 4. Apart from the last byte
> of a bucket, are there preceding bytes that need to be set when creating a
> bucket ? (I will likely have to synthesize bona fide buckets.)

Nothing for data buckets, only for index buckets.
Check ANAL/RMS/INT for a good file carefully.
My favourites for this are SYSUAF and SYS$LIBRARY:VMS$PASSWORD_DICTIONARY.DATA.
Once you are in a (the one and only?) index bucket, do you see " VBN
Free Space Offset"?
Now DUMP (in ANAL/RMS, or better with plain $DUMP/BLOC=(COUN=1,STA=vbn+bks-1)).
See that offset? See how it is word aligned? That's your 4 bytes, I
suspect.

> Is it legal to have a bucket without records in it ?

No, just skip it.

Remember, your goal is NOT to create a great new indexed file.
Your goal should be an indexed file which is just good enough to
convert!

> When using CONVERT it just find the first data bucket and then walk through the
> data buckets by using the "next bucket VBN" field in the bucket header ? Or
> will it be parsing the index ?

It should. IMHO. But it actually walks the left edge of the index
structure FOR THE FIRST BUCKET.
After that it just follows bkt$l_nxtbkt.

> I am having problems parsing the records within a bucket.
> IRCDEF defines 6 bytes of bitflags, 1 filler byte.

No. That's a union. All different ways to look at the same byte.
Check with the much simpler:
$libr/extr=$ircdef/out=tt: sys$library:lib.mlb

> For variable length files of prologue3, it defines an overhead of 11 bytes for
> the record header. And this matches what I am seeing in the file. And I know
> that the bytes 10-11 represent the record length. But that is still leaving 3
> undefined bytes. How can I find out ?

Study ANAL/RMS/INT output. DUMP the bucket.
But it really is simple:
control-byte
record-id-word
RRV-6-byte = record-id-word + vbn-long for the first location of this
record, which may be the current one.
record-length-word
= 1 + 2 + 6 + 2 = 11
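Or, as a C struct (the same layout as the struct quoted earlier, minus the
key_length/key_count bytes that only appear with key compression; a sketch to
check against a DUMP of a real bucket):

#pragma nomember_alignment
struct p3_record_header {       /* prologue 3 data record, key compression OFF */
    unsigned char  control;     /* flag byte: hex 02 for a plain live record   */
    unsigned short id;          /* record id within this bucket                */
    unsigned short rrv_id;      /* RRV: record id ...                          */
    unsigned int   rrv_vbn;     /* ... plus VBN of the record's first home     */
    unsigned short length;      /* length of what follows                      */
};                              /* 1 + 2 + 6 + 2 = 11 bytes                    */
/* the key comes next (uncompressed here), then the (possibly compressed) data */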

> Also, if the 2 bytes containing the length say the record is 100 bytes, does
> this mean that the next record header begins at the 101th byte ?

Yes. Surely that is trivial to verify in a good file with.. dare I say
it again... ANAL/RMS/INT


>Or is there an "end of record" overhead with some control bytes etc ?

Do you see any? Do you see any definition for any?

> Is there a way to find a clear and complete description of the record header
> that is guaranteed to fit what I actually have in the file ?

Send money this way.

> Now, I was thrown off by the compression business. Turns out that that the
> record key is not compressed but the rest of the record is.

Sure. Could be.


> At the end of the bucket, going backwards, there is the bucket's check byte, 3
> unknown bytes,

Hey, there's your 4 byte overhead.
But it is not unknown. It is the counterpart of next-free-byte for
the pointer storage array.

> and then a sequence of 2 byte VBNs that correspond, in reverse
> :
> Is this correct ?

Yes but inaccurate. Whether it is 2 bytes depends on the size of the
file and is encoded in: bkt$v_ptr_sz


> Questions:
- answered above

> It is pretty interesting to see the amount of redundancy and checks that were
> designed into the indexed files.

Yeah, and if those fail you get RMS$_CHK, bucket format check failed
for VBN = 'nnn'.
And that is a bit of a problem for RMS, because 99.9% of the time
RMS did not cause the corruption but is just the messenger. Other
systems might just croak or feed bad data to the application and never
get the blame. But as RMS reports the problem, it often gets the blame!

Cheers,
Hein.

JF Mezei

Mar 19, 2007, 12:56:52 AM
Hein RMS van den Heuvel wrote:
> This is turning into quit the lesson!

And it is very appreciated. And I am sure that someone bitten by a similar
problem later on will appreciate your having taken the time to document this for
me and Mr Google.

> Yes, with no compression and if you want to write a program.
> With my little ZAP tool and a couple of well aimed deposits faking a
> deleted record is not too hard.

The problem I have is to generate the list of bad blocks/buckets. This is a 50k
block file with a few thousand bad blocks. All the standard tools such as
ANA/RMS just choke at the first one.

And I also want an idea of exactly how much have lost from the file and record
which key ranges were lost. I have a 1 year old backup and may then see if any
of the missing records can be sourced from the old file.

> Several of my tools implenent parts of that. ANAL/RMS does it all.

ANA/RMS per se chokes at the first bad block. Ana/rms/interactive is very
useful to ensure my program gets the same values. But for a large file, manual
scanning doesn't work.


> Just close your eyes and let the bits speak to you.

Sorry, I don't have a sound card on my workstation :-( :-(

> Seriously, you quickly derive a 'feel' for what one is looking at.
> Biggest helpers for you specific question are: BKT$B_INDEXNO, BKT
> $B_LEVEL

Thanks. Will get to work on that.


> For you it _might_ mean whether the same bucket from a backup was
> changed since that backup.
> You do have some backup right?

A year old backup. Depending on where the corruption occurred, it may have
affected old records (emails) that can thus be recovered from the backup. For
more recent records, partial reconstruction can be made from the email file as
well as the private docdb records which have pointers to the corrupted file and
a few copies of the fields.


> > Is it legal to have a bucket without records in it ?
>
> No, just skip it.

> It should. IMHO. But it actually walks the left edge of the index
> structure FOR THE FIRST BUCKET.
> After that is just follows bkt$l_nxtbkt

Ok, thanks. So I can then safely modify the nxtbkt fields to skip over totally
empty buckets then.


> No. That's a union. All different ways to look at the same byte.

Ahh, many thanks.

> But is really is simple:
> control-byte
> record-id-word
> RRV-6-byte = record-id-word + vbn-long for first location of this
> record, which may be current.
> record-length-word
> = 1 + 2 + 6 + 2 = 11

Ok. Many many many many thanks. Out of curiosity, is this documented somewhere
regular humans have access to ?

IRCDEF doesn't say anything about that.


> Yes. Surely that is trivial to verify in a good file with.. dare I say
> it again... ANAL/RMS/INT

I was going to create a test file with just textual data in it to see. My "bad"
file has a mix of binary and textual data in it, so it is very hard for me to
know where a record actually ends just by looking at the dump output.


> Hey, there's your 4 byte overhead.
> But iti is not unknown. It is the counterpart of next-free-byte for
> the pointer storage array.

I suspected as much, but the values seemed too high; I was thinking in terms of
offsets relative to the end of the bucket (small values) rather than relative to
the start of the bucket (high values, since this is a 30 block bucket).


> And that is a bit of a problem for RMS because in 99.9% of the time
> RMS did not cause a the correption but is just the messenger. Other
> systems might just croak or feed bad data to the application and never
> get the blame. But as RMS reports the problem is often gets the blame!

The ability to ring an alarm bell to warn of bad file structure is extremely
valuable. The ability to scan files to detect corruption is extremely valuable.
And this is a HUGE asset for VMS.

Had the LD driver problem happened on windows or Linux, I would have had no
tools to even draw a list of files that were zapped/damaged by some driver doing
rogue writes on the wrong physical disk drive.

Hein RMS van den Heuvel

Mar 19, 2007, 9:55:29 PM
On Mar 19, 12:56 am, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
> Hein RMS van den Heuvel wrote:
>
> > This is turning into quit the lesson!
>
> And it is very appreciated. And I am sure that someone bitten by a similar
> problem later on will appreciate your having taken the time to document this for
> me and Mr Google.

That's what I am counting on.
That's why I replied mostly with general descriptions, and not for the
specific file in question (which is a well known file).

> The problem I have is to generate the list of bad blocks/buckets. This is a 50k
> block file with a few thousand bad blocks. All the standard tools such as
> ANA/RMS just choke at the first one.

That's not really that big a file.
Attached you'll find a simple skeleton Macro program which will
happily brute-force its way through an indexed file looking for valid
data buckets, a block at a time, until it hits another one.
As written it will attempt to extract records, but that only works for
Prologue-1 uncompressed files.
So you'll have to modify it to perhaps just print a list of valid
bucket starts.
Or modify it to update the NEXT pointer in the last known good bucket
to point to any next good bucket.
They may be totally out of order, but a CONVERT/SORT will take care of
that.
The program takes liberties which may not be valid for the file.
For example it assumes that FAB$B_BKS = Data Bucket Size. This is
likely but not certain.
It could also readily be improved to read large chunks and use
pointers to move forwards, but why worry about 50,000 IOs between
friends !?
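That NEXT-pointer patching could look roughly like this in C (a sketch;
read_vbn/write_vbn are placeholders for whatever block I/O you use,
looks_like_data_bucket is the bucket sanity check sketched earlier, and how the
last bucket in the chain is marked is just an assumption here):

#include <stdint.h>
#include <string.h>

#define BKS   30                /* data bucket size in blocks for this file */
#define BKTSZ (BKS * 512)

int read_vbn (uint32_t vbn, void *buf, int blocks);
int write_vbn(uint32_t vbn, const void *buf, int blocks);
int looks_like_data_bucket(const unsigned char *buf, uint32_t vbn, int bks);

/* Walk the chain from 'start_vbn' (the first data bucket). Whenever the
 * NXTBKT longword (offset 8 in the header) leads to something that does
 * not pass the sanity check, scan forward a block at a time for the next
 * bucket that does, and patch the last good bucket to point at it. */
void relink_chain(uint32_t start_vbn, uint32_t eof_vbn)
{
    unsigned char good[BKTSZ], probe[BKTSZ];
    uint32_t good_vbn = start_vbn;

    if (!read_vbn(good_vbn, good, BKS)) return;

    for (;;) {
        uint32_t next;
        memcpy(&next, good + 8, 4);                  /* BKT$L_NXTBKT        */
        if (next == 0 || next > eof_vbn) break;      /* assume end of chain */

        uint32_t vbn = next;
        while (vbn <= eof_vbn &&
               (!read_vbn(vbn, probe, BKS) ||
                !looks_like_data_bucket(probe, vbn, BKS)))
            vbn++;                                   /* brute-force forward */
        if (vbn > eof_vbn) break;

        if (vbn != next) {                           /* chain was broken    */
            memcpy(good + 8, &vbn, 4);               /* patch NXTBKT        */
            write_vbn(good_vbn, good, BKS);
        }
        good_vbn = vbn;
        memcpy(good, probe, BKTSZ);
    }
}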

> And I also want an idea of exactly how much have lost from the file and record
> which key ranges were lost. I have a 1 year old backup and may then see if any
> of the missing records can be sourced from the old file.

Good plan. If the key is unique then a simple CONV/MERG/EXC can be
used to 'backfill' lost records with ancient copies.

> But for a large file, manual scanning doesn't work.

Right. You'll need to code up a tool, or send me money.


> partial reconstruction can be made from the email file as
> well as the private docdb records which have pointers to the corrupted file and
> a few copies of the fields.

That's often a good way to approach broken files.
External references may well be able to generate a list of key (sic)
values.

> So I can then safely modify the nxtbkt fields to skip over totally empty buckets then.

Ayup.

> Out of curiosity, is this documented somewhere regular humans have access to ?

It'll be in my book :-).
It was in the old RMS training 'Digital' used to offer.
I've done several Decus presentations covering it.
In front of me I have a paper copy of a 1988 document by Kirby McCoy
which was going to be a Part II in the VMS Guide to File Internals
covering RMS On-Disk structures. I have never found a machine readable
version of that :-(.
I do have a machine readable (.TXT :-) version of a Spring '85 DECUS
presentation Jim Teague (ex-ISAM developer) made. I'll stick that in the
next reply.


> The ability to ring an alarm bell to warn of bad file structure is extremely
> valuable. the ability to scan files to detect corruption is extremely valuable.
> And this is a HUGE asset for VMS.

Right.
Hein.

-< INDEXFILE_EXTRACT.MAR. Mostly template. Hope it works some still >-
--------------------------------------------------------------------------------
;
; Very simplistic tool to extract anything that looks like a valid
; data record from anything that looks like a valid data bucket
; from a prolog-1 indexed file.
; The output is likely to be out of sequence and needs to be sorted.
; The output may well contain duplicates and garbage but gets it
; right more often than not. At least that is what I recall, because
; the last time I needed it was way back when, in 1985 or so.
;
; Have fun, Hein van den Heuvel, 1985
;

FAB: $FAB FAC = <BIO,BRO,GET>, - ;Allow block I/O read AND write
FNA = FILENAME_BUF, - ;Address of filename string
SHR = UPI
RAB: $RAB FAB = FAB, - ;Associated FAB
ROP = <BIO>, - ;block I/O Processing
UBF = BUF ;Input buffer

OUTFAB: $FAB ALQ = 1000, - ;Initial allocation 1,000 blocks
DEQ = 1000, - ;Default extension 1,000 blocks
FAC = <PUT>, - ;Put access
FOP = <CBT,TEF>, - ;Contiguous best try, truncate at EOF
ORG = SEQ, - ;Sequential organization
RAT = CR, - ;Record attributes - Carriage Return
RFM = VAR, - ;Variable length records
FNA = FILENAME_BUF ;Address of filename string

;Output file Record Attributes Block

OUTRAB: $RAB FAB = OUTFAB,- ;FAB pointer
RAC = SEQ ;Sequential record access

.ENTRY START, ^M<>
PUSHAL FILENAME_SIZ
PUSHAQ FILENAME_PROMPT
PUSHAQ FILENAME
CALLS #3, G^LIB$GET_INPUT
MOVB FILENAME_SIZ, FAB+FAB$B_FNS ;Insert the filename size
$OPEN FAB=FAB ;Open the input file
BLBC R0, BYE ;See you later!
MOVZBL FAB+FAB$B_BKS, R10 ;Pick up bucket size
ASHL #9, R10, R11 ;Multiply by 512
MOVW R11, RAB+RAB$W_USZ ;Set up size of read
$CONNECT RAB=RAB ;Connect
BLBC R0, BYE ;See you later!

MOVAL FILENAME_BUF,R0 ;Point to input file name
MOVZBL FILENAME_SIZ,R1 ;Get its length
20$: CMPB (R0)+,#^A/./ ;Is it a period
BEQL 10$ ;Yes
DECL R1 ;No reduce the counter...
BGTR 20$ ;...and continue

10$: MOVL #^A/SEQ /,(R0) ;Stick in the new file type
SUBL2 #4,R1 ;Count the new characters
SUBL2 R1,FILENAME_SIZ ;Adjust the string length
MOVB FILENAME_SIZ,OUTFAB+FAB$B_FNS ;Insert the length into the FAB
MOVW FAB+FAB$W_MRS,OUTFAB+FAB$W_MRS ;Set maximum record size

$CREATE FAB=OUTFAB ;Open the sequential output file
BLBC R0, BYE ;See you later!
$CONNECT RAB = OUTRAB ;Connect the record stream to it
BLBC R0, BYE ;See you later!
CLRL R9 ;Valid data bucket counter
CLRL R8 ;Valid record counter
CLRL R7 ;Valid byte counter
CLRL RAB+RAB$L_BKT ;Init
BLBS R0, MAIN_LOOP ;Go for it!
BYE: RET

MAIN_LOOP:
ADDL2 R10, RAB+RAB$L_BKT ;Next Block RAB
10$: $READ RAB=RAB ;Read the bucket
BLBS R0, 20$
PUSHAL ENDOF_ERROR
CMPL R0,#RMS$_EOF
BNEQ 15$
BRW DONE
15$: PUSHAL READ_ERROR
BRW GIVE_ERROR
20$:
CMPW BUF+2, RAB+RAB$L_BKT ;Sample OK?
BNEQ 90$
CMPB BUF, BUF-1(R11) ;Checkbyte OK?
BNEQ 90$
CMPW R11, BUF+4 ;Next available reasonable?
BGTRU 21$
90$:
INCL RAB+RAB$L_BKT ;Next Block RAB
BRW 10$

;
; Valid bucket!
;
21$: TSTB BUF+12 ;Data level?
BNEQ MAIN_LOOP
INCL R9 ;Count a valid data bucket
MOVL #14, R5 ;Point to first record
30$: CMPB #02, BUF(R5) ;Valid data record?
BNEQ 40$ ;No, branch
INCL R8 ;Count a valid record
BITL #8191, R8 ;Multiple of 8192?
BNEQ 35$
JSB STAT
35$: MOVZWL BUF+9(R5), R2 ;Get number of bytes
ADDL2 #2, R5 ;Adjust for record length
MOVAB BUF+9(R5), OUTRAB+RAB$L_RBF ;Point to the record.
MOVW R2, OUTRAB+RAB$W_RSZ ;Adjust the record size in the RAB
ADDL2 R2, R7 ;Count the bytes!
ADDL2 R2, R5 ;Build next record pointer
$PUT RAB=OUTRAB ;Write the record
BLBS R0,40$
PUSHAQ WRITE_ERROR
CALLS #1, G^LIB$PUT_OUTPUT
pushl #ss$_debug
calls #1, g^lib$signal
nop
nop
40$: ADDL2 #9, R5 ;Point to next record.
CMPW R5, BUF+4 ;In used range?
BLSS 30$ ;Ok, go for next record!
BRW MAIN_LOOP ;Ok, go for next bucket!

DONE: $CLOSE FAB=OUTFAB ;Close the file.
$CLOSE FAB=FAB ;Close the file.
JSB STAT
$EXIT_S

STAT: DIVL3 R8, R7, R6 ;Average number of bytes.
PUSHL R6
PUSHL R7
PUSHL R8
PUSHL R9
MOVL #FAO_OUTBUF_L, FAO_OUTBUF_D ;init size
PUSHAL FAO_OUTBUF_D ;3
PUSHAL FAO_OUTBUF_D ;2
PUSHAL FAO_CTRSTR_D ;1
CALLS #7, G^SYS$FAO
PUSHAL FAO_OUTBUF_D
CALLS #1, g^LIB$PUT_OUTPUT
RSB


GIVE_ERROR:
CALLS #1, G^LIB$PUT_OUTPUT
BRW MAIN_LOOP

WRITE_ERROR: .ASCID "Error writing record"
READ_ERROR: .ASCID "Error reading VBN"
ENDOF_ERROR: .ASCID "Beyond EOF"
FILENAME_PROMPT:.ASCID "Please enter filename:"
FILENAME: .LONG 80,FILENAME_BUF ;input buffer descriptor
FILENAME_SIZ: .WORD 0 ;Receives length of filename
FILENAME_BUF: .BLKB 80
FAO_CTRSTR_A: .ASCII "Total count of valid data BUCKETS : !UL!/"
.ASCII "Total count of valid data RECORDS : !UL!/"
.ASCII "!UL Bytes of data, average record length: !UL"
FAO_CTRSTR_L = . - FAO_CTRSTR_A
FAO_CTRSTR_D: .LONG FAO_CTRSTR_L, FAO_CTRSTR_A
FAO_OUTBUF_L = 200
FAO_OUTBUF_A: .BLKB FAO_OUTBUF_L
FAO_OUTBUF_D: .LONG FAO_OUTBUF_L, FAO_OUTBUF_A
BUF:: .BLKB 512*64
.END START


Hein RMS van den Heuvel

Mar 19, 2007, 9:58:47 PM
On Mar 19, 12:56 am, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:
> Out of curiosity, is this documented somewhere regular humans have access to ?

See Jim Teague's Spring '85 DECUS presentation below.

Hein.


SPRING 1985 U.S. DECUS
----------------------


Overall RMS ISAM File Structure
-------------------------------


Prolog - The prolog contains important file-wide information and is
always in VBN 1. The most important information it
contains is the size of the data buckets, the VBN where
the area descriptors begin, the global buffer count, and
the prolog version number. Currently allowable prologs
are 1, 2 and 3.

Key Descriptors - There is a key descriptor for each key defined in the
file. The first key descriptor is in VBN 1, and overlays
the prolog. Key descriptors provide information about
each key in the file such as where the key appears in
the record, number of segments, length of each segment,
etc. Things like the root VBN, root level, null
character, compression flags are also there, along with
a pointer to the next key descriptor. If there is more
than one key, then the second key descriptor begins in
VBN 2.

Area Descriptors - These descriptors begin in first VBN after the last
VBN to contain a key descriptor, and contain information
about the areas of the file.

Index and Data Buckets - The index and data buckets appear after the area
descriptors. RMS ISAM files have their index and data
buckets in a B-tree arrangement. The root (or top)
index bucket contains a high key value and a pointer for
each bucket below it in the structure. Buckets at that
level contain similar keys and pointers to buckets at the
next lower level. At the bottom level (level 0, or the
data level) appear the data records. Records at the primary
index data level contain the actual data bytes of the
records in the file. Records at the secondary index data
level (SIDRs) contain secondary key values and pointers
to primary index data records with the corresponding
alternate key value. Bucket levels are numbered from
0 (at the data, or bottom level) upwards to the root level.

Prolog Structure
----------------

+---------------------------------------------------------------------------+
/                             unused (11 bytes)                             /
+------------------+                                                        +
!  PLG$B_DBKTSIZ   !                                                        !
+------------------+--------------------------------------------------------+
!                                  unused                                   !
+--------------------------------------------------------+------------------+
/                    unused (85 bytes)                    !   PLG$B_FLAGS   !
+------------------+------------------+                   +------------------+
!    PLG$B_AMAX    !    PLG$B_AVBN    !                                      /
+------------------+------------------+-------------------------------------+
!                unused               !             PLG$W_DVBN              !
+-------------------------------------+-------------------------------------+
!                                PLG$L_MRN                                  !
+---------------------------------------------------------------------------+
!                                PLG$L_EOF                                  !
+-------------------------------------+-------------------------------------+
!              PLG$W_GBC              !            PLG$W_VER_NO             !
+-------------------------------------+-------------------------------------+

* Note that the prolog structure overlays the key descriptor for the
primary key

* PLG$B_FLAGS, PLG$L_MRN, and PLG$L_EOF are only used in relative
files

* PLG$B_AVBN - VBN of first area descriptor

* PLG$B_AMAX - maximum number of areas

* PLG$W_DVBN - first data bucket VBN

* PLG$W_VER_NO - prolog version number

* PLG$W_GBC - default global buffer count


Key Descriptor
---------------

+---------------------------------------------------------------------------+
!                               KEY$L_IDXFL                                 !
+------------------+------------------+-------------------------------------+
!   KEY$B_LANUM    !   KEY$B_IANUM    !              KEY$W_NOFF             !
+------------------+------------------+------------------+------------------+
!  KEY$B_DATBKTSZ  !  KEY$B_IDXBKTSZ  !  KEY$B_ROOTLEV   !   KEY$B_DANUM    !
+------------------+------------------+------------------+------------------+
!                              KEY$L_ROOTVBN                                !
+------------------+------------------+------------------+------------------+
!  KEY$B_NULLCHAR  !  KEY$B_SEGMENTS  !  KEY$B_DATATYPE  !   KEY$B_FLAGS    !
+------------------+------------------+------------------+------------------+
!           KEY$W_MINRECSZ            !   KEY$B_KEYREF   !   KEY$B_KEYSZ    !
+-------------------------------------+------------------+------------------+
!            KEY$W_DATFILL            !            KEY$W_IDXFILL            !
+-------------------------------------+-------------------------------------+
!           KEY$W_POSITION1           !           KEY$W_POSITION            !
+-------------------------------------+-------------------------------------+
!           KEY$W_POSITION3           !           KEY$W_POSITION2           !
+-------------------------------------+-------------------------------------+
!           KEY$W_POSITION5           !           KEY$W_POSITION4           !
+-------------------------------------+-------------------------------------+
!           KEY$W_POSITION7           !           KEY$W_POSITION6           !
+------------------+------------------+------------------+------------------+
!   KEY$B_SIZE3    !   KEY$B_SIZE2    !   KEY$B_SIZE1    !    KEY$B_SIZE    !
+------------------+------------------+------------------+------------------+
!   KEY$B_SIZE7    !   KEY$B_SIZE6    !   KEY$B_SIZE5    !   KEY$B_SIZE4    !
+------------------+------------------+------------------+------------------+
/                          KEY$T_KEYNAM (32 bytes)                          /
+                                                                           +
!                                                                           !
+---------------------------------------------------------------------------+
!                               KEY$L_LDVBN                                 !
+------------------+------------------+------------------+------------------+
!   KEY$B_TYPE3    !   KEY$B_TYPE2    !   KEY$B_TYPE1    !    KEY$B_TYPE    !
+------------------+------------------+------------------+------------------+
!   KEY$B_TYPE7    !   KEY$B_TYPE6    !   KEY$B_TYPE5    !   KEY$B_TYPE4    !
+------------------+------------------+------------------+------------------+

Key Descriptor (continued)
--------------


* KEY$L_IDXFL - VBN for next key descriptor

* KEY$W_NOFF - Offset to next key descriptor

* KEY$B_IANUM - index area number

* KEY$B_LANUM - level 1 index area number

* KEY$B_DANUM - data level area number

* KEY$B_ROOTLEV - Root level: height of index tree

* KEY$B_IDXBKTSZ - index bucket size

* KEY$B_DATBKTSZ - data bucket size

* KEY$L_ROOTVBN - VBN of root bucket

* KEY$B_FLAGS - duplicates (bit 0), change key (1), null key (2),
index compression (3), index uninitialized (4), key compression (6),
record compression (7)

* KEY$B_DATATYPE - data type for key

* KEY$B_SEGMENTS - number of segments in key

* KEY$B_NULLCHAR - null character if specified

* KEY$B_KEYSZ - key size

* KEY$B_KEYREF - key of reference

* KEY$W_MINRECSIZ - minimum record size

* KEY$W_xxxFILL - index and data fill values

* KEY$W_POSITIONx, KEY$B_SIZEx - beginning positions and sizes
of up to 8 key segments

* KEY$T_KEYNAM - key name (ASCII counted string)

* KEY$L_LDVBN - first data bucket VBN
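Transcribed into a C struct (a sketch; offsets assume the picture above reads
right to left within each longword and that the fields are byte-packed — the
fields a recovery tool cares about are the root VBN, bucket sizes and key size):

#pragma nomember_alignment
struct key_descriptor {            /* overlays VBN 1 for the primary key       */
    unsigned int   idxfl;          /* KEY$L_IDXFL    VBN of next key descriptor */
    unsigned short noff;           /* KEY$W_NOFF     offset to next descriptor  */
    unsigned char  ianum;          /* KEY$B_IANUM    index area number          */
    unsigned char  lanum;          /* KEY$B_LANUM    level 1 index area number  */
    unsigned char  danum;          /* KEY$B_DANUM    data level area number     */
    unsigned char  rootlev;        /* KEY$B_ROOTLEV  height of index tree       */
    unsigned char  idxbktsz;       /* KEY$B_IDXBKTSZ index bucket size          */
    unsigned char  datbktsz;       /* KEY$B_DATBKTSZ data bucket size           */
    unsigned int   rootvbn;        /* KEY$L_ROOTVBN  VBN of root bucket         */
    unsigned char  flags;          /* KEY$B_FLAGS    compression bits etc.      */
    unsigned char  datatype;       /* KEY$B_DATATYPE                            */
    unsigned char  segments;       /* KEY$B_SEGMENTS                            */
    unsigned char  nullchar;       /* KEY$B_NULLCHAR                            */
    unsigned char  keysz;          /* KEY$B_KEYSZ    key size                   */
    unsigned char  keyref;         /* KEY$B_KEYREF   key of reference           */
    unsigned short minrecsz;       /* KEY$W_MINRECSZ                            */
    unsigned short idxfill;        /* KEY$W_IDXFILL                             */
    unsigned short datfill;        /* KEY$W_DATFILL                             */
    unsigned short position[8];    /* KEY$W_POSITIONn key segment positions     */
    unsigned char  size[8];        /* KEY$B_SIZEn     key segment sizes         */
    char           keynam[32];     /* KEY$T_KEYNAM                              */
    unsigned int   ldvbn;          /* KEY$L_LDVBN    first data bucket VBN      */
    unsigned char  type[8];        /* KEY$B_TYPEn                               */
};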


Area Descriptor
----------------

+------------------+------------------+------------------+------------------+
!  AREA$B_ARBKTSZ  !  AREA$B_AREAID   !   AREA$B_FLAGS   !      unused      !
+------------------+------------------+------------------+------------------+
!   AREA$B_AOP     !    AREA$B_ALN    !            AREA$W_VOLUME            !
+------------------+------------------+-------------------------------------+
!                               AREA$L_AVAIL                                !
+---------------------------------------------------------------------------+
!                               AREA$L_CVBN                                 !
+---------------------------------------------------------------------------+
!                               AREA$L_CNBLK                                !
+---------------------------------------------------------------------------+
!                               AREA$L_USED                                 !
+---------------------------------------------------------------------------+
!                               AREA$L_NXTVBN                               !
+---------------------------------------------------------------------------+
!                               AREA$L_NXT                                  !
+---------------------------------------------------------------------------+
!                               AREA$L_NXBLK                                !
+-------------------------------------+-------------------------------------+
!                unused               !             AREA$W_DEQ              !
+-------------------------------------+-------------------------------------+
!                               AREA$L_LOC                                  !
+---------------------------------------------------------------------------+
!                               AREA$W_RFI                                  !
+-------------------------------------+                                     +
!   <---- AREA$L_TOTAL_ALLOC          !                                     !
+-------------------------------------+-------------------------------------+
!                                     !      AREA$L_TOTAL_ALLOC  <----      !
+                                     +-------------------------------------+
!                unused               !
+-------------------------------------+                                     +
!            AREA$W_CHECK             !                                     !
+-------------------------------------+-------------------------------------+


Area Descriptor (continued)
---------------


* AREA$B_FLAGS - not used

* AREA$B_AREAID - area number

* AREA$B_ARBKTSZ - bucket size for area

* AREA$W_VOLUME - relative volume number

* AREA$B_ALN - extend allocation alignment

* AREA$B_AOP - alignment options: absolute alignment/hard (bit 0),
locate on cylinder (1), contiguous best try (5), contiguous (7)

* AREA$L_AVAIL - reclaimed bucket chain

* AREA$L_CVBN - starting VBN for current extent

* AREA$L_CNBLK - number of blocks in current extent

* AREA$L_USED - number of blocks used

* AREA$L_NXTVBN - next VBN to use

* AREA$L_NXT - starting VBN for next extent

* AREA$L_NXBLK - number of blocks in next extent

* AREA$W_DEQ - default extend quantity

* AREA$L_LOC - start LBN on volume

* AREA$W_RFI - related file ID (6 bytes)

* AREA$L_TOTAL_ALLOC - total block allocation

* AREA$W_CHECK - checksum


Prologue 3 Data Bucket Structure
--------------------------------


(Note that picture runs right to left)

+-------------------------------------+------------------+------------------+
!           BKT$W_ADRSAMPLE           !  BKT$B_INDEXNO   ! BKT$B_CHECKCHAR  !
+-------------------------------------+------------------+------------------+
!           BKT$W_NXTRECID            !          BKT$W_FREESPACE            !
+-------------------------------------+-------------------------------------+
!                              BKT$L_NXTBKT                                 !
+-------------------------------------+------------------+------------------+
!  <---- data records                 !   BKT$B_BKTCB    !   BKT$B_LEVEL    !
+-------------------------------------+------------------+------------------+


* BKT$B_CHECKCHAR - This first byte of the bucket should be identical to
the last byte of the bucket. Both are incremented every time the
bucket is modified. If the bucket check bytes are out of phase,
RMS will complain about a bucket format check error: what this
usually indicates is that something has interrupted the writing
of all blocks in a bucket.

* BKT$B_INDEXNO - The index number: 0 for primary; 1, 2, ... for
alternates.

* BKT$W_ADRSAMPLE - The low order word of the bucket VBN.

* BKT$W_FREESPACE - The first byte of unused space in the bucket.

* BKT$W_NEXTRECID - Next available record id.

* BKT$L_NXTBKT - Horizontal link to next bucket.

* BKT$B_LEVEL - Level in the index structure.

* BKT$B_BKTCB - Control byte. Can indicate, among other things, that this
is the last bucket in the structure at this level.
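The same header, transcribed into C (reading the picture right to left, 14 bytes
total; a sketch in the style of the struct quoted earlier in the thread):

#pragma nomember_alignment
struct bkt_header {
    unsigned char  checkchar;      /* BKT$B_CHECKCHAR  must match last byte   */
    unsigned char  indexno;        /* BKT$B_INDEXNO    0 = primary key        */
    unsigned short adrsample;      /* BKT$W_ADRSAMPLE  low word of bucket VBN */
    unsigned short freespace;      /* BKT$W_FREESPACE  first free byte        */
    unsigned short nxtrecid;       /* BKT$W_NXTRECID   next record id         */
    unsigned int   nxtbkt;         /* BKT$L_NXTBKT     next bucket VBN        */
    unsigned char  level;          /* BKT$B_LEVEL      0 = data level         */
    unsigned char  bktcb;          /* BKT$B_BKTCB      control byte           */
};                                 /* 14 bytes; data records follow           */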

Prologue 3 Data Record Structure (with key compression)
-------------------------------------------------------


(Note that picture runs right to left)


+---...---------------------------------------------------------------------+
| key +  | cnt | len | record |     RRV      | record | ctrl |
| data   |     |     | length |  VBN  |  id  |   id   | byte |
+---...---------------------------------------------------------------------+

  size:    byte  byte   word       6 bytes      word    byte


* The first key in each bucket is always fully expanded (but may be
repeating character truncated, however).

* The high order 6 bits of the record control byte indicate that
the record is deleted (bit 2), or is an RRV (bit 3).
ANALYZE/RMS will display the state and position of these
bits. The low order two bits are practically meaningless:
a typical non-deleted record that is not an RRV will have a
control byte of hex 02.

* Data records are assigned a record id to uniquely identify them
within the data bucket. These ids are assigned in the order
of insertion, and may have nothing to do with the physical
order of records within the data bucket. RRVs are in
essence "forwarding addresses" of records that are useful
only after the record has been displaced by a bucket split.
If a record has never been moved by a bucket split, then
its RRV points to itself.

* Prolog 3 compression features imply something that may not be
obvious about record lengths: even with fixed-length records,
if there is data or key compression, then there must be a
record length, since the length can vary based on the amount
of compression.

* "len" and "cnt" are key compression fields. "len" is the length
of the key (not including the "cnt" byte). "cnt" is the
count of front bytes, based on the previous key. Repeating
characters at the end are truncated. There is an example
given below.

* Prolog 3 data records ALWAYS have the key at the front (even if
there is no key compression). If the key field is in the
middle of the record, it is still extracted and placed at
the front for performance reasons (of course, it is inserted
at the proper point before the record is returned to the user.

Example of key compression using 6-byte string keys (see
explanation of "len" and "cnt" given above):

(Example runs right to left)


Second key in bucket has First key in bucket has
value "ABCDFF" value "ABCDEF"

key cnt len key cnt len
...data... 46 04 01... ...data... 464544434241 00 06 ...


Note here that with the second key fully expanded based on the
preceding key, we only come up with 5 characters because there has
been rear end truncation of repeating characters. We manufacture
enough bytes of the last character (F, or hex 46) and append them to
make the key 6 bytes long.
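A C sketch of that expansion (cnt bytes cloned from the previous key, len fresh
bytes, then the last character repeated out to the fixed key size; with the
example above it reproduces "ABCDFF"):

#include <string.h>

/* Rebuild a full fixed-size key from a front-compressed key record.
 * prev_key must already be the fully expanded previous key. */
static void expand_key(const unsigned char *prev_key,
                       unsigned cnt, unsigned len,
                       const unsigned char *fresh,
                       unsigned char *out, unsigned key_size)
{
    memcpy(out, prev_key, cnt);                /* front bytes from previous key */
    memcpy(out + cnt, fresh, len);             /* fresh bytes from this record  */
    unsigned have = cnt + len;
    if (have && have < key_size)               /* rear truncation: pad with the */
        memset(out + have, out[have - 1], key_size - have);  /* last character  */
}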


Data Record Compression
-----------------------

The compression algorithm is different for data records -- it is
strictly a repeating character truncation process. The data portion
of the record begins immediately after the key. Note that RMS will
not do repeating character truncation unless there are at least 5
repeating characters, to make sure that the extra overhead will not
negate the savings.

There are two fields associated with data record compression: a word
field which points to the compressed character, and a byte field that
tells how many repetitions of the character were truncated. The
general format of the record is: pointer word, data segment,
truncation count byte; pointer word, data segment, truncation count
byte, etc. An example of data record compression is given below.
Each set of {word, data, byte} is termed a compression segment.

A record with a data portion that looks like:

"ABBBBBBCDDDDDDDDDDD" (A, 7 Bs, C, 11 Ds)

will compress to two segments:

(Example runs right to left)


  count    data   pointer     count    data   pointer
  byte             word       byte             word      overhead

   0A      4443     0002       06      4241     0002       ...

Index Bucket Structure
----------------------

Index buckets look identical, except that the "index #" byte is not
necessarily 0, and neither is the "index level" byte. They of course
reflect the index number and the level in the index structure.

Index levels are numbered with the root level being the highest level,
and data levels always being level 0. Note that there is a data level
for alternate index structures as well that consists of a key and an
RRV pointer. The RRV pointer points to the primary data record with
that secondary key value.

RMS saves index bucket space for all prologs by minimizing the size of
the field needed to represent a bucket's VBN pointer. For prolog 3,
all VBN pointers in a particular index bucket are the same size,
maximized to the size necessary to represent the largest pointer in
the bucket. Bits 3 and 4 of the bucket control byte indicate the
pointer size for the bucket: 00 means two-byte pointers, 01 means
three-byte pointers, and 10 means four-byte pointers.

Note that if there is no index compression, RMS will do a binary search
through index buckets for the requested key value. This of course
includes binary and integer keys. This is why prolog 3 keeps all VBN
pointers in a given index bucket the same size.

Index compression is done exactly like key compression.

Prolog 3 index records are split into two parts, the key and the VBN
pointer. The keys are at the beginning of the index bucket, and the
VBN pointers are at the end of the index bucket. (A little silly,
but it's too late now.)


Secondary Index Data Records (SIDRs)
------------------------------------


"Data level" records of alternate indexes are called "SIDRs". A
SIDR
consists of a size word, followed by a key value and one or more
RRVs
with control fields. The control field indicates whether or not
the
record is deleted. If data key compression is enabled for this
index,
then the key will be compressed, otherwise not. The following
illustrates the layout of a SIDR record.

(Examples run right to left)

With key compression:


+--...--------------------------------------------------------------------+
| ... | RRV2 | ctl | RRV1 | ctl | key | cnt | len | size |
+--...--------------------------------------------------------------------+


Without key compression:

+--...-----------------------------------------------------+
| ... | RRV2 | ctl | RRV1 | ctl | key | size |
+--...-----------------------------------------------------+


o Record Operations (assumes no bucket splits)

- $PUT

1. Initialization/validation (if sequential access, is

key value of new record greater than that of last

record, etc.)

2. Position to point of insert (involves positioning

through the index structure by key, and leaves data

bucket locked)

3. Adjust "high set" appropriately

4. Build record overhead fields in bucket; move in

record itself

5. Lock new record

6. Update primary index (if necessary)

7. Unlock bucket

8. Insert alternate keys (if any) (extracted from user

buffer)

- $DELETE (assumes previous $GET/$FIND)

+ V4 $DELETE

1. Initialization/validation (is there a current

record, etc.)

2. Position by RFA to record (leaves bucket locked)

3. SAVE RECORD IN INTERNAL BUFFER

4. Delete the RRV (if any)

5. Delete the primary record itself

6. Unlock bucket

7. Delete all alternate keys, plucking values from

saved record

+ V3 $DELETE

1. Initialization/validation

2. Position by RFA to record (leaves bucket locked)

3. Delete RRV (if any)

4. Delete all alternate keys -- NOTE BUCKET IS STILL

LOCKED at this point (*)

5. Delete primary data record

6. Unlock bucket

- $UPDATE (assumes previous $GET/$FIND)

1. Initialization/validation

2. Position by RFA to record (leaves bucket locked)

3. If alternate keys will change, then:

1. Save old record

2. Unlock data bucket

3. Insert new SIDR entries

4. Reposition by RFA to record (leaves bucket locked

again)

4. Is new record size less than or equal to old size?

+ YES (smaller or same as old record)

1. Adjust high set appropriately

2. Insert record

+ NO (larger than old record)

1. Save record ID

2. Perform "pseudo-$DELETE"

3. Perform "pseudo-$PUT" (stuffing saved record

ID) (*)

5. Unlock bucket

6. Delete old SIDR entries (if any) using old record

buffer

o Bucket Splits (or How to Complicate Matters by a Few Orders

of Magnitude) (assumes old bucket is already locked)

1. Lock area

2. Allocate new bucket

3. Unlock area

4. Format new bucket

5. Set new bucket's next pointer to old bucket's next

pointer

6. Set old bucket's next pointer to the new bucket

7. Move data into new bucket

8. Write out new bucket

9. Scan old bucket for records past the split point that

have RRVs, and keep in a table.

10. Update free space in old bucket and unlock it

11. Update RRVs in table to point to new location of records.

This involves multiple positionings by RRV -- one for

each RRV to be updated.

Note that SIDRs are not updated! SIDR entries may point

to an RRV, which in turn points to the real record.

Because of the RRV updating process however, this level

of indirection never goes beyond one.

o Performance Issues

  - Bucket Size versus Record Size

    + Larger data buckets yield fewer index buckets, which results in
      fewer DIOs, but longer search times (CPU) at the data level
    + Smaller data buckets yield more index buckets, which results in
      more DIOs, but shorter search times at the data level
    + Keep in mind: Prolog 3 performs binary searches in index buckets
      IFF there is no index compression. Index bucket search times are
      greatly reduced, so the major consideration for CPU usage is data
      level searches.
    + What are you willing to trade? If you don't have memory to burn,
      then the trade is more significant. If you DO have lots of memory,
      you can have the best of both worlds:

  - Index Caching and Global Buffers. If you can use global buffers to
    cache the entire index structure, then EVERYBODY WINS! If you cache
    the entire index structure locally (multibuffer count), then the
    process wins at the expense of other processes (using more memory).
    Really better only in the nonshared case.

    Note that this argument for caching lots of the index structure
    falls apart for sequential access, where a small number of buffers
    is plenty (2).

  - Compression Considerations. Certain data record formats do NOT lend
    themselves to compression. Consider the case of a file created at
    the beginning of a year. The data records in this file consist of
    twelve blank subfields, with data inserted into one subfield each
    month throughout the year. OUCH!
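
To put the trade-offs above into rough numbers, a back-of-envelope
calculation of records per bucket, index fan-out, and index depth is
usually enough. The overhead constants in the sketch below (Python) are
assumptions chosen only to show the shape of the calculation; substitute
figures from ANALYZE/RMS_FILE output for a real file.

# Rough sizing sketch -- every overhead constant here is an ASSUMPTION.
import math

BLOCK = 512
BUCKET_OVERHEAD = 15    # assumed: bucket header plus trailing check byte
RECORD_OVERHEAD = 11    # assumed: per-record header bytes
INDEX_PTR = 4           # assumed: per-index-entry bucket pointer bytes

def sizing(bucket_blocks, avg_record, key_length, record_count):
    """Return (records per bucket, data buckets, index fan-out, index levels)."""
    usable = bucket_blocks * BLOCK - BUCKET_OVERHEAD
    recs_per_bucket = usable // (RECORD_OVERHEAD + avg_record)
    data_buckets = max(1, math.ceil(record_count / max(recs_per_bucket, 1)))
    fanout = usable // (key_length + INDEX_PTR)   # uncompressed index entries
    levels = max(1, math.ceil(math.log(data_buckets, max(fanout, 2))))
    return recs_per_bucket, data_buckets, fanout, levels

# Example: 30-block buckets, 400-byte records, 65-byte key, 50,000 records:
# print(sizing(30, 400, 65, 50_000))

Doubling the bucket size roughly doubles both the fan-out and the records
per bucket, which is what drives the "fewer index buckets, fewer DIOs,
more CPU per data-level search" trade-off described above.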

Hein RMS van den Heuvel

unread,
Mar 21, 2007, 2:01:45 PM3/21/07
to
On Mar 19, 12:56 am, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:

> The problem I have is to generate the list of bad blocks/buckets. This is a 50k
> block file with a few thousand bad blocks. All the standard tools such as
> ANA/RMS just choke at the first one.

Our Dutch VMS friend Fekko Stubbe has a very elaborate RMS File tool:
DIX
See: http://www.oooovms.dyndns.org/dix/

Relatively recently he built in ANALYZE RMS support.
Even more recently he built in a DECwindows interface to the RMS indexed
file internal structure.
That might just be the ticket for problems like this!
Best I can tell that part is not released yet, but it may already be
available on request.

Hein.

Hein RMS van den Heuvel

unread,
Mar 22, 2007, 1:31:58 PM3/22/07
to
Talking to myself again...


But I only just realized that with OpenVMS 8.2+ we got PATCH/ABSOLUTE
back from VAXland.
That makes my old (1991) standby automatic bulk patcher a little bit
viable again.

http://h71000.www7.hp.com/freeware/freeware60/rms_tools/bonus/indexed_file_patch.bas

I needed to update the source a little for new XAB/XABKEY defs in BASIC;
bruteforced that with the following changes:

38  %INCLUDE "$XABKEYDEF" %FROM %LIBRARY "SYS$LIBRARY:BASIC$STARLET.TLB"
53  MAP (XABKEY) XABDEF XAB
54  MAP (XABKEY) XABKEYDEF XABKEY
70  FIRST_VBN = XABKEY::XAB$L_DVB UNLESS FIRST_VBN   ! Use first data VBN?
201 DBS = XABKEY::XAB$B_DBS                          ! Pick up Data Bucket size.

For trial:
$ copy sys$library:vms$password_dictionary.data pass.tmp
$ mcr sys$login:block_copy null.tmp pass.tmp 50 100
$ mcr sys$login:block_copy null.tmp pass.tmp 50 200
$ run indexed_file_patch
File name? pass.tmp
Start VBN?
Number of buckets between status lines? 10
10 Buckets, 2 Reads. Current VBN = 111
*** VBN: 111 Address sample 0 ! Last valid VBN 99
20 Buckets, 3 Reads. Current VBN = 121
30 Buckets, 3 Reads. Current VBN = 131
40 Buckets, 3 Reads. Current VBN = 141
50 Buckets, 3 Reads. Current VBN = 151
Valid bucket 159 after 48 tries.
60 Buckets, 4 Reads. Current VBN = 183
*** VBN: 207 Address sample 0 ! Last valid VBN 195
70 Buckets, 4 Reads. Current VBN = 215
80 Buckets, 5 Reads. Current VBN = 225
90 Buckets, 5 Reads. Current VBN = 235
100 Buckets, 5 Reads. Current VBN = 245
110 Buckets, 5 Reads. Current VBN = 255
Valid bucket 255 after 48 tries.
120 Buckets, 7 Reads. Current VBN = 375
130 Buckets, 9 Reads. Current VBN = 495
140 Buckets, 11 Reads. Current VBN = 615
150 Buckets, 13 Reads. Current VBN = 735
160 Buckets, 15 Reads. Current VBN = 855
170 Buckets, 17 Reads. Current VBN = 975
180 Buckets, 19 Reads. Current VBN = 1095
190 Buckets, 21 Reads. Current VBN = 1215
200 Buckets, 23 Reads. Current VBN = 1335
*** There were 206 Data buckets in the file.
24 Reads done. Last bucket VBNs are: 1395 1407
*** Template patch command file PATCH.COM generated.
$ type patch.com
$PATCH/NONEW/ABSOLUTE pass.tmp
DEPOSIT ^D99 - 1 * 200 + 8 = ^D159 ! Last valid -> Next valid VBN
DEPOSIT ^D195 - 1 * 200 + 8 = ^D255 ! Last valid -> Next valid VBN
UPDATE
$ @patch
:

Now this did not work perfectly, notably because the bad data I put in
there was all zeroes, and thus the check byte for bucket 99 matched,
but the data was hosed.

Also, it just finds the next adjacent bucket, which after bucket
splits (and reclaims) may get you into a loop.
Still, it's a start if you need something desperately!

Hein.

JF Mezei

unread,
Mar 22, 2007, 5:39:26 PM3/22/07
to
Hein RMS van den Heuvel wrote:
> But I only just realized that with OpenVMS 8.2+ we got PATCH/ABSOLUTE
> back from VAXland


What happened with global buffers at 8.3 ?

RMS FILE ATTRIBUTES

File Organization: indexed
Record Format: variable
Record Attributes: carriage-return
Maximum Record Size: 2000
Blocks Allocated: 42528, Default Extend Size: 2010
Bucket Size: 30
File Monitoring: disabled
Global Buffer Count pre-V8.3: 15
Global Buffer Count post-V8.3: 0
Global Buffer Flags post-V8.3: none

I take it this would have to do with the XFC disk caching thing ? What are the
commands to manipulate the post 8.3 global buffer counts ?


Also, your explanations and my digging into the RMS file structure have
explained a lot about the behaviour when you delete records, etc.

I do have a couple of questions though:


Does an RMS file have a bitmap of available clusters ? I haven't encountered
mention of such a beast (yet).


Say the second index has 26 entries with keys AZ, BZ, CZ, DZ, up to
0xFFFF, so it points to 26 data buckets, each containing records.

Say you delete all B* records. The data bucket pointed to by the BZ entry in the
index is now empty. Does this mean that the "BZ" entry in the index gets
deleted too and the next time you insert a B* record, it gets added in the data
bucket pointed to by the CZ index entry ?


And what happens in the case of a CONVERT/RECLAIM ?
(this seems to be a very quick operation, so it obviously doesn't shuffle much
stuff in the file)

Ryan Moore

unread,
Mar 22, 2007, 8:10:29 PM3/22/07
to JF Mezei
On Thu, 22 Mar 2007, JF Mezei wrote:
> What happened with global buffers at 8.3 ?
>
> RMS FILE ATTRIBUTES
>
> File Organization: indexed
> Record Format: variable
> Record Attributes: carriage-return
> Maximum Record Size: 2000
> Blocks Allocated: 42528, Default Extend Size: 2010
> Bucket Size: 30
> File Monitoring: disabled
> Global Buffer Count pre-V8.3: 15
> Global Buffer Count post-V8.3: 0
> Global Buffer Flags post-V8.3: none
>
> I take it this would have to do with the XFC disk caching thing ? What are
> the commands to manipulate the post 8.3 global buffer counts ?

The 8.3 Release Notes or New Features document mentioned that Global
Buffers can now be in P2 space. This allows for a lot more global buffers
if they are needed.

The commands to modify the values are in the documents.

It doesn't have anything to do with XFC. XFC doesn't know anything about
RMS. XFC operates at the file system level and lower.

-Ryan

Hein RMS van den Heuvel

unread,
Mar 22, 2007, 8:16:38 PM3/22/07
to
On Mar 22, 5:39 pm, JF Mezei <jfmezei.spam...@vaxination.ca> wrote:

> What happened with global buffers at 8.3 ?

RTFM!

http://h71000.www7.hp.com/doc/83FINAL/6679/6679pro_009.html#xab1


> I take it this would have to do with the XFC disk caching thing ?

Absolutely not. Consider yourself spanked with a wet noodle for
disappointing the teacher.


> What are the commands to manipulate the post 8.3 global buffer counts ?

And another spanking.
RTFH!
$HELP SET FILE /GLOBAL
Also... NEW FDL SYNTAX:
FILE section
GLBUFF_CNT_V83 <number>
GLBUFF_FLAGS_V83 none


> Does an RMS file have a bitmap of available clusters ? I haven't encountered
> mention of such a beast (yet).

No, just the on-disk AREA descriptor, which closely matches an XABALL,
but not exactly.
It points to the next free VBN to use.
ANAL/RMS/INT... DOWN; DOWN; DOWN AREA;

> The data bucket pointed to by the BZ entry in the
> index is now empty. Does this mean that the "BZ" entry in the index gets
> deleted too

No, it stays. The bucket which held the B records remains permanently
associated with key BZ.

>and the next time you insert a B* record, it gets added in the data
> bucket pointed to by the CZ index entry ?

No, back into that BZ bucket.


> And what happens in the case of a CONVERT/RECLAIM ?

That's when it recursively expunges the emptiness.
In your example
- the next pointer in the A bucket would be made to point to C,
- the BZ index down pointer would be removed,
- the B bucket would be hooked up into the AREA free block list
- The B bucket would get a little text signature, to tell the world
where it has been
(I coded that up for fun after a problem with convert/reclaim :^)
- If removing that BZ index record empties that index bucket, it is
removed recursively.

> (this seems to be a very quick operation, so it obviously doesn't shuffle much
> stuff in the file)

It reads every single data bucket, so it can be slow for large files.
For some cases you may want to restrict it to a specific set of keys
(notably for keys where records start with a given status code,
transition ($UPDATE) to a different intermediate code, and then end
their lives with some final code).
CONVERT/RECLAIM has a /STATISTICS qualifier to monitor whether it did
anything useful.


Cheers,
Hein.

