
Re: help needed in reading a file using COBOL


Warren M

Jun 27, 2012, 7:45:11 AM
On Jun 27, 5:31 am, Tort <tort.karthike...@gmail.com> wrote:
> Hi - I have an input file (entry-sequenced) in the format below.
>
> D12345678901234567890123456789
> D12345678901234567890123456789
> D12345678901234567890123456789
> D12345678901234567890123456789
> D12345678901234567890123456789
> D12345678901234567890123456789
> D12345678901234567890123456789
> D12345678901234567890123456789
> D12345678901234567890123456789
> D12345678901234567890123456789
> D12345678901234567890123456789
> D12345678901234567890123456789
> T0000012
>
> The input file doesn't have any key (and may contain duplicates). There will be one or more "D" records (records starting with "D")
> and only one "T" record. The "T" record provides the number of "D" records.
>
> I need to delete the "T" record first, add a few more "D" records, and finally write the "T" record
> with the updated count.
>
> I don't need to do anything with the existing "D" records; they will be untouched.
>
> The COBOL program reads each record one by one until it finds the "T" record. Then it adds the "D" records
> and the "T" record.
>
> The issue is that the input file has more than 25 million "D" records. The program reads the file looking for the "T"
> record for 4-5 hours, and only after that does the rest of the processing happen.
>
> I would appreciate it if someone could suggest the best possible way to reduce the time taken to read. Is there a
> way to read the "T" record directly?
>
> Thanks

Use a Guardian open of the file, read backwards from EOF. The first
record you read will be the T record. Delete it. Guardian close the
file, then do your COBOL open, append your new D records and write the
new T record.

IMO, trailer records are a bad idea in a structured file.

Keith Dick

Jun 27, 2012, 8:50:47 AM
You cannot delete records from an entry-sequenced file, as far as I remember, so if the file actually is an Enscribe entry-sequenced file, you cannot do what your description says.

The read reverse that Warren suggests could be done with regular COBOL statements, no need for using Guardian procedures directly, but that doesn't solve the problem because you cannot delete records from entry-sequenced files.

Taking 4 to 5 hours to read 25 million records seems a little slow to me, but maybe that does not indicate anything is wrong.

Some sort of redesign is needed. Without knowing the overall requirements, I cannot say what the new design should be. Possibly the simplest change to consider would be to make the file be relative rather than entry-sequenced. You can delete records from relative files, and Warren's suggestion of using read reverse probably would work in a relative file as well. But that might not be the best choice possible, depending on your overall requirements. If you want more suggestions, tell us more about the overall requirements.
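
For reference, creating a relative Enscribe file is a one-liner in FUP; a minimal sketch, assuming the 30-byte fixed record length shown in the sample data (the file name is hypothetical):

    FUP CREATE $DATA.APP.RELFILE, TYPE R, REC 30

TYPE R selects relative organization and REC sets the record length.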

dimandja

Jun 27, 2012, 9:56:51 AM
Question to OP:

How does this file get on your system?

Are you creating it yourself?

Is it sent from another system?

Rather than trying to figure out how to deal with this (possibly poorly designed) file, see if you can use a different file design at the time this file is created.

wbreidbach

Jun 27, 2012, 10:11:35 AM
A very simple proposal to speed up processing:
1. Change the file to "buffered" using FUP.
2. Create a second file with the same attributes.
3. Add "RESERVE 3 AREAS" to your FILE-CONTROL paragraph, for the second file as well (a sketch follows below).
4. Open the first file as input and the second one as output. Read the first and write the second until you reach your T-record.
5. Write the new D-records and the updated T-record.
Using the RESERVE clause enables fast I/O, and usually this speeds up processing dramatically; we experienced factors of 50 and more. Unfortunately, this can only be used with input or output, not with I-O.
Another way would be to use RESERVE 2 AREAS, which only enables buffered I/O; this might work when opening a file as I-O.
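
A minimal sketch of steps 3 and 4, assuming Tandem COBOL85 and hypothetical file names (RESERVE is a standard FILE-CONTROL clause; on NonStop, 3 areas enables the block-oriented access described above):

       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT IN-FILE  ASSIGN TO "$DATA.APP.INFILE"
               ORGANIZATION IS SEQUENTIAL
               RESERVE 3 AREAS.
           SELECT OUT-FILE ASSIGN TO "$DATA.APP.OUTFILE"
               ORGANIZATION IS SEQUENTIAL
               RESERVE 3 AREAS.

The first file is then opened with OPEN INPUT IN-FILE, the second with OPEN OUTPUT OUT-FILE, per step 4.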

wbreidbach

Jun 27, 2012, 10:23:23 AM
Just a little add-on.
Files with headers and trailers are still very common for interchanging data. So I do not think there is any chance to change the file itself.

Keith: I do not remember any COBOL function that positions the current record pointer to EOF, which is something you would need in order to do a reverse read (my COBOL is a bit rusty). So you would need to use the Guardian procedures for that. Indeed, there is no way to delete a record from a sequential file; the only thing that could be done is a rewrite with a length of 0, but the record would still be there. So the only option would be to replace the T-record with the first new D-record.

I would use the approach from my previous post; the program could delete the old file and rename the new one at the end of processing.
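
If the purge-and-rename step is done outside the program, FUP can handle both; a sketch, file names hypothetical:

    FUP PURGE $DATA.APP.OLDFILE
    FUP RENAME $DATA.APP.NEWFILE, $DATA.APP.OLDFILE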

Keith Dick

Jun 27, 2012, 11:45:24 AM
Perhaps I am wrong about positioning to the end. A quick look at the START statement description indicates it can position to the end for relative and key-sequenced files, but apparently not for entry-sequenced files, unless I overlooked the part that allows that to be done. READ REVERSED should work for reading backwards, if you can get positioned to the end.

Tort

Jun 29, 2012, 5:13:43 AM
Thanks everyone for your suggestions!

Keith:
Currently the program reads the file, finds the "T" record using READ NEXT, and moves the data to a temporary variable (for the old count). Without deleting the "T" record, the program adds a few more "D" records and writes one more "T" record with the new count (so there are two "T" records in total). This program is executed once a year.

1) The input file (fixed record length) is received from a different system (a mainframe).
2) We add some more records to the file and FTP it to a different system.

Requirement: the downstream system needs only one "T" record, and the performance of the program must be improved. Currently the program runs for 7-8 hours.

The program needs to perform the following:
a) read the input file
b) find the "T" record, move it to a temporary variable, and delete it
c) add a few more "D" records
d) calculate the new total number of "D" records (old + new "D" records)
e) write the "T" record with the new count.

I can create a relative file and move the data into it from the input file before executing the program.

Thanks again.

wbreidbach

Jun 29, 2012, 6:03:48 AM
Just as posted previously: the long runtime is caused by record-oriented access, where each record is retrieved separately. Switching to block-oriented processing should speed up the processing significantly; just changing the file itself to "buffered" should have a noticeable effect. Changing to strictly sequential processing as required by "RESERVE 3 AREAS" should speed up the processing dramatically; we experienced changes in runtime from a couple of hours to just a few minutes.
It is still not clear to me how you delete that record; my suspicion is that you are already copying the file to a new one and adding the records to the new one.
Use of a relative file would require copying all the data twice: sequential to relative, and relative to sequential again. And even with a relative file there is no real delete! You can do a rewrite with a length of 0 bytes, but the record is still there.

Keith Dick

Jun 29, 2012, 3:02:17 PM
I agree with Wolfgang. From what you have told us, this program probably should be written as a traditional COBOL tape processing program would have been written, copying an input file to an output file while doing some processing of the records. (The file's design seems like it could be a holdover from long ago and may even have originated in a tape-oriented environment.) The program probably should have two input files (the old file and the file containing the records to be added) and an output file. It probably should have a processing loop that reads an input record, writes it to the output if it is a D record, possibly accumulates totals or other data from the D records, and skips the record if it is a T record (possibly after verifying the totals in the record are correct). At EOF on the input file, it should read the new D records from the second input file and write them to the output file (probably also doing the totals accumulation and such), and at EOF on the second input file, write the new T record to the output file, close the files, and stop.
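
A minimal COBOL85 sketch of that loop, assuming 30-byte fixed records as in the sample, hypothetical file names, and that the new D records arrive in a second sequential file; totals verification is left out:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. COPYD.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT OLD-FILE   ASSIGN TO "$DATA.APP.OLDFILE"
               ORGANIZATION IS SEQUENTIAL
               RESERVE 3 AREAS.
           SELECT NEW-D-FILE ASSIGN TO "$DATA.APP.NEWDS"
               ORGANIZATION IS SEQUENTIAL.
           SELECT OUT-FILE   ASSIGN TO "$DATA.APP.OUTFILE"
               ORGANIZATION IS SEQUENTIAL
               RESERVE 3 AREAS.
       DATA DIVISION.
       FILE SECTION.
       FD  OLD-FILE.
       01  OLD-REC                     PIC X(30).
       FD  NEW-D-FILE.
       01  NEW-REC                     PIC X(30).
       FD  OUT-FILE.
       01  OUT-REC                     PIC X(30).
       WORKING-STORAGE SECTION.
       01  WS-EOF-OLD                  PIC X VALUE "N".
       01  WS-EOF-NEW                  PIC X VALUE "N".
       01  WS-D-COUNT                  PIC 9(7) VALUE ZERO.
       01  WS-T-REC.
           05  FILLER                  PIC X VALUE "T".
           05  WS-T-COUNT              PIC 9(7).
           05  FILLER                  PIC X(22) VALUE SPACES.
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN INPUT OLD-FILE NEW-D-FILE
                OUTPUT OUT-FILE
           PERFORM COPY-OLD-D UNTIL WS-EOF-OLD = "Y"
           PERFORM APPEND-NEW-D UNTIL WS-EOF-NEW = "Y"
      * All D records written; emit the single trailer record.
           MOVE WS-D-COUNT TO WS-T-COUNT
           WRITE OUT-REC FROM WS-T-REC
           CLOSE OLD-FILE NEW-D-FILE OUT-FILE
           STOP RUN.
       COPY-OLD-D.
      * Copy D records and count them; skip the old T record.
           READ OLD-FILE
               AT END MOVE "Y" TO WS-EOF-OLD
               NOT AT END
                   IF OLD-REC(1:1) = "D"
                       ADD 1 TO WS-D-COUNT
                       WRITE OUT-REC FROM OLD-REC
                   END-IF
           END-READ.
       APPEND-NEW-D.
      * Append the new D records, keeping the running count.
           READ NEW-D-FILE
               AT END MOVE "Y" TO WS-EOF-NEW
               NOT AT END
                   ADD 1 TO WS-D-COUNT
                   WRITE OUT-REC FROM NEW-REC
           END-READ.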

Setting the buffered option on the output file (via FUP or equivalent) can easily speed up writing a file without making any changes to the program, but my experience is that it does not do much to speed up reading a file. The discprocess usually prefetches and keeps blocks in cache to satisfy reads, but unless the buffered attribute is set on the file, the discprocess must write the changed block to disc for every record written. Adding the buffered option eliminates the write to disc for every record written, which will speed up processing at the cost that if a processor failure occurs, not all of the data is guaranteed to be on disk, so recovery requires rerunning from the beginning, not trying to restart in the middle. That seems not to be a barrier in this case.
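
For reference, setting the attribute (and confirming it took effect) from FUP looks roughly like this, with a hypothetical file name:

    FUP ALTER $DATA.APP.OUTFILE, BUFFERED
    FUP INFO $DATA.APP.OUTFILE, DETAIL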

Simply using buffered mode still requires a message per record between the application and the discprocess, both for reading and writing.

The RESERVE 2 AREAS, when it can be used, greatly reduces the number of messages needed for reading by moving the records between the application and discprocess in large blocks. As far as I know, RESERVE 2 AREAS does not do any better for writing than creating the file with buffered mode does (but I believe it will turn on buffered mode for you, so you do not have to remember to do that when you create the file). It still requires one message per record to the discprocess when writing.

RESERVE 3 AREAS provides even better writing performance because it blocks the records in the application process and only sends the large blocks to the discprocess. I thought that RESERVE 3 AREAS performs the same as RESERVE 2 AREAS when reading a file, but the wording in the manual implies that RESERVE 3 AREAS improves read performance over that of RESERVE 2 AREAS, so perhaps it does something more than sequential block buffering when reading. In any case, RESERVE 3 AREAS cannot hurt read performance, and may improve it further.

To summarize, I agree with Wolfgang: Your program probably should be written to read the input file and write a new output file, using RESERVE 3 AREAS for both of those files. Since it seems that the third file containing the new D records to be added is small, it probably does not matter whether you use RESERVE 3 AREAS for it, but it certainly would not hurt to do so, except perhaps make the program use a little more main memory.

The main drawback to this approach would be that twice as much disk storage is required, since it produces a new copy of the data file.

My earlier suggestion to make the input file be a relative file could enable you to avoid making a copy of the data file by just deleting the old T record and appending the few new D records and a new T record to the existing file. It also could run much quicker if it were acceptable not to read all of the old D records, but rely on the data in the old T record to be correct and calculate the new T record by adding just the information from the new D records to the values from the old T record. This assumes that the file's home is on the NonStop system. However, you now tell us that the file is received from another system. Unless you can arrange to load that file onto the NonStop system as a relative file rather than as an entry-sequenced file, the extra pass over the data required to copy from an entry-sequenced file to a relative file negates a lot of the advantages of using the relative file organization. There certainly are ways to load an externally-sourced file as a relative file, so don't rule out using a relative file without thinking about it, but if you cannot arrange to load directly to a relative file in your case, the traditional input, process, output approach using entry-sequenced files probably would be best.
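
A fragment showing why the relative organization matters for the delete-and-append idea; a sketch only, with hypothetical names, assuming REL-FILE is declared with ORGANIZATION RELATIVE, ACCESS DYNAMIC, and RELATIVE KEY IS WS-REL-KEY, is opened I-O, and the old T record has just been read so WS-REL-KEY identifies it:

      * Relative organization permits a real delete.
           DELETE REL-FILE RECORD
      * Reuse the freed slot for the first new D record, then
      * advance the relative key for each record after it.
           WRITE REL-REC FROM WS-NEW-D-REC
           ADD 1 TO WS-REL-KEY
      * New trailer count = old T count + number of D records added.
           WRITE REL-REC FROM WS-NEW-T-REC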

Depending on how you are introducing the externally-sourced file into the NonStop system, there might be an opportunity to improve the speed of that operation, too. If you want to investigate that, tell us some details about the steps taken to get the file onto the NonStop system, and we might be able to suggest improvements to that.