First, thank you very much for the very helpful and informative response!
I have OpenVMS V8.3.
The source file is opened for shared access.
Here is the FDL of one of the source files:
FILE
        CONTIGUOUS                  no
        FILE_MONITORING             no
        ORGANIZATION                indexed
        GLOBAL_BUFFER_COUNT         0
        GLBUFF_CNT_V83              0
        GLBUFF_FLAGS_V83            none
RECORD
        BLOCK_SPAN                  yes
        CARRIAGE_CONTROL            none
        FORMAT                      fixed
        SIZE                        658
AREA 0
        ALLOCATION                  4520720
        BEST_TRY_CONTIGUOUS         yes
        BUCKET_SIZE                 6
        EXTENSION                   65520
AREA 1
        ALLOCATION                  45360
        BEST_TRY_CONTIGUOUS         yes
        BUCKET_SIZE                 6
        EXTENSION                   11520
AREA 2
        ALLOCATION                  656320
        BEST_TRY_CONTIGUOUS         yes
        BUCKET_SIZE                 14
        EXTENSION                   65535
KEY 0
        CHANGES                     no
        DATA_AREA                   0
        DATA_FILL                   100
        DATA_KEY_COMPRESSION        yes
        DATA_RECORD_COMPRESSION     yes
        DUPLICATES                  no
        INDEX_AREA                  1
        INDEX_COMPRESSION           yes
        INDEX_FILL                  100
        LEVEL1_INDEX_AREA           1
        NAME                        ""
        NULL_KEY                    no
        PROLOG                      3
        SEG0_LENGTH                 10
        SEG0_POSITION               10
        SEG1_LENGTH                 2
        SEG1_POSITION               20
        SEG2_LENGTH                 8
        SEG2_POSITION               22
        SEG3_LENGTH                 10
        SEG3_POSITION               0
        TYPE                        dstring
KEY 1
        CHANGES                     yes
        DATA_AREA                   2
        DATA_FILL                   100
        DATA_KEY_COMPRESSION        yes
        DUPLICATES                  no
        INDEX_AREA                  2
        INDEX_COMPRESSION           yes
        INDEX_FILL                  100
        LEVEL1_INDEX_AREA           2
        NAME                        ""
        NULL_KEY                    no
        SEG0_LENGTH                 10
        SEG0_POSITION               0
        SEG1_LENGTH                 10
        SEG1_POSITION               10
        SEG2_LENGTH                 2
        SEG2_POSITION               20
        SEG3_LENGTH                 8
        SEG3_POSITION               22
        TYPE                        string
dir/full:
File organization: Indexed, Prolog: 3, Using 2 keys
In 3 areas
Shelved state: Online
Caching attribute: Writethrough
File attributes: Allocation: 5248584, Extend: 65520, Maximum bucket size: 14, Global buffer count: 0, No version limit
Contiguous best try
Record format: Fixed length 658 byte records
Record attributes: None
RMS attributes: None
Journaling enabled: None
> - Current bottleneck? CPU? ( % cpu ) source, target ?
> -- Use a test to compare full program to a program just reading. And...?
Here are some tests, done in various modes:
Input record length: 658
Number of records processed : 9800488
Number of fields : 98
------------------------------------------------------------------------------------
FIELDS DISABLED / NO WRITING
Accounting information:
Buffered I/O count: 169 Peak working set size: 7552
Direct I/O count: 739584 Peak virtual size: 175296
Page faults: 474 Mounted volumes: 0
Charged CPU time: 0 00:06:41.15 Elapsed time: 0 00:11:47.34
------------------------------------------------------------------------------------
FIELDS ENABLED / NO WRITING
Accounting information:
Buffered I/O count: 169 Peak working set size: 7552
Direct I/O count: 739550 Peak virtual size: 175296
Page faults: 474 Mounted volumes: 0
Charged CPU time: 0 00:08:38.83 Elapsed time: 0 00:13:48.75
------------------------------------------------------------------------------------
FIELDS DISABLED / WRITE /REUSE
By "fields disabled" I mean that, instead of processing every field, I write a constant value for each one ("DATE" for a DATETIME field, "TEXT" for a text field, "NUMBER" for a numeric field).
The I/O count is higher because the real field values are usually shorter than these constants. For example, numeric field values are often 0, and I write "NUMBER" instead.
So this test is not 'correct' with respect to the data written to disk.
Accounting information:
Buffered I/O count: 175 Peak working set size: 7712
Direct I/O count: 836187 Peak virtual size: 175440
Page faults: 484 Mounted volumes: 0
Charged CPU time: 0 00:07:49.14 Elapsed time: 0 00:15:33.76
------------------------------------------------------------------------------------
FIELDS ENABLED / WRITE /REUSE - FULL PROCESSING (3 RUNS)
Accounting information:
Buffered I/O count: 175 Peak working set size: 7712
Direct I/O count: 786797 Peak virtual size: 175440
Page faults: 484 Mounted volumes: 0
Charged CPU time: 0 00:10:22.16 Elapsed time: 0 00:17:18.15
Accounting information:
Buffered I/O count: 175 Peak working set size: 7712
Direct I/O count: 786789 Peak virtual size: 175440
Page faults: 484 Mounted volumes: 0
Charged CPU time: 0 00:09:59.69 Elapsed time: 0 00:16:51.42
Accounting information:
Buffered I/O count: 175 Peak working set size: 7712
Direct I/O count: 786759 Peak virtual size: 175440
Page faults: 484 Mounted volumes: 0
Charged CPU time: 0 00:10:10.95 Elapsed time: 0 00:17:03.52
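To make the runs above easier to compare, here is a small sketch in C that turns the accounting figures into records per second and the share of elapsed time charged as CPU. The function names (to_seconds, cpu_fraction, records_per_second, report) are my own helpers for illustration, nothing from RMS or the CRTL.

```c
#include <stdio.h>

/* Convert an accounting time (hours, minutes, seconds) to seconds. */
double to_seconds(int hh, int mm, double ss)
{
    return hh * 3600.0 + mm * 60.0 + ss;
}

/* Share of the elapsed time that was charged as CPU time (0.0 to 1.0). */
double cpu_fraction(double cpu_s, double elapsed_s)
{
    return cpu_s / elapsed_s;
}

/* Records handled per second of elapsed time. */
double records_per_second(long records, double elapsed_s)
{
    return (double)records / elapsed_s;
}

/* Print one summary line for a test run. */
void report(const char *label, long records, double cpu_s, double elapsed_s)
{
    printf("%-32s %8.0f rec/s, CPU %4.1f%% of elapsed\n", label,
           records_per_second(records, elapsed_s),
           100.0 * cpu_fraction(cpu_s, elapsed_s));
}
```

With the numbers above, the read-only run (9800488 records, CPU 00:06:41.15, elapsed 00:11:47.34) comes out at roughly 13,850 records per second with about 57% of the elapsed time charged as CPU. The first full-processing run (CPU 00:10:22.16, elapsed 00:17:18.15) comes out at roughly 9,440 records per second with about 60%. So the job looks neither purely CPU bound nor purely I/O bound.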
>- Threading - IF the process is largely CPU bound, you may be able to use kernel threading to use more than 1 core,
to independently READ, PROCESS, WRITE
This is a very good idea! Do you mean that a READ thread reads records and pushes them onto the input queue of a PROCESS thread,
and the PROCESS thread takes records from its input queue, processes them, and pushes the results to a WRITE thread?
I had a more primitive idea: to distribute the field processing among several threads, because this source file has more than 90 fields.
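To check that I understand the READ / PROCESS / WRITE idea, here is a minimal sketch in C with POSIX threads and two bounded queues. Everything in it is an assumption for illustration: the names (record_t, queue_t, run_pipeline), the queue depth of 64, and the stage bodies. The reader only fabricates records where the real program would call RMS $GET, the processing step just upper-cases the record, and the writer only counts records where the real program would call fwrite.

```c
#include <pthread.h>
#include <string.h>

#define REC_LEN   658   /* record size from the FDL above */
#define QUEUE_CAP 64    /* assumed queue depth */

typedef struct { char data[REC_LEN]; int eof; } record_t;

typedef struct {        /* bounded FIFO shared by two neighbouring stages */
    record_t slots[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} queue_t;

typedef struct { queue_t in_q, out_q; long to_read, written; char last; } pipeline_t;

static void queue_init(queue_t *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void queue_put(queue_t *q, const record_t *r) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)            /* wait while the queue is full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->slots[q->tail] = *r;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static void queue_get(queue_t *q, record_t *r) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                    /* wait while the queue is empty */
        pthread_cond_wait(&q->not_empty, &q->lock);
    *r = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
}

static void *read_stage(void *arg) {         /* stand-in for the RMS $GET loop */
    pipeline_t *p = arg;
    record_t r;
    long i;
    memset(&r, 0, sizeof r);
    for (i = 0; i < p->to_read; i++) {
        memset(r.data, 'a' + (int)(i % 26), REC_LEN);   /* synthetic record */
        queue_put(&p->in_q, &r);
    }
    r.eof = 1;                               /* end-of-data marker */
    queue_put(&p->in_q, &r);
    return NULL;
}

static void *process_stage(void *arg) {      /* placeholder for field processing */
    pipeline_t *p = arg;
    record_t r;
    int i;
    do {
        queue_get(&p->in_q, &r);
        for (i = 0; !r.eof && i < REC_LEN; i++)
            r.data[i] = (char)(r.data[i] - 'a' + 'A');  /* upper-case the record */
        queue_put(&p->out_q, &r);            /* the marker is forwarded as well */
    } while (!r.eof);
    return NULL;
}

static void *write_stage(void *arg) {        /* stand-in for the fwrite() loop */
    pipeline_t *p = arg;
    record_t r;
    for (queue_get(&p->out_q, &r); !r.eof; queue_get(&p->out_q, &r)) {
        p->written++;
        p->last = r.data[0];                 /* remember first byte of last record */
    }
    return NULL;
}

/* Run the three stages concurrently; returns the number of records "written". */
long run_pipeline(long nrecords, char *last_first_byte) {
    pipeline_t p;
    pthread_t t[3];
    int i;
    queue_init(&p.in_q);
    queue_init(&p.out_q);
    p.to_read = nrecords; p.written = 0; p.last = 0;
    pthread_create(&t[0], NULL, read_stage, &p);
    pthread_create(&t[1], NULL, process_stage, &p);
    pthread_create(&t[2], NULL, write_stage, &p);
    for (i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    *last_first_byte = p.last;
    return p.written;
}
```

The queues copy whole records to keep the sketch short; passing pointers to preallocated buffers would avoid copying 658 bytes twice per record. As far as I know, on OpenVMS the image also has to be linked with /THREADS_ENABLE for the threads to run on more than one CPU. My field-splitting idea would be a different design: several PROCESS threads working on the same record, which needs a barrier per record instead of a queue.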
> XFC - The XFC cache can do read-ahead which can help a bunch, notably for small buckets.
Sounds interesting!
> You may want to use the CRTL - WRITE functions instead of RMS, or just roll your own perhaps using SYS$WRITE in (Async mode) to avoid dealing with file extends and such.
I use fopen with its OpenVMS extensions and fwrite for the output file.
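For reference, this is roughly what I mean, as a sketch rather than my real code. The helper names open_output and write_records are made up for this post, and the RMS keyword values (alq, deq, mbc, mbf, rop=wbh) and the 64 KB stdio buffer are illustrative assumptions, not tuned settings. The extra keyword arguments are only passed when compiling on VMS, so the same source still builds elsewhere.

```c
#include <stdio.h>

#define OUT_BUF_SIZE (64 * 1024)   /* assumed stdio buffer size */

/* Open the output file for writing, with RMS hints when built on VMS. */
FILE *open_output(const char *name)
{
    FILE *fp;
#ifdef __VMS
    /* RMS keyword arguments are a VMS C RTL extension to fopen();
       the values below are illustrative assumptions only. */
    fp = fopen(name, "w", "alq=65535", "deq=65535",
               "mbc=127", "mbf=4", "rop=wbh");
#else
    fp = fopen(name, "w");
#endif
    if (fp != NULL)
        setvbuf(fp, NULL, _IOFBF, OUT_BUF_SIZE);  /* full buffering */
    return fp;
}

/* Write the same record n times; returns how many were written completely. */
long write_records(FILE *fp, const char *rec, size_t len, long n)
{
    long done = 0;
    while (done < n && fwrite(rec, len, 1, fp) == 1)
        done++;
    return done;
}
```

A large initial allocation and extension quantity should reduce the number of file extends during the run, and write-behind with several buffers lets RMS overlap the writes with processing.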
> ASY - INPUT - Only useful if there is significant processing, and high IO rates + waits to overlap with. Normally RMS is cpu bound anyway, returning RMS$_SYNCH all the time except for once per bucket. (Be sure to instrument that for debugging! displaying total PENDING/SYNC counts after the run).
Yes, I see mostly RMS$_SYNCH and rarely RMS$_PENDING.
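The counting itself can be as simple as the sketch below. The names asy_stats_t, asy_count and asy_report are hypothetical helpers of mine. On VMS the status codes come from rmsdef.h; in any other environment the two values are placeholders so that the sketch still compiles, and they are not the real RMS values.

```c
#include <stdio.h>

#ifdef __VMS
#include <rmsdef.h>              /* real RMS$_SYNCH and RMS$_PENDING */
#define ST_SYNCH   RMS$_SYNCH
#define ST_PENDING RMS$_PENDING
#else
#define ST_SYNCH   1             /* placeholder values, NOT the real RMS codes */
#define ST_PENDING 2
#endif

typedef struct { unsigned long synch, pending, other; } asy_stats_t;

/* Tally the status returned by one asynchronous $GET call. */
void asy_count(asy_stats_t *s, int status)
{
    if (status == ST_SYNCH)
        s->synch++;              /* completed immediately, no wait needed */
    else if (status == ST_PENDING)
        s->pending++;            /* I/O still in flight */
    else
        s->other++;              /* anything else, including errors */
}

/* Print the totals once, after the run. */
void asy_report(const asy_stats_t *s)
{
    printf("SYNCH: %lu  PENDING: %lu  other: %lu\n",
           s->synch, s->pending, s->other);
}
```

The idea is to call asy_count with the status of every asynchronous read and asy_report once at the end, so the ratio of PENDING to SYNCH shows how much I/O wait there is to overlap.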
Here is some information about the OpenVMS hardware. It is not a production server.
$ show cpu
System: XXXXX, AlphaServer GS80 6/1001
CPU ownership sets:
Active 0-7
Configure 0-7
CPU state sets:
Potential 0-7
Autostart 0-31
Powered Down None
Not Present 8-31
Hard Excluded None
Failover None
$ show memory /full
System Memory Resources on 10-APR-2018 00:54:48.93
Physical Memory Usage (pages): Total Free In Use Modified
Main Memory (8.00GB) 1048576 359976 656278 32322
Extended File Cache (Time of last reset: 5-JUL-2017 15:00:07.50)
Allocated (GBytes) 1.56 Maximum size (GBytes) 4.00
Free (GBytes) 0.00 Minimum size (GBytes) 0.00
In use (GBytes) 1.56 Percentage Read I/Os 97%
Read hit rate 97% Write hit rate 0%
Read I/O count 77869375509 Write I/O count 1867859506
Read hit count 75738692336 Write hit count 0
Reads bypassing cache 55 Writes bypassing cache 123467219
Files cached open 1065 Files cached closed 304
Vols in Full XFC mode 0 Vols in VIOC Compatible mode 44
Vols in No Caching mode 0 Vols in Perm. No Caching mode 0
Granularity Hint Regions (pages): Total Free In Use Released
Execlet code region 2048 553 1495 0
Execlet data region 1024 505 519 0
S0S1 Executive data region 3421 0 3421 0
S0S1 Resident image code region 2048 834 1214 0
Slot Usage (slots): Total Free Resident Swapped
Process Entry Slots 4000 3847 153 0
Balance Set Slots 3998 3847 151 0
Dynamic Memory Usage: Total Free In Use Largest
Nonpaged Dynamic Memory (MB) 115.14 64.20 50.94 0.62
Bus Addressable Memory (KB) 128.00 110.87 17.12 104.00
Paged Dynamic Memory (MB) 27.78 21.41 6.36 21.36
Lock Manager Dyn Memory (MB) 26.64 4.01 22.62
Buffer Object Usage (pages): In Use Peak
32-bit System Space Windows (S0/S1) 0 9
64-bit System Space Windows (S2) 537 1649
Physical pages locked by buffer objects 537 1650
Memory Reservations (pages): Group Reserved In Use Type
Total (0 bytes reserved) 0 0
Write Bitmap (WBM) Memory Summary
Local bitmap count: 0 Local bitmap memory usage (bytes) 0.00
Master bitmap count: 0 Master bitmap memory usage (bytes) 0.00
Swap File Usage (8KB pages): Index Free Size
(Swap file name not available)
1 6240 6240
(Swap file name not available)
2 624992 624992
Total size of all swap files: 631232
Paging File Usage (8KB pages): Index Free Size
(Paging file name not available)
254 529510 531248
Total committed paging file usage: 418152
Of the physical pages in use, 39165 pages are permanently allocated to OpenVMS.
Kind regards
Sergejus