
Using sys$get in an asynchronous way.


Sergejus Zabinskis

Apr 6, 2018, 8:51:24 AM

Hi all,

I have a simple application that converts fixed-length indexed files (data tables) to comma-separated files (CSV files).
This application is used to exchange data between different systems - we export data to CSV files, transfer them to other servers and import them.
The original RMS files are not small - they can contain tens of millions of records. So, the goal is to make the application run faster.

I am using a very primitive approach:
Open the source file for reading.
Open the destination file for writing.
Configure the RAB: rab$v_rrl = 1; rab$v_nlk = 1; rab$v_nql = 1; rab$b_rac = RAB$C_SEQ;
Scan the file sequentially:
while( not EOF ) {
    sys$get
    copy current record from rab$l_ubf to my buffer
    convert( output, buffer )
    write output to the sequential file
}
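
For illustration, in C the setup and loop above look roughly like this (a minimal sketch with a placeholder file name, not my real code, and with error handling cut to the bare minimum):

#include <rms.h>
#include <rmsdef.h>
#include <starlet.h>

#define REC_SIZE 658                            /* record length of this table */

int main(void)
{
    static char buf[REC_SIZE];
    struct FAB fab = cc$rms_fab;
    struct RAB rab = cc$rms_rab;
    int status;

    fab.fab$l_fna = "SOURCE.IDX";               /* placeholder name */
    fab.fab$b_fns = sizeof "SOURCE.IDX" - 1;
    fab.fab$b_fac = FAB$M_GET;
    fab.fab$b_shr = FAB$M_SHRGET | FAB$M_SHRPUT | FAB$M_SHRUPD;
    status = sys$open(&fab);
    if (!(status & 1)) return status;

    rab.rab$l_fab = &fab;
    rab.rab$b_rac = RAB$C_SEQ;                  /* sequential access */
    rab.rab$l_rop = RAB$M_RRL | RAB$M_NLK | RAB$M_NQL;  /* read-regardless, no record locking */
    rab.rab$l_ubf = buf;
    rab.rab$w_usz = sizeof buf;
    status = sys$connect(&rab);
    if (!(status & 1)) return status;

    while ((status = sys$get(&rab)) & 1) {
        /* record is at rab.rab$l_rbf, length rab.rab$w_rsz:
           convert it and write it to the CSV file here */
    }
    sys$disconnect(&rab);
    sys$close(&fab);
    return (status == RMS$_EOF) ? 1 : status;
}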

The conversion procedure is optimized well enough and I don't see how to make it better.
I am thinking about I/O optimizations.
For the output file I use rab$v_wbh=1; fab$v_dfw=1; I also set rab$b_mbc=127 and rab$b_mbf=2;
I also reuse a preallocated output file (rab$v_tpt=1) if it already exists (this saves some direct I/Os).
Now I am looking for other ways to achieve better performance, and I see that there is a possibility to make asynchronous calls to the RMS services.
My idea is to use the time while RMS is executing the sys$get service for conversion, so the main loop would look like this:

rab$v_asy = 0; // synchronous execution
sys$get
rab$v_asy = 1; // change to asynchronous execution
while( !eof )
{
    copy current record from rab$l_ubf to my buffer
    sys$get // now it is asynchronous - we don't wait here generally (but sys$get may also execute synchronously)
    convert( output, buffer )
    write output to the sequential file
    // wait for sys$get completion
    if( status == RMS$_SYNCH )
    {
        // sys$get completed (in a synchronous way)
    }
    else if( status == RMS$_PENDING )
    {
        // sys$get is still executing
        sys$wait
    }
}
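
In C, the asynchronous variant would look roughly like this (same FAB/RAB setup as in the fragment above; again only a sketch, and I take the completion status from rab$l_sts after the wait):

#include <rms.h>
#include <rmsdef.h>
#include <starlet.h>
#include <string.h>

static char work[658];          /* copy of the record currently being converted */

int scan_async(struct RAB *rab)
{
    int status;

    rab->rab$l_rop &= ~RAB$M_ASY;            /* first get: synchronous */
    status = sys$get(rab);
    if (!(status & 1)) return status;

    rab->rab$l_rop |= RAB$M_ASY;             /* following gets: asynchronous */
    for (;;) {
        memcpy(work, rab->rab$l_rbf, rab->rab$w_rsz);  /* save current record */

        status = sys$get(rab);               /* start fetching the next record */

        /* convert 'work' and write it to the CSV file while RMS (maybe) reads */

        if (status == RMS$_PENDING)
            sys$wait(rab);                   /* rendezvous with the async get */
        status = rab->rab$l_sts;             /* completion status of that get */
        if (!(status & 1))
            break;                           /* RMS$_EOF (or an error) ends the loop */
    }
    return status;
}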

I did many tests comparing the 'synchronous' and 'asynchronous' applications,
but I don't see any performance improvement. I see that sys$get with v_asy=1 mostly executes synchronously (status=RMS$_SYNCH).
I think the reason is that reading one record at a time sequentially is too little work to benefit from an asynchronous read.
I would like to ask the experts what they think about that?

Kind regards
Sergejus

Snowshoe

Apr 6, 2018, 10:21:22 AM

If I recall, SYS$GET and SYS$PUT can take two AST parameters, one for
success, one for failure. If you are careful, you can fire off a chain
of ASTs where the SYS$GET success AST fires off the next SYS$GET, does
the conversion and the SYS$PUT. If you have gobs of memory you can put
the data in a ramdisk, too.
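
Roughly like this, from memory, so treat it as a sketch rather than working code (the RABs are assumed to be set up with RAB$M_ASY and connected elsewhere, and convert() stands in for whatever does the CSV formatting):

#include <rms.h>
#include <rmsdef.h>
#include <starlet.h>

extern int convert(char *dst, const char *src, int len);   /* assumed routine */

static struct RAB in_rab, out_rab;     /* connected elsewhere, in_rab has ASY set */
static char out_buf[2048];
static volatile int final_status = 0;  /* set when the chain stops */

static void get_error(struct RAB *rab)
{
    final_status = rab->rab$l_sts;     /* RMS$_EOF or a real error ends the chain */
}

static void get_done(struct RAB *rab)
{
    /* convert the record just read, write it out ... */
    out_rab.rab$l_rbf = out_buf;
    out_rab.rab$w_rsz = convert(out_buf, rab->rab$l_rbf, rab->rab$w_rsz);
    sys$put(&out_rab);

    /* ... and chain the next asynchronous get */
    sys$get(&in_rab, get_error, get_done);
}

/* The main program primes the chain with one sys$get(&in_rab, get_error,
   get_done) and then waits (e.g. sys$hiber) until final_status is set. */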

DaveFroble

Apr 6, 2018, 12:29:59 PM

I'm not sure that what you're doing can be made faster; that would be the first
question. The second question is whether there is a bottleneck, and if so, just where and what it is.

For your read I/Os, you can control the size of the reads, RMS buffer size,
multiple buffers, and such. I doubt global buffers would be of help. I seem to
recall something about read-ahead. I don't do much with RMS, so I could be
confused about that last item.

--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: da...@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486

Stephen Hoffman

Apr 6, 2018, 7:07:24 PM

On 2018-04-06 12:51:21 +0000, Sergejus Zabinskis said:

> I have simple application that converts fixed length indexed files
> (data tables) to comma-separated files (CSV files).
> This application is used to exchanged data between different systems -
> we export data to CSV files, transfer them to other servers and import
> them.
> Original RMS files are not very small - could contain tens of millions
> records. So, the goal is to make application run faster.
> ...
> I would like to ask experts what they think about that ?

Enabling read-ahead (read caching) and write-ahead (write caching) will
more likely help performance. Increases to multi-block and
multi-buffer settings are probably helpful, too.
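
In RMS terms that's roughly the following on the RAB before the $CONNECT (a hedged fragment only; read-ahead and write-behind apply to sequential-file access, multi-block/multi-buffer more generally):

#include <rms.h>

/* illustrative helper, assumed name */
void apply_buffering(struct RAB *rab, int mbc, int mbf)
{
    rab->rab$l_rop |= RAB$M_RAH | RAB$M_WBH;   /* read-ahead + write-behind */
    rab->rab$b_mbc = mbc;                      /* blocks per transfer (max 127 in a 32-bit RAB) */
    rab->rab$b_mbf = mbf;                      /* number of buffers */
}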

If it's not already in use, faster storage will often help. SSDs are
massively faster than HDDs. Tossing memory and faster hardware at the
box can help too, particularly if it's an older server configuration.

Asynch I/O probably won't help I/O performance all that much, as it'll
usually only defer reaching the underlying performance bottleneck.

Running a CONVERT on the indexed file might help too, as active and old
indexed files tend to build up a lot of cruft as records are deleted.
ANALYZE /FDL etc. and EDIT /FDL /NOINTERACTIVE can also be used to tune
the input indexed file.
Sometimes process quota increases are warranted.

Some related reading — there's a whole lot more around, too — includes
the following:
http://h41379.www4.hpe.com/wizard/wiz_8409.html
http://h41379.www4.hpe.com/wizard/wiz_7557.html
http://h30266.www3.hpe.com/odl/i64os/opsys/vmsos84/4506/4506pro_026.html#index_x_1484

http://h30266.www3.hpe.com/odl/i64os/opsys/vmsos84/4506/4506pro_022.html#apps_buffering_techniques

http://neilrieck.net/docs/openvms_notes_rms_file_tuning.html


--
Pure Personal Opinion | HoffmanLabs LLC

Hein RMS van den Heuvel

Apr 7, 2018, 12:57:45 PM

Missing information

- OpenVMS version (RMS Global Buffer optimization)
- Shared access on source file?
- Bucket size for source file?
- Current bottleneck? CPU? ( % cpu ) source, target ?
-- Use a test to compare full program to a program just reading. And...?
- Threading - IF the process is largely CPU bound, you may be able to use kernel threading to use more than 1 core, to independently READ, PROCESS, WRITE.

HWM - high Water Marking - Should not bother simple, unshared, sequential writes. Still, run a test with SET VOLUME/NOHIGH to be sure this is not in play.

NQL - Great! Avoids any RECORD locking. Replaces/overwrites RRL + NLK

RAH - Unfortunately RMS does NOT have Read-Ahead for indexed files. It was on my list of things to do, but I never did it. RMS knows the (current) next bucket number as soon as it starts working on the contents of the current bucket and could relatively easily trigger a read for that bucket.

XFC - The XFC cache can do read-ahead which can help a bunch, notably for small buckets.


WBH - Write-BeHind works great for your sequential file output. You may want to give it 4 buffers just in case, but normally 2 buffers will give 95+% of all possible gain. It would also be nice to have for indexed files, but not for this use.

RTL - Cobol, Sort, and the C-RTL all found that even with WBH, RMS spends too much (CPU) time writing simple sequential files, notably going in and out of EXEC mode, probing data structures and buffers as required. They all implemented native, usermode-only, sequential file write libraries.
You may want to use the CRTL WRITE functions instead of RMS, or just roll your own, perhaps using SYS$WRITE (in async mode) to avoid dealing with file extends and such.

TPT - Excellent for repeat usage, avoiding the file re-allocation overhead (as well as file header and directory entry creation).
Be sure to check HWM.

DFW - Requires sharing. Only useful for indexed files really, and poorly implemented due to the lack of WBH: once all buffers are 'dirty' the program has to wait to WRITE one to free it, and READ the next to process. 2 IOs for every buffer filled!

BKS - Bucket size - You cannot set that on an existing file, but you may be able to request that it be converted to a bigger value next time.

While reading, RMS should be 100% CPU bound delivering records from the same bucket. Once the last record in a bucket is read, the next read will involve a physical IO. The more records you can stuff into a bucket, the fewer IOs per record.

GBC - Global Buffer Count. Requires locking; if you can read the source file in exclusive mode, that is normally a big gain right there!
Now if you have to open shared anyway, then global buffers may help, but for a surprising reason. Not to help with IO, but to reduce locking!
Not that the program gets to choose, but the number of global buffers would hardly matter. The big gain is that with a recent (less than a decade old) OpenVMS version, RMS will cache the current bucket lock if there are global buffers. Without that, for each GET it will acquire (CVT) a bucket lock, find the records, and release the bucket lock, irrespective of NQL.

ASY - Before you go there, you need to know where the hurt is. CPU or IO? Which CPU? User (actual code), Exec (RMS) or Kernel (locks)?
I strongly urge you to add some instrumentation using GETJPI EXECTIM, KRNLTIM, USERTIM. Silly example below.

ASY - OUTPUT - The positive effect of WBH is so good that it is unlikely any further improvement can be obtained, other than NOT using RMS, as I wrote before.

ASY - INPUT - Only useful if there is significant processing, and high IO rates + waits to overlap with. Normally RMS is CPU bound anyway, returning RMS$_SYNCH all the time except once per bucket. (Be sure to instrument that for debugging, displaying the total PENDING/SYNCH counts after the run!)
Your code looks fine, overlapping current-record processing with fetching the next... but it will only rarely matter, as the processing will likely be microseconds and the (read-IO) wait multiple milliseconds. Maybe the process will shave 5 or 10% off the wait time every 10 records or so, assuming 10+ records/bucket.
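
For the PENDING/SYNCH bookkeeping, something as trivial as this would do (assumed names):

#include <rmsdef.h>
#include <stdio.h>

static long n_synch, n_pending;

/* call with the status of each asynchronous sys$get */
static void count_completion(int status)
{
    if (status == RMS$_PENDING) n_pending++;
    else if (status == RMS$_SYNCH) n_synch++;
}

/* call once after the run */
static void report_completions(void)
{
    printf("SYNCH=%ld PENDING=%ld\n", n_synch, n_pending);
}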

Code below.

Good luck!
Hein


#include stdio
#include jpidef

typedef struct { short len, cod; void *address; int *retlen; } item;
typedef struct { long count; void *address; } desc;

int sys$getjpiw(), sys$asctim(), sys$gettim(), lib$put_output(), sys$exit();

enum counters {KRNLTIM, EXECTIM, USERTIM, ELAPSED, DIRIO, BUFIO, PAGEFLTS, COUNTERS};
int init = 0, set_1[COUNTERS], set_2[COUNTERS], dif[COUNTERS], *old, *new;
float float_dif[ELAPSED + 1];
long long current_time, init_time;
item getjpi_items_1[] = {
sizeof (int), JPI$_KRNLTIM, &set_1[KRNLTIM], 0,
sizeof (int), JPI$_EXECTIM, &set_1[EXECTIM], 0,
sizeof (int), JPI$_USERTIM, &set_1[USERTIM], 0,
sizeof (int), JPI$_DIRIO, &set_1[DIRIO], 0,
sizeof (int), JPI$_BUFIO, &set_1[BUFIO], 0,
sizeof (int), JPI$_PAGEFLTS, &set_1[PAGEFLTS], 0,
0, 0, 0, 0};
item getjpi_items_2[] = {
sizeof (int), JPI$_KRNLTIM, &set_2[KRNLTIM], 0,
sizeof (int), JPI$_EXECTIM, &set_2[EXECTIM], 0,
sizeof (int), JPI$_USERTIM, &set_2[USERTIM], 0,
sizeof (int), JPI$_DIRIO, &set_2[DIRIO], 0,
sizeof (int), JPI$_BUFIO, &set_2[BUFIO], 0,
sizeof (int), JPI$_PAGEFLTS, &set_2[PAGEFLTS], 0,
0, 0, 0, 0};
item *getjpi_items;

char *arg, asctim[23+1], text[256];

struct {int len; char *addr;} asctim_desc = { sizeof asctim, asctim };

int my_init_timer() {

short iosb[4];
int status;

status = sys$gettim(&init_time);
status = sys$getjpiw ( 0, 0, 0, getjpi_items_1, iosb, 0, 0);
init = 2;
return status;
}

int my_show_timer() {
int i, status;
short iosb[4], timlen;
float force_transform;

if (!init++) {
for (i=0; i<COUNTERS; i++) {
set_1[i] = set_2[i] = dif[i] = 0;
}
status = sys$gettim(&init_time);
if (!(status & 1)) return status;
}
if (init & 1) {
getjpi_items = getjpi_items_2;
new = set_2;
old = set_1;
} else {
getjpi_items = getjpi_items_1;
new = set_1;
old = set_2;
}

status = sys$getjpiw ( 0, 0, 0, getjpi_items, iosb, 0, 0); // last status
if (!(status & 1)) return status;

status = sys$gettim(&current_time);
status = sys$asctim(&timlen, &asctim_desc, &current_time, 1 ); // time only
asctim[timlen]=0;
new[ELAPSED] = ( current_time - init_time ) / 100000; // 100 nano second to 10 millisecond

for (i=0; i<COUNTERS; i++) {
dif[i] = new[i] - old[i];
}
for (i=0; i<ELAPSED+1; i++) {
force_transform = dif[i];
float_dif[i] = force_transform / 100; // 10 millisecond to seconds
}
printf ("%s Elap %5.2f, User %5.2f, Exec %5.2f, Kern %5.2f, PAG=%d, BUF=%d, DIR=%d\n",
asctim, float_dif[ELAPSED], float_dif[USERTIM], float_dif[EXECTIM],
float_dif[KRNLTIM], dif[PAGEFLTS], dif[BUFIO], dif[DIRIO]);
return status;
}


Hein RMS van den Heuvel

Apr 7, 2018, 1:04:22 PM

On Friday, April 6, 2018 at 10:21:22 AM UTC-4, Snowshoe wrote:
:
> If you have gobs of memory you can put the data in a ramdisk, too.

Yes, forgot to ask about the size.

It may be best to copy (BACKUP/IGNORE=INTERLOCK) the file to RAM and process that copy in exclusive mode, avoiding all lock activity.
The copy will use LARGE, sequential IOs, versus RMS using potentially random, and definitely only bucket-sized, IOs.
Delete when done.

Is the source file regularly converted (ever converted)?
That could also be a good first step, increasing the bucket size if appropriate considering the day-to-day usage. Make it at least 16 or so, and 63 if your extract program is the most important one on the system.





Hein RMS van den Heuvel

Apr 7, 2018, 1:21:40 PM

On Friday, April 6, 2018 at 8:51:24 AM UTC-4, Sergejus Zabinskis wrote:
> Hi all,
>
> I have simple application that converts fixed length indexed files (data tables) to comma-separated files (CSV files).
> This application is used to exchanged data between different systems - we export data to CSV files, transfer them to other servers and import them.
> Original RMS files are not very small - could contain tens of millions records. So, the goal is to make application run faster.

Are you transferring all the data all the time?
Hourly? Daily? Weekly?
Isn't that the real core of the problem?

Maybe a CDC (Change Data Capture) solution should be considered !?

Connx (now Software AG) offers a hash-based comparison solution to find records which changed.

Attunity (disclosure: I work for Attunity) offers Replicate, which comes with the RMS-Logger for near-real-time RMS CDC.
Through a SYSTEM SERVICE INTERCEPT any and all changes (PUT, UPD, DEL) for selected RMS files (SEQ, REL, IDX) are captured and applied to target databases of your choice (SQL Server, MySQL, Oracle, Hadoop, Kafka, ...)

Of course Replicate also comes with a full-load feature for initial load.
It uses basic RMS to read the files, so no speed up there, but the processing will happen on the Replicate server, potentially without the need to land the data (in a CSV file) depending on the target database access options.

Fwiw, Connx can read RMS (indexed) files faster than RMS can, as can the SELECT tool from EGH ... but I'm not sure the latter product can still be bought.

btw... Way back when, when I was in the 'benchmarketing' business, when memory was restricted and disks were slow, I once optimized RMS processing by having a program 'peek ahead', telling RMS to get the next bucket (artificial RFA (VBN,1)) into its global buffer.

It kept a lock on the current bucket with a blocking AST. As soon as the main program came knocking, it released the lock, found the next-next bucket from the current buffer, and initiated a read for that. This way the reads could stay a bucket or two ahead of the current processing.

Hein.







VAXman-

Apr 7, 2018, 1:40:24 PM

In article <41c06e83-8ba5-4269...@googlegroups.com>, Hein RMS van den Heuvel <heinvand...@gmail.com> writes:
>On Friday, April 6, 2018 at 8:51:24 AM UTC-4, Sergejus Zabinskis wrote:
>> Hi all,
>>
>> I have simple application that converts fixed length indexed files (data tables) to comma-separated files (CSV files).
>> This application is used to exchanged data between different systems - we export data to CSV files, transfer them to other servers and import them.
>> Original RMS files are not very small - could contain tens of millions records. So, the goal is to make application run faster.
>
>Are you transferring all the data all the time?
>Hourly? Daily? Weekly?
>Isn't that the real core of the problem?
>
>Maybe a CDC (Change Data Capture) solution should be considered !?
>
>Connx (now Software AG) offers a hash-based comparison solution to find records which changed.
>
>Attunity (disclosure: I work for Attunity) offers Replicate, which comes with the RMS-Logger for near-real-time RMS CDC.
>Through a SYSTEM SERVICE INTERCEPT any and all changes (PUT, UPD, DEL) for selected RMS files (SEQ, REL, IDX) are captured and applied to target databases of your choice (SQLserver, MySQL, Oracle, Hadoop, Kafka, ...)
>
>Of course Replicate also comes with a full-load feature for initial load.
>It uses basic RMS to read the files, so no speed up there, but the processing will happen on the Replicate server, potentially without the need to land the data (in a CSV file) depending on the target database access options.
>
>Fwiw, Connx can read RMS (indexed) files faster than RMS can, as can the SELECT tool from EGH ... but I'm not sure the latter product can still be bought.
>
>btw... Way back when, when I was in the 'benchmarketing' business, when memory was restricted and disks were slow, I once optimized RMS processing by having a program 'peek ahead', telling RMS to get the next bucket (artificial RFA (VBN,1)) into its global buffer.
>
>It kept a lock on the current bucket with a blocking AST. As soon as the main program came knocking, it released the lock, found the next-next bucket from the current buffer, and initiated a read for that. This way the reads could stay a bucket or two ahead of the current processing.
>
>Hein.

I saw this thread this morning but I had other things to address before I
could reply to it. It's better that you did, Hein.

I would be happy to address any CDC questions Sergejus may have. If he's
reading static RMS index files to create CSV data, then CDC won't help him
much. However, if the files are in use and being modified, then CDC could
be a boon.

--
VAXman- A Bored Certified VMS Kernel Mode Hacker VAXman(at)TMESIS(dot)ORG

I speak to machines with the voice of humanity.

Sergejus Zabinskis

Apr 9, 2018, 5:58:33 PM

First, thank you very much for very helpful and informative response!

I have OpenVMS V8.3.
Shared access on the source file.
Here is FDL of one of source files:

FILE
CONTIGUOUS no
FILE_MONITORING no
ORGANIZATION indexed
GLOBAL_BUFFER_COUNT 0
GLBUFF_CNT_V83 0
GLBUFF_FLAGS_V83 none

RECORD
BLOCK_SPAN yes
CARRIAGE_CONTROL none
FORMAT fixed
SIZE 658

AREA 0
ALLOCATION 4520720
BEST_TRY_CONTIGUOUS yes
BUCKET_SIZE 6
EXTENSION 65520

AREA 1
ALLOCATION 45360
BEST_TRY_CONTIGUOUS yes
BUCKET_SIZE 6
EXTENSION 11520

AREA 2
ALLOCATION 656320
BEST_TRY_CONTIGUOUS yes
BUCKET_SIZE 14
EXTENSION 65535

KEY 0
CHANGES no
DATA_AREA 0
DATA_FILL 100
DATA_KEY_COMPRESSION yes
DATA_RECORD_COMPRESSION yes
DUPLICATES no
INDEX_AREA 1
INDEX_COMPRESSION yes
INDEX_FILL 100
LEVEL1_INDEX_AREA 1
NAME ""
NULL_KEY no
PROLOG 3
SEG0_LENGTH 10
SEG0_POSITION 10
SEG1_LENGTH 2
SEG1_POSITION 20
SEG2_LENGTH 8
SEG2_POSITION 22
SEG3_LENGTH 10
SEG3_POSITION 0
TYPE dstring

KEY 1
CHANGES yes
DATA_AREA 2
DATA_FILL 100
DATA_KEY_COMPRESSION yes
DUPLICATES no
INDEX_AREA 2
INDEX_COMPRESSION yes
INDEX_FILL 100
LEVEL1_INDEX_AREA 2
NAME ""
NULL_KEY no
SEG0_LENGTH 10
SEG0_POSITION 0
SEG1_LENGTH 10
SEG1_POSITION 10
SEG2_LENGTH 2
SEG2_POSITION 20
SEG3_LENGTH 8
SEG3_POSITION 22
TYPE string

dir/full:

File organization: Indexed, Prolog: 3, Using 2 keys
In 3 areas
Shelved state: Online
Caching attribute: Writethrough
File attributes: Allocation: 5248584, Extend: 65520, Maximum bucket size: 14, Global buffer count: 0, No version limit
Contiguous best try
Record format: Fixed length 658 byte records
Record attributes: None
RMS attributes: None
Journaling enabled: None

> - Current bottleneck? CPU? ( % cpu ) source, target ?
> -- Use a test to compare full program to a program just reading. And...?

Here are some tests, done in various modes:

Input record length: 658
Number of records processed : 9800488
Number of fields : 98

------------------------------------------------------------------------------------
FIELDS DISABLED / NO WRITING

Accounting information:
Buffered I/O count: 169 Peak working set size: 7552
Direct I/O count: 739584 Peak virtual size: 175296
Page faults: 474 Mounted volumes: 0
Charged CPU time: 0 00:06:41.15 Elapsed time: 0 00:11:47.34


------------------------------------------------------------------------------------
FIELDS ENABLED / NO WRITING

Accounting information:
Buffered I/O count: 169 Peak working set size: 7552
Direct I/O count: 739550 Peak virtual size: 175296
Page faults: 474 Mounted volumes: 0
Charged CPU time: 0 00:08:38.83 Elapsed time: 0 00:13:48.75


------------------------------------------------------------------------------------

FIELDS DISABLED / WRITE /REUSE
Fields disabled - I mean here that I write a constant value for each field ("DATE" for a DATETIME field, "TEXT" for a text field, "NUMBER" for a numeric field), instead of processing every field.
I/O is bigger because 'real' field values are usually shorter than the constant. For example, numeric field values are often 0 and I write "NUMBER" instead.
So, this test is not 'correct' in the sense of the data written to disk.


Accounting information:
Buffered I/O count: 175 Peak working set size: 7712
Direct I/O count: 836187 Peak virtual size: 175440
Page faults: 484 Mounted volumes: 0
Charged CPU time: 0 00:07:49.14 Elapsed time: 0 00:15:33.76


------------------------------------------------------------------------------------
FIELDS ENABLED / WRITE /REUSE - FULL PROCESSING (2 RUNS)

Accounting information:
Buffered I/O count: 175 Peak working set size: 7712
Direct I/O count: 786797 Peak virtual size: 175440
Page faults: 484 Mounted volumes: 0
Charged CPU time: 0 00:10:22.16 Elapsed time: 0 00:17:18.15


Accounting information:
Buffered I/O count: 175 Peak working set size: 7712
Direct I/O count: 786789 Peak virtual size: 175440
Page faults: 484 Mounted volumes: 0
Charged CPU time: 0 00:09:59.69 Elapsed time: 0 00:16:51.42


Accounting information:
Buffered I/O count: 175 Peak working set size: 7712
Direct I/O count: 786759 Peak virtual size: 175440
Page faults: 484 Mounted volumes: 0
Charged CPU time: 0 00:10:10.95 Elapsed time: 0 00:17:03.52


>- Threading - IF the process is largely CPU bound, you may be able to use kernel threading to use more than 1 core, to independently READ, PROCESS, WRITE.
This is a very good idea! You mean - the READ thread reads records and pushes them to the input queue of the PROCESS thread,
and the PROCESS thread takes records from its input queue, processes them and pushes the results to the WRITE thread?
I had a more primitive idea - to distribute the field processing among several threads, because for this source file we have > 90 fields.

> XFC - The XFC cache can do read-ahead which can help a bunch, notably for small buckets.
Sounds interesting!

> You may want to use the CRTL - WRITE functions instead of RMS, or just roll your own perhaps using SYS$WRITE in (Async mode) to avoid dealing with file extends and such.
I use the fopen extensions and fwrite for the output file.

> ASY - INPUT - Only useful if there is significant processing, and high IO rates + waits to overlap with. Normally RMS is cpu bound anyway, returning RMS$_SYNCH all the time except for once per bucket. (Be sure to instrument that for debugging! displaying total PENDING/SYNC counts after the run).
Yes, I see mostly RMS$_SYNCH and rarely RMS$_PENDING.

And some information about the OpenVMS hardware. It is not a production server.

$ show cpu

System: XXXXX, AlphaServer GS80 6/1001

CPU ownership sets:
Active 0-7
Configure 0-7

CPU state sets:
Potential 0-7
Autostart 0-31
Powered Down None
Not Present 8-31
Hard Excluded None
Failover None
$ show memory /full
System Memory Resources on 10-APR-2018 00:54:48.93

Physical Memory Usage (pages): Total Free In Use Modified
Main Memory (8.00GB) 1048576 359976 656278 32322

Extended File Cache (Time of last reset: 5-JUL-2017 15:00:07.50)
Allocated (GBytes) 1.56 Maximum size (GBytes) 4.00
Free (GBytes) 0.00 Minimum size (GBytes) 0.00
In use (GBytes) 1.56 Percentage Read I/Os 97%
Read hit rate 97% Write hit rate 0%
Read I/O count 77869375509 Write I/O count 1867859506
Read hit count 75738692336 Write hit count 0
Reads bypassing cache 55 Writes bypassing cache 123467219
Files cached open 1065 Files cached closed 304
Vols in Full XFC mode 0 Vols in VIOC Compatible mode 44
Vols in No Caching mode 0 Vols in Perm. No Caching mode 0

Granularity Hint Regions (pages): Total Free In Use Released
Execlet code region 2048 553 1495 0
Execlet data region 1024 505 519 0
S0S1 Executive data region 3421 0 3421 0
S0S1 Resident image code region 2048 834 1214 0

Slot Usage (slots): Total Free Resident Swapped
Process Entry Slots 4000 3847 153 0
Balance Set Slots 3998 3847 151 0

Dynamic Memory Usage: Total Free In Use Largest
Nonpaged Dynamic Memory (MB) 115.14 64.20 50.94 0.62
Bus Addressable Memory (KB) 128.00 110.87 17.12 104.00
Paged Dynamic Memory (MB) 27.78 21.41 6.36 21.36
Lock Manager Dyn Memory (MB) 26.64 4.01 22.62

Buffer Object Usage (pages): In Use Peak
32-bit System Space Windows (S0/S1) 0 9
64-bit System Space Windows (S2) 537 1649
Physical pages locked by buffer objects 537 1650

Memory Reservations (pages): Group Reserved In Use Type
Total (0 bytes reserved) 0 0

Write Bitmap (WBM) Memory Summary
Local bitmap count: 0 Local bitmap memory usage (bytes) 0.00
Master bitmap count: 0 Master bitmap memory usage (bytes) 0.00

Swap File Usage (8KB pages): Index Free Size
(Swap file name not available)
1 6240 6240
(Swap file name not available)
2 624992 624992

Total size of all swap files: 631232

Paging File Usage (8KB pages): Index Free Size
(Paging file name not available)
254 529510 531248
Total committed paging file usage: 418152

Of the physical pages in use, 39165 pages are permanently allocated to OpenVMS.


Kind regards
Sergejus

Sergejus Zabinskis

Apr 9, 2018, 6:04:26 PM

On Saturday, April 7, 2018 at 8:40:24 PM UTC+3, VAXman- wrote:
We generally convert non-static files. Some of them change frequently, some less. But change detection is not acceptable here for some special reasons.

Kind regards
Sergejus

Hein RMS van den Heuvel

Apr 9, 2018, 11:09:27 PM

On Monday, April 9, 2018 at 5:58:33 PM UTC-4, Sergejus Zabinskis wrote:
> First, thank you very much for very helpful and informative response!

Much to learn you still have... my old padawan.... This is just the beginning!

> We convert generally non static files. Some of them change frequently, some less.

I suspect the converts use a simple automatically 'optimized' FDL from the last century. Consider reviewing and updating it manually because you are likely missing out.

> But change detection is not acceptable here for some special reasons.

Most common reason is ignorance, fear of the unknown, and it-sort-of-works-lets-leave-it-alone.

If performance is at all important, please reconsider a serious, game-changing solution (perhaps 100 times faster) instead of putzing around with 10% or even 50% improvements.

> Here is FDL of one of source files:

Looks like this is a post-optimize FDL.
Some straight ANALYZE/RMS/FDL output would be more valuable, giving actual used blocks, compression rates and record counts.

Still, we can get close enough from here...

> BUCKET_SIZE 6

That's the core problem.
On your next convert, just change it to 32 and you'll save 3 .. 4 minutes on the convert run.
If that might impact production performance, then just settle for 16 and gain 2 minutes.
Currently there are about 13 records/bucket (IO); that could be made 35 or 70.

Details...

> SIZE 658
> ALLOCATION 4520720
> BUCKET_SIZE 6
> Number of records processed : 9800488
> Number of fields : 98
:
> Direct I/O count: 739584
> Charged CPU time: 0 00:06:41.15
> Elapsed time: 0 00:11:47.34

Let's assume all 4.5M allocated blocks (2.3GB) are in use; at a 6-block bucket size that would take 753453 IOs to read. Pretty close to the reported 739584.

We may assume the difference between elapsed and CPU time is IO time, so that's 5 minutes or 306 seconds.
That would be about 8MB/sec, not likely a bottleneck.
It is also 0.4 ms/IO, which is likely the bottleneck
( @ 2500 IO/sec if they were back-to-back with no cpu processing ).
So one must make the IOs larger and have fewer of them to make improvements.


> Fields disabled - I mean here that I write some constant values for each field ("DATE" for DATETIME type field, "TEXT" for text field, "NUMBER" for numeric field), instead of processing every field.
> I/O is bigger because 'real' fields usually are shorter than constant.

It does appear that there are 3 minutes or so of processing, which it would be nice to hide behind the IO.

Instead of the SHOW ACCOUNTING, please consider simple LIB$SHOW_TIMER calls, or better still the MY_TIMER calls I offered up, to understand the USER/EXEC/KERNEL CPUTIM decomposition.


> >- Threading - IF the process is largely CPU bound, you may be able to use kernel threading to use more than 1 core,
> to independently READ, PROCESS, WRITE
> This is very good idea! You mean - READ thread reads records and pushes them to the input queue of PROCESS thread.
> PROCESS thread takes records from input queue, processes them and pushes results to WRITE thread ?

Yes, perhaps a simple in-memory buffer with interlocked REMQUE, INSQUE.
With the bucket size of 6, you should expect 20 or so records 'in flight', maybe 100 with a larger bucket size.
You could have the reader fill super-buffers with 100 .. 1000 records to be allocated, filled, handed over.
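
A very reduced sketch of that hand-off, using POSIX threads and a bounded queue instead of the interlocked queue instructions (all names invented for the example; the READ thread calls q_push, the PROCESS/WRITE thread calls q_pop):

#include <pthread.h>

#define QDEPTH  8
#define RECSIZE 658

typedef struct { char data[RECSIZE]; int len; int eof; } rec_t;

static rec_t q[QDEPTH];
static int head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

void q_push(const rec_t *r)                 /* READ thread */
{
    pthread_mutex_lock(&lock);
    while (count == QDEPTH) pthread_cond_wait(&not_full, &lock);
    q[tail] = *r;
    tail = (tail + 1) % QDEPTH;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

void q_pop(rec_t *r)                        /* PROCESS/WRITE thread */
{
    pthread_mutex_lock(&lock);
    while (count == 0) pthread_cond_wait(&not_empty, &lock);
    *r = q[head];
    head = (head + 1) % QDEPTH;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
}

/* The reader pushes a record with eof=1 after the last sys$get;
   the consumer converts, fwrites, and stops when it pops that record. */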

> I had more primitive idea - to distribute fields processing among several threads, because for this source file we have > 90 fields.

Well, you are already burning 8 minutes cpu and 5 minutes waiting before you've done anything, and appear to be doing about 3 or 4 minutes of formatting.

So if you split that to 4 parallel streams then that may drop that work to say 1 minute from 4, for an overall process improvement of 10%. Whoop-ti-diddly-do! Fun project, and nice if you get paid for it, but not too practical, not really.

A minute saved for an intermediate step (you still have to transfer the CSV, and apply it to the database) when the actual core requirement surely is to bring the target DB up to date, which could probably be done in less than a minute.
[10M records, let's say 5% changes = 500,000 changes, applied at 2000/second = 25 seconds]

> I use fopen extension and fwrite for output file.

I think you are in good shape, avoiding SYS$PUT.
Might want to double check (I'd use brute-force ^C ... ANALYZE/SYSTEM ... SHOW PROC/RMS=(FAB,FWA,RAB) )

Do you use a useropen (acc=...) or just pass the RMS settings?

> Yes, I see mostly RMS$_SYNCH and rarely RMS$_PENDING.

Good to hear you checked. Might want to create a 'verbose' run option outputting the actual counts for optimal insight without 'heisenberg' intervening.

> And some information about OpenVMS hardware. It is not production server.
> $ show cpu
> System: XXXXX, AlphaServer GS80 6/1001

Thanks... Nice box for non-prod. Itanium might do better though. Should try!


> $ show memory /full
> Physical Memory Usage (pages): Total Free In Use Modified
> Main Memory (8.00GB) 1048576 359976 656278 32322

Ah! More than 3 GB of memory free, and XFC will give back a Gig if need be.
So if this is a relevant example file of 2.3 GB, then you could copy it to memory and run with NO sharing.
I suspect the copy would run in under a minute (50 MB/sec) and a read-only test run might drop to 2 or 3 minutes, from 12.

This all brings back good memories.
Good clean Fun.

Still, admittedly not knowing the ultimate goal of the process, I suspect that if you want to make a real difference for the business, then you need to investigate CDC options. Those do not come cheap, but not too bad when compared to development 'play time' and system resources burned needlessly day in, day out.

Cheers,
Hein


Sergejus Zabinskis

Apr 10, 2018, 6:50:52 AM

Again, many thanks for valuable information.

On Tuesday, April 10, 2018 at 5:09:27 AM UTC+2, Hein RMS van den Heuvel wrote:
> On Monday, April 9, 2018 at 5:58:33 PM UTC-4, Sergejus Zabinskis wrote:
> > First, thank you very much for very helpful and informative response!
>
> Much to learn you still have... my old padawan.... This is just the beginning!
>
> > We convert generally non static files. Some of them change frequently, some less.
>
> I suspect the converts use a simple automatic 'optimized' from the last century. Consider reviewing and updating manually because you are likely missing out.
>
> > But change detection is not acceptable here for some special reasons.
>
> Most common reason is ignorance, fear of the unknown, and it-sort-of-works-lets-leave-it-alone.
In this case it is not true. Implementing a change-detection-based solution requires more programming than exporting/importing files. That means more money spent on this.
They are happy with 15-20 minutes on the largest files. Before that they had much longer times. I am working as, something like, a freelance developer (this is not my full-time position) for them.
My work is very cheap compared to products developed by big companies. And cost is more important than how fast the data appears in the other systems (where the CSV files travel).

>
> If performance is at all important, please reconsider a serious, game-changing solution (perhaps 100 times faster) instead of putzing around with 10% or even 50% improvements.
>
> > Here is FDL of one of source files:
>
> Looks like this is a post-optimize FDL,
> Some straight ANALYZE/RMS/FDL output is more valuable, giving actual used block, compression rates and record counts
>
> Still, we can get close enough from here...
>
> > BUCKET_SIZE 6
>
> That's the core problem.
> On you next convert, just change to 32 and you'll save 3 .. 4 minutes on the convert run.
> Now that might impact production performance, then just settle for 16 and gain 2 minutes.
> Currently there are about 13 record/bucket (IO), that could be made 35 or 70.
>
> Details...
>
> > SIZE 658
> > ALLOCATION 4520720
> > BUCKET_SIZE 6
> > Number of records processed : 9800488
> > Number of fields : 98
> :
> > Direct I/O count: 739584
> > Charged CPU time: 0 00:06:41.15
> > Elapsed time: 0 00:11:47.34
>
> Let's assume all 4.5M allocated blocks (2.3GB) are in use, at 6 block bucket size that would take 753453 IOs to read. Pretty close to the reported 739584
>
They also don't want to change anything in the production system.

> We may assume the difference between Elapsed and CPU time is IO time, so that's 5 minutes or 306 second.
> That would be about 8MB/sec, not likely a bottle-neck
> Is is also 0.4 mS/IO which is likely the bottle-neck.
> ( @ 2500 IO/sec if they were back-to-back with no cpu processin
> So one must make the IOs larger and have fewer of them to make improvements
>
>
> > Fields disabled - I mean here that I write some constant values for each field ("DATE" for DATETIME type field, "TEXT" for text field, "NUMBER" for numeric field), instead of processing every field.
> > I/O is bigger because 'real' fields usually are shorter than constant.
>
> It does appear that there is 3 minutes or so of processing, which would be nice to be hidden behind the IO.
>
> Instead of the SHOW ACCOUNTING, please consider simple LIB$SHOW_TIMER calls, or better still the MY_TIMER calls I offered up to understand to USER/EXEC/KERNEL CPUTIM decomposition.
>
>
> > >- Threading - IF the process is largely CPU bound, you may be able to use kernel threading to use more than 1 core,
> > to independently READ, PROCESS, WRITE
> > This is very good idea! You mean - READ thread reads records and pushes them to the input queue of PROCESS thread.
> > PROCESS thread takes records from input queue, processes them and pushes results to WRITE thread ?
>
> Yes, perhaps a simple in memory buffer with interlocked REMQUE, INSQUE
> With the bucket size of 6, you should expect 20 or so records 'in flight, maybe 100 with a larger bucket size.
> You could have the reader fill super-buffers with 100 .. 1000 records to be allocated, filled, handed over.
>
I'll definitely implement this!

> > I had more primitive idea - to distribute fields processing among several threads, because for this source file we have > 90 fields.
>
> Well, you are already burning 8 minutes cpu and 5 minutes waiting before you've done anything, and appear to be doing about 3 or 4 minutes of formatting.
>
> So if you split that to 4 parallel streams then that may drop that work to say 1 minute from 4, for an overall process improvement of 10%. Whoop-ti-diddly-do! Fun project, and nice if you get paid for it, but not too practical, not really.

I got paid for it already. But I would like to make it faster just for myself. I don't think the company will pay me for the conversion running 5 minutes faster. It is not a real-time business.
>
> A minute saved for an intermediate step (still have to transfer the CSV, and apply to the database) when the actual core requirement surely is bring the target DB up to date which probably be done in less than minute.
> [10M records, lets say 5% changes = 500,000 changes, apply at 2000/second = 25 seconds ]
>
> > I use fopen extension and fwrite for output file.
>
> I think you are in good shape, avoiding SYS$PUT.
> Might want to double check (I'd use brute-force ^C ... ANALYZE/SYSTEM ... SHOW PROC/RMS=(FAB,FWA,RAB) )
>
> Do you use a a useropen (acc=...) or just pass the rms settings?
I use RMS settings in the parameter list, like this:

const char* openMode( m_reUse ? "rb+" : "w" );
const char* rop( m_reUse ? "rop=wbh,tpt" : "rop=wbh" );
fd = fopen( csvPath, openMode, "fop=dfw", "mbc=127", "mbf=2", rop );
I agree that a useropen is more convenient and offers more possibilities.

>
> > Yes, I see mostly RMS$_SYNCH and rarely RMS$_PENDING.
>
> Good to hear you checked. Might want to create a 'verbose' run option outputting the actual counts for optimal insight without 'heisenberg' intervening.
>
> > And some information about OpenVMS hardware. It is not production server.
> > $ show cpu
> > System: XXXXX, AlphaServer GS80 6/1001
>
> Thanks... Nice box for non-prod. Itanium might do better though. Should try!
>
>
> > $ show memory /full
> > Physical Memory Usage (pages): Total Free In Use Modified
> > Main Memory (8.00GB) 1048576 359976 656278 32322
>
> Ah! More than 3 GB memory free, and XFC will to give back a Gig if need be.
> So if this is a relevant example file of 2.3 GB, then you could copy to memory and run with NO sharing.
> I suspect the copy will be read-only run under a minute (50 MB/sec) and a read-only test run might drop to 2 or 3 minutes, from 12.
>
I had a talk with the system administrator; he said that the production system is much more active and we can't assume that we will generally have enough free memory for that.

> This all brings back good memories.
> Good clean Fun.
>
> Still, admittedly not knowing the ultimate goal of the process, I suspect that if you want to make a real difference for the business, then you need to investigate CDC options.
> Those do not come cheap, but not too bad when compared to development 'play time' and system resources burned needlessly day in, day out.

The main keyword - "not cheap"! I think my customer's priorities are ordered this way: cost, correctness of the program (Count(bugs) --> min), ..., speed.

>
> Cheers,
> Hein

Kind regards
Sergejus

Hein RMS van den Heuvel

Apr 10, 2018, 8:04:45 PM

On Tuesday, April 10, 2018 at 6:50:52 AM UTC-4, Sergejus Zabinskis wrote:
> Again, many thanks for valuable information.
>
> On Tuesday, April 10, 2018 at 5:09:27 AM UTC+2, Hein RMS van den Heuvel wrote:
> > On Monday, April 9, 2018 at 5:58:33 PM UTC-4, Sergejus Zabinskis wrote:
> > > First, thank you very much for very helpful and informative response!
> >
> > Much to learn you still have... my old padawan.... This is just the beginning!
> >
> > > We convert generally non static files. Some of them change frequently, some less.
> >
> > I suspect the converts use a simple automatic 'optimized' from the last century. Consider reviewing and updating manually because you are likely missing out.
> >
> > > But change detection is not acceptable here for some special reasons.
> >
> > Most common reason is ignorance, fear of the unknown, and it-sort-of-works-lets-leave-it-alone.
> In this case it is not true. Implementing change detection based solution requires more programming than exporting/importing files. That means more money spent on this.

I get that no product sale is likely. That's ok.
But I disagree with the reasoning.

A proper CDC solution, like Attunity Replicate, requires ZERO programming.
Not on the source, not on the target. Select files; describe them (CDD? Cobol copy books? Basic MAP files); select the target DB; hit 'GO'.

The 'tricky' part, sometimes impossible part, is nasty datatypes like very strange date encodings, and dirty data like dates or number strings with just text in them. But you'd be battling with that now already, as it is.

> They are happy with 15-20 minutes on largest files. Before that they had much bigger times.

Great. Mission accomplished.

> I am working as, something like, freelance developer (this is not my full-time position) for them.
> My work is very cheap comparing to products developed by big companies. And cost is more important than speed how data will appear in other systems (where CSV files travel).

I fully appreciate that position. Been there, done that, myself.

<snip>

> They also don't want to change anything in production system.

Well, if they don't want to do changes, then things will stay as they are, good or bad.

Again, I'd argue though that they are reasoning out of ignorance and fear.
Things like bucket size increases and applying global buffers have zero impact on the programs and procedures, and may help by 50% where your maximum return on investment is more like 10%.

<snip>

> > > >- Threading - IF the process is largely CPU bound, you may be able to use kernel threading to use more than 1 core,
<snip>
> I'll definitely implement this!

Great! It will be fun.
As I pointed out, I have my reservations about the net improvements,
but it will sort of pay for itself in the learning.
It's a great feeling once you get it all going.

> I got paid for it already. But I would like to make it faster just for myself.

Been there, done that also. :-)


> fd = fopen( csvPath, openMode, "fop=dfw", "mbc=127", "mbf=2", rop );

DFW only applies to shared files, and you definitely don't want sharing for the output. Still, it will not harm; it will just be ignored.

MBF=2 typically gets 90% of the potential benefit, but why not use 4 or 5?
127 was indeed the max for 32-bit RABs, and normally bigger is better.
Some subsystems have 'sweet spots' which can help.
Like maybe transfers over 100 blocks get broken up (I'm just making this up!).
The XFC breaks up (in-memory) handling into 16-block (8KB) chunks.
So if you have time, and a reproducible test, then be sure to try 64 blocks and 112 blocks (128-16).
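
In your fopen call that would just be different keyword values, e.g. (worth benchmarking, not a given; DFW dropped since it is ignored for the unshared output anyway):

fd = fopen( csvPath, openMode, "mbc=112", "mbf=4", rop );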

<snip>

> I think that my customer priorities are ordered in such way: Cost, Correctness of program: Count(bugs) --> min, ..., speed.

Got it.

Hein.

VAXman-

Apr 10, 2018, 9:25:19 PM

In article <a1e148db-8689-4a7f...@googlegroups.com>, Hein RMS van den Heuvel <heinvand...@gmail.com> writes:
>On Tuesday, April 10, 2018 at 6:50:52 AM UTC-4, Sergejus Zabinskis wrote:
>> Again, many thanks for valuable information.
>>
>> On Tuesday, April 10, 2018 at 5:09:27 AM UTC+2, Hein RMS van den Heuvel wrote:
>> > On Monday, April 9, 2018 at 5:58:33 PM UTC-4, Sergejus Zabinskis wrote:
>> > > First, thank you very much for very helpful and informative response!
>> >
>> > Much to learn you still have... my old padawan.... This is just the beginning!
>> >
>> > > We convert generally non static files. Some of them change frequently, some less.
>> >
>> > I suspect the converts use a simple automatic 'optimized' from the last century. Consider reviewing and updating manually because you are likely missing out.
>> >
>> > > But change detection is not acceptable here for some special reasons.
>> >
>> > Most common reason is ignorance, fear of the unknown, and it-sort-of-works-lets-leave-it-alone.
>> In this case it is not true. Implementing change detection based solution requires more programming than exporting/importing files. That means more money spent on this.
>
>I get that no product sale is likely. That's ok.
>But i disagree with the reasoning.

;)



>A proper CDC solution, like Attunity Replicate, require ZERO programming.
>Not on the source, not an the target. Select files; Describe (CDD? Cobol copy books? Basic MAP files); select target DB; hit 'GO'

;)


Sloppy homegrown solutions are always better... not!