Scaling concurrent PHANTOM processes across multiple cores using String Database


Robert Herbin

Feb 28, 2026, 5:33:04 PM
to Pick and MultiValue Databases
I've got a CPU-intensive batch job that has been taking longer to complete than I would like. In an effort to speed things up, I tested splitting the work in two: half the work is done in one phantom process, the other half in a second phantom process running concurrently. So far the results have been encouraging; the combined run finished in almost exactly half the wall-clock time of a single process. Each process writes results to unique keys in a shared file, so there are no record locking conflicts (there will be in excess of 100,000 results for a single combined run).

On a 2-core system I'm seeing close to a 2x speedup with 2 processes. I theorize that this improvement would continue to scale if I had more cores. To further validate this, I ran a second test with three concurrent PHANTOM processes (each doing the same per-process workload) on the same 2-core system. Wall clock time increased by about 67% compared to the 2-process run, which is reasonably close to the theoretical 50% slowdown you'd expect when 1.5x the CPU work is shared across two cores. This suggests the speedup is genuinely core-bound rather than due to some other factor, and that the optimal strategy is N processes for N cores.
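For reference, the arithmetic behind that expectation can be written as a back-of-envelope model (a Python sketch, not MV code; it assumes purely CPU-bound work, fair scheduling, and a fixed amount of work per process):

```python
def wall_time(procs, cores, work_per_proc=1.0):
    """Rough wall-clock estimate for concurrent CPU-bound processes.

    Once there are at least as many processes as cores, the cores are
    saturated and wall time is total CPU time divided by core count.
    """
    total_work = procs * work_per_proc
    return total_work / min(procs, cores)

# 2 processes on 2 cores vs 3 processes on 2 cores: the third process
# adds 50% more CPU work but no more cores, so the run is expected to
# take 50% longer.
```

Under this model, adding a process beyond the core count never helps a purely CPU-bound job; it only adds scheduling overhead on top of the predicted slowdown.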

A few questions:

Does this kind of parallelism continue to scale as you add more cores, or are there hidden bottlenecks I should know about?
Will concurrent writes to a shared file become a problem at higher process counts, even with no locking conflicts?
Any practical limits on the number of concurrent PHANTOMs I should be aware of?

Has anyone done something similar? I'd love to hear from people with real-world experience before I spend (more) money on hardware.

Scott Ballinger

Feb 28, 2026, 7:29:16 PM
to Pick and MultiValue Databases
I used to manage a large 200-user D3 system. About 15 years ago we replaced the SCSI array drives with RAM disks (both RAID-10, about 1.6TB total data).

The file-save took all night, so we broke the main system into 6 accounts with Q-pointers to the files in the data2-6 accounts (e.g. some files left in the main account, Q-pointers in main to files in data2, Q-pointers in main to files in data3, etc., balanced so that all accounts were about the same size). We then ran 1 file-save and 5 account-saves simultaneously, with the result that everything finished in 2 hours.

At the time we were working with Doug Dumitru at EasyCo, and his claim was that once you moved from rotating storage to RAM disks you were limited by CPU bandwidth, not the disk channel. That's why we landed on 6 simultaneous saves across 6 accounts: we were running a 6-core Xeon system at the time. We later changed to dual Xeon CPUs (+ hyperthreading) but didn't bother to increase the number of data accounts and account-saves, as it was already plenty fast.
/Scott Ballinger


Bob Dubery

Mar 1, 2026, 2:57:10 AM
to Pick and MultiValue Databases
I did something similar some years ago. This was back when the Pentium processor was a big deal. The server had five Pentiums: four did the actual processing, and the fifth distributed load across the other four.

The client had a job that was taking a long time to run. I thought: what if we ran many phantoms, each doing a part of the processing? Initial tests in our offices showed good results, so we went ahead. In the field we quickly ran into a problem: there was a system-wide limit on the number of phantom processes, and in our scenario (the client was a retail chain, and we decided to launch a phantom for each store) we overran that limit. The process was modified to check how many phantom jobs were running before deciding whether to launch another. This cost us some time, but overall there was a worthwhile gain.
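The launch throttle Bob describes can be sketched roughly like this (Python for illustration only; `count_running` and `start_job` are hypothetical stand-ins for however your platform counts and launches phantoms):

```python
import time

def launch_all(jobs, limit, count_running, start_job, poll_secs=0.5):
    """Start a phantom for every job, never exceeding `limit` at once.

    count_running() -> number of phantoms currently alive (hypothetical)
    start_job(job)  -> launches one phantom for `job` (hypothetical)
    """
    for job in jobs:
        while count_running() >= limit:
            time.sleep(poll_secs)   # at the cap: wait for a slot to free up
        start_job(job)
```

Polling is crude but matches what a BASIC program checking a process table in a loop would do; the point is simply that the launcher, not the operating limit, decides when the next phantom starts.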

Another bottleneck I ran into later was poorly structured selects, which can waste a lot of time. I took a select like
SELECT MyFile WITH A = 1 AND WITH B = 2 AND WITH C = 3 AND WITH D = 4 ..... AND WITH J = 10
and analysed it. 

This was on UniVerse. The first thing we established was that if A # 1, UniVerse still evaluates all the other terms and then applies the ANDs and ORs; there is no short-circuiting. So ordering the selection criteria to get an AND to fail as early as possible made no difference.

What did make a difference: if A, C, E, and I were all correlatives (I-types) looking at data in other files, then you could get a good gain by splitting up into two selects
SELECT MyFile WITH B = 2 AND WITH D = 4 .... AND WITH J = 10
SELECT MyFile WITH A = 1 AND WITH C = 3 .... AND WITH I = 9

Better still, you split it up again so that the correlatives in each select are concerned with just one file. The first select is quicker because there are fewer terms to evaluate. The subsequent selects run against smaller sets of records, so again there is a speed gain. The correlatives that read other files turned out to be costly, so you want to evaluate those as late as you can, doing the extra I/O for as few records as possible.
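That staging strategy is general. As a rough Python sketch (the predicate functions are made-up stand-ins for the cheap local-field tests and the costly cross-file correlatives):

```python
def staged_select(record_ids, cheap_test, costly_test):
    """Two-stage select: run cheap, local-field predicates over the
    whole file first, then run costly cross-file predicates only on
    the records that survived stage one."""
    survivors = [rid for rid in record_ids if cheap_test(rid)]
    return [rid for rid in survivors if costly_test(rid)]
```

The win is purely in call counts: the expensive lookup runs once per survivor rather than once per record.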

This made quite a bit of difference. The challenge is to figure out where in the process your bottleneck, or the narrowest bottleneck, is.

But this was the gift that kept on giving. Further analysis of the day-end processing revealed several processes that all did the same select. Now we could have one process that just did the select (factored as described above) and then saved a select list. All the other processes then just did GET-LIST...

Another thing I've done is implement caching when the DBMS didn't do it. This involved two dynamic arrays: one of record keys and one of LOWER()ed records. Before doing a read, you check the first array; if the key is there, you get the corresponding record from the second array and RAISE() it. If you don't find the key, you append the key to the first array, read the record, and place LOWER(record) into the corresponding position in the second array.

Again, this was after analysis proved that the number of distinct keys was significantly lower than the number of reads.
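In modern terms those two parallel dynamic arrays form a read-through cache. A minimal sketch of the same idea (Python for illustration; `read_record` is a hypothetical stand-in for the actual file read):

```python
def make_cached_reader(read_record):
    """Wrap a record-reading function with a read-through cache.

    A dict plays the role of Bob's key array + LOWERed-record array:
    first read of a key hits the file, later reads come from memory.
    """
    cache = {}
    def cached_read(key):
        if key not in cache:              # miss: do the real read once
            cache[key] = read_record(key)
        return cache[key]                 # hit: served from the cache
    return cached_read
```

As Bob notes, this only pays off when the same keys are read many times, which is exactly what the prior analysis had to establish.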

Analysis is the key to all of these: figuring out where the worst bottleneck is and addressing that, so you get the biggest bang for your programming buck.

Wol

Mar 1, 2026, 4:23:39 AM
to mvd...@googlegroups.com
That was my gut reaction - even if i/o isn't the problem, select lists
easily can be.

You mentioned TRANS's (I presume) into other files. Does the data in the
other file change? If not, you can presumably index it in your local
file. (That's dangerous in general - updating the other file can corrupt
the index.) There's a function (GETINDEX) in the SDME contrib area that
turns an index in a foreign file into a field in the local file.

How big are your files compared to spare RAM? Can the OS cache everything?

UniVerse (and I presume the rest of them) had operators for merging
select lists, so again as Bob suggests, do a bunch of sub-selects, and
then just merge the lists with AND or OR.
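Merging select lists that way is just order-preserving set intersection and union. A rough Python sketch of the two merges (illustrative only; the real operators work on saved lists):

```python
def merge_and(list_a, list_b):
    """AND-merge two select lists: keep only keys present in both,
    preserving the order of the first list."""
    keys_b = set(list_b)
    return [k for k in list_a if k in keys_b]

def merge_or(list_a, list_b):
    """OR-merge two select lists: union in first-seen order,
    with duplicates removed."""
    seen, out = set(), []
    for k in list_a + list_b:
        if k not in seen:
            seen.add(k)
            out.append(k)
    return out
```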

I guess what Bob and I are both saying is, i/o can be cpu-heavy, so
it's still very much worth investigating - finding the data is always
likely to be the most expensive part.

Cheers,
Wol

Robert Herbin

Mar 1, 2026, 11:04:55 AM
to Pick and MultiValue Databases

Thanks to all three of you for the helpful responses.

In my specific use case, selection is simple: I process every record in a file, so I pre-divide all record keys into N subsets using round-robin logic and QSELECT each subset in the corresponding PHANTOM process. This is working well.
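In outline, the round-robin split amounts to the following (a Python sketch for illustration only; the real job would build N saved lists in BASIC):

```python
def round_robin_split(keys, n_processes):
    """Deal record keys into n subsets, one per PHANTOM process,
    so the workloads come out roughly equal in size."""
    subsets = [[] for _ in range(n_processes)]
    for i, key in enumerate(keys):
        subsets[i % n_processes].append(key)
    return subsets
```

Round-robin balances by record count; if per-record processing time varies a lot, balancing by estimated cost would be the next refinement.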

The application reads small records, performs heavy CPU-bound processing on each record independently (millions of nested iterations per record), then writes a small result record to a separate file: thousands of writes in total, but millions of iterations. In testing so far, record selection and disk I/O are not bottlenecks; the processing is overwhelmingly CPU-bound.

Bob — your point about the system-wide PHANTOM process limit is noted and appreciated. Since I'm planning to match the number of PHANTOMs to the number of cores (likely 4 on my next test system), I don't anticipate hitting that limit, but it's good to know it exists.

My next step is testing on a 4-core system to see if the 2x speedup I observed on 2 cores continues to scale linearly.

bdeck...@gmail.com

Mar 1, 2026, 11:11:29 AM
to mvd...@googlegroups.com

Depending on the hardware architecture you're running on, you will eventually run into limits related to NUMA (non-uniform memory access, the Intel model). This is where the underlying system has to reach memory attached to another processor over a slower interconnect. I won't go into NUMA here, but there is plenty to read about it, especially if you are not pushing max I/O on your storage yet see performance degrade as you add cores, threads, or memory.

 

Bruce Decker

--
You received this message because you are subscribed to
the "Pick and MultiValue Databases" group.
To post, email to: mvd...@googlegroups.com
To unsubscribe, email to: mvdbms+un...@googlegroups.com
For more options, visit http://groups.google.com/group/mvdbms
To view this discussion visit https://groups.google.com/d/msgid/mvdbms/4d07e518-7141-4154-af96-21c0a8c0b483n%40googlegroups.com.

Bob Dubery

Mar 2, 2026, 8:38:08 AM
to Pick and MultiValue Databases
On Sunday, 1 March 2026 at 18:04:55 UTC+2 Robert Herbin wrote:

Bob — your point about the system-wide PHANTOM process limit is noted and appreciated. Since I'm planning to match the number of PHANTOMs to the number of cores (likely 4 on my next test system), I don't anticipate hitting that limit, but it's good to know it exists.

I believe it is also configurable; maybe somebody with experience administering UV systems can comment. But no matter what the limit is set to, you can always end up trying to start limit + 1 phantoms, so it's best to have another way of controlling it.
 

Bob Dubery

Mar 2, 2026, 8:45:53 AM
to Pick and MultiValue Databases
On Sunday, 1 March 2026 at 11:23:39 UTC+2 Wol wrote:


I guess what Bob and I are both saying is, i/o can be cpu-heavy, so
it's still very much worth investigating - finding the data is always
likely to be the most expensive part.



Yep, even today with the much faster storage devices we have now. It's still the first thing I look at when asked to improve the speed of a process like this.

 