Background: I'm using APL+Win version 15.1.01 under Windows 10 Professional. I'm writing various assembly language routines to be called via #CALL from APL+Win. I'm assembling using MASM, and am including the .486 and .XMM directives so that I can use SIMD extension instructions, in particular SSE 4.2 instructions.
In some of my routines I'm returning an explicit result which is a vector of integers. The result could be anything from length 0 (an empty vector) up to whatever APL's vector length limit is. The length of the result depends, of course, on the processing of the particular arguments being passed in.
I'm using ISS (Interpreter Support Service) 3 (Create Variable) to create the initial empty vector result. The problem is that there is no (documented) ISS to catenate to an existing variable. For now, at least, I only need to catenate one integer at a time as I go. So what I am currently doing is creating TWO result variables within the ASM routine, and alternating between them. When I need to catenate another element, I recreate the "other" result variable (ISS 3 erases the old version automatically) with a length 1 greater than that of the current result variable, copy the data elements of the current result variable to the "other" result variable, then finally insert the new value at the end. The "other" variable then becomes the current one. So each time I need to catenate ONE more integer value, I have to completely recreate the result variable.
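To make the cost concrete, here is a rough C model of the alternating scheme described above. It is only a sketch: create_variable is a hypothetical stand-in for ISS 3 that uses malloc/free, not the real APL+Win interface, and Var, append_one and elements_copied are names made up for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative model of a result variable: a data area plus its length. */
typedef struct { int *data; size_t len; } Var;

/* HYPOTHETICAL stand-in for ISS 3 (Create Variable). Like ISS 3 as described
   above, it erases the old version first, so the old values are lost. */
static int create_variable(Var *v, size_t n)
{
    free(v->data);
    v->data = NULL;
    v->len = 0;
    if (n > 0) {
        v->data = malloc(n * sizeof *v->data);
        if (v->data == NULL) return -1;
        v->len = n;
    }
    return 0;
}

/* Running total of elements copied, to show the cost of the scheme. */
static size_t elements_copied = 0;

/* Catenate one value by rebuilding the "other" variable.
   Returns the index (0 or 1) of the now-current variable, or -1 on failure. */
static int append_one(Var vars[2], int cur, int value)
{
    assert(cur == 0 || cur == 1);
    int other = 1 - cur;
    size_t n = vars[cur].len;

    if (create_variable(&vars[other], n + 1) != 0) return -1;
    if (n > 0) memcpy(vars[other].data, vars[cur].data, n * sizeof(int));
    elements_copied += n;            /* every old element is copied again */
    vars[other].data[n] = value;     /* new value goes at the end */
    return other;
}
```

Starting from two empty variables and calling cur = append_one(vars, cur, i) for i from 0 to 999 copies 0+1+...+999 = 499500 elements in total, so the work grows with the square of the result length.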
This is obviously very inefficient, and I'm wondering if anyone here has come up with a better way of doing this sort of catenation within the ASM routine.
Ideas I've already thought of include:
1. Using ISS 6 (make unique copy of variable) to make a copy of the current result variable. The problem is you can't specify a longer length. And even if you could, it would still be almost as inefficient.
2. Keeping just one result variable and recreating it with an ever-increasing length. The problem is that this destroys all the existing values in the data area, because ISS 3 first erases the current variable.
3. Running the routine with two passes, the first pass to determine the total number of elements that will be needed in the result, then the second pass to create the result variable and fill it in. This of course has its own inefficiency, namely having to process everything twice.
4. Pre-allocating a temporary "longest possible" result variable, filling it in, then copying only the actual result values to a final result variable of the proper length. Since "longest possible" could theoretically mean a vector of around 2*31 elements (2 to the 31st power), this is impractical and grossly inefficient.
5. Attempting to "fool" APL+Win and manipulate the m-entry's length, number of elements, shape word and data area under the covers. I would be able to do that to a certain point, but it requires relocating the m-entry when something else is "in the way" of it getting any larger, which I do not know how to do, and is highly dependent on the internals of APL+Win which can and do change.
6. Using undocumented ISS routines that do what I need, which of course may or may not exist.
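For reference, idea 3 (two passes) would look roughly like this in C. This is a sketch under assumptions: qualifies is a hypothetical placeholder for whatever per-element processing decides that a value belongs in the result, and malloc stands in for the single ISS 3 call that would create the result variable at its final length.

```c
#include <stddef.h>
#include <stdlib.h>

/* HYPOTHETICAL per-element test standing in for the real processing. */
static int qualifies(int x) { return x % 3 == 0; }

/* Pass 1: determine how many elements the result will need. */
static size_t count_results(const int *in, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (qualifies(in[i])) count++;
    return count;
}

/* Pass 2: fill an already-allocated result with the qualifying indexes. */
static void fill_results(const int *in, size_t n, int *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (qualifies(in[i])) out[k++] = (int)i;
}

/* Create the result exactly once, at its final length (malloc here; it
   would be one ISS 3 call in the real routine). Returns NULL only if the
   allocation fails; the caller frees the result. */
static int *select_two_pass(const int *in, size_t n, size_t *out_len)
{
    size_t count = count_results(in, n);
    int *out = malloc((count ? count : 1) * sizeof *out);
    if (out == NULL) return NULL;
    fill_results(in, n, out);
    *out_len = count;
    return out;
}
```

The result is created once and no element is ever copied, but the qualifying test runs twice per input element, which is the inefficiency noted in idea 3.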
APL+Win itself obviously "knows" how to do efficient catenation. In particular, with a catenating assignment (for example: a{<-}1 2 3 followed by a,{<-}4), APL+Win is smart enough to do the catenation in place when possible.
So: Does anyone have any experience with this? Thanks. / Rav