problems making a ccall to a varargs routine (v1.37.20)


Eric Mandel

Sep 6, 2017, 3:33:01 PM
to emscripten-discuss
In v1.37.20, I don't seem to be able to use Module.ccall to invoke a varargs routine, even if I specify the exact number of arguments for the given invocation. Should this work?

Consider a cvals() routine that just prints out the double precision varargs until the stop marker is found (or until we have gone too far):
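#include <stdarg.h>
#include <stdio.h>

// safety limit: give up once more than nmax varargs have been read
int nmax = 4;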
int cvals(double a, ...){
  int i=0;
  double b;
  va_list args;
  // declared value
  va_start(args, a);
  fprintf(stdout, "in cvals: [declared: %f]\n", a);
  while( 1 ){
    // get next double precision value
    b = va_arg(args, double);
    // stop if we reached the end marker or go beyond max
    if( b < 0 ){
      fprintf(stdout, "  found end marker\n");
      break;
    } else if( i > nmax ){
      fprintf(stdout, "  went past max args (BAD): %d\n", nmax);
      break;
    } else {
      fprintf(stdout, "  vararg %d: %f\n", i, b);
      i++;
    }
  }
  va_end(args);
  return i;
}
 
The expected result when calling this directly in C, e.g.:

cvals(100.0, 1.01, 2.02, 3.03, -1.0)

is: 
in cvals: [declared: 100.000000]
  vararg 0: 1.010000
  vararg 1: 2.020000
  vararg 2: 3.030000
  found end marker

Using Module.ccall with a specific number of args gives a bogus result:

Module.ccall("cvals", "null", ["number", "number", "number", "number", "number"], [100.0, 1.01, 2.02, 3.03, -1.0])
in cvals: [declared: 100.000000]
  vararg 0: 0.000000
  vararg 1: 0.000000
  vararg 2: 0.000000
  vararg 3: 0.000000
  vararg 4: 0.000000
  went past max args (BAD): 4

I see from a previous post (Binding varargs function, 9/15/13) that cwrap does not (yet) support varargs. But should ccall with a specific number of args work? If not, are there any suggested work-arounds, short of re-coding the C varargs routine?

Thanks!

Eric


Alon Zakai

Sep 6, 2017, 9:11:19 PM
to emscripten-discuss
Yeah, ccall doesn't currently support C varargs methods. Under the hood, the ABI we use passes the variadic arguments as a pointer to their location in memory, so beyond its declared parameters the function actually has just 1 argument (that pointer). ccall isn't aware of this, so it'll just pass the first extra argument you provide as that pointer, and nothing works.

As a workaround, you can write non-varargs C wrapper functions, one for each fixed number of arguments you need, and call those with ccall.
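A minimal sketch of that workaround in C. To stay self-contained it uses count_vals(), a hypothetical stand-in for cvals() that counts the values instead of printing them; the wrapper names count_vals_2 and count_vals_3 are also hypothetical. Each wrapper is an ordinary fixed-arity function, so ccall can pass its arguments directly, and the compiler builds the real varargs call inside the wrapper.

```c
#include <stdarg.h>

/* Hypothetical stand-in for cvals(): counts the double varargs that
   precede a negative end marker (printing omitted). */
static int count_vals(double a, ...) {
    va_list args;
    int n = 0;
    (void)a; /* the declared argument is not counted */
    va_start(args, a);
    while (va_arg(args, double) >= 0)
        n++;
    va_end(args);
    return n;
}

/* Hypothetical fixed-arity wrappers: ordinary (non-variadic) functions
   that ccall can call directly. Each one makes the varargs call itself
   and appends the end marker, so JS never has to pass it. */
int count_vals_2(double a, double v1, double v2) {
    return count_vals(a, v1, v2, -1.0);
}

int count_vals_3(double a, double v1, double v2, double v3) {
    return count_vals(a, v1, v2, v3, -1.0);
}
```

From JS, count_vals_3 would then be called with four "number" arguments and no end marker (after exporting it, e.g. via EXPORTED_FUNCTIONS). The obvious drawback for calls with hundreds of values is that one wrapper is needed per argument count.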

If someone wants to try, btw, it should be possible to add varargs support to ccall. Basically, if we tell ccall that the target is varargs (we'd need to add a way to do that), then it should allocate some stack space, write the arguments there, and call the method with a pointer to those arguments.
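The convention described above can be mimicked in plain C to see what ccall would have to do. This is only an illustration of the idea, not Emscripten's generated code: it assumes the doubles sit in consecutive 8-byte slots, and pack_doubles and cvals_from_buffer are hypothetical names. The caller writes the values into a buffer, and the callee receives the declared argument plus one pointer.

```c
#include <stddef.h>
#include <string.h>

/* Caller side (hypothetical): copy n doubles into caller-provided
   space, as ccall would fill stack space before the call. The returned
   address plays the role of the single hidden varargs argument. */
const unsigned char *pack_doubles(unsigned char *buf, const double *vals,
                                  size_t n) {
    for (size_t i = 0; i < n; i++)
        memcpy(buf + i * sizeof(double), &vals[i], sizeof(double));
    return buf;
}

/* Callee side (hypothetical): the declared argument arrives normally,
   the variadic part arrives as one pointer. Like cvals(), it reads
   doubles until a negative end marker (which must be present) and
   returns how many values preceded it. */
int cvals_from_buffer(double declared, const unsigned char *varargs) {
    int count = 0;
    double v;
    (void)declared;
    for (;;) {
        memcpy(&v, varargs + (size_t)count * sizeof(double), sizeof(double));
        if (v < 0)
            break;
        count++;
    }
    return count;
}
```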

--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-discuss+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jukka Jylänki

Sep 7, 2017, 3:56:51 AM
to emscripte...@googlegroups.com
Marked down https://github.com/kripken/emscripten/issues/5563 to
remember this for later. If you want to try implementing it yourself,
or need a workaround, check out this snippet:
https://github.com/juj/emscripten/blob/multithreading/src/library_html5.js#L209,
which calls a vararg function from the JS side.

Eric Mandel

Sep 7, 2017, 9:28:50 AM
to emscripten-discuss
Thanks for creating an issue. I may try to pick at a solution in my so-called free time, in which case we can continue this on GH/5563.

Perhaps a varargs-styled syntax extension like this would be natural:

Module.ccall("cvals", "null", ["number", "..."], [100.0, 1.01, 2.02, 3.03, -1.0])

where "..." is only valid in the last "type position", and varargs types after the explicitly typed arg(s) must be either "string" or "number", determined by the implicit type of the arg itself. Though it looks like this might cause complications in cwrap ...

I might be misunderstanding your code snippet: it looks more like optional args (known number, but not all of them necessarily present) instead of varargs (unknown number of args determined either by a terminating marker or with the number of args itself passed in the call). Our varargs routines are used to specify (for example) an annulus with an arbitrary number (tens or even hundreds) of successive radii. If I've got your example wrong, please let me know.

Thanks again,

Eric

Jukka Jylänki

Sep 8, 2017, 10:19:04 AM
to emscripte...@googlegroups.com
The code there is calling a varargs function, but only with a fixed
number of arguments. The same method applies to calling with an
arbitrary number of args, though; one would then use a for loop to
populate the parameters.

I think we'd want to avoid using an ellipsis string "..." as an
identifier, and preferably use some other way to mark varargs;
perhaps just the presence of a secondary array could denote them.
One thing that is important, though, is that there will need to be a
field that identifies the signature of the varargs, because it needs
to be possible to call both integer and float signatures. Non-default
C conversions can also have integers and floats of different sizes, so
we'll need a way to be forward compatible with those as well,
even if we did not implement them right away.

Eric Mandel

Sep 8, 2017, 10:51:45 AM
to emscripte...@googlegroups.com
I’ve written some exploratory code, using a cheap implementation of printf as the compiled C routine, just to see what is involved. Packing the varargs arguments into stack space is ugly … perhaps that is what your code does … but once it's done:

Module.ccall("miniprintf", "null", ["string", "..."], ["%s %f %s %f\n", "foo", 1.234, "goo", 2])
foo 1.23399999999999 goo 2

… although the ellipsis problem rears its ugly head immediately:

Module.ccall("miniprintf", "null", ["string", "..."], ["%s %f %s %d\n", "foo", 1.234, "goo", 2])
foo 1.23399999999999 goo 0
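One plausible explanation for the 0, sketched in C: if the packer wrote the JS number 2 as an 8-byte double but %d reads a 4-byte int from the start of that slot, the int comes from the low-order half of the double, and for 2.0 those bits are all zero on a little-endian target such as WebAssembly. The slot layout of the exploratory code is an assumption here, and read_double_slot_as_int is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes a double into an 8-byte slot, then reads
   the first 4 bytes of that slot back as a 32-bit int, as a %d
   conversion would if a double had been packed where an int was
   expected. */
int32_t read_double_slot_as_int(double value) {
    unsigned char slot[sizeof(double)];
    int32_t as_int;
    memcpy(slot, &value, sizeof value);
    memcpy(&as_int, slot, sizeof as_int);
    return as_int;
}
```

On a little-endian host, read_double_slot_as_int(2.0) returns 0, matching the printed value; knowing the vararg types would let the packer write a 4-byte int instead.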

So, yes, we will need to integrate the signature into the varargs identifier. I think you are suggesting something like this:

Module.ccall("miniprintf", "null", ["string", "[s,f,s,i]"], ["%s %f %s %d\n", "foo", 1.234, "goo", 2])

which looks promising.

Eric Mandel

Sep 8, 2017, 4:06:33 PM
to emscripten-discuss
I attach a working replacement of ccall (renamed to ccall_js, to avoid Google attachment issues) that uses a varargs specification string of the form "[d,i,s, ...]" to process varargs. So you can do this:

Module.ccall("miniprintf", "null", ["string", "[d,i,s,i,s,d]"], ["%f %d %s %d %s %f\n", 1.234, 2, "foo", 3, "goo", -100.100])
1.23399999999999 2 foo 3 goo -100.09999999999999

(miniprintf also attached in calljs.c)

The varargs spec repeats if there are extra arguments, which would be our typical astrophysics case with hundreds of vararg doubles making annuli:

Module.ccall("miniprintf", "null", ["string", "[d]"], ["%f %f %f %f\n", 1.234, 2, 3.14, -100])
1.23399999999999 2 3.14 -100

but it repeats as a whole, so you can do this:

Module.ccall("miniprintf", "null", ["string", "[d i s]"], ["%f %d %s %f %d %s\n", 1.234, 2, "foo", 3.14, -100, "goo"])
1.23399999999999 2 foo 3.14 -100 goo
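The "repeats as a whole" rule amounts to a modulo over the spec: the k-th variadic argument takes the type spec[k % len]. A minimal sketch, assuming the spec has already been stripped of brackets and separators (so "[d,i,s]" becomes "dis"); vararg_type is a hypothetical helper, not code from the attached ccall_js.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the type character ('d', 'i', 's', ...)
   for the k-th variadic argument when the whole spec repeats
   cyclically, or '\0' if the spec is empty. */
char vararg_type(const char *spec, size_t k) {
    size_t len = strlen(spec);
    if (len == 0)
        return '\0';
    return spec[k % len];
}
```

With "dis" and the six values in the last example, arguments 0 to 5 get the types d, i, s, d, i, s.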

You will see that I speak JavaScript with a heavy C accent, so I am not suggesting it as a PR. Just let me know whether you want to pursue this angle ...

Thanks,

Eric



Attachments: calljs.c, ccall_js

Jukka Jylänki

Sep 11, 2017, 9:54:31 AM
to emscripte...@googlegroups.com
That looks very promising. I think we'd probably want to flag the
presence of varargs with a separate function signature altogether,
such as Module.ccall_vararg(), to avoid adding any overhead to
non-vararg function calls (Module.cwrap() and Module.ccall() can be on
a very performance-sensitive path).

In several different places in the Emscripten toolchain, there already
exist these kinds of "signature strings", where the first character
denotes the return type and each following character denotes the type
of one parameter. See e.g.
https://github.com/kripken/emscripten/blob/master/src/library_gl.js#L1024.
For example, a signature string "vii" would be a function taking two
32-bit integers (or pointers) and returning void. I think the same
signature string style could be used here, except that the first
character would not denote a return value, and the signature string
would only specify the varargs portion of the parameters; the
non-varargs portion is not needed.
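A sketch of a packer driven by such a varargs-only signature string. Two things here are assumptions rather than anything settled in the thread: only 'i' (32-bit int) and 'd' (double) are handled, and each value is aligned to its own size within the buffer, which is what the wasm32 C ABI appears to do for varargs. VarargValue and pack_varargs are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: one JS-side argument; which member is valid is given by
   the matching character of the signature string. */
typedef union {
    int32_t i;
    double d;
} VarargValue;

/* Hypothetical: packs values[0 .. strlen(sig)-1] into buf according to
   sig and returns the number of bytes used, or 0 on an unsupported type
   or overflow. Offsets are relative to buf, which is assumed to be
   8-byte aligned; each value is aligned to its own size (assumed). */
size_t pack_varargs(const char *sig, const VarargValue *values,
                    unsigned char *buf, size_t cap) {
    size_t off = 0;
    for (size_t k = 0; sig[k] != '\0'; k++) {
        size_t size;
        const void *src;
        if (sig[k] == 'i') {
            size = sizeof(int32_t);
            src = &values[k].i;
        } else if (sig[k] == 'd') {
            size = sizeof(double);
            src = &values[k].d;
        } else {
            return 0; /* 'j', 'f', etc. are not handled in this sketch */
        }
        off = (off + size - 1) / size * size; /* round up to alignment */
        if (off + size > cap)
            return 0;
        memcpy(buf + off, src, size);
        off += size;
    }
    return off;
}
```

For "idi", the first int lands at offset 0, the double is padded up to offset 8, and the last int goes at offset 16, for 20 bytes in total.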

Perhaps regex/globbing style * and + characters could be used to
denote a variable-length trail. E.g. a string "iiif*" would denote
that the varargs part would have three 32-bit integers first, and
after that, 0-N single-precision floats. A string "ffd+" would say two
single-precision floats, followed by 1-N double-precision floats. This
would avoid requiring one to craft strings that have as many
f's or d's as there are parameters in the input array. This way one
could be explicit about whether "iii" means exactly three, or 2-N, or
3-N, or 4-N.
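A sketch of how the proposed trailing * and + could be checked, assuming the quantifier may only appear as the last character and applies to the type just before it. sig_type_at is a hypothetical helper: it returns the type of variadic argument k for a call with nargs variadic arguments, or '\0' if that count does not fit the signature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. A trailing '*' repeats the preceding type 0-N
   times, a trailing '+' repeats it 1-N times; without a quantifier the
   argument count must match the signature length exactly. */
char sig_type_at(const char *sig, size_t nargs, size_t k) {
    size_t len = strlen(sig);
    char last = len > 0 ? sig[len - 1] : '\0';

    if (k >= nargs)
        return '\0';
    if (last != '*' && last != '+')
        return nargs == len ? sig[k] : '\0';
    if (len < 2)
        return '\0'; /* a bare "*" or "+" has no type to repeat */

    size_t fixed = len - 2;           /* types before the repeated one */
    char repeated = sig[len - 2];
    size_t min = (last == '+') ? fixed + 1 : fixed;
    if (nargs < min)
        return '\0';
    return k < fixed ? sig[k] : repeated;
}
```

Under these rules "iii" accepts exactly three ints, "iii*" accepts 2-N and "iii+" accepts 3-N, which covers the distinctions listed above without any grouping syntax.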

If you're interested in championing this further, it would be best to
continue in a GitHub PR with work towards tests and patches.

Eric Mandel

Sep 11, 2017, 12:01:22 PM
to emscripten-discuss
Great, I'll try to clear some time in the autumn to think about this seriously -- this first offering was just a throw-away to see if there was any general agreement/interest. A separate function signature is fine. And, in principle, I agree that regex/globbing would be a preferred method of defining a number trail. But it could get complicated when trying to support the important use case of a trailing array of structs, where something like "iii(didis)+" would be needed. That looks like a (potentially slow) mess to parse and process, but we'll see.

BTW, I was not aware that floats (or chars or shorts) were allowed in the va_arg() macro. Under the default argument promotions, floats are promoted to double (and chars and shorts to int), so gcc and clang issue a compiler warning if you try to do something like: va_arg(argp, float). There may have to be an emscripten-specific judgement call on how to deal with that.
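The behavior described here is C's default argument promotions, which apply to the variadic arguments of any function, not only standard library ones: a float is passed as double, and a char or short is passed as int. A small sketch with hypothetical function names:

```c
#include <stdarg.h>

/* Hypothetical: sums count variadic arguments. Callers may pass float
   values, but after the default argument promotions they arrive as
   double, so they must be read with va_arg(args, double). */
double sum_promoted(int count, ...) {
    va_list args;
    double total = 0.0;
    va_start(args, count);
    for (int k = 0; k < count; k++)
        total += va_arg(args, double);
    va_end(args);
    return total;
}

/* Hypothetical: likewise, a char or short argument arrives as int. */
int first_promoted_char(int count, ...) {
    va_list args;
    int c;
    va_start(args, count);
    c = count > 0 ? va_arg(args, int) : 0;
    va_end(args);
    return c;
}
```

For a varargs ccall, this suggests a C callee never reads 4-byte float or 1-byte char varargs, so the packer would only need to write these C types as 8-byte doubles and 4-byte ints.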


Jukka Jylänki

Sep 13, 2017, 12:09:57 PM
to emscripte...@googlegroups.com
2017-09-11 19:01 GMT+03:00 Eric Mandel <ema...@cfa.harvard.edu>:
> I agree that regex/globbing would be a preferred method of
> defining a number trail. But it could get complicated when trying to support
> the important use case of a trailing array of structs, where something like
> "iii(didis)+" would be needed. That looks like a (potentially slow) mess to
> parse and process, but we'll see.

By regex, I only meant to support the characters + and * with the same
meaning as regex syntax has, I don't mean to support arbitrary regex
style string expansion. So we would not need parentheses or anything
like that, just a simple "if last char is a + or *, the preceding type
is multiplied 1-N or 0-N times". If someone has interest in supporting
full regex expansion like that, feel free, though I'd argue that
should only be available for cwrap() and not at all for ccall().

> BTW, I was not aware that floats (or chars or shorts) were allowed in the
> va_arg() macro. They are promoted to double, leading gcc and clang to
> issue a compiler warning if you try to do something like: va_arg(argp,
> float). There may have to be an emscripten-specific judgement call on how to
> deal with that.

Hmm, that might be the case. I was under the impression that the
"standard promotion" only applied to C standard library functions, and
that arbitrary custom functions could do anything they wanted, but
perhaps the standard promotion applies to all types. In any case, good
to reuse the same signature string style so that it'll be ready for
extending to future uses, if needed/possible.

Eric Mandel

Sep 13, 2017, 12:48:05 PM
to emscripten-discuss


> By regex, I only meant to support the characters + and * with the same
> meaning as regex syntax has, I don't mean to support arbitrary regex
> style string expansion. So we would not need parentheses or anything
> like that, just a simple "if last char is a + or *, the preceding type
> is multiplied 1-N or 0-N times". If someone has interest in supporting
> full regex expansion like that, feel free, though I'd argue that
> should only be available for cwrap() and not at all for ccall().

Right, I was just trying to point out that restricting a repeating pattern to the last variable does not satisfy an important use case, in which a group of variables repeats:

weighted_centroid(x1, y1, n1, x2, y2, n2, x3, y3, n3 ..., xn, yn, nn, -1, -1, -1);

where x,y positions are double and counts are int, and all three repeat as a group until the end marker is found. My throw-away dealt with that using a quick mod, so that I could prove to myself that our particular needs could, in principle, be met without much processing overhead (we can call our varargs routine thousands of times over a 2D image).
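For reference, the callee side of such a call can be written with ordinary va_arg reads in groups of three. This is a minimal sketch, assuming the argument order shown (double x, double y, int n per group) and the (-1, -1, -1) end marker; the real weighted_centroid is not shown in the thread, so weighted_centroid_sketch and the count-weighted mean it computes are illustrative only.

```c
#include <stdarg.h>

/* Hypothetical: reads (double x, double y, int n) groups until the
   (-1, -1, -1) end marker, returns the total count, and writes the
   count-weighted mean position to *cx, *cy when the total is positive. */
int weighted_centroid_sketch(double *cx, double *cy, ...) {
    va_list args;
    double sx = 0.0, sy = 0.0;
    int total = 0;
    va_start(args, cy);
    for (;;) {
        double x = va_arg(args, double);
        double y = va_arg(args, double);
        int n = va_arg(args, int);
        if (x == -1 && y == -1 && n == -1) /* end marker */
            break;
        sx += x * n;
        sy += y * n;
        total += n;
    }
    va_end(args);
    if (total > 0) {
        *cx = sx / total;
        *cy = sy / total;
    }
    return total;
}
```

A C caller has to pass the marker as -1.0, -1.0, -1: an int literal where the callee reads a double is undefined behavior, which is the same kind of type mismatch a varargs signature for ccall would have to prevent.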

If I interpret our combined comments correctly, we need a familiar way to specify repeat groups, e.g., pseudo-regexp syntax, even if full regexp will not be implemented.




Eric Mandel

Nov 6, 2017, 10:45:35 AM
to emscripten-discuss
As promised back in September, I made a PR a few weeks ago that implements a varargs version of ccall. Are there any other actions I need to take in order to help move this along?
