[Caml-list] va_arg values

Bob Matcuk

Jan 13, 2007, 6:17:18 PM
to caml...@yquem.inria.fr
I've been wondering what is the best way to write a C function that
takes a variable number of value arguments. Let's say, for example,
that I was writing a function that took an object, a string
(specifying a method on the object), and a variable number of
arguments to pass to the method. The function would then construct an
array (with the object being the first element) from these arguments
and pass it along to caml_callbackN. This function, of course, would
only ever be called by other C functions.

The thing that I guess I'm caught up on is the fact that I cannot
directly apply CAMLparam to these variable arguments. Some calling
conventions place all arguments on the stack, in which case CAMLparamN
could be used (as long as you knew whether the stack was building up
or down). However, some do not do this (the AMD64 calling convention,
for example, puts the first 6 arguments in registers, the rest on the
stack).

I guess the real question is: is it even necessary to worry about this?
The function, as I said, will only ever be called from other C
functions (who have already designated these values as being
params/local to themselves, assuming they are written correctly). I
seem to recall reading somewhere that if you write a function that
will only ever be called from other C functions that have already
registered the values (via CAMLparam/CAMLlocal), then it's
unnecessary to do it again. The function doesn't allocate any new
values either, so it shouldn't trip the GC anyway, right? The function
should, therefore, be something like this:

value func(value obj, char *method, int n, ...)
{
    va_list ap;
    int i;
    value *args = calloc(n + 1, sizeof(value));
    value r;

    args[0] = obj;
    va_start(ap, n);
    for (i = 0; i < n; i++)
        args[i + 1] = va_arg(ap, value);
    va_end(ap);

    r = caml_callbackN(caml_get_public_method(obj, hash_variant(method)),
                       n + 1, args);
    free(args);
    return r;
}

Should probably check calloc for success and maybe throw an exception
if it failed... Which actually brings me to another quick question: if
I throw an exception, say caml_failwith("..."), is it necessary to
still call CAMLreturn after it? Or will the exception cause the
function to exit?

Is it an invalid assumption that it is unnecessary to bother with the
CAMLparam/CAMLlocal stuff since there's nothing to trip the GC? If so,
what is the best way to handle all the CAMLparam/CAMLlocal stuff? For
example, CAMLlocalN(args, n + 1) is invalid because C does not allow
you to declare dynamic arrays. Looping over the arguments with
va_start/for loop/va_arg and calling CAMLparam on them is also invalid
because it would be declaring the caml__root_##x variable within the for
loop. I have typed up some code that should work if it is necessary,
but it's messy and if CAMLparam is ever changed, it's likely I'll need
to change my code too. I want to avoid that.

One last quick question: is the line "args[i + 1] = va_arg(ap, value);"
above legal? args[] is an array of value and va_arg(ap, value) will
return a value. So, essentially, it's the same thing as the assignment
in the following example:

value func(value v1)
{
    value v2 = v1;
    ...
}

I know values are just pointers so it is syntactically correct, but
what I'm asking is: is it safe to do? Should I be using some function
instead to create a copy of the value?

Thanks in advance for any insight!

--
Bob Matcuk
http://www.Squeg.Net/

Explanation of My Return Address, GPG Key:
http://www.Squeg.Net/returnAddr.php

hamartiology - http://www.kokogiak.com/logolepsy/ow_h.html#hamartiology

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Richard Jones

Jan 15, 2007, 5:51:00 AM
to Bob Matcuk
On Sat, Jan 13, 2007 at 07:18:49PM -0500, Bob Matcuk wrote:
> Thanks in advance for any insight!

The CAMLparam/CAMLlocal/CAMLreturn/CAMLxparam macros are pretty simple
to understand. I suggest you take some simple code using these
macros, run it through cpp, and have a look at what these macros
actually generate. You will be able to make your own (possibly
non-portable) variations which update caml_local_roots etc. directly.
If that is necessary.

Rich.

--
Richard Jones
Red Hat UK Limited

Bob Matcuk

Jan 15, 2007, 5:44:36 PM
to caml...@yquem.inria.fr
On Mon, 15 Jan 2007 10:44:07 +0000
Richard Jones <ri...@annexia.org> wrote:

> The CAMLparam/CAMLlocal/CAMLreturn/CAMLxparam macros are pretty simple
> to understand. I suggest you take some simple code using these
> macros, run it through cpp, and have a look at what these macros
> actually generate. You will be able to make your own (possibly
> non-portable) variations which update caml_local_roots etc. directly.
> If that is necessary.

Indeed. As I said in my original e-mail, I have already written my own
code to do what CAMLparam/CAMLlocal/CAMLxparam do for variable
arguments. Basically, well... Let's start with the quickest question to
answer: is it even necessary?

IF my function is only to be called from other C functions AND if those
functions have already properly registered all of the values via
CAMLparam/CAMLlocal/CAMLxparam (a safe assumption assuming competent
programming) THEN: is it necessary for my function to bother
re-registering these values?

My guess would be no. Generally, it would seem to me that a function
should only have to bother with the
CAMLparam/CAMLlocal/CAMLxparam/CAMLreturn stuff if it makes allocations
via "caml_" functions (malloc and friends would be safe as they have no
hooks into the OCaml GC). Is this true?


Chris King

Jan 15, 2007, 11:23:17 PM
to Bob Matcuk
Short answer: your function is okay, but not for the reasons you
state. Long answer:

On 1/13/07, Bob Matcuk <Hamart...@squeg.net> wrote:
> I guess the real question is: is it even necessary to worry about this?
> The function, as I said, will only ever be called from other C
> functions (who have already designated these values as being
> params/local to themselves, assuming they are written correctly).

This doesn't matter. The purpose of CAMLparam/CAMLlocal is twofold:
first, to declare that a particular value shouldn't be garbage
collected (which, as you point out, the other C functions take care
of). Secondly (and more importantly in this case), they inform the
garbage collector of all the roots which it may need to update in the
event that it relocates a block. If you don't use CAMLparam/CAMLlocal
and a GC cycle occurs, there's a chance that a block which one of your
values points to will be moved, and your value will become invalid.
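
To make the relocation problem concrete, here is a minimal sketch in
plain C. It is a toy model under stated assumptions, not the OCaml
runtime: the names toy_alloc, toy_register_root and toy_collect are
hypothetical, a "value" is just a pointer into a small semispace, and
the "collector" copies rooted blocks to the other space and rewrites
only the variables whose addresses were registered.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: NOT the OCaml runtime. A "block" is one int slot in a
   semispace, and a "value" is a pointer to that slot. */
typedef int *value;

#define HEAP_SLOTS 8
#define MAX_ROOTS  8

static int from_space[HEAP_SLOTS];
static int to_space[HEAP_SLOTS];
static int *heap = from_space;      /* space currently in use */
static size_t heap_used = 0;

static value *roots[MAX_ROOTS];     /* addresses of registered variables */
static size_t root_count = 0;

/* Allocate one block holding `payload` (hypothetical helper). */
static value toy_alloc(int payload)
{
    assert(heap_used < HEAP_SLOTS);
    heap[heap_used] = payload;
    return &heap[heap_used++];
}

/* Tell the collector where a live variable is stored, so that the
   variable itself can be rewritten when its block moves. */
static void toy_register_root(value *slot)
{
    assert(root_count < MAX_ROOTS);
    roots[root_count++] = slot;
}

/* Copy every rooted block to the other space and patch the roots.
   The old space is then overwritten to make stale pointers visible. */
static void toy_collect(void)
{
    int *old = heap;
    int *fresh = (heap == from_space) ? to_space : from_space;
    size_t next = 0;

    for (size_t i = 0; i < root_count; i++) {
        fresh[next] = **roots[i];   /* move the block */
        *roots[i] = &fresh[next];   /* update the registered variable */
        next++;
    }
    for (size_t i = 0; i < HEAP_SLOTS; i++)
        old[i] = -1;                /* poison the abandoned space */
    heap = fresh;
    heap_used = next;
}

/* Returns 1 if the registered variable still sees its payload while the
   unregistered copy sees poisoned memory. */
int toy_demo(void)
{
    value registered = toy_alloc(42);
    value unregistered = registered;    /* plain copy, unknown to the GC */
    int ok;

    toy_register_root(&registered);
    toy_collect();      /* stands in for any call that may trigger the GC */

    ok = (*registered == 42) && (*unregistered == -1);
    root_count--;       /* unregister before the variable goes out of scope */
    return ok;
}
```

In this model toy_demo() returns 1: the registered variable follows its
block to the new location, while the unregistered copy is left pointing
into the abandoned space.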

> I
> seem to recall reading somewhere that if you write a function that
> will only ever be called from other C functions that have already
> registered the values (via CAMLparam/CAMLlocal), then it's
> unnecessary to do it again.

AFAIK this is incorrect (because blocks can be relocated). The only
time it is safe not to register values is if your code does not assume
that the contents of a value are valid after a call which could trip
the GC.

> The function doesn't allocate any new
> values either, so it shouldn't trip the GC anyway, right?

True, false. caml_callbackN executes arbitrary code, which may or may
not trip the GC. hash_variant and caml_get_public_method are
questionable also (since they return values), but looking at the Caml
source code, it seems that they are safe (but I don't think the docs
guarantee this).

(BTW you should use caml_hash_variant rather than hash_variant; the
comment for caml_get_public_method in caml/mlvalues.h should probably
be updated to this effect also.)

> Should probably check calloc for success and maybe throw an exception
> if it failed...

You could do this with caml_stat_alloc and caml_stat_free (in
caml/memory.h). These are equivalent to malloc/free but throw Caml's
out-of-memory exception if they fail. However in this case, I would
simply declare args as an array. Otherwise, if the callback throws an
exception, args will not be freed unless you explicitly catch
exceptions via caml_callbackN_exn, free it, and then re-raise the
exception.
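
The catch, free, re-raise pattern can be sketched with setjmp/longjmp in
plain C. This is only a toy model: toy_raise, call_with_buffer,
run_guarded and the single handler slot are hypothetical stand-ins, and
the real runtime's exception mechanism is assumed to behave similarly
only in that a raise never returns to its caller. In real code the
catching role would be played by caml_callbackN_exn.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Toy model only: a single handler slot stands in for the runtime's
   exception handler chain. All names here are hypothetical. */
static jmp_buf *current_handler = NULL;
static int live_buffers = 0;            /* counts unfreed allocations */

/* Like a raising runtime function: never returns to its caller. */
static void toy_raise(void)
{
    assert(current_handler != NULL);
    longjmp(*current_handler, 1);
}

void raising_callback(void) { toy_raise(); }
void quiet_callback(void)   { }

/* Runs `callback` with a heap-allocated scratch buffer and releases the
   buffer even if the callback raises. Must itself be called with a
   handler installed. */
static void call_with_buffer(void (*callback)(void))
{
    jmp_buf local;
    jmp_buf *outer = current_handler;
    int *args = malloc(4 * sizeof *args);

    if (args == NULL)
        toy_raise();
    live_buffers++;

    if (setjmp(local) == 0) {
        current_handler = &local;       /* catch instead of propagating */
        callback();
        current_handler = outer;
        free(args);
        live_buffers--;
    } else {
        current_handler = outer;        /* exception path */
        free(args);                     /* clean up first ... */
        live_buffers--;
        toy_raise();                    /* ... then re-raise to the caller */
    }
}

/* Returns 1 if `callback` raised, 0 if it returned normally. */
int run_guarded(void (*callback)(void))
{
    jmp_buf top;
    int raised = 0;

    if (setjmp(top) == 0) {
        current_handler = &top;
        call_with_buffer(callback);
    } else {
        raised = 1;
    }
    current_handler = NULL;
    return raised;
}

int leaked_buffers(void) { return live_buffers; }
```

Without the else branch in call_with_buffer, the longjmp would skip
straight past free(args) and the buffer would leak. A stack-allocated
array avoids the problem entirely, which is why it is the simpler choice
here.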

Note that if you have no control over the C functions higher up the
call chain (say an external library which calls your function), they
could exhibit similar problems if they are unaware of the possibility
of your function raising an exception. The best thing to do in such a
case would be to return an error condition if possible, or at the very
least, print a warning and return or exit gracefully (the functions in
caml/printexc.h help here).

> Which actually brings me to another quick question: if
> I throw an exception, say caml_failwith("..."), is it necessary to
> still call CAMLreturn after it? Or will the exception cause the
> function to exit?

The exception causes the function to exit. You can see which
functions act like this in the header files by looking for the
"Noreturn" attribute at the end of their declaration.

> Is it an invalid assumption that it is unnecessary to bother with the
> CAMLparam/CAMLlocal stuff since there's nothing to trip the GC? If so,
> what is the best way to handle all the CAMLparam/CAMLlocal stuff?

Yes, it is an invalid assumption, because your code may in fact trip the GC.

BUT

Look over the function you wrote carefully. Notice that values obj
and *args are used only before the call to caml_callbackN, and that
the value r is used only after that call. Your function is indeed
safe, but only because, after the "unsafe" call, you don't use any
value which was initialized before it.

Of course, for the sake of maintainability, I wouldn't in general
endorse such eliding of CAMLparam/CAMLlocal. I'd recommend putting a
big ol' warning in there :)

> For
> example, CAMLlocalN(args, n + 1) is invalid because C does not allow
> you to declare dynamic arrays.

K&R C doesn't, but GCC does. If you're using another compiler or some
compatibility flag, then the alloca function (usually found in
alloca.h) should do the trick. It allocates space on the stack
exactly like an array declaration does, so the guts of CAMLlocalN
should apply to it.

> I know values are just pointers so it is syntactically correct, but
> what I'm asking is: is it safe to do? Should I be using some function
> instead to create a copy of the value?

Copying values with assignment is perfectly legal, provided the
locations to which they are copied are registered with the GC first
(just like any other value).

Hope this was able to clear things up; I've hit many of these bumps
myself while learning to write extensions. The best thing to remember
is that Caml's GC is not a reference counter but a generational
collector and can move blocks from right under your nose. Then the
reasons to use CAMLlocal/CAMLparam become clear.

- Chris

Bob Matcuk

Jan 16, 2007, 1:46:18 AM
to caml...@yquem.inria.fr
Thank you for your long reply! This is exactly the information I was
looking for. Cleared up the role of the GC for me. I'm not used to
working with any kind of automatic GC; I'm somewhat of a hardcore C
fanatic. It didn't occur to me that the GC might move things around,
though I feel I should have known! Doh!

On Mon, 15 Jan 2007 23:18:51 -0500
"Chris King" <colan...@gmail.com> wrote:

> True, false. caml_callbackN executes arbitrary code, which may or may
> not trip the GC. hash_variant and caml_get_public_method are
> questionable also (since they return values), but looking at the Caml
> source code, it seems that they are safe (but I don't think the docs
> guarantee this).

Indeed. I was a bit wary of that myself.

> (BTW you should use caml_hash_variant rather than hash_variant; the
> comment for caml_get_public_method in caml/mlvalues.h should probably
> be updated to this effect also.)

Aye - I grabbed that from the documentation. I've noticed there are a
couple places where the documentation is missing the "caml_" but for
some reason, I didn't even think twice about that one.

> You could do this with caml_stat_alloc and caml_stat_free (in
> caml/memory.h). These are equivalent to malloc/free but throw Caml's
> out-of-memory exception if they fail. However in this case, I would
> simply declare args as an array. Otherwise, if the callback throws an
> exception, args will not be freed unless you explicitly catch
> exceptions via caml_callbackN_exn, free it, and then re-raise the
> exception.

I hadn't even thought of that! Thanks! I'm not used to functions that
don't return, other than the exec family and exit.

> Note that if you have no control over the C functions higher up the
> call chain (say an external library which calls your function), they
> could exhibit similar problems if they are unaware of the possibility
> of your function raising an exception. The best thing to do in such a
> case would be to return an error condition if possible, or at the very
> least, print a warning and return or exit gracefully (the functions in
> caml/printexc.h help here).

Excellent advice; thanks again.

> K&R C doesn't, but GCC does. If you're using another compiler or some
> compatibility flag, then the alloca function (usually found in
> alloca.h) should do the trick. It allocates space on the stack
> exactly like an array declaration does, so the guts of CAMLlocalN
> should apply to it.

The problem with alloca is that it is not as portable (though, I can't
see what the problem is - I believe most, if not all architectures
could implement it as a single instruction). Still, given your previous
comment about the callback throwing an exception, perhaps it is the
best way to go...

Thank you again for your clarifications.


Mattias Engdegård

Jan 16, 2007, 5:52:45 AM
to colan...@gmail.com
>> For
>> example, CAMLlocalN(args, n + 1) is invalid because C does not allow
>> you to declare dynamic arrays.
>
>K&R C doesn't, but GCC does.

Not just GCC - it's a feature of standard C (C99), so any modern compiler
will do.
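
As a rough illustration of the C99 route, the sketch below collects the
variadic arguments into a variable-length array. It is written to
compile without the OCaml headers, so `value` is a stand-in typedef and
toy_callbackN is a hypothetical replacement for caml_callbackN that
merely sums its arguments. Registering the array with the GC is outside
the scope of this sketch.

```c
#include <assert.h>
#include <stdarg.h>

/* Stand-in for OCaml's `value` type so the sketch compiles without the
   runtime headers; the real type comes from caml/mlvalues.h. */
typedef long value;

/* Hypothetical stand-in for caml_callbackN: sums its arguments. */
static value toy_callbackN(int nargs, value args[])
{
    value sum = 0;
    for (int i = 0; i < nargs; i++)
        sum += args[i];
    return sum;
}

/* Collects `obj` plus `n` variadic values into a C99 variable-length
   array, so nothing needs to be freed on any exit path. */
value call_with_varargs(value obj, int n, ...)
{
    assert(n >= 0);
    value args[n + 1];              /* VLA: size known only at run time */
    va_list ap;

    args[0] = obj;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        args[i + 1] = va_arg(ap, value);
    va_end(ap);

    return toy_callbackN(n + 1, args);
}
```

Callers must pass arguments whose promoted type matches the type given
to va_arg, for example call_with_varargs(1L, 3, 10L, 20L, 30L) with the
stand-in typedef above.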

Xavier Leroy

Jan 21, 2007, 12:27:53 PM
to Bob Matcuk
> value func(value obj, char *method, int n, ...)
> {
>     va_list ap;
>     int i;
>     value *args = calloc(n + 1, sizeof(value));
>     value r;
>
>     args[0] = obj;
>     va_start(ap, n);
>     for (i = 0; i < n; i++)
>         args[i + 1] = va_arg(ap, value);
>     va_end(ap);
>
>     r = caml_callbackN(caml_get_public_method(obj, hash_variant(method)),
>                        n + 1, args);
>     free(args);
>     return r;
> }
>
> Is it an invalid assumption that it is unnecessary to bother with the
> CAMLparam/CAMLlocal stuff since there's nothing to trip the GC?

It's a valid assumption. The hard rule is that you must register as
root all local variables that need to survive a call to the GC or to a
function that might trigger the GC. In general it's hard to keep
track of those variables, which is why the OCaml manual advocates
registering everything that has type "value". But in your case you
cannot, and the code fragment above looks safe to me. (Make sure you
document this in a big comment, though...) As others pointed out,
you're also relying on the fact that caml_get_public_method
and hash_variant never trigger a GC, which is true now and in the
foreseeable future, but again a big comment is warranted.

> Should probably check calloc for success and maybe throw an exception
> if it failed... Which actually brings me to another quick question: if
> I throw an exception, say caml_failwith("..."), is it necessary to
> still call CAMLreturn after it? Or will the exception cause the
> function to exit?

The latter: the exception triggers something like a longjmp(), causing
immediate return from the C function (and de-registration of the local
roots, if any).

> One last quick question: is the line "args[i + 1] = va_arg(ap, value);"
> above legal? args[] is an array of value and va_arg(ap, value) will
> return a value. So, essentially, it's the same thing as the assignment
> in the following example:
>
> value func(value v1)
> {
>     value v2 = v1;
>     ...
> }
>
> I know values are just pointers so it is syntactically correct, but
> what I'm asking is: is it safe to do? Should I be using some function
> instead to create a copy of the value?

No copying is necessary.

Hope this helps,

- Xavier Leroy
