[racket] FFI question again - how to get a string back from C


key...@gmx.de

Jun 11, 2011, 2:54:32 AM
to users
Hi again,

sorry for asking such a basic FFI question again, but I have a problem getting an output string back from the C side...

For example, my first attempt was:

(def-ocilib datetotext OCI_DateToText
  : (date_ptr : _pointer) (fmt : _string) (size : _int)
    (strval : (_ptr o _string))
  -> (result : _bool)
  -> (values strval result))

I simply tried using (strval : (_ptr o _string)) for the return string (the argument size indicates the desired size for the output string).


After my experience with null-terminated strings in my recent post, I also tried an input-output argument, passing a null-terminated string into C:

(def-ocilib datetotext OCI_DateToText
  : (date_ptr : _pointer) (fmt : _string) (size : _int)
    (strval : (_ptr io _string) = (gen-output-string 100))
  -> (result : _bool)
  -> (values strval result))

where gen-output-string pads an empty string to a specified length.

But in both cases, I get the same error:

ptr-ref: expects type <cpointer> as 1st argument, given: #<bad-value>; other arguments were: #<ctype>

Honestly, I have no idea what the problem might be (or how to debug/investigate it), and would very much appreciate any hints... :-)

Many thanks!

Sigrid
_________________________________________________
For list-related administrative tasks:
http://lists.racket-lang.org/listinfo/users

Ryan Culpepper

Jun 11, 2011, 6:16:49 AM
to key...@gmx.de, users
On 06/11/2011 12:54 AM, key...@gmx.de wrote:
> [...]

I'm not sure if it's possible to solve this problem with (_ptr o ???) or
(_ptr io ???); I'd be very interested to hear if it's possible.

It looks like the function wants you to pass in a buffer for it to write
into. The tricky part seems to be figuring out where the data ends. I
assume the end is indicated by a null terminator. (The APIs I've worked
with have had the courtesy to put the length of the string in an output
parameter, which makes it easier.) Here's how I would do it:

Use _bytes to represent the buffer. A type like _string won't work,
because the C function would be writing to a converted copy of the
string; it wouldn't modify the string you gave it.
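A minimal C sketch of both situations, assuming the callee follows the snprintf convention (it writes at most `size` bytes, terminator included). `fill_date_text` is a hypothetical stand-in for OCI_DateToText that ignores any date and writes a fixed string. `call_with_converted_copy` only models the idea of a converting argument type; it is not how Racket's FFI is implemented internally.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a function like OCI_DateToText: writes a
   formatted result plus a terminating '\0' into the caller's buffer and
   returns nonzero only if the whole result fit into `size` bytes. */
static int fill_date_text(int size, char *str) {
    assert(size > 0 && str != NULL);
    int n = snprintf(str, (size_t)size, "%s", "2011-06-11");
    return n >= 0 && n < size;
}

/* Models a converting argument type: the C function receives a fresh
   copy of the caller's data, so its writes land in the copy, which is
   freed afterwards. The caller's buffer never changes. */
static void call_with_converted_copy(char *original, int size) {
    char *copy = malloc((size_t)size);
    if (copy == NULL)
        return;
    memcpy(copy, original, (size_t)size);
    fill_date_text(size, copy);  /* result is written into the copy */
    free(copy);                  /* ...and discarded here */
}
```

The direct call roughly corresponds to passing a mutable byte string as `_bytes`: the writes end up in memory the Racket side can still read after the call.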

The C function writes the formatted date, followed by a null terminator,
to the buffer. Now you need to extract only the formatted date, the part
before the null terminator. You could search for the position of the
null terminator yourself, or you could use an internal Racket function
that makes a byte string ("bytes") from a null-terminated byte array.
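If you search for the terminator yourself, it is worth bounding the search by the buffer size, so a buffer the C side failed to terminate is not scanned past its end. A minimal C sketch using the standard `memchr`; the helper name `terminated_length` is illustrative, not from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the number of bytes before the first '\0' in buf, looking at
   no more than cap bytes. If no terminator is found, returns cap, so
   the caller never reads beyond the buffer it allocated. */
static size_t terminated_length(const char *buf, size_t cap) {
    assert(buf != NULL);
    const char *nul = memchr(buf, '\0', cap);
    return nul != NULL ? (size_t)(nul - buf) : cap;
}
```

On the Racket side, the resulting length is the end index you would pass to `subbytes` to keep only the formatted date.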

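(require ffi/unsafe ffi/unsafe/define)

(define-ffi-definer define-racket (ffi-lib #f))

;; Racket's own C-level function: builds a byte string from a
;; NUL-terminated char*
(define-racket scheme_make_byte_string
  (_fun _bytes -> _racket))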

For the main function, if you want to take the buffer as a parameter
(because you calculated the size elsewhere or because you want to reuse
the buffer), define the function this way:

;; OCI_DateToText : date-pointer string bytes -> (values boolean bytes)
(define-ocilib OCI_DateToText
  (_fun (date fmt buffer) ::
        (date : _pointer)
        (fmt : _string)
        (size : _int = (bytes-length buffer))
        (buffer : _bytes)   ; should be mutable, e.g. from make-bytes
        -> (result : _boolean)
        -> (values result
                   (scheme_make_byte_string buffer))))

I find it useful to specify the parameters explicitly; that's what the
part before the '::' is. It's sometimes mandatory, especially when
you're calculating some parameter values based on others; I just always
write them out unless the signature is trivial. The part after the first
'->' is a type-spec that specifies what the foreign function returns.
The part after the second '->' is a normal Racket expression that
determines what the Racket wrapper returns.

Note that the definition above gives you the result as bytes; you can
convert it to a string with bytes->string/utf-8, bytes->string/latin-1,
or bytes->string/locale, depending on the encoding. Or you could use
Racket's scheme_make_utf8_string instead of scheme_make_byte_string if
UTF-8 is the right encoding.

If you want to create the buffer locally:

;; OCI_DateToText : date-pointer string -> (values boolean bytes)
(define-ocilib OCI_DateToText
  (_fun (date fmt) ::
        (date : _pointer)
        (fmt : _string)
        (size : _int = 100) ;; FIXME: big enough?
        (buffer : _bytes = (make-bytes size))
        -> (result : _boolean)
        -> (values result
                   (scheme_make_byte_string buffer))))

HTH,
Ryan

key...@gmx.de

Jun 11, 2011, 8:02:02 AM
to Ryan Culpepper, users
Hi Ryan,

thank you very much, this works very well!
I am not sure yet whether it'd be desirable to pass the buffer in to the function: it's more work for the client, but in any case the client will have to specify the buffer length.
Regarding garbage collection, it should not really make a difference, should it? The buffer will be garbage-collected automatically by Racket: in one case at the earliest directly after the function call, in the other once the client no longer uses it. Is that correct?

As for converting the byte string to a string, this is (besides garbage collection) yet another topic I still have to sort out and understand better. At the moment I'm not sure "who decides" the character set to be used: the OCILIB library, Racket, or even the OS (the only thing I'm fairly sure of is that it will NOT be the database the C library talks to... :-) ). But this is something I have to investigate.

Many thanks again for your help!
Sigrid

Thomas Chust

Jun 11, 2011, 9:48:05 AM
to key...@gmx.de, users
2011/6/11 key...@gmx.de <key...@gmx.de>:
> [...]

> I have a problem getting an output string from the C side...

Hello Sigrid,

it would be helpful if you mentioned the signature of the original C
function, otherwise it is hard to decide whether your binding's
signature is correct.

I will assume that the function in question is declared as follows in
C (a declaration I copied from the online reference documentation of
a C library for Oracle database access):

OCI_EXPORT boolean OCI_API OCI_DateToText(
OCI_Date * date,
const mtext * fmt,
int size,
mtext * str
);

> [...]


> E.g. in one case, in my first attempt
>
> (def-ocilib datetotext OCI_DateToText : (date_ptr : _pointer) (fmt : _string) (size : _int) (strval : (_ptr o _string)) -> (result : _bool) -> (values strval result))
>
> I simply tried using (strval : (_ptr o _string)) for the return
> string (the argument size indicates the desired size for the output
> string).

> [...]

When the C function expects a char *, a (_ptr o _string) is wrong
because it will map to a char **. What you need is a buffer that can
be filled with the resulting string.

Something like this may do the job:

(def-ocilib date->text OCI_DateToText
  (date fmt) ::
  (date : _pointer) (fmt : _string)
  (size : _int = (+ (string-length fmt) 127))
  (str : _pointer = (malloc 'atomic (add1 size))) ->
  (ok? : _bool) ->
  (and ok? (cast str _pointer _string)))

This wrapper function takes the date object and a format string as
arguments, allocates a buffer for the formatted result that is 128
bytes larger than the format string and returns the formatted result
in case of success or #f otherwise.

The buffer is allocated in Racket garbage collected memory, so it is
automatically reclaimed at some time after the function call has
completed. The 'atomic flag to the allocation call tells the Racket
garbage collector that it doesn't have to scan the block of memory for
pointers to other live objects.

Also note that, for safety reasons, I allocate a buffer that is one
byte larger than the size information I pass to the C function, since
I couldn't find any documentation on whether the function will assume
the size information to include space allocated for the terminating
zero byte of the returned C string or not — both assumptions are
equally common in C libraries.
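The two conventions can be made concrete with a short C sketch. Both writer functions are hypothetical; the point is that a buffer of size + 1 bytes is safe under either reading of `size`.

```c
#include <assert.h>
#include <string.h>

/* Convention 1: `size` counts the terminator (like snprintf).
   At most size - 1 characters are copied, then '\0'; needs `size` bytes. */
static void write_size_includes_nul(const char *src, int size, char *dst) {
    assert(size > 0);
    size_t n = strlen(src);
    if (n > (size_t)size - 1)
        n = (size_t)size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Convention 2: `size` counts only the characters.
   Up to `size` characters are copied, then '\0'; needs size + 1 bytes. */
static void write_size_excludes_nul(const char *src, int size, char *dst) {
    assert(size >= 0);
    size_t n = strlen(src);
    if (n > (size_t)size)
        n = (size_t)size;
    memcpy(dst, src, n);
    dst[n] = '\0';
}
```

With `size` = 4, the first writer touches at most 4 bytes and the second at most 5, so allocating `(add1 size)` bytes covers both.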

I hope this helps :-)

Ciao,
Thomas


--
When C++ is your hammer, every problem looks like your thumb.

key...@gmx.de

Jun 12, 2011, 2:41:14 AM
to Thomas Chust, users
Hi Thomas,

thank you, too, very much for your solution! I had already successfully tried Ryan's way, but it's always good and instructive to learn several ways - now I know 2 ways to allocate a byte buffer and to cast it to a string afterwards :-)

I've tried your solution, and it worked perfectly, too (after exchanging 'atomic and the length argument to malloc, which seem to have to appear in opposite order).
In fact, to be honest, I don't really understand why I have to allocate a byte buffer here. What difference does it make compared to a char **? Shouldn't both just be consecutive places in memory filled with ASCII characters? (Sorry for asking the C-agnostic stuff again, but this project might be a chance to finally learn about these things, too :-;)

BTW thanks for pointing out the (and result output) wrapper return type - I like this, it's more concise than the (values result output) return I was using before, I don't really do anything with the "result" boolean anyway...

One last BTW,

>
> This wrapper function takes the date object and a format string as
> arguments, allocates a buffer for the formatted result that is 128
> bytes larger than the format string

I had also hoped to do something like this, but then I realized that you can't infer the length of the output from the length of the format string, since the format strings might not be as straightforward as the "yyyy-mm-dd"-like ones I normally tend to use :-;

Ciao,
Sigrid

Carolyn Oates

Jun 12, 2011, 3:15:35 AM
to us...@racket-lang.org, Wilson Francis
I am mapping the functions of the BotBall robot to Racket.
Something similar would be logical for Lego Mindstorms, but getting it into code that runs on the Lego NXT would be different, perhaps easier, via cross-compiling.
Currently I have related defines, like motor and sensor, in their own Racket source file.
So would the user have to "include" the ones they want when it is made into a teachpack or module?
(This is the first time I've created my own, so please excuse me if I don't use the right Racket terminology yet.)

The end goal would be to have some defines available in different language levels.
Can you set that up in one teachpack, or do you need several?

Thanks, Carolyn

Thomas Chust

Jun 12, 2011, 6:02:40 AM
to key...@gmx.de, users
2011/6/12 key...@gmx.de <key...@gmx.de>:
> [...]

> In fact to be honest, I do not really understand the reason why I
> have to allocate a byte buffer here - what difference does it make
> to a char **, shouldn't both just be consecutive places in memory
> filled with ascii characters
> [...]

Hello Sigrid,

A byte buffer in Racket is something that C could see as a char *, while
a char ** in C is something that Racket could see as a vector of byte
buffers: the second star adds a second level of indirection.

In terms of memory layout, a char * will be a pointer to a sequence of
byte sized objects while a char ** will be a pointer to a sequence of
word sized pointers, each pointing to a sequence of byte sized
objects.
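The difference in indirection shows up directly in how a C function hands a string back. In this sketch both functions are hypothetical: the first takes a `char *` and fills bytes the caller owns, which is what the OCI_DateToText declaration above expects; the second takes a pointer-to-pointer and stores the address of memory the callee chose, which is the shape a `(_ptr o _string)` argument describes.

```c
#include <assert.h>
#include <string.h>

/* char *: the caller owns the bytes; the callee copies characters,
   including the terminating '\0', into them. */
static void fill_caller_buffer(char *out, size_t size) {
    static const char text[] = "2011-06-11";
    assert(size >= sizeof text);  /* sizeof text counts the '\0' */
    memcpy(out, text, sizeof text);
}

/* Pointer to pointer: the caller owns only one pointer-sized slot; the
   callee overwrites that slot with the address of memory it manages
   itself (here a string literal with static lifetime). */
static void return_callee_pointer(const char **out) {
    *out = "2011-06-11";
}
```

Mixing the two up makes the callee treat a pointer-sized slot as a character buffer (or bytes as an address). Such a mismatch can corrupt memory, which is one plausible source of odd errors like the `#<bad-value>` in the original question.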

As a side note, another level of indirection also tends to worsen
memory management headaches, because nothing guarantees that the
memory where the sequence of pointers is allocated and the memory
where those pointers' targets are living are related in any way.

> [...]


>> This wrapper function takes the date object and a format string as
>> arguments, allocates a buffer for the formatted result that is 128
>> bytes larger than the format string
>
> I had also hoped to do something like this, but then I realized that
> you can't conclude from the length of the format string on the
> length of the output, as the format strings might not be of the
> straightforward "yyyy-mm-dd" - like types, which I normally only
> tend to use :-;

> [...]

You could also add some flexibility by defaulting the buffer length to
some multiple or fixed offset of the format string but allowing the
user to optionally specify a different size — something like this may
work:

(def-ocilib date->text OCI_DateToText
  (date fmt [size (+ (string-length fmt) 127)]) ::
  (date : _pointer) (fmt : _string)
  (size : _int)
  (str : _pointer = (malloc (add1 size) 'atomic)) ->
  (ok? : _bool) ->
  (and ok? (cast str _pointer _string)))

But if there is no sure way to guess the correct buffer size before
the function call, your only failsafe option is to invoke the function
repeatedly with increasing buffer sizes as long as it reports
failure. Of course, in this case you should somehow check whether the
function really failed due to space constraints or for some other
reason.
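A minimal C sketch of that retry strategy. `try_format` is a hypothetical stand-in that, like OCI_DateToText, only reports success or failure. Because a plain boolean cannot say *why* a call failed, the loop stops at an upper bound instead of growing the buffer forever; that cap is one way to handle the caveat above, not the only one.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: writes the result and returns 1 if it fits
   into `size` bytes (terminator included), 0 otherwise. */
static int try_format(int size, char *str) {
    int n = snprintf(str, (size_t)size, "%s", "Saturday, 11 June 2011");
    return n >= 0 && n < size;
}

/* Calls try_format with a doubling buffer size until it succeeds or the
   next size would exceed max_size. Returns a malloc'd string the caller
   must free, or NULL if every attempt failed. */
static char *format_with_retry(int initial_size, int max_size) {
    assert(initial_size > 0);
    int size = initial_size;
    while (size <= max_size) {
        char *buf = malloc((size_t)size);
        if (buf == NULL)
            return NULL;
        if (try_format(size, buf))
            return buf;
        free(buf);
        if (size > max_size / 2)
            break;  /* doubling would pass the limit */
        size *= 2;
    }
    return NULL;
}
```

Starting at 8 bytes, the calls with 8 and 16 bytes fail and the one with 32 bytes succeeds; with a cap of 16, the function gives up and returns NULL.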

Ciao,
Thomas



key...@gmx.de

Jun 13, 2011, 1:35:40 AM
to Thomas Chust, users
Hi Thomas,

thanks a lot for your char * vs char** explanation!

>
> But if there is no sure way to guess the correct buffer size before
> the function call, your only failsafe option is to invoke the function
> repeatedly with increasing buffer sizes as long as it reports
> failure.

I see... but I think it will be okay, in this case, to place on the user the burden to specify the size (as a required argument) :-)

Ciao, and thanks again,
Sigrid
