[racket] FFI: problems using (_list i _string)


key...@gmx.de

Jun 5, 2011, 2:26:08 AM
to users
Hi all,

I have a problem with the FFI which is certainly due to my lack of experience with C, and I would very much appreciate any help.
I am using (_list i _string) to pass an array of strings to a C function, but all that ends up on the other side seems to be "nonsense" (random characters).

The C function is described as

boolean OCI_BindArrayOfStrings(OCI_Statement * stmt, const mtext * name, dtext * data, unsigned int len, unsigned int nbelem)

where mtext and dtext are aliases for char (given the way the library was compiled). The second-to-last argument has to indicate the length of the longest string in the array, and the last argument has to be 0 except in a special case that doesn't apply here.
I've defined it as

(def-ocilib bindstringarray OCI_BindArrayOfStrings : (stmt_ptr : _pointer) (name : _string) (data : (_list i _string)) (maxstrlen : _uint = (getmaxlength data)) (no : _uint) -> (result : _bool))

where maxstrlen gives the length of the longest list member.
The function does not report any error; only the transmitted strings arrive as random garbage.
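The helper getmaxlength is not shown in the thread. A minimal sketch of what it might look like (hypothetical, and assuming the strings reach C as UTF-8) counts bytes rather than characters, since the C side sizes char buffers in bytes:

```racket
#lang racket/base
;; Hypothetical stand-in for the getmaxlength helper used in the binding.
;; Returns the length of the longest string in STRS, counted in UTF-8
;; bytes (what a C char buffer has to hold), or 0 for an empty list.
(define (getmaxlength strs)
  (for/fold ([longest 0]) ([s (in-list strs)])
    (max longest (bytes-length (string->bytes/utf-8 s)))))

(provide getmaxlength)
```

For ASCII-only data this equals the longest string-length; for a string such as "\u00FC" (one character, two UTF-8 bytes) the two measures differ.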


I have a "parallel" function using (_list i _int) to pass an array of ints, and that one works fine. I also have no problem passing a single input string, as in

(def-ocilib bindstring OCI_BindString : (stmt_ptr : _pointer) (name : _string) (data : _string) (len : _int = (string-length data)) -> (result : _bool))

so I think I am using the _list construct itself correctly, and the _string datatype works fine, too. But there might be special things to pay attention to when passing arrays of strings, perhaps related to how the individual strings are delimited...?


By the way, I also have trouble understanding the (varname : (_list mode type)) form. If varname still refers to the Racket value, I should be able to pass it to a Racket function, as I do with the
(data : (_list i _string))
in the "misbehaving" function cited above, doing (getmaxlength data), and that works fine. But I also tried using
(data : (_list io _string (length data)))
instead, in order to check the correctness of the input strings by returning them as part of the function's return values, but for (length data) I get the error

length: expects argument of type <proper list>; given #<cpointer>

so here data seems to be a C value, not a Racket list anymore...

Many thanks in advance for any help with the above problem, and with understanding the latter point :-)

Ciao,
Sigrid

Thomas Chust

Jun 5, 2011, 7:04:11 AM
to key...@gmx.de, users
2011/6/5 key...@gmx.de <key...@gmx.de>:
> [...]

> The c function is described as
> boolean OCI_BindArrayOfStrings(OCI_Statement * stmt, const mtext * name,
> dtext * data, unsigned int len, unsigned int nbelem)
> where mtext and dtext are aliases for char, the way the library was
> compiled, the second-but-last argument has to indicate the length of the
> longest string in the array, and the last argument has to be 0 apart from a
> special case not applicable here.
> I've defined it as
> (def-ocilib bindstringarray OCI_BindArrayOfStrings : (stmt_ptr : _pointer)
> (name : _string) (data : (_list i _string)) (maxstrlen : _uint =
> (getmaxlength data)) (no : _uint) -> (result : _bool))
> where maxstrlen gives the length of the longest list member.
> [...]

Hello Sigrid,

if dtext is actually an alias for char, as you write, then the
signature of the C function implies that it expects a string, not a
list of strings, as its third argument.

The second problem with the binding definition is that it doesn't
convey any information about the length of the list to the C function,
which is almost certain to cause trouble. Judging by your description
of the C function's contract, simply adding a #f to the end of the
list before passing it to the marshaller should do the job here: Just
replace (data : ...) by (data : ... = (append data '(#f))).
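For a C function that really does take a NULL-terminated array of string pointers (unlike OCI_BindArrayOfStrings, which, as noted above, takes a plain char*), the trailing-#f idea can be sketched as a function type. The C function name and signature are hypothetical; the wrapper gets explicit formals so that the caller's list has a name of its own that the = expression can refer to:

```racket
#lang racket/base
(require ffi/unsafe)

;; Function type for a hypothetical C function
;;     int count_strings(char **items);
;; that reads entries until it reaches a NULL pointer.
;; The explicit formals "(strs) ::" make the Racket wrapper take one list;
;; the C argument is computed from that list with #f appended, and
;; _string marshals #f as a NULL pointer, which terminates the array.
(define _count-strings-type
  (_fun (strs) ::
        (items : (_list i _string) = (append strs '(#f)))
        -> _int))

;; With a real library this type would be used as, e.g.,
;;   (get-ffi-obj "count_strings" the-lib _count-strings-type)
;; where "count_strings" and the-lib are placeholders.
(provide _count-strings-type)
```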

Finally, when passing ephemeral data like this array and the string
pointers it contains to C, one has to be careful that the Racket
garbage collector doesn't interfere with assumptions made on the C side
about the incoming data: If the function doesn't copy the data but
simply stores the pointers somewhere and returns, the data may be
garbage collected before it is used again in some other C function,
which of course will fail miserably in that case.
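One way to rule this out, sketched here under the assumption that the C side may keep the pointer after the call returns, is to copy the data into 'raw memory, which the Racket garbage collector neither moves nor reclaims, and to release it explicitly once C is done with it. The helper name is hypothetical:

```racket
#lang racket/base
(require ffi/unsafe)

;; Hypothetical helper: copy BS into a 'raw block, which the Racket GC
;; neither moves nor reclaims, so C code may keep the pointer after the
;; call returns.  The caller owns the block and must release it with free.
(define (bytes->raw-block bs)
  (define n (bytes-length bs))
  (define block (malloc (max n 1) 'raw))  ; avoid asking for 0 bytes
  (memcpy block bs n)
  block)

(define blk (bytes->raw-block #"ab\0"))
;; ... pass blk to the C function and keep it until C is done with it ...
(free blk)

(provide bytes->raw-block)
```

The trade-off is the usual one for C memory: nothing reclaims the block automatically, so every bytes->raw-block call needs a matching free.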

> [...]


> I also tried using
> (data : (_list io _string (length data)))
> instead, in order to check the correctness of the input strings, putting
> them out as part of the functions return values, but for doing (length data)
> I get the error
> length: expects argument of type <proper list>; given #<cpointer>
> so here data seems to be a c type, not a racket list anymore...

> [...]

I think that in the lexical scope where the expansion of the _list
syntax inside the _fun syntax places the output argument length
computation, the name of the output argument is bound to the raw
pointer from C. You can circumvent this by using a different name for
the argument of the Racket wrapper procedure and the argument of the C
function — check the description of the _fun syntax for the maybe-args
part.
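A sketch of that renaming for an io list (the C function and its signature are hypothetical): the wrapper's formal is called strs and the C-level argument data, so the length expression can still see the Racket list:

```racket
#lang racket/base
(require ffi/unsafe)

;; Function type for a hypothetical C function
;;     int touch_strings(char **items, int n);
;; The wrapper takes one Racket list, named strs.  The C-level argument
;; gets the different name data, so (length strs) in the _list length
;; expression refers to the wrapper's list, whereas (length data) at that
;; point would see the raw C pointer.
(define _touch-strings-type
  (_fun (strs) ::
        (data : (_list io _string (length strs)) = strs)
        (n : _int = (length strs))
        -> (result : _int)
        ;; data is the list read back from the C array after the call
        -> (values result data)))

(provide _touch-strings-type)
```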

Ciao,
Thomas


--
When C++ is your hammer, every problem looks like your thumb.

_________________________________________________
For list-related administrative tasks:
http://lists.racket-lang.org/listinfo/users

key...@gmx.de

Jun 5, 2011, 1:02:26 PM
to Thomas Chust, users
Hi Thomas,

thanks a lot for your answer!

>
> if dtext is actually an alias for char, as you write, then the
> signature of the C function implies that it expects a string, not a
> list of strings, as its third argument.


Oh, you are right, of course! I was misled by the integer case, where an array was indeed expected...
In some available example C code, the argument actually is a two-dimensional array of chars, which of course is then passed as a pointer to char, and I guess the strings are separated by the null termination of C strings...
I wonder how I am going to do this in Racket: should I append a null character to every string, string-append the strings, and then pass a single string to the function? I suppose there is a better way?


>
> The second problem with the binding definition is that it doesn't
> convey any information about the length of the list to the C function,
> which is almost certain to cause trouble. Judging by your description
> of the C function's contract, simply adding a #f to the end of the
> list before passing it to the marshaller should do the job here: Just
> replace (data : ...) by (data : ... = (append data '(#f))).

I see... now that I'm not going to use _list (in this case) any more, it will not be applicable, but it's good to know!


>
> Finally, when passing ephemeral data like this array and the string
> pointers it contains to C, one has to be careful that the Racket
> garbage collector doesn't interfere with assumptions made on the C side
> about the incoming data: If the function doesn't copy the data but
> simply stores the pointers somewhere and returns, the data may be
> garbage collected before it is used again in some other C function,
> which of course will fail miserably in that case.

Thanks for pointing this out! Racket garbage collection on the one hand and freeing C pointers on the other is generally a topic I'm very unsure about, and I could perhaps use some "basic" advice (more basic than the FFI reference, I mean).
For example, do I have to explicitly "destroy" C pointers somehow, and if so, when? In the case of my C library, I assume I have to bind every '<object>Free'-style function and call it when I'm done, but apart from that?
(Sorry for asking such a plain question.)
In this concrete case, what would one do: use a pointer that is not freed automatically, and free it explicitly afterwards?

>
> I think that in the lexical scope where the expansion of the _list
> syntax inside the _fun syntax places the output argument length
> computation, the name of the output argument is bound to the raw
> pointer from C. You can circumvent this by using a different name for
> the argument of the Racket wrapper procedure and the argument of the C
> function — check the description of the _fun syntax for the maybe-args
> part.


I will have a look, thanks!

Thank you again. I hope I didn't ask too many overly basic questions, but having never done any C programming and knowing the concepts only "theoretically", there are quite a few things about using the FFI that I don't automatically understand from the Reference...

Ciao,
Sigrid

Thomas Chust

Jun 5, 2011, 6:56:27 PM
to key...@gmx.de, users
2011/6/5 key...@gmx.de <key...@gmx.de>:
> [...]

> In some example c code available, the argument actually is a
> two-dimensional array of chars, which then of course is passed as a
> pointer to char, and I guess the separation of strings is achieved
> by the null-termination of c strings then...

Hello Sigrid,

in C, strings are almost always zero terminated, yes. However, a
two-dimensional array that contains the string data itself rather than
string pointers will necessarily hold fixed-length strings in C,
possibly padded with zero bytes where the actual payload of an item
is shorter than the available space.
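To make that layout concrete: in a C array such as char tab[2][4], item i starts at byte offset i*width, and its text ends at the first zero byte or at the end of its slot. A small sketch (hypothetical helper, assuming UTF-8 text) that reads one item back out of such a buffer held in a Racket byte string:

```racket
#lang racket/base
;; Hypothetical helper: read item I back out of a C-style
;; char[count][width] buffer held in a Racket byte string.
;; Item I occupies bytes [I*WIDTH, (I+1)*WIDTH); its text ends at the
;; first zero byte in that slot, or at the end of the slot if there is none.
(define (fixed-width-ref buf width i)
  (define start (* i width))
  (define end (+ start width))
  (define nul-pos
    (for/first ([k (in-range start end)]
                #:when (zero? (bytes-ref buf k)))
      k))
  (bytes->string/utf-8 (subbytes buf start (or nul-pos end))))

;; The memory of  char tab[2][4] = {"ab", "xyz"};  as a byte string:
(define tab #"ab\0\0xyz\0")
;; (fixed-width-ref tab 4 0) returns "ab"; (fixed-width-ref tab 4 1) returns "xyz"

(provide fixed-width-ref)
```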

> [...]


> I wonder how I am going to do this in racket, should I append null
> to every string, string-append the strings and then pass a single
> string to the function?

> [...]

Yes, I would say this is a perfectly fine solution. To convert a list
of strings into something that can be used as an array of fixed-width
items, try something like this module:

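#lang racket/base
(require
 srfi/13   ; string-pad-right, string-join
 srfi/26)  ; cut

;; Pad (or, for longer items, truncate) each string to WIDTH characters
;; using NUL characters, then join the results into one string.
;; Note: called without a delimiter, SRFI 13's string-join puts a single
;; space between consecutive items; pass "" as its second argument if the
;; items must sit exactly WIDTH characters apart.
(define (string-list->fixed-width-array width items)
  (string-join (map (cut string-pad-right <> width #\null) items)))

(provide
 (all-defined-out))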
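A byte-level variant, sketched here with a hypothetical name, sidesteps two details of the character-based version: SRFI 13 pads by characters rather than the bytes the C side sees, and the joined result has a space between items. Each item is UTF-8 encoded and copied into its own WIDTH-byte slot of a zero-filled byte string, with nothing in between. Whether WIDTH should equal the longest length or be one larger (leaving room for a terminating NUL in every slot) depends on the C function's contract, so it is worth checking the library's documentation:

```racket
#lang racket/base
;; Hypothetical byte-level packer: item i is UTF-8 encoded into bytes
;; [i*WIDTH, (i+1)*WIDTH) of a zero-filled buffer, so the result has the
;; memory layout of a C char[count][WIDTH] array.  Items longer than
;; WIDTH bytes raise an error instead of being cut mid-character.
(define (string-list->fixed-width-bytes width items)
  (define buf (make-bytes (* width (length items)) 0))
  (for ([s (in-list items)] [i (in-naturals)])
    (define bs (string->bytes/utf-8 s))
    (unless (<= (bytes-length bs) width)
      (raise-arguments-error 'string-list->fixed-width-bytes
                             "item too long for width"
                             "item" s "width" width))
    (bytes-copy! buf (* i width) bs))
  buf)

(provide string-list->fixed-width-bytes)
```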

> [...]


> Regarding racket garbage collection, on the one hand, and c pointer
> freeing, that is generally a topic I'm very unsure about and could
> perhaps use some "basic" advice (more basic  than the FFI reference,
> I mean). For example, do I have to explicitly "destroy"  c pointers
> somehow, and if so, when? In the case of my c library given, I
> assume I have to implement every function like '<object>Free' and
> call it when I'm done, but apart from that?

> [...]

Ok, I'll try to sum up a few basics: First it's important to note that
values allocated by Racket and blocks allocated by C code live in
different memory areas and are handled differently. While Racket
values are reclaimed automatically by the garbage collector some time
after they are no longer referenced, blocks allocated from C live
forever unless explicitly destroyed. These two worlds know nothing
about each other by default.

The pointers your code handles never have to be destroyed explicitly,
but the objects they point to may have to be destroyed. There is no
simple general rule when and how this has to happen. It all depends on
the way the C code is designed and what mechanisms it uses for memory
management.

In the simplest situation, you do not use any C functions that
allocate and store or return (pointers to) blocks of memory, so you
can just let all the data live in the memory area managed by Racket
and let the garbage collector do its job without caring about manually
freeing any blocks of memory. C structures can be allocated through
Racket's garbage collector, too, and they will be reclaimed just like
Racket values some time after all pointers referring to them go out of
scope. Look at the documentation of malloc in the ffi/unsafe module
for details about memory allocation from the Racket side.
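As a small illustration of that first case, here is a block for four C ints allocated through Racket's garbage collector with malloc from ffi/unsafe; nothing has to be freed by hand:

```racket
#lang racket/base
(require ffi/unsafe)

;; Room for four C ints, allocated through Racket's garbage collector.
;; 'atomic means the block holds no pointers the GC needs to trace.
;; It is reclaimed automatically once unreachable, so there is no free
;; call (free is only for 'raw blocks).  Since the GC may also move an
;; 'atomic block, it should not be handed to C code that keeps the pointer.
(define ints (malloc 4 _int 'atomic))
(for ([i (in-range 4)])
  (ptr-set! ints _int i (* i i)))             ; store 0, 1, 4, 9
(define last-square (ptr-ref ints _int 3))    ; 9
```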

Another case encountered frequently is that there are C functions
creating, operating on and destroying pointers to some opaque data
structure. These structures do not live in Racket's managed memory, so
calls to the creation functions have to be matched by calls to the
destruction functions or the objects will stay in memory until the
process terminates.

In this situation you have two basic choices: You can either provide
bindings to the functions that create and destroy objects and require
any client code to handle memory management of those objects
explicitly, manually destroying them when they are no longer
needed. Or you can set up an automatic mechanism in Racket that ensures all
those objects created by calls from Racket are freed some time after
Racket code no longer holds any references to them. An easy way to
achieve this is by decorating the bindings for the creation and
destruction procedures using allocator and deallocator from the
ffi/unsafe/alloc module. However, care has to be taken that no code
outside Racket is still using the objects when they are reclaimed by
the Racket garbage collector. Some C libraries use reference counting
to better handle the situation that different pieces of code may hold
references to an object for arbitrary periods of time; the
ffi/unsafe/alloc module also has support for that situation.
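A minimal sketch of the second choice with ffi/unsafe/alloc. Since no real library is involved, 'raw memory from malloc and free stands in for a library's create/destroy pair, and the names create-thing and destroy-thing are hypothetical:

```racket
#lang racket/base
(require ffi/unsafe ffi/unsafe/alloc)

;; Stand-in "destroy" function: releases a 'raw block.  Wrapping it with
;; deallocator means an explicit call also cancels the automatic release.
(define destroy-thing ((deallocator) free))

;; Stand-in "create" function: returns a 16-byte 'raw block, which the GC
;; does not manage.  Wrapping it with allocator registers destroy-thing to
;; run some time after the result becomes unreachable from Racket.
(define create-thing ((allocator destroy-thing) (lambda () (malloc 16 'raw))))

;; Choice 1: release explicitly when done.
(define thing (create-thing))
(destroy-thing thing)

;; Choice 2: drop the reference and let the GC trigger destroy-thing.
(void (create-thing))
```

With a real library the two lambdas would be replaced by the bindings for its own create and destroy functions, keeping the caveat above about C code that still uses an object.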

Things get progressively more complicated if the C structures put
inside Racket's memory areas contain pointers themselves, if
structures outside Racket's memory areas contain pointers to Racket
values, or if callbacks from C into Racket are used.

> [...]


> I hope I didn't ask too many too basic questions now, but never
> having done any c programming and just knowing the concepts
> "theoretically", there are quite a few things about using FFI I don't
> automatically understand from the Reference...

> [...]

Among all the FFIs I have used, Racket's is one of the most
comfortable and full featured ones. However, its features imply
complexity, too, and I can well imagine that without any prior
experience in low level programming and manual memory management, one
can easily get lost. So questions have to be expected :-)

key...@gmx.de

Jun 6, 2011, 4:55:54 PM
to Thomas Chust, users
Hi Thomas,

thanks so much for your introduction to memory management in FFI programming, which was very informative, concise and helpful indeed!
It gives me a perfect starting point for exploring the topic.
For a start, I think I'd be well advised to explicitly call the ...Free() functions provided by the C library; later on, I will experiment with the decorator strategy.
As regards the more complicated cases you mention, I will no doubt stumble over them sooner or later...

For now, thanks a lot again for taking the time, and writing things up so well :-)
Ciao,
Sigrid

key...@gmx.de

Jun 7, 2011, 8:41:44 AM
to Thomas Chust, users
Hi Thomas again,

for time reasons, I'm answering the second (or rather, first) part of your mail separately :-;

Just wanted to say that your code suggestion


 (define (string-list->fixed-width-array width items)
   (string-join (map (cut string-pad-right <> width #\null) items)))


was even more helpful than I realized at first (it's always good to have code examples) :-)
I first tried writing a simple function that null-terminates each string and joins them, but it turned out that padding every string to the same length is necessary; simply null-terminating them is not enough :-;

So thanks again,
Sigrid