cgo CString

Albert Strasheim

Feb 10, 2011, 2:03:36 AM
to golang-nuts
Hello all

Given a C function like this:

void foo(const char *s);

Is there a good reason to prefer:

cstr2 := C.CString(str)
defer C.free(unsafe.Pointer(cstr2))
C.foo(cstr2)

to:

cstr1 := (*C.char)(unsafe.Pointer(&([]byte)(str)[0]))
C.foo(cstr1)

It seems to me that the second approach, although a bit harsher on the
eye, saves a call or two through the whole cgo layer.

However, both approaches are going to copy the string. Maybe that's
unavoidable, but it's not ideal.

Thoughts?

Regards

Albert

bflm

Feb 10, 2011, 3:33:02 AM
to golang-nuts
Do not do it.

a) This way there's no terminating zero byte at the end of the C string.

b) The fabricated *char passed to C has AFAICS no live Go side
reference and can be thus garbage collected at any moment - even while
the C function is executing.

Either flaw can make the app misbehave or crash.
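
Point (a) can be illustrated without cgo. The following is a minimal sketch in plain Go; the helper name withNul is hypothetical and is not part of cgo or the standard library. It only shows that the bytes of a Go string are not followed by a zero byte, and one way to build a copy that is.

```go
package main

import "fmt"

// withNul (hypothetical helper) copies s into a fresh byte slice that is one
// byte longer than s. make zero-fills the slice, so the extra final byte is
// the 0 terminator a C function taking const char* would look for.
func withNul(s string) []byte {
	b := make([]byte, len(s)+1)
	copy(b, s)
	return b
}

func main() {
	s := "abc"

	raw := []byte(s)
	fmt.Println(len(raw)) // 3: just 'a', 'b', 'c', with no terminator after them

	term := withNul(s)
	fmt.Println(len(term), term[len(term)-1]) // 4 0
}
```

C.CString does the equivalent on the C side: it allocates the copy with malloc and terminates it, which is why the result has to be released with C.free.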

Gustavo Niemeyer

Feb 10, 2011, 7:35:28 AM
to golang-nuts
>> cstr1 := (*C.char)(unsafe.Pointer(&([]byte)(str)[0]))

What are you really trying to save by doing this? Is your computation
so inexpensive, or the string so large, that attempting to do this would
compensate? Have you looked at CString to see what's really going
on there?

Note that to avoid the copy you'll actually have to dig further. The
version above will have to copy the data to convert it to []byte.
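
The copy made by the []byte conversion can be observed in plain Go. This is a small sketch, not taken from the thread; the function name convertAndMutate is made up for illustration. Because Go strings are immutable, writing to the converted slice must not affect the original string.

```go
package main

import "fmt"

// convertAndMutate (hypothetical helper) converts s to a byte slice,
// overwrites the first byte of the slice, and returns the mutated bytes as a
// string together with the untouched original.
func convertAndMutate(s string) (mutated, original string) {
	b := []byte(s) // the conversion yields an independent copy of the data
	if len(b) > 0 {
		b[0] = 'X'
	}
	return string(b), s
}

func main() {
	mutated, original := convertAndMutate("hello")
	fmt.Println(mutated)  // Xello
	fmt.Println(original) // hello
}
```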

> b) The fabricated *char passed to C has AFAICS no live Go side
> reference and can be thus garbage collected at any moment - even while
> the C function is executing.

While in general I agree with the "don't do it", this is not strictly true.
cstr1 is a pointer to the underlying data, and will prevent it from getting
garbage collected while the function called from Go is executing. You're
right that it has no guarantees of staying around after that, though.

--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/blog
http://niemeyer.net/twitter

Jan Mercl

Feb 10, 2011, 8:05:42 AM
to golan...@googlegroups.com
On Thursday, February 10, 2011 1:35:28 PM UTC+1, Gustavo Niemeyer wrote:

> While in general I agree with the "don't do it", this is not strictly true.
> cstr1 is a pointer to the underlying data, and will prevent it from getting
> garbage collected while the function called from Go is executing. You're
> right that it has no guarantees of staying around after that, though.

IMO the compiler is free to collect cstr1 as soon as it detects its last possible accessibility. As in:

func f() {
    s := "abc"
    t := s + "d"
    g(t) // s is no longer reachable and may already be collected before g is invoked.
    // t cannot be collected while g() executes, as it is (at least initially)
    // reachable via the invocation record of g(). Are the invocation records
    // of C functions also tracked by the garbage collector? I guess they are not.
}

In the OP case cstr1 might be out of scope after C.foo has been invoked but before it has returned. Regardless of whether current Go compilers implement it this way, it looks like the spec permits an implementation that could then crash the C function.

It would be nice to hear from the Go authors how the spec should be interpreted in this scenario.

Gustavo Niemeyer

Feb 10, 2011, 8:18:56 AM
to golan...@googlegroups.com
> IMO the compiler is free to collect cstr1 as soon as it detects its last
> possible accessibility.

Sure, this is true for any garbage collection. The issue is defining what
last possible accessibility means.

>     s := "abc"
>     t := s + "d"
>     g(t) // s is not reachable, can be already collected before g is invoked.

Think about this case:

s := "abc"
t := s[1:2]
g(t) // Can s be collected here?

> In the OP case cstr1 might be out of scope after C.foo has been invoked but
> before it has returned. (...)

It can't, because there's necessarily a reference in the stack.

Jan Mercl

Feb 10, 2011, 8:39:44 AM
to golan...@googlegroups.com
On Thursday, February 10, 2011 2:18:56 PM UTC+1, Gustavo Niemeyer wrote:
>> IMO the compiler is free to collect cstr1 as soon as it detects its last
>> possible accessibility.
>
> Sure, this is true for any garbage collection. The issue is defining what
> last possible accessibility means.
>
>>     s := "abc"
>>     t := s + "d"
>>     g(t) // s is not reachable, can be already collected before g is invoked.
>
> Think about this case:
>
>     s := "abc"
>     t := s[1:2]
>     g(t) // Can s be collected here?

That's a completely different story. Here s *is* still reachable as it shares the backing array with t (in gc it does in fact, but AFAIK that is *not* required by the spec). That's why I used 't := s + "d"', to make sure t is independent of s.
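
The sharing mentioned here can be checked empirically. The sketch below assumes Go 1.20 or later (for unsafe.StringData) and the gc toolchain; the function name sliceSharesStorage is hypothetical. It reports what one implementation does and says nothing about what the spec requires.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceSharesStorage (hypothetical helper) reports whether the substring
// s[i:j] points into the same bytes as s, i.e. whether its data pointer is
// exactly i bytes past the start of s. It requires 0 <= i < j <= len(s),
// because the data pointer of an empty string is unspecified.
func sliceSharesStorage(s string, i, j int) bool {
	t := s[i:j]
	base := unsafe.Pointer(unsafe.StringData(s))
	return unsafe.Pointer(unsafe.StringData(t)) == unsafe.Add(base, i)
}

func main() {
	// Build the string at run time so it is an ordinary heap or stack value.
	s := string([]byte{'a', 'b', 'c'})
	fmt.Println(sliceSharesStorage(s, 1, 2))
}
```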

>> In the OP case cstr1 might be out of scope after C.foo has been invoked but
>> before it has returned. (...)
>
> It can't, because there's necessarily a reference in the stack.

Here I can't figure out the context or meaning of the above sentence. Which stack? That of the C function or of the Go function? If the Go function were, e.g.:
func f(s string) {
    cstr1 := /* create unsafe.Pointer to s[0] */
    C.foo(cstr1)
    t := s + "d"
    cstr2 := /* create unsafe.Pointer to t[0] */
    C.foo(cstr2)
}

then cstr1 should hopefully be safe, while cstr2 might not be, I guess.

Gustavo Niemeyer

Feb 10, 2011, 9:11:46 AM
to golan...@googlegroups.com
> That's a completely different story. Here s *is* still reachable as it

Yes, it's a different story. The story the OP is asking about. He's got
some data in the heap and has a pointer to it.

> Here I can't figure out the context/meaning of the above sentence. Which
> stack? Of the C function or of the Go function? Would the Go function be

cstr1 is a Go variable within a Go function.

Jan Mercl

Feb 10, 2011, 9:36:42 AM
to golan...@googlegroups.com


On Thursday, February 10, 2011 3:11:46 PM UTC+1, Gustavo Niemeyer wrote:
>> That's a completely different story. Here s *is* still reachable as it
>
> Yes, it's a different story. The story the OP is asking about. He's got
> some data in the heap and has a pointer to it.

From the spec's POV, no heap is ever mentioned. And from the OP below, I can't figure out anything about 'str': where it is (global var? local var? function argument? constant?) or where it comes from. I don't think anything specific can in fact be said about the OP's 'str'. But the critical entity here is 'cstr' (unfortunately also without context in the OP). I tried to show a pattern which might in theory be problematic or dangerous. The actual OP pattern could be completely different and thus safe in this regard.

Still, I don't see a good reason not to use the standard C.CString, and that's probably the only important advice which can be given to the OP. That will also make the code interact with the GC safely, as the pointer one gets from C.CString is obviously neither traced nor traceable by the GC.

On Thursday, February 10, 2011 8:03:36 AM UTC+1, Albert Strasheim wrote:
> Given a C function like this:
>
> void foo(const char *s);
>
> Is there a good reason to prefer:
>
> cstr2 := C.CString(str)
> defer C.free(unsafe.Pointer(cstr2))
> C.foo(cstr2)
>
> to:
>
> cstr1 := (*C.char)(unsafe.Pointer(&([]byte)(str)[0]))

Gustavo Niemeyer

Feb 10, 2011, 11:29:12 AM
to golan...@googlegroups.com
> (...) The actual OP pattern could be completely different and thus safe
> in this regard.

This is the context in the OP message:

cstr1 := (*C.char)(unsafe.Pointer(&([]byte)(str)[0]))
C.foo(cstr1)

This is what I said about it:

"""
cstr1 is a pointer to the underlying data, and will prevent it from getting
garbage collected while the function called from Go is executing. You're
right that it has no guarantees of staying around after that, though.
"""

It's simple. Don't argue about points which haven't been made.

Jan Mercl

Feb 10, 2011, 11:58:46 AM
to golan...@googlegroups.com
On Thursday, February 10, 2011 5:29:12 PM UTC+1, Gustavo Niemeyer wrote:
>> (...) The actual OP pattern could be completely different and thus safe
>> in this regard.
>
> This is the context in the OP message:
>
>     cstr1 := (*C.char)(unsafe.Pointer(&([]byte)(str)[0]))
>     C.foo(cstr1)
>
> This is what I said about it:
>
> """
> cstr1 is a pointer to the underlying data, and will prevent it from getting
> garbage collected while the function called from Go is executing. You're
> right that it has no guarantees of staying around after that, though.
> """
>
> It's simple. Don't argue about points which haven't been made.

I don't get it. First thing is that it's not clear to me if values of type unsafe.Pointer are considered by the GC or not. The second is still the same. If the next line after C.foo(cstr1) is e.g. the closing right brace of a Go function, then it still seems to me that a conforming Go compiler is allowed to collect the pointee of cstr1 while C.foo() is executing.(1)

As an afterthought:

OK, let's say unsafe.Pointers are traced by GC (as you seem to rely on that in your reasoning about cstr1 above). That would imply no C function can ever safely free any memory pointed to by some of its parameters. But that's a contradiction to the C.free(ptr_got_from_C_CString) idiom being useful at all.

So we have to assume unsafe.Pointer typed values are not traced by the GC. Then we are back at square one and the above (1) situation seems in theory possible.

BTW: IMO there's nobody arguing here. It's a pretty normal discussion. Differing opinions don't matter. False facts are disqualifying but mistakes are legal.

Gustavo Niemeyer

Feb 10, 2011, 12:19:29 PM
to golan...@googlegroups.com
> I don't get it. First thing is that it's not clear to me if values of type
> unsafe.Pointer are considered by the GC or not.

They are.

> The second is still the same. If the next line after C.foo(cstr1) is
> e.g. the closing right brace of a Go function, then it still seems
> to me that a conforming Go compiler is allowed to collect the
> pointee of cstr1 while C.foo() is executing.(1)

Lots of things can happen in a conforming Go implementation in
terms of memory management. It can move pointers around, it
can change the representation of native types, and it doesn't even
have to implement cgo, for a start. If you are poking at memory
with unsafe, you should be aware that you're not walking into a
safe land.

> As an afterthought:
> OK, let's say unsafe.Pointers are traced by GC (as you seem to rely on that
> in your reasoning about cstr1 above). That would imply no C function can
> ever safely free any memory pointed to by some of its parameters. But that's

Yes, no C function can free() things which were not *alloc()ed. That's
a pretty well known constraint.

> a contradiction to the C.free(ptr_got_from_C_CString) idiom being useful at
> all.

Read the code of CString.

> BTW: IMO there's nobody arguing here. It's a pretty normal discussion.

Normal discussions are made of arguments. :-)

Ian Lance Taylor

Feb 10, 2011, 12:28:33 PM
to golan...@googlegroups.com
Jan Mercl <jan....@nic.cz> writes:

> I don't get it. First thing is that it's not clear to me if values of type
> unsafe.Pointer are considered by the GC or not. The second is still the
> same. If the next line after C.foo(cstr1) is e.g. the closing right
> brace of a Go function, then it still seems to me that a conforming Go
> compiler is allowed to collect the pointee of cstr1 while C.foo() is
> executing.(1)
>
> As an afterthought:
>
> OK, let's say unsafe.Pointers are traced by GC (as you seem to rely on that
> in your reasoning about cstr1 above). That would imply no C function can
> ever safely free any memory pointed to by some of its parameters. But that's
> a contradiction to the C.free(ptr_got_from_C_CString) idiom being useful at
> all.
>
> So we have to assume unsafe.Pointer typed values are not traced by the GC.
> Then we are back at square one and the above (1) situation seems in theory
> possible.

Values of type unsafe.Pointer are tracked by the GC. However, all that
means is that, if they happen to point to some space allocated by the Go
runtime, that space will not be freed. The GC doesn't do anything about
memory allocated by the C runtime. It just ignores pointers into that
memory.

In the code cstr1 is still there in the local variable in the goroutine
while C.foo is running, so the GC can't release it until C.foo is
complete.

Ian

Jan Mercl

Feb 10, 2011, 12:45:11 PM
to golan...@googlegroups.com
On Thursday, February 10, 2011 6:19:29 PM UTC+1, Gustavo Niemeyer wrote:
>> I don't get it. First thing is that it's not clear to me if values of type
>> unsafe.Pointer are considered by the GC or not.
>
> They are.

Now I know. 

>> As an afterthought:
>>
>> OK, let's say unsafe.Pointers are traced by GC (as you seem to rely on that
>> in your reasoning about cstr1 above). That would imply no C function can
>> ever safely free any memory pointed to by some of its parameters. But that's
>
> Yes, no C function can free() things which were not *alloc()ed. That's
> a pretty well known constraint.
>
>> a contradiction to the C.free(ptr_got_from_C_CString) idiom being useful at
>> all.
>
> Read the code of CString.

Actually, I haven't done that. Ian's post was enlightening enough. What I got mixed up was the relation between GC tracking and GC freeing. I assumed the former implies the latter, which is not the case. An unsafe.Pointer is an example of something that "is tracked" but not necessarily "is freed" by the GC. This guarantees cstr1 is safe to pass to C.foo() as far as the GC is concerned.

>> BTW: IMO there's nobody arguing here. It's a pretty normal discussion.
>
> Normal discussions are made of arguments. :-)

That's down to my poor English. I thought "to argue" was related to "to quarrel". Another mistake of mine.

Thanks for helping me to wrap my head around this.

Jan Mercl

Feb 10, 2011, 12:48:08 PM
to golan...@googlegroups.com
On Thursday, February 10, 2011 6:28:33 PM UTC+1, Ian Lance Taylor wrote:

> Values of type unsafe.Pointer are tracked by the GC. ...

Thanks a lot Ian. I've discussed your post (and my faults) in a reply to Gustavo just above this one.
