C++ 11 to Golang convertor

aureal...@gmail.com

Jan 2, 2019, 10:37:17 PM

to golang-nuts

Hello All

I have C++ 11 source files which I need to convert to Go code.

Is there any converter tool for this?

I googled quite a few tools; none seems powerful and complete enough to do the job for me.

Please help

Thanks

Abhishek

Ian Lance Taylor

Jan 2, 2019, 11:03:53 PM

to aureal...@gmail.com, golang-nuts

On Wed, Jan 2, 2019 at 7:37 PM <aureal...@gmail.com> wrote:
>
> I have C++ 11 source files which I need to convert to Go language code
> Is there any converter tool for this?
> I googled quite a few tools; none seems powerful and complete enough to do the job for me.

C++ 11 is a much more complex language than Go. I think that if you
want to support all the features of C++11 this would essentially
require writing a C++11 compiler that generates Go code as its output.
I don't know of any such tool.

Ian

jake...@gmail.com

Jan 3, 2019, 12:11:06 PM

to golang-nuts

There are so, so many ways to go about porting functionality from one language to another. I hope you have seriously considered why you want to make such a port. The answer to that will likely, in part, drive your strategy. In addition, the nature and size of the code base, and your timeline, will affect the strategy used.

I would note that any tool that ports from C++, or even C, to Go is going to produce ugly, unmaintainable, and non-idiomatic code, at best. Turning that into real Go code would still be a major project. There is a great video about the process that the Go team used to convert the compiler from C to Go, but I cannot find it now.

Have you considered rewriting from scratch? That can often be less painful than one might think, if you already have a really good suite of "functional level" tests that you can use to ensure functional continuity.

Another strategy that comes to mind is to use cgo to do the rewrite one component or library at a time. This could be done in one of two ways. Either keep the program (or library, or whatever it is) as a C++ app, and call into your converted Go code. Or, conversely, write a Go program that calls into C++ for unconverted functionality.

Of course, with no real information about what you have, or what you are trying to achieve, you can only get general advice.

Good Luck.

Andy Balholm

Jan 3, 2019, 12:34:29 PM

to aureal...@gmail.com, golang-nuts, Ian Lance Taylor

I’ve been working on a tool (called leaven) to convert LLVM IR (intermediate representation) to Go. So you can compile C to LLVM with clang, and then convert the result to Go. It’s actually pretty easy, because LLVM instructions are such simple operations. But it’s not very readable; given this:

    int strcmp(const char *l, const char *r)
    {
        for (; *l==*r && *l; l++, r++);
        return *(unsigned char *)l - *(unsigned char *)r;
    }

It produces this:

    func strcmp(v0 *byte, v1 *byte) int32 {
        var v10, v11, v12, v13 *byte
        var v5, v6, v7, v16, v17, v18 bool
        var v3, v4, v14, v15, v21, v22 byte
        var v23, v24, v25 int32
        _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _, _ = v3, v4, v5, v6, v7, v10, v11, v12, v13, v14, v15, v16, v17, v18, v21, v22, v23, v24, v25

        v3 = *v0
        v4 = *v1
        v5 = v3 != v4
        v6 = v3 == 0
        v7 = v6 || v5
        if v7 {
            v21, v22 = v3, v4
            goto block20
        } else {
            goto block8
        }

    block8:
        v10, v11 = v1, v0
        goto block9

    block9:
        v12 = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(v11)) + 1*unsafe.Sizeof(*(*byte)(nil))))
        v13 = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(v10)) + 1*unsafe.Sizeof(*(*byte)(nil))))
        v14 = *v12
        v15 = *v13
        v16 = v14 != v15
        v17 = v14 == 0
        v18 = v17 || v16
        if v18 {
            goto block19
        } else {
            v10, v11 = v13, v12
            goto block9
        }

    block19:
        v21, v22 = v14, v15
        goto block20

    block20:
        v23 = int32(uint32(v21))
        v24 = int32(uint32(v22))
        v25 = v23 - v24
        return v25
    }

But it works!
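For contrast, here is a minimal sketch of what a hand-written Go version of the same comparison might look like. It is not leaven output. The name `cstrcmp` and the choice to treat the end of a byte slice like a NUL terminator are assumptions made for this illustration.

```go
package main

import "fmt"

// cstrcmp compares two NUL-terminated byte strings the way C's strcmp does:
// it returns the difference of the first pair of bytes that differ, or 0
// if both strings end at the same position.
// Reaching the end of a slice is treated like reaching a NUL terminator
// (an assumption of this sketch, so that it never reads out of bounds).
func cstrcmp(l, r []byte) int32 {
	at := func(s []byte, i int) byte {
		if i < len(s) {
			return s[i]
		}
		return 0
	}
	for i := 0; ; i++ {
		a, b := at(l, i), at(r, i)
		if a != b || a == 0 {
			return int32(a) - int32(b)
		}
	}
}

func main() {
	fmt.Println(cstrcmp([]byte("abc"), []byte("abc")))
	fmt.Println(cstrcmp([]byte("abc"), []byte("abd")))
	fmt.Println(cstrcmp([]byte("b"), []byte("a")))
}
```

The slice-based version needs no `unsafe` and no labels, which is roughly the gap between a mechanical translation and a rewrite.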

I’ve never tried it with a C++ program, but once it’s compiled down to LLVM, there shouldn’t be much difference.

Whether it is anything like what you are looking for depends on your goals for the translation. (Though it’s almost certainly not complete enough yet.)

If your goal is to produce maintainable Go source that maintains the general appearance of the C++ original, you will need to build a custom tool that recognizes the idioms of your codebase and converts them to equivalent Go idioms, like Russ Cox did for translating the Go compiler. But keep in mind that he had the unfair advantage that he was translating C written by Go programmers.

I don’t think a general-purpose tool to convert C or C++ into maintainable Go is possible. As you handle more of the odd corner cases of C, the output looks more and more like machine code. Leaven skips that painful journey and produces asm.go (by analogy with asm.js) from day 1.

Leaven isn’t really ready for general use, but I decided to throw it on GitHub in response to your question. It’s at https://github.com/andybalholm/leaven, for whatever it’s worth. I haven’t gotten around to adding a README or a license, but I’m planning to use the MIT license.

Andy

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Robert Engels

Jan 3, 2019, 1:01:30 PM

to Andy Balholm, aureal...@gmail.com, golang-nuts, Ian Lance Taylor

I am pretty sure the other task is impossible, unless the generated code used CGo for all of its work.

It gets really difficult for multithreaded apps: pthreads do not translate to goroutines, there is no TLS, etc.
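To make the TLS point concrete, here is a minimal sketch of how per-thread state usually has to be restructured in Go: the state is passed explicitly to each goroutine rather than looked up from "the current thread". The names `workerState`, `handle`, and `runWorkers` are hypothetical and stand in for whatever the C or C++ code kept in thread-local storage.

```go
package main

import (
	"fmt"
	"sync"
)

// workerState stands in for data that a C program might keep in
// thread-local storage (for example a per-thread counter or scratch buffer).
type workerState struct {
	id      int
	handled int
}

// handle processes one job. The state is an explicit parameter because Go
// offers no supported way to fetch it from "the current goroutine".
func handle(st *workerState, job int) int {
	st.handled++
	return job * job
}

// runWorkers starts n goroutines, each owning its own workerState, and
// returns the sum of all results plus the number of jobs each worker handled.
func runWorkers(n int, jobs []int) (int, []int) {
	jobCh := make(chan int)
	resCh := make(chan int)
	states := make([]*workerState, n)
	var wg sync.WaitGroup
	for i := range states {
		states[i] = &workerState{id: i}
		wg.Add(1)
		go func(st *workerState) {
			defer wg.Done()
			for j := range jobCh {
				resCh <- handle(st, j)
			}
		}(states[i])
	}
	// Feed the jobs, then close the result channel once every worker is done.
	go func() {
		for _, j := range jobs {
			jobCh <- j
		}
		close(jobCh)
		wg.Wait()
		close(resCh)
	}()
	sum := 0
	for r := range resCh {
		sum += r
	}
	counts := make([]int, n)
	for i, st := range states {
		counts[i] = st.handled
	}
	return sum, counts
}

func main() {
	sum, counts := runWorkers(3, []int{1, 2, 3, 4})
	total := 0
	for _, c := range counts {
		total += c
	}
	fmt.Println("sum of squares:", sum, "jobs handled:", total)
}
```

A mechanical translator cannot easily make this change, because it alters function signatures throughout the call graph.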

I think correcting the converted Go would be more daunting than just rewriting it in Go to begin with.

Andy Balholm

Jan 3, 2019, 1:12:12 PM

to Robert Engels, aureal...@gmail.com, golang-nuts, Ian Lance Taylor

You don’t use CGo; you just translate all the libraries you depend on as well. :-P

But you’re definitely right about pthreads; that would be a nightmare.

Andy

Eric Raymond

Jan 3, 2019, 5:46:39 PM

to golang-nuts

On Thursday, January 3, 2019 at 12:11:06 PM UTC-5, Jake Montgomery wrote:

I would note that any tool that ports from C++, or even C, to Go is going to produce ugly, unmaintainable, and non-idiomatic code, at best.

These are two different cases. I agree that graceful C++ to Go transpilation is effectively impossible.

On the other hand, I believe graceful, comment-preserving C to idiomatic-Go transpilation is almost possible. By 'almost' I mean that the tool would pass through a small enough percentage of untranslated residuals for corrections to be around a 5% job for a human expert.

I've had a lot of incentive to think about this because my concerns center around vast masses of C infrastructure code in critical network services like NTP, DNS, etc. The security and reliability consequences of unsafe code in that swamp are serious and it needs to be drained. Transpilation to golang is, I think, the first realistic hope we've had of doing that without a prohibitively high labor input.

By possible I do not mean easy. I've scoped the job and done a design sketch. I think my qualifications for writing such a transpiler are exceptionally good, but it would nevertheless take me a minimum of two years of hard work to get there. I have put out some feelers for funding; if I get to choose my next major project after NTPsec, this would be it.

Valentin Vidic

Jan 3, 2019, 6:14:36 PM

to golang-nuts

On Thu, Jan 03, 2019 at 02:46:39PM -0800, Eric Raymond wrote:
> On the other hand, I believe graceful, comment-preserving C to idiomatic-Go
> transpilation is almost possible. By 'almost' I mean that the tool would
> pass through a small enough percentage of untranslated residuals for
> corrections to be around a 5% job for a human expert.
>
> I've had a lot of incentive to think about this because my concerns center
> around vast masses of C infrastructure code in critical network services
> like NTP, DNS, etc. The security and reliability consequences of unsafe
> code in that swamp are serious and it needs to be drained. Transpilation
> to golang is, I think, the first realistic hope we've had of doing that
> without a prohibitively high labor input.
>
> By possible I do not mean easy. I've scoped the job and done a design
> sketch. I think my qualifications for writing such a transpiler are
> exceptionally good, but it would nevertheless take me a minimum of two
> years of hard work to get there. I have put out some feelers for
> funding; if I get to choose my next major project after NTPsec, this would
> be it.

The Go compiler was converted from C to Go at some point, but I don't know
if the tool used for that is available somewhere.

--
Valentin

Andy Balholm

Jan 3, 2019, 6:35:07 PM

to Valentin Vidic, golang-nuts

It’s at https://github.com/rsc/c2go. It might be a good starting place, but there is one significant difference in approach from what Eric is proposing. Russ incorporated all of the manual cleanup into the tool (or config files) as special cases, rather than leaving it as manual cleanup.

Andy

Ian Denhardt

Jan 3, 2019, 8:17:02 PM

to Andy Balholm, Valentin Vidic, golang-nuts

Quoting Andy Balholm (2019-01-03 18:34:35)
> It's at https://github.com/rsc/c2go. It might be a good starting
> place, but there is one significant difference in approach from what
> Eric is proposing. Russ incorporated all of the manual cleanup into the
> tool (or config files) as special cases, rather than leaving it as
> manual cleanup.

Right, the thing that made the project tractable at all was that it was
purpose-built for the Go compiler, not intended to be a general purpose
tool. As such, it only had to deal with C as used in the Go compiler,
not all of C.

This approach would also likely be tractable for a specific C++
codebase, but I think a general purpose tool is a lost cause.

-Ian

K Davidson

Jan 3, 2019, 10:50:48 PM

to golang-nuts

I read somewhere that you can do some of the needed work using swig++, but like others have said, I don't think it would produce perfectly ported idiomatic code out of the box...

-kdd

jake...@gmail.com

Jan 4, 2019, 11:52:28 AM

to golang-nuts

On Thursday, January 3, 2019 at 5:46:39 PM UTC-5, Eric Raymond wrote:

On Thursday, January 3, 2019 at 12:11:06 PM UTC-5, Jake Montgomery wrote:
I would note that any tool that ports from C++, or even C, to Go is going to produce ugly, unmaintainable, and non-idiomatic code, at best.

These are two different cases. I agree that graceful C++ to Go transpilation is effectively impossible.

On the other hand, I believe graceful, comment-preserving C to idiomatic-Go transpilation is almost possible.

I might believe that it could be comment-preserving and possibly graceful, and pretty much human readable. But I seriously doubt that it could ever produce anything I would call idiomatic-Go. Perhaps we have different definitions of that term. Aside from the line-by-line form, I also mean "something that an experienced Go programmer might write." Idiomatic-Go, by my definition, would include the use of things like interfaces and methods. It uses goroutines and channels where appropriate. That is what you want if you want "real" Go code that is usable and maintainable.

"Transpilation" is really just the first step. It can get you to a working code base, but turning it into truly idiomatic Go, that people want to deal with, is work that needs to be done by humans. IIUC, that's what the Go team did with the compiler. Getting a Go version that was 100% functionality-faithful to the original C was crucial. But it was only the first step.

Ian Lance Taylor

Jan 4, 2019, 1:16:53 PM

to Jake Montgomery, golang-nuts

To be honest, the second step, making the compiler (and linker)
idiomatic Go, is still in progress.

Ian

Eric S. Raymond

Jan 4, 2019, 2:06:01 PM

to jake...@gmail.com, golang-nuts

jake...@gmail.com <jake...@gmail.com>:

> On Thursday, January 3, 2019 at 5:46:39 PM UTC-5, Eric Raymond wrote:
> >
> > On Thursday, January 3, 2019 at 12:11:06 PM UTC-5, Jake Montgomery wrote:
> >>
> >> I would note that any tool that ports from C++, or even C, to Go is going
> >> to produce ugly, unmaintainable, and non-idiomatic code, at best.
> >>
> >
> > These are two different cases. I agree that graceful C++ to Go
> > transpilation is effectively impossible.
> >
> > On the other hand, I believe graceful, comment-preserving C to
> > idiomatic-Go transpilation is almost possible.
> >
>
> I might believe that it could be comment-preserving and possibly graceful,
> and pretty much human readable. But I seriously doubt that it could ever
> produce anything I would call idiomatic-Go. Perhaps we have different
> definitions of that term.

Yes, we do.

> Aside from the line by line form, I also mean
> "something that an experienced Go programmer might write." Idiomatic-Go by
> my definition, would include the use of things like interfaces and methods.
> It uses goroutines and channels where appropriate. That is what you want if
> you want "real" go code that is usable and maintainable.

Mine is less ambitious than yours. I was implying something more like
"What a Go programmer would write if he were confined to the subset of Go
features roughly equivalent to C, plus real strings and garbage collection."
And replacing pointer walks with iteration - all that sort of thing.
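As a small illustration of "replacing pointer walks with iteration", here is a sketch of a C loop (kept as a comment) next to the Go a translator in this weaker sense of "idiomatic" might aim for. The function `countSpaces` is a made-up example, not taken from any real codebase.

```go
package main

import "fmt"

// C original, shown only as a comment:
//
//	int count_spaces(const char *s) {
//	    int n = 0;
//	    for (; *s; s++)
//	        if (*s == ' ')
//	            n++;
//	    return n;
//	}

// countSpaces is the same loop with the pointer walk replaced by indexed
// iteration over a real Go string. Indexing (rather than ranging over
// runes) keeps the byte-at-a-time semantics of the C code.
func countSpaces(s string) int {
	n := 0
	for i := 0; i < len(s); i++ {
		if s[i] == ' ' {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countSpaces("a b  c"))
}
```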

Possibly a better description than "idiomatic" would be "line-by-line
translation that a Go programmer could read without wanting to barf."
And perhaps I should have said "maintainable" code rather than "idiomatic",
because my real intent was to contrast against the kind of horrifying
pseudo-assembler-like splooge that leaven and c2go generate. But I did mean
idiomatic, if in a relatively weak sense.

If this seems too weak to be interesting, remember my strategic objective.
I want to get critical infrastructure out of a memory- and type-unsafe
language that is difficult to verify into a (more) memory- and
type-safe one in order to armor it against CVEs, basically. NTP would
be my first target but by no means my last.

Go is not perfectly memory-safe, alas, but is the first plausible
target for such a mechanized jump out of C-land that I've seen.

Perfectly idiomatic Go isn't required for this objective, just
maintainable Go that would tend to evolve into more strongly idiomatic Go as
more human hands touch it after translation.

So you understand my thinking better, here's a design sketch:

Perry Metzger, one of a relatively small number of people I consider
better qualified than me to do the job, is working on a comment and
whitespace-preserving C source-code walker that will emit
S-expressions describing an AST with some semantic annotation (types,
variable scopes, that sort of thing).

My plan is to bolt this to a rewrite engine, probably written in Lisp,
and a growing database of rewrite rules that transform the AST from
C-like to Go-like. It looks like the core language mapping won't
actually be very difficult; I expect most of the production rules will
actually be about mapping C library calls to Go library calls. Doing
a reasonably complete job of *that* is going to be a hairball; it's most
of why I'm estimating two years.

Third stage: AST goes to prettyprinter and you get Go...almost.

A crucial part of the design is that the rewrite engine will pass
through, marked "this is still C", AST spans that it can't map.
I don't expect the percentage of such spans to go entirely to zero;
the goal is, rather, to get those residuals small enough that even on
large codebases one human developer can finish off the grotty spots.
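The pass-through idea can be sketched in a few lines. This is a toy under heavy assumptions, not the planned design: the `Node` type, the `Rule` signature, the two sample rules, and the residual marker text are all hypothetical, and a real engine would work on a typed, comment-carrying AST rather than on call names and strings.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a toy AST node: a call with a function name and already-rendered
// argument strings.
type Node struct {
	Func string
	Args []string
}

// Rule rewrites a C call into Go source text and reports whether it applied.
type Rule func(n Node) (string, bool)

// rules is a deliberately tiny, hypothetical rulebase mapping C library
// calls to Go equivalents.
var rules = []Rule{
	func(n Node) (string, bool) {
		if n.Func == "strlen" && len(n.Args) == 1 {
			return "len(" + n.Args[0] + ")", true
		}
		return "", false
	},
	func(n Node) (string, bool) {
		if n.Func == "puts" && len(n.Args) == 1 {
			return "fmt.Println(" + n.Args[0] + ")", true
		}
		return "", false
	},
}

// Translate applies the first matching rule. Nodes that no rule can map are
// passed through as a marked residual for a human to finish.
func Translate(n Node) (out string, residual bool) {
	for _, r := range rules {
		if s, ok := r(n); ok {
			return s, false
		}
	}
	c := n.Func + "(" + strings.Join(n.Args, ", ") + ")"
	return "/* THIS IS STILL C: " + c + " */", true
}

func main() {
	prog := []Node{
		{"strlen", []string{"name"}},
		{"puts", []string{"msg"}},
		{"setjmp", []string{"env"}},
	}
	left := 0
	for _, n := range prog {
		s, res := Translate(n)
		if res {
			left++
		}
		fmt.Println(s)
	}
	fmt.Printf("%d of %d spans left for a human\n", left, len(prog))
}
```

Counting the residuals, as `main` does here, gives the metric the design cares about: how much is left for a person to clean up.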

Another crucial part is that rewrite-rule development can be
crowdsourced, and people can build their own rulebases for their
own C APIs.

I'll declare victory when I can lift the NTPsec codebase out of C with
this thing.

But this is just a plan in my head right now. I won't have time to do it
until I can leave NTPsec in a stable, reasonably finished state. And then
I'll have to scare up funding.
--
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

My work is funded by the Internet Civil Engineering Institute: https://icei.org
Please visit their site and donate: the civilization you save might be your own.

Eric S. Raymond

Jan 4, 2019, 2:07:17 PM

to Ian Lance Taylor, Jake Montgomery, golang-nuts

Ian Lance Taylor <ia...@golang.org>:

> To be honest, the second step, making the compiler (and linker)
> idiomatic Go, is still in progress.

I'm not even a bit surprised to hear that. :-)

robert engels

Jan 4, 2019, 2:41:19 PM

to e...@thyrsus.com, jake...@gmail.com, golang-nuts

Isn’t it an easier and better use of resources just to farm out each program under consideration to “the crowd” and say, “rewrite dnsd in Go”?

For security verification purposes, you’d be going through each line of the converted program anyway.

I still think it would be a nearly impossible task given the C code in the wild - outside of threading, the common usage of ‘unions’ - there is no way I know of to map these to a simple Go struct, or even several - you need something like protobufs. So even if you could convert, you’d end up with code that is probably harder to maintain/verify than the original C code.

If it is just memory safety you require, you can already do this with C in many ways - https://www.doc.ic.ac.uk/~phjk/BoundsChecking.html

Eric Raymond

Jan 5, 2019, 10:40:48 AM

to golang-nuts

On Friday, January 4, 2019 at 2:41:19 PM UTC-5, robert engels wrote:

I still think it would be a nearly impossible task given the C code in the wild - outside of threading, the common usage of ‘unions’ - there is no way I know of to map these to a simple Go struct, or even several - you need something like protobufs. So even if you could convert, you’d end up with code that is probably harder to maintain/verify than the original C code.

I've worked on a lot of this old infrastructure code myself, which is how I know that use of unions and threading simply isn't that common in it. You have to bear in mind that a lot of it was originally written in the last century, before the standards people won their war. So for practical purposes ANSI pthreads didn't yet exist, though in the NTP case (the one I've been elbow-deep in recently) some threading stuff was bolted on later to avoid stalling on DNS lookups.

As for security checking each line... yes, in an ideal world, but not necessary for the translation to be a worthy improvement. Sure, we know the C code is leaky, but given the verification properties of both languages, a line-by-line translation can't make you worse off unless there's some huge undetected hole in the Go libraries.

That's a bet I'm willing to make.

You haven't noticed what I think is a larger problem - preprocessor conditionals. That's a headache. Still puzzling on that one.

Andy Balholm

Jan 5, 2019, 11:38:12 AM

to Eric Raymond, golang-nuts

Yes, the preprocessor…

The preprocessor is one of the biggest obstacles to readable C-to-Go translation. rsc/c2go largely ignores preprocessor directives other than #include—and it doesn’t include a translation of the headers at the top of every output file. But most C programs are a lot more dependent on the preprocessor than gc was. So every other C-to-Go tool I’ve seen works from preprocessed source. So they tend to dump a translation of most of /usr/include at the top of the file. (Leaven doesn’t, though, because clang optimizes most of that stuff back out.) And some functions are halfway to machine code by the time all the macros are expanded.

A good translation from C to Go needs to look as much like the original, un-preprocessed C source as possible. But a powerful translation tool will probably need to preprocess, parse, and typecheck the code, while keeping track of the source location that everything corresponds to—and then use that information to guide the translation.

Andy

Abhishek Tiwari

Jan 5, 2019, 1:12:06 PM

to jake...@gmail.com, golang-nuts

Hi Jake & Friends,

Thank you so much for the awesome responses and great help. I am going through all the replies in detail, one by one.

Actually, I am working on a job assignment: 8-10 C++ files need to be converted to Go.

As far as the technical details of the code are concerned, it seems to be fairly complex C++ 11 code.

I feel rewriting from scratch is a good way to go about it, but I am very new to Go syntax, so that is the difficult part.

Hence I cannot yet describe what that code is actually doing :-)

Will come back soon!

Best Regards

Abhishek


Michael Jones

Jan 5, 2019, 3:08:51 PM

to Abhishek Tiwari, golang-nuts, jake...@gmail.com

A rewrite will be better. Really.


--

Michael T. Jones
michae...@gmail.com

Drew Derbyshire

Jan 5, 2019, 4:37:45 PM

to golang-nuts

I am reminded of someone who once asked how to go from Burlington MA to Waltham, when what he really wanted to do was go from Burlington to Logan Airport. (For the uninformed, via Waltham is not generally how most would do it.)

This conversation is thoughtfully covering all angles of the C++ to Go problem, but may I suggest that one step back and define the ORIGINAL problem being solved and the requirements of the solution. (Not the approach, and certainly not the design, but the basic requirements.)

What is the C++ doing which is required, and why does it have to be re-implemented in Go?

-ahd-

David Collier-Brown

Jan 5, 2019, 6:45:53 PM

to golang-nuts

Mr. Tiwari:

Major ports are rather like major rewrites, in that if you think of the program as a tree, the parts that stay the same are the trunk and branches, and the parts that change are the leaves.

If you draw a picture of the old program on a whiteboard, you have an (initial) design for the new program. If the calls and parameters look like a reasonably good API, you can start by writing mock leaves, which will help you understand both the program and the new language. It should pass your integration tests, albeit with fake data.

Then start filling in the low-level details, starting with the last step and working backwards. Each time you break something, your tests will tell you, and you can fix it quickly.
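A minimal sketch of the "mock leaves" step in Go, assuming a made-up API: the `Store` interface, `mockStore`, and `Greeting` are hypothetical names chosen for illustration. The higher-level logic is ported first against an interface, and the leaf returns canned data until the real implementation is filled in.

```go
package main

import "fmt"

// Store is a hypothetical "branch" API recovered from the old program's
// call graph.
type Store interface {
	Load(key string) (string, error)
}

// mockStore is a mock leaf: it returns canned data so that the code above
// it can be ported and tested before the real implementation exists.
type mockStore map[string]string

func (m mockStore) Load(key string) (string, error) {
	v, ok := m[key]
	if !ok {
		return "", fmt.Errorf("key %q not found", key)
	}
	return v, nil
}

// Greeting is "trunk" logic, ported first. It depends only on the Store
// API, so swapping the mock for the real leaf later does not change it.
func Greeting(s Store, user string) string {
	name, err := s.Load(user)
	if err != nil {
		return "hello, stranger"
	}
	return "hello, " + name
}

func main() {
	s := mockStore{"u1": "Abhishek"}
	fmt.Println(Greeting(s, "u1"))
	fmt.Println(Greeting(s, "u2"))
}
```

Once the real leaf exists, the same tests run against it unchanged, which is what makes breakage show up quickly.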

In a previous life, this and some elegant tooling allowed my employer to make a profit off doing fixed-price ports.

Eric S. Raymond

Jan 6, 2019, 8:27:22 AM

to Andy Balholm, golang-nuts

Andy Balholm <andyb...@gmail.com>:

> A good translation from C to Go needs to look as much like the original, un-preprocessed C source as possible. But a powerful translation tool will probably need to preprocess, parse, and typecheck the code, while keeping track of the source location that everything corresponds to—and then use that information to guide the translation.

Perry and I have two different target languages in mind. Perry's
target language is basically a repaired C - type-safe, almost
upward-compatible. He and I have the same strategic goal - draining
the swamp of old, leaky C code - we're just betting on slightly
different ways to do it. We intend to cooperate as much as possible.

But for both of them our intended strategy is *not* to preprocess.
Instead, we're willing to accept that some abuses of the preprocessor
need to be punted to a human, but treat other uses as part of the
parse tree.

For example, if you have a set of #defines each of which expands to a
syntactically well-formed C expression, it seems clear to us that the
right thing is to recognize this and map them to const and func
declarations, checking for and throwing an error on capture of any
identifier that isn't a defined global.
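A sketch of what that mapping could produce, under stated assumptions: the macro names are invented for this example, and the choice of `int` for the function-like macros is a guess that a real tool would have to infer from how the macro is used.

```go
package main

import "fmt"

// C original, shown only as a comment (hypothetical macros):
//
//	#define MAX_PEERS   16
//	#define SHIFT       8
//	#define SQUARE(x)   ((x) * (x))
//	#define IS_EVEN(n)  (((n) % 2) == 0)

// Object-like macros that expand to constant expressions become consts.
const (
	MaxPeers = 16
	Shift    = 8
)

// Function-like macros whose bodies are well-formed expressions become
// funcs. Unlike the macro, each argument is evaluated exactly once.
func Square(x int) int { return x * x }

func IsEven(n int) bool { return n%2 == 0 }

func main() {
	fmt.Println(MaxPeers<<Shift, Square(7), IsEven(MaxPeers))
}
```

A macro that referred to a local variable of its caller could not be turned into a func this way, which is the "capture" case that would be reported as an error.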

We're willing to throw an error on that, or on syntactically
ill-formed macro expansion because we're not aiming for a completely
perfect mechanized translation. We know this is impossible in the
general case. Instead, our goal is just to get punt-to-a-human
frequency low enough that very large translations are practical with a
relatively small amount of human effort.

We don't mind requiring users to make point changes to C codebases to
get them through the translator when the code was horrible in C in
the first place.

As for unions, we'll do the stupidest possible thing that can work -
translate them to structs and throw a warning. If the union was just
a space optimization, that's it. If not, then a human must intervene.
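A sketch of that union-to-struct translation, using a made-up `token` type. The warning comment text and all names are hypothetical. The result is only correct when the C code reads back the same member it last wrote, which is the "space optimization" case.

```go
package main

import "fmt"

// C original, shown only as a comment (hypothetical type):
//
//	struct token {
//	    int kind;                 /* KIND_NUM or KIND_STR */
//	    union {
//	        long        num;
//	        const char *str;
//	    } u;
//	};

const (
	KindNum = iota
	KindStr
)

// Token is the straightforward translation: the union becomes a struct,
// so the members no longer share storage.
type Token struct {
	Kind int
	// WARNING (translator): was a C union. Code that wrote one member
	// and read another needs human attention.
	U struct {
		Num int64
		Str string
	}
}

// Describe reads only the member selected by Kind, as well-behaved C
// code using a tagged union would.
func Describe(t Token) string {
	switch t.Kind {
	case KindNum:
		return fmt.Sprintf("num %d", t.U.Num)
	case KindStr:
		return fmt.Sprintf("str %q", t.U.Str)
	}
	return "unknown"
}

func main() {
	var t Token
	t.Kind = KindNum
	t.U.Num = 42
	fmt.Println(Describe(t))
}
```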

(Both projects are contingent on funding we don't yet have, alas.
It's not like we need to buy a cyclotron or anything but we do need to
be able to eat and pay rent for a couple years each.)

It would sure help if Go had sum types. Has there been any discussion
of adding these?

andrey mirtchovski

Jan 6, 2019, 10:55:26 AM

to Eric Raymond, Andy Balholm, golang-nuts

> It would sure help if Go had sum types. Has there been any discussion
> of adding these?

one of many, but has links to others: https://github.com/golang/go/issues/19412

Eric S. Raymond

Jan 6, 2019, 12:04:24 PM

to andrey mirtchovski, Andy Balholm, golang-nuts

andrey mirtchovski <mirtc...@gmail.com>:

> > It would sure help if Go had sum types. Has there been any discussion
> > of adding these?
>
> one of many, but has links to others: https://github.com/golang/go/issues/19412

Thanks. My experience report on the reposurgeon port will point at yet another
use case for sum types that I think is significant - enabling compile-time
checking for limited polymorphism in an array of events.
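For readers unfamiliar with the workaround in Go as it stands, here is a minimal sketch of a "sealed" interface standing in for a sum type over a slice of events. The `Event`, `Commit`, and `Tag` types are hypothetical and are not taken from reposurgeon. The `default` branch is the point: a forgotten case is caught only at run time, which is the check sum types would move to compile time.

```go
package main

import "fmt"

// Event is a sealed interface: the unexported marker method means only
// types in this package can implement it.
type Event interface{ isEvent() }

type Commit struct{ Message string }
type Tag struct{ Name string }

func (Commit) isEvent() {}
func (Tag) isEvent()    {}

// Summarize counts each kind of event. The compiler does not verify that
// the type switch is exhaustive, so an unhandled type panics at run time.
func Summarize(events []Event) (commits, tags int) {
	for _, e := range events {
		switch e.(type) {
		case Commit:
			commits++
		case Tag:
			tags++
		default:
			panic(fmt.Sprintf("unhandled event type %T", e))
		}
	}
	return commits, tags
}

func main() {
	events := []Event{Commit{"initial import"}, Tag{"v1.0"}, Commit{"fix typo"}}
	c, t := Summarize(events)
	fmt.Println("commits:", c, "tags:", t)
}
```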

Jan Mercl

Jan 6, 2019, 3:23:17 PM

to e...@thyrsus.com, Andy Balholm, golang-nuts

On Sun, Jan 6, 2019 at 2:27 PM Eric S. Raymond <e...@thyrsus.com> wrote:

> But for both of them our intended strategy is *not* to preprocess.

> Instead, we're willing to accept that some abuses of the preprocessor
> need to be punted to a human, but treat other uses as part of the
> parse tree.

I'm afraid that a lot, if not most of existing C code, even when simple, clean and preprocessor-abuse-free, cannot be parsed before preprocessing.

--

-j

Eric S. Raymond

Jan 6, 2019, 4:38:30 PM

to Jan Mercl, Andy Balholm, golang-nuts

Jan Mercl <0xj...@gmail.com>:

> I'm afraid that a lot, if not most of existing C code, even when simple,
> clean and preprocessor-abuse-free, cannot be parsed before preprocessing.

Having studied the problem, the most serious blocker is an ambiguity between
typedefs and variable names in some contexts. And there exists a good
strategy for resolving those.
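The ambiguity in question is the classic one where `T * x;` declares a pointer if `T` is a typedef name and is a multiplication otherwise. The toy below only illustrates why the parser needs to know the typedef set. It is not the resolution strategy referred to above, and `classify` is a hypothetical helper that handles just this one statement shape.

```go
package main

import (
	"fmt"
	"strings"
)

// classify decides how a statement of the form "A * B;" would have to be
// parsed, given the set of names currently known to be typedefs.
func classify(stmt string, typedefs map[string]bool) string {
	fields := strings.Fields(strings.TrimSuffix(strings.TrimSpace(stmt), ";"))
	if len(fields) != 3 || fields[1] != "*" {
		return "unsupported"
	}
	if typedefs[fields[0]] {
		return "declaration" // "T * x;" declares x as pointer to T
	}
	return "expression" // "a * b;" multiplies a by b and discards the result
}

func main() {
	typedefs := map[string]bool{"T": true}
	fmt.Println(classify("T * x;", typedefs))
	fmt.Println(classify("a * b;", typedefs))
}
```

Since typedefs usually arrive through headers, this is also where the question of parsing before preprocessing bites.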

Jan Mercl

Jan 6, 2019, 4:43:31 PM

to e...@thyrsus.com, Andy Balholm, golang-nuts

I would like to learn more about that strategy, if possible.

On Sun, Jan 6, 2019, 22:38 Eric S. Raymond <e...@thyrsus.com> wrote:

Having studied the problem, the most serious blocker is an ambiguity between
typedefs and variable names in some contexts. And there exists a good
strategy for resolving those.
--

--

-j

Eric S. Raymond

Jan 6, 2019, 5:01:39 PM

to Jan Mercl, Andy Balholm, golang-nuts

Jan Mercl <0xj...@gmail.com>:

> I would like to learn more about that strategy, if possible.

Digging....paper by two French guys on how astoundingly perverse parsing C
can get...Perry Metzger pointed me at it...

Ah, here it is:

A simple, possibly correct LR parser for C11

http://gallium.inria.fr/~fpottier/publis/jourdan-fpottier-2016.pdf

Perry is adopting their parsing strategy for his codewalker.

Jan Mercl

Jan 7, 2019, 4:56:15 AM

to e...@thyrsus.com, Andy Balholm, golang-nuts

On Sun, Jan 6, 2019 at 11:01 PM Eric S. Raymond <e...@thyrsus.com> wrote:

> I would like to learn more about that strategy, if possible.

> A simple, possibly correct LR parser for C11
>
> http://gallium.inria.fr/~fpottier/publis/jourdan-fpottier-2016.pdf

Thanks. I probably caused a small misunderstanding. By strategy I meant solving the previously mentioned problem of parsing C without preprocessing. But the C parsing ambiguity of typedef'ed names is related, right.

Anyway, I know that paper (thanks to you actually) and there's a Go C parser that implements all those tricks available at http://modernc.org/cc/v2

The testdata are here at https://gitlab.com/cznic/cc/tree/master/v2/testdata/jhjourdan

--

-j

Bakul Shah

Jan 7, 2019, 8:54:33 AM

to e...@thyrsus.com, Jan Mercl, Andy Balholm, golang-nuts

On Sun, 06 Jan 2019 17:01:20 -0500 "Eric S. Raymond" <e...@thyrsus.com> wrote:
>
> A simple, possibly correct LR parser for C11
>
> http://gallium.inria.fr/~fpottier/publis/jourdan-fpottier-2016.pdf

This paper says its lexer+parser is about 1000 lines, which
doesn't include the preprocessor. For comparison, subc (a
compiler for a self compiling subset of C) is under 5K lines,
including a preprocessor. It is a recursive descent compiler
which may be easier to grok. Best of all, there is an
associated book called "practical compiler construction" (for
a slightly older compiler). There is even a Go version of
subc! subc is completely public domain.
https://www.t3x.org/subc/index.html
May be worth checking subc out (though it will have to be
extended to cover missing features such as goto, typedef...)

The other suggestion I have is to figure out how to map
individual C constructs to Go and not try mapping idiomatic C
code to idiomatic Go code. The former is a hard enough
problem as it is. Then maybe you can convert robotically
translated code to idiomatic code (sort of what the Go folks
must've done/are doing). This way you can internalize most of
the translation scheme before writing a single line of
production code. And it is much easier to change the xlation
scheme while you have just a paper design! [I once spent a
bunch of time on writing a Beta language to C translator and
followed this path]

K Davidson

Jan 7, 2019, 11:57:03 AM

to golan...@googlegroups.com

My original comment was in regard to C++11, but seeing as the discussion has drifted towards C, you may want to take a look at https://github.com/xlab/c-for-go. It is based on https://github.com/cznic/cc, and has been used to create Go bindings for portaudio, libvpx, and a few others, including the Android NDK (android-go). It uses 'hints' via a YAML file to ease indirect translations, and may be adaptable to your needs.

Dan Kortschak

Jan 7, 2019, 5:26:10 PM

to K Davidson, golan...@googlegroups.com

Note that cznic/cc has moved to GitLab, at https://modernc.org/cc

Nigel Tao

Jan 9, 2019, 1:46:15 AM

to Eric Raymond, golang-nuts

Spun out of the "C++ 11 to Golang convertor" thread...

On Mon, Jan 7, 2019 at 12:27 AM Eric S. Raymond <e...@thyrsus.com> wrote:
> Perry and I have two different target languages in mind. Perry's
> target language is basically a repaired C - type-safe, almost
> upward-compatible. He and I have the same strategic goal - draining
> the swamp of old, leaky C code - we're just betting on slightly
> different ways to do it. We intend to cooperate as much as possible.

This is drifting quite off-topic, but I would expect transpiling C to
Go would result in something slower than the original, especially if
it involves a lot of low-level, slice-of-bytes processing. Faster is
obviously easier to sell than slower.

Go is memory-safe, but that entails e.g. runtime bounds checking. Any
individual bounds check might be cheap, measured in nanos, but for
e.g. image decoding, a per pixel bounds check multiplied by a
megapixel image (a 'low-res' photo, by today's standards) means that
nanos become millis.
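
The arithmetic is simply 10^6 pixels times roughly a nanosecond each, which is about a millisecond. As a rough illustration of where those checks come from in Go and how they are commonly hoisted out of a loop, here is a small sketch. The function names are made up, and whether the compiler actually drops a given check depends on the Go version; building with -gcflags="-d=ssa/check_bce/debug=1" reports the checks that remain.

```go
package main

import "fmt"

// invertIndexed writes through dst[i] inside the loop. The compiler
// can see that i < len(src), but it knows nothing about len(dst), so
// dst[i] is in principle checked on every iteration.
func invertIndexed(dst, src []byte) {
	for i := 0; i < len(src); i++ {
		dst[i] = 255 - src[i]
	}
}

// invertHoisted does one explicit length check up front and then
// reslices dst to exactly len(src). Inside the loop both slices now
// have the same known length, which gives the compiler a chance to
// prove every access is in range.
func invertHoisted(dst, src []byte) {
	if len(dst) < len(src) {
		panic("invertHoisted: dst shorter than src")
	}
	dst = dst[:len(src)]
	for i, v := range src {
		dst[i] = 255 - v
	}
}

func main() {
	src := []byte{0, 1, 254, 255}
	dst := make([]byte, len(src))
	invertHoisted(dst, src)
	fmt.Println(dst)
}
```

Both versions stay memory-safe; the second just moves the cost from once per pixel to once per call.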

Reasonable people can disagree, but I favor rewriting over
transpilation, for draining that swamp.

https://github.com/google/wuffs is a new, memory-safe programming
language and a written-from-scratch library of various file formats.
Runtime performance is a special concern: there are no implicit
(runtime) bounds checks. Unlike Go, bounds checks have to be explicit,
which means that programs can eliminate them from the object code by,
well, eliminating them from the source code.

Here are some work-in-progress microbenchmarks (the output of "wuffs
bench" piped to "benchstat") of Wuffs' GIF decoder versus my Debian
system's stock giflib 5.1.4, a library that you might be familiar with
:-). The giflib library is called "mimic" here, since the correctness
tests check that Wuffs mimics (produces the same output as) giflib.

The short story is that Wuffs is around 2x to 6x faster than giflib,
and giflib is, say, 1.5x faster than Go's image/gif, a package that
I'm familiar with :-).

----
wuffs_gif_decode_1k_bw/clang 245MB/s ± 1%
wuffs_gif_decode_1k_color/clang 140MB/s ± 1%
wuffs_gif_decode_10k_indexed/clang 183MB/s ± 1%
wuffs_gif_decode_100k_artificial/clang 530MB/s ± 1%
wuffs_gif_decode_100k_realistic/clang 212MB/s ± 0%
wuffs_gif_decode_1000k/clang 216MB/s ± 0%
wuffs_gif_decode_anim_screencap/clang 1.09GB/s ± 0%

wuffs_gif_decode_1k_bw/gcc 265MB/s ± 1%
wuffs_gif_decode_1k_color/gcc 147MB/s ± 1%
wuffs_gif_decode_10k_indexed/gcc 187MB/s ± 1%
wuffs_gif_decode_100k_artificial/gcc 517MB/s ± 1%
wuffs_gif_decode_100k_realistic/gcc 213MB/s ± 1%
wuffs_gif_decode_1000k/gcc 217MB/s ± 1%
wuffs_gif_decode_anim_screencap/gcc 1.08GB/s ± 1%

mimic_gif_decode_1k_bw 141MB/s ± 1%
mimic_gif_decode_1k_color 76.2MB/s ± 1%
mimic_gif_decode_10k_indexed 94.8MB/s ± 0%
mimic_gif_decode_100k_artificial 155MB/s ± 0%
mimic_gif_decode_100k_realistic 95.9MB/s ± 0%
mimic_gif_decode_1000k 98.1MB/s ± 0%
mimic_gif_decode_anim_screencap 176MB/s ± 0%
----

To reproduce these numbers:

git clone https://github.com/google/wuffs.git
cd wuffs
gcc -O3 -std=c99 -Wall -Werror test/c/std/gif.c -DWUFFS_MIMIC -lgif
./a.out -bench

Optionally pipe that final command to "benchstat /dev/stdin",
installed by "go get golang.org/x/perf/cmd/benchstat".

Wuffs' documentation is lagging the implementation by many months. Sorry.

Eric S. Raymond

Jan 9, 2019, 1:08:00 PM

to Nigel Tao, golang-nuts

Nigel Tao <nige...@golang.org>:

> Spun out of the "C++ 11 to Golang convertor" thread...
>
>
> On Mon, Jan 7, 2019 at 12:27 AM Eric S. Raymond <e...@thyrsus.com> wrote:
> > Perry and I have two different target languages in mind. Perry's
> > target language is basically a repaired C - type-safe, almost
> > upward-compatible. He and I have the same strategic goal - draining
> > the swamp of old, leaky C code - we're just betting on slightly
> > different ways to do it. We intend to cooperate as much as possible.
>
> This is drifting quite off-topic, but I would expect transpiling C to
> Go would result in something slower than the original, especially if
> it involves a lot of low-level, slice-of-bytes processing. Faster is
> obviously easier to sell than slower.

I agree. The class of old C program I am interested in is, however,
not generally limited by CPU but by network and (less commonly) disk
stalls. Again bear in mind that my type examples are NTP and DNS service.
A lot of other legacy infrastructure code fits this pattern.

> Go is memory-safe, but that entails e.g. runtime bounds checking. Any
> individual bounds check might be cheap, measured in nanos, but for
> e.g. image decoding, a per pixel bounds check multiplied by a
> megapixel image (a 'low-res' photo, by today's standards) means that
> nanos become millis.

Again agreed. Transpilation to Go is only appropriate for cases where
nanos becoming millis doesn't matter. Typically, old infrastructure code
was designed for platforms now enough Dennard-scaling cycles in the past
that Go on current hardware will run faster than C did originally.

> Reasonable people can disagree, but I favor rewriting over
> transpilation, for draining that swamp.

The problem is that in general nobody *does* rewrite old
infrastructure code. It tends to work just well enough to fester in
place for a long time, and then you get shocks like the Heartbleed
bug. It is my aim to head off that sort of thing in the future.

> https://github.com/google/wuffs is a new, memory-safe programming
> language and a written-from-scratch library of various file formats.
> Runtime performance is a special concern: there are no implicit
> (runtime) bounds checks. Unlike Go, bounds checks have to be explicit,
> which means that programs can eliminate them from the object code by,
> well, eliminating them from the source code.

That is an interesting and clever approach.

I think all three approaches should be in the toolkit. They fit different
use cases.

Thomas Bushnell, BSG

Jan 9, 2019, 1:55:31 PM

to e...@thyrsus.com, Nigel Tao, golang-nuts

On Wed, Jan 9, 2019 at 10:07 AM Eric S. Raymond <e...@thyrsus.com> wrote:

> > Reasonable people can disagree, but I favor rewriting over
> > transpilation, for draining that swamp.
>
> The problem is that in general nobody *does* rewrite old
> infrastructure code. It tends to work just well enough to fester in
> place for a long time, and then you get shocks like the Heartbleed
> bug. It is my aim to head off that sort of thing in the future.

I'm curious about why transpilation would have significantly mitigated the Heartbleed bug.

It sounds as though the idea is a mass transpilation of old libraries, but that's going to require more than just a library. Suppose openssl had been transpiled into Go well in advance of the attack. This isn't going to stop old C code from linking against the old library, and it's all going to stay there. So you'd need to transpile all the programs which use the library too, it seems to me, and release them into all the distros, and have them all agree. That seems a rather high hurdle. You could, of course, just release a transpiled version of a bunch of libraries, but now my guess is there's two things to maintain instead of one, and nothing ever makes the old one go away.

But suppose that hurdle is fixed, and think about the specific Heartbleed bug. The bug involved a mistake in unmarshalling a particular type of SSL packet, trusting a length field from an inbound packet; the fix was to see the length and discard the incoming packet if it was bogusly long.

The specific problem was that if the length field was bogusly big, the code would do the echo reply with the next N bytes after the incoming packet in memory, which could include whatever was in memory after the packet.

The transpiler is going to have to be amazingly clever to even get this code right. It's code which bounces around a char * manually unmarshalling a piece of structured data. Consider this: hbtype = *p++.

What's that going to be? Is it going to be turned into pointer arithmetic using the unsafe package? Now you'll just get the same bug again. Obviously new Go code would stick the packet in a []byte, and then use the facilities of the binary package to read the bits, and the attempt to read past the end of the slice will fail. But how is a transpiler going to automatically take the C code of openssl and do that?

Suppose it has a way, however. Now you have Go code which will have a bounds fault instead of a data leak. That's better, I suppose - the resulting bug is now "the server crashes" instead of "the server maybe leaks a key". This is an improvement, but a packet-of-death across a widely used library puts the world in a not dissimilar position in terms of the level of panic and rapid response everybody needs.

So I'm not quite seeing it. It seems like a great idea from the outside ("hey, we could turn all these programs into memory-safe Go programs, automatically, what a win!") but in practice, I'm not sure I see such a transpiler actually working in a way that would achieve the result - and the end is to preserve a profound denial of service attack anyway.

Thomas

--

memegen delenda est

Jesper Louis Andersen

Jan 10, 2019, 11:33:14 AM

to e...@thyrsus.com, Nigel Tao, golang-nuts

On Wed, Jan 9, 2019 at 7:07 PM Eric S. Raymond <e...@thyrsus.com> wrote:

> I agree. The class of old C program I am interested in is, however,
> not generally limited by CPU but by network and (less commonly) disk
> stalls. Again bear in mind that my type examples are NTP and DNS service.
> A lot of other legacy infrastructure code fits this pattern.

Can I get a -vv flag on this one?

That the PLL of NTP is network-latency sensitive, I understand. But a DNS service, to me, should never ever touch the disk, or you are doing something really wrong. I'm more inclined to guess that these services are bound by memory bandwidth and latency more than CPU execution speed.

So I'm curious what insight you might have on this subject?

Jesper Louis Andersen

Jan 10, 2019, 11:50:23 AM

to Thomas Bushnell, BSG, e...@thyrsus.com, Nigel Tao, golang-nuts

On Wed, Jan 9, 2019 at 7:55 PM 'Thomas Bushnell, BSG' via golang-nuts <golan...@googlegroups.com> wrote:

> I'm curious about why transpilation would have significantly mitigated the Heartbleed bug.

Heartbleed is a bug which relies on two things:

- Failure to do proper bounds checking

- Allocation of a buffer which is not initialized to some zero-value, and which straddles memory it shouldn't.

Many programming languages are constructed such that they address both of the above problems at the semantics level, and thus they avoid the really dangerous part of the bug, which is the leak of information, downgrading the bug to a denial of service attack, or even also mitigating that part of the bug. Array access is checked against the array's bounds, and buffer allocated memory is properly 0-initialized before use.

Compilation from one language to another might have the side-effect of changing the semantics of the program because of the above observations. Thus making a previously unsafe program safe. In principle we want to be clever: augment the program with new safety semantics, but without changing the meaning of the rest of the program in any way.

Given there is a very large body of C code out there, live, we want to take an approach like the above:

- A rewrite, into say Rust because it is currently popular, runs the risk of re-introducing faults in the programs which were removed through corrections over the years.

- We rewrite too much, where we should reuse.

- C is a remarkably stable programming language in that most older C code still runs in this day and age. More or less, there are some caveats, which the compilation idea ought to address. Many modern languages have a tremendous amount of bit-rot in the sense that even 2-3 year old programs now utterly fail to run.

robert engels

Jan 10, 2019, 12:23:21 PM

to Jesper Louis Andersen, Thomas Bushnell, BSG, e...@thyrsus.com, Nigel Tao, golang-nuts

Again, what is wrong with the bounds checking/memory protection library/technique for C I referred you to? Even a decrease in performance will probably still be on par or better than the equivalent Go program.

Much simpler and more efficient.

Thomas Bushnell, BSG

Jan 10, 2019, 12:26:49 PM

to Jesper Louis Andersen, e...@thyrsus.com, Nigel Tao, golang-nuts

On Thu, Jan 10, 2019 at 8:50 AM Jesper Louis Andersen <jesper.lou...@gmail.com> wrote:

> On Wed, Jan 9, 2019 at 7:55 PM 'Thomas Bushnell, BSG' via golang-nuts <golan...@googlegroups.com> wrote:
>
> > I'm curious about why transpilation would have significantly mitigated the Heartbleed bug.
>
> Heartbleed is a bug which relies on two things:
>
> - Failure to do proper bounds checking
> - Allocation of a buffer which is not initialized to some zero-value, and which straddles memory it shouldn't.
>
> Many programming languages are constructed such that they address both of the above problems at the semantics level, and thus they avoid the really dangerous part of the bug, which is the leak of information, downgrading the bug to a denial of service attack, or even also mitigating that part of the bug. Array access is checked against the array's bounds, and buffer allocated memory is properly 0-initialized before use.

I'm not sure the second one here is right. Heartbleed does not depend on uninitialized memory as far as I can tell. It works to copy whatever lies after the incoming request buffer back to the attacker. It happens that in the actual openssl code the thing it's copying is a reused buffer that might have stuff in it (IIRC), but that's not essential to the operation of the bug. If it were an exactly sized buffer the same shape of problem would occur.
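
As a side note on the reused-buffer case: even bounds-checked Go can show a similar shape of leak, because reslicing is checked against a slice's capacity rather than its length. The following is a minimal sketch with made-up names and data, not a model of the openssl code.

```go
package main

import "fmt"

// echoNaive mimics the buggy pattern: trust the claimed length and
// reslice. Go checks claimed against cap(req), not len(req).
func echoNaive(req []byte, claimed int) []byte {
	return req[:claimed]
}

func main() {
	buf := make([]byte, 64)         // one buffer reused across requests
	copy(buf, "secret-session-key") // left over from an earlier request
	n := copy(buf, "ping")          // a new 4-byte request overwrites the start

	// The request is 4 bytes long, but its capacity is still 64, so an
	// 18-byte reslice succeeds and exposes the stale bytes.
	fmt.Printf("%q\n", echoNaive(buf[:n], 18))

	// A three-index slice caps the capacity at n, so the same reslice
	// now panics instead of leaking.
	defer func() { fmt.Println("panicked:", recover() != nil) }()
	_ = echoNaive(buf[:n:n], 18)
}
```

So zero-initialization on allocation helps only if buffers are not recycled, and the bounds check helps only if the slice handed to the parser is capped at the bytes actually received.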

I don't think you responded to my email very successfully.

You left unaddressed:

* How is this magical translation going to occur, given the actual code of openssl? The obvious human translation is to allocate a request buffer, and then use the encoding/binary package to pull values. The attempt to read indexes greater than the size of the buffer would fault. But I don't see a way to take code like openssl and automatically make it use encoding/binary. The only clear way I can see to do it automatically is to use unsafe.Pointer, which simply turns off all the bounds checking you wanted.

* Even if we did this, the bug only turns into a packet of death. A packet of death on this scale is of almost the same level of annoyance and chaos. (Witness this week's firestorm about an email packet of death on some Cisco something or other.)
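
To illustrate the first bullet, here is a small sketch of what a mechanical *p++ translation through unsafe.Pointer could look like. The function name and the data are invented. The over-read is deliberately kept inside one 8-byte array so that the program stays well-defined; the point is only that no bounds check fires.

```go
package main

import (
	"fmt"
	"unsafe"
)

// readN copies n bytes starting at packet[0] using raw pointer
// arithmetic, the way a literal translation of *p++ might. The length
// of packet is never consulted after the first element is addressed.
func readN(packet []byte, n int) []byte {
	p := unsafe.Pointer(&packet[0])
	out := make([]byte, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, *(*byte)(p))     // *p
		p = unsafe.Pointer(uintptr(p) + 1) // p++
	}
	return out
}

func main() {
	var mem [8]byte
	copy(mem[:], "pingKEY!")
	packet := mem[:4] // the "packet" is only the first four bytes

	// Asking for 7 bytes walks past len(packet) without any panic.
	fmt.Printf("%q\n", readN(packet, 7))
}
```

In a real program the bytes after the packet would be whatever happens to sit next to it in memory, which is exactly the original bug.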

Thomas

--

memegen delenda est

Jesper Louis Andersen

Jan 10, 2019, 1:21:23 PM

to robert engels, Thomas Bushnell, BSG, Eric Raymond, Nigel Tao, golang-nuts

This must have been before I started reading this thread, but I know of the CCured project by George Necula et al., which is a C-to-C translator:

https://web.eecs.umich.edu/~weimerw/p/p477-necula.pdf

--

J.

Jesper Louis Andersen

Jan 10, 2019, 1:35:33 PM

to Thomas Bushnell, BSG, Eric Raymond, Nigel Tao, golang-nuts

On Thu, Jan 10, 2019 at 6:26 PM Thomas Bushnell, BSG <tbus...@google.com> wrote:

> I'm not sure the second one here is right. Heartbleed does not depend on uninitialized memory as far as I can tell. It works to copy whatever lies after the incoming request buffer back to the attacker. It happens that in the actual openssl code the thing it's copying is a reused buffer that might have stuff in it (IIRC), but that's not essential to the operation of the bug. If it were an exactly sized buffer the same shape of problem would occur.

Well, if there is no way to reuse a buffer in the language, then things will work out. More or less any functional language will do.

> You left unaddressed:
> * How is this magical translation going to occur, given the actual code of openssl? The obvious human translation is to allocate a request buffer, and then use the encoding/binary package to pull values. The attempt to read indexes greater than the size of the buffer would fault. But I don't see a way to take code like openssl and automatically make it use encoding/binary. The only clear way I can see to do it automatically is to use unsafe.Pointer, which simply turns off all the bounds checking you wanted.

I think the problem is a good research question. I don't think we have a good solution at the moment. But there is a lot of value in pushing the research forward in the area. So the answer to this question is: "I don't know, but I do have ideas where I would start".

> * Even if we did this, the bug only turns into a packet of death. A packet of death on this scale is of almost the same level of annoyance and chaos. (Witness this week's firestorm about an email packet of death on some Cisco something or other.)

I did address this. If each request is bounds checked and memory isolated, then a failure is just an exception of some kind and we handle this as we would any other exception. You could also just fork the process for each incoming request and obtain the same semantics.
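
In Go terms, "a failure is just an exception" would mean recovering from the bounds panic at the request boundary. The sketch below uses made-up names (echo, safely) and a toy request loop; it is one way to do it, not something a transpiler is known to produce. Go's net/http server does something similar for panics inside handlers.

```go
package main

import "fmt"

// echo stands in for request processing that trusts a length field.
// It panics with an index-out-of-range error once i reaches len(req).
func echo(req []byte, claimed int) []byte {
	out := make([]byte, claimed)
	for i := range out {
		out[i] = req[i]
	}
	return out
}

// safely runs one request and converts a panic into an ordinary
// error, so a single bad packet does not take the whole server down.
func safely(req []byte, claimed int) (out []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			out, err = nil, fmt.Errorf("request dropped: %v", r)
		}
	}()
	return echo(req, claimed), nil
}

func main() {
	requests := []struct {
		data    string
		claimed int
	}{
		{"ping", 4},
		{"ping", 4000}, // the would-be packet of death
		{"pong", 4},
	}
	for _, r := range requests {
		out, err := safely([]byte(r.data), r.claimed)
		fmt.Printf("%q %v\n", out, err)
	}
}
```

This only isolates failures that surface as recoverable panics; fatal runtime errors and corrupted shared state are not covered, which is where per-request processes come in.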

Thomas Bushnell, BSG

Jan 10, 2019, 1:39:19 PM

to Jesper Louis Andersen, Eric Raymond, Nigel Tao, golang-nuts

> > * Even if we did this, the bug only turns into a packet of death. A packet of death on this scale is of almost the same level of annoyance and chaos. (Witness this week's firestorm about an email packet of death on some Cisco something or other.)
>
> I did address this. If each request is bounds checked and memory isolated, then a failure is just an exception of some kind and we handle this as we would any other exception. You could also just fork the process for each incoming request and obtain the same semantics.

The server crashes - that's how we handle "any other exception", as a rule.

That's a lot of work to convert "leaking session keys" into "crashes the server".

Especially since you turn "maybe leaks a session key on repeated tries" into "crashes the server immediately with a single packet". Maybe the result actually makes things worse. :)

I don't know what you mean by "just fork the process". First, if you're transpiling into Go, that's not a good strategy. Second, are you suggesting the transpiler would automatically rewrite the request handling loop to avoid the harm of crashes?

Thomas

--

memegen delenda est

robert engels

Jan 10, 2019, 1:39:30 PM

to Jesper Louis Andersen, Thomas Bushnell, BSG, Eric Raymond, Nigel Tao, golang-nuts

It was actually a different but related thread about converting C to Go…. the link I sent was https://www.doc.ic.ac.uk/~phjk/BoundsChecking.html

Jesper Louis Andersen

Jan 10, 2019, 1:50:12 PM

to Thomas Bushnell, BSG, Eric Raymond, Nigel Tao, golang-nuts

On Thu, Jan 10, 2019 at 7:39 PM Thomas Bushnell, BSG <tbus...@google.com> wrote:

> The server crashes - that's how we handle "any other exception", as a rule.

I write Erlang for a living. We don't crash a server, ever, on a failure. Unless the failure is persistent :)

> I don't know what you mean by "just fork the process". First, if you're transpiling into Go, that's not a good strategy. Second, are you suggesting the transpiler would automatically rewrite the request handling loop to avoid the harm of crashes?

I wasn't specifically thinking about Go here. In particular, Go doesn't have the properties I sketched out, so I'm not sure a C-to-Go compiler would solve the problem.

Personally, I think C-to-C translation in the style of CCured is the most likely successful path. But I still like the idea of embedding a C program into another language, thus forming a symbiotic relationship between the two.

--

J.

Thomas Bushnell, BSG

Jan 10, 2019, 3:00:19 PM

to Jesper Louis Andersen, Eric Raymond, Nigel Tao, golang-nuts

On Thu, Jan 10, 2019 at 10:49 AM Jesper Louis Andersen <jesper.lou...@gmail.com> wrote:

> On Thu, Jan 10, 2019 at 7:39 PM Thomas Bushnell, BSG <tbus...@google.com> wrote:
>
> > The server crashes - that's how we handle "any other exception", as a rule.
>
> I write Erlang for a living. We don't crash a server, ever, on a failure. Unless the failure is persistent :)

Sorry, Eric was talking about Go, not just "something". There may be other choices that would solve the problem where Go would not.

I think the paper you linked is exciting, and actually suggests that the hard work which needs to be done will solve the problem without a change of language. This fits my intuition: the things necessary to take advantage of type safety can only be automatically used with the same kind of proof work you need to do to establish the code was fine to begin with.

Thomas

--

memegen delenda est

Jesper Louis Andersen

Jan 10, 2019, 3:20:21 PM

to Thomas Bushnell, BSG, Eric Raymond, Nigel Tao, golang-nuts

On Thu, Jan 10, 2019 at 9:00 PM Thomas Bushnell, BSG <tbus...@google.com> wrote:

> I think the paper you linked is exciting, and actually suggests that the hard work which needs to be done will solve the problem without a change of language. This fits my intuition: the things necessary to take advantage of type safety can only be automatically used with the same kind of proof work you need to do to establish the code was fine to begin with.

It is!

It is predated by another idea which is equally exciting but never ever really tested in the large, namely proof carrying code.

In a PCC scheme, a program is accompanied by a proof of its behavior according to some security policy. For instance that it is memory safe. The proof is sent as an oracle stream: whenever the proof checker is in doubt it consults the rule to apply from the oracle stream. This makes the proof small. A program is checked before it is run. In particular, this avoids the chain of trust by cryptographic signature. If the proof is correct, we can run the program. The onus of constructing the proof is on the writer of the program, or the code generator.

Whole classes of security bugs can be eliminated by updating the security policy.

A simpler solution was one I somewhat haphazardly tried to suggest to Russ Cox on Twitter when he asked about solutions to the whole event-stream problem we saw on NPM of node.js fame. Let software be able to drop certain privileges after setup, in the style of OpenBSD's pledge(2) system call. If a module pledges itself to only ever use limited functionality, and we store this persistently, we solve many potential programming mistakes, and we make life much harder for malicious injection. This is like a poor man's security policy, but it doesn't require the same attention to detail.

Another symbiosis solution I like is the idea that we should take old software and run it, but next to a "contract checker", which is a piece of software governing the potential unsafe software. Only if the checker accepts the input, will it be passed to the potentially unsafe program.

--

J.

Nigel Tao

Jan 11, 2019, 1:46:59 AM

to robert engels, Jesper Louis Andersen, Thomas Bushnell, BSG, Eric Raymond, golang-nuts

On Fri, Jan 11, 2019 at 4:22 AM robert engels <ren...@ix.netcom.com> wrote:
> Again, what is wrong with the bounds checking/memory protection library/technique for C I referred you to? Even a decrease in performance will probably still be on par or better than the equivalent Go program.

Quoting from https://www.doc.ic.ac.uk/~phjk/BoundsChecking.html

"Matrix multiply (ikj, using array subscripting): Execution time:
slowdown of around 30 compared to unoptimised."

It's not clear what the comparison is to optimized code, but a 30x
slowdown is not on par with Go.

Nigel Tao

Jan 11, 2019, 3:51:22 AM

to robert engels, Jesper Louis Andersen, Thomas Bushnell, BSG, Eric Raymond, golang-nuts

On Fri, Jan 11, 2019 at 5:46 PM Nigel Tao <nige...@golang.org> wrote:
> On Fri, Jan 11, 2019 at 4:22 AM robert engels <ren...@ix.netcom.com> wrote:
> > Again, what is wrong with the bounds checking/memory protection library/technique for C I referred you to? Even a decrease in performance will probably still be on par or better than the equivalent Go program.
>
> Quoting from https://www.doc.ic.ac.uk/~phjk/BoundsChecking.html
>
> "Matrix multiply (ikj, using array subscripting): Execution time:
> slowdown of around 30 compared to unoptimised."

That page also links to a "recent re-implementation" MIRO, dated 2007,
whose project report
(https://www.doc.ic.ac.uk/~awl03/projects/miro/MIRO.pdf), section 6.2,
says "gzip took considerably longer to compress when instrumented with
MIRO than when using Mudflap or when uninstrumented: on average about
400 and 2,000 times slower respectively" and for the benchcpplinux
suite, "our bounds-checker is much slower that Mudflap or unchecked
code. This matches the results of the gzip benchmark".

They also say that some of that could theoretically be clawed back by
(complicated) caching, but still, starting from 2000x slower is a long
way back.

Robert Engels

Jan 11, 2019, 9:05:03 AM

to Nigel Tao, Jesper Louis Andersen, Thomas Bushnell, BSG, Eric Raymond, golang-nuts

Yes, but then you just go into those performance-critical routines, hand-code the bounds checks, and remove the automatic checking. Still a lot less work.

Eric S. Raymond

Jan 11, 2019, 12:33:27 PM

to Thomas Bushnell, BSG, Nigel Tao, golang-nuts

Thomas Bushnell, BSG <tbus...@google.com>:

> Suppose it has a way, however. Now you have Go code which will have a
> bounds fault instead of a data leak. That's better, I suppose - the
> resulting bug is now "the server crashes" instead of "the server maybe
> leaks a key". This is an improvement, but a packet-of-death across a widely
> used library puts the world in a not dissimilar position in terms of
> the level of panic and rapid response everybody needs.

The difference is that an overt bug will elicit a fast fix.

Eric S. Raymond

Jan 11, 2019, 12:42:33 PM

to Jesper Louis Andersen, robert engels, Thomas Bushnell, BSG, Nigel Tao, golang-nuts

Jesper Louis Andersen <jesper.lou...@gmail.com>:

> This must have been before I started reading this thread, but I know of the
> CCured project by George Necula et al., which is a C-to-C translator:
>
> https://web.eecs.umich.edu/~weimerw/p/p477-necula.pdf

That actually looks pretty interesting.

I may try testing on NTPsec. If it lives up to its billing, the case
for mass transpilation to Go gets a lot weaker.

It doesn't go to zero - I think the maintainability gains from C to Go
lifts are likely to be significant - but it would remove a lot of
the security/reliability urgency.

Eric S. Raymond

Jan 11, 2019, 12:48:52 PM

to Jesper Louis Andersen, Nigel Tao, golang-nuts

Jesper Louis Andersen <jesper.lou...@gmail.com>:

I think I'm in pretty much the same place you are - the usual bottleneck
will be network stalls. I included disk latency mainly because some network
services will have a database in back of them that's too large to be memory
resident.

Thomas Bushnell, BSG

Jan 11, 2019, 1:08:45 PM

to e...@thyrsus.com, Nigel Tao, golang-nuts

On Fri, Jan 11, 2019 at 9:33 AM Eric S. Raymond <e...@thyrsus.com> wrote:

> Thomas Bushnell, BSG <tbus...@google.com>:
> > Suppose it has a way, however. Now you have Go code which will have a
> > bounds fault instead of a data leak. That's better, I suppose - the
> > resulting bug is now "the server crashes" instead of "the server maybe
> > leaks a key". This is an improvement, but a packet-of-death across a widely
> > used library puts the world in a not dissimilar position in terms of
> > the level of panic and rapid response everybody needs.
>
> The difference is that an overt bug will elicit a fast fix.

Was the Heartbleed fix particularly delayed? It seemed to me to be all-hands-on-deck.

Also, this isn't part of your argument in the past; I would encourage you to make it explicitly, rather than treating it as a matter of "by transpiling we'll eliminate this category of security flaw". If the story is actually "we'll make the bugs more visible and people will panic sooner, resulting in a faster fix", that's a different argument, and I'd encourage making it explicitly instead of implicitly.

Thomas

--

memegen delenda est

Eric S. Raymond

Jan 11, 2019, 1:45:19 PM

to Thomas Bushnell, BSG, Nigel Tao, golang-nuts

Thomas Bushnell, BSG <tbus...@google.com>:

> On Fri, Jan 11, 2019 at 9:33 AM Eric S. Raymond <e...@thyrsus.com> wrote:
>
> > Thomas Bushnell, BSG <tbus...@google.com>:
> > > Suppose it has a way, however. Now you have Go code which will have a
> > > bounds fault instead of a data leak. That's better, I suppose - the
> > > resulting bug is now "the server crashes" instead of "the server maybe
> > > leaks a key". This is an improvement, but a packet-of-death across a
> > > widely used library puts the world in a not dissimilar position in
> > > terms of the level of panic and rapid response everybody needs.
> >
> > The difference is that an overt bug will elicit a fast fix.
> >
>
> Was the Heartbleed fix particularly delayed? It seemed to me to be
> all-hands-on-deck.

No, but *noticing* it was delayed. Always easier to notice a crash bug
than an exploit with subtler consequences.

> Also, this isn't part of your argument in the past; I would encourage you
> to make it explicitly, rather than treating it as a matter of "by
> transpiling we'll eliminate this category of security flaw". If the story
> is actually "we'll make the bugs more visible and people will panic sooner,
> resulting in a faster fix", that's a different argument, and I'd encourage
> making it explicitly instead of implicitly.

Fair enough.

My general claim is that graceful transpilation to Go, if it can be
achieved, will both eliminate significant classes of bugs *and* flush
others into the open. Both seem obvious consequences of (1) GC, (2)
improved type-checking, and (3) runtime bounds-checking.

But maybe CCured is a better answer. I intend to investigate that.

David Chase

Jan 11, 2019, 4:11:02 PM

to golang-nuts

I'm curious how much experience people have with hand-translation of one language into another.

What I find is that for not-too-different languages (e.g., C to Java, or C to Modula-3) I can process about 1000 lines per day.

K&R C to ANSI C goes a good deal more quickly.

C pointers translated into Java become a pair, the array that you are indexing into, and an index.

I used this strategy for gdtoa and it was good enough.

Unstructured and tricky-exit control flow translated into Java was a much more interesting change; for that I used a combination of break out of single iteration for loops, and sometimes booleans to indicate code that should be skipped (that was previously jumped over).

For Go you'd probably replace increment/decrement pointers with slices.
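
Both strategies can be sketched in a few lines of Go. The cursor type and the function names are invented for illustration; the first mirrors the array-plus-index pair described for Java, the second replaces "advance the pointer" with reslicing.

```go
package main

import "fmt"

// cursor is the array-plus-index pair: buf plays the role of the
// array being pointed into, pos the offset of the C pointer.
type cursor struct {
	buf []byte
	pos int
}

// next is the counterpart of *p++: read one byte, then advance.
// The index expression is bounds-checked, so running off the end
// panics instead of reading adjacent memory.
func (c *cursor) next() byte {
	b := c.buf[c.pos]
	c.pos++
	return b
}

// nextSlice is the slice form: the "pointer" is the slice itself and
// incrementing it is p[1:]. An empty slice makes p[0] panic.
func nextSlice(p []byte) (byte, []byte) {
	return p[0], p[1:]
}

func main() {
	data := []byte{0x01, 0x00, 0x04}

	c := cursor{buf: data}
	fmt.Println(c.next(), c.next(), c.pos)

	b, rest := nextSlice(data)
	fmt.Println(b, len(rest))
}
```

The pair form keeps pointer subtraction and decrement easy (they become arithmetic on pos), while the slice form is more compact when the code only ever moves forward.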

C++ would be harder because of templates, but it depends very much on how exciting the old code's use of templates is. The advantage of hand-translation is that you don't have to solve all the inputs that might theoretically arise, instead you need only consider the problem at hand, and if you can hack around the tricky bits in some ugly way, the rest tends to proceed rather smoothly. And obviously (I hope this is obvious), if your goal is to increase security, you will not do it all with unsafe.

I did once also work on an automated source-to-source checker for C and C++, and it turns out that you can get a tremendous amount of mileage out of recognizing idioms and special-casing them.

The one problem with the approach I describe -- dealing with the code that you've got, selecting idiom-specific translations -- is that if it fails, what it means is that you have even more motivation to translate the input code out of C, because it is most likely tricksy, squirrelly, and all the more untrustworthy because of that.

Jason E. Aten

Jan 15, 2019, 1:59:02 AM

to golang-nuts

I came across this C to Go project. If you could revive the LLVM C output you could compile C++ to C with LLVM and then apply it.

https://github.com/elliotchance/c2go

It's rough/incomplete but it might give you something useful.

Eric S. Raymond

Jan 15, 2019, 6:43:55 AM

to Jason E. Aten, golang-nuts

Jason E. Aten <j.e....@gmail.com>:

Have you actually looked at what it outputs?

If not, prepare to be horrified. Maintainability is an issue.

Lucio

Jan 15, 2019, 10:26:48 AM

to golang-nuts

On Tuesday, 15 January 2019 13:43:55 UTC+2, Eric Raymond wrote:

Have you actually looked at what it outputs?

If not, prepare to be horrified. Maintainability is an issue.

Maybe it's a silly thought...

Back in the 1950s compilers needed to be small so as to be at all executable. We now have monstrously large programs that perform the most amazing feats of code translation.

But the language constructs that languages provide seem to still have their roots in those invented when a small memory footprint was essential, so perhaps the time has come to add much more complex idioms as constructs to future programming languages instead of providing lots of libraries.

Such "idioms" are of course no more than fancy "generics", but once we accept that a "construct", rather than a formalised object like a function, can be processed at compile time, it may open the door to recognising common code patterns (which is what optimisation already does) and make transpilers more practical.

Am I outsmarting my own understanding? I'd been dreaming of more complicated language constructs for a very long time, inspired by Tcl, but I've never really paid the idea much attention.

Lucio.

Andy Balholm

Apr 6, 2020, 12:08:30 PM

to golang-nuts

In looking back over some of these old conversations about converting C to Go, I realized that there is some confusion about the different programs named "c2go". There are basically 2:

rsc/c2go is the program that was used to convert the Go runtime, compiler, and linker from C to Go. It is not a general-purpose tool, but it produces Go output that is almost as readable as the original C. (I have a somewhat more general-purpose fork of it at github.com/andybalholm/c2go.)

elliotchance/c2go is a separate project that is intended to be able to translate arbitrary C programs, without needing manual cleanup. But this means that being universal and being automatic are prioritized above giving readable output.

Jan Mercl

Apr 6, 2020, 12:17:48 PM

to Andy Balholm, golang-nuts

On Mon, Apr 6, 2020 at 6:08 PM Andy Balholm <andyb...@gmail.com> wrote:
>
> In looking back over some of these old conversations about converting C to Go, I realized that there is some confusion about the different programs named "c2go". There are basically 2:

Make it 3 please: modernc.org/gocc. Experimental, work in progress
etc., but I'd be grateful if anyone gives it a try and reports back
the failures.

Andy Balholm

Apr 6, 2020, 12:26:17 PM

to Jan Mercl, golang-nuts

Only 2 named c2go, though, which is the specific confusion I was trying to address. (ESR, in particular, seemed to think that elliotchance/c2go was basically the same tool that the Go team had used to translate the compiler and runtime.)

By the way, if you want people to try gocc, a few paragraphs of documentation explaining what it does and how to use it would really help.

Andy
