Crystal and LLVM


virtualma...@gmail.com

Dec 6, 2016, 12:38:40 PM
to Crystal
I am absolutely fascinated with Crystal's use of LLVM. I have been studying /src/compiler/ and piecing together what I can on how Crystal targets LLVM. Between the Brainf*ck compiler example and the codegen sections in the Crystal compiler, I have learned a lot that I was simply struggling to grok from the LLVM Kaleidoscope tutorial. Now I am using Crystal to build my own toy front-end compiler on LLVM to better understand the ecosystem. I must say, the experience is making me appreciate all the hard work that was put into Crystal.

As someone with only basic C++ skills, I find I miss a lot of information in the LLVM documentation. Aside from the LLVM Programmer's Manual and the source code, did you have any helpful references when you were learning the API? The Brainf*ck example in particular was very helpful for seeing how to use the builder API, but I still feel unsure of how the LLVM module system works and how it handles scope in the grand scheme of program execution.

Ary Borenszweig

Dec 7, 2016, 5:40:24 AM
to Crystal
Thank you for the kind words! :-)

Yes, LLVM can be hard to get at first. The main doc we followed was, as you mentioned, the language reference. Since we needed to bind to the C API, that was useful too (there are other files too, that's the main one).

As for LLVM modules, they are independent units of compilation. They can use functions or globals defined in other modules if you declare them with an external declaration. In Crystal we create one module per type, so all methods of, say, String, will be generated in a String module. If another module uses methods from String, it will declare them as external functions. At link time everything will be found. There isn't any other scope than that: local "variables" inside a function (which are always immutable), then functions, which can be internal/private to the module, public, or defined externally, and the same applies to globals.
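For readers more at home in C than in IR, here is a rough sketch of those three cases. It assumes clang's usual mapping, where one C file becomes one LLVM module, `static` becomes `internal` linkage, and a declaration without a body becomes a `declare`. The file name and function names are made up for illustration.

```c
/* linkage.c: under clang, this one file becomes one LLVM module. */
#include <stddef.h>

/* Declaration only: the definition lives in the C library, so this module
   just records that the function exists and the linker finds it later. */
extern size_t strlen(const char *s);

/* `static` means internal linkage: only code in this module can call it. */
static size_t twice(size_t n) {
    return n * 2;
}

/* No `static` means external linkage: other modules may declare and call it. */
size_t doubled_length(const char *s) {
    return twice(strlen(s));
}

/* Globals follow the same rules; this one is private to the module. */
static int call_count = 0;

/* Public function that updates the private global and returns its new value. */
int count_call(void) {
    call_count += 1;
    return call_count;
}
```

Running `clang linkage.c -S -emit-llvm` on it should show a `declare` line for `strlen`, `internal` on `twice` and `call_count`, and plain `define` lines for the two public functions.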

Another great way to check whether you are generating good/optimal code is to take a piece of C/C++ code, run it through clang, and generate LLVM IR:

$ clang file.c -S -emit-llvm # generates file.ll

Since clang is the main project that uses LLVM, if you generate code that more or less looks like what clang generates then you are probably doing things right :-)

Back to LLVM, we also checked the performance tips. Of those, the most important is to generate all alloca instructions in the entry block. Well, at least that's important if you want to generate efficient code.
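A small way to see that tip in practice, sketched under the assumption that unoptimized clang output is a fair model to copy: the variable `sq` below is declared inside the loop body, yet its `alloca` should appear at the top of the function next to the others, not inside the loop block. The file and function names are illustrative only.

```c
/* allocas.c: inspect with `clang allocas.c -S -emit-llvm`. */

/* Returns 1*1 + 2*2 + ... + n*n (0 when n < 1). */
int sum_of_squares(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        int sq = i * i;   /* declared inside the loop body */
        total += sq;
    }
    return total;
}
```

Keeping the stack slots in the entry block is what lets the optimizer's memory-to-register promotion turn them into plain SSA values, which is why a front end that emits `alloca` in the middle of a loop tends to end up with slower code.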


--
Ary Borenszweig         Manas Technology Solutions
[ar.phone]                      5258.5240       #ARY(279)
[us.phone]                      312.612.1050    #ARY(279)
[email]                         aboren...@manas.com.ar
[web]                           www.manas.com.ar

Joshua Mendoza

Dec 7, 2016, 2:42:03 PM
to Crystal
What you're trying to achieve is quite respectable, and Ary's insight is fascinating. Have you considered documenting your learning journey as a series of blog posts or something like that, where we can read about your progress? Unfortunately, I haven't had the time to dive as deep as you have. So, good job! Keep going.

virtualma...@gmail.com

Dec 7, 2016, 3:40:13 PM
to Crystal
Thanks Ary, that was very helpful.

That clarifies quite a bit about how LLVM handles module scope. I happened to find a reference after my post:
https://www.cs.cornell.edu/~asampson/blog/llvm.html
Assuming this information is still correct for the latest version of LLVM, I am getting a much clearer picture here.

I think I need a few more days of woodshedding on this to really feel it out, especially the things I take for granted, like how the main function is invoked.

The -S -emit-llvm flags are very interesting... I ran a Fibonacci example through them and saw some pretty interesting code. For the time being I assume I am safe to ignore all the metadata and just focus on the core instructions. It already taught me some interesting concepts like nsw, nuw, and align. Am I also safe to assume that if I make use of the LLVM builder API, it will automatically build efficient and safe instructions for me, or at least allow me to run optimizations on it to get to a similar stage as the clang-emitted IR?
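A minimal sketch for poking at the nsw question, assuming a Fibonacci along these lines (the function names are made up for this note). With clang, the signed version would normally show `add nsw`, since signed overflow is undefined in C, while the unsigned version shows a plain `add`, since unsigned arithmetic wraps by definition.

```c
/* fib.c: compare the two loops in `clang fib.c -S -emit-llvm` output. */

/* Iterative Fibonacci on signed ints: fib_signed(10) is 55. */
int fib_signed(int n) {
    int a = 0, b = 1;
    for (int i = 0; i < n; i++) {
        int next = a + b;   /* signed add: overflow is undefined in C */
        a = b;
        b = next;
    }
    return a;
}

/* Same loop on unsigned ints, where overflow wraps around. */
unsigned fib_unsigned(unsigned n) {
    unsigned a = 0, b = 1;
    for (unsigned i = 0; i < n; i++) {
        unsigned next = a + b;   /* unsigned add: wraps by definition */
        a = b;
        b = next;
    }
    return a;
}
```

The flag reflects what the source language promises, so it appears to be something the front end has to ask for (the C API has `LLVMBuildNSWAdd` next to `LLVMBuildAdd`, for instance) and not something the builder infers. Adding `-O2` to the clang command shows the optimized form for comparison.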


virtualma...@gmail.com

Dec 7, 2016, 4:10:52 PM
to Crystal

@Joshua
Up until this point I had not; however, you are making me consider it. I still have a long way to go even for just a toy compiler, but it would be very helpful to me, and likely to others, to document the stages as I go. As someone who is completely self-taught with no formal education in computer science, I often feel like trying to learn how to make a toy language compiler will at least earn me a PhD in google-fu. Some days I wonder if there are programmers out there who are specifically trying to make their APIs as hard to understand as possible for someone like me. But then my curiosity gets the better of me and I try again.

virtualma...@gmail.com

Dec 14, 2016, 11:59:45 PM
to Crystal
Your information, along with some long study, is really paying off. I just want to thank you again for all your amazing work. Studying Crystal's compiler has been a never-ending source of enlightening moments for me lately! Cheers!

