Regarding proposal of an idea for Elixir (GSoC 2018)


Anshuman Chhabra

Jan 20, 2018, 10:41:50 AM
to BEAM Community
Respected Mentors,

I realize that this is quite early for GSoC, so I apologize for the inconvenience. I am interested in contributing to Elixir and have a rather ambitious project idea that I would like to discuss with the Elixir mentors. I am currently a final-year electrical engineering student from India. I have also attached my résumé to highlight relevant experience.

Thank you for your time,
Warm Regards,
Anshuman Chhabra
CV-Resume.pdf

José Valim

Jan 20, 2018, 11:51:23 AM
to Anshuman Chhabra, BEAM Community
Hi Anshuman,

No need to apologize, we are happy to hear you are interested in contributing to Elixir.

Here is a perfect place to have a discussion about your idea, so please go ahead. :)



José Valim
Founder and Director of R&D

Anshuman Chhabra

Jan 21, 2018, 12:09:35 AM
to BEAM Community
Hello!

Thank you for the prompt reply :)

Currently, there is a lack of machine learning tools and frameworks for Elixir. In my opinion, with the number of programmers learning/using machine learning only set to grow, supporting machine learning capabilities is essential for any programming language. I have also seen discussions on elixirforum.com regarding this and a talk on the same given at ElixirConf EU last year.

My idea is as follows: I would like to work on building a framework for Elixir similar to Keras in Python (https://keras.io/). Keras uses Tensorflow as a backend for doing all the ML. Thus, by writing Native Implemented Functions (NIFs) around the Tensorflow C API (https://www.tensorflow.org/install/install_c), we could seek to provide the same functionality in Elixir. However, I am not proposing to build an entire framework over the summer, only to implement certain key functionality that would allow developers to build Deep Learning architectures (such as MLPs, CNNs, RNNs, etc.).

I am unsure whether this is something that is required by the community as a whole. Also, using NIFs would be tricky: a crash inside a NIF takes down the whole Erlang VM, which undermines the fault tolerance the VM boasts of, and some of the functions (training, for example) would be long-running and would have to be handled differently.

Looking forward to hearing from you!

Warm Regards,
Anshuman

José Valim

Jan 21, 2018, 4:23:36 AM
to Anshuman Chhabra, BEAM Community
Thank you Anshuman,

This is a great idea that we would love to pursue further. So a proposal is definitely welcome!

Regarding NIFs, since Erlang 19 there is support for a special class of NIFs called dirty NIFs, which can run for as long as they need. Here is an example project that uses them: https://github.com/antipax/nifsy - so this should simplify things a bit on the C front.

If you have more questions, please let us know!






Anshuman Chhabra

Jan 21, 2018, 1:05:21 PM
to BEAM Community
Thank you for the help and support! I really appreciate it :)

I have looked at dirty NIFs through the repository you had linked as well as the Erlang docs, and they would definitely make the task easier. As a precursor to GSoC student applications opening, I will start working on a POC with some very basic functionalities working for Tensorflow in Elixir (such as outputting the TF version, running the Tensorflow Hello world example, creating a sample graph, etc.). If there are other requirements that you have in mind, please let me know.

If I have any questions regarding implementation, I shall ask here. Thank you again!

Warm Regards,
Anshuman. 

Anshuman Chhabra

Jan 25, 2018, 4:18:50 AM
to BEAM Community
Hello José!

I had a quick question regarding the project. I had mistakenly assumed that the Tensorflow C++ core, which contains all the functionality, was being publicly exposed as the C API I had mentioned earlier. However, this is not the case: the C API is still under development and currently only supports Inference. This means that while it allows us to make predictions based on models previously trained in Python, we would not be able to train models from scratch. Moreover, the official Tensorflow bindings released by Google for other languages, such as Go, also support Inference only. Google has stated in its documentation that models should be trained in Python and can then be executed in Go apps (source). Also, the C++ core is not publicly exposed and is compiled using Bazel every time it runs, so writing bindings for that is not possible.

In my opinion, enabling Inference by writing bindings for the C API would still be a challenging project. Also Google will keep adding code to the C API to enable Training support as well. Thus, once the Elixir project exists, this functionality could be added to it too, in time.

I wanted to know your thoughts and opinions regarding this. Would this be an acceptable GSoC project? If not, I could look at other frameworks, but Tensorflow has the fastest growing API support and would be the best choice for the long term. Also, this might be a bit of a generalization, but for Elixir, most ML requirements would be for Inference, for example serving predictions on the web. I feel this is the major reason why Google did not include Training support for Golang.

Sorry for the long post!

Thanking you,
Anshuman. 

José Valim

Jan 25, 2018, 2:04:46 PM
to Anshuman Chhabra, BEAM Community
Thanks for the updates Anshuman!

I actually quite like that, since it gives us a more focused project and a shorter road to travel to achieve parity with other languages.

So yes, it would still make it a good GSoC project. If by any chance you finish the project much sooner, we can allocate more time for producing documentation and relevant material.



Anshuman Chhabra

Feb 3, 2018, 12:02:15 PM
to BEAM Community
Hello again José!

I have been writing code (the NIFs) to achieve the Hello World equivalent for Tensorflow in Elixir. The Python code for this Hello World program would look something like this - 
import tensorflow as tf

with tf.Session() as sess:
    print(sess.run(tf.constant("Hello World!")))

The good news is that I have been able to achieve this in Elixir using the NIFs. So creation of new graphs, string constant tensors, operations and sessions to run this specific scenario work. I have created a repository with a reasonably detailed README so that you can look at the code for doing so: https://github.com/anshuman23/TensorflEx
(The code for the Elixir equivalent of the above Python program is here https://github.com/anshuman23/TensorflEx#how-to-run-these)

This is still just the tip of the iceberg: the session run only supports string constant tensors, and a real use case for Tensorflow will never involve just passing a string tensor and getting the output. It will require pre-trained graphs to be loaded and predictions to be obtained in the session for the provided inputs. The work so far was done only as a prerequisite, to show that I can follow through on this project during the summer.

I would really like to know your thoughts on the work. Also, should I continue working on more functionality or leave this for the time being? What would you recommend I do next till applications open? 

I'm sorry for springing such frequent updates on you! I really appreciate you taking the time to help.

Best Regards,
Anshuman

José Valim

Feb 9, 2018, 4:56:45 AM
to Anshuman Chhabra, BEAM Community
Hi Anshuman,

For application purposes, you have already made good progress! We are happy about that and about the frequent updates. Communication is key.

Although it is not necessary for the application itself, if you want to continue working on this, I would recommend the following steps:

  1. Structure your project like a proper Mix project. You can start a new one by calling "mix new tensorflex" in the command line. You will put the Elixir code in lib, the C code in c_src and use "mix compile" to compile everything. To invoke the makefile, you can use elixir_make: https://github.com/elixir-lang/elixir_make

  2. Remove the "tf_" prefix from functions, as it is not common in Elixir, and write some basic documentation

The steps above will be very important for creating a project that is ready to be run by others.
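
As a rough illustration of step 1, a minimal mix.exs that wires in elixir_make could look like the sketch below. The app name follows the "mix new tensorflex" command above; the version numbers, and the assumption that a Makefile in the project root compiles the sources in c_src into priv, are illustrative and not taken from the repository.

```elixir
# mix.exs: a sketch only. Version numbers below are assumptions.
defmodule Tensorflex.MixProject do
  use Mix.Project

  def project do
    [
      app: :tensorflex,
      version: "0.1.0",
      elixir: "~> 1.6",
      # Run elixir_make (which invokes `make`) before the regular compilers,
      # so that "mix compile" builds the C code as well.
      compilers: [:elixir_make] ++ Mix.compilers(),
      deps: deps()
    ]
  end

  def application do
    [extra_applications: [:logger]]
  end

  defp deps do
    # Only needed at build time, hence runtime: false.
    [{:elixir_make, "~> 0.4", runtime: false}]
  end
end
```

With this in place, "mix deps.get" followed by "mix compile" should build both the Elixir and the C side, provided the Makefile exists.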





Anshuman Chhabra

Feb 14, 2018, 3:04:22 PM
to BEAM Community
Hello José!

Thank you for the detailed feedback. I apologize for the delayed reply from my side, but I have been on a vacation trip for the past week with limited internet connectivity.

I will make the changes as per your recommendations and update the code in the repository along with more documentation. Please let me know if there is anything else that you would like me to do regarding this.

Thank you again for all the help,
Warm Regards,
Anshuman.

Anshuman Chhabra

Feb 17, 2018, 2:32:31 PM
to BEAM Community
Hi José,

I've managed to make the changes you had recommended. I have structured the code as a mix project, added documentation and also removed the tf prefix before function names. You can look at the repository for the updates: https://github.com/anshuman23/tensorflex

Please let me know if I can make any further improvements. I also wanted to ask what would be a good time to start working on and subsequently share a first draft of my proposal with you.

Thank you! :)

 

José Valim

Feb 17, 2018, 2:58:23 PM
to Anshuman Chhabra, BEAM Community
Great job Anshuman!

Right now I don't think we need more improvements. And feel free to share a first draft of the proposal at any time!

If you have some free time, it may also be a good idea to study the APIs of other Tensorflow libraries that use the C bindings. Apparently the Tensorflow page has some guidelines for them, and describing your plan for implementing those would certainly enrich your proposal.

Have a good one!





Anshuman Chhabra

Feb 18, 2018, 2:05:34 AM
to BEAM Community
Thank you José!

I will start working on my proposal then. I have seen the webpage which talks of building Tensorflow bindings for other languages and I will be sure to include my solutions for the guidelines in the proposal as well.

Warm Regards,
Anshuman.

Anshuman Chhabra

Mar 4, 2018, 1:19:46 PM
to BEAM Community
Hello José!

I apologize for such a delayed reply, but I had some project deadlines at my university and couldn't find the time to complete the proposal.

I have finally been able to write up a first draft and would like to know your opinion on the contents. If there are any changes I should make, please let me know. The Google doc link is: https://docs.google.com/document/d/1e9eTrA5XKmWNLjf_vMQHDqBi49BNDYNf7tnyizvJ4IA/edit?usp=sharing

I have attached a PDF version as well. Thank you for helping out. Have a nice day!

Warm Regards,
Anshuman.
GSoC Proposal 2018.pdf

José Valim

Mar 5, 2018, 4:57:50 AM
to Anshuman Chhabra, BEAM Community
Thank you Anshuman, that's a great proposal!

I have just one question regarding the APIs. Do you plan to have:

1. C API -> Low-level Elixir API -> High-level Elixir API

2. C API -> Low-level Elixir API AND C API -> High-level Elixir API

To be more precise, do you plan to provide a low-level Elixir API that wraps the C functions and then build the high-level API on top of the low-level Elixir one? Or do you think both low- and high-level Elixir APIs will be implemented directly on top of the C ones?





Anshuman Chhabra

Mar 5, 2018, 9:01:29 AM
to BEAM Community
Thank you for the feedback José. To clarify, what I had in mind was something like this:

Tensorflow C API -> Low-level and High-Level Elixir API functions

So the Elixir API would have all the functions. I was thinking the demarcation between low-level and high-level functions would be made apparent in the examples I would list in the documentation. Moreover, having all the functions together would make it easier for users who wish to combine both low-level and high-level functions. For example, one could potentially construct their own graphs from scratch by adding operations and inputs (using low-level functions) but then use the high-level create_and_run_sess function to run the session (This is somewhat similar to what I have done in the POC, albeit at a very very low complexity scale), instead of going through the pains of setting up (and running) the session by themselves.

Hopefully this makes sense. Do you agree with this design choice?

Thanking you,
Regards,
Anshuman 

José Valim

Mar 5, 2018, 9:41:53 AM
to BEAM Community
Ideally, the best would be if you fully implement a low-level API around the C API. And then the higher-level API is built on top of the low-level Elixir API. This would allow us to be the most productive too. If we build the wrong high-level abstraction, we can rewrite it without touching the C code.

However, this may not be desired in practice for different reasons. One of those reasons is performance as every workflow will require going and coming back from the C API multiple times. Every time we do that, we need to convert Erlang "objects" to C, allocate NIF resources and so on.

Therefore my suggestion would be:

1. Build a low-level Elixir API around Tensorflow that is mostly a wrapper around the C one.
2. Build the high-level Elixir API on top of the low-level Elixir API
3. Implement high-level functionality in C only if it is deemed necessary for performance or complexity reasons

I think it is important to have a process around this. Otherwise we can easily end up with a dozen thousand lines of C code to maintain when many of those could have been written in Elixir.
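
The layering suggested above can be sketched roughly as follows. All module and function names here are hypothetical, and the low-level module is a pure-Elixir stand-in for what would really be thin NIF wrappers, so that the snippet runs on its own without Tensorflow or any C code.

```elixir
# Hypothetical names throughout. Sketch.LowLevel stands in for thin NIF
# wrappers; in a real project each function would call into the C API.
defmodule Sketch.LowLevel do
  # Each function mirrors a single low-level step (new graph, add op, ...).
  def new_graph, do: %{ops: []}

  def add_string_constant(graph, name, value) do
    %{graph | ops: [{name, value} | graph.ops]}
  end

  def new_session(graph), do: %{graph: graph}

  def run(session, name) do
    case List.keyfind(session.graph.ops, name, 0) do
      {^name, value} -> {:ok, value}
      nil -> {:error, :unknown_operation}
    end
  end
end

defmodule Sketch do
  alias Sketch.LowLevel

  # High-level helper composed only from low-level calls, so it can be
  # redesigned or rewritten without touching any C code.
  def run_constant(value) do
    LowLevel.new_graph()
    |> LowLevel.add_string_constant("const", value)
    |> LowLevel.new_session()
    |> LowLevel.run("const")
  end
end

IO.inspect(Sketch.run_constant("Hello World!"))
# prints {:ok, "Hello World!"}
```

If a helper like run_constant turned out to be too slow because of the repeated round trips, only that one function would need a dedicated C implementation (step 3), while the rest of the high-level API stays in Elixir.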

What do you think?




--
You received this message because you are subscribed to the Google Groups "BEAM Community" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/beam-community/599cf521-22ab-4f11-8c2e-91fe2c6fe8bb%40googlegroups.com.

Anshuman Chhabra

Mar 5, 2018, 10:28:24 AM
to BEAM Community
I understand your point of view and I completely agree. It would make more sense for maintenance purposes to first create the low level Elixir wrapper and then create high-level Elixir abstractions on top of it. Also, while there might be a slight dip in performance by doing this, I do not think we would need to worry about that too much. If by doing this, a high-level function is performing a task unreasonably slowly, we could always write the function using C code instead. 

This also makes a lot of sense as it would speed up the initial part of the project. Writing the low-level Elixir API would be much faster than thinking about how and what high-level functionalities to include in C code.

I think I will clarify this approach in the proposal by adding a few lines. Do you think any other changes should be made to it?

Also, is there anything else you have in mind regarding the project that I should work on currently?

Thank you :) 

José Valim

Mar 7, 2018, 2:18:59 PM
to BEAM Community
I have thought about your questions and I don't have any other changes in mind. :)

About the project, nothing else is necessary for the proposal. But if you would like to try out different things or continue with development, we will be glad to help.




Anshuman Chhabra

Mar 9, 2018, 1:19:57 PM
to BEAM Community
Thanks José! I do have some ideas I would like to try out. If I run into any problems or need advice, I'll be sure to ask.

Warm Regards,
Anshuman

Anshuman Chhabra

Mar 14, 2018, 2:21:11 PM
to BEAM Community
Hello José,

I have made the small changes to the proposal that I mentioned previously. I have completed all the formalities on the GSoC portal and shared/submitted the proposal there as well. :)

I have been thinking about working on development in the meantime, but some project deadlines till March 19th have not been giving me enough time. So I will continue coding after they have passed.

I also wanted to ask a rather trivial question. Would using C++ as the backend for the NIFs be just as problem free as it is with C? I remember that during some of my early experiments with NIFs, the erl_nif functions would run into problems with g++ during compile time. This question doesn't have a basis in the current project as I am certain using C will be sufficient. However, I was curious to know if C++ (leveraging the Tensorflow C API) could be used at all. 

Thanking you,
Anshuman

José Valim

Mar 14, 2018, 3:07:06 PM
to BEAM Community
I am not familiar with any example that uses NIFs with C++ so I am not sure.

I would guess that it is possible but it would require some extra work.

And good luck with your submission!




Anshuman Chhabra

Mar 15, 2018, 2:39:08 PM
to BEAM Community
Thank you! :)

I will give an update of what I am working on and some of my ideas, when I start work on them next week.

Warm Regards,
Anshuman

Ryan B. Harvey

May 12, 2018, 2:05:44 AM
to BEAM Community
Anshuman,

I'm curious if this project is happening. I see that there have not been any updates here since mid-March, and the code in the repo linked above has not been updated since late February.

I'm quite interested in the outcome of this project, and would like to be able to follow progress. If work is happening, can you let me know how best to follow it?

Thanks,
Ryan

Anshuman Chhabra

May 12, 2018, 1:01:45 PM
to beam-co...@googlegroups.com
Hello Ryan!

Yes it did get accepted for the summer. :)

I will be starting the coding phase soon and will be writing blog posts with updates about the work. I will most probably be adding blog entries here: http://anshumanc.ml and updating the same GitHub repository with the code. I will be posting on the mailing list as well from time to time.

Warm Regards,
Anshuman.
