Call for Discussion: New Project: Babylon

Paul Sandoz

Sep 6, 2023, 12:43:12 PM

to dis...@openjdk.java.net

I hereby invite discussion of a new Project, Babylon, whose primary goal
will be to extend the reach of Java to foreign programming models such as
SQL, differentiable programming, machine learning models, and GPUs.

Focusing on the last example, suppose a Java developer wants to write a GPU
kernel in Java and execute it on a GPU. The developer’s Java code must,
somehow, be analyzed and transformed into an executable GPU kernel. A Java
library could do that, but it requires access to the Java code in symbolic
form. Such access is, however, currently limited to the use of non-standard
APIs or to conventions at different points in the program’s life cycle
(compile time or run time), and the symbolic forms available (abstract
syntax trees or bytecodes) are often ill-suited to analysis and transformation.

Babylon will extend Java's reach to foreign programming models with an
enhancement to reflective programming in Java, called code reflection. This
will enable standard access, analysis, and transformation of Java code in a
suitable form. Support for a foreign programming model can then be more
easily implemented as a Java library.

Babylon will ensure that code reflection is fit for purpose by creating a
GPU programming model for Java that leverages code reflection and is
implemented as a Java library. To reduce the risk of bias we will also
explore, or encourage the exploration of, other programming models such as
SQL and differentiable programming, though we may do so less thoroughly.

Code reflection consists of three parts:

1) The modeling of Java programs as code models, suitable for access,
analysis, and transformation.
2) Enhancements to Java reflection, enabling access to code models at compile
time and run time.
3) APIs to build, analyze, and transform code models.
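
As a rough illustration of the three parts above, here is a minimal sketch. It is not the code reflection API: the types `Expr`, `Const`, `Param`, `Add`, and `Mul` and the methods `countOps`, `fold`, and `toSource` are hypothetical names invented for this example. It assumes Java 17 or later and models only integer expressions, to show what building, analyzing, and transforming a symbolic form of code can look like when done by an ordinary library.

```java
public class CodeModelSketch {
    // Hypothetical, minimal "code model": an immutable tree for int expressions.
    sealed interface Expr permits Const, Param, Add, Mul {}
    record Const(int value) implements Expr {}
    record Param(String name) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}

    // Analysis: count the arithmetic operations in the model.
    static int countOps(Expr e) {
        if (e instanceof Add a) return 1 + countOps(a.left()) + countOps(a.right());
        if (e instanceof Mul m) return 1 + countOps(m.left()) + countOps(m.right());
        return 0;
    }

    // Transformation: fold operations whose operands are both constants.
    static Expr fold(Expr e) {
        if (e instanceof Add a) {
            Expr l = fold(a.left());
            Expr r = fold(a.right());
            if (l instanceof Const x && r instanceof Const y) {
                return new Const(x.value() + y.value());
            }
            return new Add(l, r);
        }
        if (e instanceof Mul m) {
            Expr l = fold(m.left());
            Expr r = fold(m.right());
            if (l instanceof Const x && r instanceof Const y) {
                return new Const(x.value() * y.value());
            }
            return new Mul(l, r);
        }
        return e;
    }

    // Translation: render the model as C-like source text for a foreign target.
    static String toSource(Expr e) {
        if (e instanceof Const c) return String.valueOf(c.value());
        if (e instanceof Param p) return p.name();
        if (e instanceof Add a) return "(" + toSource(a.left()) + " + " + toSource(a.right()) + ")";
        if (e instanceof Mul m) return "(" + toSource(m.left()) + " * " + toSource(m.right()) + ")";
        throw new IllegalArgumentException("unknown node: " + e);
    }

    public static void main(String[] args) {
        // Build: the model of the expression x * (2 + 3).
        Expr model = new Mul(new Param("x"), new Add(new Const(2), new Const(3)));
        System.out.println(countOps(model));    // prints 2
        Expr folded = fold(model);
        System.out.println(countOps(folded));   // prints 1
        System.out.println(toSource(folded));   // prints (x * 5)
    }
}
```

In this sketch the model is built by hand; the proposal's second part is about obtaining such models from ordinary Java code through reflection instead.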

For further details please see the JVM Language Summit 2023 presentations
entitled "Code Reflection" [1] and "Java and GPU … are we nearly there yet?"
[2].

I propose to lead this Project with an initial set of Reviewers that
includes, but is not limited to, Maurizio Cimadamore, Gary Frost, and
Sandhya Viswanathan.

For code reflection this Project will start with a clone of the current JDK
main-line release, JDK 22, and track main-line releases going forward.
For the GPU programming model this Project will create a separate repository,
that is dependent on code reflection features as they are developed.

We expect to deliver Babylon over time, in a series of JEPs that will likely
span multiple feature releases.
We do not currently plan to deliver the GPU programming model into the JDK.
However, work on that model could identify JDK features and enhancements of
general utility which could be addressed in future work.

Comments?

Paul.

[1] https://cr.openjdk.org/~psandoz/conferences/2023-JVMLS/Code-Reflection-JVMLS-23-08-07.pdf
https://youtu.be/xbk9_6XA_IY

[2] https://youtu.be/lbKBu3lTftc

David Alayachew

Sep 6, 2023, 3:15:54 PM

to Paul Sandoz, dis...@openjdk.java.net

Hello Paul,

Thanks for posting this and the attached videos! I'm very excited for the possibilities that this project Babylon will bring about, but I do have a handful of initial questions.

1 - Can you help differentiate the goals between Babylon and Panama's FFI/M? I don't understand either project very well, and (from my ignorant perspective) there seems to be a lot of overlap. What separates them?

2 - I read a little about Project Sumatra, and other attempts for Java to interact with the GPU. While this project has a bigger end goal, will it be Java's current, main attempt to tackle GPU in the programming language?

    2.1 - If yes, what lessons have we learned from the past projects? Clearly, this is coming from a wildly different direction, but hearing about some motivations might help us understand the WHY along with the WHAT.

3 - Just flipping through the slides and having watched the videos, that is an intimidating amount of complexity. It intimidated me at least. Who is the target audience for this project, and what steps are needed to get them equipped to use these tools effectively? To give an example of the second question, String templates are clearly a needed and valuable feature, but it has become clear that the use cases and intended semantics for it aren't exactly clear to all Java developers. After extended discussion on the mailing lists and various social media threads, there's talk about a user guide for String templates being created. A similar thing was done for Text Blocks, if I'm not mistaken. I guess a better way of restating question 2 is, do you foresee difficulties in getting developers to see the use cases and intended semantics of the code reflection you spoke of? And if so, what steps do you plan to take to "nip it in the bud", so to speak?

4 - Why Babylon?

Thank you for your time and help!

David Alayachew

Paul Sandoz

Sep 6, 2023, 7:35:02 PM

to David Alayachew, dis...@openjdk.java.net

Hi David,

On Sep 6, 2023, at 12:15 PM, David Alayachew <davidal...@gmail.com> wrote:

Hello Paul,

Thanks for posting this and the attached videos! I'm very excited for the possibilities that this project Babylon will bring about, but I do have a handful of initial questions.

1 - Can you help differentiate the goals between Babylon and Panama's FFI/M? I don't understand either project very well, and (from my ignorant perspective) there seems to be a lot of overlap. What separates them?

They are complementary; arguably the two together are greater than the sum of the parts. Panama focuses on interconnecting to native (or foreign) code, whereas Babylon will focus on reaching out to foreign programming models. The GPU programming model will bring the two together, as presented by Gary at JVMLS.

2 - I read a little about Project Sumatra, and other attempts for Java to interact with the GPU. While this project has a bigger end goal, will it be Java's current, main attempt to tackle GPU in the programming language?

It will be an attempt that 1) will ensure code reflection is fit for purpose; and 2) will inform how to design a good Java library that supports GPU programming. It’s too early to say where that will lead, but I do hope interested members of the OpenJDK community will collaborate on the design.

2.1 - If yes, what lessons have we learned from the past projects? Clearly, this is coming from a wildly different direction, but hearing about some motivations might help us understand the WHY along with the WHAT.

Others are more qualified to talk on the specifics of Project Sumatra than I, but I can give you my take. Sumatra was very ambitious and modified the JVM to support the compilation of bytecode to GPU code (as kernels), including support for garbage collection. That’s a lot of complexity, especially the complexity required to preserve Java program meaning and to modify the specification of Java program meaning for this case. We can learn from this and simplify the problem by providing a key building block, code reflection, such that an ordinary Java program (a library) can transform Java programs to GPU programs without having to preserve Java program meaning. The Java library is thereby free to evolve independently of the Java platform.

3 - Just flipping through the slides and having watched the videos, that is an intimidating amount of complexity. It intimidated me at least. Who is the target audience for this project, and what steps are needed to get them equipped to use these tools effectively? To give an example of the second question, String templates are clearly a needed and valuable feature, but it has become clear that the use cases and intended semantics for it aren't exactly clear to all Java developers. After extended discussion on the mailing lists and various social media threads, there's talk about a user guide for String templates being created. A similar thing was done for Text Blocks, if I'm not mistaken. I guess a better way of restating question 2 is, do you foresee difficulties in getting developers to see the use cases and intended semantics of the code reflection you spoke of? And if so, what steps do you plan to take to "nip it in the bud", so to speak?

I do foresee difficulties getting the concepts across, esp. the concept of Java code that is not directly executable and represents some intent. Examples, prior art, and positioning documents will help but they will take time to produce. It’s early days so we have time and will adjust as we learn.

The target audience is library developers, enabling them to design novel APIs in combination with natural language constructs, which is a new form of API design. However, we should strive to design code reflection so it is reasonably accessible to a broad set of Java developers, i.e., it has to have the “feel” of Java. My hope is we see some surprising and creative uses.

4 - Why Babylon?

Naming is hard!

The ancient city was a melting pot of learning and commerce, different cultures peacefully and productively interacting. In the context of Java “different cultures” ~= “foreign programming models”.

Also, from Wikipedia:

"

Babylon 5 – A science fiction series set on a futuristic space station that acts as a trading and diplomatic nexus between many different cultures. ...

"

Paul.

Ethan McCue

Sep 12, 2023, 3:31:40 PM

to Paul Sandoz, dis...@openjdk.java.net

Can you elaborate more on prior work / the state of affairs in other language ecosystems? In the talk you reference Python "eating Java's lunch" - do they have a comparable set of features or some mechanism that serves the same goal (write code in Python, derive GPU kernel/autodiffed/etc. code)?

Paul Sandoz

Sep 12, 2023, 7:38:10 PM

to Ethan McCue, dis...@openjdk.java.net

Hi Ethan,

Current/prior work includes Mojo, MLIR, C# LINQ, Julia [1], Swift for TensorFlow [2], Haskell [3].

In the context of lunch and Python what I had in mind is machine learning and all those frameworks, and I was also thinking about introspection of Python code which IIUC is what TorchDynamo [4] does.

Paul.

[1] https://arxiv.org/abs/1712.03112

[2] https://llvm.org/devmtg/2018-10/slides/Hong-Lattner-SwiftForTensorFlowGraphProgramExtraction.pdf

[3] http://conal.net/papers/essence-of-ad/essence-of-ad-icfp.pdf

[4] https://pytorch.org/docs/stable/dynamo/index.html

Juan Fumero

Sep 13, 2023, 1:04:42 PM

to dis...@openjdk.org

Hi Paul,
I think this is a great initiative and much needed in the Java world. I have a few questions.

1)

> Babylon will ensure that code reflection is fit for purpose by creating a GPU programming model for Java that leverages code reflection and is implemented as a Java library.

Does this mean that one of the goals of the project is to define how GPUs should be programmed using the Code Reflection API, or for Java in general? Is Babylon limited to GPUs? Are you also considering other types of accelerators (e.g., AI accelerators, RISC-V accelerators, etc.)?

We have other programming models such as TornadoVM [1], which can be programmed using different styles (e.g., loop parallel programs and kernel APIs). How will the new model(s) accommodate existing solutions? Is this to be defined?

2)

> We do not currently plan to deliver the GPU programming model into the JDK. However, work on that model could identify JDK features and enhancements of general utility which could be addressed in future work.

Does this mean that the GPU programming model will be only used as a motivation to develop the Code Reflection APIs for different use cases?

3) Is there any intent to support JVM languages with these models (e.g., R, Scala, etc), or will it be specific for the Java language?

4) I believe we also need new types. As we discussed in JVMLS this year, we will also need NDArray and Tensor types, Vector types and Panama-based types for AI and Heterogeneous Computing. This is aligned with Gary's talk at JVMLS [2] in which he proposed the HAT initiative (Heterogeneous Accelerator Toolkit) and Panama-based types. Will this also be part of the Babylon project?

[1] https://tornadovm.readthedocs.io/en/latest/programming.html#core-programming

[2] https://www.youtube.com/watch?v=lbKBu3lTftc

Thanks
Juan

Paul Sandoz

Sep 13, 2023, 6:32:01 PM

to Juan Fumero, dis...@openjdk.org

Hi Juan,

On Sep 13, 2023, at 10:03 AM, Juan Fumero <juan....@paravox.ai> wrote:

Hi Paul,
I think this is a great initiative and much needed in the Java world. I have a few questions.

1)
> Babylon will ensure that code reflection is fit for purpose by creating a GPU programming model for Java that leverages code reflection and is implemented as a Java library.

Does this mean that one of the goals of the project is to define how GPUs should be programmed using the Code Reflection API, or for Java in general?

The intent is a general approach that depends on the support of code reflection (and Panama FFM).

I think it is up to us, as members of the OpenJDK community, to determine where we head with regards to the GPU programming model, any concrete artifacts that could be produced, and where the dividing lines may be between APIs, implementations, and vendors. Gary can speak more to this than I.

Is Babylon limited to GPUs? Are you also considering other types of accelerators (e.g., AI accelerators, RISC-V accelerators, etc.)?

In principle it's not limited. As you have shown with TornadoVM the same programming model for GPUs can apply to other forms of hardware that are highly parallel processors, like FPGAs where a program is “printed out” (?) or uniquely arranged in some malleable hardware. In this case, assuming the programming model is applicable, it seems predominantly an area of implementation focus someone could choose to take on in their own implementations.

I think the more specialized the hardware the more limited the programming. So in some cases a parallel programming model may not apply, like with hardware that specializes only in multiplying tensors, which in effect reduces to some form of library calls.

We have other programming models such as TornadoVM [1], which can be programmed using different styles (e.g., loop parallel programs and kernel APIs). How will the new model(s) accommodate existing solutions? Is this to be defined?

Again, Gary can speak more to this, but I suspect the design will focus predominantly on a range-based kernel model (similar to TornadoVM’s kernel API). But in principle I imagine it may be possible to plug in different kernel models (or copy parts of the design) where code reflection could be applied with different and more sophisticated approaches to program analysis and compilation, such as for a loop-based kernel model.
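
For readers unfamiliar with the term, a range-based kernel model can be sketched as follows. This is a CPU-only stand-in under stated assumptions, not the HAT or TornadoVM API: `KernelContext`, `Kernel`, and `dispatch` are hypothetical names invented for this example. The kernel body is written once for a single index, and a dispatcher runs it for every index in the range, with no ordering guaranteed between indices.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class RangeKernelSketch {
    // Hypothetical: what a kernel sees for one work item.
    record KernelContext(int id, int range) {}

    // Hypothetical: a kernel is code executed once per index in a range.
    @FunctionalInterface
    interface Kernel {
        void run(KernelContext kc);
    }

    // CPU stand-in for a device dispatch: run the kernel for every index
    // in [0, range). A parallel stream is used to mimic the absence of
    // ordering between work items.
    static void dispatch(int range, Kernel kernel) {
        IntStream.range(0, range)
                 .parallel()
                 .forEach(i -> kernel.run(new KernelContext(i, range)));
    }

    // Example kernel: each work item squares exactly one element, so
    // work items never write to the same location.
    static int[] square(int[] in) {
        int[] out = new int[in.length];
        dispatch(in.length, kc -> out[kc.id()] = in[kc.id()] * in[kc.id()]);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(square(new int[] {1, 2, 3, 4})));
        // prints [1, 4, 9, 16]
    }
}
```

In a loop-based model, by contrast, the developer writes an ordinary loop and the compiler has to work out which iterations can run in parallel, which is why that style calls for more sophisticated program analysis.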

Two key areas of focus I see are:

1) the extraction of kernel call graphs using code reflection, as discussed in Gary’s JVMLS talk. Thus a developer does not have to explicitly build a task graph (as currently required by TornadoVM) and instead a specialized compiler does that work. (Note, it does not render any existing task graph API redundant, it just moves it more into the background as an important lower-level building block where the developer is not required to use it).

2) the ability to call pre-defined “native” kernels that exist somewhere else, e.g., in a GPU-enabled library, which may also be a solution for leveraging more exotic but constrained hardware.
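
To make the first point more concrete, the sketch below derives a dependency graph between kernel calls from the buffers each call reads and writes. It is only an illustration under stated assumptions: `KernelCall`, `Edge`, and `dependencies` are hypothetical names, and the read and write sets are written out by hand here, whereas in the approach described above a compiler would obtain that information from code models.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class KernelGraphSketch {
    // Hypothetical description of one kernel call: the buffers it reads and writes.
    record KernelCall(String name, Set<String> reads, Set<String> writes) {}

    // A "must run before" edge between two kernel calls.
    record Edge(String from, String to) {}

    static boolean intersects(Set<String> a, Set<String> b) {
        for (String s : a) {
            if (b.contains(s)) return true;
        }
        return false;
    }

    // A later call depends on an earlier one if they conflict on a buffer:
    // read-after-write, write-after-read, or write-after-write.
    static List<Edge> dependencies(List<KernelCall> calls) {
        List<Edge> edges = new ArrayList<>();
        for (int j = 0; j < calls.size(); j++) {
            for (int i = 0; i < j; i++) {
                KernelCall a = calls.get(i);
                KernelCall b = calls.get(j);
                boolean conflict = intersects(a.writes(), b.reads())
                        || intersects(a.reads(), b.writes())
                        || intersects(a.writes(), b.writes());
                if (conflict) edges.add(new Edge(a.name(), b.name()));
            }
        }
        return edges;
    }

    // Three calls in program order; "scale" and "offset" are independent,
    // and "add" consumes the output of both.
    static List<KernelCall> example() {
        return List.of(
                new KernelCall("scale", Set.of("a"), Set.of("b")),
                new KernelCall("offset", Set.of("c"), Set.of("d")),
                new KernelCall("add", Set.of("b", "d"), Set.of("e")));
    }

    public static void main(String[] args) {
        System.out.println(dependencies(example()));
        // prints [Edge[from=scale, to=add], Edge[from=offset, to=add]]
    }
}
```

With edges like these, a runtime could schedule `scale` and `offset` concurrently and run `add` only after both complete, without the developer declaring the graph explicitly.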

2)
> We do not currently plan to deliver the GPU programming model into the JDK. However, work on that model could identify JDK features and enhancements of general utility which could be addressed in future work.

Does this mean that the GPU programming model will be only used as a motivation to develop the Code Reflection APIs for different use cases?

3) Is there any intent to support JVM languages with these models (e.g., R, Scala, etc), or will it be specific for the Java language?

It’s specific to the Java language and reflection of Java code.

4) I believe we also need new types. As we discussed in JVMLS this year, we will also need NDArray and Tensor types, Vector types and Panama-based types for AI and Heterogeneous Computing. This is aligned with Gary's talk at JVMLS [2] in which he proposed the HAT initiative (Heterogeneous Accelerator Toolkit) and Panama-based types. Will this also be part of the Babylon project?

I think we will inevitably explore some of that, and they may be of such “general utility” that we could decide to address them in future work. However, I am wary of overly focusing on imperfections in this effort, esp. as in many of these cases there is a tendency to focus on syntax rather than the underlying model, e.g., arrays (which requires much deeper and more careful thinking, but the result will be much better for that). It won’t be perfect and we can feed those imperfections into possible future work.

Paul.

Paul Sandoz

Sep 15, 2023, 7:25:36 PM

to dis...@openjdk.java.net, Shaq Oliver

A question was sent privately that I have permission to reproduce and reply to here for the benefit of everyone (sender CC’ed).

> On Sep 15, 2023, at 1:43 AM, Shaq Oliver <shaqo...@gmail.com> wrote:
>
> Hello,
> will project Babylon deliver features that help with creating domain specific languages in Java?
> Also will it deliver features that could help with neurosymbolic programming?
>
> Thanks a lot.

I am not very familiar with neurosymbolic programming, so I read up a bit and I am just a little bit more familiar and far from expert. My answer is I guess maybe so :-), if we can use an API in combination with natural language constructs to express both the neural net model with its layers and the symbolic components, from which we can obtain code models and reason about the Java code in symbolic form.

Relatedly, an example where I think code reflection can apply is to probabilistic programming, e.g., see Vate [1].

Vate currently specifies its own language for defining models, which I suppose one can describe as a DSL (and perhaps fits into the form of a symbolic component?). A Vate program is compiled to Java code that can execute at scale. The Vate language is designed to be familiar to Java developers (and I believe shares much of the grammar).

I strongly suspect we could devise a Java API combined with natural language constructs so developers could write a Vate program directly in Java, rather than using a separate language. The Vate compiler would then become one that operates on code models, transforming them to executable Java code. Interestingly, I wonder if that transformation could generate code that uses the GPU programming model.

Hth,
Paul.

[1] https://labs.oracle.com/pls/apex/f?p=94065:10:109345727860596:7229

Alan Bateman

Sep 28, 2023, 5:22:19 AM

to Paul Sandoz, dis...@openjdk.java.net

On 06/09/2023 17:42, Paul Sandoz wrote:
> I hereby invite discussion of a new Project, Babylon, whose primary goal
> will be to extend the reach of Java to foreign programming models such as
> SQL, differentiable programming, machine learning models, and GPUs.

The Core Libraries group is happy to sponsor or co-sponsor this project.

-Alan.

Jonathan Gibbons

Oct 3, 2023, 4:21:06 PM

to Paul Sandoz, dis...@openjdk.java.net

On 9/6/23 9:42 AM, Paul Sandoz wrote:
> I hereby invite discussion of a new Project, Babylon, whose primary goal
> will be to extend the reach of Java to foreign programming models such as
> SQL, differentiable programming, machine learning models, and GPUs.

The Compiler Group is happy to sponsor or co-sponsor this project.

-- Jon

Juan Fumero

Oct 11, 2023, 4:08:03 AM

to Paul Sandoz, dis...@openjdk.org

Hi Paul,

This sounds great. We (the TornadoVM team at the University of Manchester) would like to collaborate and support this project moving forward.

Juan

-- 
CTO, Paravox Ltd
