Using AST in a multi-threaded environement

59 views
Skip to first unread message

Alexander Ryzhov

unread,
Dec 12, 2012, 12:41:05 PM12/12/12
to sab...@googlegroups.com
Hi,

I was wondering if it's possible to use AST in a multi-threaded environment. The one thing that's always bothered me about Java is that in theory you have no guarantee that changes to variable values are committed to main memory unless certain conditions are met. I was wondering if anybody can prove to me that it's safe to reference an AST generated by SableCC in another thread (read-only) without worrying that some changes might not have been committed to main memory since node variables aren't declared as volatile and there is no synchronized access.


Alexander

Niklas Matthies

unread,
Dec 12, 2012, 1:08:11 PM12/12/12
to sab...@googlegroups.com
On Wed 2012-12-12 at 09:41h, you wrote:
:
> I was wondering if anybody can prove to me that it's safe to
> reference an AST generated by SableCC in another thread (read-only)
> without worrying that some changes might not have been committed to
> main memory since node variables aren't declared as volatile and
> there is no synchronized access.

You have to "publish" the AST safely. The simplest way to do so is to
store a reference to it in a final instance variable after the AST has
been constructed, and ensure (a) that it is not modified afterwards
and (b) that other threads only reference it (transitively) through
this final instance variable. Java then guarantees that other threads
see the state of the object graph at the time of initialization of the
final variable, and not a previous state. See e.g.
http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#finalRight.

Otherwise, just use good old locking.

Niklas

Alexander Ryzhov

unread,
Dec 12, 2012, 3:47:50 PM12/12/12
to sab...@googlegroups.com, ml_sabl...@nmhq.net
Thanks a lot, Niklas. The document you are referring to is an official FAQ to JSR-133, and I was very surprised to read the following:"Assuming the object is constructed "correctly", once an object is constructed, the values assigned to the final fields in the constructor will be visible to all other threads without synchronization. In addition, the visible values for any other object or array referenced by those final fields will be at least as up-to-date as the final fields." This really seems too good to be true! If it's true then it means that the JVM walks through the reference graph of a final field and commits changes in that graph to main memory at the time of field initialization. I haven't read about anything like this anywhere else. However, the source you're citing is reputable, so this really sparked my interest. I will read JSR-133 in detail and will write back regarding my findings.

Alexander

Etienne Gagnon

unread,
Dec 12, 2012, 3:57:22 PM12/12/12
to sab...@googlegroups.com
Hi Alexander,

Walking the graph would be way too expensive. The JVM simply clears the
caches. More precisely, the JVM implements a read (or write) barrier,
depending on the operation (read/write), for final and volatile fields.

The last time I read about these things (a few years ago), no
high-performance JVM did fully implement the specification, as it would
have negatively impacted performance too much. So, be careful relying
too much on the specification of Java's memory model. The safest remains
to use explicit locking operations.

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

Niklas Matthies

unread,
Dec 13, 2012, 6:25:20 AM12/13/12
to sab...@googlegroups.com
At least under x86, you are safe. For final fields to work as
described, only StoreStore barriers are needed, which are a no-op
under x86 (cf. http://g.oswego.edu/dl/jmm/cookbook.html).

Hopefully with Java 8, even the hotspot compiler will do the right
thing under the other architectures:
http://hg.openjdk.java.net/jdk8/tl/hotspot/rev/701a83c86f28

Niklas
> --
> -- You received this message because you are subscribed to the SableCC group. To post to this group, send email to sab...@googlegroups.com. To unsubscribe from this group, send email to sablecc+u...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/sablecc?hl=en
>
>

Alexander Ryzhov

unread,
Dec 13, 2012, 2:11:03 PM12/13/12
to sab...@googlegroups.com
Hi Etienne,

Sure, locking is always possible. However, consider that I am writing an interpreter that will parse a program once and will walk through the AST many times to execute it with different parameters, in different threads. Would it it be a good idea to put a synchronized block around the the execution phase? Perhaps not since it would eliminate concurrency completely. On the other hand, if AST itself was immutable, no synchronization would be needed. Would you consider making AST immutable in future SableCC versions?

Thanks,
Alexander

Etienne Gagnon

unread,
Dec 13, 2012, 3:22:05 PM12/13/12
to sab...@googlegroups.com
Hi Alexander,

You get away without any synchronization by providing the AST reference
as a parameter to the constructor of interpreter-thread objects:

...main(...) {
Node ast = ....parse();
for(...) {
Thread interpreterThread = new InterpreterThread(ast);
interpreterThread.start();
}
}

If interpreter threads can't get the reference this way (e.g. when using
a thread pool), a notification protocol would work:

...main(...) {
Node ast = ....parse();
synchronized(pool) {
pool.setAST(ast);
pool.notify(); // or pool.notifyAll()
}
}

Some SableCC users do modify the AST for various reasons (e.g. weeding
the AST).

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

Le 12-12-13 14:11, Alexander Ryzhov a �crit :

Alexander Ryzhov

unread,
Dec 13, 2012, 4:27:03 PM12/13/12
to sab...@googlegroups.com
Etienne,

1. I guess the approach will work without synchronization assuming that: In order for a thread to see an object for the first time it must be committed to main memory. Is this a fair assumption?

2. In response to users who modify AST for "weeding": isn't the whole point of AST that it's a transformation of CST that doesn't contain any junk already?

3. To your earlier comment stating that the JVM clears the entire cache when an assignment to a final field is performed: that would actually kill performance and would discourage people from using final fields since they carry performance penalty, so I doubt that it's what's really happening.

Alexander

Etienne Gagnon

unread,
Dec 13, 2012, 5:02:38 PM12/13/12
to sab...@googlegroups.com
See below.

On 2012-12-13 16:27, Alexander Ryzhov wrote:
> 1. I guess the approach will work without synchronization assuming
> that: In order for a thread to see an object for the first time it
> must be committed to main memory. Is this a fair assumption?
The Java API specification is rather disappointing about it:
http://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#start()

In SableVM, creating a thread required global synchronization, in the
VM, in order to update the core thread infrastructure with the new
thread information. It seems relatively safe to assume most VMs would so
the same. But, there I don't remember seeing such guarantee in the Java
API spec, if this is what you are looking for.

> 2. In response to users who modify AST for "weeding": isn't the whole
> point of AST that it's a transformation of CST that doesn't contain
> any junk already?

Weeding is a very powerful and useful technique to simplify a grammar
and give better error messages.

I prefer seeing a message that says: semantic error: the "public"
modifier cannot be used twice when declaring the "foo" method
instead of: syntax error: expecting 'final', 'int', 'void',...

> 3. To your earlier comment stating that the JVM clears the entire
> cache when an assignment to a final field is performed: that would
> actually kill performance and would discourage people from using final
> fields since they carry performance penalty, so I doubt that it's
> what's really happening.

As I said, I have not kept up to date with the latest state of things.
When I was developing and maintaining SableVM, most hardware
architectures did not provide fine grain control on cache lines, and
none of the high-performance JVMs did fully implement the Java memory
model, due to performance problems.

It is possible that things have improved... You could have a look at
Bill Pugh's web page ( http://www.cs.umd.edu/~pugh/java/memoryModel/ )
to learn more about the intricacies of the memory model. Correctly using
the features of the Java memory model (even with a JVM that fully
implements it) to implement collaboration between threads without
explicit synchronization is, at the least, bug prone.

Etienne
P.S. If you care about my personal opinion, here it is: I think that the
whole multi-threading programming paradigm is to concurrent programming
the equivalent of assembly language for sequential programming.

Alexander Ryzhov

unread,
Dec 18, 2012, 1:23:22 AM12/18/12
to sab...@googlegroups.com
Hi Etienne,

Thanks for your help. Where can I read about weeding? I'd like very much to improve my error messages. As for synchronization, I see a big can of worms here. I still can't confirm that the approach suggested by Niklas would work, and the approach you suggested can't be guaranteed to work either if assumption #1 is not valid. Both suggestions at the very least are unorthodox. The use of immutable objects, on the other hand, is a sure way to achieve concurrent read access without synchronization, and is recognized by many as a good programming practice.

Alexander

Etienne Gagnon

unread,
Dec 18, 2012, 10:35:14 AM12/18/12
to sab...@googlegroups.com
Hi Alexander,

It is a technique I learned from my supervisor Laurie Hendren when I was
studying at McGill. It is probably alluded to (not necessarily under
this name) in standard compiler books. I haven't paid much attention to
it lately, so I don't remember in which book(s) I saw it.

Unfortunately, most compiler books spend a lot of time explaining the
techniques implemented in parser/lexer generators (DFA/NFA/LL/LR), but
spend little time explaining how to actually write lexical definitions
and grammars, except for the widely known ambiguous constructs (if/else,
expressions).

The idea is simple: you write a "wider' grammar, usually a simpler
grammar that accepts some invalid constructs, then you reject invalid
constructs after parsing (the weeding phase). While doing so, you can
also make minor modifications to the AST to help with the upcoming
semantic analysis.

Etienne

Etienne Gagnon, Ph.D.
http://sablecc.org

Michael Mast

unread,
Dec 18, 2012, 2:10:44 PM12/18/12
to sab...@googlegroups.com
Alexander,

Here's an example real world application where "weeding" and other AST
manipulations come in handy.

I had a situation where some rules were being specified. Basically a
variable could have one of many values, where the values appear in a comma
separated list. The issue was that more than one variable could have the
same (lengthy) list of values. The syntax for specifying the rules looked
something like this:

Entitlement1 in (A, B, C, D, E, F, G) And Entitlement2 in (A, B, C, D, E, F,
G)

But instead of values A through G, the real world case had about 30 or 40
different values. This was fraught with multiple opportunities of
introducing cut and paste errors and, due to the length of the rules, was
difficult to read. Instead I "invented" transcription operators, which are
a really just syntactic Boolean operators. These are the familiar And and
Or, just enclosed within parenthesis: (And) and (Or). So the above example
becomes:

Entitlement1 (And) Entitlement2 in (A, B, C, D, E, F, G)

Now the long list of values occurs only once. You can imagine what the AST
looks like for this. The AST is recursively descended and modified so that
it looks like the AST of the first example (the one with the duplicated list
of values). Of course, this is all done before the modified AST is used for
interpretation/code generation.

Mike

Stephen P Spackman

unread,
Dec 18, 2012, 3:34:30 PM12/18/12
to sab...@googlegroups.com
It's also a good way to think when you are *designing* a language. If (for example) all your compound statements are designed to have a general form like

initialKeyword [parameters] block {secondaryKeyword [parameters] block}

or

{keyword expression} end

or whatever, you can make a notation that is easier to learn, easier to make unambiguous, easier to extend in the future and easier to write source processing tools for, as well as easier on the compiler writer. You don't have to go all the way to Lisp to reap the benefits of uniform notation! (At Concordia, Hendrik Boom was calling this the "cover grammar" technique, though again I don't know if that's a standard term.)

—Stephen.


--
-- You received this message because you are subscribed to the SableCC group. To post to this group, send email to sab...@googlegroups.com. To unsubscribe from this group, send email to sablecc+unsubscribe@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/sablecc?hl=en



Alexander Ryzhov

unread,
Dec 19, 2012, 12:50:55 AM12/19/12
to sab...@googlegroups.com
Mike, that's a clever hack! However, hacks work to a certain extent. If your transformed AST should look just a bit different, you could face a choice between making the hack more ugly or getting rid of the hack altogether and build your own data structure based on the AST generated by the parser. For example, if you needed to add a child node to a node that doesn't have a child of that type, you could subclass the container node. Works, but not very maintainable. Besides, AST traversal would break but perhaps you weren't using it at that point. But let's say later on you decided to use something like ReverseDepthFirstAdapter to add position information to your AST and at that point you'll realize that it doesn't work because of the hack. I've been through stories like this with the hacks I wrote, and that's why the idea of AST modification doesn't look appealing.

Let me give you an example. In my language, I have types (classes) that can reference each other. So after AST is built I need to do type resolution. I could have extended AST to store that information but instead I chose to build my own semantic tree based on AST, and I am glad I did because as I add more functionality, it makes more sense to have one more layer on top of AST.

I am attempting to convince Etienne that today a parser should be designed with concurrency in mind. Parsing the file once and processing it in multiple threads, I am sure, is not something specific to my task.

Regards,
Alexander

Etienne Gagnon

unread,
Dec 19, 2012, 9:07:11 AM12/19/12
to sab...@googlegroups.com
Hi Alexander,

You do not need to convince me that SableCC ASTs should support concurrency; I agree with this.

I think that SableCC ASTs already support concurrency, even without synchronization (assuming a fully conforming JVM) when the AST is not concurrently modified.

In other words, the following already works:
public class Main {
  public static void main(String[] args) throws Exception {
    Node ast = new Parser(...).parse();
    for(...) {
      new InterpreterThread(ast).start();
    }
  }
}
public class InterpreterThread extends Thread {
  private final Node ast;
  public InterpreterThread(Node ast) {
    this.ast = ast;
  }
  public void run() {
    ... // walk the ast without modifying it
  }
}
I think that the programmer that creates a multi-threaded interpreter running on a SableCC AST can easily avoid modifying the AST, if immutability is a safety requirement for his concurrency design.

I don't see how modifying the ASTs generated by SableCC as you suggest would add anything to the above, except for partial static safety against a programmer that tries to modify the AST. I say partial safety, as we don't control what native code could do to AST objects!

Note that Java has not much support for real immutable objects. Immutability is approximated, in Java, by limiting the interface. For example, the String class is made approximately immutable by not providing methods to modify it observable state. In reality, it is quite possible that, internally, it is actually mutable for laziness reasons (such as lazy computation of hashcode).

Have fun!


Etienne
Etienne Gagnon, Ph.D.
http://sablecc.org
On 2012-12-19 00:50, Alexander Ryzhov wrote:

Gonzalo Ortiz Jaureguizar

unread,
Dec 19, 2012, 9:29:19 AM12/19/12
to sab...@googlegroups.com
I use SableCC ASTs like Etienne said and it works perfectly ;)


2012/12/19 Etienne Gagnon <ega...@j-meg.com>

Alexander Ryzhov

unread,
Dec 20, 2012, 1:29:48 PM12/20/12
to sab...@googlegroups.com
I decided to put the "final" theory to the test. I wrote a test program where one thread creates a deep tree similar to AST and shares it with other threads without synchronization, using only the "final" trick. The receiving threads traverse the tree and check that it's intact. To make things worse, the program didn't pass the object holding the final to a new thread but it had a thread pool where threads read the object from a static volatile variable. Added a loop and made things somewhat random to provoke different race conditions. I got a hold of a 2-CPU Quad Core Xeon 5620 server and ran it there on the latest OpenJDK 7. To my great surprise, it worked. But that's not all. Then I removed the final modifier, and it worked as well! Then I tried running it on JDK 1.6, and that worked, too.

I also observed an interesting effect by replacing the static volatile variable with a non-volatile variable guarded by a synchronized block: performance dropped by 20% but CPU load during the run was 25% compared to 100% with volatile.

And finally (unrelated to the discussion but still interesting), performance results varied greatly depending on the number of threads. Best results were achieved with thread count = 6 even though the server had 16 virtual cores. I observed that when the thread count is <= 8, the JVM tried to run all of them on one CPU.

I'd like to apologize for my ignorance and thank everyone who contributed to the discussion. My views were based on old Java books and I realized that I still have a lot to learn. I am excited about performance testing and I can see that Java performance can be made a lot better with more understanding about JVM internals. Finally, I am convinced that AST can be safely shared between threads without expensive synchronization.

Alexander
FinalTest.java

Pomeroy, Roger C

unread,
Jan 3, 2013, 12:18:25 PM1/3/13
to sab...@googlegroups.com

Is there still an eclipse plugin for SableCC available?  I found a link on the website ( http://cse.unl.edu/~kdeng/myweb/fun/code/code.html) , but it is no longer active. 

 

Thanks!

 

Roger

Etienne Gagnon

unread,
Jan 3, 2013, 3:38:42 PM1/3/13
to sab...@googlegroups.com
Hi Roger,

There's another plugin on the old site: http://sablecc.sourceforge.net/downloads/eclipseplugin3.zip

Let me know if it still works.


Etienne
Etienne Gagnon, Ph.D.
http://sablecc.org
Le 13-01-03 12:18, Pomeroy, Roger C a écrit :

Is there still an eclipse plugin for SableCC available?  I found a link on the website ( http://cse.unl.edu/~kdeng/myweb/fun/code/code.html) , but it is no longer active. 

 

Thanks!

 

Roger

--

Bui Hong Phuc

unread,
Jan 3, 2013, 3:44:43 PM1/3/13
to sab...@googlegroups.com
Hi, you can try to open your Eclipse project in Netbeans and use the Netbeans plugin.
Netbeans can open Eclipse's project without any change in eclipse's specific config
file. After you have your stable grammar, you can use Eclipse to develop your project.
--
Reply all
Reply to author
Forward
0 new messages