Example
The following code below is what I am referring to.
k := {Op: ONAME, Type: keyType}
v := {Op: ONAME, Type: valueType}
m := {Op: ONAME, Type: map[keyType]valueType} // Already exists as it's iterating over it
range := {Op: ORANGE, List: {k, v}, Right: m}
hit := {Op: ONAME, Type: *hiter } // Created or recycled iterator
iterConditional := {Op: ODOTPTR
for := {Op: OFOR, Ninit: init, Left: {Expr: {iter.key != nil}}, Right: {Op: OCALL, Function: "mapiternext"}}
Of course this is a simplified version, there's a lot more (I.E indirection of the key and value to return them by copy, the fact that we need to obtain the (*Sym) of the key and value (first two fields in the iterator), etc. I'm just trying to make a point here. Such a simple statement turns into a lot, and I'm very intrigued into how it does so.
For example, if 'for' gets turned into OFOR, and range turns into ORANGE, it still needs some way to get linked together. I.E, both tokens are not unrelated, and one requires the other to even make logical sense (I.E a range without a for loop wouldn't make much sense at all). Hence clearly it has a lot of rules set in place. Now I'm interested in how I could potentially add my own identifier/keyword, that also has it's own semantics as well.
New Keyword
The keyword I would like to implement maintains backwards compatibility in a rather unique way, which Go currently does not do, and I believe it can be implemented in a way that is not only very extensible (and allowing other additions to keywords, or at least newer semantics on the fly) but also in a way that makes sense (I.E, the keyword would make logical sense, and is not just a random token that cannot be used).
I want to implement a keyword that has the following rules and restrictions...
1) It must contain at least one parameter, of which the type must be a map, and explicitly, must be a certain type of map (to those who remember my previous posts, a Concurrent Map). Hence I need a way to typecheck reliably and throw compiler errors if it is wrong. It shouldn't be too hard, I'm no stranger by now to typechecking in the AST
2) A certain package must be included. The keyword 'belongs' to a certain package, and is only available while that package is imported. Hence, the lexer needs to recognize the '.' character as a potential valid character for a keyword. This also should not be hard.
3) Parse grammar and syntax into lexical tokens, which is the hardest part, as I'm not too familiar with how such things work.
4) Depending on where it is used, depends on what it does. I plan on having the keyword modify range loop semantics for my particular type of map only (throw a compiler error if a non concurrent map is used). The other is used at a statement level, which wraps a group of statements (which the statements inside shouldn't be too hard, there are examples inside the compiler I can work on).
The 'sync.Interlocked' keyword
Originally I was going to go with 'runtime.Interlocked' keyword, but I figured 'sync' makes more sense. Here examples of it's statement-level semantics
m := make(map[int]int, 0, CONCURRENT)
sync.Interlocked m {
m[1] += m[2] + m[3]
m[2] -= m[3] - m[4]
}
Now imagine the body above was used in a shared concurrent map. The Read-Modify-Write instructions would be incorrect, even though the accesses themselves are fine and well protected. The issues here are of course that even though individual access to the map is atomic and fast (for record, currently the map is fully working, iteration, insertion, removal, retrieval, etc., are all working, and besides for iteration which has it's quirks, they all scale VERY well under little and high contention; under little contention, I see gains of 2 - 3x, with high contention, I see gains of up to 100x, not even lying about that either... then again, the tests used are still rather primitive, just randomized (millions of) operations on GOMAXPROCS Goroutines), instructions which rely on the previous for results are, unfortunately, not.
My goal is to acquire all locks in order, first, before proceeding. This requires the compiler to do some magic.The above will be transformed into
m := make(map[int]int, 0, CONCURRENT)
mapacquire(m, 2, 3, 1, 4)
m[1] += m[2] + m[3]
// Note that the key '1' is no longer used after the above Read-Modify-Write statement
maprelease(m, 1)
m[2] -= m[3] - m[4]
maprelease(m, 2, 3, 4)
Generally, any potentially deadlocks can be solved this way, as obtaining keys ahead of time ensures atomic access to the map until it completes, and it much easier than rolling back actions (especially since I wouldn't know where to begin implementing software transactional memory in Go). Any conflicts can result in a type of global ordering based on the Goroutines that want to acquire the keys. Exponential back-off is a way to prevent excessive livelock as well.(Speaking of which, if anyone could let me know how to make Goroutines sleep for a few cycles without the entire 'm' going to sleep, please let me know).
Next, is the range iterator portion.
m := make(map[int], 0, CONCURRENT)
for k, v := range sync.Interlockd m {
v.field += something()
v.someMutatingFunction()
}
The above is rather unique. The only difference is that the keyword 'sync.Interlocked' is there, which modifies the semantics of the iterator. The iterator will return the value by pointer (to the one stored in the bucket), and the lock on bucket only released after the bucket is exhausted and it moves on to the next.The main issue is that `someMutatingFunction()` could be a long function call, which is why the keyword modifies the semantics if and only if the user wants it that way. The normal mode acts on an atomic snapshot and greatly reduces contention by releasing the bucket immediately after copying. The issue there is that the value is returned by value, so modifications to it are not possible. This also allows the for loop body to execute for however long it wants, as it is working on a copy. Lastly, if an element is removed from the map, and it is still in the snapshot, it will still be available, which may not be what the user intends. The keyword 'sync.Interlocked' will ensure that whatever they use will be in the list, as they always have the bucket lock.
I should also note that the map works by having individual bucket Test-And-Test-And-Set Spin Locks (in the future, with exponential backoff once I figured how to safely do so). Once a bucket is full, it just gets converted to another array of buckets (size increased by power of two of last one) to further increase concurrency and greatly reduce collision. The performance gain so far is enormous and I would like to continue with this, even if the general consensus is it not being approved. Hence, when I say "it acquires the keys', I mean it acquires the bucket lock.
TL;DR
1) How do I add a new keyword, 'sync.Interlocked' which can modify the semantics of pre-existing AST nodes, and also create my own
2) How do I turn my own keyword into AST nodes?
3) Are there any other things I need to know, like how AST nodes are transformed into actual executable code in the SSA? Or should the nodes be enough as they merely add some compiler-inserted additions and function calls, and/or change presently existing nodes.
All I need is the knowledge on where to begin with the keyword portion, and the rest I can do on my own.