Current project status

Ultimus

Jan 9, 2010, 11:01:43 PM
to anic
Hey there, Adrian (project lead) here.

So I've been getting quite a bit of email asking me to clarify what
the status of the project is: we're very much in alpha dev (not
feature-complete), and I'm aware that the examples from the tutorial
don't fully compile (even the default test suite doesn't succeed) --
yet. But we're getting there!

Thanks for the budding interest and support; the future of the project
is taking shape and this is probably the ideal time to get involved!

In particular, in the immediate future, I'm looking to make sure that
the compiler builds cleanly on all sorts of dev environments (we're
trying to be platform agnostic, relying only on classic POSIX/gcc
stuff). I'm trying to write portably and test on a wide variety of
archs, but my resources are limited. So anyone who tries building the
source on their machine and reports any errors/warnings is actually
helping out a lot! Just run make and see what you get.

Secondly, I'm rethinking the compiler installation framework; right
now it's naive and silly (src/installBinary.sh). If anyone with more
experience in multiplatform installation can help out, that would be
really useful. Things like putting together a man page source + build
script + installation are also easy to do and high-priority but I'd
rather put my own efforts towards the compiler core; anyone with
experience in the area is highly encouraged to throw their skills into
the mix.

Thirdly, the language spec is constantly evolving, and now is the time
to make comments/suggest improvements to the language (before
everything is implemented and difficult to change). Currently, I'm
ironing out the last of the typing details and will finish type flow
analysis in the near future. Once every part of the program is well-
typed, code gen is trivial; the types unambiguously dictate the code
to be run, so we can just inject code using a loop and a huge switch
statement. For the time being, this part of the coding can be left to
me; it's all planned out and just needs to be written.
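
(For the curious, the shape I have in mind is roughly a loop over the
typed IR with a switch on node kind, as in the C++ sketch below. The
node kinds and instruction templates are made up purely for
illustration -- they are not anic's actual IR or output.)

// Illustrative only: type-directed emission as "a loop and a huge
// switch". Each well-typed node kind maps to a fixed code template.
#include <iostream>
#include <vector>

enum class NodeKind { IntConst, AddInt, SendToPipe };

struct IRNode {
    NodeKind kind;
    long     value;   // only meaningful for IntConst
};

void emit(const std::vector<IRNode>& program) {
    for (const IRNode& n : program) {          // the loop
        switch (n.kind) {                      // the huge switch
            case NodeKind::IntConst:
                std::cout << "  push " << n.value << "\n";
                break;
            case NodeKind::AddInt:
                std::cout << "  pop rax\n  pop rbx\n"
                             "  add rax, rbx\n  push rax\n";
                break;
            case NodeKind::SendToPipe:
                std::cout << "  call runtime_send\n";
                break;
        }
    }
}

int main() {
    // Emits code for the (hypothetical) well-typed program "2 + 3".
    emit({{NodeKind::IntConst, 2}, {NodeKind::IntConst, 3},
          {NodeKind::AddInt, 0}});
}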

However, for the most part, I will give commit rights to anyone who
requests them, because I'm sure there are tons of things to fix. A
few people have already informally submitted patches, which I will be
incorporating into the repo. But in the future, if you feel you can
bring something to the code, just ask for commit perms and go wild!
Version control means you can't do any damage I won't be able to fix.

Thanks again for the comments/support: keep it coming; it really does
help!
Adrian

Dominic Morneau

Jan 10, 2010, 8:57:54 PM
to anic
I think it may be more important to clarify the code generation part,
which won't be that trivial if your aim is indeed better-than-C
performance. I am not talking about the implementation (which I think
is what you mean by a loop/switch), but just the logic of it, what you
want to generate. The cost of mapping every object onto an OS thread
could be very high. Will the runtime use user threads? How will the
mapping of user threads to OS threads be done? What about load
balancing? Will every stream and latch require a mutex (ouch!)?

My impression is that you worry a lot about the compiler and syntax,
but I'd say it's actually the easy part. C(++) makes it look harder
than it is (although there is boost::spirit and other libs to help)
but a parser for a simple language like the examples so far could be
built very quickly in an ML or Haskell. What really matters is the
output. Is it going to be a bytecode? C? Assembler? LLVM? What will be
part of the runtime? How will you ensure good performance without a
lot of mutexes and blocking?

What about the standard library? Even if you were to output C
(questionable if you're "faster than C"), C won't provide easy-to-use
thread-safe containers. Let's say I need to use a hash map to store a
map of IPs to hostnames, or something similar. How will that work?
Sure, you could put a big mutex on it, but then you lose the main
advantage over C. Clojure and Haskell, among others, provide special
data structures for good parallel performance, but it's not a simple
problem.

Also, how will memory be managed? If it's automatic, how will you
collect garbage and still beat C? If you're thinking of using the
Boehm GC, AFAIK it will stop all your threads while it collects
garbage, and it's not an especially fast collector. Given your heavily
threaded approach, reference counting could be even worse.

I'm actually surprised how most of the comments on Hacker News, etc.
so far are just on syntax and (IMHO) trivially fixable issues. It
sounds like you've got an interesting idea, but I think you may be
seriously underestimating what comes after the AST is generated. Then
again, I only know what's posted so far on the site. It would be
interesting to have more information on this on the FAQ.

Daniel Kersten

Jan 10, 2010, 9:57:08 PM
to ani-co...@googlegroups.com
Hi,

The following is probably opinion based on my own understanding of
dataflow programming and certainly doesn't reflect truths about ANI ;-)

2010/1/11 Dominic Morneau <dmor...@gmail.com>:


> I think it may be more important to clarify the code generation part,
> which won't be that trivial if your aim is indeed better-than-C
> performance. I am not talking about the implementation (which I think
> is what you mean by a loop/switch), but just the logic of it, what you
> want to generate. The cost of mapping every object onto an OS thread
> could be very high. Will the runtime use user threads? How will the
> mapping of user threads to OS threads be done? What about load
> balancing? Will every stream and latch require a mutex (ouch!)?

The key to a good dataflow compiler is determining how much to merge
into a single parallel execution block. The more fine-grained the
blocks are (e.g., each object is a parallel block in itself), the more
dataflow-y everything is and the more theoretically parallel
everything is - but the greater the thread overhead. So a dataflow
compiler would, generally, merge blocks together into less
fine-grained parallel blocks to reduce thread overhead.
For example, it may make sense to merge all blocks within a single
pipe (which executes sequentially anyway) into a single block - one
parallel execution block per pipe. I guess some clever graph analysis
would need to be done to determine the optimal granularity. Ideally,
the code within a block would try to take advantage of
instruction-level parallelism too.
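
To make that concrete, here's a rough C++ sketch of the kind of
merging I mean (the graph representation and the names are mine,
purely illustrative, nothing to do with anic's internals): collapse
single-producer/single-consumer chains - e.g. the nodes along one
pipe - into a single execution block.

// Sketch: assign coarser "execution block" ids to a dataflow graph by
// merging linear chains, so each block can be scheduled as one unit.
#include <cstddef>
#include <vector>

struct Node {
    std::vector<std::size_t> succ;   // downstream node indices
    std::vector<std::size_t> pred;   // upstream node indices
    int block = -1;                  // execution block assigned below
};

void assignBlocks(std::vector<Node>& g) {
    int nextBlock = 0;
    // Pass 1: start a block at each chain head and extend it along
    // single-in/single-out edges (a sequential pipe becomes one block).
    for (std::size_t i = 0; i < g.size(); ++i) {
        bool midChain = g[i].pred.size() == 1 &&
                        g[g[i].pred[0]].succ.size() == 1;
        if (g[i].block != -1 || midChain) continue;
        int b = nextBlock++;
        std::size_t cur = i;
        for (;;) {
            g[cur].block = b;
            if (g[cur].succ.size() != 1) break;
            std::size_t nxt = g[cur].succ[0];
            if (g[nxt].pred.size() != 1 || g[nxt].block != -1) break;
            cur = nxt;
        }
    }
    // Pass 2: anything not reached above (e.g. nodes on a cycle)
    // gets a block of its own.
    for (Node& n : g)
        if (n.block == -1) n.block = nextBlock++;
}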

The question, though, is whether the number of available processing
cores should affect the block granularity. If so, then a JIT compiler
may be needed, so that the block sizes can be adjusted at runtime.
This may be desirable regardless.

>
> My impression is that you worry a lot about the compiler and syntax,
> but I'd say it's actually the easy part. C(++) makes it look harder
> than it is (although there is boost::spirit and other libs to help)
> but a parser for a simple language like the examples so far could be
> built very quickly in an ML or Haskell. What really matters is the
> output. Is it going to be a bytecode? C? Assembler? LLVM? What will be
> part of the runtime? How will you ensure good performance without a
> lot of mutexes and blocking?

The syntax is important, because if people do not like or understand
it, then nobody will use the language. Of course, there are a lot of
other, similarly (or even more) important aspects too, such as library
support...

As for the implementation language choice, I'm guessing Adrian is
using what he's most familiar with, for better or worse.

Generated code - if it were me, I'd generate either assembly, LLVM or
some hybrid (to allow for runtime code mutation/block granularity
modification/dynamic runtime compilation, e.g. to support things like
eval), though I guess it depends on the language goals and what
Adrian is most familiar/comfortable with.

>
> What about the standard library? Even if you were to output C
> (questionable if you're "faster than C"), C won't provide easy-to-use
> thread-safe containers. Let's say I need to use a hash map to store a
> map of IPs to hostnames, or something similar. How will that work?
> Sure, you could put a big mutex on it, but then you lose the main
> advantage over C. Clojure and Haskell, among others, provide special
> data structures for good parallel performance, but it's not a simple
> problem.

Actually, there are some languages which claim to be faster than C
which DO output C: Applied Type System (ATS) being one of them. I
guess how this works is that ATS has more metadata about the code and
types, so it can often produce more highly optimised C code (or code
the C compiler can better optimise) than anyone would ever write by
hand.

I think, ultimately, you would want to bypass C and do it yourself if
you really want to overtake C (especially if you want
instruction-level parallelism, thread-level parallelism, SSE and such;
you might be able to generate better code with the knowledge that you
have than the C compiler can), but then you need to rewrite all of the
existing things that good optimising C compilers already do... a big
task.

>
> Also, how will memory be managed? If it's automatic, how will you
> collect garbage and still beat C? If you're thinking of using the
> Boehm GC, AFAIK it will stop all your threads while it collects
> garbage, and it's not an especially fast collector. Given your heavily
> threaded approach, reference counting could be even worse.

I imagine most memory is automatically managed by the latches and
streams. Dataflow languages, as I see them, don't really use memory in
the same way as imperative languages.

In imperative languages, you have a static chunk of data (the data
isn't static, the chunk is - well, kinda) and a stream of instructions
flows through manipulating it.
In dataflow languages, you have a static chunk of code with a stream
of data flowing through the instructions.

The difference is that in imperative languages, you have no way of
knowing when the data is no longer needed - a garbage collector needs
to scan the chunk of data and see what's being referenced and what
isn't, so it can free anything no longer reachable. But in dataflow,
as soon as a stream/pipe/connection-between-code-nodes ends, that data
is no longer in use and can be freed. This simplifies garbage
collection a lot.

Of course, that assumes that if a data stream is used by two execution
nodes, it is actually duplicated, but you can eliminate this
requirement through simple reference counting. In fact, I would
probably use copy-on-write, reference-counted data streams.
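
Something along these lines, as a C++ sketch (illustrative only, not
a claim about how ANI's runtime does or should do it):

// Copy-on-write, reference-counted packet: handing it to several
// downstream nodes only bumps a refcount; a writer copies the buffer
// only if someone else still holds it.
#include <cstdint>
#include <memory>
#include <vector>

class Packet {
    std::shared_ptr<std::vector<std::uint8_t>> data_;
public:
    explicit Packet(std::vector<std::uint8_t> bytes)
        : data_(std::make_shared<std::vector<std::uint8_t>>(
              std::move(bytes))) {}

    // Cheap to fan out to multiple consumers.
    Packet(const Packet&) = default;

    const std::vector<std::uint8_t>& read() const { return *data_; }

    // Copy-on-write: clone only when the buffer is shared. (A real
    // concurrent runtime would need a stronger ownership check than
    // use_count(), which is only advisory under contention.)
    std::vector<std::uint8_t>& mutate() {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::vector<std::uint8_t>>(*data_);
        return *data_;
    }
};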

There may be cases when it IS desirable to reference static blocks of
memory that aren't treated in the dataflow manner. A framebuffer
maybe? In those cases you could apply a traditional garbage collector
or maybe use manual memory management.


>
> I'm actually surprised how most of the comments on Hacker News, etc.
> so far are just on syntax and (IMHO) trivially fixable issues. It
> sounds like you've got an interesting idea, but I think you may be
> seriously underestimating what comes after the AST is generated. Then
> again, I only know what's posted so far on the site. It would be
> interesting to have more information on this on the FAQ.
>

That's a tough one... The syntax dictates how the code will be
generated, yet the generated code dictates how the syntax will need to
work. There is some flexibility with the syntax, of course, but I see
them, at least partially, tied together. I guess it depends on
priorities: are the execution semantics more important, or the syntax?
They're both pretty important, IMHO. Without good execution semantics,
you lose the advantages of this approach, but without good syntax, it
becomes a pain to program in...


--
Daniel Kersten.
Leveraging dynamic paradigms since the synergies of 1985.

Ultimus

Jan 10, 2010, 10:03:42 PM
to anic

On Jan 10, 8:57 pm, Dominic Morneau <dmorn...@gmail.com> wrote:
> I think it may be more important to clarify the code generation part,
> which won't be that trivial if your aim is indeed better-than-C
> performance. I am not talking about the implementation (which I think
> is what you mean by a loop/switch), but just the logic of it, what you
> want to generate. The cost of mapping every object onto an OS thread
> could be very high. Will the runtime use user threads? How will the
> mapping of user threads to OS threads be done? What about load
> balancing? Will every stream and latch require a mutex (ouch!)?
>

I do describe this on the main page, but to clarify: what is threaded
is not objects, but the pipes themselves. Further, there isn't any
static mapping of threads to pipes; an administrator thread
dynamically modifies jump points to route worker threads at run time.
The size of the worker pool will be statically determined at compile
time, based on how much parallelism the underlying architecture can
handle (in number of processing elements).

The mapping of user->kernel threads will be 1-1 and I'm planning on
letting the OS take care of scheduling (not ideal, but most flexible).
And it's the administrator routing thread that handles synchronization
and dispatching threads to pipes; static data flow analysis builds
complete dependency graphs that for the most part eliminate the need
for locking in favor of synchronously patching addresses in a largely
compile-time determined fashion.
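
To give the shape of that in familiar terms, here's a rough C++
analogue (this is NOT the actual anic mechanism -- anic patches jump
targets rather than pulling work from a shared queue, and
hardware_concurrency() here merely stands in for the compile-time pool
sizing described above):

// A fixed pool of worker threads sized to the machine, fed by an
// "administrator" that hands over pipes whose dependencies are met.
#include <algorithm>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class WorkerPool {
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> ready_;  // pipes cleared to run
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
public:
    WorkerPool() {
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] {
                for (;;) {
                    std::function<void()> job;
                    {
                        std::unique_lock<std::mutex> lk(m_);
                        cv_.wait(lk, [this] {
                            return done_ || !ready_.empty();
                        });
                        if (done_ && ready_.empty()) return;
                        job = std::move(ready_.front());
                        ready_.pop();
                    }
                    job();   // run one pipe's worth of work
                }
            });
    }
    // Called by the administrator once a pipe's inputs are all ready.
    void dispatch(std::function<void()> pipe) {
        { std::lock_guard<std::mutex> lk(m_); ready_.push(std::move(pipe)); }
        cv_.notify_one();
    }
    ~WorkerPool() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (std::thread& w : workers_) w.join();
    }
};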

> My impression is that you worry a lot about the compiler and syntax,
> but I'd say it's actually the easy part. C(++) makes it look harder
> than it is (although there is boost::spirit and other libs to help)
> but a parser for a simple language like the examples so far could be
> built very quickly in an ML or Haskell. What really matters is the
> output. Is it going to be a bytecode? C? Assembler? LLVM? What will be
> part of the runtime? How will you ensure good performance without a
> lot of mutexes and blocking?
>

I'm not too picky about syntax specifics, but the syntax does actually
play a large role in how all of this will be compiled; I will give
examples shortly of how this is so. The target is raw assembly, with
labels specifying the entry points and metadata for each pipeline;
most objects are allocated _at compile-time_, statically within the
binary; the syntax (in which data flows are specified in a rather
strict hierarchical fashion) allows this static allocation to be
implemented quite simply. With regard to concurrent locking, as
mentioned previously, the data path is very much statically
determined, and metadata is included with each pipe that allows the
arbitrator thread to know when it is safe to dispatch a thread to a
specific part of the code.
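
Illustratively, the per-pipe metadata could look something like the
following (the names and layout are made up for the example; the real
format is still in flux):

// Per-pipe bookkeeping the arbitrator consults before dispatching.
#include <atomic>
#include <cstdint>

struct PipeMeta {
    const char*   entryLabel;     // entry-point label within the binary
    std::uint32_t upstreamCount;  // statically known producer pipes
    std::atomic<std::uint32_t> pendingUpstream;  // producers not yet done
};

// Called when one of this pipe's producers finishes.
inline void markUpstreamDone(PipeMeta& p) {
    p.pendingUpstream.fetch_sub(1, std::memory_order_release);
}

// A pipe is safe to dispatch once all statically known producers have
// signalled completion -- no per-stream mutex required.
inline bool safeToDispatch(const PipeMeta& p) {
    return p.pendingUpstream.load(std::memory_order_acquire) == 0;
}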

> What about the standard library? Even if you were to output C
> (questionable if you're "faster than C"), C won't provide easy-to-use
> thread-safe containers. Let's say I need to use a hash map to store a
> map of IPs to hostnames, or something similar. How will that work?
> Sure, you could put a big mutex on it, but then you lose the main
> advantage over C. Clojure and Haskell, among others, provide special
> data structures for good parallel performance, but it's not a simple
> problem.
>

We're not outputting C; it's raw assembler. Maps are planned to be
part of the standard library, though here I use the term "standard
library" loosely. In fact, most (if not all) of the stuff in the "std"
namespace won't be ANI library code; it will be hand-implemented in
assembler, because such things need to be extremely fast. Thus the
"std" namespace is reserved for magic-like hooks into assembly code
under the hood. The other stuff (like additional math libraries) will
be in another namespace, and will be libraries in the traditional
sense in that they will be implemented in ANI (and fully precompiled
at compiler-build time) so as not to add needless compile-time
overhead.

So with your example, there will be a hook to a hashmap implementation
written in assembly that maps bitstrings (arbitrary object hashes) to
object addresses under the hood. But to the programmer, it will just
look like a "magic" std.map object that is fast and works as they
would expect in any other language.
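
The shape of the hook, sketched in C++ (unordered_map standing in for
the hand-written assembly hashmap; the names are illustrative, not the
actual runtime interface):

// Runtime routines that an ANI-level std.map would bottom out in.
#include <cstdint>
#include <unordered_map>

using ObjectHash = std::uint64_t;   // bitstring identifying an object
using ObjectAddr = void*;           // address of the stored object

static std::unordered_map<ObjectHash, ObjectAddr> table;

extern "C" void std_map_insert(ObjectHash key, ObjectAddr value) {
    table[key] = value;
}

extern "C" ObjectAddr std_map_lookup(ObjectHash key) {
    auto it = table.find(key);
    return it == table.end() ? nullptr : it->second;
}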

> Also, how will memory be managed? If it's automatic, how will you
> collect garbage and still beat C? If you're thinking of using the
> Boehm GC, AFAIK it will stop all your threads while it collects
> garbage, and it's not an especially fast collector. Given your heavily
> threaded approach, reference counting could be even worse.
>

There is actually very minimal memory management needed for ANI
programs; the language+syntax was designed from the bottom up not to
need it ;-). As I already mentioned, most objects+pipes will be
compiled statically into the binary. Of course, this means larger
binaries, but since we're moving into the age of 64-bit computing and
cheap disk space, this is becoming less and less of a concern. You
could say that ANI leans heavily towards the "time" end of the
time-space tradeoff. The only part that needs dynamic memory
management is streams, which are just bounded buffers under the hood
and easy to handle; there's no need to reference count or GC, because
cyclic references are not allowed (there's no way to create them in
ANI, due to the syntax+semantic grammar). So maybe there's a good
reason for paying attention to syntax early on in development.
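
As a rough illustration of the "bounded buffer under the hood" idea (a
sketch only, not the actual stream implementation):

// Fixed-capacity ring buffer: no GC needed, and since ANI can't form
// reference cycles, freeing a slot when its value is consumed is enough.
#include <array>
#include <cstddef>
#include <optional>

template <typename T, std::size_t N>
class BoundedStream {
    std::array<T, N> buf_{};
    std::size_t head_ = 0, tail_ = 0, count_ = 0;
public:
    bool push(const T& v) {              // producer end of the pipe
        if (count_ == N) return false;   // stream full; producer waits
        buf_[tail_] = v;
        tail_ = (tail_ + 1) % N;
        ++count_;
        return true;
    }
    std::optional<T> pop() {             // consumer end of the pipe
        if (count_ == 0) return std::nullopt;
        T v = buf_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return v;
    }
};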

> I'm actually surprised how most of the comments on Hacker News, etc.
> so far are just on syntax and (IMHO) trivially fixable issues. It
> sounds like you've got an interesting idea, but I think you may be
> seriously underestimating what comes after the AST is generated. Then
> again, I only know what's posted so far on the site. It would be
> interesting to have more information on this on the FAQ.

It's really not up to me what people are talking about, and I agree
that focusing too much on syntax might be missing the point. However,
there _has_ been a great deal of thought put into syntax as well as
the back-end implementation (in case the above didn't make that
clear). I think the main reason behind the confusion is that I don't
plan on building the compiler in the traditional sense: no memory
management in the language, no need for locking thanks to smart thread
routing, and dynamic jump patching to drive control flow are all ways
that _anic_ strives to differ from traditional compilation schemes; I
hope you agree that all of this is promising. I've done my research,
and I do have a lot of experience with writing compilers from the
ground up, but I fully admit in the first sentence on the main project
page that this is a highly experimental project -- the likes of which
I haven't seen before; hopefully it ends up being fruitful!

In the end, I fully agree: the FAQ needs to be extended; it's just not
easy to find the time to do so. I do have other obligations to tend to
besides this project; that's a good part of the reason I'm asking for
help.

Thanks for the thoughtful comments and concerns; if there's anything
that needs clarification, please get back to me!
Adrian
