Much discussion has been made on IRC concerning symbol names.
The request, mainly, is for imcc to handle sigil characters from other languages which basically equates to exposing a lot to imcc from the high-level language. I won't argue how much of that is good or bad; I'd rather just try to make imcc as friendly as possible.
The state of things:
1) Declared symbols can be handled pretty easily with any character we want to support; imcc just has to track it. It just so happens that we don't allow many non-alphabetic characters at this time.
2) $ is currently used to denote a symbolic register ($I[0-9]+ is an int register), which is not pre-declared. It just pops up in the instruction stream and imcc assigns a register.
It is possible that we can stick with $ for temporaries, but make imcc check symbol tables first, and allow people to declare symbols with $ as well. This would solve some issues but might make for some confusing looking code.
In reality it would not be that confusing as long as you don't name your variables with the same convention as temporaries, but who can guarantee that?
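A minimal sketch of that lookup order, written in Python purely for illustration (imcc itself is not written this way). The names `resolve` and `declared` are hypothetical, and the I/N/S/P register letters are an assumption based on the symbolic temporaries mentioned elsewhere in this thread. A declared symbol wins even when it looks like a temporary, which is exactly the confusing case described above.

```python
import re

# Shape of a symbolic temporary such as $I0 or $P12.
# The letter set I/N/S/P is an assumption for this sketch.
TEMP_RE = re.compile(r"^\$[INSP][0-9]+$")

def resolve(name, declared):
    """Classify a name: check the symbol table first, then the temp pattern."""
    if name in declared:
        # Declared symbols take priority, even if they look like temporaries.
        return ("declared", name)
    if TEMP_RE.match(name):
        # Undeclared, but shaped like a symbolic register: assign one.
        return ("temp", name)
    raise NameError(f"undeclared symbol: {name}")

declared = {"$foo", "$I0"}         # someone declared a variable named $I0
print(resolve("$foo", declared))   # declared symbol with a sigil
print(resolve("$I0", declared))    # declared, despite looking like a temp
print(resolve("$I1", declared))    # ordinary symbolic temporary
```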
Another option is to use quotes for symbols with sigils, but since most of our code will end up coming from Perl6, that won't be optimizing for the common case.
Finally, I, for one, support name mangling. It's arguable how much high-level compilers should expose to the back-end compiler. I think, though, that most people prefer to be able to debug PIR code and see the original symbols, and I sympathize.
This is just an example to stimulate discussion. I'd like to hear all sides before making any decisions.
(And remember namespaces when considering solutions)
I don't like the leading C<.> option. What about having a leading _ for temporaries instead, and allowing any non-space, non-operator character in symbol names, so that _$foo would be a valid temp? This has the advantage of not conflicting with symbolic registers.
The other option is to force registers to use a sigil that does not occur in Perl, such as #. Then all of the "normal" sigils (&, %, @, and $) would be available for use in variable names.
Melvin Smith wrote:
> RFD = Request For Discussion ;)
> I don't like the leading C<.> option, what about having a leading _ for temporaries instead and allowing any non-space, non-operator character in symbol names, so _$foo would be a valid temp. This has the advantage of not conflicting with symbolic registers.
But couldn't it potentially conflict with labels, which also start with an underscore right now, IIRC? I agree with the idea that non-alphanumeric characters, such as $, should be allowed in symbol names, though. [1]
> The other option is to force registers to use a sigil that does not occur in perl such as #
# is currently used for comments. Of course, we can change this (and the underscore situation above if I'm right about it) if we really wanted...but...
> Then all of the "normal" sigils (&, %, @, and $) would be available for use in variable names.
It isn't just Perl we're dealing with. Other languages could potentially have other sigils, e.g. _ and #. Other languages have no sigils, in which case name mangling is certainly needed as a variable called I2, for example, would cause all kinds of "fun" if not mangled.
I would go with the idea of having a sigil that is placed before all local variables, and another (different!) sigil for registers (of the IMCC-handled type). Anything without one of those is a direct register access. Or a syntax error. Clean, simple rules. What the sigils are is relatively immaterial if what is placed after them (for locals, not registers) can contain non-alphanumeric stuff. And whatever sigils a language wants can be put there. This way, name mangling can be "avoided" - though arguably we're defining a syntax that "auto-mangles". :-)
Jonathan
[1] (C|S)hould we potentially provide a "quoting" mechanism, e.g. for languages that want variable names containing characters that are not allowed due to IMCC syntax rules? Or is it up to the compiler to emit "compliant" names? And I'm too scared of unicode to mention unicode variable names.
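The two-sigil rule proposed in this message can be sketched as a small classifier. This is an illustration in Python, not imcc code. The choice of `!` for locals and `$` for IMCC-handled registers is an assumption made only for this example (the message itself says the actual characters are immaterial), and `classify` is a hypothetical name.

```python
import re

LOCAL_SIGIL = "!"      # assumption: any otherwise unused character would do
REGISTER_SIGIL = "$"   # assumption: kept from the current $I0-style temporaries
DIRECT_RE = re.compile(r"^[INSP][0-9]+$")   # direct register access, e.g. I2

def classify(token):
    """Apply the rule: local sigil, register sigil, direct register, or error."""
    if token.startswith(LOCAL_SIGIL) and len(token) > 1:
        # Everything after the sigil is the language's own name, sigils included.
        return ("local", token[1:])
    if token.startswith(REGISTER_SIGIL) and len(token) > 1:
        return ("symbolic_register", token[1:])
    if DIRECT_RE.match(token):
        return ("direct_register", token)
    raise SyntaxError(f"unrecognised operand: {token}")

print(classify("!$foo"))   # a Perl-style variable kept as a local
print(classify("!I2"))     # a sigil-less language's variable named I2
print(classify("$I2"))     # IMCC-handled symbolic register
print(classify("I2"))      # direct register access
```

Under these assumptions a source variable called I2 no longer collides with the register I2, because the local sigil does the "auto-mangling".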
>I don't like the leading C<.> option, what about having a leading _ for
I don't care. Really, I don't care. I kinda like $, but I don't care. I currently get by just with $[I.N.S.P]nnn symbolic temporaries because I set a flag to use them. At the switch of a flag I can emit code using _XX_<original_var_name>_nnnn, where XX is some helpful info regarding type, <original_var_name> is, well, the original var name, and nnnn is a four-digit number I made up to make me fairly happy that imcc won't confuse the local integer i in one routine with the local integer i in another routine, since it is obvious to me that IMCC cannot possibly cope with different scope rules for languages left, right and sundry.
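For readers who want to see the shape of such names, here is a rough Python sketch of the _XX_<original_var_name>_nnnn scheme. The class name `Mangler`, the type tag `LI`, and the use of a simple incrementing counter for nnnn are all assumptions for illustration; the message above does not say how the four-digit number is actually chosen.

```python
import itertools

class Mangler:
    """Hypothetical helper that emits _XX_<original_var_name>_nnnn names."""

    def __init__(self):
        # Assumption: nnnn comes from one counter shared by the whole compile.
        self._counter = itertools.count(1)

    def mangle(self, type_tag, original):
        # Zero-pad the counter to four digits, as in the scheme described above.
        return f"_{type_tag}_{original}_{next(self._counter):04d}"

m = Mangler()
a = m.mangle("LI", "i")   # local integer i in one routine
b = m.mangle("LI", "i")   # local integer i in another routine
print(a, b)               # two distinct names for the two i's
```

The original name stays visible in the middle of the mangled name, which keeps the emitted code readable while still letting imcc treat every symbol as unique.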
Personally, I think you should change $ to . if and only if it helps perl (which is not my bag). The rest of us, in the words of Dan, can cope: a little whining is acceptable, if somewhat unbecoming ;-)
Jonathan Worthington writes:
> I would go with the idea of having a sigil that is placed before all local variables, and another (different!) sigil for registers (of the IMCC-handled type). Anything without one of those is a direct register access. Or a syntax error. Clean, simple rules. What the sigils are is relatively immaterial if what is placed after them (for locals, not registers) can contain non-alphanumeric stuff. And whatever sigils a language wants can be put there. This way, name mangling can be "avoided" - though arguably we're defining a syntax that "auto-mangles". :-)
Hooray! That is precisely what sigils are for. No use in making a "variable" sigil that shares its name with a register sigil.
On the other hand, we could define four sigils and do away with the $S358 syntax, like so:
    ?foo    # I register
    +foo    # N register
    ~foo    # S register
    $foo    # P register
(I took the most Perl6ish representative sigils I could think of... doesn't matter what they are, really)
> Jonathan
> [1] (C|S)hould we potentially provide a "quoting" mechanism, e.g. for languages that want variable names containing characters that are not allowed due to IMCC syntax rules? Or is it up to the compiler to emit "compliant" names? And I'm too scared of unicode to mention unicode variable names.
I can envisage something like:
$'%foo' = new PerlHash
But is that all that more readable than:
$Hfoo = new PerlHash
I would say we should definitely go with something like this if registers held more permanent values. But in writing my own compilers, I've found that most of my register naming comes from prefixing a constant string to an incremented counter. Lexical pads already let you do this, and those are the ones that need to.
Basically, I think the current system works fine, but it would be nice to namespace locals somehow.
Melvin Smith <mrjoltc...@mindspring.com> wrote:
> Another option is to use quotes for symbols with sigils,
And we have to cope with unicode eventually. So I'd vote for that alternative. *But* as code normally comes out of a compiler and there may be many different compilers, we can't deal with arbitrary symbols, because we don't know the scoping rules of these compilers.
We can only deal with mangled symbol names.
my $i; { my $i ; }
> (And remember namespaces when considering solutions)
where C<name> is a mangled symbol name like now or even C<$P\d+>. For spilling, we have to know whether the symbol is a temporary or not. Lexicals and globals have their store in the lex pad or in the stash, so for spilling we don't have to store these variables; we only need to refetch them, where we now fetch from the spill array.
The .lexical and .global directives should use the appropriate lexical or global opcodes to deal with these symbols.
The unmangled name is just for diagnostics and will be stored in a different packfile segment.
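The spilling distinction described above can be sketched as a small decision function. This is an illustrative Python sketch, not imcc's register allocator; the kind labels (`temp`, `lexical`, `global`) and the action names are assumptions invented for the example.

```python
def spill_plan(kind):
    """Return (action_when_spilled, action_when_needed_again) for a symbol kind."""
    if kind == "temp":
        # Temporaries have no other home, so the value must be saved.
        return ("store_to_spill_array", "fetch_from_spill_array")
    if kind == "lexical":
        # The lex pad already holds the value: nothing to store, just refetch.
        return (None, "refetch_from_lex_pad")
    if kind == "global":
        # Likewise for globals, whose store is the stash.
        return (None, "refetch_from_stash")
    raise ValueError(f"unknown symbol kind: {kind}")

for kind in ("temp", "lexical", "global"):
    print(kind, spill_plan(kind))
```

The point of the sketch is only that the allocator needs the symbol's kind as input, which is why the compiler has to declare which names are lexicals or globals.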
>The request, mainly, is for imcc to handle sigil characters from other languages which basically equates to exposing a lot to imcc from the high-level language.
If you're looking for a "How do I use $foo in my imcc code?" then I have one of two answers:
1) You don't, doofus. Go fetch it out of the symbol table by name, with
var1 = global [foo; bar] "$foo"
or
var1 = local "$foo"
2) .alias is your friend!
.alias some_nice_symbol global [foo;bar] "$foo"
.alias some_other_symbol local "$bar"
Either way, I don't think IMCC should have to deal with language symbols explicitly. -- Dan
--------------------------------------"it's like this"------------------- Dan Sugalski even samurai d...@sidhe.org have teddy bears and even teddy bears get drunk
Dan Sugalski wrote:
> Either way, I don't think IMCC should have to deal with language symbols explicitly.
That's true. But still we need to know *what are* language symbols. I've stated several times that for the spilling code it's essential to know whether a symbol already has a store in either lexicals or globals, so that we can just cut down the life range of such symbols if spilling is needed.
>>Either way, I don't think IMCC should have to deal with language symbols explicitly.
>That's true. But still we need to know *what are* language symbols. I've stated several times that for the spilling code it's essential to know whether a symbol already has a store in either lexicals or globals, so that we can just cut down the life range of such symbols if spilling is needed.
Right, hence the option to either use global/local (or something like that) to load into safely named things, or adding in .alias to rename them to something safe.
Or, I suppose, we could go and move IMCC over to being AST-driven, in which case it turns into a simple text->AST mapping problem... :) -- Dan