On 5/13/2021 3:01 PM, Stefan Monnier wrote:
>>>> Similarly, the loss of a few bits off the low end of a double tends to be
>>>> pretty much invisible in most cases, ...
>>> FWIW, I don't know of any language implementation which uses the
>>> low-bits of floats for tags (and hence gives up on IEEE-floats
>>> semantics).
>> Specifics of IEEE semantics matters as far as it tends to have a visible
>> effect on the results. If the results of the deviation are mostly invisible,
>> it is "generally acceptable".
>
> I think it's only "generically acceptable" in the sense that most casual
> users won't notice, but most real/serious users of floating point will
> sooner or later bump into and get *really* frustrated. And once this
> becomes known, relevant users will simply turn to other
> languages/implementations.
>
> IOW it works as long as your language implementation is toy/experimental.
> E.g. It would have worked for the original implementation of Javascript, but
> in today's uses of Javascript it would be a non-starter.
>
> It might work if you provide both IEEE float and "cheap/fast floats"
> under separate types, but sooner or later you'll end up discovering that
> there's enough pressure to make IEEE floats efficient that the
> "cheap/fast floats" aren't worth the trouble any more.
>
For most of the stuff I am doing, it seems to work without issue.
>>> It's definitely not an insane idea in theory, but "non-IEEE floats" are
>>> really not popular, it seems (and the fact that truncating the last few
>>> bits means basically "double rounding" doesn't help, of course).
>> Tagbits for flonums was semi-common AFAIK (at least more so than putting
>> everything else in a NaN).
>
> As I said, I'm not familiar with any implementation doing that (of
> course, I'm only familiar with a small part of the proglang world), so
> I'd be curious to see some references.
>
> E.g. as far as know all the Common-Lisp implementation use "flonums"
> represented as boxed floats (e.g. the `+` operation allocates a heap
> object in which to place the result when it's a float and return
> a (tagged) pointer to it).
>
It is hard to find much of an overarching survey of these matters;
people don't seem inclined to list and describe this stuff in one
place. From what I can gather:
SpiderMonkey: Uses NaN tagging
Chrome V8: Uses heap boxing
CPython: Pointer to heap object (no tagrefs)
Erlang: Apparently also uses heap boxing
...
There was a Scheme VM that packed single-precision floats inside a
64-bit tagged value.
I also saw a Scheme VM which used a variant of NaN tagging where the
value was XORed with the NaN bit-pattern, such that the pointer case
came out looking like a bare pointer.
...
This leaves BGBCC (and the BJX2 port of my BS2 language) as something
of an outlier, it seems, in using a right-shifted value which discards
the low-order 2 bits of the double.
Note that the x86-64 BGBScript2 VM had used a different scheme:
middle-range Double values were represented at full precision, while
values with very large or very small exponents were shifted right by
4 bits. For the BJX2 port, a uniform 2-bit right shift was simpler and
faster.
Also, the BJX2 core is pretty slow (50MHz), so I don't want to do
type-tagging on it in a way so slow as to make dynamically typed
operations basically unusable.
As noted, older VMs of mine (including early forms of the original
BGBScript VM) had generally used heap boxing; however, I moved away
from it for performance reasons.
Some versions had also used a "float24" format which consisted of a
single-precision float with the low 8 bits cut off and shoved into a
pointer. The *suck* with this one was pretty obvious though, so it was
fairly short-lived...
As noted, during the later evolution of this VM, it had moved over to a
primarily statically-typed / type-inferred core (loosely comparable to
the Java VM). The original BGBScript2 VM was also in a similar category
(using an explicitly-typed stack-oriented bytecode).
Some amount of the VM design in both VMs was influenced by the JVM, .NET
CLR, and also the IA-64 C++ ABI spec.
In the BS VMs, pointers remained tagged by default, whereas most static
(non-variant) values were stored in a non-tagged form. Generally, they
were built on a JVM-like type model (ILFDAX/ILDAX:
Int/Long/(Float)/Double/Address/X128).
Most other types were treated as subtypes:
Int: char, short, int, ...
Long: long, long long, ...
Float: Float, Half
Double: Double
Address: pointers, object, variant, ...
X128: Int128, Float128, Vec4F, ...
Variant types remained as a special case, used either on explicit
request (in BS2, and in C with extensions), or when type-inference
failed and no explicit type was given.
As can be noted, my BJX2 emulator also uses roughly similar technology
to what was used in the later BGBScript VMs.
Also, BGBCC (my C compiler) was originally itself a fork of one of the
earlier BGBScript VMs, and has sort of developed in its own way along
a similar path.
As noted, it can also function as a BGBScript2 compiler (both C and BS2
share the same backend codegen and similar on this target).
If I did implement BS1L, though, it would in some ways be partly a
regression to a much earlier form of the language. Though, as noted, it
still needs to be fast and lightweight enough to hopefully be usable on
something running at 50MHz with a relatively modest amount of RAM.
It is partly assumed that BS2 could serve as a bridge between C and
BS1L code (since BS1L would be built on a subset of BS2's type system).
Also, some cases are still coming up where some form of scripting
language could be useful (cases which don't fit ideally with static
compilation), but I also can't really afford a big/complicated VM on
this target.
...