On 5/11/2012 6:27 PM, Joe Kesselman wrote:
> On 5/11/2012 8:29 PM, BGB wrote:
>> in the case of the compiler ASTs, a DOM-like system was used internally,
>> rather than raw structures.
>
> Personally I would do a custom datastructure and give it an XML
> serializer, or some other adapter layer that lets you view it in terms
> of an XML infoset -- because trying to shove things into DOM form is
> going to be much less memory-efficient and slower to access than a more
> dedicated representation would be.
>
actually, at one point I had an interpreter which worked by directly
interpreting said ASTs in DOM form, and yes, it was slow...
I don't actually know just how slow it was, but I realize now that an
earlier Scheme interpreter of mine, which was running "fast" in
comparison (of the naive "directly execute source expressions" variety),
was in fact running 10,000x slower than native. I suspect this thing was
very possibly around 100k or 1M times slower than native... (then again,
at the time, it was also using a memory manager where every type-check
involved a linear search over the entire heap, ...).
the thing was basically a hack where I had written a parser which parsed
a JavaScript-like syntax into DOM nodes, and fed it into a hacked-up
XML-RPC implementation.
this incredible slowness led me to later switch over to "wordcode" (like
bytecode except an array of 16-bit shorts), and later over to bytecode.
(later on I also switched from using bytecode to internally using
threaded-code, but bytecode remains as the "canonical" representation).
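
(to make the "wordcode" idea concrete, here is a minimal sketch of a
stack interpreter whose program is an array of 16-bit words. the opcode
names, the stack size, and the function name wc_run are all made up for
illustration and are not taken from the actual VM. a threaded-code
version would replace the switch with an array of function or label
pointers, which is not shown here.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; not taken from the original VM. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 64

/* Run a "wordcode" program: every opcode and every operand is one
   16-bit word, so the fetch step is a plain array index. */
int32_t wc_run(const uint16_t *code)
{
    int32_t stack[STACK_MAX];
    size_t sp = 0;              /* number of values on the stack */
    size_t pc = 0;              /* index of the next word to fetch */

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:           /* the operand word follows the opcode */
            assert(sp < STACK_MAX);
            stack[sp++] = (int16_t)code[pc++];
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:           /* result is the top of the stack */
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

a program is then just a word array such as
{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }.
the main difference from walking DOM nodes is that each step is a
single array read plus a switch, with no tag or attribute lookups.
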
both then and now, a fair amount of type-checking is done using strings
and "strcmp()", as most types are identified by name (this strategy won
out due to being most convenient, and not actually all that expensive), ...
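
(a minimal sketch of name-based type-checking, assuming each object
carries a pointer to its type name. the struct layout and the name
obj_is_type are hypothetical. the pointer comparison before strcmp is
an added shortcut, not something the description above states.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object header: each value carries the name of its type. */
typedef struct {
    const char *type_name;
} obj_header;

/* Return nonzero if obj has the named type. */
int obj_is_type(const obj_header *obj, const char *name)
{
    assert(obj != NULL && name != NULL);
    /* Identical pointers mean identical names (assumed shortcut);
       otherwise fall back to comparing the characters. */
    return obj->type_name == name || strcmp(obj->type_name, name) == 0;
}
```

type names tend to be short and usually differ within the first few
characters, so most failed comparisons return early.
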
now the interpreter is much faster, so performance is no longer a major
issue.
as-is (in the present), yes, those ASTs can chew through memory
(especially for the C compiler front-end), but the present
implementation has had a fair amount of optimization, and so performance
doesn't actually seem to be all that bad in this case (the XML-related
code is not a significant time-waster in the profiler, including for my
C compiler front-end, which is the main place the XML-based ASTs are
still used).
granted, yes, there is some internal trickery, like the attributes can
encode numbers directly (as doubles), rather than representing them as
strings, ...
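
(a minimal sketch of what such an attribute could look like, assuming a
simple linked list of attributes per node. the struct layouts and the
names node_find_attr and node_get_int are hypothetical stand-ins; the
real dyx* API and its signatures are not shown in this thread.)

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute: the value is either a string or a raw double. */
typedef struct attr {
    const char  *name;
    int          is_num;    /* nonzero: 'num' is valid, else 'str' is */
    double       num;
    const char  *str;
    struct attr *next;
} attr;

/* Hypothetical node: a tag plus a list of attributes. */
typedef struct node {
    const char *tag;
    attr       *attrs;
} node;

/* Find an attribute by name; returns NULL if the node has none. */
const attr *node_find_attr(const node *n, const char *name)
{
    const attr *a;
    assert(n != NULL && name != NULL);
    for (a = n->attrs; a != NULL; a = a->next)
        if (strcmp(a->name, name) == 0)
            return a;
    return NULL;
}

/* Fetch a numeric attribute as an int. No string parsing is needed,
   since the number is stored directly. Returns dflt when the attribute
   is missing or holds a string. */
int node_get_int(const node *n, const char *name, int dflt)
{
    const attr *a = node_find_attr(n, name);
    if (a == NULL || !a->is_num)
        return dflt;
    return (int)a->num;
}
```

the point of the trick is that a numeric read costs one name lookup and
one cast, rather than a lookup plus a strtod()-style parse each time.
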
luckily, RAM use isn't really a huge issue on modern systems.
I also don't really feel that raw structs would offer all that much
advantage in this case, since although it is a little easier to access
fields, the drawback is that different nodes would likely need different
structs, which would create additional issues related to serialization.
in terms of tradeoffs, there is not that much of a usability advantage
to 'node->value' over 'dyxGeti(node, "value")', so it may well be a
reasonable tradeoff...
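
(to illustrate the serialization point: with generic nodes, a single
routine can dump any node type, whereas per-type structs would each
need their own printing code. this is only a sketch under assumed
struct layouts; node_print and the other names are hypothetical, and
it does no escaping of special characters in attribute values.)

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic node and attribute layouts. */
typedef struct attr {
    const char  *name;
    const char  *value;
    struct attr *next;
} attr;

typedef struct node {
    const char  *tag;
    attr        *attrs;
    struct node *first_child;
    struct node *next_sibling;
} node;

/* Append s at *pos without writing past cap - 1; *pos keeps counting
   even when the buffer is full, so truncation can be detected. */
static void put(char *buf, size_t cap, size_t *pos, const char *s)
{
    for (; *s != '\0'; s++, (*pos)++)
        if (*pos + 1 < cap)
            buf[*pos] = *s;
}

static void print_rec(const node *n, char *buf, size_t cap, size_t *pos)
{
    const attr *a;
    const node *c;

    put(buf, cap, pos, "<");
    put(buf, cap, pos, n->tag);
    for (a = n->attrs; a != NULL; a = a->next) {
        put(buf, cap, pos, " ");
        put(buf, cap, pos, a->name);
        put(buf, cap, pos, "=\"");
        put(buf, cap, pos, a->value);   /* note: no escaping here */
        put(buf, cap, pos, "\"");
    }
    if (n->first_child == NULL) {
        put(buf, cap, pos, "/>");
        return;
    }
    put(buf, cap, pos, ">");
    for (c = n->first_child; c != NULL; c = c->next_sibling)
        print_rec(c, buf, cap, pos);
    put(buf, cap, pos, "</");
    put(buf, cap, pos, n->tag);
    put(buf, cap, pos, ">");
}

/* Print n and its subtree as XML-like text into buf. Returns the length
   the full output needs; a result >= cap means it was truncated. */
size_t node_print(const node *n, char *buf, size_t cap)
{
    size_t pos = 0;
    assert(n != NULL && (buf != NULL || cap == 0));
    print_rec(n, buf, cap, &pos);
    if (cap > 0)
        buf[pos < cap ? pos : cap - 1] = '\0';
    return pos;
}
```

nothing in node_print depends on which tags or attributes exist, so
adding a new node type requires no changes to the dumper.
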
>> yes, but note the original stated purpose:
>> mostly for humans looking over debugging dumps.
>
> If it's for the humans, they will want to be able to use their preferred
> existing XML tools to process those dumps -- otherwise there's no
> advantage to using XML at all, and you might as well use whatever
> nonportable custom representation you prefer... which will probably be
> more readable than raw XML syntax since you can tune it for the needs of
> that specific task.
>
> Or, as a compromise, output XML and then provide a tool which translates
> it into your compact human-readable representation. Then folks who want
> to use a text editor to view your version can use that tool, while others
> who prefer an editor which manipulates the XML tree -- or who want to
> use a stylesheet to render the data into another representation entirely
> -- will have that option.
>
it is possible, but as noted it would probably have been option-enabled
anyway, meaning that even if supported, some action would probably be
needed to enable it (and it could also be turned back off again, probably
by an option which could be put into a config file or similar).
>>> Finally: XML's greatest value is that there are lots of tools already in
>>> place that support it. This won't be true of any new syntax.
>>>
>>
>> doesn't particularly matter in this case:
>
> XML is just another tool, and no tool is right for all purposes.
> Screwdrivers make poor hammers. Hammers make worse screwdrivers. If
> interoperability and toolability isn't your goal, XML may not be
> relevant for you; do what makes sense for your task.
>
fair enough, it is just used for this part of the system.
> I have no opinion on the suggested syntax as a representation for
> non-XML trees; I tend to either use raw data or indentation and/or
> delimiters (Lisp/Scheme parens, Algol-family braces, whatever). How well
> your proposal works is going to depend heavily on what kinds of data
> you're presenting and what people are trying to extract from it.
>
as noted before, it would be used for printing the internal DOM-like nodes.
given I am already using a system which is internally XML-based,
sticking with an XML-like syntax would make sense (or, at least,
something composed of tags and attributes). switching out to something
radically different would be a fairly major alteration.
many other parts of the system use a Lisp-like form, but they also use a
different representation internally as well (lists composed of cons-cells).
sadly, at present, parts of my VM which use S-Expressions for ASTs and
parts which use XML-based ASTs are largely incompatible.
it would be nice sometimes if it were one or the other, but neither is
"clearly better" (S-Expressions are faster, but not very flexible, and
XML is more flexible, but also a little slower and more awkward to work
with). similarly, there is no known good way to merge them without
creating a horrible mess.
ironically, when S-Expressions are organized into a tagged structure
(similar to XML), they actually seem to use more memory than the
equivalent in XML/DOM-style nodes...
so, no ideal solutions here...