yeah.
this is also why the explicit-endian variables may be implicitly
misaligned. it would also be possible to specify explicit alignment via
an attribute, but this isn't currently supported.
the implementation generally glosses over whatever the hardware does
(so, it can still do unaligned operations even on targets which only
support aligned loads/stores, by internally using byte operations and
shifts).
currently, my C compiler/VM doesn't do SIMD, but this might get
implemented eventually.
the keywords were '__bigendian' and '__ltlendian'.
ex:
__bigendian int x;
or:
i=*(__bigendian int *)ptr;
possible explicit alignment value:
__declspec(align(2)) __bigendian int x;
or (if another keyword were added):
__align(2) __bigendian int x;
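as a rough sketch of what the compiler might emit for a '__bigendian
int' load on a target without unaligned access support: byte loads plus
shifts, which are both alignment- and endian-agnostic (the function name
here is made up for illustration, not what the compiler actually emits):

```c
#include <stdint.h>

/* illustrative lowering of a '__bigendian int' load: assemble the value
 * from individual bytes, so it works regardless of target endianness or
 * pointer alignment. */
static int32_t load_bigendian_i32(const void *ptr)
{
    const uint8_t *p = (const uint8_t *)ptr;
    return (int32_t)(((uint32_t)p[0] << 24) |
                     ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] <<  8) |
                      (uint32_t)p[3]);
}
```

the cost is a few extra instructions per access, which is why a compiler
would only emit this form when the target can't do it natively.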
>> typically, for compile-time stuff, there is a mess of #define's and
>> #ifdef's for figuring out the target architecture and other things,
>> picking appropriate type-sizes and setting values for things like
>> whether or not the target supports misaligned access, ...
>>
>> I guess it could be nicer if more of this were standardized.
>
> Yes, preprocessor metaprogramming is not much better than template
> metaprogramming if to read it. So these things are better to keep
> away from code, in a separate library.
>
usually, this is a 'whatever_conf.h' file which is copied/pasted around,
sometimes with tweaks.
it also does explicit-size types, partly because I often use a compiler
which until very recently did not support 'stdint' stuff...
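a typical fragment of such a header looks something like the following
(names and the exact #ifdef ladder here are illustrative, not the actual
file):

```c
/* sketch of a 'whatever_conf.h' style header: #ifdef's picking
 * explicit-size types and recording target properties. */

#if defined(_MSC_VER)
typedef unsigned __int32 conf_u32;
typedef unsigned __int64 conf_u64;
#else
typedef unsigned int conf_u32;
typedef unsigned long long conf_u64;
#endif

#if defined(__i386__) || defined(__x86_64__) || \
    defined(_M_IX86) || defined(_M_X64)
#define CONF_MISALIGNED_OK 1  /* target tolerates misaligned access */
#else
#define CONF_MISALIGNED_OK 0  /* play it safe: byte loads + shifts */
#endif
```

code elsewhere then checks CONF_MISALIGNED_OK to decide between a direct
load and the byte-at-a-time fallback.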
offsets are used because this saves needing to scan over the payload
lumps and rebuild the table. granted, the cost of this probably wouldn't
be huge, but the table will be needed either way, and if it is already
present in the data, it doesn't need to be built.
note that the file is not read-in/loaded sequentially, but is basically
loaded into the address space and used in-place.
it is Tag/Length/Value, yes.
the format is vaguely akin to if IFF and ASN.1 BER were fused together.
ASN.1 style tags are used mostly for compact structures (with typically
about 2 bytes of overhead), with TWOCC for intermediate structures (4 or
6 bytes overhead), and FOURCC for top-level structures (8 or 12 bytes of
overhead).
* 0x00-0x1F: Public Primitive (Class=0)
* 0x20-0x3F: Public Composite (Class=1)
* 0x40-0x5F: Private Primitive (Class=2)
* 0x60-0x7F: Private Composite (Class=3)
* 0x80-0x9F: Context Primitive (Class=4)
* 0xA0-0xBF: Context Composite (Class=5)
** bit layout ccct-tttt:
*** ccc = 3-bit class, ttttt = 5-bit tag
*** tag=0..30, tag encoded directly
*** tag=31, tag is escape coded.
* 0xC0-0xDF: Reserved
* 0xE0-0xFF: Special Markers
** 0xE0, End Of Data
** 0xE1, len:WORD24
** 0xE2, len:BYTE
*** Context Dependent Untagged Data
** 0xE3, len:WORD24, tag:TWOCC
** 0xE4, len:WORD24, tag:FOURCC
** 0xE5, len:BYTE, tag:TWOCC
** 0xE6, len:WORD56, tag:FOURCC
** 0xE7, len:WORD24, tag:EIGHTCC
** 0xE8, len:WORD24, tag:SIXTEENCC
** 0xE9, len:WORD56, tag:EIGHTCC
** 0xEA, len:WORD56, tag:SIXTEENCC
*** Tagged Markers
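decoding the leading byte per the table above might look roughly like
this (struct and function names are made up for illustration; the real
format obviously has more to it, like reading the escape-coded tags and
the length fields that follow the markers):

```c
#include <stdint.h>

/* illustrative decoder for the leading tag byte:
 * 0x00-0xBF: ccct-tttt (ccc=class 0..5, ttttt=tag, 31=escape coded)
 * 0xC0-0xDF: reserved
 * 0xE0-0xFF: special markers */
typedef struct {
    int cls;         /* class 0..5, or -1 for reserved */
    int tag;         /* direct tag, or the marker byte itself */
    int is_marker;   /* byte was in the 0xE0..0xFF range */
    int tag_escaped; /* tag==31: real tag follows, escape coded */
} TagByte;

static TagByte decode_tag_byte(uint8_t b)
{
    TagByte t = {0, 0, 0, 0};
    if (b >= 0xE0) {          /* special markers */
        t.is_marker = 1;
        t.tag = b;
    } else if (b >= 0xC0) {   /* reserved range */
        t.cls = -1;
    } else {
        t.cls = (b >> 5) & 7; /* ccc: class */
        t.tag = b & 31;       /* ttttt: tag 0..30 direct */
        if (t.tag == 31)
            t.tag_escaped = 1;
    }
    return t;
}
```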
I have used variations on this design for a number of formats (it
actually started out mostly in some of my video codecs).
>>
>> *: the VM needs to be able to keep timing latencies bounded, which
>> basically weighs against doing anything in the VM where the time-cost
>> can't be easily predicted in advance. wherever possible, all operations
>> need to be kept O(1), with the operation either being able to complete
>> in the available time-step (generally 1us per "trace"), or the VM will
>> need to halt and defer execution until later (blocking is not allowed,
>> and any operations which may result in unexpected behaviors, such as
>> halting, throwing an exception, ... effectively need to terminate the
>> current trace, which makes them more expensive).
>>
>> for some related reasons, the VM is also using B-Trees rather than hash
>> tables in a few places (more predictable, if slower, than hashes, but
>> less memory waste than AVL or BST variants). likewise, because of their
>> structure, it is possible to predict in advance (based on the size of
>> the tree) approximately how long it will take to perform the operation.
>
> One interesting tree is splay that often works best in real application.
>
yeah.
splay trees are fast on average, but not necessarily all that
predictable or memory efficient (they are more like AVL in the
memory-efficiency sense, and leave open the possibility of an O(n)
worst case for a single operation).
B-Tree is not necessarily the fastest option, but should be fairly
predictable in this case.
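the predictability claim can be made concrete: for a B-Tree with minimum
degree t, the height is bounded by log_t((n+1)/2), so a worst-case step
count can be computed from the element count alone, before the operation
starts. a sketch (illustrative, not the VM's actual code):

```c
#include <stddef.h>

/* bound a B-Tree operation's cost from the element count alone.
 * with minimum degree t (each non-root node holds >= t-1 keys),
 * height h satisfies h <= log_t((n+1)/2); the loop computes the
 * floor of that logarithm by repeated division. */
static int btree_height_bound(size_t n, int t)
{
    int h = 0;
    size_t m = (n + 1) / 2;
    while (m >= (size_t)t) {
        m /= (size_t)t;
        h++;
    }
    return h;
}
```

the VM could check this bound against the remaining time budget and
defer the operation to the next trace if it can't complete in time.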
like, you really don't want a random edge-case throwing a wrench in the
timing, and fouling up external electronics by not updating the IO pins
at the correct times or something.
never mind that it is questionable to do high-level logic and also deal
with hardware-level IO on the same processor core, but alas... (why have
a separate ARM chip and an MCU, when you can save some money by just
doing everything on the main ARM chip?...).
>>> Above can't be done with embedded system so easily since it can affect price
>>> of unit to organize flashing the very same product differently for each
>>> market targeted. When access is sequential then polished Huffman decoding
>>> does actually rarely affect performance. So I have seen embedded systems
>>> keeping the text dictionaries Huffman encoded all time. If to keep texts
>>> Huffman encoded anyway then UCS-2 or UTF-16 are perfectly fine and there
>>> are no need for archaic tricks like Windows-1252 or Code Page 437.
>>
>> granted, but in this case, it is mostly for string literals, rather than
>> bulk text storage.
>
> The strings of application do not typically form some sort of bulk but a
> sort of dictionary of short texts.
>
not sure I follow.
in the case of the VM, there are basically a pair of string tables, one
for ASCII and UTF-8, and the other for UTF-16. each string is terminated
with a NUL character.
for the ASCII table, string references are given as byte offsets, with
how the string is encoded depending on context.
if C is compiled to the VM, it really doesn't care, since it has good
old 'char *', so will use pointers into the string table.
the script-language does care, but will implicitly declare the type as
part of the process of loading the string into a register (and, in this
case, the VM will remember the type of string).
though, granted, in this VM, the string type will be handled by
implicitly using tag bits in the reference. this allows using a
reference directly to the string-table memory, without needing an
intermediate structure (so, it is sort of like a normal 'char *'
pointer, just with a hidden type-tag in the reference).
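one way such a tagged reference could work (the tag layout and names
here are my own illustration, not the VM's actual encoding): if
string-table entries are at least 4-byte aligned, the low 2 bits of the
reference are always zero and can carry the string type:

```c
#include <stdint.h>

/* hide a string-type tag in the low bits of the reference itself,
 * assuming string-table entries are >= 4-byte aligned so the low
 * 2 bits are free. no intermediate structure is needed; masking
 * the tag off recovers a plain pointer. */
#define STRTAG_ASCII 0u
#define STRTAG_UTF8  1u
#define STRTAG_UTF16 2u

static uintptr_t strref_make(const void *p, unsigned tag)
    { return (uintptr_t)p | (uintptr_t)(tag & 3u); }

static unsigned strref_tag(uintptr_t r)
    { return (unsigned)(r & 3u); }

static const void *strref_ptr(uintptr_t r)
    { return (const void *)(r & ~(uintptr_t)3u); }
```

the C side can ignore the scheme entirely and use raw 'char *' pointers;
only the script-language side ever looks at the tag bits.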
>> Windows-1252 covers most general use-cases for text (and is fairly easy
>> to convert to/from UTF-16, as for most of the range the characters map 1:1).
>> CP-437 is good mostly for things like ASCII art and text-based UIs.
>>
>> for literals, it will be the job of the compiler to sort out which
>> format to use.
>>
>> bulk storage will tend to remain in compressed UTF-8.
>> though a more specialized format could be good.
>>
>> I had good results before compressing short fragments (such as character
>> strings) with a combination of LZ77 and MTF+Rice Coding, which for small
>> pieces of data did significantly better than Deflate or LZMA. however,
>> the MTF makes it slower per-character than a Huffman-based option.
>>
>> basically, options like Deflate or LZMA are largely ineffective for
>> payloads much under 200-500 bytes or so, but are much more effective as
>> payloads get bigger.
>
> Those packing algorithms are all for larger texts. With relatively short
> one-liners one can perhaps make some special sub-string-packing but it
> will be computationally expensive to pack.
>
well, as noted, I had some limited success with MTF+Rice.
though, there isn't much that can be done with short character strings.
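for reference, the MTF stage looks something like the following minimal
sketch (my own illustration of the general technique, not the codec's
actual code). the per-symbol list shuffle is the part that makes it
slower per character than a table-driven Huffman decoder:

```c
/* minimal move-to-front transform: each input byte is replaced by its
 * position in a recency list, then moved to the front of that list.
 * recently-seen bytes become small output values, which a Rice coder
 * can then store in few bits. */
static void mtf_encode(const unsigned char *src, unsigned char *dst, int n)
{
    unsigned char list[256];
    int i, j;
    for (i = 0; i < 256; i++)
        list[i] = (unsigned char)i;  /* identity initial order */
    for (i = 0; i < n; i++) {
        unsigned char c = src[i];
        for (j = 0; list[j] != c; j++)
            ;                        /* find current rank of c */
        dst[i] = (unsigned char)j;   /* emit the rank */
        for (; j > 0; j--)
            list[j] = list[j - 1];   /* shift up, move c to front */
        list[0] = c;
    }
}
```

the inner scan-and-shift is O(256) worst case per symbol, versus a
couple of table lookups for Huffman, which matches the speed tradeoff
noted above.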