The Answer to Life, the Universe, and Everything


Quadibloc

Jun 11, 2022, 9:31:48 PM
One advantage to 24-bit computers is that in addition to handling 6-bit
upper-case only character codes, they can also fit three eight-bit
characters in a word, and so use modern character codes like ASCII
and EBCDIC that have room for lower-case letters!

It's true that they eventually found a use for the extra eighth bit
in a byte with ASCII, by going to the 8859-1 code, with accented
letters for a lot of foreign languages. Or one could use the high
bit to indicate two-byte characters in a DBCS for Chinese or
Japanese.

But many computer systems didn't bother much with furrin
languages. And so there's that wasted bit! So instead of a
24-bit word that handles both 6-bit characters and 8-bit
characters...

maybe we should want a computer that can handle 6-bit
characters and 7-bit characters! (Especially if four-bit
BCD-coded decimal digits are not a concern.) So perhaps
there should have been more computers with a 42-bit
word length. (ASI did make _one_ computer with a 21-bit
word length...)

John Savard

Stefan Monnier

Jun 11, 2022, 11:41:40 PM
Quadibloc [2022-06-11 18:31:46] wrote:
> maybe we should want a computer that can handle 6-bit
> characters and 7-bit characters! (Especially if four-bit
> BCD-coded decimal digits are not a concern.) So perhaps
> there should have been more computers with a 42-bit
> word length. (ASI did make _one_ computer with a 21-bit
> word length...)

Fractional bit length anyone?


Stefan

Michael S

Jun 12, 2022, 9:05:56 AM
Itanium (42.(6) bits per instruction) appears to satisfy both yours and John's wishes simultaneously.

Stefan Monnier

Jun 12, 2022, 10:36:04 AM
Michael S [2022-06-12 06:05:54] wrote:
> Itanium (42.(6) bits per instruction) appears to satisfy both yours and
> John's wishes simultaneously.

Then I guess the Next Frontier is irrational bit length, then?
Unless Itanium's demise means we have to re-invent fractional bit
length first? Or should we go straight to imaginary bit length?


Stefan

David Brown

Jun 12, 2022, 10:58:04 AM
Such machines would be of no use without software support. Fortunately,
there is already a proposal for extending integer types in C that will
cover this (and more):

<https://open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0989r0.pdf>

Stefan Monnier

Jun 12, 2022, 11:25:56 AM
David Brown [2022-06-12 16:58:00] wrote:
> Such machines would be of no use without software support. Fortunately,
> there is already a proposal for extending integer types in C that will cover
> this (and more) :
>
> <https://open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0989r0.pdf>

Sadly, it hasn't yet made it into a standard; furthermore, this was being
discussed for C++, so it will probably be many more years before it
makes it into C.


Stefan

Ivan Godard

Jun 12, 2022, 12:40:26 PM
Some languages have integral types with arbitrary static bounds, with
bounds checking. Mary is the only one I know that had biased bounds, so that
    type twentieth range(1900, 1999);
fits in a byte.
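For readers unfamiliar with the idea, here is a minimal C sketch of biased representation (the names encode/decode are illustrative, not from Mary):

```c
#include <assert.h>
#include <stdint.h>

/* Biased representation: a value in 1900..1999 is stored as
   (value - 1900), so it fits in one byte (indeed, in 7 bits). */
typedef uint8_t twentieth_rep;

static twentieth_rep encode(int year) {
    assert(year >= 1900 && year <= 1999);   /* bounds check */
    return (twentieth_rep)(year - 1900);    /* subtract the bias */
}

static int decode(twentieth_rep r) {
    return 1900 + r;                        /* add the bias back */
}
```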

MitchAlsup

Jun 12, 2022, 1:02:28 PM
PL/I had widths applied and range checked.
Ada has an entire system of width application and range checking.

Ivan Godard

Jun 12, 2022, 2:02:05 PM
But not with biased representation as far as I know; please correct me.

Niklas Holsti

Jun 12, 2022, 2:03:10 PM
On 2022-06-12 19:40, Ivan Godard wrote:
> On 6/12/2022 7:58 AM, David Brown wrote:
>> On 12/06/2022 05:41, Stefan Monnier wrote:
>>> Quadibloc [2022-06-11 18:31:46] wrote:
>>>> maybe we should want a computer that can handle 6-bit
>>>> characters and 7-bit characters! (Especially if four-bit
>>>> BCD-coded decimal digits are not a concern.) So perhaps
>>>> there should have been more computers with a 42-bit
>>>> word length. (ASI did make _one_ computer with a 21-bit
>>>> word length...)
>>>
>>> Fractional bit length anyone?
>>>
>>
>> Such machines would be of no use without software support.
>> Fortunately, there is already a proposal for extending integer types
>> in C that will cover this (and more) :
>>
>> <https://open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0989r0.pdf>


Good fun, that.


> Some languages have integral types with arbitrary static bounds, with
> bounds checking. Mary is the only one I know that had biased bounds:
>     type twentieth range(1900, 1999);
> fit in a byte.


Some Ada compilers do biased representations, too, but I believe it is
not a requirement of the Ada standard.

Example source file twenth.ads:

package twenth is
   type twentieth is range 1900 .. 1999 with Size => 8;
end twenth;

Compiling with the gcc-based GNAT Ada compiler, using "gnatmake twenth.ads":

gcc -c twenth.ads
twenth.ads:2:46: warning: size clause forces biased representation for "twentieth"

This works also with a Size of 7 bits, the smallest possible size for
this numeric range even with bias.

Ivan Godard

Jun 12, 2022, 2:14:01 PM
I stand corrected. This didn't exist the last time I had anything to do
with Ada; thank you.

MitchAlsup

Jun 12, 2022, 2:33:52 PM
Ada has the equivalent of "type twentieth range(1900, 1999);".

David Brown

Jun 12, 2022, 3:29:03 PM
On 12/06/2022 20:03, Niklas Holsti wrote:
> On 2022-06-12 19:40, Ivan Godard wrote:
>> On 6/12/2022 7:58 AM, David Brown wrote:
>>> On 12/06/2022 05:41, Stefan Monnier wrote:
>>>> Quadibloc [2022-06-11 18:31:46] wrote:
>>>>> maybe we should want a computer that can handle 6-bit
>>>>> characters and 7-bit characters! (Especially if four-bit
>>>>> BCD-coded decimal digits are not a concern.) So perhaps
>>>>> there should have been more computers with a 42-bit
>>>>> word length. (ASI did make _one_ computer with a 21-bit
>>>>> word length...)
>>>>
>>>> Fractional bit length anyone?
>>>>
>>>
>>> Such machines would be of no use without software support.
>>> Fortunately, there is already a proposal for extending integer types
>>> in C that will cover this (and more) :
>>>
>>> <https://open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0989r0.pdf>
>
>
> Good fun, that.
>

I expect many people read the starting sections, taking it somewhat
seriously - but only a few bother to get towards the end or look at the
publication date!


Quadibloc

Jun 12, 2022, 4:47:08 PM
On Sunday, June 12, 2022 at 1:29:03 PM UTC-6, David Brown wrote:

> I expect many people read the starting sections, taking it somewhat
> seriously - but only a few bother to get towards the end or look at the
> publication date!

2018-04-01
which is April 1, 2018.

Oh, dear, then the integer types in C won't get extended!

However, specifying 24-bit integers as "short long int" and extending
that principle further is far too complicated and confusing; one wants
an explicit method to specify the number of bits involved.

John Savard

Marcus

Jun 13, 2022, 12:54:27 AM
Since C99 we have stdint.h, which provides fixed-size integers, e.g.:

int16_t, uint8_t, int64_t, etc.

I doubt that int24_t/uint24_t would find their way into the standard
unless they're supported in HW by all popular ISAs, though.

/Marcus

Marcus

Jun 13, 2022, 4:11:35 AM
On 2022-06-12, Quadibloc wrote:
> One advantage to 24-bit computers is that in addition to handling 6-bit
> upper-case only character codes, they can also fit three eight-bit
> characters in a word, and so use modern character codes like ASCII
> and EBCDIC that have room for lower-case letters!
>
> It's true that they eventually found a use for the extra eighth bit
> in a byte with ASCII, by going to the 8859-1 code, with accented
> letters for a lot of foreign languages. Or one could use the high
> bit to indicate two-byte characters in a DBCS for Chinese or
> Japanese.

UTF-8 is a thing.

> But many computer systems didn't bother much with furrin
> languages. And so there's that wasted bit! So instead of a
> 24-bit word that handles both 6-bit characters and 8-bit
> characters...
>
> maybe we should want a computer that can handle 6-bit
> characters and 7-bit characters! (Especially if four-bit
> BCD-coded decimal digits are not a concern.) So perhaps
> there should have been more computers with a 42-bit
> word length. (ASI did make _one_ computer with a 21-bit
> word length...)

I still consider text processing an "unsolved problem" in modern CPU
architectures and programming languages. Traversing text character-by-
character, and possibly decoding along the way (as with UTF-8, for
certain operations), is just terribly inefficient. The predominant
paradigms in use are best suited for decades old un-pipelined CISC
(memory-memory) architectures.

Ideally you would want to be able to handle (most) text strings as
efficiently as integer numbers. E.g. copy, compare, branch-if-..., etc
with the same cost as a single register-register integer operation.

/Marcus

David Brown

Jun 13, 2022, 4:17:50 AM
The types int24_t/uint24_t /are/ in the standard, and have been since
C99 - along with all other intNN_t and related types for all bit sizes.
However, they are all optional (except int_least8_t, int_least16_t,
int_least32_t, int_least64_t, the matching int_fastNN_t types, and their
unsigned pairs). A C implementation must support the intNN_t type if
and only if it provides a type of that size and characteristic (no
padding bits, two's complement representation) as a standard or extended
integer type.

So if a C compiler for an architecture already provides 24-bit types
(and they turn up in some DSPs and other specialised devices), then
int24_t and uint24_t are required by C99.

(Conversely, an implementation that does /not/ support 8-bit, 16-bit,
32-bit and/or 64-bit types can omit the corresponding types such as
int32_t without breaking conformity with the standards. Again, you
see real-world modern DSPs that do not have int8_t or int16_t types.)


There is also the proposed "_BitInt(N)" feature for the next C standard,
C23. (clang/llvm has an implementation of approximately the same
feature as an extension.) The main target is FPGAs and other situations
where C code is used on specialist hardware. A _BitInt(32) is not
exactly the same as an int32_t (promotion rules are different, for a
start), but it will be very close.

<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2763.pdf>

(This is from June, not April :-) )

Terje Mathisen

Jun 13, 2022, 10:39:11 AM
You can _almost_ do this with SIMD compares to find the first
difference, then a (possibly wide/complicated) lookup of the
non-matching positions via a collating order table to determine <=>.

People have been able to make JSON or XML parsing close to an order of
magnitude faster this way, but it is not intellectually trivial. :-(

OTOH, people consistently misbehave even when comparing simple fp values
as soon as the first NaN turns up.

Terje


--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

anti...@math.uni.wroc.pl

Jun 13, 2022, 7:47:55 PM
Representation is a matter for the compiler. For example, a Pascal
compiler could in principle allocate all numbers on the heap and
represent them by (hidden) pointers; that is a stupid but legal
representation. For range types, biased representation is legal, but
it is a matter for compiler developers to decide whether it is worth
the trouble. AFAIK most (all??) Pascal compilers decided that biased
representation has fewer benefits than drawbacks, but at least for
GNU Pascal it was considered, with the result "we do not want it now".

--
Waldek Hebisch

Quadibloc

Jun 14, 2022, 9:49:41 AM
On Monday, June 13, 2022 at 2:11:35 AM UTC-6, Marcus wrote:

> I still consider text processing an "unsolved problem" in modern CPU
> architectures and programming languages. Traversing text character-by-
> character, and possibly decoding along the way (as with UTF-8, for
> certain operations), is just terribly inefficient. The predominant
> paradigms in use are best suited for decades old un-pipelined CISC
> (memory-memory) architectures.
>
> Ideally you would want to be able to handle (most) text strings as
> efficiently as integer numbers. E.g. copy, compare, branch-if-..., etc
> with the same cost as a single register-register integer operation.

Much of what people want to _do_ with text pretty much has to be done
character by character.

My approach to solving the problem, therefore, is this:

Yes, it would be wasteful to tie up a modern CPU - just as it is wasteful
to tie up a 360/195 - processing text character by character. So, just as
the floating-point unit has its own pipeline, and the integer unit has another
pipeline... send character string instructions off to the character processing
unit.

It handles those instructions in its own slow way, but without tying up
the main CPU which continues to do integer and FP operations at
breakneck speed.

Whether the character box looks like a 6800 or like a 360/85 is a matter
of taste.

John Savard

BGB

Jun 15, 2022, 3:02:27 PM
Yeah, BGBCC already implements _BitInt(N) ...

Semantics are mostly that it either represents the value as one of the
supported native types, or as a large value-type object (for N larger
than 128 bits; with storage padded to a multiple of 128 bits).



Though, FWIW:
BGBCC also implements "short float" as a way to specify scalar Binary16
values.

Though, with the funkiness that Binary16 is 16-bit in memory, but
represented using Binary64 in registers (and in non-aliased local
variables), which has a non-zero risk of funky semantic effects (due to
the much larger dynamic range and precision).

Similar also applies for "float", where in both cases the size and
format of the in-memory representation depends on whether or not someone
took the address of the variable.


However, the difference is large enough that a calculation performed
using Binary16 may produce "noticeably different" results from one
calculated using Binary64 being passed off as Binary16.

For now, I had mostly been hand-waving this as "not really too much
of an issue in practice".

This differs when using SIMD ops, where intermediate results are
actually stored in the specified formats.



There is not currently any C type for FP8 formats (FP8U or FP8S), but I
could potentially add these, say, maybe __float8u_t, __float8s_t.

Though, the use-case for scalar FP8 is unclear; existing use-cases are
mostly as a space-saving alternative to Binary16 vectors.

These could be added as-is with existing instructions, they would mostly
just require 3-op load/store sequences (byte load/store with two
converter ops).


Though, FP8U is used as one of the texture-formats for my LDTEX instruction.

Previously I was going to use RGB30A, but:
FP8U is cheaper to encode/decode than RGBA30A;
For textures with an alpha-channel, FP8U has better quality;
The loss of quality and dynamic range for opaque textures is minor.

FP8U is limited to positive values, but this isn't likely a big issue
for textures. In theory, it is still possible to use RGBA64F textures
for negative pixels.


However, at present, there is no direct (C level) support for FP8 values
(but whether or not they would be useful enough to justify actual types
is debatable, compiler intrinsics being "good enough").


Started debating for a moment whether A-law support could be added, but
this is a little too niche (at most it would maybe justify an intrinsic,
even this is debatable). Encoding A-law via going through the FPU could
probably be faster than the current process of doing it via integer ops
though.

...

Thomas Koenig

Jun 16, 2022, 6:26:27 AM
Terje Mathisen <terje.m...@tmsw.no> schrieb:
> Marcus wrote:

>> I still consider text processing an "unsolved problem" in modern CPU
>> architectures and programming languages. Traversing text character-by-
>> character, and possibly decoding along the way (as with UTF-8, for
>> certain operations), is just terribly inefficient. The predominant
>> paradigms in use are best suited for decades old un-pipelined CISC
>> (memory-memory) architectures.
>>
>> Ideally you would want to be able to handle (most) text strings as
>> efficiently as integer numbers. E.g. copy, compare, branch-if-..., etc
>> with the same cost as a single register-register integer operation.
>
> You can _almost_ do this with SIMD compares to find the first
> difference, then a (possibly wide/complicated) lookup of the
> non-matching positions via a collating order table to determine <=>.

So, what are the instructions that would be needed to make this easier?

A "find first non-matching byte" instruction could help, which
would do an xor of two registers and then count the number of
trailing zero bytes (up to a maximum). This would just be (for 64 bits)
an xor, a bytewise AND and an eight-bit count trailing zeros with a mask
generated by the maximum value.
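A scalar C sketch of that primitive, assuming GCC/clang's __builtin_ctzll and little-endian byte order (the function name is illustrative):

```c
#include <stdint.h>

/* Returns the index (0..7) of the first differing byte between two
   64-bit words loaded little-endian, or 8 if they are equal: an xor,
   then a count of trailing zero bits rounded down to whole bytes. */
static int first_diff_byte(uint64_t a, uint64_t b) {
    uint64_t x = a ^ b;                /* non-zero bytes mark differences */
    if (x == 0)
        return 8;                      /* no difference in these 8 bytes */
    return __builtin_ctzll(x) >> 3;    /* trailing zero bits -> byte index */
}
```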

Range detection might profit from a bytewise "in range" comparison
with an upper and lower bound given in two separate words (iff it is
any cheaper than the straightforward two comparisons and one and).

> People have been able to make JSON or XML parsing close to an order of
> magnitude faster this way, but it is not intellectually trivial. :-(

Coming from you, that sounds rather scary.

> OTOH, people consistently misbehave even when comparing simple fp values
> as soon as the first NaN turns up.

There is the article "Do Programmers Understand IEEE Floating Point?"
with the sobering observation "Many developers do not understand core
floating point behavior particularly well, yet believe they do."
https://users.cs.northwestern.edu/~pdinda/Papers/ipdps18.pdf

Terje Mathisen

Jun 16, 2022, 8:18:30 AM
Thomas Koenig wrote:
> Terje Mathisen <terje.m...@tmsw.no> schrieb:
>> Marcus wrote:
>
>>> I still consider text processing an "unsolved problem" in modern CPU
>>> architectures and programming languages. Traversing text character-by-
>>> character, and possibly decoding along the way (as with UTF-8, for
>>> certain operations), is just terribly inefficient. The predominant
>>> paradigms in use are best suited for decades old un-pipelined CISC
>>> (memory-memory) architectures.
>>>
>>> Ideally you would want to be able to handle (most) text strings as
>>> efficiently as integer numbers. E.g. copy, compare, branch-if-..., etc
>>> with the same cost as a single register-register integer operation.
>>
>> You can _almost_ do this with SIMD compares to find the first
>> difference, then a (possibly wide/complicated) lookup of the
>> non-matching positions via a collating order table to determine <=>.
>
> So, what are the instructions that would be needed to make this easier?
>
> A "find first non-matching byte" instruction could help, which
> would do an xor of two registers and then count the number of
> trailing zero bytes (up to a maximum). This would just be (for 64 bits)
> an xor, a bytewise AND and an eight-bit count trailing zeros with a mask
> generated by the maximum value.

Today you do that with SIMD compare, followed by extract mask (to get a
single bit from each compare) and then a find first 0/1 instruction. In
combination this is a bit too much overhead. Having Mill-style None
markers, and/or a fast way to generate write masks, as well as a
collapsing copy/store which elides None/masked bytes would also be helpful.
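That compare / extract-mask / find-first sequence looks roughly like this in SSE2 intrinsics (a sketch; `first_diff_16` is an illustrative name, and `__builtin_ctz` assumes GCC/clang):

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Index (0..15) of the first differing byte in two 16-byte blocks,
   or 16 if they are equal: SIMD compare, extract mask, find first bit. */
static int first_diff_16(const unsigned char *a, const unsigned char *b) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* movemask gives one bit per lane: 1 where the bytes were equal */
    int eq = _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
    int ne = ~eq & 0xFFFF;             /* bits set where bytes differ */
    return ne ? __builtin_ctz(ne) : 16;
}
```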

Not coincidentally, the Mill does all of these things pretty much
perfectly imho.

>
> Range detection might profit from a bytewise "in range" comparison
> with an upper and lower bound given in two separate words (iff it is
> any cheaper than the straightforward two comparisons and one and).

Yeah.

x86/SIMD got a huge chunk of silicon a decade+ ago, in the form of an
opcode which does 256 (?) simultaneous compare ops. I have never used
it, and don't know why it was included, but you can supposedly use it
for all sorts of strange stuff.
>
>> People have been able to make JSON or XML parsing close to an order of
>> magnitude faster this way, but it is not intellectually trivial. :-(
>
> Coming from you, that sounds rather scary.
>
>> OTOH, people consistently misbehave even when comparing simple fp values
>> as soon as the first NaN turns up.
>
> There is the article "Do Programmers Understand IEEE Floating Point?"
> with the sobering observation "Many developers do not understand core
> floating point behavior particularly well, yet believe they do."
> https://users.cs.northwestern.edu/~pdinda/Papers/ipdps18.pdf

Too true. :-(

Michael S

Jun 16, 2022, 11:43:31 AM
On Thursday, June 16, 2022 at 3:18:30 PM UTC+3, Terje Mathisen wrote:
>
> x86/SIMD got a huge chunk of silicon a decade+ ago, in the form of an
> opcode which does 256 (?) simultaneous compare ops. I have never used
> it, and don't know why it was included, but you can supposedly use it
> for all sorts of strange stuff.
>

"decade+ ago" sounds like SSE4.2 (Nov2008), but I am not sure that the
rest fits.
My impression was that on all Intel or AMD implementations up to this day
SSE4.2 is rather small chunk of silicon, on some cores even tiny.
On "big" cores they tend to do 32 comparisons per clock per execution port,
so when there are 2 or 3 SSE4.2-capable SIMD ALUs per core you can see
throughput figures like 1 instruction per 4 clocks with latency around
10 clocks.
AMD Zen series appears to be the fastest with measured (by AgnerF) throughput
of 1/3 and latency of 8-11 clocks.
Unfortunately, I cannot find any info about the SSE4.2 performance of
Intel Alder Lake P cores (Golden Cove). It's not mentioned in Intel's
otherwise rather comprehensive instruction tables, and Agner has not
published Golden Cove numbers yet.

Back at introduction (Nehalem) there were already two SSE4.2 ALUs, but the
throughput was a little lower than expected - 1 per 5 clocks. I would guess
that in pre-SandyB uArchs Intel had an additional bottleneck around transfer
of results from SIMD to GPRs.

MitchAlsup

Jun 16, 2022, 1:09:32 PM
On Thursday, June 16, 2022 at 5:26:27 AM UTC-5, Thomas Koenig wrote:
> Terje Mathisen <terje.m...@tmsw.no> schrieb:
> > Marcus wrote:
>
> >> I still consider text processing an "unsolved problem" in modern CPU
> >> architectures and programming languages. Traversing text character-by-
> >> character, and possibly decoding along the way (as with UTF-8, for
> >> certain operations), is just terribly inefficient. The predominant
> >> paradigms in use are best suited for decades old un-pipelined CISC
> >> (memory-memory) architectures.
> >>
> >> Ideally you would want to be able to handle (most) text strings as
> >> efficiently as integer numbers. E.g. copy, compare, branch-if-..., etc
> >> with the same cost as a single register-register integer operation.
> >
> > You can _almost_ do this with SIMD compares to find the first
> > difference, then a (possibly wide/complicated) lookup of the
> > non-matching positions via a collating order table to determine <=>.
> So, what are the instructions that would be needed to make this easier?
>
> A "find first non-matching byte" instruction could help, which
> would do an xor of two registers and then count the number of
> tailing zero bytes (up to a maximum). This would just be (for 64 bits)
> an xor, a bytwise AND and an eight-bit count trailing zeros with a mask
> generated by the maximum value.
<
88110 had compare instructions that had "any byte different" and
"any halfword different"
>
> Range detection might profit from a bytewise "in range" comparison
> with an upper and lower bound given in two separate words (iff it is
> any cheaper than the straightforward two comparisons and one and).
> > People have been able to make JSON or XML parsing close to an order of
> > magnitude faster this way, but it is not intellectually trivial. :-(
> Coming from you, that sounds rather scary.
> > OTOH, people consistently misbehave even when comparing simple fp values
> > as soon as the first NaN turns up.
> There is the article "Do Programmers Understand IEEE Floating Point?"
> with the sobering observation "Many developers do not understand core
> floating point behavior particularly well, yet believe they do."
> https://users.cs.northwestern.edu/~pdinda/Papers/ipdps18.pdf
<
This is a education and management problem not an architecture problem.
<
And there are hundreds of papers like this illustrating how the typical
programmer faced with unit-calculation having FP just tries to "get the
job done" leaving all sorts of subtle bugs in the program.

JimBrakefield

Jun 16, 2022, 5:41:56 PM
On Thursday, June 16, 2022 at 5:26:27 AM UTC-5, Thomas Koenig wrote:
Rather than doing a table lookup or character translate from a memory byte array,
produce a single-bit result using 64, 128 or 256 bits from the register file (1, 2 or 4 registers).
In FPGA land this is called a LUT; here a 6, 7 or 8-bit LUT. I have not seen such
an instruction anywhere. It should pipeline or parallelize nicely.

MitchAlsup

Jun 16, 2022, 6:51:29 PM
What if you need {Alphabetic, numeric, white-space, Capital, Lower,
separator, separator-second }
<
We rarely need single bit lookups.
<
c = getchar();
if( parsetable[c] & alphabetic )
    do { c = getchar(); }
    while( parsetable[c] & ( alphabetic | numeric ) );

JimBrakefield

Jun 16, 2022, 7:00:35 PM
For 7-bit ASCII you need two 64-bit registers for each LUT.
I don't like to use additional process state, but 14 registers used for LUTs is doable.
However, the "LUT engine/functional unit" could easily contain registers for several LUTs
as well as a complete byte translation table. Just need a prefetch op-code(s) to
load the LUT engine??

Thomas Koenig

Jun 17, 2022, 11:27:28 AM
Terje Mathisen <terje.m...@tmsw.no> schrieb:
> Marcus wrote:

>> I still consider text processing an "unsolved problem" in modern CPU
>> architectures and programming languages. Traversing text character-by-
>> character, and possibly decoding along the way (as with UTF-8, for
>> certain operations), is just terribly inefficient. The predominant
>> paradigms in use are best suited for decades old un-pipelined CISC
>> (memory-memory) architectures.
>>
>> Ideally you would want to be able to handle (most) text strings as
>> efficiently as integer numbers. E.g. copy, compare, branch-if-..., etc
>> with the same cost as a single register-register integer operation.
>
> You can _almost_ do this with SIMD compares to find the first
> difference, then a (possibly wide/complicated) lookup of the
> non-matching positions via a collating order table to determine <=>.
>
> People have been able to make JSON or XML parsing close to an order of
> magnitude faster this way, but it is not intellectually trivial. :-(

I believe I may have found the library you are referring to,
https://github.com/simdjson/simdjson . The numbers they give are
rather impressive. Unfortunately, it is C++ which means that
the first intellectual challenge is to find out where the
actual work is being done :-)

Terje Mathisen

Jun 17, 2022, 3:04:21 PM
Michael S wrote:
> On Thursday, June 16, 2022 at 3:18:30 PM UTC+3, Terje Mathisen wrote:
>>
>> x86/SIMD got a huge chunk of silicon a decade+ ago, in the form of an
>> opcode which does 256 (?) simultaneous compare ops. I have never used
>> it, and don't know why it was included, but you can supposedly use it
>> for all sorts of strange stuff.
>>
>
> "decade+ ago" sounds like SSE4.2 (Nov2008), but I am not sure that the
> rest fits.

Like I said, I read about it but never used it myself so I might be
mis-remembering all of it. :-(

> My impression was that on all Intel or AMD implementations up to this day
> SSE4.2 is rather small chunk of silicon, on some cores even tiny.
> On "big" cores they tend to do 32 comparisons per clock per execution port,
> so when there are 2 or 3 SSE4.2-capable SIMD ALUs per core you can see a
> throughput figures like 1 instruction per 4 clocks with latency around
> 10 clocks.
> AMD Zen series appears to be the fastest with measured (by AgnerF) throughput
> of 1/3 and latency of 8-11 clocks.
> Unfortunately, I can not find any info about SSE4.2 performance of Intel
> Alder Lake P cores (Golden Cove). It's not mentioned in Intel's otherwise
> rather comprehensive instruction tables and Agner also did not Golden Cove
> numbers yet.
>
> Back at introduction (Nehalem) there were already two SSE4.2 ALUs, but the
> throughput was a little lower than expected - 1 per 5 clocks. I would guess
> that in pre-SandyB uArchs Intel had additional bottleneck around transfer of
> results from SIMD to GPRs.
>

Terje Mathisen

Jun 17, 2022, 3:11:59 PM
Thomas Koenig wrote:
> Terje Mathisen <terje.m...@tmsw.no> schrieb:
>> Marcus wrote:
>
>>> I still consider text processing an "unsolved problem" in modern CPU
>>> architectures and programming languages. Traversing text character-by-
>>> character, and possibly decoding along the way (as with UTF-8, for
>>> certain operations), is just terribly inefficient. The predominant
>>> paradigms in use are best suited for decades old un-pipelined CISC
>>> (memory-memory) architectures.
>>>
>>> Ideally you would want to be able to handle (most) text strings as
>>> efficiently as integer numbers. E.g. copy, compare, branch-if-..., etc
>>> with the same cost as a single register-register integer operation.
>>
>> You can _almost_ do this with SIMD compares to find the first
>> difference, then a (possibly wide/complicated) lookup of the
>> non-matching positions via a collating order table to determine <=>.
>>
>> People have been able to make JSON or XML parsing close to an order of
>> magnitude faster this way, but it is not intellectually trivial. :-(
>
> I believe I may have found the library you are referring to,
> https://github.com/simdjson/simdjson . The numbers they give are

That sounds right.

> rather impressive. Unfortunately, it is C++ which means that
> the first intellectual challenge is to find out where the
> actual work is being done :-)
>
:-(

Shades of FFTW/Boost?

Thomas Koenig

Jun 18, 2022, 8:49:45 AM
Terje Mathisen <terje.m...@tmsw.no> schrieb:
Probably. It is possible to have all the inline stuff in a single
header, which is 32k lines and 1.3 MB (but, in all fairness,
also contains a lot of comments).

Fortunately, they have described their method of
character classification in an article, to be found at
https://arxiv.org/pdf/1902.08318.pdf where they write

"Instead of a comparison, we use the AVX2 vpshufb instruction to
act as a vectorized table lookup to do a vectorized classification
[21]. The vpshufb instruction uses the least significant 4 bits
of each byte (low nibble) as an index into a 16-byte table.

[...]

By doing one lookup, followed by a 4-bit right shift and a second
lookup (using a different table), we can separate the characters
into one of two categories: structural characters and white-space
characters. The first lookup maps the low nibbles (least significant
4 bits) of each byte to a byte value; the second lookup maps
the high nibble (most significant 4 bits) of each byte to a byte
value. The two byte values are combined with a bitwise AND."

This limits the classification they can do. It works for JSON,
which has (by design or accident) a set of characters for which
this is enough, but not in the general case.
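A scalar model of the trick: each bit of the table entries encodes one (high-nibble, low-nibble) group, and a byte belongs to the class iff the two lookups share a bit. The tables below are an illustrative reconstruction for the six JSON structural characters, not simdjson's actual constants; vpshufb simply performs both lookups on 16 or 32 bytes at once.

```c
#include <stdint.h>

/* One bit per nibble-group: 0x01 = high nibble 7 ('{', '}'),
   0x02 = high nibble 5 ('[', ']'), 0x04 = ':', 0x08 = ','. */
static const uint8_t low_tbl[16] = {
    [0xA] = 0x04,        /* ':' = 0x3A */
    [0xB] = 0x03,        /* '{' = 0x7B, '[' = 0x5B */
    [0xC] = 0x08,        /* ',' = 0x2C */
    [0xD] = 0x03,        /* '}' = 0x7D, ']' = 0x5D */
};
static const uint8_t high_tbl[16] = {
    [0x2] = 0x08, [0x3] = 0x04, [0x5] = 0x02, [0x7] = 0x01,
};

/* A byte is a JSON structural character iff the two lookups intersect. */
static int is_structural(uint8_t c) {
    return (low_tbl[c & 0xF] & high_tbl[c >> 4]) != 0;
}
```

Note that a byte like '*' (0x2A) hits non-zero entries in both tables, but the bitwise AND is zero because the bits belong to different groups; this is why the entries need distinct bits rather than a single flag.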

Terje Mathisen

Jun 19, 2022, 10:16:35 AM
This happens to exactly mirror some of my own code. :-)

I would have dearly loved to have just one additional bit, i.e.
AltiVec's permute() used 5-bit indices into a pair of 16-byte registers.
With AVX-256 and AVX-512 (nee Larrabee), it would have been very obvious
to at least extend the 4-bit lookups into 5 and 6, at which point quite
a few table lookup algorithms become possible.

For bulk (throughput) programming I have considered cascading tables, so
that the second lookup uses one of several possible tables based on what
the first lookup returned, but this only works when you can do them in
parallel, i.e. do 2 or 4 second-stage lookups and use the output of the
first lookup to select which to use: Even using 5 regs to hold those
lookup tables, this does give us a full 6-bit classifier, and we even get
to decide which bits to use for each part.

I have considered such an algorithm for a new word count implementation,
where the current 64kB classification table lookup would be replaced by
byte shuffle/permute operations, but that would mean giving up on one of
the (imho) best features of the current version: The ability to specify
on the command line which characters are separators, which one or two
chars define a newline and which chars are possible inside a word.

Stephen Fuld

unread,
Jun 21, 2022, 7:56:01 PMJun 21
to
This paper raised a number of questions in my mind. If their claim that
JSON parsing takes up to 95% of some CPUs is true, it is an example of
what was mentioned earlier about new requirements leading to new
instructions. So is there a better way than "repurposing" the SIMD
instructions? Terje mentioned adding a version of AltiVec's permute
instruction.


1. Is there a better way, perhaps a more "purpose built" instruction?
If so, should it be specialized to JSON or is there a more generally
useful instruction?

2. What about an attached processor that builds the parse tree from the
string representation "autonomously"?

3. How would this be done with My 66000's VVM?


--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Marcus

unread,
Jun 22, 2022, 3:03:34 PMJun 22
to
My feelings exactly.


> Terje mentioned adding a version of AltaVec's permute instruction.
>
>
> 1.    Is there a better way, perhaps a more "purpose built" instruction?
> If so, should it be specialized to JSON or is there a more generally
> useful instruction?
>
> 2.    What about an attached processor that builds the parse tree from
> the string representation "autonomously"?
>
> 3.    How does this done with MY 66000's VVM?
>
>

My bet is that if you did something along the lines of 1 & 2, with support for things
like parsing, string comparisons and perhaps regex and similar common
string processing techniques, you should be able to get a many-fold
performance increase and power consumption reduction for many text
intensive workloads. In some domains (e.g. web servers, JSON/text-based
APIs, and database engines), it just may be worth the effort.

Exactly how such a machine would work, I don't know. My gut feeling is
that you'd want a way to reference and cache immutable strings (in a
special purpose "string store") from the main instruction stream, and
have asynchronous/concurrent string processing going on in an attached
processor. Each string, once pulled into the string store, can be
preprocessed if needed to attach meta data and classifications, so that
certain processing operations can be accelerated (e.g. strings could be
sorted, hashed and checked for uniqueness to speed up string
comparisons).

(Just thoughts - nothing tested)

/Marcus

MitchAlsup

unread,
Jun 22, 2022, 4:23:00 PMJun 22
to
On Tuesday, June 21, 2022 at 6:56:01 PM UTC-5, Stephen Fuld wrote:
> On 6/18/2022 5:49 AM, Thomas Koenig wrote:
> > Terje Mathisen <terje.m...@tmsw.no> schrieb:

> >
> > Fortunately, they have described their method of
> > character classification in an article, to be found at
> > https://arxiv.org/pdf/1902.08318.pdf
<
> This paper raised a number of questions in my mind. If their claim that
> JSON parsing takes up to 95% of some CPUs, it is an example of what was
> mentioned earlier about new requirements leading to new instructions.
> So is there a better way than "repurposing" the SIMD instructions.
> Terje mentioned adding a version of AltaVec's permute instruction.
>
If whatever arrives in JSON format takes 95% of the CPU just to parse
it CANNOT be very important! On the other hand a parser that can get
95% of a CPU on a cycle by cycle basis is running exceptionally well.
<
So that sentence is at best misleading.
>
> 1. Is there a better way, perhaps a more "purpose built" instruction?
> If so, should it be specialized to JSON or is there a more generally
> useful instruction?
>
> 2. What about an attached processor that builds the parse tree from the
> string representation "autonomously"?
>
> 3. How does this done with MY 66000's VVM?
>
Probably something like a YACC parser, using a character indirection table
and My 66000 TableTransfer (TT = switch) instruction.

Stephen Fuld

unread,
Jun 22, 2022, 6:57:15 PMJun 22
to
On 6/22/2022 1:22 PM, MitchAlsup wrote:
> On Tuesday, June 21, 2022 at 6:56:01 PM UTC-5, Stephen Fuld wrote:
>> On 6/18/2022 5:49 AM, Thomas Koenig wrote:
>>> Terje Mathisen <terje.m...@tmsw.no> schrieb:
>
>>>
>>> Fortunately, they have described their method of
>>> character classification in an article, to be found at
>>> https://arxiv.org/pdf/1902.08318.pdf
> <
>> This paper raised a number of questions in my mind. If their claim that
>> JSON parsing takes up to 95% of some CPUs, it is an example of what was
>> mentioned earlier about new requirements leading to new instructions.
>> So is there a better way than "repurposing" the SIMD instructions.
>> Terje mentioned adding a version of AltaVec's permute instruction.
>>
> If whatever arrives in JSON format takes 95% of the CPU just to parse
> it CANNOT be very important! On the other hand a parser that can get
> 95% of a CPU on a cycle by cycle basis is running exceptionally well.
> <
> So that sentence is at best misleading.

I tracked down the original paper referenced in the one above. It is at

https://vldb.org/pvldb/vol11/p1576-palkar.pdf

It points to several implementations of fast JSON parsers, but doesn't
address the source of the high CPU usage numbers, or exactly what they mean.

It does say that the main source of slowdown is that the typical parser
processes its input one character at a time, typically spending a few
instructions per input character. The use of SIMD instructions is
designed to minimize that, by processing several characters in parallel.




>>
>> 1. Is there a better way, perhaps a more "purpose built" instruction?
>> If so, should it be specialized to JSON or is there a more generally
>> useful instruction?
>>
>> 2. What about an attached processor that builds the parse tree from the
>> string representation "autonomously"?
>>
>> 3. How does this done with MY 66000's VVM?
>>
> Probably something like a YACC parser, using a character indirection table
> and My 66000 TableTransfer (TT = switch) instruction.

Based on what I can see from the papers, that would lose to an
architecture with SIMD instructions. I was hoping that VVM would offer
a better way, but it appears that, for this class of problem, it doesn't. :-(

MitchAlsup

unread,
Jun 22, 2022, 9:17:25 PMJun 22
to
while( c != EOF )
{   // parse individual tokens.
    switch( table[c] )
    {
    case ALPHABETIC:
        p = &buffer[0];
        while( table[ c=getchar() ] & ALPHANUMERIC ) *p++ = c;
        symbol( buffer, p - buffer );
        break;
    case NUMERIC:
        p = &buffer[0];
        while( table[ c=getchar() ] & NUMERIC_CONTINUATION ) *p++ = c;
        number( buffer, p - buffer );
        break;

The inner loops of each of these (important cases) can be as wide as the
cache port with a little Predication for getchar().

Thomas Koenig

unread,
Jun 26, 2022, 10:55:46 AMJun 26
to
Stephen Fuld <sf...@alumni.cmu.edu.invalid> schrieb:

> 1. Is there a better way, perhaps a more "purpose built" instruction?
> If so, should it be specialized to JSON or is there a more generally
> useful instruction?

A SIMD "found in bitmap" instruction would go a long way, I think
(already discussed here). Let's say you have 256-bit registers.

Look at the value of each byte in the source register and set the
corresponding byte in the target register to 0xff if the bit indexed
by that value in a 256-bit "mask" register is set, and to 0x00
otherwise.

Find first non-matching byte in an SIMD register.

>
> 2. What about an attached processor that builds the parse tree from the
> string representation "autonomously"?

Memory to memory? Unless it's really specialized for one special
type of parsing, it would have to be programmed somehow. Would
it make sense to put something like lex and yacc in hardware?
I guess it could be done; the tables could be in static RAM,
and the algorithms written in hardware. I have no idea at
all how well this would work, or how difficult it would be.

Stephen Fuld

unread,
Jun 26, 2022, 11:29:38 AMJun 26
to
Yeah, but . . .

I am assuming that you would have the code at "ALPHABETIC" be a VVM
loop to take advantage of the cache port width.

But I think the benefits of this, while there, are not as great as they
could/should be, since many identifiers are quite short (e.g. "if',
"end", "for", etc.). Similarly for NUMERIC, as you have noted, most of
the immediate values are quite short.

Also, a question for you. If we did something like you suggest above,
and are parsing, for example, abc+def, when the "+" is encountered, and
the code exits the VVM loop, does its load hit in the VVM buffer? What
about when you then start a new VVM loop to get the "def"? Does it pick
up the existing buffer, or start a new one with a wide cache transfer?

I am far from an expert in this area, and please, others chime in here,
but what about adding a new instruction that takes a character in one
source register, and outputs in a destination register, some bits,
similar to your result of a Compare instruction, that indicate, if set,
things like alphabetic, numeric, null, upper case, etc. Then you might
be able to handle parsing the entire, say, line, with a single VVM loop.
Such an instruction would also speed up functions like "isnumeric",
convert to upper case, etc. It may be appropriate to accept a second
source to the instruction that could specify one of perhaps several
encoding schemes, for character sets other than ASCII or UTF8.

This might allow VVM to be performance competitive with SIMD schemes
such as the one Thomas proposed.

MitchAlsup

unread,
Jun 26, 2022, 11:56:50 AMJun 26
to
What if the language was Arabic, Sanskrit, or Chinese where the mapping
from character to meaning was entirely different. That is:: at least the
string to be searched has to be given to the instruction as an operand.

Stephen Fuld

unread,
Jun 26, 2022, 12:33:17 PMJun 26
to
That was the idea for the second source (see below), to allow for
alternatives. But I guess there might be an issue (in almost any
solution that tries to go beyond character by character) for things like
Arabic that run right to left. Loading the "next" address might be
problematic.

Stefan Monnier

unread,
Jun 26, 2022, 12:46:17 PMJun 26
to
>> I am far from an expert in this area, and please, others chime in here,
>> but what about adding a new instruction that takes a character in one
>> source register, and outputs in a destination register, some bits,
>> similar to your result of a Compare instruction, that indicate, if set,
>> things like alphabetic, numeric, null, upper case, etc.
> <
> What if the language was Arabic, Sanskrit, or Chinese where the mapping
> from character to meaning was entirely different. That is:: at least the
> string to be searched has to be given to the instruction as an operand.

Thinking of it in those high-level terms doesn't seem like it will lead
anywhere. Better start from actual code of lexers/parsers (many/most of
which don't parse human language anyway).


Stefan

MitchAlsup

unread,
Jun 26, 2022, 1:38:19 PMJun 26
to
On Sunday, June 26, 2022 at 11:33:17 AM UTC-5, Stephen Fuld wrote:
> On 6/26/2022 8:56 AM, MitchAlsup wrote:

> >> I am far from an expert in this area, and please, others chime in here,
> >> but what about adding a new instruction that takes a character in one
> >> source register, and outputs in a destination register, some bits,
> >> similar to your result of a Compare instruction, that indicate, if set,
> >> things like alphabetic, numeric, null, upper case, etc.
> > <
> > What if the language was Arabic, Sanskrit, or Chinese where the mapping
> > from character to meaning was entirely different. That is:: at least the
> > string to be searched has to be given to the instruction as an operand.
<
> That was the idea for the second source (see below), to allow for
> alternatives. But I guess there might be an issue (in almost any
> solution that tries to go beyond character by character) for things like
> Arabic that run right to left. Loading the "next" address might be
> problematic.
<
Back in 1982 we were working on cash register applications when the
corporation decided it wanted to be able to display Arabic and Hebrew
from programs written in BASIC.
<
We had the device driver do the interpretation of L->R versus R->L
so some ASCII text would go out in the normal L-R direction, and
when Arabic or Hebrew was encountered, each letter would push
the preceding letter to the right. Looks strange on the display, but
the customers liked it immensely.
<
The data remained in little endian order.

Quadibloc

unread,
Jun 27, 2022, 5:08:52 AMJun 27
to
On Sunday, June 26, 2022 at 9:56:50 AM UTC-6, MitchAlsup wrote:
> On Sunday, June 26, 2022 at 10:29:38 AM UTC-5, Stephen Fuld wrote:

> > I am far from an expert in this area, and please, others chime in here,
> > but what about adding a new instruction that takes a character in one
> > source register, and outputs in a destination register, some bits,
> > similar to your result of a Compare instruction, that indicate, if set,
> > things like alphabetic, numeric, null, upper case, etc.

> What if the language was Arabic, Sanskrit, or Chinese where the mapping
> from character to meaning was entirely different. That is:: at least the
> string to be searched has to be given to the instruction as an operand.

Chinese, of course, requires more than 256 characters. The solution for most
other cases, though, would be to provide a "translate" instruction - the one that
the System/360 had was adaptable to just that purpose - you could translate
a character string to a string of bytes which coded character types if you liked.

John Savard

Stephen Fuld

unread,
Jun 28, 2022, 12:00:22 PMJun 28
to
Good point!  The original paper was about parsing JSON, which is
required to be UTF8.  While providing a more general solution at low
cost would be "a good thing", let's not get so carried away with
generality that we make solving the primary problem impossible or
impracticable.

I still think my original proposal would allow a huge speedup for
parsing JSON on My 66000 by allowing the use of VVM for many more
characters within a single VVM loop.

MitchAlsup

unread,
Jun 28, 2022, 1:06:20 PMJun 28
to
I am left wondering::
a) after JSON gets parsed what is done with it ?
b) does what comes out not run for long enough that JSON parsing
....ends up as noise ?

Stephen Fuld

unread,
Jun 28, 2022, 2:10:55 PMJun 28
to
I don't know the answer to those questions. It would take someone with
more knowledge of what the servers are doing than I have to provide the
answers. I was relying on the assertion in the original paper and its
references that the parsing could take up to 95% of the CPU time.

Stefan Monnier

unread,
Jun 28, 2022, 3:42:14 PMJun 28
to
> I am left wondering::
> a) after JSON gets parsed what is done with it ?
> b) does what comes out not run for long enough that JSON parsing
> ....ends up as noise ?

IIUC nowadays JSON is heavily used to send data structures between
separate microservices as a safer alternative to sending pointers to
shared data structures between threads.

So sometimes they may send a lot more data than the part actually used,
for no other reason than because it works and is convenient.


Stefan

MitchAlsup

unread,
Jun 28, 2022, 3:48:48 PMJun 28
to
On Tuesday, June 28, 2022 at 2:42:14 PM UTC-5, Stefan Monnier wrote:
> > I am left wondering::
> > a) after JSON gets parsed what is done with it ?
> > b) does what comes out not run for long enough that JSON parsing
> > ....ends up as noise ?
> IIUC nowadays JSON is heavily used to send datastructures between
> separate microservers as a safer alternative to sending pointers to
> shared datastructures between threads.
<
Certainly much of this gets cached, too.
>
> So sometimes they may send a lot more data then the part actually used
> for no other reason than because it works ans is convenient.
<
But doesn't the work carried by JSON consume a lot of cycles making the
parsing cycles noise?
<
That is: it takes a long time to compile a Linux kernel, but nobody cares
because you use that image for years.
>
>
> Stefan

Terje Mathisen

unread,
Jun 28, 2022, 4:43:35 PMJun 28
to
Even the original 8088 had XLAT, an opcode to translate bytes in AL into
the corresponding value stored in a 256-byte table addressed by BX.

The idea was that you could translate a full buffer with very tight
code, you just needed SI -> source, DI -> dest (can be the same as SI),
CX = buffer size, BX -> translation table:

next:
lodsb ; Load a byte into AL, update SI
xlat ; al = bx[al]
stosb ; Store back, update DI
loop next ; Decrement CX, loop unless decremented to zero.

I.e. 4 instructions and 5 total bytes!

Stephen Fuld

unread,
Jun 28, 2022, 4:58:16 PMJun 28
to
Yes. The reason I didn't suggest something equivalent was that I was
trying to avoid the need for the load of the translated byte from the
256 byte table. My suggestion, while not as general, should be faster.

MitchAlsup

unread,
Jun 28, 2022, 5:02:26 PMJun 28
to
On Tuesday, June 28, 2022 at 3:43:35 PM UTC-5, Terje Mathisen wrote:
> Quadibloc wrote:
> > On Sunday, June 26, 2022 at 9:56:50 AM UTC-6, MitchAlsup wrote:
> >> On Sunday, June 26, 2022 at 10:29:38 AM UTC-5, Stephen Fuld wrote:
> >
> >>> I am far from an expert in this area, and please, others chime in here,
> >>> but what about adding a new instruction that takes a character in one
> >>> source register, and outputs in a destination register, some bits,
> >>> similar to your result of a Compare instruction, that indicate, if set,
> >>> things like alphabetic, numeric, null, upper case, etc.
> >
> >> What if the language was Arabic, Sanskrit, or Chinese where the mapping
> >> from character to meaning was entirely different. That is:: at least the
> >> string to be searched has to be given to the instruction as an operand.
> >
> > Chinese, of course, requires more than 256 characters. The solution for most
> > other cases, though, would be to provide a "translate" instruction - the one that
> > the System/360 had was adaptable to just that purpose - you could translate
> > a character string to a string of bytes which coded character types if you liked.
<
I was thinking about this the other day, and wondered why we don't let various
programmers around the world program in their own language, so that when one
runs into プリントf() one knows that it is printf(). This is merely 'ln -s' applied at
the symbol table.
<
> Even the original 8088 had XLAT, an opcode to translate bytes in AL into
> the corresponding value stored in a 256-byte table addressed by BX.
>
> The idea was that you could translate a full buffer with very tight
> code, you just needed SI -> source, DI -> dest (can be the same as SI),
> CX = buffer size, BX -> translation table:
>
> next:
> lodsb ; Load a byte into AL, update SI
> xlat ; al = bx[al]
> stosb ; Store back, update DI
> loop next ; Decrement CX, loop unless decremented to zero.
>
> I.e. 4 instructions and 5 total bytes!
<
This "no workie so well" when characters require 2 bytes or decoding
to determine if the character is 1 or 2 bytes.

Stefan Monnier

unread,
Jun 28, 2022, 5:38:02 PMJun 28
to
>> IIUC nowadays JSON is heavily used to send datastructures between
>> separate microservers as a safer alternative to sending pointers to
>> shared datastructures between threads.
> Certainly much of this gets cached, too.

I don't think so.

>> So sometimes they may send a lot more data then the part actually used
>> for no other reason than because it works ans is convenient.
> But doesn't the work carried by JSON consume a lot of cycles making the
> parsing cycles noise?

JSON does not "carry any work"; it's just a big tree data structure
represented as text. The receiving process reads it, uses the part of
it that it needs, and often just disregards most of what it received
(because that's easier than somehow describing to the sender which
parts are needed).

GraphQL tries to reduce this waste by providing a standardized way for
the client to request specific parts, but even so, the server often
returns a lot more data than the client really needs.


Stefan

Stefan Monnier

unread,
Jun 28, 2022, 5:41:56 PMJun 28
to
> This "no workie so well" when characters require 2 bytes or decoding
> to determine if the character is 1 or 2 bytes.

AFAIK most lexers don't directly work on "characters encoded as utf-8",
for that reason. They work at the level of bytes, or on sequences of
"entities" which can be characters or tokens, returned by a lower
level lexer.


Stefan

MitchAlsup

unread,
Jun 28, 2022, 5:46:43 PMJun 28
to
On Tuesday, June 28, 2022 at 4:38:02 PM UTC-5, Stefan Monnier wrote:
> >> IIUC nowadays JSON is heavily used to send datastructures between
> >> separate microservers as a safer alternative to sending pointers to
> >> shared datastructures between threads.
> > Certainly much of this gets cached, too.
> I don't think so.
> >> So sometimes they may send a lot more data then the part actually used
> >> for no other reason than because it works ans is convenient.
> > But doesn't the work carried by JSON consume a lot of cycles making the
> > parsing cycles noise?
> JSON does not "carry any work" it's just a big tree datastructure
> represented as text. The receiving process reads it and then it uses
> the part of it that it needs and often just disregards most of what
> it received (because it's easier to do it that way than to somewhat
> describe to the sender which parts are needed).
<
So, are you saying that the requestor sends over an encyclopedia* and
the provider uses 2 paragraphs. Yes, that sounds really efficient. Yep,
Oh Boy is that a good idea.
<
(*) specification
>
> GraphQL tries to reduce this waste by providing a standardized way for
> the client to request specific parts, but even so, the server often
> returns a lot more data than the client really needs.
<
Does the client even know what he is looking for ?
>
>
> Stefan

MitchAlsup

unread,
Jun 28, 2022, 5:47:34 PMJun 28
to
Most lexers also represent a scant fraction of the time spent using what
got lexed.
>
> Stefan

EricP

unread,
Jun 28, 2022, 6:26:25 PMJun 28
to
rant:ON {

I looked at the JSON wikipedia page - it is almost as dumb as XML.
The notion of converting all integer binary values to text
to communicate between compatible applications, when
all machines use two's complement little endian or can
trivially convert, is just plain stupid. Same for IEEE floats.

Or repeatedly sending the key names as text *per field*
instead of references to a prior dictionary definition.

And the argument that "oh well you can edit the file" is crap because
once you have a binary format spec you can trivially build an editor.
Or a binary file decoder/encoder if you are really set on editing text.
And often they are sending this over sockets to transaction servers
so the ability to edit in Notepad is irrelevant.

Also having a spec for key:value pair formats tells you nothing
about the format each application requires for { aggregates },
or field order, or which fields are mandatory or optional.
Once you define this sometimes you can skip key names, just have values.

Apologies to those forced to use JSON but this is jaw droppingly dumb.
Why does anyone bother with this convert-to-text foolishness?
Maybe they are uncomfortable programming with "binary values".

} rant:OFF



Timothy McCaffrey

unread,
Jun 28, 2022, 8:03:13 PMJun 28
to
I agree with everything you said, however :)

This is the world of microservices and/or serverless applications. Having the JSON be human editable allows
the developer to manually build some unit test cases (JSON is sometimes used to configure things as well).
Overall performance is not the overriding priority here, the point is to be able to update your app on the fly
(in pieces, if necessary) with no downtime. Compromises were made to achieve this goal :(.

It all kind of makes sense if you get into it, and the lack of type/syntax checking goes right along with
using Javascript in the first place (I do not like JS because it seems like they tried to find the worst
aspects of programming languages over the last 60 years and put them in one place).

But, I agree, the whole thing seems like a train wreck in progress.

- Tim

MitchAlsup

unread,
Jun 28, 2022, 8:29:30 PMJun 28
to
On Tuesday, June 28, 2022 at 7:03:13 PM UTC-5, timca...@aol.com wrote:

> It all kind of makes sense if you get into it, and the lack of type/syntax checking goes right along with
> using Javascript in the first place (I do not like JS because it seems like they tried to find the worst
> aspects of programming languages over the last 60 years and put them in one place0.
<
You forgot the trademark symbol: ™

MitchAlsup

unread,
Jun 28, 2022, 8:34:16 PMJun 28