neat trick for effecting behaviour change from a subclass

Rainer Weikusat

Nov 7, 2022, 5:06:02 PM
Some code I happen to be working with serializes Perl data structures to
JSON. Among other things, this is used for data communication for a
web-based terminal app and hence, it's a bit performance critical. Part
of this code is a loop

for (keys(%$v)) {
}

which is used to serialize Perl hashes. Serializing to JSON supports two
modes, a plain one which just generates long strings and a pretty one
whose output is supposed to be human-readable and hence, includes
structural indentation.

Due to the nature of Perl hash traversal, the
order of keys is unpredictable and likely to change between different
invocations of the same script. For pretty-printing, that's undesirable
because it's confusing to humans. Hence, the pretty-printer should sort these
keys. Due to performance considerations (according to a quick test, a
Perl method call is about 1.5 times slower than a plain subroutine
call), I didn't want to add a general method call here.
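
For reference, a quick test of this kind could look like the sketch below. It uses the core Benchmark module; the Counter class and its bump method are made up purely for the comparison, and the measured ratio will vary with the perl version and the machine, so it shouldn't be read as confirming any particular number.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

package Counter;

# Hypothetical class: one tiny routine that can be called either way.
sub new  { bless({ n => 0 }, $_[0]) }
sub bump { ++$_[0]{n} }

package main;

my $obj = Counter->new();

# Both cases pay for the wrapping anonymous sub, so only the
# difference between the two rows is meaningful.
cmpthese(1_000_000, {
    'sub call'    => sub { Counter::bump($obj) },
    'method call' => sub { $obj->bump() },
});
```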

What I did instead was the following: Define a subroutine

sub my_keys
{
    keys(%{$_[0]})
}

and use that in place of keys in the loop head. This gets resolved down
to the glob at compile time and then invokes whatever is in the
subroutine slot of the glob at the time of execution. The top-level
pretty printer data addition method then does

local *my_keys = sub {
    sort(keys(%{$_[0]}))
};

This causes the formatting method several layers deeper in the call chain
to invoke this subroutine and hence, sort the keys, when being used from
a pretty-printer object.
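
Put together, the technique might look like the following minimal sketch. Only my_keys and the local assignment come from the description above; the Serializer package, hash_to_json and pretty_hash_to_json are hypothetical stand-ins for the real (much larger) code, and the output format is deliberately simplistic.

```perl
use strict;
use warnings;

package Serializer;

# Default key source: native hash order, no sorting overhead.
sub my_keys { keys(%{$_[0]}) }

# Hypothetical stand-in for the formatting routine deep in the call chain.
# Handles only flat hashes whose values need no escaping.
sub hash_to_json {
    my ($v) = @_;
    my @pairs;
    # The call goes through the glob, so whatever sub is in its
    # code slot when this line executes is the one that runs.
    for (my_keys($v)) {
        push(@pairs, qq("$_":"$v->{$_}"));
    }
    return '{' . join(',', @pairs) . '}';
}

# Hypothetical top-level pretty-printer method: swap in a sorting
# my_keys for the dynamic extent of this call only.
sub pretty_hash_to_json {
    my ($v) = @_;
    no warnings 'redefine';
    local *my_keys = sub { sort(keys(%{$_[0]})) };
    return hash_to_json($v);
}

package main;

my %h = (b => 2, c => 3, a => 1);
print(Serializer::pretty_hash_to_json(\%h), "\n");   # {"a":"1","b":"2","c":"3"}
print(Serializer::hash_to_json(\%h), "\n");          # same pairs, hash order
```

Once pretty_hash_to_json returns, local restores the original glob, so plain callers go back to the unsorted version without any flag being checked in the loop.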

E. Choroba

Nov 7, 2022, 5:54:24 PM
Do you mean you don't use any of the JSON modules available in core Perl or CPAN?

Ch.

Rainer Weikusat

Nov 8, 2022, 6:43:46 AM
"E. Choroba" <cho...@matfyz.cz> writes:
> On Monday, November 7, 2022 at 11:06:02 PM UTC+1, Rainer Weikusat wrote:

[...]

> Do you mean you don't use any of the JSON modules available in core Perl or CPAN?

That was hardly the point of the post but I generally don't use stuff
just because someone uploaded it to a web server unless it

- saves a significant amount of work (>> 1000 LOC)
- has an at least remotely sane interface and implementation

If I use it, I'll end up having to maintain it. This will - piecemeal -
take an indeterminate amount of time and effort. Which means I do use
Net::SSL (although that's seriously patchy) and I don't use date-time,
JSON or OO-system modules.

Bernie Cosell

Nov 8, 2022, 11:46:49 AM
Rainer Weikusat <rwei...@talktalk.net> wrote:

} Some code I happen to be working with serializes Perl data structures to
} JSON. Among other things, this is used for data communication for a
} web-based terminal app and hence, it's a bit performance critical. Part
} of this code is a loop
}
} for (keys(%$v)) {
} }
}
} which is used to serialize Perl hashes. Serializing to JSON supports two
} modes, a plain one which just generates long strings and a pretty one
} whose output is supposed to be human-readable and hence, includes
} structural indentation.
}
} Due to the nature of Perl hash traversal, the
} order of keys is unpredictable and likely to change between different
} invocations of the same script.

Wouldn't the simple

for (sort keys(%$v))
{
}

do the job, or am I missing something?

/Bernie\
--
Bernie Cosell Fantasy Farm Fibers
ber...@fantasyfarm.com Pearisburg, VA
--> Too many people, too few sheep <--

Rainer Weikusat

Nov 8, 2022, 12:11:28 PM
Bernie Cosell <ber...@fantasyfarm.com> writes:
> Rainer Weikusat <rwei...@talktalk.net> wrote:
>
> } Some code I happen to be working with serializes Perl data structures to
> } JSON. Among other things, this is used for data communication for a
> } web-based terminal app and hence, it's a bit performance critical. Part
> } of this code is a loop
> }
> } for (keys(%$v)) {
> } }
> }
> } which is used to serialize Perl hashes. Serializing to JSON supports two
> } modes, a plain one which just generates long strings and a pretty one
> } whose output is supposed to be human-readable and hence, includes
> } structural indentation.
> }
> } Due to the nature of Perl hash traversal, the
> } order of keys is unpredictable and likely to change between different
> } invocations of the same script.
>
> Wouldn't the simple
>
> for (sort keys(%$v))
> {
> }
>
> do the job, or am I missing something?

The non-introductory part of the text would be my guess. :-) Depending
on the context, the key set should either be sorted or be used as is and
this preferably with little or no overhead for the second case, ie,
without always calling a method to get the key set.

E. Choroba

Nov 8, 2022, 5:23:17 PM
On Tuesday, November 8, 2022 at 12:43:46 PM UTC+1, Rainer Weikusat wrote:
> - saves a significant amount of work (>> 1000 LOC)

Cpanel::JSON::XS is more than 2500 lines of Perl and 5000 lines of XS. But, as the saying goes, YMMV.

Ch.

Rainer Weikusat

Nov 9, 2022, 1:38:29 PM
That J::Random::Bored::Sysadmins::Handoptimized::C::Module for solving a
fairly simple task is insanely huge doesn't mean solving the task will
require a comparable amount of code.

Rainer Weikusat

Nov 13, 2022, 2:19:23 PM
To add some context to that: The core of the code I'm using consists of
a JSON parser (I claim to be complete and correct) of 288 lines of code
and a JSON serializer of 210. It beggars belief that someone managed to
spend more than 7500 lines of code just on that.

E. Choroba

Nov 13, 2022, 2:48:23 PM
Try running your code against the test suite of the module to see how complete and correct it is. Or try to benchmark it to see which one is faster. And regardless of the results, it would be great if you could share your code with the rest of the world.

Ch.

Rainer Weikusat

Nov 13, 2022, 5:03:23 PM
"E. Choroba" <cho...@matfyz.cz> writes:
> On Sunday, November 13, 2022 at 8:19:23 PM UTC+1, Rainer Weikusat wrote:
>> Rainer Weikusat <rwei...@talktalk.net> writes:
>> > "E. Choroba" <cho...@matfyz.cz> writes:
>> >> On Tuesday, November 8, 2022 at 12:43:46 PM UTC+1, Rainer Weikusat wrote:
>> >>> - saves a significant amount of work (>> 1000 LOC)
>> >>
>> >> Cpanel::JSON::XS is more than 2500 lines of Perl and 5000 lines of XS. But, as the saying goes, YMMV.
>> >
>> > That J::Random::Bored::Sysadmins::Handoptimized::C::Module for solving a
>> > fairly simple task is insanely huge doesn't mean solving the task will
>> > require a comparable amount of code.
>> To add some context to that: The core of the code I'm using consists of
>> a JSON parser (I claim to be complete and correct) of 288 lines of code
>> and a JSON serializer of 210. It beggars belief that someone managed to
>> spend more than 7500 lines of code just on that.
>
> Try running your code against the test suite of the module to see how
> complete and correct it is. Or try to benchmark it to see which one is
> faster.

As that's all Perl, it's doubtlessly going to be slower than any C
implementation (except a very poor one). But it's fast enough for the
problems it has been applied to so far.

> And regardless of the results, it would be great if you could
> share your code with the rest of the world.

I have one of these nice 'assuming your dreams were commercially
valuable, they'd belong to us' US employment contracts and hence, that's
not an option.

:-)

$Bill

Nov 13, 2022, 9:10:33 PM
On 11/13/2022 14:03, Rainer Weikusat wrote:
>
>>> To add some context to that: The core of the code I'm using consists of
>>> a JSON parser (I claim to be complete and correct) of 288 lines of code
>>> and a JSON serializer of 210. It beggars belief that someone managed to
>>> spend more than 7500 lines of code just on that.

I kinda agree. I didn't bother checking when I needed a parser for JSON, so
I wrote my own JSON to Perl hash parser, an indented JSON printer and a Perl
to JSON converter, and each of them is a few hundred lines of Perl. Doing the same
for HTML is a lot more complicated, but I did that too. While I was at it,
I wrote an ICS to JSON converter. It's amazing how close JSON is to Perl
hashes (I'd be surprised if the JSON authors didn't model it after Perl).

I tend to depend as little as possible on other people's code for my own
projects. Plus it's fun to do it yourself.

E. Choroba

Nov 14, 2022, 6:15:24 PM
If you want to see what your code missed, you can try running the test suite of Cpanel::JSON::XS against your code. If such things never occur in the JSONs you need to process, you're lucky.

Ch.

Rainer Weikusat

Nov 15, 2022, 7:49:49 AM
Chances are that the JSON specification is smaller than this module and
it's really not difficult to implement. The only thing that's a bit
hairy are Unicode surrogates. There are simply no "such things" which
could occur in it (assuming the usual assumptions about whitespace are
not being made).

There's little wonder that the people who prefer complicated, chaotic
shit that's incomprehensible to both humans and computers immediately
invented YAML as Ersatz-XML once the latter miscreation had finally
fallen out of use.

Rainer Weikusat

Nov 15, 2022, 6:04:15 PM
Rainer Weikusat <rwei...@talktalk.net> writes:
> "E. Choroba" <cho...@matfyz.cz> writes:
>> On Monday, November 14, 2022 at 3:10:33 AM UTC+1, $Bill wrote:
>>> On 11/13/2022 14:03, Rainer Weikusat wrote:
>>> >
>>> >>> To add some context to that: The core of the code I'm using consists of
>>> >>> a JSON parser (I claim to be complete and correct) of 288 lines of code
>>> >>> and a JSON serializer of 210. It beggars belief that someone managed to
>>> >>> spend more than 7500 lines of code just on that.
>>> I kinda agree. I didn't bother checking when I needed a parser for JSON, so
>>> I wrote my own JSON to Perl hash parser, and indented JSON printer and Perl
>>> to JSON converter and each of them is a few hundred Perl lines. Doing the same
>>> for HTML is a lot more complicated, but I did that too. While I was at it,
>>> I wrote an ICS to JSON converter. It's amazing how close to Perl hashes that
>>> JSON is (I'd be surprised if the JSON authors didn't model it after Perl).
>>>
>>> I tend to depend as little as possible on other people's code for my own
>>> projects. Plus it's fun to do it yourself.
>>
>> If you want to see what your code missed, you can try running the test
>> suite of Cpanel::JSON::XS against your code. If such things never
>> occur in the JSONs you need to process, you're lucky.
>
> Chances are that the JSON specification is smaller than this module and
> it's really not difficult to implement. The only thing that's a bit
> hairy are Unicode surrogates. There are simply no "such things" which
> could occur in it (assuming the usual assumptions about whitespace are
> not being made).

To elaborate on this a little: A JSON-something is composed of a
sequence of typed tokens with optional whitespace between them. The type
of a token can always be determined by examining its first
character. All tokens except strings, numbers and literals are composed
of a single character. There are three literals, true, false and null. A
string always starts and ends with a ". A number always ends with a
digit. Hence, lexical analysis generally works as follows:

1. Skip over whitespace (space, tab, line feed, carriage return).
2. Look at the next character to determine the type of the next
token. There are three possible cases here:

2a) The next character cannot start a token => error.
2b) The token type is not valid in the given context => error.
2c) Process the valid token.

Token start characters and associated types are:

{                        object start
}                        object end
[                        array start
]                        array end
"                        string
,                        item separator
:                        key-value separator
f n t                    literal
- 0 1 2 3 4 5 6 7 8 9    number
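
As an illustration, the lexical analysis steps and the table above could be sketched in Perl roughly as follows. This is not the code discussed in this thread: %START, %BODY and tokenize are made-up names, the whitespace set is the one from RFC 8259, and the string pattern is a simplification that neither validates nor decodes escape sequences. Step 2b (is the token valid in the current context) is left to whatever consumes the tokens.

```perl
use strict;
use warnings;

# First character => token type, following the table above.
my %START = (
    '{' => 'object start', '}' => 'object end',
    '[' => 'array start',  ']' => 'array end',
    '"' => 'string',
    ',' => 'item separator',
    ':' => 'key-value separator',
    (map { ($_ => 'literal') } qw(f n t)),
    (map { ($_ => 'number') } '-', 0 .. 9),
);

# Patterns for the token types longer than one character (assumptions).
my %BODY = (
    string  => qr/"(?:[^"\\]|\\.)*"/,
    number  => qr/-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?/,
    literal => qr/true|false|null/,
);

sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (1) {
        $text =~ /\G[ \t\n\r]+/gc;                  # 1. skip whitespace
        last if pos($text) == length($text);
        my $c = substr($text, pos($text), 1);       # 2. look at the next character
        my $type = $START{$c}
            // die("'${c}' cannot start a token\n"); # 2a. error
        my $re = $BODY{$type} // qr/./s;            # single-character token otherwise
        $text =~ /\G($re)/gc
            or die("malformed $type at offset " . pos($text) . "\n");
        push(@tokens, [$type, $1]);                 # 2c. process the valid token
    }
    return @tokens;
}

printf("%-20s %s\n", @$_) for tokenize('{"a": [1, -2.5e3, true]}');
```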

A JSON document is a JSON value. A JSON value is either

1. A literal.
2. A string.
3. A number.
4. An array.
5. An object.

An array is a possibly empty, comma-separated list of JSON values
enclosed by [].

An object is a possibly empty, comma-separated list of key-value pairs
enclosed by {}.

A key-value pair is a token sequence <string><colon><JSON value>.

And that's it (minus the inner string syntax).
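
To make the grammar above concrete, here is a minimal recursive-descent sketch in Perl. It is an illustration under stated assumptions, not the parser mentioned earlier in this thread: all names (parse_json, parse_value, parse_list, parse_pair) are invented, strings are returned with their escape sequences undecoded, and true/false/null are mapped to 1/0/undef for simplicity.

```perl
use strict;
use warnings;

# Assumed building blocks: RFC 8259 whitespace, a number pattern and a
# simplified string pattern (escapes are neither validated nor decoded).
my $WS  = qr/[ \t\n\r]+/;
my $NUM = qr/-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?/;
my $STR = qr/"((?:[^"\\]|\\.)*)"/;
my %LITERAL = (true => 1, false => 0, null => undef);

# A possibly empty, comma-separated list of items, up to $close.
sub parse_list {
    my ($t, $close, $item) = @_;
    my @items;
    $$t =~ /\G$WS/gc;
    return \@items if $$t =~ /\G\Q$close\E/gc;     # empty list
    while (1) {
        push(@items, $item->($t));
        $$t =~ /\G$WS/gc;
        next if $$t =~ /\G,/gc;                    # after a separator, an item must follow
        return \@items if $$t =~ /\G\Q$close\E/gc;
        die("expected ',' or '${close}' at offset " . (pos($$t) // 0) . "\n");
    }
}

# A key-value pair: <string><colon><JSON value>.
sub parse_pair {
    my ($t) = @_;
    $$t =~ /\G$WS/gc;
    $$t =~ /\G$STR/gc or die("expected a string key at offset " . (pos($$t) // 0) . "\n");
    my $key = $1;
    $$t =~ /\G$WS/gc;
    $$t =~ /\G:/gc or die("expected ':' at offset " . (pos($$t) // 0) . "\n");
    return [$key, parse_value($t)];
}

# A JSON value: literal, string, number, array or object.
sub parse_value {
    my ($t) = @_;
    $$t =~ /\G$WS/gc;
    return parse_list($t, ']', \&parse_value) if $$t =~ /\G\[/gc;
    if ($$t =~ /\G\{/gc) {
        my %h = map { @$_ } @{ parse_list($t, '}', \&parse_pair) };
        return \%h;
    }
    return $1           if $$t =~ /\G$STR/gc;
    return 0 + $1       if $$t =~ /\G($NUM)/gc;
    return $LITERAL{$1} if $$t =~ /\G(true|false|null)/gc;
    die("unexpected input at offset " . (pos($$t) // 0) . "\n");
}

# A JSON document is a single JSON value.
sub parse_json {
    my ($text) = @_;
    pos($text) = 0;
    my $value = parse_value(\$text);
    $text =~ /\G$WS/gc;
    die("trailing garbage at offset " . pos($text) . "\n") if pos($text) != length($text);
    return $value;
}

my $doc = parse_json('{"a": [1, 2.5e1, true, null], "b": {}}');
print($doc->{a}[1], "\n");   # 25
```

Because parse_list insists on an item after every separator, input such as [1,2,3,] is rejected rather than silently accepted.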

Eli the Bearded

Nov 16, 2022, 6:08:36 PM
In comp.lang.perl.misc, Rainer Weikusat <rwei...@talktalk.net> wrote:
> To elaborate on this a little: A JSON-something is composed of a
> sequence of typed tokens with optional whitespace between them. The type
> of a token can always be determined by examining its first
> character. All tokens except strings, numbers and literals are composed
> of a single character. There are three literals, true, false and null. A
> string always starts and ends with a ". A number always ends with a
> digit. Hence, lexical analysis generally works as follows:
>
> 1. Skip over horizontal whitespace.
> 2. Look at the next character to determine the type of the next
> token. There are three possible cases here:
>
> 2a) The next character cannot start a token => error.
> 2b) The token type is not valid in the given context => error.
> 2c) Process the valid token.
>
> Token start characters and associated types are:
>
> { object start

Tab damage.

> } object end
> [ array start
> ] array end
> " string
> , item separator
> : key-value separator
> f
> n
> t literal
> -
> 0
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9 number

Seems to me you've got at least one bug here.

> A JSON document is a JSON value. A JSON value is either
>
> 1. A literal.
> 2. A string.
> 3. A number.
> 4. An array.
> 5. An object.
>
> An array is a possibly empty, comma-separated list of JSON values
> enclosed by [].
>
> An object is a possibly empty, comma-separated list of key-value pairs
> enclosed by {}.

JSON is a lot stricter about commas than many other new languages, like
Perl. It's not hard to imagine someone getting it wrong based on a spec
as limited as yours. [1,2,3] -> okay [1,2,3,] -> invalid

> A key-value pair is a token sequence <string><colon><JSON value>.
>
> And that's it (minus the inner string syntax).

What dragons could be lurking in there?

Elijah
------
nf jevggra gur fcrp nobir qbrf abg nyybj sybngvat cbvag ahzoref

Rainer Weikusat

Nov 17, 2022, 5:37:20 AM
I don't think so.

>
>> A JSON document is a JSON value. A JSON value is either
>>
>> 1. A literal.
>> 2. A string.
>> 3. A number.
>> 4. An array.
>> 5. An object.
>>
>> An array is a possibly empty, comma-separated list of JSON values
>> enclosed by [].
>>
>> An object is a possibly empty, comma-separated list of key-value pairs
>> enclosed by {}.
>
> JSON is a lot stricter about commas than many other new languages, like
> Perl. It's not hard to imagine someone getting it wrong based on a spec
> as limited as yours. [1,2,3] -> okay [1,2,3,] -> invalid

Well, where's the value after the final separator in your last example?

>> A key-value pair is a token sequence <string><colon><JSON value>.
>>
>> And that's it (minus the inner string syntax).
>
> What dragons could be lurking in there?

Escape sequences, but that's rather straightforward. Apart from that:
What I already mentioned: Unicode surrogate pairs, requiring a small
state machine (two states) to parse/analyze.
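
A two-state machine of this kind might be sketched as below. This is only an illustration, with a made-up name (decode_u_escapes): it decodes \uXXXX escapes in the inside of a JSON string, combines surrogate pairs into a single code point and copies everything else, including the other escape sequences, through unchanged. The two states are "idle" ($high undefined) and "high surrogate pending" ($high defined).

```perl
use strict;
use warnings;

sub decode_u_escapes {
    my ($s) = @_;
    my $out = '';
    my $high;                 # undef: idle; defined: high surrogate pending
    pos($s) = 0;
    # Each step consumes a \uXXXX escape, another escape, or one plain character.
    while ($s =~ /\G(?:\\u([0-9A-Fa-f]{4})|(\\[^u]|[^\\]))/gcs) {
        my ($hex, $other) = ($1, $2);
        my $unit = defined($hex) ? hex($hex) : undef;
        if (defined($high)) {
            # Pending state: only a low surrogate may follow.
            die("unpaired high surrogate\n")
                unless defined($unit) && $unit >= 0xDC00 && $unit <= 0xDFFF;
            $out .= chr(0x10000 + (($high - 0xD800) << 10) + ($unit - 0xDC00));
            undef($high);
        }
        elsif (!defined($unit))                    { $out .= $other }
        elsif ($unit >= 0xD800 && $unit <= 0xDBFF) { $high = $unit }
        elsif ($unit >= 0xDC00 && $unit <= 0xDFFF) { die("unpaired low surrogate\n") }
        else                                       { $out .= chr($unit) }
    }
    die("unpaired high surrogate\n") if defined($high);
    die("malformed escape at offset " . pos($s) . "\n") if pos($s) != length($s);
    return $out;
}

printf("U+%X\n", ord(decode_u_escapes('\uD83D\uDE00')));   # U+1F600
```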

Eli the Bearded

Nov 17, 2022, 9:29:41 PM
In comp.lang.perl.misc, Rainer Weikusat <rwei...@talktalk.net> wrote:
> Eli the Bearded <*@eli.users.panix.com> writes:
>> Seems to me you've got at least one bug here.
> I don't think so.

:r! getarticle-nntp '<eli$22111...@qaz.wtf>' | tail -1 | /usr/games/rot13
as written the spec above does not allow floating point numbers

Elijah
------
thought all Javascript numbers were floats

Rainer Weikusat

Nov 18, 2022, 10:46:24 AM
Eli the Bearded <*@eli.users.panix.com> writes:
> In comp.lang.perl.misc, Rainer Weikusat <rwei...@talktalk.net> wrote:
>> Eli the Bearded <*@eli.users.panix.com> writes:
>>> Seems to me you've got at least one bug here.
>> I don't think so.
>
> :r! getarticle-nntp '<eli$22111...@qaz.wtf>' | tail -1 | /usr/games/rot13
> as written the spec above does not allow floating point numbers

,----
| A JSON-something is composed of a sequence of typed tokens with optional
| whitespace between them. The type of a token can always be determined by
| examining its first character.
|
| [...]
|
| Token start characters and associated types are:
|
| [...]
|
| -
| 0
| 1
| 2
| 3
| 4
| 5
| 6
| 7
| 8
| 9 number
`----

That's a statement about characters which can start a token whose type
is number. It doesn't say anything about characters after the first. I
should perhaps have mentioned that I also didn't describe the syntax for
numbers. As regex, it's

/^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$/

Source: https://www.json.org/json-en.html (errors made in translation
are mine).


Rainer Weikusat

Nov 18, 2022, 1:01:28 PM
Rainer Weikusat <rwei...@talktalk.net> writes:

[JSON syntax for numbers]


> /^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$/
>
> Source: https://www.json.org/json-en.html (errors made in translation
> are mine).

There's a () missing in the first term. It should be (0|(?:[1-9][0-9]*)).
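
For anyone wanting to try the pattern, a small check could look like the sketch below. Some assumptions of mine: the sign in the exponent is optional in the json.org grammar, so the sketch uses [+-]?; it anchors with \A and \z instead of ^ and $, because $ also matches before a trailing newline; and it uses non-capturing groups throughout. As far as I can tell, the extra group around [1-9][0-9]* is harmless but not strictly needed, since the alternation is already confined to the enclosing parentheses. The variable name and the candidate list are made up.

```perl
use strict;
use warnings;

my $json_number = qr/\A-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?\z/;

# A few candidates; the last five are not valid JSON numbers.
for my $candidate (qw(0 -0 12 1.5 1e10 1E-3 -2.5e+3 01 1. .5 +1 1e)) {
    printf("%-8s %s\n", $candidate, $candidate =~ $json_number ? 'valid' : 'invalid');
}
```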