substring manipulation


Rainer Weikusat

Feb 26, 2023, 5:24:26 PM
Problem: I have a string of an unknown length accessible as $_ and need
to collect a string whose length is stored in a variable. This length
may be <, > or == the length of the current work string. The string or
substring I need from the current work string has to be removed from it
and its actual length subtracted from the length in the variable.

Initially, I tried to do this with ($c_sz being the length variable)

s/^(.{1,$c_sz})//s;
$$self[BODY] .= $1;
$c_sz -= length($1);

Unfortunately, this doesn't work because regex quantifiers are, rather
annoyingly (It's 2023, folks. RAM is cheap), limited to what can be
represented as a positive, signed 16-bit integer - 1, i.e., 32766.

OTOH, the substr-operator returns a so-called lvalue which means that
the following works:

for (substr($_, 0, $c_sz)) {
    $$self[BODY] .= $_;
    $c_sz -= length();

    $_ = '';
}
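
For anyone who wants to try this outside the original program, here is a minimal, self-contained sketch of the same loop. The collect sub and the $state hash are made up for illustration (they stand in for $$self[BODY] and $c_sz); nothing else about the surrounding object is known from the post.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the object in the post: a hash holding the
# collected body and the number of characters still wanted.
sub collect {
    my ($state, $input) = @_;

    # Alias $_ to the leading part of $input (at most c_sz characters).
    for (substr($input, 0, $state->{c_sz})) {
        $state->{body} .= $_;        # append what was available
        $state->{c_sz} -= length;    # subtract its actual length
        $_ = '';                     # remove it from $input
    }
    return $input;                   # whatever was not consumed
}

my $state = { body => '', c_sz => 5 };
my $rest1 = collect($state, 'abc');      # work string shorter than wanted
my $rest2 = collect($state, 'defgh');    # work string longer than still wanted
print "$state->{body} $state->{c_sz} [$rest1] [$rest2]\n";
```

The two calls cover the "shorter than" and "longer than" cases; the "equal" case behaves like the first one with nothing left to collect afterwards.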

Eric Pozharski

Feb 27, 2023, 5:33:20 AM
with <87o7pg3...@doppelsaurus.mobileactivedefense.com> Rainer
Weikusat wrote:

*SKIP*
> s/^(.{1,$c_sz})//s;
> $$self[BODY] .= $1;
> $c_sz -= length($1);
>
> Unfortunately, this doesn't work because regex quantifiers are, rather
> annoyingly (It's 2023, folks. RAM is cheap), limited to what can be
> represented as a positive, signed 16-bit integer - 1, i.e., 32766.

Your perl is 32bit. Get over it.

% perl -wle '
$aa = "x" x 90_000;
$aa =~ m[(.{1,80000})];
print length $1 '
Quantifier in {,} bigger than 65534 in regex; marked by <-- HERE
in m/(.{1,80000 <-- HERE })/ at -e line 3.

> OTOH, the substr-operator returns a so-called lvalue which means that
> the following works:
>
> for (substr($_, 0, $c_sz)) {
> $$self[BODY] .= $_;
> $c_sz -= length();
> $_ = '';
> }

It's a pity that pseudo-looping is the only way to get aliasing. This
is as ugly:

% perl -wle '
$_ = "kflv-kvla-oprt";
( $ab, substr( $_, 0, 5 ) ) = ( substr( $_, 0, 5 ), "" );
print "($ab)";
print "($_)" '
(kflv-)
(kvla-oprt)
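
Another option that needs neither aliasing nor a list assignment is the four-argument form of substr, which replaces the selected part and returns what was there before. A small sketch, reusing the sample string from above ($short and $got are illustration-only names):

```perl
use strict;
use warnings;

my $str = "kflv-kvla-oprt";

# Four-argument substr: replace the first 5 characters with "" and
# return the characters that were replaced.
my $ab = substr($str, 0, 5, "");
print "($ab)\n($str)\n";

# Asking for more than is available returns only what is there.
my $short = "abc";
my $got   = substr($short, 0, 10, "");
print "($got)(", length($short), ")\n";
```

Applied to the original problem, this would amount to taking substr($_, 0, $c_sz, ''), appending the result to the body and subtracting its length from $c_sz.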

--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom

Ben Bacarisse

Feb 27, 2023, 6:58:52 AM
Rainer Weikusat <rwei...@talktalk.net> writes:

> Problem: I have a string of an unknown length accessible as $_ and need
> to collect a string whose length is stored in a variable. This length
> may be <, > or == the length of the current work string. The string or
> substring I need from the current work string has to be removed from it
> and its actual length subtracted from the length in the variable.
>
> Initially, I tried to do this with ($c_sz being the length variable)
>
> s/^(.{1,$c_sz})//s;
> $$self[BODY] .= $1;
> $c_sz -= length($1);
>
> Unfortunately, this doesn't work because regex quantifiers are, rather
> annoyingly (It's 2023, folks. RAM is cheap), limited to what can be
> represented as a positive, signed 16-bit integer - 1, i.e., 32766.

This should work:

s/^(${\(".?" x $c_sz)})//s;

but I suggest it only as a Perl joke!

--
Ben.

Rainer Weikusat

Feb 27, 2023, 10:59:49 AM
s/^((??{".?" x $c_sz}))//s

also works as a less contorted way of creating a pattern at match time
without storing it in an intermediate variable.
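
A minimal sketch of the postponed-pattern variant, assuming a reasonably current perl; $work and $taken are names invented for the example:

```perl
use strict;
use warnings;

my $c_sz = 4;
my $work = "abcdefgh";

# The code block runs during matching; the string it returns
# (".?.?.?.?" here) is then used as a pattern at that position.
$work =~ s/^((??{".?" x $c_sz}))//s;
my $taken = $1;

print "($taken)($work)\n";
```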

In case someone also wants to know and doesn't want to pick it apart
themselves:

".?" x $c_sz

is an expression returning a string of $c_sz optional any-character
matchers (each .? matches one character if one is available and nothing
otherwise).

\(".?" x $c_sz)

creates a reference to a read-only scalar whose value is the string
returned by the expression in brackets.

${\(".?" x $c_sz)}

dereferences this reference, which yields the string. It's interpolated
into the s/// before matching because ordinary string interpolation is
done on the pattern part.
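
The three steps can be checked one at a time with a small sketch like the following. The variable names ($pat, $ref, $str, $work, $taken) are only there for illustration:

```perl
use strict;
use warnings;

my $c_sz = 3;

my $pat = ".?" x $c_sz;            # the string ".?.?.?"
my $ref = \(".?" x $c_sz);         # reference to a scalar holding that string
my $str = ${ \(".?" x $c_sz) };    # dereferenced again: the same string

# The same expression interpolated directly into the pattern.
my $work = "abcdefg";
$work =~ s/^(${\(".?" x $c_sz)})//s;
my $taken = $1;

print "$pat ", ref($ref), " ($taken)($work)\n";
```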

While this was an entertaining puzzle, I don't think this kind of
semantic terrorism has a place in actual code.

Rainer Weikusat

Feb 27, 2023, 11:13:49 AM
Eric Pozharski <why...@pozharski.name> writes:
> with <87o7pg3...@doppelsaurus.mobileactivedefense.com> Rainer
> Weikusat wrote:
>
> *SKIP*
>> s/^(.{1,$c_sz})//s;
>> $$self[BODY] .= $1;
>> $c_sz -= length($1);
>>
>> Unfortunately, this doesn't work because regex quantifiers are, rather
>> annoyingly (It's 2023, folks. RAM is cheap), limited to what can be
>> represented as a positive, signed 16-bit integer - 1, i.e., 32766.
>
> Your perl is 32bit. Get over it.

It's not.

rw@brushfire:~/work/mad-http$ perl -e 'print ((1 << 63) + 15, "\n")'
9223372036854775823

This is just an arbitrary limit compiled into it.


[...]

>> OTOH, the substr-operator returns a so-called lvalue which means that
>> the following works:
>>
>> for (substr($_, 0, $c_sz)) {
>> $$self[BODY] .= $_;
>> $c_sz -= length();
>> $_ = '';
>> }
>
> It's a pity that pseudo-looping is the only way to get aliasing.

Not really. This works as well:

--------
my $a = 'abcdefghijklmnopqrstuvwxyz';
my $b;

{
    local *_ = \substr($a, 0, 5);
    $b = $_;
    $_ = 'emilia';
}

print("$a\n$b\n");
-------

It's also not really pseudo-anything:

for (<list>) {
    <stmt>;
    <stmt>;
    <stmt>;
}

aliases $_ to each element of the list and then executes whatever is in
the associated block. A list with one element is as good a list as any
other.
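
A short sketch of that aliasing behaviour, with made-up sample data: changes made through $_ inside the block end up in the original variables, whether the list has three elements or one.

```perl
use strict;
use warnings;

my @words = qw(alpha beta gamma);
$_ = ucfirst($_) for @words;    # each element is modified in place

my $single = 'delta';
for ($single) {                 # a one-element list works the same way
    s/^d/D/;
    $_ .= '!';
}

print "@words $single\n";
```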


Eric Pozharski

Mar 1, 2023, 5:33:13 AM
with <87fsard...@doppelsaurus.mobileactivedefense.com> Rainer
Weikusat wrote:
> Eric Pozharski <why...@pozharski.name> writes:
>> with <87o7pg3...@doppelsaurus.mobileactivedefense.com> Rainer
>> Weikusat wrote:

*SKIP*
>>> Unfortunately, this doesn't work because regex quantifiers are, rather
>>> annoyingly (It's 2023, folks. RAM is cheap), limited to what can be
>>> represented as a positive, signed 16-bit integer - 1, i.e., 32766.
>> Your perl is 32bit. Get over it.
> It's not.
> rw@brushfire:~/work/mad-http$ perl -e 'print ((1 << 63) + 15, "\n")'
> 9223372036854775823
> This is just an arbitrary limit compiled into it.

This implies that whoever built your perl has explicitly set this
"arbitrary limit". Somehow I doubt it. May I see the output of this:

% perl -MConfig -wE 'say $Config{use64bitall} // "foo"'
foo

I haven't looked at how the upper limit on quantifiers is set at build
time. Should I?

*SKIP*
> It's also not really pseudo-anything
> for (<list>) {
> <stmt>;
> <stmt>;
> <stmt>;
> }
*SKIP*

Well, how would you identify this construct then:

for ( $aa=42 ) { $_*=2 }
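
For reference, a runnable version of that one-liner. As far as I can tell, a scalar assignment yields its left-hand side as an lvalue, so $_ ends up aliased to $aa here:

```perl
use strict;
use warnings;

my $aa;
for ($aa = 42) { $_ *= 2 }    # $_ is an alias for $aa inside the block
print "$aa\n";                # prints 84
```

This is the same mechanism as the common for (my $copy = $orig) { s/foo/bar/ } idiom.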

Rainer Weikusat

Mar 1, 2023, 10:18:55 AM
Eric Pozharski <why...@pozharski.name> writes:
> with <87fsard...@doppelsaurus.mobileactivedefense.com> Rainer
> Weikusat wrote:
>> Eric Pozharski <why...@pozharski.name> writes:
>>> with <87o7pg3...@doppelsaurus.mobileactivedefense.com> Rainer
>>> Weikusat wrote:
>
> *SKIP*
>>>> Unfortunately, this doesn't work because regex quantifiers are, rather
>>>> annoyingly (It's 2023, folks. RAM is cheap), limited to what can be
>>>> represented as a positive, signed 16-bit integer - 1, i.e., 32766.
>>> Your perl is 32bit. Get over it.
>> It's not.
>> rw@brushfire:~/work/mad-http$ perl -e 'print ((1 << 63) + 15, "\n")'
>> 9223372036854775823
>> This is just an arbitrary limit compiled into it.
>
> This implies that whoever built your perl has explicitly set this
> "arbitrary limit". Somehow I doubt it.

It's documented as such:

n and m are limited to non-negative integral values less than a
preset limit defined when perl is built. This is usually 32766
on the most common platforms.
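
Since the limit depends on how perl was built, it can be probed at run time instead of guessed. A rough sketch; quantifier_ok is a hypothetical helper name, and the sample sizes are arbitrary:

```perl
use strict;
use warnings;

# Hypothetical helper: true if this perl accepts {1,$n} as a quantifier.
# The pattern is compiled at run time, so an oversized quantifier makes
# qr// die, which the eval block catches.
sub quantifier_ok {
    my ($n) = @_;
    return defined eval { qr/.{1,$n}/s };
}

for my $n (10, 32_766, 65_534, 100_000) {
    printf "%7d: %s\n", $n, quantifier_ok($n) ? "accepted" : "rejected";
}
```

Which of the larger values are accepted will differ between perl versions and builds, so no expected output is given here.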

[...]

>> It's also not really pseudo-anything
>> for (<list>) {
>> <stmt>;
>> <stmt>;
>> <stmt>;
>> }
> *SKIP*
>
> Well, how would you identify this construct then:
>
> for ( $aa=42 ) { $_*=2 }

I already wrote that. foreach is something like mapc in lisp: It takes a
block and a list as arguments. It then aliases $_ to each element of the
list in turn and executes the block once every time a new list element
has been aliased. A list of one element is just a list. Even a
list of 0 elements could be used for something:

----------
This is a funky comment for ();

Eric Pozharski

Mar 2, 2023, 1:38:23 PM
with <87y1ogf...@doppelsaurus.mobileactivedefense.com> Rainer
Weikusat wrote:
> Eric Pozharski <why...@pozharski.name> writes:
>> with <87fsard...@doppelsaurus.mobileactivedefense.com> Rainer
>> Weikusat wrote:
>>> Eric Pozharski <why...@pozharski.name> writes:
>>>> with <87o7pg3...@doppelsaurus.mobileactivedefense.com> Rainer
>>>> Weikusat wrote:

>> *SKIP*
>>>>> Unfortunately, this doesn't work because regex quantifiers are, rather
>>>>> annoyingly (It's 2023, folks. RAM is cheap), limited to what can be
>>>>> represented as a positive, signed 16-bit integer - 1, i.e., 32766.
>>>> Your perl is 32bit. Get over it.
>>> It's not.
*SKIP*
> It's documented as such:
>
> n and m are limited to non-negative integral values less than a
> preset limit defined when perl is built. This is usually 32766
> on the most common platforms.

Well, I've done some research, this is what v5.34.0 has to offer:

*n* and *m* are limited to non-negative integral values less than a
preset limit defined when perl is built. This is usually 65534 on the
most common platforms.

And in spite of being definitely 32bit

% perl -MConfig -wE 'say $Config{archname} // "foo"'
i586-linux-thread-multi

it really does that many

% perl -wE '/.{200000}/'
Quantifier in {,} bigger than 65534 in regex; marked by <-- HERE
in m/.{200000 <-- HERE }/ at -e line 1.

I stand corrected -- s/32bit/really old/

>>> It's also not really pseudo-anything
>>> for (<list>) {
>>> <stmt>;
>>> <stmt>;
>>> <stmt>;
>>> }
>> Well, how would you identify this construct then:
>>
>> for ( $aa=42 ) { $_*=2 }
> I already wrote that. foreach is something like mapc in lisp: It takes
> a block and a list as arguments. It then aliases $_ to each element of
> the list in turn and executes the block once every time a new list
> element has been aliased. A list of one element is just a list. Even a
> list of 0 elements could be used for something:
> ----------
> This is a funky comment for ();

Well. Turns out, looping over an *explicit* one-element list has no
meaning at all. Thank you for this insight.