Does this apply in the case of SCALARS?
It would seem that dereferencing a SCALAR reference would create a temporary of the original
and I think this is the case.
Perl seems to know array and hash references pretty well and, internally, dereferencing them does
not seem to incur the overhead of making a temporary copy of the original array or hash; the element is
accessed directly, as if the reference were a pointer.
That's fine, I have no problem at all with this. I am only interested in SCALAR.
I was curious about the function 'substr'. The first parameter is EXPRESSION. It seems to be an
EXPRESSION evaluator. On its face, it will not take a reference, but it will take a dereferenced SCALAR.
But I wonder whether, based on the EXPRESSION, it knows it is dereferencing a SCALAR and does not make a temporary copy.
In the case of the regular expression engine, I wonder the same thing, although with this there may be other properties
that would cause the discrepancies shown below.
For example 'pos()=' and '= pos()' might incur much overhead, and that may explain it as it could produce unknown temporaries
in the engine.
If I run this code segment 100 times, the substr is averaging 10 times faster than the regex. I can't explain it.
Thanks!
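# substr
return substr($$refscalar, 20, 30);

# regex
my $savpos = pos($$refscalar);      # remember the current match position
pos($$refscalar) = 20;              # start matching at offset 20
while ($$refscalar =~ /(.{10})/gs) {
    pos($$refscalar) = $savpos;     # restore the position before returning
    return $1;
}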
I'm sorry, the substr length above should be 10, not 30.
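
Before timing the two approaches it may be worth confirming that they return the same thing. The following is only a sketch: the sub names and the test string are made up for illustration, and the regex version uses a single `if` with `\G` instead of the `while` loop, on the assumption that only the first match at offset 20 is wanted.

```perl
use strict;
use warnings;

# Hypothetical helper names; both take a reference to a (possibly large) string.
sub via_substr {
    my ($ref) = @_;
    return substr($$ref, 20, 10);
}

sub via_regex {
    my ($ref) = @_;
    my $saved = pos($$ref);          # remember the caller's match position
    pos($$ref) = 20;                 # the next /g match starts at offset 20
    my $found;
    if ($$ref =~ /\G(.{10})/gs) {    # \G anchors the match at pos()
        $found = $1;
    }
    pos($$ref) = $saved;             # restore it (undef resets pos)
    return $found;
}

# 100-character test string: the alphabet repeated.
my $text = join '', map { chr(ord('a') + $_ % 26) } 0 .. 99;
my $ref  = \$text;

print via_substr($ref), "\n";        # uvwxyzabcd
print via_regex($ref), "\n";         # uvwxyzabcd
```

Once both are known to agree, the core Benchmark module could be used to compare them over many iterations.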
No.
BugBear
To expand a little: the items in @_ are aliases, not copies, so if you
manipulate them directly you will not be taking a copy. However, as soon
as you do something like
my ($x, $y, $z) = @_;
you've copied all your arguments, and if one of those was a huge great
string you've just done a big memcpy.
Of course, manipulating @_ directly is obscure, and you run the risk of
modifying your arguments unexpectedly. Under normal circumstances (when
you're not expecting to deal with truly enormous strings) taking a copy
is the right thing to do. When it isn't, if you are going to modify the
arguments passed it is probably clearer to make the caller pass a
reference explicitly.
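
A small sketch of the difference, with made-up sub names: the first sub works on the alias in @_ and so changes the caller's variable, while the second copies its argument first.

```perl
use strict;
use warnings;

# Works directly on the @_ alias: the argument is not copied on entry,
# and the caller's variable is modified.
sub upcase_in_place {
    $_[0] = uc $_[0];
    return;
}

# Copies the argument first, so the caller's variable is left alone.
sub upcase_copy {
    my ($s) = @_;      # this assignment copies the whole string
    $s = uc $s;
    return $s;
}

my $orig = "hello";
my $copy = upcase_copy($orig);
print "$orig $copy\n";     # hello HELLO

upcase_in_place($orig);
print "$orig\n";           # HELLO
```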
Ben
--
Razors pain you / Rivers are damp
Acids stain you / And drugs cause cramp. [Dorothy Parker]
Guns aren't lawful / Nooses give
Gas smells awful / You might as well live. b...@morrow.me.uk
b> s...@netherlands.co wrote:
>> ActiveState perl guidelines state its better to pass by reference for best performance.
>> Does this apply in the case of SCALARS?
b> No.
i beg to differ. when passing around large scalars (e.g. long pieces of
text), passing by ref is better. typically args are copied from @_ and
that will be slower with large scalars. so when i work with large
scalars i tend to pass them by ref. you do have to make sure about
modifications not affecting the original or not caring about that.
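
a rough sketch of what that looks like (sub names are invented for the example): the first sub only reads through the reference, the second writes through it and so changes the caller's string.

```perl
use strict;
use warnings;

# Reads the string through the reference; only the reference is copied.
sub count_newlines {
    my ($ref) = @_;
    return ($$ref =~ tr/\n//);      # tr with an empty replacement just counts
}

# Writes through the reference: the caller's string is changed.
sub strip_newlines {
    my ($ref) = @_;
    $$ref =~ tr/\n//d;
    return;
}

my $big = "line one\nline two\nline three\n";
print count_newlines(\$big), "\n";   # 3
strip_newlines(\$big);
print length($big), "\n";            # 26
```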
but i also don't like giving good advice to sln as he is a troll who
doesn't listen. his xml parser is insane.
uri
--
Uri Guttman ------ u...@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
this should be ^10
>
>// regex
>$savpos = pos($$refscalar);
>pos($$refscalar) = 20;
>while ($$refscalar =~ /(.{10})/gs) {
> pos($$refscalar) = $savpos;
> return $1;
>}
>
>
I have some heartening information about substr that confirms my suspicions.
The pos() result is disappointing and still unexplained.
I tested substr with scalar text of 300k bytes. At least 7,000 calls were made, using
a dereferenced scalar as above. On its face, this seems to represent 2.1 GIGABYTES of data.
Check that please.
The series was something like this:
for (1 .. 7_000) {
    my $t = '';
    $t = substr($$refscalar, 20, 10);
    $t = substr($$refscalar, 20, 10);
    $t = substr($$refscalar, 20, 10);
    # THIS:
    $t = substr($$refscalar, 20, 10);
    # THAT:
    $t = substr($$refscalar, 20, 10);
}
For the entire 2.1 gigs of data, the difference between THIS and THAT is only .03 seconds !!!!
It's apparent that the function substr() does NOT create a temporary (i.e. a memcpy)
on the C side, but instead uses the resulting scalar's pointer to
access the data directly!!
If you know this to be true, or how it works, please let me know.
Thanks!
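
One way to probe this from the Perl side is sketched below. It does not show what happens in the C internals; it only shows that `$$ref` refers to the original scalar rather than a temporary, and it checks the 2.1 GB arithmetic. The variable names and the fill character are made up for the example.

```perl
use strict;
use warnings;

my $text = 'x' x 300_000;       # 300k-byte string, as in the test above
my $ref  = \$text;

# Dereferencing and then taking a reference again gives back the same scalar.
my $same = (\$$ref == $ref) ? 1 : 0;
print "same scalar: $same\n";   # same scalar: 1

# Assigning through substr on the dereferenced scalar changes the original,
# which could not happen if $$ref were a temporary copy.
substr($$ref, 20, 10) = 'A' x 10;
print substr($text, 18, 14), "\n";   # xxAAAAAAAAAAxx

# Check of the data-volume estimate: 300,000 bytes x 7,000 calls.
my $bytes = length($text) * 7_000;
print "$bytes\n";               # 2100000000
```

Note that the 10-byte string returned by substr is itself a new value; the point is only that the 300k source string does not have to be copied to produce it.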