arrange form data in same order as on form

bbxrider

Nov 13, 2003, 6:10:59 PM

is there a way to sort (or other method) the 'method=post' data fields from
a form into the same order they appear in the form?
when i use the following code there doesn't appear to be any particular
order to how they are arranged:

read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
@pairs = split(/&/, $buffer);
foreach $pair (@pairs) {
    ($name, $value) = split(/=/, $pair);
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
    $FORM{$name} = $value;
}

Purl Gurl

Nov 13, 2003, 6:26:55 PM

bbxrider wrote:

(snipped)

> is there a way to sort (or other method) the 'method=post'
> data fields from a form into the same order they appear in
> the form when i use the following code there doesn't appear
> to be any particular order to how they are arranged

> $FORM{$name} = $value;

What difference does storage order make? You will
use your data in whatever order you choose.

Purl Gurl
--
Amazing Perl Scripts!
http://www.purlgurl.net/~callgirl/android.html

Ben Morrow

Nov 13, 2003, 6:40:05 PM

Firstly, you *really* should be using CGI or CGI::Lite rather than
parsing stdin yourself.

If the browser doesn't return them in order, then the only way to put
them back in order is to know what order they were in on the original
form. One way of doing this might be to give them names beginning with
'01', '02' etc.: if you generate the form yourself you can automate
this.

Note that hashes are inherently unordered: if you create a hash with
more than one key there is *no* guarantee about the order you will get
the keys back in when you list them. If you need there to be, then
either keep a separate array of keys to hold the order, or use
Tie::IxHash from CPAN which does that for you.
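
The "separate array of keys" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `parse_ordered` is a hypothetical helper name, the input is assumed to be an urlencoded body already read into a string, and repeated names simply keep their last value.

```perl
use strict;
use warnings;

# Hypothetical helper: parse an urlencoded body into a hash, and also
# remember the order in which each name first appeared.
sub parse_ordered {
    my ($body) = @_;
    my (%form, @order);
    for my $pair (split /[&;]/, $body) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        # Decode "+" and %XX escapes in both the name and the value.
        for ($name, $value) {
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        }
        push @order, $name unless exists $form{$name};
        $form{$name} = $value;    # last value wins for repeated names
    }
    return (\%form, \@order);
}

my ($form, $order) = parse_ordered('first=Ann&last=Lee&city=New+York');
print "$_ = $form->{$_}\n" for @$order;
```

The hash still gives lookup by name, while `@$order` records the submission order for the cases where it matters.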

Ben

--
$.=1;*g=sub{print@_};sub r($$\$){my($w,$x,$y)=@_;for(keys%$x){/main/&&next;*p=$
$x{$_};/(\w)::$/&&(r($w.$1,$x.$_,$y),next);$y eq\$p&&&g("$w$_")}};sub t{for(@_)
{$f&&($_||&g(" "));$f=1;r"","::",$_;$_&&&g(chr(0012))}};t # b...@morrow.me.uk
$J::u::s::t, $a::n::o::t::h::e::r, $P::e::r::l, $h::a::c::k::e::r, $.

A. Sinan Unur

Nov 13, 2003, 6:40:59 PM

"bbxrider" <bxtr...@comcast.net> wrote in
news:7cUsb.139690$mZ5.963708@attbi_s54:

> is there a way to sort (or other method) the 'method=post' data
> fields from a form into the same order they appear in the form
> when i use the following code there doesn't appear to be any
> particular order to how they are arranged

why do you care?

> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
> @pairs = split(/&/, $buffer);

this code is buggy. either write your own, taking into account all the fine
points of the specs, or just use CGI.pm. but don't use someone else's buggy
code.

Sinan.

--
A. Sinan Unur
as...@c-o-r-n-e-l-l.edu
Remove dashes for address
Spam bait: mailto:u...@ftc.gov

Eric J. Roode

Nov 13, 2003, 7:01:41 PM

"bbxrider" <bxtr...@comcast.net> wrote in
news:7cUsb.139690$mZ5.963708@attbi_s54:

> is there a way to sort (or other method) the 'method=post' data
> fields from a form into
> the same order they appear in the form
> when i use the following code there doesn't appear to be any
> particular order to how they are arranged

The CGI spec does not guarantee that the form variables will be submitted
in any particular order, so you're out of luck.

> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
> @pairs = split(/&/, $buffer);
> foreach $pair (@pairs) {
> ($name, $value) = split(/=/, $pair);
> $value =~ tr/+/ /;
> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
> $FORM{$name} = $value;

This is exceedingly bad code. Unless you really know what you're doing,
and have strong reasons not to, you should use the CGI module.

We keep seeing this exact same bad code posted to this newsgroup. Out of
curiosity, where did you copy it from?

--
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

Gunnar Hjalmarsson

Nov 13, 2003, 7:22:47 PM

A. Sinan Unur wrote:
> "bbxrider" <bxtr...@comcast.net> wrote in
> news:7cUsb.139690$mZ5.963708@attbi_s54:
>

>>read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>@pairs = split(/&/, $buffer);
>
> this code is buggy.

Do you know the context in which that code is used? If not, you can't
reasonably tell whether it's "buggy".

> either write your own, taking into account all the fine
> points of the specs,

That's important only if you are writing a general purpose function or
module. What makes you think that that is what OP is about to do?

> or just use CGI.pm.

That is one option.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Ben Morrow

Nov 13, 2003, 7:40:10 PM

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote:
> A. Sinan Unur wrote:
> > "bbxrider" <bxtr...@comcast.net> wrote in
> > news:7cUsb.139690$mZ5.963708@attbi_s54:
> >
> >>read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
> >>@pairs = split(/&/, $buffer);
> >
> > this code is buggy.
>
> Do you know the context in which that code is used? If not, you can't
> reasonably tell whether it's "buggy".

The OP is attempting to interpret a CGI POST request, as stated in the
Original Post.

> > either write your own, taking into account all the fine
> > points of the specs,
>
> That's important only if you are writing a general purpose function or
> module. What makes you think that that is what OP is about to do?

It is important in all circumstances where the data being received is
not entirely under your control; eminently the case in a CGI
environment.

> > or just use CGI.pm.
>
> That is one option.

...and by far the best[1], unless you are (a) very clever and (b) need to
do something particular that CGI.pm doesn't do for you.

The whole *point* of CPAN is so that people don't have to keep failing
to solve the same difficult problems over and over again.

Ben

[1] modulo equivalent alternatives, such as CGI::Lite.

--
Like all men in Babylon I have been a proconsul; like all, a slave ... During
one lunar year, I have been declared invisible; I shrieked and was not heard,
I stole my bread and was not decapitated.
~ b...@morrow.me.uk ~ Jorge Luis Borges, 'The Babylon Lottery'

Gunnar Hjalmarsson

Nov 13, 2003, 7:59:16 PM

Ben Morrow wrote:

> Gunnar Hjalmarsson wrote:
>> A. Sinan Unur wrote:

>>> bbxrider wrote:
>>>>
>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>> @pairs = split(/&/, $buffer);
>>>
>>> this code is buggy.
>>
>> Do you know the context in which that code is used? If not, you
>> can't reasonably tell whether it's "buggy".
>
> The OP is attempting to interpret a CGI POST request, as stated in
> the Original Post.

I meant context in a more narrow sense, of course.

Still don't understand what it is that makes the above code "buggy".

>>> either write your own, taking into account all the fine points
>>> of the specs,
>>
>> That's important only if you are writing a general purpose
>> function or module. What makes you think that that is what OP is
>> about to do?
>
> It is important in all circumstances where the data being received
> is not entirely under your control; eminently the case in a CGI
> environment.

What's important in those circumstances is that you validate the data
properly, run the program in taint mode, etc. Using CGI.pm does not
take care of everything, right?

>>> or just use CGI.pm.
>>
>> That is one option.
>
> ...and by far the best[1], unless you are (a) very clever and (b)
> need to do something particular that CGI.pm doesn't do for you.
>
> The whole *point* of CPAN is so that people don't have to keep
> failing to solve the same difficult problems over and over again.

I'm not questioning the advantages with code reuse in general. I'm
just (once again) reacting to the aggressive way, sometimes not to the
point, in which some people here argue for using CGI.pm.

Eric Schwartz

Nov 13, 2003, 8:09:03 PM

Gunnar Hjalmarsson <nor...@gunnar.cc> writes:
>>>> bbxrider wrote:
>>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>

> Still don't understand what it is that makes the above code "buggy".

The read() may not read $ENV{'CONTENT_LENGTH'} bytes into $buffer, and
there's no attempt made to detect or handle this event. Without going
to the effort of reading the original post I don't know for sure, but
I'd bet there's at least three or four other instances where CGI.pm
handles things correctly that the OP's code does not.
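
To make the short-read point concrete, here is a minimal sketch of a read loop that checks what read() actually returned. `read_exactly` is a hypothetical helper, not something from CGI.pm, and the in-memory filehandle merely stands in for STDIN.

```perl
use strict;
use warnings;

# Hypothetical helper: read exactly $length bytes from $fh, or die.
sub read_exactly {
    my ($fh, $length) = @_;
    my $buffer = '';
    while (length($buffer) < $length) {
        # Append to $buffer at its current end; read() may return fewer
        # bytes than requested, 0 at end of input, or undef on error.
        my $got = read($fh, $buffer, $length - length($buffer), length($buffer));
        die "read failed: $!\n" unless defined $got;
        die "input ended after " . length($buffer) . " of $length bytes\n"
            if $got == 0;
    }
    return $buffer;
}

# Demonstration with an in-memory filehandle standing in for STDIN.
my $data = 'name=value';
open(my $in, '<', \$data) or die "cannot open in-memory handle: $!";
print read_exactly($in, length $data), "\n";
```

In a CGI script the call would presumably look like `read_exactly(\*STDIN, $ENV{CONTENT_LENGTH})`, after checking that the length is sane.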

> What's important in those circumstances is that you validate the data
> properly, run the program in taint mode, etc. Using CGI.pm does not
> take care of everything, right?

No, but it removes one axis of variability from the list of things
that could be buggy. Given the option of using known-good code and
hacking something up yourself, why (other than learning exercises,
which are surely valuable) would you not use the tested and verified
code?

-=Eric
--
Come to think of it, there are already a million monkeys on a million
typewriters, and Usenet is NOTHING like Shakespeare.
-- Blair Houghton.

Gunnar Hjalmarsson

Nov 13, 2003, 8:38:42 PM

Eric Schwartz wrote:
> Gunnar Hjalmarsson <nor...@gunnar.cc> writes:
>>>>> bbxrider wrote:
>>>>>
>>>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>
>> Still don't understand what it is that makes the above code
>> "buggy".
>
> The read() may not read $ENV{'CONTENT_LENGTH'} bytes into $buffer,
> and there's no attempt made to detect or handle this event.

Some kind of exception handling is most often useful, but the lack of
it isn't exactly a _bug_, is it?

> Without going to the effort of reading the original post I don't
> know for sure, but I'd bet there's at least three or four other
> instances where CGI.pm handles things correctly that the OP's code
> does not.

None of us knows which of those "things" are _applicable_ in OP's
program.

>> What's important in those circumstances is that you validate the
>> data properly, run the program in taint mode, etc. Using CGI.pm
>> does not take care of everything, right?
>
> No, but it removes one axis of variability from the list of things
> that could be buggy. Given the option of using known-good code and
> hacking something up yourself, why (other than learning
> exercises, which are surely valuable) would you not use the tested
> and verified code?

I very much dislike the aggressive way in which some people here
advocate the use of CGI, and the lack of faith that is shown in
people's own judgement. The described attitude makes me suspicious and
less inclined to listen. How about that for a reason? :)

A. Sinan Unur

Nov 13, 2003, 9:05:52 PM

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote in
news:bp1c15$1in3ke$1@ID-184292.news.uni-berlin.de:

> Eric Schwartz wrote:
>> Gunnar Hjalmarsson <nor...@gunnar.cc> writes:
>>>>>> bbxrider wrote:
>>>>>>
>>>>>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>
>>> Still don't understand what it is that makes the above code
>>> "buggy".
>>
>> The read() may not read $ENV{'CONTENT_LENGTH'} bytes into $buffer,
>> and there's no attempt made to detect or handle this event.
>
> Some kind of exception handling is most often useful, but the lack of
> it isn't exactly a _bug_, is it?

OK, you can call it something else then. Let's assume you don't care about
that. There is still the fact that

>>>>>>> @pairs = split(/&/, $buffer);

will miss pairs separated by a semicolon. In addition, parameter names are
not unescaped. What happens when the query string given is

?param=;

> I very much dislike the aggressive way in which some people here
> advocate the use of CGI, and the lack of faith that is shown in
> people's own judgement.

As Eric Roode pointed out, the same exact code has been posted here
numerous times (e.g.
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&safe=off&selm=4096148f.0310161157.9400327%40posting.google.com)
so I assumed the OP was not relying on his own judgement, but using someone
else's code. In that case, he is better off using CGI.pm.

A. Sinan Unur

Nov 13, 2003, 9:07:39 PM

"Eric J. Roode" <REMOVE...@comcast.net> wrote in
news:Xns9432C1A592...@216.196.97.136:

> "bbxrider" <bxtr...@comcast.net> wrote in
> news:7cUsb.139690$mZ5.963708@attbi_s54:

...

>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>> @pairs = split(/&/, $buffer);
>> foreach $pair (@pairs) {
>> ($name, $value) = split(/=/, $pair);
>> $value =~ tr/+/ /;
>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>> $FORM{$name} = $value;

...

> We keep seeing this exact same bad code posted to this newsgroup. Out
> of curiosity, where did you copy it from?

It seems to date back to 1996 or even earlier:

http://tinyurl.com/uxq2

Eric Schwartz

Nov 13, 2003, 9:03:31 PM

Gunnar Hjalmarsson <nor...@gunnar.cc> writes:
> Some kind of exception handling is most often useful, but the lack of
> it isn't exactly a _bug_, is it?

Is this buggy?

open(FH, '>', "/root/file");
print FH @data;

I'd say heck yeah, because there's no checking that FH was properly
opened, and that's the exact same class of bug we're talking about in
the OP.
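
For comparison, a checked version of that kind of code might look like the sketch below. `write_lines` is a hypothetical helper and the path in the demonstration is deliberately one that cannot be opened.

```perl
use strict;
use warnings;

# Hypothetical helper: write lines to a file, reporting every failure
# instead of silently printing to an unopened handle.
sub write_lines {
    my ($path, @lines) = @_;
    open(my $fh, '>', $path) or die "cannot open '$path' for writing: $!\n";
    print {$fh} @lines       or die "cannot write to '$path': $!\n";
    close($fh)               or die "cannot close '$path': $!\n";
    return scalar @lines;
}

# Demonstration: the directory is assumed not to exist, so open() fails.
my $ok = eval { write_lines('/no/such/directory/file', "data\n"); 1 };
print $ok ? "written\n" : "failed: $@";
```

The same pattern applies to read(): test the return value and decide explicitly what happens when it is not what was expected.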

>> Without going to the effort of reading the original post I don't
>> know for sure, but I'd bet there's at least three or four other
>> instances where CGI.pm handles things correctly that the OP's code
>> does not.
>
> None of us knows which of those "things" are _applicable_ in OP's
> program.

Does it matter? As long as there are any, it means that using CGI.pm
(or some equivalent, such as CGI::Lite, or what-have-you) is a better
solution than rolling it on your own. Again, I make exceptions for
doing it for personal learning purposes, because it's good to learn
how to do things, but it's just nuts to not use a module in a
production environment.

>> Given the option of using known-good code and
>> hacking something up yourself, why (other than learning
>> exercises, which are surely valuable) would you not use the tested
>> and verified code?
>
> I very much dislike the aggressive way in which some people here
> advocate the use of CGI, and the lack of faith that is shown in
> people's own judgement. The described attitude makes me suspicious and
> less inclined to listen. How about that for a reason? :)

I know you're being snarky, but I'll answer it honestly: it sucks as a
reason. It's juvenile and immature, and overlooks the real benefits
in favour of rebellion for its own sake. It's not that I have a lack
of faith in people's judgement, it's that I have never-- not *once*--
seen hand-rolled CGI parsing code on this newsgroup that wasn't buggy.

Most of the arguments against using CGI.pm, including "it's too slow",
aren't based on facts, they're based on suppositions, and "well, it
stands to reason". Sometimes they're based on facts, and then people
usually agree that that's a bad case to use it in, but I'd
conservatively estimate that 98% of the time it's the way to go.

I'll grant you, there are times when it's a good idea to roll your own
CGI parsing routines. Personal study is one. I can't think of any
situation for which I'd want to use CGI.pm that it doesn't already work
for, but if there is one, that's a good reason-- though a better idea,
IMO, is to fix CGI.pm and thus make everyone's life that much better.
Sometimes you don't need the HTML generation routines, and in that
case there are modules like CGI::Lite and others that do the parsing
job for you.

In the end, there are a gazillion ways to get it wrong, and only a
very few ways to do it right. And people being people, the sort of
person who thinks they're saving time and effort by not using CGI.pm
(or equivalent) is not as a rule the sort of person that's going to
take the very painstaking approach of reading the RFCs and following
them correctly, leading to the

10 Hey, it works for me.
20 <tweak>
30 Oh, crap, now it's broke. <code>
40 GOTO 10

loop we're all so painfully familiar with.

bbxrider

Nov 13, 2003, 9:24:04 PM

thanks for all the help and opinions.
i'm just self-learning perl and found some code at
http://www.cgi101.com/class/
and some more by searching google groups.
actually i don't even know what cgi.pm and cgi lite are, but will surely find
out.
i don't mean to try and just steal code, but have found that seeing, using
and understanding examples
really accelerates my learning curve.
what i've since found is that the variable containing the form input is in
fact in the same order as the form.
this code keeps the original order:
foreach (split(/[&;]/, $buffer)) {
    s/\+/ /g ;
    ($name, $value)= split('=', $_, 2) ;
    $name=~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge ;
    $value=~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge ;
    print "$name = $value";
    $buffer{$name}.= "\0" if defined($buffer{$name}) ; # concatenate multiple vars
    $buffer{$name}.= $value ;
}
i have already set up sql query based inserts that expect the data fields in
order, and since there are 67
on the form i want to be able to reuse that code.
again thanks for the help, this is a great forum and i would hope to return
the help someday

"Eric J. Roode" <REMOVE...@comcast.net> wrote in message
news:Xns9432C1A592...@216.196.97.136...

Gunnar Hjalmarsson

Nov 13, 2003, 10:15:35 PM

A. Sinan Unur wrote:
> There is still the fact that
>
>>>>>>>>@pairs = split(/&/, $buffer);
>
> will miss pairs separated by a semicolon.

Not applicable. Look at OP's original post again. Parameters submitted
via forms using the POST method are not separated by semicolons.

> In addition, parameter names are
> not unescaped. What happens when the query string given is
>
> ?param=;

Nothing, since the code does not parse query strings.

I know that you know better than that, Sinan. But now we are talking
about the 'sacred cow' CGI.pm, so the usually educated, logical posts
from you are suddenly replaced with incorrect or questionable
statements. It's anything but convincing.

Gunnar Hjalmarsson

Nov 13, 2003, 10:15:44 PM

Eric Schwartz wrote:

> Gunnar Hjalmarsson writes:
>> I very much dislike the aggressive way in which some people here
>> advocate the use of CGI, and the lack of faith that is shown in
>> people's own judgement. The described attitude makes me suspicious
>> and less inclined to listen. How about that for a reason? :)
>
> I know you're being snarky, but I'll answer it honestly: it sucks
> as a reason. It's juvenile and immature, and overlooks the real
> benefits in favour of rebellion for its own sake.

Maybe true, when you look at it from a rational angle. But it's human.

See my latest reply to Sinan. (This is my last post in this thread, I
promise. :) )

Ben Morrow

Nov 13, 2003, 10:25:02 PM

[please don't top-post]

"bbxrider" <bxtr...@comcast.net> wrote:
> thanks for all the help and opinions i'm just self learning perl and
> found some code at http://www.cgi101.com/class/ and some other
> searching google groups actually i don't even know what cgi.pm and
> cgi lite are but will surely find out

For CGI, type 'perldoc CGI' at a command prompt. CGI::Lite you would
need to install. Note that the names of Perl modules are case-sensitive.

> i don't mean to try and just steal code, but have found that seeing,
> using and understanding examples really accelerates my learning curve

Absolutely. Reading decent code is one of the best ways to learn. You
do have to be sure your source is reliable, though: there is one hell
of a lot of very bad Perl floating around the web.

> what i've since found is that the variable containing the form input
> is in fact in the same order as the form this code keeps the
> original order

I don't think this is guaranteed, by which I mean that it may happen
to work for you with your browser during this phase of the moon, but
under other circumstances it may well not. If you need to keep
separate track of the different parameters, give them different
names. Change whatever generates them to put a number on the end, or
something.

Ben

--
For the last month, a large number of PSNs in the Arpa[Inter-]net have been
reporting symptoms of congestion ... These reports have been accompanied by an
increasing number of user complaints ... As of June,... the Arpanet contained
47 nodes and 63 links. [ftp://rtfm.mit.edu/pub/arpaprob.txt] * b...@morrow.me.uk

A. Sinan Unur

Nov 13, 2003, 10:29:49 PM

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote in
news:bp1hn3$1is5k8$1@ID-184292.news.uni-berlin.de:

> A. Sinan Unur wrote:
>
>> In addition, parameter names are not unescaped.
>> What happens when the query string given is
>>
>> ?param=;
>
> Nothing, since the code does not parse query strings.

Hmmmm ... Let's say I have been doing too much 'task-switching' today,
leading to muddled thinking.

> But now we are talking about the 'sacred cow' CGI.pm, so the
> usually educated, logical posts from you are suddenly replaced with
> incorrect or questionable statements.

It is not that ... Simply jumbled neural paths or something. I do
apologize.

Keith Keller

Nov 13, 2003, 10:49:55 PM

On 2003-11-14, Gunnar Hjalmarsson <nor...@gunnar.cc> wrote:
>
> Not applicable. Look at OP's original post again. Parameters submitted
> via forms using the POST method are not separated by semicolons.

What if he wants to support GET in the future?

> Nothing, since the code does not parse query strings.

What if he wants to parse query strings in the future?

--keith

--
kkeller...@wombat.san-francisco.ca.us
(try just my userid to email me)
AOLSFAQ=http://wombat.san-francisco.ca.us/cgi-bin/fom

bbxrider

Nov 14, 2003, 1:04:34 AM

what is 'top-post' ???
actually don't understand how the eric roode and first sinan unur posts were
not subordinated to the post immediately above them,
i simply use reply-to-group and it always subordinates to the post i'm
responding to

"Ben Morrow" <use...@morrow.me.uk> wrote in message
news:bp1hue$mj7$2...@wisteria.csv.warwick.ac.uk...

David H. Adler

Nov 14, 2003, 3:37:23 AM

In article <bp1c15$1in3ke$1...@ID-184292.news.uni-berlin.de>, Gunnar
Hjalmarsson wrote:

> Eric Schwartz wrote:

[re: CGI.pm]

>> No, but it removes one axis of variability from the list of things
>> that could be buggy. Given the option of using known-good code and
>> hacking something up yourself, why (other than learning
>> exercises, which are surely valuable) would you not use the tested
>> and verified code?
>
> I very much dislike the aggressive way in which some people here
> advocate the use of CGI, and the lack of faith that is shown in
> people's own judgement. The described attitude makes me suspicious and
> less inclined to listen. How about that for a reason? :)

Looking at the question and the answer, in isolation at least, I'm going
to assume the use of "reason" there is irony. :-)

dha

--
David H. Adler - <d...@panix.com> - http://www.panix.com/~dha/
'Don't be tempted to veer off!'
- Paul McGann

Alan J. Flavell

Nov 14, 2003, 6:52:38 AM

On Fri, 14 Nov 2003, Ben Morrow wrote:

> Absolutely. Reading decent code is one of the best ways to learn.

Well, yes, but there's a massive difference between the elaborate code
that might be found in a well-tested and peer-reviewed module,
intended to deal well with all possible situations that it's going to
encounter in the Real World(tm), on the one hand; and a
straightforward little script to use that module, checking that all is
well but otherwise simply bailing out when it recognises that it's not.

Or in clear text: CGI.pm internally appears to be contorted code, but
there are generally good reasons for what it does and how it does it;
however, it's probably not the kind of code that the average *user* of
CGI.pm should be seeking to emulate.

> You do have to be sure your source is reliable, though: there is one
> hell of a lot of very bad Perl floating around the web.

That too, for sure. But that's a different axis of evaluation.

> > what i've since found is that the variable containing the form input
> > is in fact in the same order as the form this code keeps the
> > original order
>
> I don't think this is guaranteed,

I would ask anyone interested in the following to read all of it,
carefully, or not at all. Half-measures are inadvisable.

Point 1. Read
http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1 , item 2,
and then, in the section that follows it (17.13.4.2), the paragraph
beginning 'A "multipart/form-data" message contains a series of parts'.

Thus, both of the mandatory submission formats specify that the items
are required to be submitted in the same order that they appeared in
the form.

Point 2. Client agents don't necessarily conform to the spec
(although most of them do nowadays).

Point 3. In Perl, if you get your submitted name/value pairs from the
module as a "hash", then of course the ordering has been lost by then.

However, in every other respect, the hash is very much the "natural"
way to represent these things in Perl.

Point 4. The whole point of defining the values by name/value pairs
is surely to make them accessible by name rather than by position?
If the designers of HTML forms had wanted to implement positional
parameters, they could have done so (in fact they already did - check
the <ISINDEX> element, now deprecated, from earlier versions of HTML).

My conclusion: although the HTML4 spec requires the name/value pairs
to be transmitted in the same order they appear in the form, it seems to
me that it's utterly pointless to want to rely on all client software
actually doing that. I've often met writers of scripts who seemed
completely obsessed with needing to process the items in the same
order in which they were present in the form, but on closer study I've
never found any justification for doing so, and as soon as the writer
agreed to drop their insistence that they "needed" this, they found
their scripts were easier to write, with no loss of functionality.

While I'm sure that someone could devise a requirement that depended
on the ordering, I can't see any advantage in doing so.

IMHO and YMMVWV.

You may very well want to re-write the form e.g. with existing inputs
filled-in and waiting for further input from the user - but the right
way to do that is probably to use the same code to write the original
empty form as re-writes the partially completed form, and that code
will certainly know what is the proper ordering of the items on the
HTML form itself. But when the boss says the items have to come in a
different order on the web page, there will be no need for a major
rewrite of the code to take that into account, if you've written code
that isn't sensitive to the ordering in the first place.

> by which I mean that it may happen
> to work for you with your browser during this phase of the moon, but
> under other circumstances it may well not.

Something like that; but by gaining the benefits of the hash
representation, one also discards any supposed benefits there might
have been in the original ordering, so - as I say - it seems to me to
be the wrong approach anyway.

> If you need to keep separate track of the different parameters, give
> them different names. Change whatever generates them to put a number
> on the end, or something.

If you want to iterate through the name/value pairs that are present,
then just iterate through the keys of the hash. Write the code so
that the ordering doesn't matter. The resulting code is likely to be
simpler than trying to re-create the problem of positional parameters
all over again - would be my advice.
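
As a rough illustration of order-independent code for the SQL insert case, the sketch below lets the script own the column order and pulls values from the hash by name. The table name, column names, and `build_insert` helper are all made up for the example; only the idea of a script-side column list comes from the discussion above.

```perl
use strict;
use warnings;

# Assumed column list; a real script would name all of its form fields here.
my @columns = qw(first_name last_name city);

# Hypothetical helper: build an INSERT statement with placeholders, plus
# the matching bind values, using the script's column order rather than
# the order in which the browser happened to send the fields.
sub build_insert {
    my ($table, $columns, $form) = @_;
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        join(', ', @$columns),
        join(', ', ('?') x @$columns);
    # Missing fields become empty strings in this sketch.
    my @values = map { defined $form->{$_} ? $form->{$_} : '' } @$columns;
    return ($sql, @values);
}

my %form = (city => 'Oslo', first_name => 'Ann', last_name => 'Lee');
my ($sql, @values) = build_insert('people', \@columns, \%form);
print "$sql\n@values\n";
```

If the form is later rearranged, only the HTML changes; the column list and the insert stay as they are. The statement and values are in the shape that a placeholder-based database interface would expect.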

Eric J. Roode

Nov 14, 2003, 7:20:19 AM

"bbxrider" <bxtr...@comcast.net> wrote in
news:81Xsb.144719$ao4.463347@attbi_s51:

> thanks for all the help and opinions
> i'm just self learning perl and found some code at
> http://www.cgi101.com/class/

I just spent some time perusing this site. It's not a bad site overall,
as far as an introduction to CGI programming goes. The way they
introduce processing of input variables is fine -- but I wish they had
moved immediately on to using CGI.pm, instead of saving it until chapter
17. That code is okay for learning, but is awful for any real work.

> actually i don't even know what cgi.pm and cgi lite are but will surely
> find out

Yes, you should. CGI.pm is a module that comes with the Perl
distribution. It automates much of the dirty work behind processing CGI
forms, plus it has some security checks to protect you from DOS attacks.

> i don't mean to try and just steal code, but have found that seeing,
> using and understanding examples
> really accelerates my learning curve

Absolutely. Borrowing and adapting others' code is a great way to learn.
Just be aware of the limitations of the code you're using! :-)

> what i've since found is that the variable containing the form input
> is in fact in the same order as the form

Most (all?) browsers do submit the variables in the same order that they
appear on the form, but this is NOT guaranteed. Besides, why do you need
them to be in any particular order? They all have names.

> this code keeps the original order
> foreach (split(/[&;]/, $buffer)) {
>     s/\+/ /g ;
>     ($name, $value)= split('=', $_, 2) ;
>     $name=~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge ;
>     $value=~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge ;
>     print "$name = $value";
>     $buffer{$name}.= "\0" if defined($buffer{$name}) ; # concatenate multiple vars
>     $buffer{$name}.= $value ;
> }

Yes, this is much better. However, be aware that CGI.pm does all of this
for you. Less typing, and it's already debugged for you.

> i have already set up sql query based inserts that expect the data
> fields in order and since there are 67
> on the form i want to be able to reuse that code

Well, all of your form variables are named, right? So process them in
name order.
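
A small sketch of what "process them in name order" could look like, assuming the field names are given a zero-padded numeric prefix. The naming scheme and the `values_in_name_order` helper are assumptions for illustration, not something the form in question is known to use.

```perl
use strict;
use warnings;

# Assumed naming scheme: each field name starts with a zero-padded
# position ("01_", "02_", ...), so a plain string sort gives form order.
my %form = (
    '03_city'       => 'Oslo',
    '01_first_name' => 'Ann',
    '02_last_name'  => 'Lee',
);

# Hypothetical helper: return the values sorted by field name.
sub values_in_name_order {
    my ($form) = @_;
    return map { $form->{$_} } sort keys %$form;
}

print join(', ', values_in_name_order(\%form)), "\n";
```

The zero padding matters: without it, a string sort would put "10" before "2".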

> again thanks for the help, this is a great forum and would hope to
> return the help someday

You're welcome.

--
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

Eric J. Roode

Nov 14, 2003, 7:26:09 AM

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote in
news:bp19n3$1iabf3$1@ID-184292.news.uni-berlin.de:

>
> Still don't understand what it is that makes the above code "buggy".

[OP's posted code]:

> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
> @pairs = split(/&/, $buffer);

> foreach $pair (@pairs) {
> ($name, $value) = split(/=/, $pair);
> $value =~ tr/+/ /;
> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
> $FORM{$name} = $value;

1. The read() may fail. No check is made to see if it does.

2. This code does not handle GET requests.

3. CGI parameters may be separated by semicolons instead of ampersands.

4. If a faulty browser fails to encode "=" with a % escape, and that "="
is part of a form variable value, this code will drop that portion of the
value. I've seen browsers do this. split() should use the limit
parameter.

5. No limit is placed on the quantity of data read, opening the script to
possible DOS attack.
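The five points above can be sketched as a hand-rolled parser. This is only an illustration under stated assumptions, not what CGI.pm does internally: the names url_decode, parse_pairs and read_request and the 64 KB cap are all hypothetical, and multipart/form-data is not handled.

```perl
use strict;
use warnings;

my $MAX_BYTES = 64 * 1024;    # illustrative cap, not a recommended value

# Decode one URL-encoded component.
sub url_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Parse a query string or POST body into an ordered list of [name, value].
sub parse_pairs {
    my ($raw) = @_;
    my @pairs;
    for my $pair (split /[&;]/, $raw) {             # point 3: ';' or '&'
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;   # point 4: limit of 2
        $value = '' unless defined $value;
        push @pairs, [ url_decode($name), url_decode($value) ];
    }
    return @pairs;
}

# Fetch the raw request data from the CGI environment.
sub read_request {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    if ($method eq 'GET') {                          # point 2: GET support
        return defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
    }
    my $len = $ENV{CONTENT_LENGTH} || 0;
    die "Request too large\n" if $len > $MAX_BYTES;  # point 5: size cap
    my $buffer = '';
    my $got = read(STDIN, $buffer, $len);            # point 1: check read()
    die "Read failed: $!\n" unless defined $got;
    die "Short read\n" unless $got == $len;
    return $buffer;
}

my @pairs = parse_pairs('a=1;b=x=y&c=hello+world%21');
print "$_->[0] = $_->[1]\n" for @pairs;
```

Returning a list of pairs instead of filling a hash also keeps the fields in the order the browser submitted them, which is what the original question asked for.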

> I'm not questioning the advantages with code reuse in general. I'm
> just (once again) reacting to the aggressive way, sometimes not to the
> point, in which some people here argue for using CGI.pm.

Surely you can't be questioning the value of CGI.pm over the above code?
I have more respect for you than that, Gunnar! :-)

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP7TJ9GPeouIeTNHoEQJRbgCfXwD+RAL7yELVGwmJ53xPd4TSaNEAoPkD
xN+aqh2FBYWsF6sXTLfZD3xw
=G1nw
-----END PGP SIGNATURE-----

Gunnar Hjalmarsson

Nov 14, 2003, 9:21:27 AM

Eric J. Roode wrote:

> Gunnar Hjalmarsson wrote:
>> Still don't understand what it is that makes the above code
>> "buggy".
>
> [OP's posted code]:
>
>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>> @pairs = split(/&/, $buffer);
>> foreach $pair (@pairs) {
>> ($name, $value) = split(/=/, $pair);
>> $value =~ tr/+/ /;
>> $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>> $FORM{$name} = $value;

Note that my initial comment only referred to the first two of those
lines.

> 1. The read() may fail. No check is made to see if it does.
>
> 2. This code does not handle GET requests.
>
> 3. CGI parameters may be separated by semicolons instead of
> ampersands.
>
> 4. If a faulty browser fails to encode "=" with a % escape, and
> that "=" is part of a form variable value, this code will drop that
> portion of the value. I've seen browsers do this. split() should
> use the limit parameter.
>
> 5. No limit is placed on the quantity of data read, opening the
> script to possible DOS attack.

Thanks for that list of CGI.pm features.

To me, a piece of code that does what it's _intended_ to do is not
"buggy". It may have _limitations_, but limitations and bugs are not
the same thing.

If I want my program to print today's date in ISO 8601 format, I may
use this code:

my $time = time;
sub myDate {
my @t = (gmtime $time)[3..5];
sprintf '%d-%02d-%02d', $t[2] += 1900, ++$t[1], $t[0];
}
print myDate();

I could have used your Time::Format module instead, but if I don't
need a variety of date and time formats in my program, I wouldn't
likely have done so.

Time::Format includes some nice tools for time formatting, no doubt.
Nevertheless, that fact wouldn't make you claim that my myDate()
function is "buggy", right?
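For comparison, the same date string can be produced with the core POSIX module, without any CPAN dependency. A minimal sketch; the name iso_date is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my $time = time;

# UTC date as YYYY-MM-DD for a given epoch value.
sub iso_date {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d', gmtime $epoch);
}

print iso_date($time), "\n";
```

Like myDate(), this is fixed to a single format; it only moves the month and year arithmetic into strftime.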

A. Sinan Unur

Nov 14, 2003, 9:37:04 AM

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote in news:bp2opd$1jtuap$1@ID-
184292.news.uni-berlin.de:

>> Gunnar Hjalmarsson wrote:

> To me, a piece of code that does what it's _intended_ to do is not
> "buggy". It may have _limitations_, but limitations and bugs are not
> the same thing.

On the other hand, there is usually a difference between what the author
of the code intends it to do and what the user of the code thinks it
does. In the case of OP's code, it had not been written by him (as I had
surmised) and we cannot expect the OP to have had full understanding of
the 'limitations' of the code. Hence my suggestion to either roll his own
paying attention to details (if this is for a learning exercise) or use
CGI.pm if he just wants to parse a form and feel safe.

> If I want my program to print today's date in ISO 8601 format, I may
> use this code:
>
> my $time = time;
> sub myDate {
> my @t = (gmtime $time)[3..5];
> sprintf '%d-%02d-%02d', $t[2] += 1900, ++$t[1], $t[0];
> }
> print myDate();

...

> Time::Format includes some nice tools for time formatting, no doubt.
> Nevertheless, that fact wouldn't make you claim that my myDate()
> function is "buggy", right?

Is it possible to bring a web server down using your myDate function?

Gunnar Hjalmarsson

Nov 14, 2003, 10:00:24 AM

A. Sinan Unur wrote:
> Gunnar Hjalmarsson wrote:
>> To me, a piece of code that does what it's _intended_ to do is
>> not "buggy". It may have _limitations_, but limitations and bugs
>> are not the same thing.
>
> On the other hand, there is usually a difference between what the
> author of the code intends it to do and what the user of the code
> thinks it does. In the case of OP's code, it had not been written
> by him (as I had surmised) and we cannot expect the OP to have had
> full understanding of the 'limitations' of the code.

If you don't know what you are doing, don't do it. I can agree on
that, not least when it comes to CGI.

> Hence my suggestion to either roll his own paying attention to
> details (if this is for a learning exercise) or use CGI.pm if he
> just wants to parse a form and feel safe.

"Safe"??? That's another annoying thing with the arguments used by the
'CGI.pm fan club'. Very often you give the impression that by using
CGI.pm, you don't need to bother about anything, since other very
experienced programmers have already taken care of it for you.

You know very well that there are security implications with CGI
scripts, whether you use CGI.pm or not. So why on earth do you talk
about feeling "safe"?

>> Time::Format includes some nice tools for time formatting, no
>> doubt. Nevertheless, that fact wouldn't make you claim that my
>> myDate() function is "buggy", right?
>
> Is it possible to bring a web server down using your myDate
> function?

Probably not. But it can be done with a CGI script, even if CGI.pm is
used to parse form data.

A. Sinan Unur

Nov 14, 2003, 10:17:10 AM

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote in news:bp2r2h$1jhuen$1@ID-
184292.news.uni-berlin.de:

> A. Sinan Unur wrote:
>> Gunnar Hjalmarsson wrote:

...

>> Hence my suggestion to either roll his own paying attention to
>> details (if this is for a learning exercise) or use CGI.pm if he
>> just wants to parse a form and feel safe.
>
> "Safe"??? That's another annoying thing with the arguments used by the
> 'CGI.pm fan club'. Very often you give the impression that by using
> CGI.pm, you don't need to bother about anything, since other very
> experienced programmers have already taken care of it for you.

Well, maybe I should have fully spelt it out. I meant "feel safe that the
nuts and bolts of parsing the form are properly taken care of". I did not
mean to imply that just by sticking in a 'use CGI;' you never have to worry
about the security implications of running a program using untrusted
data. But then, that is not a Perl issue.

>>> Time::Format includes some nice tools for time formatting, no
>>> doubt. Nevertheless, that fact wouldn't make you claim that my
>>> myDate() function is "buggy", right?
>>
>> Is it possible to bring a web server down using your myDate
>> function?
>
> Probably not. But it can be done with a CGI script, even if CGI.pm is
> used to parse form data.

It can be done in a CGI script regardless of the programming language and
libraries used. But the culprit should not be that you blindly copied
code that has been in circulation at least since 1996 instead of using a
peer-reviewed module.

Gunnar Hjalmarsson

Nov 14, 2003, 10:44:16 AM

A. Sinan Unur wrote:
> [Bringing a web server down] can be done in a CGI script regardless
> of the programming language and libraries used. But the culprit
> should not be that you blindly copied code that has been in
> circulation at least since 1996 instead of using a peer-reviewed
> module.

Maybe we can finally reach an agreement about this? :)

IMO, the keyword above is "blindly". You should of course never copy
and use *any* code fragment if you don't know how it works. Doing so
cannot be an acceptable alternative to using an established module.

Isn't the real problem that many beginners copy pieces of code that
they don't *understand*, and use them in production code? If so,
wouldn't it be better to say just that, rather than claiming that
every occurrence of code that parses form data is bad or buggy by
definition?

A. Sinan Unur

Nov 14, 2003, 11:25:13 AM

"bbxrider" <bxtr...@comcast.net> wrote in
news:Sf_sb.145049$9E1.742137@attbi_s52:

> what is 'top-post' ???

Google is your friend.

http://www.xs4all.nl/%7ewijnands/nnq/nquote.html (see Q.7)

Sinan.

Purl Gurl

Nov 14, 2003, 11:35:50 AM

Gunnar Hjalmarsson wrote:

> A. Sinan Unur wrote:

> > [Bringing a web server down] can be done in a CGI script regardless
> > of the programming language and libraries used. But the culprit
> > should not be that you blindly copied code that has been in
> > circulation at least since 1996 instead of using a peer-reviewed
> > module.

> IMO, the keyword above is "blindly". You should of course never copy
> and use *any* code fragment if you don't know how it works. Doing so
> cannot be an acceptable alternative to using an established module.

> Isn't the real problem that many beginners copy pieces of code that
> they don't *understand*, and use them in production code? If so,
> wouldn't it be better to say just that, rather than claiming that
> every occurrence of code that parses form data is bad or buggy by
> definition?

Use of modules is blindly copying code without understanding. This
Perl 5 Cargo Cult practice led to this term, "Copy And Paste Babies."

Rather ironic, yes?

You are debating with Perl 5 Cargo Cultists, a pointless endeavor
which will never result in truth being told. You might as well
be trying to convince Saddam Hussein to embrace democratic ideals
of freedom, liberty and justice.

Use of Stein's CGI.pm module is amongst your worst possible
programming choices.

Purl Gurl
--
Roberta The Remarkable Robot
http://www.purlgurl.net/~callgirl/roberta/roberta.cgi
Roberta's Operator's Manual
http://www.purlgurl.net/~callgirl/roberta/help.html

A. Sinan Unur

Nov 14, 2003, 11:47:55 AM

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote in news:bp2tkt$1kh0be$1@ID-
184292.news.uni-berlin.de:

> A. Sinan Unur wrote:
>> [Bringing a web server down] can be done in a CGI script regardless
>> of the programming language and libraries used. But the culprit
>> should not be that you blindly copied code that has been in
>> circulation at least since 1996 instead of using a peer-reviewed
>> module.
>
> Maybe we can finally reach an agreement about this? :)
>
> IMO, the keyword above is "blindly". You should of course never copy
> and use *any* code fragment if you don't know how it works. Doing so
> cannot be an acceptable alternative to using an established module.

Agreed.

> Isn't the real problem that many beginners copy pieces of code that
> they don't *understand*, and use them in production code? If so,
> wouldn't it be better to say just that, rather than claiming that
> every occurrence of code that parses form data is bad or buggy by
> definition?

Well, I have not claimed every occurrence of such code is buggy by
definition. I reacted to the read and query string parsing bugs (and later
retracted my objection to the latter). In this specific instance, I was
reacting to code that I have seen posted numerous times with no
indication that the poster was aware of potential pitfalls.

Sinan.

Alan J. Flavell

Nov 14, 2003, 11:13:22 AM

On Fri, 14 Nov 2003, Gunnar Hjalmarsson wrote:

> To me, a piece of code that does what it's _intended_ to do is not
> "buggy". It may have _limitations_, but limitations and bugs are not
> the same thing.

I don't think there's any real disagreement over that, unless the
limitation under discussion was in the department of "inability of the
code to protect itself against dangerous input from the client", in
which case I'd rate it as not only a limitation but also a bug.

> If I want my program to print today's date in ISO 8601 format, I may
> use this code:

However, it's a fact of programming life that the initial design and
implementation often represents only a tiny fraction of the software's
total lifetime support implications. So a program that can only
produce a single date format might very well later be called upon to
produce a different format, or to correctly report the time in someone
else's timezone, or whatever. So an initial design which is capable
of being easily extended to do these things may offer some real
advantages over one that will need additional one-off code development
to achieve the same result, in terms of later maintenance commitments.

Case in point: a few days after the end of European daylight savings
time this year, I had occasion to deal with a USAn videoconference
booking system. It thought that the clock time in the UK was BST (it
was not) and numerically the same as in Geneva(CH) (it was not) and
an hour away from the time in Hamburg(DE) (it got that much right).

When I reported the discrepancy, I was told "the software can be
tweaked". I'm sure it can, but why would it need to? Computer
systems in the various locations _know_ the correct time and timezone
for any supported locale - their sysadmins do not need to "tweak"
them. Evidently the company that implemented the videoconferencing
server had re-invented a square wheel, no?

Gunnar Hjalmarsson

Nov 14, 2003, 11:48:40 AM

Purl Gurl wrote:

> Gunnar Hjalmarsson wrote:
>> You should of course never copy and use *any* code fragment if
>> you don't know how it works. Doing so cannot be an acceptable
>> alternative to using an established module.
>
> Use of modules is blindly copying code without understanding. This
> Perl 5 Cargo Cult practice led to this term, "Copy And Paste
> Babies."
>
> Rather ironic, yes?

Don't see the irony. Copying a piece of code out of the context in
which it was intended to work is very different from using a CPAN
module and calling its methods in accordance with the documentation.
Unlike the former piece of code, the intended purpose of the module is
that it can be incorporated in a program even if the user doesn't
understand all its internals.

Purl Gurl

Nov 14, 2003, 12:15:39 PM

Gunnar Hjalmarsson wrote:

> Purl Gurl wrote:
> > Gunnar Hjalmarsson wrote:

(snipped)

> >> You should of course never copy and use *any* code fragment if
> >> you don't know how it works. Doing so cannot be an acceptable
> >> alternative to using an established module.

> > Use of modules is blindly copying code without understanding. This
> > Perl 5 Cargo Cult practice led to this term, "Copy And Paste Babies."

> > Rather ironic, yes?

> Don't see the irony.

Then you are "blindly" copying code.

This irony is clear to a mind's eye with clear vision.

> Unlike the former piece of code, the intended purpose of the module is
> that it can be incorporated in a program even if the user doesn't
> understand all its internals.

Is not your statement the same as before, a reference to
copying code and not understanding how it works?

"You should of course never copy and use *any* code fragment if
you don't know how it works."

Rather ironic, yes?

Unless you have read and fully understand all six-thousand plus
lines of Stein's module, his quarter-megabyte module, unless you
completely and fully understand every bit of code in his module,
with usage, then you are copying and pasting code you do not
understand which is one premise of Perl 5 Cargo Cultists' critiques.

Rather ironic, yes?

Using modules without understanding is to do precisely what
Perl 5 Cargo Cultists rant about, this use of "cargo cult"
which is precisely what modules are, "cargo cult."

Cargo Cultists labeling code they don't like, "cargo cult,"
usually with never knowing why they don't like specific code.

Rather oxymoronic, yes?

Purl Gurl
--
Corvette Mako Sharks! 56 Chevy Napco 4X4!
http://www.purlgurl.net/~godzilla/

Eric J. Roode

Nov 14, 2003, 12:21:27 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote in news:bp2opd$1jtuap$1@ID-
184292.news.uni-berlin.de:

> Eric J. Roode wrote:
>> Gunnar Hjalmarsson wrote:
>>> Still don't understand what it is that makes the above code
>>> "buggy".
>>
>> [OP's posted code]:
>>
>>> read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>> @pairs = split(/&/, $buffer);

...

>
> Note that my initial comment only referred to the first two of those
> lines.

Well, even just those two lines are the subject of four of my five
arguments against the whole code block. :-)

> To me, a piece of code that does what it's _intended_ to do is not
> "buggy". It may have _limitations_, but limitations and bugs are not
> the same thing.

Agreed.

> If I want my program to print today's date in ISO 8601 format, I may
> use this code:
>
> my $time = time;
> sub myDate {
> my @t = (gmtime $time)[3..5];
> sprintf '%d-%02d-%02d', $t[2] += 1900, ++$t[1], $t[0];
> }
> print myDate();
>
> I could have used your Time::Format module instead, but if I don't
> need a variety of date and time formats in my program, I wouldn't
> likely have done so.
>
> Time::Format includes some nice tools for time formatting, no doubt.
> Nevertheless, that fact wouldn't make you claim that my myDate()
> function is "buggy", right?

Your example is a bit simplistic. It is indeed simple to roll one's own
date-formatting code. Your code above has no obvious bugs that jump out
and catch my attention. It is limited in that its format is hard-coded,
but so what? That may be sufficient for your needs, and as you point out,
a limitation is not a bug.

However, the OP (and hundreds of others like him) were apparently under
the impression that their code would be sufficient to "parse CGI input
parameters". In many cases it would, but in many cases not. And it is
not so simple to write robust CGI input handling code. It's not rocket
science -- but it's a silly wheel to reinvent.

<imho>
It's foolish to write twenty or thirty lines of robust CGI-parsing code
and include it in every CGI program you write. It's more foolish to write
five or ten lines of crappy CGI-parsing code and include it in every CGI
program you write. It's much less foolish to write your own robust
CGI-parsing code, wrap it up in a nice module, and use that module from
your own CGI programs.

It's even less foolish to just use the already-written, combat-tested
CGI.pm module. It's a no-brainer.
</imho>

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP7UOl2PeouIeTNHoEQLzYACgx8IVkq5OBGar98dChVQ46a8dggQAoLTb
wZwJIm1P6iVuyABxxUFgK3j1
=hXcR
-----END PGP SIGNATURE-----

Purl Gurl

Nov 14, 2003, 12:25:58 PM

Purl Gurl wrote:

> Gunnar Hjalmarsson wrote:
> > A. Sinan Unur wrote:

(snipped)

> Use of Stein's CGI.pm module is amongst your worst possible
> programming choices.

Looking through this inane thread, I will add usage of Stein's
module automatically sets you up for a denial of service attack.

A person complained about denial of service attack using so called
"cargo cult" code, yet Stein's module contains this major security
blunder, quite automatically. This is a well known security blunder
addressed as far back as middle nineties, addressed by many
examples of inanely labeled "cargo cult" code, long before Stein
wrote his boondoggle module.

Stein's module contains a hidden security flaw, one of many,
which has been common knowledge for well over a decade.

Did I not say, "...truth will never be told...." in a previous article?

Purl Gurl
--
Most Entertaining Android In Existence!
http://www.purlgurl.net/~callgirl/roberta/roberta.cgi

Darin McBride

Nov 14, 2003, 12:29:11 PM

Purl Gurl wrote:

> Gunnar Hjalmarsson wrote:
>
>> Purl Gurl wrote:
>> > Gunnar Hjalmarsson wrote:
>
> (snipped)
>
>> >> You should of course never copy and use *any* code fragment if
>> >> you don't know how it works. Doing so cannot be an acceptable
>> >> alternative to using an established module.
>
>> > Use of modules is blindly copying code without understanding. This
>> > Perl 5 Cargo Cult practice led to this term, "Copy And Paste Babies."
>
>> > Rather ironic, yes?
>
>> Don't see the irony.
>
> Then you are "blindly" copying code.
>
> This irony is clear to a mind's eye with clear vision.

Interesting definition.

>> Unlike the former piece of code, the intended purpose of the module is
>> that it can be incorporated in a program even if the user doesn't
>> understand all its internals.
>
> Is not your statement the same as before, a reference to
> copying code and not understanding how it works?
>
> "You should of course never copy and use *any* code fragment if
> you don't know how it works."
>
> Rather ironic, yes?

No. Copying code and tweaking it is quite different from using code
that was intended to be used, in the way it was intended to be used.

> Unless you have read and fully understand all six-thousand plus
> lines of Stein's module, his quarter-megabyte module, unless you
> completely and fully understand every bit of code in his module,
> with usage, then you are copying and pasting code you do not
> understand which is one premise of Perl 5 Cargo Cultists' critiques.
>
> Rather ironic, yes?

I assume you don't use perl at all, then, right? Do you understand all
the megabytes of C code that the standard perl functions are made
up of?

> Using modules without understanding is to do precisely what
> Perl 5 Cargo Cultists rant about, this use of "cargo cult"
> which is precisely what modules are, "cargo cult."

First time I've ever seen the term "Cargo Cultists". Care to define
the term?

Purl Gurl

Nov 14, 2003, 12:41:07 PM

Darin McBride wrote:

> Purl Gurl wrote:
> > Gunnar Hjalmarsson wrote:
> >> Purl Gurl wrote:
> >> > Gunnar Hjalmarsson wrote:

(snipped)

> I assume you don't use perl at all, then, right?

This is right. I know nothing about Perl, have never
used Perl and have never written a Perl program.

Gunnar Hjalmarsson

Nov 14, 2003, 12:34:23 PM

A. Sinan Unur wrote:
> Gunnar Hjalmarsson wrote:

>> Isn't the real problem that many beginners copy pieces of code
>> that they don't *understand*, and use them in production code? If
>> so, wouldn't it be better to say just that, rather than claiming
>> that every occurrence of code that parses form data is bad or
>> buggy by definition?
>
> Well, I have not claimed every occurrence of such code is buggy by
> definition.

No, you haven't. My apologies for that.

Gunnar Hjalmarsson

Nov 14, 2003, 12:34:28 PM

Alan J. Flavell wrote:
> Gunnar Hjalmarsson wrote:
>> To me, a piece of code that does what it's _intended_ to do is
>> not "buggy". It may have _limitations_, but limitations and bugs
>> are not the same thing.
>
> I don't think there's any real disagreement over that, unless the
> limitation under discussion was in the department of "inability of
> the code to protect itself against dangerous input from the
> client", in which case I'd rate it as not only a limitation but
> also a bug.
>
>> If I want my program to print today's date in ISO 8601 format, I
>> may use this code:
>
> However, it's a fact of programming life that the initial design
> and implementation often represents only a tiny fraction of the
> software's total lifetime support implications. So a program that
> can only produce a single date format might very well later be
> called upon to produce a different format, or to correctly report
> the time in someone else's timezone, or whatever. So an initial
> design which is capable of being easily extended to do these things
> may offer some real advantages over one that will need additional
> one-off code development to achieve the same result, in terms of
> later maintenance commitments.

Absolutely. Those are things to consider when deciding whether to use a
module, but they have nothing to do with the question of whether the
alternative contains bugs.

Purl Gurl

Nov 14, 2003, 3:10:19 PM

Purl Gurl wrote:

> Purl Gurl wrote:
> > Gunnar Hjalmarsson wrote:
> > > A. Sinan Unur wrote:

(snipped)

> > Use of Stein's CGI.pm module is amongst your worst possible
> > programming choices.

> Looking through this inane thread, I will add usage of Stein's
> module automatically sets you up for a denial of service attack.

> A person complained about denial of service attack using so called
> "cargo cult" code, yet Stein's module contains this major security
> blunder, quite automatically.

I am enjoying this.

Here is code the boys here label as "cargo cult" and indicate
to never use. Nonetheless, this code, upon which Stein's module
is partially based, this code automatically protects against a
denial of service attack where Stein's module automatically
creates potential for a denial of service attack.

Pretty darn funny, especially knowing this code dates back to
and is written for Perl 4 versions.

http://cgi-lib.berkeley.edu/

Keep on telling people to use modules, without warning
those people about the dangers. Absolutely! No need to
question nor understand modules because they are peer
reviewed and safe for use!

Have I discussed how Stein's module destroys upload files
and generates memory leaks so severe a system will grind
to a complete halt?

I am sure I have discussed how Stein's module inflicts
an efficiency loss, typically one-thousand-three-hundred
percent and never less than eight-hundred percent.

Yep, those cargo cult modules are technological wonders
leaving a decent programmer, wondering.

So tell me, just what is "cargo cult" and what is not?

Purl Gurl Queen Of Perl Heretics.
--
Learn My Native Tongue, Choctaw!
http://www.purlgurl.net/~choctaw/

Gunnar Hjalmarsson

Nov 14, 2003, 3:47:12 PM

Purl Gurl wrote:
> A person complained about denial of service attack using so called
> "cargo cult" code, yet Stein's module contains this major security
> blunder, quite automatically.

CGI.pm does not by default limit the amount of data that can be read
from STDIN, which is something that I believe some people aren't aware
of. Is that what you are referring to?
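For reference, CGI.pm documents two package variables for this, $CGI::POST_MAX and $CGI::DISABLE_UPLOADS, and both need to be set before the CGI object is created. A minimal sketch, assuming CGI.pm is installed; the 100 KB cap is an arbitrary illustrative value.

```perl
use strict;
use warnings;
use CGI;

# Cap POST bodies (value is illustrative) and refuse file uploads.
# Both must be set before CGI->new is called.
$CGI::POST_MAX        = 100 * 1024;
$CGI::DISABLE_UPLOADS = 1;

my $q = CGI->new;

# cgi_error() reports a problem found while reading the request,
# such as a POST body larger than $CGI::POST_MAX.
if (my $err = $q->cgi_error) {
    print $q->header(-status => $err), "Request rejected: $err\n";
    exit 0;
}

print $q->header('text/plain'), "Request accepted\n";
```

Without these two assignments the module accepts a request body of any size, which is the default behaviour being discussed here.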

Purl Gurl

Nov 14, 2003, 4:16:15 PM

Gunnar Hjalmarsson wrote:

> Purl Gurl wrote:

> > A person complained about denial of service attack using so called
> > "cargo cult" code, yet Stein's module contains this major security
> > blunder, quite automatically.

> CGI.pm does not by default limit the amount of data that can be read
> from STDIN, which is something that I believe some people aren't aware
> of. Is that what you are referring to?

Yes, precisely.

Brenner's method, which is labeled as "cargo cult" by participants
here, does this automatically.

Over the years, I have read literally thousands of examples of
scripts using the CGI.pm module; very few, perhaps a dozen,
make use of $CGI::POST_MAX.

Although I have not read thousands of examples using CGI.pm
in this group, of those I have read, only a small handful set
$CGI::POST_MAX.

This is a good example of the boys here touting use of a
module, which in turn creates potential for serious problems.

Why does this happen? Simple. Almost all of the boys here
are "Copy And Paste Babies" who know very little about these
modules they insist all use.

Makes no difference to me which methods others use. However
it does annoy me to see so much use of cargo cult in this
group and so many adamantly suggesting use of cargo cult.

This makes no difference to me because I know well enough
to write good code or to learn how a module works before
ever considering using that module.

I never use CGI.pm because I am familiar with what it does,
good or bad, with the latter being prevalent.

Purl Gurl
--
Amazing Perl Scripts!
http://www.purlgurl.net/~callgirl/android.html

Gunnar Hjalmarsson

Nov 14, 2003, 4:13:58 PM

Purl Gurl wrote:
> Gunnar Hjalmarsson wrote:

>> CGI.pm does not by default limit the amount of data that can be
>> read from STDIN, which is something that I believe some people
>> aren't aware of. Is that what you are referring to?
>
> Yes, precisely.
>
> Brenner's method, which is labeled as "cargo cult" by participants
> here, does this automatically.
>
> Over the years, I have read literally thousands of examples of
> scripts using the CGI.pm module; very few, perhaps a dozen, make
> use of $CGI::POST_MAX.
>
> Although I have not read thousands of examples using CGI.pm in this
> group, of those I have read, only a small handful set $CGI::POST_MAX.

I have the same impression.

Eric J. Roode

Nov 14, 2003, 6:26:02 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Purl Gurl <purl...@purlgurl.net> wrote in news:3FB51026.BD0A5716
@purlgurl.net:

> Purl Gurl wrote:
>
>> Gunnar Hjalmarsson wrote:
>> > A. Sinan Unur wrote:
>
> (snipped)
>
>> Use of Stein's CGI.pm module is amongst your worst possible
>> programming choices.
>
> Looking through this inane thread, I will add usage of Stein's
> module automatically sets you up for a denial of service attack.
>
> A person complained about denial of service attack using so called
> "cargo cult" code, yet Stein's module contains this major security
> blunder, quite automatically.

Stein's module also contains an easy way to avoid the security hole, and
the documentation contains a discussion of the security issues. Not so
for the code that I originally complained about.

Is this "security hole" your only complaint with CGI.pm?

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP7VkwWPeouIeTNHoEQJOSQCfYQhx0Z/gGhmw/xdavzkWtrbcuI8An0Ns
Fwt88I6RmSxq4gl7d/io7rLd
=ctF0
-----END PGP SIGNATURE-----

James Willmore

Nov 14, 2003, 6:50:21 PM

THIS is top posting. Please don't do this.

On Fri, 14 Nov 2003 06:04:34 GMT
"bbxrider" <bxtr...@comcast.net> wrote:
> what is 'top-post' ???
> actually don't understand how the eric roode and first sinan unur
> posts were not subordinated to the post immediately above them,
> i simply use reply-to-group and it always subordinates to the post
> i'm responding to

Please read the posting guidelines for this group
http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
Cinemuck, n.: The combination of popcorn, soda, and melted
chocolate which covers the floors of movie theaters. -- Rich
Hall, "Sniglets"

Tad McClellan

Nov 14, 2003, 8:17:20 PM

bbxrider <bxtr...@comcast.net> wrote:

> what is 'top-post' ???

http://www.catb.org/~esr/jargon/html/T/top-post.html

--
Tad McClellan SGML consulting
ta...@augustmail.com Perl programming
Fort Worth, Texas

Tintin

Nov 14, 2003, 11:06:14 PM

"Gunnar Hjalmarsson" <nor...@gunnar.cc> wrote in message
news:bp2opd$1jtuap$1...@ID-184292.news.uni-berlin.de...

>
> To me, a piece of code that does what it's _intended_ to do is not
> "buggy". It may have _limitations_, but limitations and bugs are not
> the same thing.
>
> If I want my program to print today's date in ISO 8601 format, I may
> use this code:
>
> my $time = time;
> sub myDate {
> my @t = (gmtime $time)[3..5];
> sprintf '%d-%02d-%02d', $t[2] += 1900, ++$t[1], $t[0];
> }
> print myDate();
>
> I could have used your Time::Format module instead, but if I don't
> need a variety of date and time formats in my program, I wouldn't
> likely have done so.
>
> Time::Format includes some nice tools for time formatting, no doubt.
> Nevertheless, that fact wouldn't make you claim that my myDate()
> function is "buggy", right?

Your analogy is not a good one. An ISO 8601 date format has very rigid
parameters, whereas CGI data is, by its very nature, variable.

Gunnar Hjalmarsson

Nov 14, 2003, 11:55:32 PM

Tintin wrote:

> Gunnar Hjalmarsson wrote:
>> To me, a piece of code that does what it's _intended_ to do is
>> not "buggy". It may have _limitations_, but limitations and bugs
>> are not the same thing.
>>
>> If I want my program to print today's date in ISO 8601 format, I
>> may use this code:
>>
>> my $time = time;
>> sub myDate {
>> my @t = (gmtime $time)[3..5];
>> sprintf '%d-%02d-%02d', $t[2] += 1900, ++$t[1], $t[0];
>> }
>> print myDate();
>>
>> I could have used your Time::Format module instead, but if I
>> don't need a variety of date and time formats in my program, I
>> wouldn't likely have done so.
>>
>> Time::Format includes some nice tools for time formatting, no
>> doubt. Nevertheless, that fact wouldn't make you claim that my
>> myDate() function is "buggy", right?
>
> Your analogy is not a good one. An ISO8601 date format has very
> rigid parameters, whereas CGI data is by its very nature, variable.

True, but not all potential variations are applicable in all programs
that parse CGI data. For instance, if you want a program to parse only
POSTed data, it's not buggy because it isn't prepared to handle
potential variations in data submitted via GET. Limited? Yes.
Inflexible? Yes. Buggy? No.

The only point with my example was to illustrate that distinction.
Call a spade a spade! :)
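
For comparison, here is a minimal sketch of the myDate() example next to the core POSIX strftime call. The names my_date and iso_date are illustrative, and the epoch is passed in as an argument here rather than read from a file-scoped $time as in the post.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hand-rolled formatter, following the approach of myDate() in the post.
sub my_date {
    my ($epoch) = @_;
    my @t = (gmtime $epoch)[3 .. 5];    # mday, mon (0-based), year - 1900
    return sprintf '%d-%02d-%02d', $t[2] + 1900, $t[1] + 1, $t[0];
}

# The same date produced by the core POSIX module.
sub iso_date {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d', gmtime $epoch);
}

my $now = time;
print my_date($now), " ", iso_date($now), "\n";
```

Both functions should agree for any epoch value, which is the sense in which the hand-written version is limited rather than buggy.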

Tintin

Nov 15, 2003, 1:37:39 AM

"Gunnar Hjalmarsson" <nor...@gunnar.cc> wrote in message

news:bp4c2u$1k3umr$1...@ID-184292.news.uni-berlin.de...

I agree with you, to a certain degree, about not calling code buggy. I
suppose you could argue that various Microsoft products that don't conform
to standards are limited rather than buggy because they are deliberately
designed that way. However, the typical newbie or person who writes
"limited" CGI parsing code generally does not write it with deliberate
limitations. In most cases, I think it is fair to say they are writing code
that they think works for all occasions.

Eric J. Roode

Nov 15, 2003, 6:34:32 AM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote in news:bp4c2u$1k3umr$1@ID-
184292.news.uni-berlin.de:

> True, but not all potential variations are applicable in all programs
> that parse CGI data. For instance, if you want a program to parse only
> POSTed data, it's not buggy because it isn't prepared to handle
> potential variations in data submitted via GET. Limited? Yes.
> Inflexible? Yes. Buggy? No.

I just can't believe that anyone would advocate writing one's own limited
CGI parsing code from scratch, against using the robust, flexible CGI.pm
off the shelf.

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP7YPeGPeouIeTNHoEQL4WwCcDCElVH4KVgAhcWhfYDH5SIAquzUAoJWF
JAjzO/Q+EBQWtA9mhvGYZslH
=dmtq
-----END PGP SIGNATURE-----

Gunnar Hjalmarsson

Nov 15, 2003, 9:10:33 AM

Eric J. Roode wrote:
> Gunnar Hjalmarsson wrote:

>> ... not all potential variations are applicable in all programs
>> that parse CGI data. For instance, if you want a program to parse
>> only POSTed data, it's not buggy because it isn't prepared
>> to handle potential variations in data submitted via GET.
>> Limited? Yes. Inflexible? Yes. Buggy? No.
>
> I just can't believe that anyone would advocate writing one's own
> limited CGI parsing code from scratch, against using the robust,
> flexible CGI.pm off the shelf.

One situation where doing so makes sense is when efficiency matters.

I have a program, where I believe it would be indefensible to have it
load CGI.pm. Maybe that's why I'm so sensible about this. :)

Gunnar Hjalmarsson

Nov 15, 2003, 9:10:27 AM

Tintin wrote:
> I agree with you, to a certain degree, about not calling code buggy.
> ... However, the typical newbie or person who writes "limited" CGI
> parsing code generally does not write it with deliberate
> limitations. In most cases, I think it is fair to say they are
> writing code that they think works for all occasions.

Probably true. In those cases they have probably copied and tweaked
code that they don't fully understand. *That* is what's blameworthy,
not necessarily the code in itself.

Randal L. Schwartz

Nov 15, 2003, 10:32:06 AM

>>>>> "Gunnar" == Gunnar Hjalmarsson <nor...@gunnar.cc> writes:

Gunnar> One situation where doing so makes sense is when efficiency matters.

Gunnar> I have a program, where I believe it would be indefensible to have it
Gunnar> load CGI.pm. Maybe that's why I'm so sensible about this. :)

You *do* realize that CGI.pm uses a "compile as you go" mechanism?
Very little of the file is loaded unless you specifically ask for it.
Do not be confused by its sheer size.

I'd bet it'd be hard to get something that is even *twice* as efficient
that has all the security provisions and knowledge accumulated over
the years in CGI.pm.

Please show me your code that is more than twice as efficient as CGI.pm,
and yet still as secure.
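
As an illustration of the general "compile as you go" idea, here is a minimal sketch in which function bodies are stored as strings and compiled only on first call. This is not CGI.pm's actual code; the package LazyDemo and its functions are hypothetical and exist only to show the technique.

```perl
use strict;
use warnings;

package LazyDemo;

our $AUTOLOAD;
our %COMPILED;    # records which functions have been compiled so far

# Function bodies are kept as plain strings, so perl does not compile
# them when the file is loaded.
my %SOURCE = (
    greet  => 'sub { my ($who) = @_; return "hello, $who" }',
    double => 'sub { my ($n) = @_; return 2 * $n }',
);

# Called by perl whenever an undefined LazyDemo:: function is invoked.
sub AUTOLOAD {
    my $func = $AUTOLOAD;
    $func =~ s/.*:://;
    return if $func eq 'DESTROY';

    my $src  = $SOURCE{$func} or die "No such function: $func\n";
    my $code = eval $src      or die "Compile error in $func: $@";

    $COMPILED{$func} = 1;
    {
        no strict 'refs';
        *{"LazyDemo::$func"} = $code;    # install it, so AUTOLOAD is skipped next time
    }
    goto &$code;                         # run it with the original @_
}

package main;

my $greeting = LazyDemo::greet('world');    # compiled here, on first use
print "$greeting\n";
print "compiled so far: ", join(', ', sort keys %LazyDemo::COMPILED), "\n";
```

With this layout, the cost of loading the file depends mostly on the small dispatcher, not on the total size of the stored function bodies.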

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<mer...@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Alan J. Flavell

Nov 15, 2003, 10:46:24 AM

On Sat, 15 Nov 2003, Gunnar Hjalmarsson wrote:

> I have a program, where I believe it would be indefensible to have it
> load CGI.pm.

Then it's probably indefensible to run it from the traditional CGI in
the first place: you should be looking to run it from mod_perl or
other persistent environment, where the overhead of loading CGI.pm is
no longer of any relevance since it's not being done per-invocation
any more.

> Maybe that's why I'm so sensible about this. :)

"sensitive", maybe. "sensible"? - I'd have to reserve judgment until
I saw the full implications, including the security review and some
sensible assessment of the implications for long-term maintainability.

But since I probably couldn't afford the effort to do that security
review and maintainability assessment, I'd probably go with CGI.pm
anyway. I fear this is going to stir up the trolls again, but they're
fairly well plonked, so I'm just going to have my say and then leave
it at that.

cheers

Purl Gurl

Nov 15, 2003, 12:15:41 PM

Randal L. Schwartz wrote:

> Gunnar Hjalmarsson wrote:

(snipped)

> > One situation where doing so makes sense is when efficiency matters.
> >
> > I have a program, where I believe it would be indefensible to have it
> > load CGI.pm. Maybe that's why I'm so sensible about this. :)

> I'd bet it'd be hard to get something that is even *twice* as efficient
> that has all the security provisions and knowledge accumulated over
> the years in CGI.pm.

Custom read and parse routines are easy to write and not only offer
better security than Stein's module but also offer security features
which Stein's module does not offer and most likely will never offer.

A custom written read and parse routine can be easily tailored to
very specific needs, easily tailored to provide facilities which
are not available under Stein's module, and provide easy access
for modifications, additions and fine tuning.

All decent custom read and parse routines will be a minimum of
eight-hundred percent more efficient than CGI.pm and in almost
all cases, compared to typical usage of CGI.pm, will be an
average of one-thousand-three-hundred percent more efficient.
In some cases, efficiency improvement over CGI.pm can reach
two-thousand percent, although this is moderately uncommon.

Those average figures have been exemplified many times over
the years, in this newsgroup, exemplified with methods which
can be replicated by anyone, independently.

Purl Gurl

Nov 15, 2003, 1:29:09 PM

Purl Gurl wrote:

> Randal L. Schwartz wrote:
> > Gunnar Hjalmarsson wrote:

(snipped)

> Custom read and parse routines are easy to write and not only offer
> better security than Stein's module but also offer security features
> which Stein's module does not offer and most likely will never offer.

Here is a very simple example of a security measure which
minimizes the impact of a denial of service attack or just
a one time lame brain event.

You cannot do this with Stein's module, not easily, certainly.

For this example, expected maximum input data length is
ten-thousand with an overhead of two-hundred-fifty-six
to prevent false positives.

An .htaccess file is initialized this way:

Order Allow,Deny
Allow from all
Deny from 0.0.0.0

Clearly this .htaccess file can be directory specific
or server wide, with a simple path inclusion.

if ($ENV{'CONTENT_LENGTH'} > 10256)
 {
  open (DENY, ">>.htaccess") || die; # (your favorite error/email/notification handling);
  print DENY "Deny from ", substr ($ENV{REMOTE_ADDR}, 0, rindex ($ENV{REMOTE_ADDR}, ".")), "\n";
  close (DENY);
  exit;
 }
else
 {
  read and parse routine...

With Stein's module, a denial of service attack or a simple one time
event, is automatically allowed. This is inexcusable based on these
claims of "time-tested" inclusions in the CGI.pm module. Mule manure.

A custom read and parse routine, as exemplified, can minimize these
types of attacks, with the first instance indicator. Additionally,
this method can also "deny" based on frequency interval of access;
too fast, too many, denial regardless of content length.

Stein's module not only cannot do this, his module allows attacks
automatically, even when MAX POST is used. These types of attacks
are still successful because the data must be read before his
MAX POST syntax will kick in. His MAX POST accomplishes nothing
and actually adds strength to an attack by invoking his _huge_
module, over and over; a self-attacking syndrome.
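
The core of the check described above, refusing an oversized body before any of it is read, can be sketched without the .htaccess side effects. This is only a sketch under stated assumptions: read_post_body and $MAX_BODY are hypothetical names, and the 10256 limit is simply the figure used in this post.

```perl
use strict;
use warnings;

my $MAX_BODY = 10_256;    # 10000 expected bytes plus 256 of slack, as in the post

# Returns ($body, undef) on success or (undef, $reason) on refusal.
# $fh is the input handle and $env a hash reference such as \%ENV.
sub read_post_body {
    my ($fh, $env) = @_;
    my $len = $env->{CONTENT_LENGTH};

    return (undef, 'missing or malformed CONTENT_LENGTH')
        unless defined $len && $len =~ /\A\d+\z/;
    return (undef, 'body too large') if $len > $MAX_BODY;

    # read() may deliver fewer bytes than requested, so loop until
    # the declared length has arrived or the input ends.
    my $buf = '';
    while (length($buf) < $len) {
        my $got = read($fh, $buf, $len - length($buf), length($buf));
        return (undef, 'read failed') unless defined $got;
        last if $got == 0;
    }
    return (undef, 'short read') if length($buf) != $len;
    return ($buf, undef);
}

# Typical use in a CGI script:
#   my ($body, $err) = read_post_body(\*STDIN, \%ENV);
```

What to do on refusal (log, notify, block an address) is a separate policy decision and is deliberately left out of the sketch.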

pfffttt...

Purl Gurl
--
Home Performed And Recorded Classic Rock Midis
http://www.purlgurl.net/~callgirl/rockmusi.html

Gunnar Hjalmarsson

Nov 15, 2003, 1:36:40 PM

Randal L. Schwartz wrote:

> Gunnar Hjalmarsson writes:
>> One situation where doing so makes sense is when efficiency matters.
>>
>> I have a program, where I believe it would be indefensible to have it
>> load CGI.pm. Maybe that's why I'm so sensible about this. :)
>
> You *do* realize that CGI.pm uses a "compile as you go" mechanism?

Yep. Actually it was you who called my attention to it a few months
ago. :)

> I'd bet it'd be hard to get something that is even *twice* as efficient
> that has all the security provisions and knowledge accumulated over
> the years in CGI.pm.
>
> Please show me your code that is more than twice as efficient as CGI.pm,
> and yet still as secure.

I don't claim it to be as secure as CGI.pm, but I believe that the
security of the program *as a whole* is sufficient. (Neither do I
claim it to serve as a general purpose code for parsing CGI data, of
course.)

This is the code I'm currently using *in that particular program*:

if ($ENV{'REQUEST_METHOD'} eq 'POST') {
    read (STDIN, $rlmain::data, $ENV{'CONTENT_LENGTH'});
} else {
    $rlmain::data = $ENV{'QUERY_STRING'};
}
$rlmain::data =~ tr/+/ /;
for (split /[&;]/, $rlmain::data) {
    my ($name, $value) = split /=/;
    $name = 'ringid' if lc $name eq 'ringid';
    $name = 'siteid' if lc $name eq 'siteid';
    $name = 'offset' if lc $name eq 'offset';
    $value =~ s/%(..)/pack("c",hex($1))/ge;
    $value =~ tr/\r//d; # Windows fix
    $rlmain::data{$name} = $value;
}

Comments:

- I should probably have it check the size of STDIN and whether the
read() statement is successful.

- The program does not acknowledge any field names that don't match
/^\w+$/, so I don't unescape the names.

- The program does not contain any multi-value fields.

- The program is run in taint mode.
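
A hedged variant of the same idea, written as a self-contained function, might look like the sketch below. It is not the Ringlink code: parse_urlencoded is a hypothetical name, it decodes names as well as values, it only decodes well-formed two-digit hex escapes, and it also records the order in which field names first appear, which is what the original question in this thread asked about. Like the code above, it keeps only the last value of a repeated field.

```perl
use strict;
use warnings;

# Returns (\@order, \%form): field names in order of first appearance,
# and a hash of decoded values.
sub parse_urlencoded {
    my ($data) = @_;
    my (@order, %form);

    for my $pair (split /[&;]/, $data) {
        next unless length $pair;
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;

        for ($name, $value) {
            tr/+/ /;                                 # plus means space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # decode %XX escapes
        }

        push @order, $name unless exists $form{$name};
        $form{$name} = $value;
    }
    return (\@order, \%form);
}

my ($order, $form) = parse_urlencoded('siteid=42&ringid=my+ring&offset=10');
print "$_ = $form->{$_}\n" for @$order;
```

Keeping a separate array of names alongside the hash is the same approach suggested at the start of the thread for preserving submission order.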

This is the web site for the program: http://www.ringlink.org/

Gunnar Hjalmarsson

Nov 15, 2003, 1:36:43 PM

Alan J. Flavell wrote:
> Gunnar Hjalmarsson wrote:
>> I have a program, where I believe it would be indefensible to
>> have it load CGI.pm.
>
> Then it's probably indefensible to run it from the traditional CGI
> in the first place:

Some may claim it is. (For some reason that comment wasn't unexpected.
;-) )

> you should be looking to run it from mod_perl or other persistent
> environment, where the overhead of loading CGI.pm is no longer of
> any relevance since it's not being done per-invocation any more.

I have already done that, so the program is prepared to be (and is
actually in a few cases) run under mod_perl. However, there are
hundreds or 1,000+ users, and most of them don't have access to
mod_perl...

>> Maybe that's why I'm so sensible about this. :)
>
> "sensitive", maybe.

Hmm.. Yes, of course. It wasn't my intention to claim that I'm
sensible, even if *I* think I am. :)

> "sensible"? - I'd have to reserve judgment until I saw the full
> implications, including the security review and some sensible
> assessment of the implications for long-term maintainability.

Even if I provided a link in my reply to Randal, I ask you to please
not do that, Alan, at least not yet...

I started to write that program more than three years ago, and at that
time my programming experience basically consisted of having modified
a couple of Matt's Scripts. :) One thing that bothers me is all those
global scalar variables, so I'm sure you wouldn't find the program
easily maintained. Sooner or later I'll do a redesign, but I'll wait
until I have learned the basics of OOP.

Purl Gurl

Nov 15, 2003, 2:13:33 PM

Gunnar Hjalmarsson wrote:

> Randal L. Schwartz wrote:
> > Gunnar Hjalmarsson writes:

(snipped)

> > Please show me your code that is more than twice as efficient as CGI.pm,
> > and yet still as secure.

> I don't claim it to be as secure as CGI.pm, but I believe that the
> security of the program *as a whole* is sufficient. (Neither do I
> claim it to serve as a general purpose code for parsing CGI data, of
> course.)

> This is the code I'm currently using *in that particular program*:

> if ($ENV{'REQUEST_METHOD'} eq 'POST') {
> read (STDIN, $rlmain::data, $ENV{'CONTENT_LENGTH'});
> } else {
> $rlmain::data = $ENV{'QUERY_STRING'};

> $value =~ tr/\r//d; # Windows fix

A quick comment on this line above. All browsers submit

\r\n

when ENTER is pressed and the cursor is inside a text area box.
This is not specific to Windows, under those conditions.

Your method is a good example of designing a read and parse
routine to meet specific needs, to meet a specific form action.

This method you display will be a minimum eight-hundred percent
more efficient than CGI.pm and will be an average, a very large
average, one-thousand-three-hundred percent more efficient.
That is one-hundred-thirty times faster, not just twice as fast.

Adding security features to your method, is so very easy.
A quick example for html enabled form input,

$value =~ s/`/&#96;/g;

This changes a backtick system command to an html entity.
Any number of needs can be attained with simple substitutions
or transliteration as you use.

The point is that designing code to meet a specific task is the most
efficient and most adaptable method available.

Stein's module simply tries to do too much; it is bloatware.

Because of his module being bloatware, it is exceptionally
easy for Stein to include errors and bugs along with overlooking
major security flaws. There is simply too much code, some
six-thousand lines plus, for a programmer to keep track of
what is happening. This is well exemplified by his long
and frequent history of revisions.

You do not need Godzilla to squash a fly.

Purl Gurl
--
Corvette Mako Sharks! 56 Chevy Napco 4X4!
http://www.purlgurl.net/~godzilla/

Gunnar Hjalmarsson

Nov 15, 2003, 3:07:53 PM

Thanks for your comments!

Purl Gurl wrote:
> Gunnar Hjalmarsson wrote:
>>

>> $value =~ tr/\r//d; # Windows fix
>
> A quick comment on this line above. All browsers submit
>
> \r\n
>
> when ENTER is pressed and the cursor is inside a text area box.
> This is not specific to Windows, under those conditions.

On Windows, unlike Unix/Linux, blank lines were added when submitting
multi-line entries via textarea fields, even when I wasn't dealing
with data that had been read from a file. I haven't dug into it
very deeply, but the above line does make a difference.
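
HTML 4.01 specifies that line breaks in form-urlencoded data are sent as CR LF pairs, so a submitted textarea value normally carries "\r\n" regardless of the client platform. A minimal sketch of one way to normalize this, assuming the program wants bare "\n" internally (normalize_newlines is a hypothetical name):

```perl
use strict;
use warnings;

# Convert CRLF pairs and lone CRs to a single LF.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

my $submitted = "line one\r\nline two\r\n";
print normalize_newlines($submitted);
```

Unlike tr/\r//d, this also turns a lone CR into a line break instead of silently joining two lines.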

> Adding security features to your method, is so very easy. A quick
> example for html enabled form input,
>
> $value =~ s/`/&#96;/g;

As regards that aspect of security, maybe I should have added that
data submitted by users who are not logged-in, and which ends up on
generated HTML pages, is converted by this sub:

sub htmlize {
$_[0] =~ s/&/&amp;/g;
$_[0] =~ s/"/&quot;/g;
$_[0] =~ s/</&lt;/g;
$_[0] =~ s/>/&gt;/g;
return $_[0];
}

That conversion is done at a later stage, since (non-converted) HTML
is occasionally included in email messages.

Wouldn't that take care of the risk with backticks as well?
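
For illustration, a sketch of a similar escaping routine that returns a copy instead of modifying its argument in place, and that also covers the single quote and the backtick. The name html_escape and the choice of numeric entities for those two characters are assumptions of this sketch, not part of the program discussed here.

```perl
use strict;
use warnings;

my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
    '`' => '&#96;',
);

# Escape characters that are significant in HTML text and attribute values.
# A single pass means an ampersand produced by one replacement is never
# escaped a second time.
sub html_escape {
    my ($text) = @_;
    $text =~ s/([&<>"'`])/$ENTITY{$1}/g;
    return $text;
}

print html_escape(q{<b>5 > 3 & "quoted"</b>}), "\n";
```

Escaping of this kind only protects generated HTML. Whether a backtick is dangerous elsewhere depends on where the value is used, for example in a shell command, which needs its own handling.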

Purl Gurl

Nov 15, 2003, 4:30:57 PM

Gunnar Hjalmarsson wrote:

> Purl Gurl wrote:
> > Gunnar Hjalmarsson wrote:

(snipped)

> As regards that aspect of security, maybe I should have added that
> data submitted by users who are not logged-in, and which ends up on
> generated HTML pages, is converted by this sub:

> sub htmlize {
> $_[0] =~ s/&/&amp;/g;
> $_[0] =~ s/"/&quot;/g;
> $_[0] =~ s/</&lt;/g;
> $_[0] =~ s/>/&gt;/g;
> return $_[0];
> }

> That conversion is done at a later stage, since (non-converted) HTML
> is occasionally included in email messages.

> Wouldn't that take care of the risk with backticks as well?

As you know, the degree of risk of input data is directly
related to "what" a program does. If a program does not call
any functions susceptible to backtick syntax, no problem.

A classic example, a simple example, is a hit counter. Those
scripts exhibit very little security risk, and usually do not
work very well to boot!

Contrasting this, our Chahta Chat is susceptible to hostile
html tags. Nonetheless, we want our visitors to be able to
enjoy fancy fonts, colors, pictures and all that.

For html, like you, processing outside a read and parse takes
care of this.

my (@bad_word_list) = ("<applet", "<blockquote", "<body", "<dl", "<form",
"<head", "<html", "<ol", "<object", "<plaintext",
"<script", "<strike", "<xmp", "<ul", "<h1", "<h2",
"<h3", "<h4", "<h5", "<h6", "|/", "<embed",
"face=symbol", "face=system", "strnps", ... others

Simple matter of looping and looking. Some html tags will result
in a visitor being automatically banished. Others, a stern warning.

Before someone jumps on this, leading spaces are processed out
before looping takes place; < applet becomes <applet then processed.
Don't even bother trying. You cannot mess up our chat.

We just had a visitor drop in earlier, to test if he would be
banished for hitting a cgi script too fast, after I posted
an article about this. There are those here whose only intent
is to cause harm to others. Obviously I have been alerted to
what he was doing. Obviously our androids may elect to banish
him in the future. Jeeeshh... such stupidity. I cannot fathom
sitting in front of your computer clicking "Play Blackjack"
as fast as you can. He should be researching NASDAQ penny
stocks and working on becoming wealthy, or very poor.

Almost all security sites suggest allowing only the data you want;
in other words, disallow everything and make exceptions for safe data.

However, another approach is to disallow just enough data characters
to defeat many combinations. Disallowing a backtick does defeat a
very large number of hacks. Most of the hack may appear, but without
a backtick, it won't work. Disallowing a forward slash / prevents
directory changes as with ../ syntax. You really have to know
your programming and hacking to do this successfully.
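
The first approach mentioned above, accepting only data that matches what is expected, can be sketched as follows. The field names are borrowed from code posted earlier in this thread; the patterns, the length limits, and the name accept_field are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# One pattern per known field; anything not listed here is rejected.
my %ALLOWED = (
    ringid => qr/\A\w{1,32}\z/,
    siteid => qr/\A\w{1,32}\z/,
    offset => qr/\A\d{1,6}\z/,
);

# Returns the value if the field is known and the value matches its
# pattern, otherwise undef.
sub accept_field {
    my ($name, $value) = @_;
    my $pattern = $ALLOWED{$name};
    return undef unless defined $pattern && defined $value;
    return $value =~ $pattern ? $value : undef;
}

for my $test ([offset => '20'], [offset => '../etc'], [ringid => '`ls`']) {
    my ($name, $value) = @$test;
    my $verdict = defined accept_field($name, $value) ? 'accepted' : 'rejected';
    print "$name=$value: $verdict\n";
}
```

With this arrangement there is no need to enumerate dangerous characters: a backtick or "../" is rejected simply because it does not match any expected pattern.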

What is important is to design your processing for a specific task
rather than use a broad sweeping brush to catch everything. Doing
that is very inefficient.

Your use of &lt; and &gt; is a good example. Doesn't matter what is
inside html tags because they won't work because of your parsing.
You have avoided having to parse out some html and allow other.

This is a serious weakness with Perl 5 modules. Most are useful,
many are well written. Nonetheless, a large majority of modules
incorporate too many features rendering them useless. Those are
modules which try to "second guess" how they will be used, and
literally always fail, eventually.

Back to Stein's module. If he was a bit better at planning,

read_parse.pm
form_action.pm
html.pm
nph.pm
some_other.pm

He could reduce the size of his module to one-fourth with
effective planning and even smaller with very good planning.
This allows more features per module, a reasonable amount,
without a need to create a monster which often causes more
problems than are resolved; bull in a china shop.

A philosophy often exhibited here is all programs should
be able to handle all circumstances. It is this thinking
which leads to worthless code and worthless modules. This
is a type of thinking, "conform code to all circumstances"
rather than controlling circumstances then coding accordingly,
which is what your code exhibits, to your benefit.

There is no substitute for good planning and good custom
written code nor is there any substitute for using your
limited lifetime, effectively; time cannot be purchased.

http://pennystocks.org/s2_main.htm

Purl Gurl
--
Purl Gurl Net, Delivering Rock N Roll And Fun
At Two Megabits Per Second
http://www.purlgurl.net/

Eric J. Roode

Nov 15, 2003, 6:08:24 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote in news:bp5cl7$1l1qb6$1@ID-
184292.news.uni-berlin.de:

> Eric J. Roode wrote:
>> I just can't believe that anyone would advocate writing one's own
>> limited CGI parsing code from scratch, against using the robust,
>> flexible CGI.pm off the shelf.
>
> One situation where doing so makes sense is when efficiency matters.
>
> I have a program, where I believe it would be indefensible to have it
> load CGI.pm. Maybe that's why I'm so sensible about this. :)

If I recall correctly, CGI.pm has only about 200 lines of code that gets
compiled when the module is first loaded. If the time it takes to
compile those 200 lines makes a difference in the execution of your
program, then I suspect Perl/CGI is the wrong technology to be using.
:-) I'd suggest mod_perl, FastCGI, or maybe even writing the CGI input
parsing code in C and loading it via XS.

What sort of timing did you use to determine that CGI.pm was slowing you
down? I keep hearing that CGI.pm is slow and inefficient, but I have
never seen any numbers to back it up, and in my (admittedly anecdotal)
experience, I haven't seen a problem with it.

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP7ayEmPeouIeTNHoEQJBJwCgr5zedNlpZf1OdzUPrl3kGEmb0+MAoNvP
+M/xpcGcBuUXU0lupBZ1755w
=oAwu
-----END PGP SIGNATURE-----

Gunnar Hjalmarsson

Nov 15, 2003, 6:00:29 PM

Purl Gurl wrote:
> Gunnar Hjalmarsson wrote:
>> maybe I should have added that data submitted by users who are
>> not logged-in, and which ends up on generated HTML pages, is
>> converted by this sub:
>>
>> sub htmlize {
>> $_[0] =~ s/&/&amp;/g;
>> $_[0] =~ s/"/&quot;/g;
>> $_[0] =~ s/</&lt;/g;
>> $_[0] =~ s/>/&gt;/g;
>> return $_[0];
>> }
>>
>> That conversion is done at a later stage, since (non-converted)
>> HTML is occasionally included in email messages.
>>
>> Wouldn't that take care of the risk with backticks as well?
>
> As you know, the degree of risk of input data is directly related
> to "what" a program does. If a program does not call any functions
> susceptible to backtick syntax, no problem.

<snip>

> Contrasting this, our Chahta Chat is susceptible to hostile html
> tags. Nonetheless, we want our visitors to be able to enjoy fancy
> fonts, colors, pictures and all that.
>
> For html, like you, processing outside a read and parse takes care
> of this.
>
> my (@bad_word_list) = ("<applet", "<blockquote", "<body", "<dl", "<form",
> "<head", "<html", "<ol", "<object", "<plaintext",
> "<script", "<strike", "<xmp", "<ul", "<h1", "<h2",
> "<h3", "<h4", "<h5", "<h6", "|/", "<embed",
> "face=symbol", "face=system", "strnps", ... others

Now you are talking about a desire to allow users to modify the
program generated HTML, which reminds me about this ciwac thread:

http://groups.google.se/groups?th=eeb2ba0a37e50722

Even if this is an important security matter as regards CGI scripts, I
suppose it's off topic for this group. Nevertheless, it's worth
noticing that it needs to be handled outside the initial CGI parsing
routine, whether that is done by help of CGI.pm or not.

Eric J. Roode

Nov 15, 2003, 6:16:50 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Purl Gurl <purl...@purlgurl.net> wrote in
news:3FB67ADD...@purlgurl.net:

> This method you display will be a minimum eight-hundred percent
> more efficient than CGI.pm and will be an average, a very large
> average, one-thousand-three-hundred percent more efficient.
> That is one-hundred-thirty times faster, not just twice as fast.

How do you calculate that?

(by the way, 1300% is 13 times faster, not 130 times).
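
The arithmetic can be written out as a short sketch. The two function names are illustrative; which reading of "1300 percent" was intended is not stated in the quoted post, so both are shown.

```perl
use strict;
use warnings;

# "N percent of the baseline speed" expressed as a multiplier.
sub factor_from_percent_of {
    my ($percent) = @_;
    return $percent / 100;
}

# "N percent more than the baseline speed" expressed as a multiplier.
sub factor_from_percent_more {
    my ($percent) = @_;
    return 1 + $percent / 100;
}

printf "1300%% of baseline:   %gx\n", factor_from_percent_of(1300);
printf "1300%% more than it:  %gx\n", factor_from_percent_more(1300);
```

Under either reading the multiplier is 13 or 14, not 130.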

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP7a0DWPeouIeTNHoEQLpPQCgk4YMkSBVXTPlF4bpzsWAl07tbPAAnimH
0ogqfySe3i7T5fJq56mzHGAW
=h3pt
-----END PGP SIGNATURE-----

Gunnar Hjalmarsson

Nov 15, 2003, 6:33:15 PM

Eric J. Roode wrote:
> Gunnar Hjalmarsson wrote:

>> Eric J. Roode wrote:
>>> I just can't believe that anyone would advocate writing one's
>>> own limited CGI parsing code from scratch, against using the
>>> robust, flexible CGI.pm off the shelf.
>>
>> One situation where doing so makes sense is when efficiency
>> matters.
>>
>> I have a program, where I believe it would be indefensible to
>> have it load CGI.pm.
>

> If I recall correctly, CGI.pm has only about 200 lines of code that
> gets compiled when the module is first loaded. If the time it
> takes to compile those 200 lines makes a difference in the
> execution of your program, then I suspect Perl/CGI is the wrong
> technology to be using. :-) I'd suggest mod_perl, FastCGI, or
> maybe even writing the CGI input parsing code in C and loading it
> via XS.

Please see my reply to Alan about that.

> What sort of timing did you use to determine that CGI.pm was
> slowing you down?

None.

It's just that the program by its very nature may be used in such a
way that repeated calls put quite some load on the server. For that
reason I'm trying to avoid unnecessary load, and since I have already
"reinvented the wheel", continuing not to use CGI.pm is an easy
contribution to that goal.

Eric J. Roode

Nov 15, 2003, 7:07:53 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote in news:bp6dm6$1kg9c9$1@ID-
184292.news.uni-berlin.de:

>> What sort of timing did you use to determine that CGI.pm was
>> slowing you down?
>
> None.
>
> It's just that the program by its very nature may be used in such a
> way that repeated calls put quite some load on the server. For that
> reason I'm trying to avoid unnecessary load, and since I have already
> "reinvented the wheel", continuing not to use CGI.pm is an easy
> contribution to that goal.

Wait, let me get this straight -- you have no idea whether CGI.pm is faster
or slower than your own code, yet you choose to stick to your own code in
the belief that it contributes to your goal of avoiding unnecessary load?

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP7bAAmPeouIeTNHoEQLMKQCaApd4QCCvLJVFyXPuLm/beJRuSJAAoLzo
0dKs9LXp4Ld7eY0KGVncX2+K
=LKAc
-----END PGP SIGNATURE-----

Purl Gurl

Nov 15, 2003, 7:13:50 PM

Gunnar Hjalmarsson wrote:

> Purl Gurl wrote:
> > Gunnar Hjalmarsson wrote:

(snipped)

> > Contrasting this, our Chahta Chat is susceptible to hostile html
> > tags. Nonetheless, we want our visitors to be able to enjoy fancy
> > fonts, colors, pictures and all that.

> > For html, like you, processing outside a read and parse takes care
> > of this.

> > my (@bad_word_list) = ("<applet", "<blockquote", "<body", "<dl", "<form",

> Now you are talking about a desire to allow users to modify the
> program generated HTML, which reminds me about this ciwac thread:

> http://groups.google.se/groups?th=eeb2ba0a37e50722

Yes and no. A point you made about Perl programming is "context"
of usage. Most here jump on a chance to troll when some code is
posted which they deliberately label cargo cult. This activity
is rather humorous and highlights diminished thinking.

Often sample code is out of context, it is simply a short clip.

My example reminds you of an interesting thread in another group.
This example I provide is one of those "no context short clips."
Yes, we allow visitors to use markup language, which modifies
the appearance of their unique "post" but not our chat overall.

What you cannot derive from a short clip like mine, like yours,
is what other code is used and not displayed. Most here seem
to ignore this or simply don't know to consider it. For this
reason, context, I limit my responses to be severely restricted
to the context of an article, nothing more. I stay within given
parameters, anything else is to guess.

Our chat is operated by our two androids. They, Roberta and Robby,
check to be sure html is formatted within a strict set of rules.
Most html markup requires no formal formatting at all with our
androids taking care of this. Using a font color, for example,
a visitor simply types in a color, such as "blue" or "red."
All html markup is performed by our androids. They also check
for tag closure; no open tags without a matching closing tag.
Use of URL links, a visitor only types in the actual URL,
Roberta and Robby format it before posting. Almost all "things"
are handled this way. This provides easy posting for visitors,
which they like, and allows excellent control of html tags.

Additionally, "new" visitors are subject to very restrictive
rules until they have established themselves as "trusted"
regular visitors.

You can only do these types of "things" with custom code.
Success with those "things" is also highly dependent upon
personal experience. Critical factor, though, is what
your code does and does not do. Security is then scaled
to your risk factor generated by your code.

Harping on Stein's module again, you cannot, literally cannot
do any of those magical "things" with his module.

So, when others refer to you as foolish for not using CGI.pm
remind yourself those people are unimaginative poorly skilled
programmers promulgating cargo cult, which is precisely what
Stein's module has become; hard core cargo cult.

Purl Gurl
--
Roberta The Remarkable Robot
http://www.purlgurl.net/~callgirl/roberta/roberta.cgi
Roberta's Operator's Manual
http://www.purlgurl.net/~callgirl/roberta/help.html

Purl Gurl

Nov 15, 2003, 7:19:32 PM

Gunnar Hjalmarsson wrote:

> Eric J. Roode wrote:
> > Gunnar Hjalmarsson wrote:
> >> Eric J. Roode wrote:

> > What sort of timing did you use to determine that CGI.pm was
> > slowing you down?

> None.

Below are some highly typical timing results. You can
replicate these, independently. Those results date back
to when I used "Godzilla" as my moniker, a moniker now
taken over by and more fitting for my main squeeze.

Purl Gurl
--
Size Does Matter
http://www.purlgurl.net/~godzilla/

#!perl

print "Content-type: text/plain\n\n";

use Benchmark;

print "Run One:\n\n";
Time();

print "\n\nRun Two:\n\n";
Time();

print "\n\nRun Three:\n\n";
Time();

# Time both parsers over 100000 iterations each. The code is passed as
# strings, which Benchmark compiles before it starts the timed loop.
sub Time
{
    timethese (100000,
    {
        'Godzilla' =>
        '$ENV{QUERY_STRING} = "north=north&south=south&east=east&west=west";
        $buffer = $ENV{QUERY_STRING};
        @Key_Value = split(/&/, $buffer);
        for (@Key_Value)
        {
            ($key, $value) = split(/=/, $_);
            # Translate + to a space before percent-decoding,
            # so that an encoded %2B stays a plus sign.
            $value =~ tr/+/ /;
            $key =~ s/%(..)/pack ("C",hex($1))/ge;
            $value =~ s/%(..)/pack ("C",hex($1))/ge;
            $FORM{$key} = $value;
        }
        $north = "$FORM{north}";
        $south = "$FORM{south}";
        $east = "$FORM{east}";
        $west = "$FORM{west}";',

        'Stein' =>
        '$ENV{QUERY_STRING} = "north=north&south=south&east=east&west=west";
        use CGI;
        my $query = CGI->new;
        $north2 = $query->param("north");
        $south2 = $query->param("south");
        $east2 = $query->param("east");
        $west2 = $query->param("west");',
    } );
}

PRINTED RESULTS:
________________

Run One:

Benchmark: timing 100000 iterations of Godzilla, Stein...
Godzilla: 9 wallclock secs ( 9.18 usr + 0.00 sys = 9.18 CPU) @ 10893.25/s
Stein: 126 wallclock secs (125.39 usr + 0.00 sys = 125.39 CPU) @ 797.51/s

Run Two:

Benchmark: timing 100000 iterations of Godzilla, Stein...
Godzilla: 9 wallclock secs ( 9.44 usr + 0.00 sys = 9.44 CPU) @ 10593.22/s
Stein: 124 wallclock secs (125.51 usr + 0.00 sys = 125.51 CPU) @ 796.75/s

Run Three:

Benchmark: timing 100000 iterations of Godzilla, Stein...
Godzilla: 10 wallclock secs ( 9.39 usr + 0.00 sys = 9.39 CPU) @ 10649.63/s
Stein: 126 wallclock secs (125.89 usr + 0.00 sys = 125.89 CPU) @ 794.34/s
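
To make the three runs above easier to compare, the following sketch divides the reported rates and converts them to time per iteration. The numbers are copied from the printed results; the hash keys are just labels chosen here. Note that the `use CGI;` line sits inside a code string, so it is presumably processed when the string is compiled rather than on every iteration; if so, these figures cover object creation and parameter lookup but not module loading.

```perl
use strict;
use warnings;

# Iterations per second reported in the three runs above.
my @runs = (
    { godzilla => 10893.25, stein => 797.51 },
    { godzilla => 10593.22, stein => 796.75 },
    { godzilla => 10649.63, stein => 794.34 },
);

for my $i (0 .. $#runs) {
    my $r     = $runs[$i];
    my $ratio = $r->{godzilla} / $r->{stein};
    # Time per iteration in milliseconds is 1000 / rate.
    printf "Run %d: ratio %.1f, %.3f ms vs %.3f ms per iteration\n",
        $i + 1, $ratio, 1000 / $r->{godzilla}, 1000 / $r->{stein};
}
```

On these figures the hand-written parser is between 13 and 14 times faster per iteration, which in absolute terms is roughly a tenth of a millisecond against a little over one millisecond.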

Gunnar Hjalmarsson

Nov 15, 2003, 7:22:15 PM

Eric J. Roode wrote:
> Gunnar Hjalmarsson wrote:
>> Eric J. Roode wrote:

>>> What sort of timing did you use to determine that CGI.pm was
>>> slowing you down?
>>
>> None.
>>
>> It's just that the program by its very nature may be used in such
>> a way that repeated calls put quite some load on the server. For
>> that reason I'm trying to avoid unnecessary load, and since I
>> have already "reinvented the wheel", keeping to not using CGI.pm
>> is an easy contribution to that goal.
>
> Wait, let me get this straight -- you have no idea whether CGI.pm
> is faster or slower than your own code, yet you choose to stick to
> your own code in the belief that it contributes to your goal of
> avoiding unnecessary load?

Yes, I have an idea. I'm sure that CGI.pm is slower. Thought the code
I posted made that apparent to anybody who has an idea of what CGI.pm
is about.

However, I can't tell *how much* slower since I haven't measured it.

Why are you making such a fuss about that?

Ben Morrow

Nov 15, 2003, 9:13:37 PM

Gunnar Hjalmarsson <nor...@gunnar.cc> wrote:
> Yes, I have an idea. I'm sure that CGI.pm is slower. Thought the code
> I posted made that apparent to anybody who has an idea of what CGI.pm
> is about.

By no means. In general, guessing that a particular piece of code will
run slower or faster than another is a dodgy business. The *only* way
to tell is to run benchmarks.

Premature optimisation is the root of all evil, &c...
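
For readers who have not used it, a benchmark of this kind takes only a few lines with the core Benchmark module. The sketch below compares two ways of decoding a URL-encoded string; the two variants and the sample string are made up for illustration and are not code from this thread.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample input, repeated to give the decoders some work.
my $encoded = 'name=J%C3%BCrgen+M%C3%BCller&city=K%C3%B6ln' x 10;

# Variant 1: decode each %XX escape with pack.
sub decode_pack {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/pack('C', hex $1)/ge;
    return $s;
}

# Variant 2: decode each %XX escape with chr.
sub decode_chr {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $s;
}

# Both variants must agree before their speed is worth comparing.
die "variants disagree\n" unless decode_pack($encoded) eq decode_chr($encoded);

# A negative count means "run each variant for at least that many CPU seconds".
cmpthese(-1, {
    pack => sub { decode_pack($encoded) },
    chr  => sub { decode_chr($encoded) },
});
```

The result is a small table of rates and relative percentages; which variant wins is exactly the kind of thing that is hard to predict without running it.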

Ben

--
"The Earth is degenerating these days. Bribery and corruption abound.
Children no longer mind their parents, every man wants to write a book,
and it is evident that the end of the world is fast approaching."
-Assyrian stone tablet, c.2800 BC b...@morrow.me.uk

Gunnar Hjalmarsson

Nov 15, 2003, 9:43:29 PM

Ben Morrow wrote:

> Gunnar Hjalmarsson wrote:
>> Yes, I have an idea. I'm sure that CGI.pm is slower. Thought the
>> code I posted made that apparent to anybody who has an idea of
>> what CGI.pm is about.
>
> By no means.

Please, Ben, how about reading the code before making such a comment?

> In general, guessing that a particular piece of code will run
> slower or faster than another is a dodgy business. The *only* way
> to tell is to run benchmarks.

Yes, in general.

Alan J. Flavell

Nov 16, 2003, 3:57:47 PM

On Sat, 15 Nov 2003, Gunnar Hjalmarsson wrote:

> Alan J. Flavell wrote:
> > Then it's probably indefensible to run it from the traditional CGI
> > in the first place:
>
> Some may claim it is. (For some reason that comment wasn't unexpected.
> ;-) )

Well, it isn't just me, is it? If you already know what answers
you're going to get, why do you raise the question, without addressing
the points that you know are going to be made?

It wasn't me who challenged you to produce the benchmarks, but it
could just as well have been. The first, second, and third rules of
optimisation are "don't optimise yet", you know.

> > you should be looking to run it from mod_perl or other persistent
> > environment, where the overhead of loading CGI.pm is no longer of
> > any relevance since it's not being done per-invocation any more.
>
> I have already done that, so the program is prepared to be (and is
> actually in a few cases) run under mod_perl. However, there are
> hundreds or 1,000+ users, and most of them don't have access to
> mod_perl...

Then it would seem that general robustness and resilience are of more
importance to you (and your users) than saving those last few cycles
of CPU. Furthermore, when a new version of CGI.pm came out, with some
new browser weakness to work around, or some new obscure security
loophole discovered, they could get the benefits in short order,
instead of waiting for you to diagnose and fix the implications for
homespun code.

(Of course, you could develop a dual-mode version, that takes
advantage of mod_perl when available, and works as a regular CGI when
it isn't. I gather that CGI.pm makes it rather easy to do that...)
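
As a rough idea of what "dual-mode" can mean in practice, the sketch below checks the `MOD_PERL` environment variable, which mod_perl sets, and picks a mode accordingly. It does not use CGI.pm and says nothing about how CGI.pm itself handles mod_perl; the `handle_request` function is a hypothetical stand-in for the real application code.

```perl
use strict;
use warnings;

# mod_perl sets the MOD_PERL environment variable; a plain CGI process
# normally does not have it.
sub running_under_mod_perl {
    return exists $ENV{MOD_PERL} ? 1 : 0;
}

# Hypothetical entry point: the same request handler serves both modes,
# only the surrounding setup would differ.
sub handle_request {
    my ($mode) = @_;
    return "Content-type: text/plain\n\nHello from $mode\n";
}

my $mode = running_under_mod_perl() ? 'mod_perl' : 'plain CGI';
print handle_request($mode);
```

In a persistent environment the expensive setup would run once per server process, while under plain CGI it runs on every request.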

> > "sensible"? - I'd have to reserve judgment until I saw the full
> > implications, including the security review and some sensible
> > assessment of the implications for long-term maintainability.
>
> Even if I provided a link in my reply to Randal, I ask you to please
> not do that, Alan, at least not yet...
>
> I started to write that program more than three years ago, and at that
> time my programming experience basically consisted of having modified
> a couple of Matt's Scripts. :)

That's OK, we all have to start somewhere. But if I was still coding
the same Perl4-style scripts that I started Perl with in around 1994
or so, then I'd need my head examined.

And given your commendable honesty about your existing code, I really
am rather surprised that you maintain that your choice of homespun
code must be the right one for your particular situation. In the end,
you *might* sometimes turn out to be right, but I'd want to see that
proved by more than just energetic handwaving, if you'll excuse me.

all the best

Purl Gurl

Nov 16, 2003, 5:32:39 PM

Alan J. Flavell wrote:

> Gunnar Hjalmarsson wrote:
> > Alan J. Flavell wrote:

(snipped)

> (Of course, you could develop a dual-mode version, that takes
> advantage of mod_perl when available, and works as a regular CGI when
> it isn't. I gather that CGI.pm makes it rather easy to do that...)

You have failed to advise readers that the most often used versions
of CGI.pm will not run under mod_perl. You have also failed to
mention the myriad problems of running later CGI.pm under mod_perl.

Troll FUD.

http://www.google.com/search?q=cgi.pm+mod_perl+problem&hl=en&lr=&ie=ISO-8859-1

Gunnar Hjalmarsson

Nov 16, 2003, 11:04:21 PM

Alan J. Flavell wrote:
> It wasn't me who challenged you to produce the benchmarks, but it
> could just as well have been. The first, second, and third rules
> of optimisation are "don't optimise yet", you know.

Okay, I made a benchmark. My starting-point was Purl Gurl's benchmark:
http://groups.google.com/groups?selm=3FB6C294.17E745DE%40purlgurl.net

Since I wanted to include also the compilation phase in the
comparison, I rewrote it. Basically I put my 'limited' code in a
separate file, required (not 'used') both that file and CGI.pm, and
reset %INC before doing so.
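
The reason for resetting %INC is that `require` consults it and skips any file already listed there. The sketch below shows that effect with a throwaway file; the file name `counter.pl` and the `$loads` counter are invented for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

our $loads = 0;

# Write a tiny throwaway file that counts how often it is compiled and run.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/counter.pl";
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "\$main::loads++;\n1;\n";
close $fh or die "Cannot close $file: $!";

require $file;          # first require: the file is compiled and run
require $file;          # second require: skipped, %INC already has an entry
print "loads after two requires: $loads\n";

delete $INC{$file};     # forget that the file was loaded
require $file;          # now it is compiled and run again
print "loads after clearing the entry: $loads\n";
```

Restoring a saved copy of %INC before each `require`, as the benchmark does, has the same effect as the `delete` here: every iteration pays the compilation cost again.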

This is one typical result:

         Rate CGI.pm  myCGI
CGI.pm 5.48/s     --   -97%
myCGI   178/s  3144%     --

If I didn't make some stupid mistake, the comparison shows that the
compilation+execution time for parsing a simple query string with 4
name/value pairs is about 30 times longer when you use CGI.pm compared
to my code. CGI.pm needs about 0.2 seconds!
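
For anyone unfamiliar with the table format: each percentage says how much faster the row is than the column. The sketch below recomputes the figures from the two printed rates, assuming the usual formula 100 * (row rate / column rate - 1). Because the printed rates are rounded, the recomputed percentage differs slightly from the 3144% in the table.

```perl
use strict;
use warnings;

# Rates (iterations per second) as printed in the table above.
my %rate = ('CGI.pm' => 5.48, 'myCGI' => 178);

# How much faster the row is than the column, in percent.
sub percent_faster {
    my ($row, $col) = @_;
    return 100 * ($rate{$row} / $rate{$col} - 1);
}

printf "myCGI vs CGI.pm: %.0f%%\n", percent_faster('myCGI', 'CGI.pm');
printf "CGI.pm vs myCGI: %.0f%%\n", percent_faster('CGI.pm', 'myCGI');

# Seconds per run is simply the inverse of the rate.
printf "seconds per CGI.pm run: %.2f\n", 1 / $rate{'CGI.pm'};
printf "seconds per myCGI run:  %.4f\n", 1 / $rate{'myCGI'};
```

A rate of 5.48 per second corresponds to about 0.18 seconds per run, which is where the "about 0.2 seconds" figure comes from.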

You find the code I used for the benchmark at the bottom of this message.

> Gunnar Hjalmarsson wrote:
>> ... the program is prepared to be (and is actually in a few
>> cases) run under mod_perl. However, there are hundreds or 1,000+
>> users, and most of them don't have access to mod_perl...
>
> Then it would seem that general robustness and resilience are of
> more importance to you (and your users) than saving those last few
> cycles of CPU.

Sorry, but I fail to see how that conclusion relates to what I said.

You assume that my code is not "robust" without explaining why. Robust
code is good, no doubt, and I believe that the robustness of those few
lines for CGI parsing is sufficient.

You also disregard my view that CPU may be a critical issue for
certain users.

> (Of course, you could develop a dual-mode version, that takes
> advantage of mod_perl when available, and works as a regular CGI
> when it isn't. I gather that CGI.pm makes it rather easy to do
> that...)

My program is already dual-mode, but I don't see how CGI.pm would have
made that easier. On the contrary, certain versions of CGI.pm don't
work with certain mod_perl versions:
http://groups.google.com/groups?selm=bcksc9%24k3n6f%241%40ID-184292.news.dfncis.de

> And given your commendable honesty about your existing code, I
> really am rather surprised that you maintain that your choice of
> homespun code must be the right one for your particular situation.
> In the end, you *might* sometimes turn out to be right, but I'd
> want to see that proved by more than just energetic handwaving, if
> you'll excuse me.

Surprised? That is my view, and I think I have presented reasons for
it far beyond "energetic handwaving".

This is the benchmark code:

#---------------------- cgispeed.pl ----------------------#
#!/usr/bin/perl
use strict;
use Benchmark 'cmpthese';
my (%tmpINC, $north, $south, $east, $west);
our %in;
# Remember which files are loaded at startup, so that each iteration can
# restore %INC and force 'require' to compile the file again.
BEGIN { %tmpINC = %INC }
$ENV{QUERY_STRING} = 'north=north&south=south&east=east&west=west';
print "Content-type: text/html\n\n<pre>";
# -5 means: run each variant for at least 5 CPU seconds.
cmpthese( -5, {
    myCGI => sub {
        %INC = %tmpINC;
        # Explicit path: newer perls no longer search the current directory.
        require './myCGI';
        ($north, $south, $east, $west) =
            @in{'north', 'south', 'east', 'west'};
    },
    'CGI.pm' => sub {
        %INC = %tmpINC;
        require CGI;
        my $query = CGI->new;
        $north = $query->param('north');
        $south = $query->param('south');
        $east = $query->param('east');
        $west = $query->param('west');
    }
} );

#------------------------- myCGI -------------------------#
use strict;
my $buffer;

# Read the raw form data: from STDIN for POST, from the query string otherwise.
if ($ENV{REQUEST_METHOD} eq 'POST') {
    my $len = $ENV{CONTENT_LENGTH};
    $len <= 131072 or die "Too much data submitted.\n";
    read(STDIN, $buffer, $len) == $len
        or die "Reading of posted data failed.\n";
} else {
    $buffer = $ENV{QUERY_STRING};
}
# Translate + to a space before percent-decoding.
$buffer =~ tr/+/ /;
for (split /[&;]/, $buffer) {
    my ($name, $value) = split /=/, $_, 2;
    # 'C' is an unsigned char, so bytes above 0x7F decode without a warning.
    $value =~ s/%(..)/pack('C', hex $1)/ge;
    $value =~ tr/\r//d; # Windows fix
    $main::in{$name} = $value;
}
1;

Alan J. Flavell

Nov 17, 2003, 3:33:43 PM

On Mon, 17 Nov 2003, Gunnar Hjalmarsson wrote:

> If I didn't make some stupid mistake, the comparison shows that the
> compilation+execution time for parsing a simple query string with 4
> name/value pairs is about 30 times longer when you use CGI.pm compared
> to my code. CGI.pm needs about 0.2 seconds!

And how long does a typical do-nothing browser HTTP transaction and
CGI invocation need in comparison?

> You also disregard my view that CPU may be a critical issue for
> certain users.

Sorry, I really don't "disregard" it. I'm saying the need is to
review the overall process, including server invocation from the
client and the subsequent CGI process creation, which I'm afraid your
benchmarks don't do.

In fact with some rough benchmarking of the overall process (using
LWP::Simple to run the tests against a local webserver), it seemed to
me as if our (otherwise lightly-loaded) server could run about 14
invocations per second of wallclock with your economy-model script,
compared with some 7 per second with CGI.pm, so - a factor of around 2
(wallclock) overall, compared with your measurement of 30 (cpu) for
some portion of the process. On Windows, I even got a factor
approaching 3 between them for the overall process. (Server in both
cases was a version of Apache 1.3.*, on linux and on Win2000
respectively).

While I must admit the factor is somewhat larger than I had expected,
this does rather put your measurement of a factor of 30 into a rather
more realistic context, I feel.
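
A measurement of the overall process along these lines might look like the sketch below. It is not the script actually used for the numbers above; the function name, the request count of 50 and the example URL are assumptions. The timing function takes any code reference, so it can be tried without a web server; LWP::Simple is only loaded when a URL is given on the command line.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Call $fetch->() $count times and return completed requests per
# wallclock second. $fetch is any code ref that performs one request.
sub requests_per_second {
    my ($fetch, $count) = @_;
    my $start = Time::HiRes::time();
    $fetch->() for 1 .. $count;
    my $elapsed = Time::HiRes::time() - $start;
    return $elapsed > 0 ? $count / $elapsed : 0;
}

# Usage (script name and URL are placeholders):
#   perl overall.pl http://localhost/cgi-bin/script.cgi
if (my $url = shift @ARGV) {
    require LWP::Simple;
    my $rate = requests_per_second(sub { LWP::Simple::get($url) }, 50);
    printf "%.1f requests per wallclock second\n", $rate;
}
```

Unlike the CPU-only benchmark, this includes the HTTP round trip, process creation and interpreter startup, which is why the ratio between the two scripts comes out so much smaller.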

> My program is already dual-mode, but I don't see how CGI.pm would have
> made that easier. On the contrary, certain versions of CGI.pm don't
> work with certain mod_perl versions:

I'm sorry if you felt it had been improper of me not to mention
versioning issues, but it seems to me that adopting use of mod_perl
would inevitably call for a review of the version compatibility of any
related Perl modules that will be used, and CGI.pm would be no
exception there. mod_perl doesn't lack documentation about such
matters.

But if one is genuinely serious about saving CPU cycles, then such an
approach would seem to me to be indispensable. You can see, by the
comparison between your numbers and mine, just what a large proportion
of the execution of CGI is not accounted for by the CGI script itself.

> Surprised? That is my view, and I think I have presented reasons for
> it far beyond "energetic handwaving".
>
> This is the benchmark code:

As I say, that focuses on one part of the invocation of a Perl script
in CGI context, namely the Perl code itself. But that's only a part
of the overall process, as I think we have seen.

Nevertheless, I will concede that any rule can have exceptions. What
I usually say about CGI.pm is that those who have genuinely got the
expertise to *not* use CGI.pm will know why they are doing that, and
will need no advice from me. On the other hand anyone who's in a
position to seek advice is going to get my best advice (and you know
what that's going to be, in the overwhelming proportion of cases).

I'm clearly aware that CGI.pm is in no way magical - the code doesn't
do anything that one couldn't just as well code for oneself. And the
author admits that it's grown too big, and might benefit from being
modularised. I've found the odd bug in it myself on occasion. So
this is not the uncritical adulation that some trolls accuse us of.
Nevertheless, it's overall the best thing available for doing CGI in
Perl, because the author is actively working on it and is actively
adapting it to the changing situation, to encapsulate the gathered
knowledge of browser bugs, workarounds etc.

A proportion of extra CPU cycles isn't usually too high a price to pay
for that. And as we've seen - if it _is_ too high a price to pay,
then the most productive place to make real savings is elsewhere.

Can we call a truce on this, then?

all the best

Gunnar Hjalmarsson

Nov 17, 2003, 8:42:02 PM

Alan J. Flavell wrote:
> Gunnar Hjalmarsson wrote:
>> If I didn't make some stupid mistake, the comparison shows that
>> the compilation+execution time for parsing a simple query string
>> with 4 name/value pairs is about 30 times longer when you use
>> CGI.pm compared to my code. CGI.pm needs about 0.2 seconds!
>
> And how long does a typical do-nothing browser HTTP transaction and
> CGI invocation need in comparison?

<snip>

> the need is to review the overall process, including server
> invocation from the client and the subsequent CGI process creation,
> which I'm afraid your benchmarks don't do.

What the need is depends on what you are actually trying to measure.
The conclusion I draw from my benchmark is that the *absolute*
time it takes to parse a query string is significant if you use
CGI.pm, while it's negligible if you use my code. Whether the factor
is 20, 30 or 50 is something I pay little regard to since, as you
point out, I did not measure the whole process.

My program supports a certain kind of web application, and is
typically used on web sites that are hosted on shared servers.
Sometimes it's used in a way that results in thousands of calls per day.

Now, if you have a busy web site on a shared hosting account, there is
always a limit where the hosting provider says: "This is too much, our
other customers are affected adversely." That's why I'm anxious to
watch the server load, and to me, 0.2 seconds appears to be
significant if there are thousands of daily calls.
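
The arithmetic behind that concern can be sketched as follows. The 0.2 seconds per call comes from the benchmark earlier in the thread; the traffic levels are assumed values picked only for illustration, and whether the resulting totals matter depends on the limits a particular host applies.

```perl
use strict;
use warnings;

my $seconds_per_call = 0.2;            # CGI.pm figure from the benchmark
my $seconds_per_day  = 24 * 60 * 60;

# CPU seconds spent per day on parsing alone.
sub daily_cpu_seconds {
    my ($calls_per_day, $per_call) = @_;
    return $calls_per_day * $per_call;
}

for my $calls_per_day (1_000, 5_000, 20_000) {   # assumed traffic levels
    my $cpu = daily_cpu_seconds($calls_per_day, $seconds_per_call);
    printf "%6d calls/day: %5.0f CPU seconds (%.1f min, %.2f%% of one CPU-day)\n",
        $calls_per_day, $cpu, $cpu / 60, 100 * $cpu / $seconds_per_day;
}
```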

mod_perl is of course suitable in order to further reduce the server
load. It's just that it's very unusual that mod_perl is available on
shared web hosting accounts. Of course, you can always say that the
program should have been written in PHP instead. However, that's not
the case.

> In fact with some rough benchmarking of the overall process (using
> LWP::Simple to run the tests against a local webserver), it seemed
> to me as if our (otherwise lightly-loaded) server could run about
> 14 invocations per second of wallclock with your economy-model
> script, compared with some 7 per second with CGI.pm, so - a factor
> of around 2 (wallclock) overall, compared with your measurement of
> 30 (cpu) for some portion of the process. On Windows, I even got a
> factor approaching 3 between them for the overall process. (Server
> in both cases was a version of Apache 1.3.*, on linux and on
> Win2000 respectively).
>
> While I must admit the factor is somewhat larger than I had
> expected, this does rather put your measurement of a factor of 30
> into a rather more realistic context, I feel.

As regards "realistic", see above.

It surprises me that your server would allow 7 invocations per second
with CGI.pm when you run the whole process, while I found that mine
would allow only about 5 when just the Perl part is taken into account.
Maybe the server you used is significantly faster. Btw, are you sure
that you captured the compilation time?

Anyway, this is interesting additional info. Thanks, Alan! I suppose
it indicates that, provided that the factor is 2, I would double the
server load by starting to use CGI.pm. The difference appears to be
significant also when you look at it from this angle.

These benchmarks demonstrate that the design of CGI.pm is surprisingly
'expensive'.

> I will concede that any rule can have exceptions. What I usually
> say about CGI.pm is that those who have genuinely got the expertise
> to *not* use CGI.pm will know why they are doing that, and will
> need no advice from me. On the other hand anyone who's in a
> position to seek advice is going to get my best advice (and you
> know what that's going to be, in the overwhelming proportion of
> cases).
>
> I'm clearly aware that CGI.pm is in no way magical - the code
> doesn't do anything that one couldn't just as well code for
> oneself. And the author admits that it's grown too big, and might
> benefit from being modularised. I've found the odd bug in it
> myself on occasion. So this is not the uncritical adulation that
> some trolls accuse us of. Nevertheless, it's overall the best thing
> available for doing CGI in Perl, because the author is actively
> working on it and is actively adapting it to the changing
> situation, to encapsulate the gathered knowledge of browser bugs,
> workarounds etc.
>
> A proportion of extra CPU cycles isn't usually too high a price to
> pay for that. And as we've seen - if it _is_ too high a price to
> pay, then the most productive place to make real savings is
> elsewhere.
>
> Can we call a truce on this, then?

I hear what you say. :) And it makes much sense.

Let me try to summarize my view from a different angle:

Good advice is a good thing, and using Perl modules is a convenient
way to reuse code. Personally I use several modules, but when I'm able
to do something with just a couple of lines of Perl code, I sometimes
do so instead of loading hundreds of lines of code by using a module.
I don't feel that I risk getting bashed for doing so, and nobody
demands that I *prove* that my choices are right.

That is, there is one exception: The 'sacred cow' CGI.pm. Even if you
say that "the code doesn't do anything that one couldn't just as well
code for oneself" and "it's grown too big", your reasoning above
presupposes that you are able to explicitly justify your decision if
you choose to not use CGI.pm for parsing CGI data. That makes little
sense to me. The presumption that people don't know what they are
doing if they don't use CGI.pm is patronizing.

If the explanation is the security implications of CGI, I'd like to
see the focus moved to making sure that you

- *learn* about the implied risks with CGI scripts,

- don't use code copied from random sources if you don't understand
how it works,

- carefully consider the risks with your own applications, and
validate the data accordingly, and

- enable taint mode.

I feel that these things, which I take for granted that we can agree
upon, tend to be forgotten in the 'campaign' for using CGI.pm.
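
The last two points in the list above can be sketched as follows. This is a minimal illustration, not a complete validation scheme: the field names, patterns and the `validate` function are hypothetical. Taint mode itself is switched on with the `-T` command-line flag and is not enabled in this sketch.

```perl
use strict;
use warnings;

# Taint mode is switched on with the -T flag (for example
# "#!/usr/bin/perl -T"); it is not enabled in this sketch.

# Return the value if it matches the expected pattern, otherwise undef.
# Under taint mode, a value extracted through a regex capture is untainted,
# so the pattern must be as strict as the application allows.
sub validate {
    my ($value, $pattern) = @_;
    return unless defined $value;
    return $value =~ /\A($pattern)\z/ ? $1 : undef;
}

# Hypothetical form fields and the patterns they must satisfy.
my %rule = (
    username => qr/[A-Za-z0-9_]{1,20}/,
    age      => qr/[0-9]{1,3}/,
);

# Made-up input, as it might arrive from a parsed form.
my %in = (username => 'alice_01', age => '42; rm -rf /');

for my $field (sort keys %rule) {
    my $clean = validate($in{$field}, $rule{$field});
    printf "%-8s %s\n", $field, defined $clean ? "ok: $clean" : 'rejected';
}
```

The same checks apply whichever way the form data was parsed, whether by hand or through a module.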