# replace decimals with 999999999 in order to check for non-numerical
# data, then switch it back (this is a lazyman's shortcut)
$in{hours} =~ s/\./999999999/;
if ($in{hours} =~ /\D/) {
push @missing,'Hours contains non-numerical data';
$missing = 1;
}
$in{hours}=~s/999999999/\./;
And I realized I've been too lazy with this for too long. So my first
thought was to post here with "how do I test for non-numerical in decimal
number data?" but, of course, that violates the "How to ask intelligent
questions in CLPM", so I googled it.
Came up with a bunch of non-relevant results until I saw a similar
question and the answer was "Your answer is in perldoc perldata"
So I did exactly that. Of course, the answer *is* right there:
warn "not a decimal number" unless /^-?\d+\.?\d*$/;
Which got me thinking - I take waaaay too many long-trips when there is
stuff in the regex language that would make my life easier, so I realized
that I need to understand the above, rather than just copy/paste into
code.
So, here's my understanding of
/^-?\d+\.?\d*$/;
/ start the search pattern
^ match the beginning of the line
-? match a minus sign once or not at all
\d+ match a digit character zero or more times
\.? match a decimal once or not at all
\d match a digit character
*$ all to the end of the line
/ end the search pattern
Erm... I'm not so sure it's all stuck in my head.
I *think* this means
"Warn {text} unless the input *only* matches minus signs, digit
characters and decimals, from the beginning to the end of the string"
Is that about right ?
I tried the above with the following in test.pl just to try to reinforce
it in my mind:
$foo = '3.5'; # no warning
$foo = '5'; # no warning
$foo = '-1'; # no warning
$foo = '4x4'; # warning - non-digit
$foo = '--3.2'; # warning - more than one minus
$foo = '3.5.5'; # warning - more than one decimal
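As a runnable sketch of that test.pl (assuming the six values are checked in one loop instead of editing $foo by hand; the sub name looks_decimal is hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: 1 if the string matches the perldata pattern, else 0.
sub looks_decimal {
    my ($s) = @_;
    return $s =~ /^-?\d+\.?\d*$/ ? 1 : 0;
}

# The same six values as above, checked in one run.
for my $foo ('3.5', '5', '-1', '4x4', '--3.2', '3.5.5') {
    if (looks_decimal($foo)) {
        print "$foo: no warning\n";
    }
    else {
        warn "$foo: not a decimal number\n";
    }
}
```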
Now that section of my code is:
if ($in{hours} !~ /^-?\d+\.?\d*$/) {
push @missing,'Hours contains non-numerical data';
$missing = 1;
}
Comments and general pointing and laughing welcome. :)
--
Marc Bissonnette
Looking for a new ISP? http://www.canadianisp.com
Largest ISP comparison site across Canada.
MB> So, here's my understanding of
MB> /^-?\d+\.?\d*$/;
MB> / start the search pattern
MB> ^ match the beginning of the line
beginning of string in this case.
MB> -? match a minus sign once or not at all
MB> \d+ match a digit character zero or more times
one or more times. you must have digits before the decimal point.
MB> \.? match a decimal once or not at all
MB> \d match a digit character
MB> *$ all to the end of the line
that is \d* which is zero or more digits. then comes $ which is end of
the string (or before an ending newline).
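A small sketch of that end-of-string detail (variable names are illustrative). If the value might still carry a newline, for example a line read from STDIN without chomp, \z is the anchor that means "the very end":

```perl
use strict;
use warnings;

my $input = "42\n";    # e.g. a line read without chomp

# $ may match just before a trailing newline, so this succeeds ...
my $with_dollar = $input =~ /^-?\d+\.?\d*$/ ? 1 : 0;

# ... while \z only matches at the very end of the string, so this fails.
my $with_z = $input =~ /\A-?\d+\.?\d*\z/ ? 1 : 0;

print "\$: $with_dollar  \\z: $with_z\n";    # prints "$: 1  \z: 0"
```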
MB> / end the search pattern
MB> I *think* this means
MB> "Warn {text} unless the input *only* matches minus signs, digit
MB> characters and decimals, from the beginning to the end of the string"
well it has its flaws. it matches most numbers but what about just
fractional numbers like .9? it fails there since it requires digits
before any decimal point. also it doesn't allow a leading + sign.
look at Regexp::Common on cpan. i am sure it has a number validation
regex in there. it is trickier than your example here as it allows all
number formats (including exponents).
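To see those gaps concretely, a quick sketch that runs the original pattern against a few borderline inputs (the choice of inputs is illustrative):

```perl
use strict;
use warnings;

my $re = qr/^-?\d+\.?\d*$/;    # the perldata pattern under discussion

for my $s ('.9', '+3.1', '-3.') {
    printf "%-5s %s\n", $s, ($s =~ $re ? 'accepted' : 'rejected');
}
# .9 and +3.1 are rejected (no leading digit / no + allowed),
# while -3. is accepted because the dot and the trailing digits are both optional.
```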
MB> Now that section of my code is:
MB> if ($in{hours} !~ /^-?\d+\.?\d*$/) {
MB> push @missing,'Hours contains non-numerical data';
MB> $missing = 1;
MB> }
the $missing = 1 is a red flag as boolean flags are poor coding IMO. you
have @missing which supposedly contains error strings so just check that
if it isn't empty.
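A minimal sketch of that suggestion, assuming a hypothetical validate() sub that takes the form fields as a hash and returns the list of error strings; the caller only tests whether the list is empty, so no separate flag is needed:

```perl
use strict;
use warnings;

# Hypothetical validator: collects error strings instead of setting a flag.
sub validate {
    my (%in) = @_;
    my @missing;
    push @missing, 'Hours contains non-numerical data'
        unless defined $in{hours} && $in{hours} =~ /^-?\d+\.?\d*$/;
    return @missing;
}

my @missing = validate(hours => '4x4');
if (@missing) {    # an array in boolean context is true when it is non-empty
    print "Errors:\n", map { "  $_\n" } @missing;
}
```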
MB> Comments and general pointing and laughing welcome. :)
MUAHAHAHAHAHAHHAH!!!
uri
--
Uri Guttman ------ u...@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
>>>>>> "MB" == Marc Bissonnette <dragnet\_@_/internalysis.com> writes:
>
> MB> So, here's my understanding of
>
> MB> /^-?\d+\.?\d*$/;
>
> MB> / start the search pattern
> MB> ^ match the beginning of the line
>
> beginning of string in this case.
>
> MB> -? match a minus sign once or not at all
> MB> \d+ match a digit character zero or more times
>
> one or more times. you must have digits before the decimal point.
> MB> \.? match a decimal once or not at all
> MB> \d match a digit character
> MB> *$ all to the end of the line
>
> that is \d* which is zero or more digits. then comes $ which is end of
> the string (or before an ending newline.
>
> MB> / end the search pattern
> MB> I *think* this means
> MB> "Warn {text} unless the input *only* matches minus signs, digit
> MB> characters and decimals, from the beginning to the end of the
> string"
>
> well it has its flaws. it matches most numbers but what about just
> fractional numbers like .9? it fails there since it requires digits
> before any decimal point. also it doesn't allow a leading + sign.
I noticed that when I was goofing around with it locally.
In reading your points and thinking about it (now that I understand it a
bit better), this seems to work (leading pluses or minuses, as well as
leading decimals, such as .5):
/^\+?-?\d?\.?\d*$/
> look at Regexp::Common on cpan. i am sure it has a number validation
> regex in there. it is trickier than your example here as it allows all
> number formats (including exponents).
I'll definitely take a peek in there when the need arises - For now, I'm
helping a friend out by trying to automate some of their purchase orders,
packing slips and job accounting summaries, so I think the numerical data
will suffice being limited to positives and negatives :)
>
> MB> Now that section of my code is:
>
> MB> if ($in{hours} !~ /^-?\d+\.?\d*$/) {
> MB> push @missing,'Hours contains non-numerical data';
> MB> $missing = 1;
> MB> }
>
> the $missing = 1 is a red flag as boolean flags are poor coding IMO.
> you have @missing which supposedly contains error strings so just
> check that if it isn't empty.
Ya know, that thought honestly popped in my mind as soon as I hit send :)
It's an ingrained bad habit I'll have to work myself out of.
> MB> Comments and general pointing and laughing welcome. :)
>
> MUAHAHAHAHAHAHHAH!!!
:)
Better a MUAHAHAHAHHAHA than a RTFM :)
> Uri Guttman <u...@stemsystems.com> fell face-first on the keyboard.
> This was the result: news:x7y75xs...@mail.sysarch.com:
(snip)
>> well it has its flaws. it matches most numbers but what about just
>> fractional numbers like .9? it fails there since it requires digits
>> before any decimal point. also it doesn't allow a leading + sign.
>
> I noticed that when I was goofing around with it locally;
> In readin your points and thinking about it (now that I understand it
> a bit better), this seems to work (leading pluses or minuses, as well
> as leading decimals, such as .5
>
> /^\+?-?\d?\.?\d*$/
Apologies for following up my own post: I just realized the above has a
flaw: It matches one or zero leading digits (.4 or 0.4) but not two digits
or more (22.4).
This works better:
/^\+?-?(\d?|\d+)\.?\d*$/
MB> I noticed that when I was goofing around with it locally;
MB> In readin your points and thinking about it (now that I understand it a
MB> bit better), this seems to work (leading pluses or minuses, as well as
MB> leading decimals, such as .5
MB> /^\+?-?\d?\.?\d*$/
that allows +-3
use [] to allow one char from a set:
/^[+-]?\d?\.?\d*$/
that allows either a single leading + or - but not both.
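A quick way to see the difference between the two forms (the variable names are only labels):

```perl
use strict;
use warnings;

my $plus_minus = qr/^\+?-?\d?\.?\d*$/;   # \+? then -?: both signs may appear together
my $sign_class = qr/^[+-]?\d?\.?\d*$/;   # [+-]?: at most one sign character

for my $s ('+3', '-3', '+-3') {
    printf "%-4s  plus_minus: %-3s  sign_class: %s\n", $s,
        ($s =~ $plus_minus ? 'yes' : 'no'),
        ($s =~ $sign_class ? 'yes' : 'no');
}
```

Note that both patterns still match the empty string, since every element in them is optional.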
MB> Marc Bissonnette <dragnet\_@_/internalysis.com> fell face-first on the
MB> keyboard. This was the result:
MB> news:Xns9AA9EF2C4CC97dr...@216.196.97.131:
>> Uri Guttman <u...@stemsystems.com> fell face-first on the keyboard.
>> This was the result: news:x7y75xs...@mail.sysarch.com:
>> /^\+?-?\d?\.?\d*$/
MB> Apologies for following up my own post: I just realized the above has a
MB> flaw: It matches on or zero beginning digits (.4 or 0.4) but not two digits
MB> or more (22.4)
MB> This works better:
MB> /^\+?-?(\d?|\d+)\.?\d*$/
the middle part is silly. it matches 0 or 1 digit OR one or more
digits. that is the same as 0 or more digits which is \d* all by itself.
but you can't use \d*\.?\d* as that will match the empty string (as will
your regex above). everything in yours is optional. look at this:
perl -ne 'print "yes\n" if /^\+?-?(\d?|\d+)\.?\d*$/'

yes
+
yes
+-
yes
note that the blank line was input. as i said, matching decimal numbers
is not trivial. use regexp::common as it has solved that problem.
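The same check as a script rather than perl -ne, sketched with the inputs from the session above plus a lone dot; the second pattern (an illustrative rewrite) uses \d* in place of (\d?|\d+) to show the two behave the same:

```perl
use strict;
use warnings;

my $marc   = qr/^\+?-?(\d?|\d+)\.?\d*$/;
my $simple = qr/^\+?-?\d*\.?\d*$/;    # (\d?|\d+) rewritten as \d*

for my $s ('', '+', '+-', '.', '22.4') {
    my $m = $s =~ $marc   ? 'yes' : 'no';
    my $t = $s =~ $simple ? 'yes' : 'no';
    printf "%-6s %-3s %s\n", "'$s'", $m, $t;
}
# Every element is optional, so '', '+', '+-' and '.' match both patterns.
```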
Many thanks for the pointers and solution - I will indeed look into
regexp::common
At the very least, it's been instructional/educational - As I mentioned,
I've got a few bad habits and long-ways-around-the-bush to correct :)
> # replace decimals with 999999999 in order to check for non-numerical
> # data, then switch it back (this is a lazyman's shortcut)
>
> $in{hours} =~ s/\./999999999/;
> if ($in{hours} =~ /\D/) {
> push @missing,'Hours contains non-numerical data';
> $missing = 1;
> }
> $in{hours}=~s/999999999/\./;
If your original is something like
3579.2468
then after the second substitution you'll have
357.92468
You should use a string like ABCDEFG for this, where nothing is repeated.
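The arithmetic is easy to confirm with a short sketch of the original round trip (working on a copy so $hours keeps the input):

```perl
use strict;
use warnings;

my $hours = '3579.2468';

# Replace the dot with the placeholder on a copy of the value:
# the copy becomes '3579' . '999999999' . '2468'.
(my $tmp = $hours) =~ s/\./999999999/;

# Swap it back. The leftmost run of nine 9s begins at the 9
# of "3579", so the dot is restored one place too early.
$tmp =~ s/999999999/./;

print "$hours -> $tmp\n";    # prints "3579.2468 -> 357.92468"
```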
> > note that the blank line was input. as i said, matching decimal
> > numbers is not trivial. use regexp::common as it has solved that
> > problem.
> Many thanks for the pointers and solution - I will indeed look into
> regexp::common
It's actually Regexp::Common. The casing makes a difference if
installing from the CPAN command-line, as well as when 'use'ing the
module. I would think Uri would know that.
--
G.Etly
> -? match a minus sign once or not at all
"Match the optional minus sign" is more descriptive IMHO.
> I *think* this means
> "Warn {text} unless the input *only* matches minus signs, digit
> characters and decimals, from the beginning to the end of the string"
I don't think verbalization attempts like this are useful. You already
did a pretty good job dissecting the regex and that's good. But the
statement above is a little too loose.
> I tried the above with the following in test.pl just to try to
> reinforce it in my mind:
>
> $foo = '3.5'; # no warning
> $foo = '5'; # no warning
> $foo = '-1'; # no warning
> $foo = '4x4'; # warning - non-digit
> $foo = '--3.2' # warning - more than one minus
> $foo = '3.5.5' # warning - more than one decimal
How about $foo = '+3.1'? Do you want that to be considered numeric?
> Now that section of my code is:
>
> if ($in{hours} !~ /^-?\d+\.?\d*$/) {
How about $foo = '-3.'? Do you want that to be a number or a typo?
Sinan
--
A. Sinan Unur <1u...@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)
comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/
Yep, I always do the search from the CPAN command line before the
installation. While I know there are all lower-case modules out there, I
don't think I've ever installed one (odd point, but...)
> Marc Bissonnette <dragnet\_@_/internalysis.com> wrote in
> news:Xns9AA9E172266E9dr...@216.196.97.131:
>
>> -? match a minus sign once or not at all
>
> Match the optional minus sign is more descriptive IMHO.
>
>
>> I *think* this means
>> "Warn {text} unless the input *only* matches minus signs, digit
>> characters and decimals, from the beginning to the end of the string"
>
> I don't think verbalization attempts like this are useful. You already
> did a pretty good job dissecting the regex and that's good. But the
> statement above is a little too loose.
>
>> I tried the above with the following in test.pl just to try to
>> reinforce it in my mind:
>>
>> $foo = '3.5'; # no warning
>> $foo = '5'; # no warning
>> $foo = '-1'; # no warning
>> $foo = '4x4'; # warning - non-digit
>> $foo = '--3.2' # warning - more than one minus
>> $foo = '3.5.5' # warning - more than one decimal
>
> How about $foo = '+3.1'? Do you want that to be considered numeric?
>
>> Now that section of my code is:
>>
>> if ($in{hours} !~ /^-?\d+\.?\d*$/) {
>
> How about $foo = '-3.'? Do you want that to be a number or a typo?
You're right - I think I caught all of those in my latest iteration, but
as Uri points out, it passes a blank string, which isn't a good thing, so
Regexp::Common looks like the smartest thing for the best coverage.
Modules with names in all-lowercase are pragmata: they affect the way
perl parses your program. The only ones that are usually used are the
ones that come with perl: strict, warnings, open, encoding, &c. There
are some pragmata on CPAN, but they are generally rather advanced
modules that interact with the perl core in subtle and complicated ways.
Ben
--
It will be seen that the Erewhonians are a meek and long-suffering people,
easily led by the nose, and quick to offer up common sense at the shrine of
logic, when a philosopher convinces them that their institutions are not based
on the strictest morality. [Samuel Butler, paraphrased] b...@morrow.me.uk
When you choose to do this, it is your responsibility to figure out what
is to be done when the data already contains the "flag" value.
And also what's to be done when your manipulations create a
"flag" string when there was not one in the original.
> in order to check for non-numerical
> # data, then switch it back (this is a lazyman's shortcut)
[snip the-bad-kind-of-lazy code]
> And I realized I've been too lazy with this for too long. So my first
> thought was to post here with "how do I test for non-numerical in decimal
> number data?" but, of course, that violates the "How to ask intelligent
> questions in CPLM", so I googled it.
>
> Came up with a bunch of non-relevant results
Your Question is Asked Frequently but perhaps you missed the
right search term to uncover it:
perldoc -q number
How do I determine whether a scalar is a number/whole/integer/float?
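One approach that FAQ entry mentions is the looks_like_number function from Scalar::Util, which asks perl itself whether a string would be treated as a number. A minimal sketch; note that it also accepts forms such as exponents (1e5) that the hand-written patterns in this thread reject, which may or may not be what you want for an hours field:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Ask perl whether each string would be treated as a number.
for my $s ('3.5', '.9', '+3.1', '1e5', '4x4', '3.5.5', '') {
    printf "%-7s %s\n", "'$s'",
        (looks_like_number($s) ? 'number' : 'not a number');
}
```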
> until I saw a similar
> question and the answer was "Your answer is in perldoc perldata"
>
> So I did exactly that. Of course, the answer *is* right there:
>
> warn "not a decimal number" unless /^-?\d+\.?\d*$/;
Note that this is an application of the common Perl idiom for
validating data.
i.e.:
anchor the beginning
anchor the end
write a pattern in between that accounts for everything you want to allow
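The same idiom applied to a hypothetical, stricter rule for the hours field: 1 to 3 digits with an optional quarter-hour fraction. The rule itself is made up for illustration; the point is the shape (anchor, allow, anchor), here using \A and \z so that a trailing newline is rejected too:

```perl
use strict;
use warnings;

# Hypothetical rule: 1-3 digits, optionally followed by .25, .5 or .75.
sub valid_hours {
    my ($s) = @_;
    return $s =~ /
        \A                     # anchor the beginning
        \d{1,3}                # allow: one to three digits ...
        (?: \. (?:25|5|75) )?  # ... and an optional quarter-hour fraction
        \z                     # anchor the end
    /x ? 1 : 0;
}

for my $s ('7.5', '12', '7.3', '1234', "7.5\n") {
    (my $shown = $s) =~ s/\n/\\n/;    # make the newline visible in the output
    print "$shown: ", (valid_hours($s) ? 'ok' : 'rejected'), "\n";
}
```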
> Which got me thinking - I take waaaay too many long-trips when there is
> stuff in the regex language that would make my life easier, so I realized
> that I need to understand the above, rather than just copy/paste into
> code.
>
> So, here's my understanding of
>
> /^-?\d+\.?\d*$/;
>
> / start the search pattern
> ^ match the beginning of the line
> -? match a minus sign once or not at all
> \d+ match a digit character zero or more times
> \.? match a decimal once or not at all
> \d match a digit character
> *$ all to the end of the line
The [*+?] quantifiers all apply to "the previous thing", so
it is a syntax error if there is no "previous thing".
The asterisk "goes with" the digit character class, not with the anchor.
> / end the search pattern
/
^ # beginning of the string (not line)
-? # optional minus sign
\d+ # run of digit characters
[.]? # optional dot character (backslashes annoy me)
\d* # optional run of digit characters
$ # end of the string (not line)
/x # note eXtended regular expression modifier
Look Ma!
Real Perl syntax!
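To use the commented form in a program, it can be compiled with qr//x. A sketch (the variable names are illustrative) that checks it against the compact spelling on the inputs from earlier in the thread:

```perl
use strict;
use warnings;

my $extended = qr/
    ^       # beginning of the string (not line)
    -?      # optional minus sign
    \d+     # run of digit characters
    [.]?    # optional dot character
    \d*     # optional run of digit characters
    $       # end of the string (not line)
/x;

my $compact = qr/^-?\d+\.?\d*$/;

# Both spellings should accept and reject exactly the same inputs.
for my $s ('3.5', '5', '-1', '-3.', '4x4', '--3.2', '3.5.5', '.9') {
    my $e = $s =~ $extended ? 1 : 0;
    my $c = $s =~ $compact  ? 1 : 0;
    print "$s: ", ($e ? 'ok' : 'rejected'), ($e == $c ? '' : ' (MISMATCH)'), "\n";
}
```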
> if ($in{hours} !~ /^-?\d+\.?\d*$/) {
I'd prefer to write that as:
unless ($in{hours} =~ /^-?\d+\.?\d*$/) {
as that puts the "not" out where it is harder to miss...
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
> While I know there are all lower-case modules out there, I
> don't think I've ever installed one (odd point, but...)
Modules named in all-lowercase are pragmas (compiler hints) by convention.
(see "Pragmatic Modules" in perldoc perlmodlib.)
So they (mostly?) are installed along with the perl distribution.
> "A. Sinan Unur" <1u...@llenroc.ude.invalid> fell face-first on the
> keyboard. This was the result:
> news:Xns9AAA6479B6C9...@127.0.0.1:
<snip> for brevity.
> You're right - I think I caught all of those in my latest interation,
> but as Uri points out, it passes a blank string, which isn't a good
> thing, so Regexp::Common looks like the smartest thing for the best
> coverage.
Always recommended.
> /^-?\d+\.?\d*$/
I would write the
"\.?\d*"
part as
"(?:\.[0-9]*)?",
or as
"(?:\.[0-9]+)?"
to force at least one trailing digit if there was a decimal point.
If numerics like "-0" and "-0." and "-.0" are allowed, but not "-." or
".", then I would do
/^-?(?:[0-9]+(?:\.[0-9]*)?|[0-9]*\.[0-9]+)$/
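A sketch that exercises that last pattern on the cases listed above, plus a few from earlier in the thread (the expected values follow from reading the pattern). It keeps [0-9] rather than \d; in Perl 5.8 and later, \d can also match non-ASCII digits in character strings, which is one reason some people prefer the explicit class:

```perl
use strict;
use warnings;

# Digits with an optional fraction, or an optional integer part
# followed by a dot and at least one digit.
my $re = qr/^-?(?:[0-9]+(?:\.[0-9]*)?|[0-9]*\.[0-9]+)$/;

my %expect = (
    '-0' => 1, '-0.' => 1, '-.0' => 1, '22.4' => 1, '.9' => 1,
    '-.' => 0, '.'   => 0, ''    => 0, '+3'   => 0,
);

for my $s (sort keys %expect) {
    my $got = $s =~ $re ? 1 : 0;
    printf "%-6s %s%s\n", "'$s'", ($got ? 'match' : 'no match'),
        ($got == $expect{$s} ? '' : '  <-- unexpected');
}
```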
--
Affijn, Ruud
"Gewoon is een tijger."
> Marc Bissonnette <dragnet> wrote:
>> So I was commenting some code I wrote for a friend who's new to perl
>> and I came across the following in my code:
>>
>> # replace decimals with 999999999
>
> When you choose to do this, it is responsible to figure out what
> is to be done when the data already contains the "flag" value.
>
> And also what's to be done when your manipulations create a
> "flag" string when there was not one in the original.
Yeah, Ben pointed that out, too. A glaring error in my "shortcut" that
would have ended up completely invalidating/changing the original data.
>> in order to check for non-numerical
>> # data, then switch it back (this is a lazyman's shortcut)
>
> [snip the-bad-kind-of-lazy code]
>
>> And I realized I've been too lazy with this for too long. So my first
>> thought was to post here with "how do I test for non-numerical in
>> decimal number data?" but, of course, that violates the "How to ask
>> intelligent questions in CPLM", so I googled it.
>>
>> Came up with a bunch of non-relevant results
>
> Your Question is Asked Frequently but perhaps you missed the
> right search term to uncover it:
>
> perldoc -q number
>
> How do I determine whether a scalar is a
> number/whole/integer/float?
Indeed I did - I can't for the life of me remember how to bring up a list
of the perldoc subject titles - I'll google it after I get the kids from
school.
Many thanks - I've put this in my favourites.pl, which I refer to for
learning and re-use stuff.
>> if ($in{hours} !~ /^-?\d+\.?\d*$/) {
>
> I'd prefer to write that as:
>
> unless ($in{hours} =~ /^-?\d+\.?\d*$/) {
>
> as that puts the "not" out where it is harder to miss...
Maybe a silly question, but would that mean that if there are only two
conditions (true|false), use unless and if there are more than two, use
if/elsif as better code ?
I would never use else or elsif with unless: the double negatives are
just too confusing. I would rewrite
unless (X) { foo; }
else { bar; }
as
if (X) { bar; }
else { foo; }
and if there were more branches I would use
if (not X) { foo; }
elsif (Y) { bar; }
else { baz; }
rather than
unless (X) { foo; }
elsif (Y) { bar; }
else { baz; }
Note that there is no 'elsunless' :).
Ben
--
Musica Dei donum optimi, trahit homines, trahit deos. |
Musica truces mollit animos, tristesque mentes erigit. | b...@morrow.me.uk
Musica vel ipsas arbores et horridas movet feras. |
No, I wouldn't say that.
I'd say if there is a single _clause_ you can choose between
saying "if not" or "unless".
If there are two clauses, use an if-else (I never use an unless-else).
Note that using lowercase for "pragmata" is just a convention. Perl itself
doesn't care at all about the casing (as long as it matches), nor does
it know what a "pragma" is. Nor does everyone agree on what a pragma is.
For instance, you say a pragma affects the way perl parses your program,
yet you mention warnings and strict as pragmata. However, "use warnings"
doesn't affect how Perl parses my program, and the only way "use strict"
affects how Perl parses my program is that it will sometimes refuse to
parse my program. OTOH, "use Switch" and a lot of other source filters
do affect how my program is parsed, yet many source filters aren't
lowercased, and usually not called "pragmata".
I tend to call modules that fiddle with $^H or %^H pragmata.
Abigail
--
sub _'_{$_'_=~s/$a/$_/}map{$$_=$Z++}Y,a..z,A..X;*{($_::_=sprintf+q=%X==>"$A$Y".
"$b$r$T$u")=~s~0~O~g;map+_::_,U=>T=>L=>$Z;$_::_}=*_;sub _{print+/.*::(.*)/s};;;
*_'_=*{chr($b*$e)};*__=*{chr(1<<$e)}; # Perl 5.6.0 broke this...
_::_(r(e(k(c(a(H(__(l(r(e(P(__(r(e(h(t(o(n(a(__(t(us(J())))))))))))))))))))))))
Doesn't "care" where? In the 'use' statement, it most certainly does
matter. And if you install every module as lowercase, you're definitely
going to have problems running someone else's code that uses the correct
casing. Saying Perl doesn't care about casing is misleading at best. In
general, casing in Perl matters very much.
--
szr
s> Abigail wrote:
>> Note that using lowercase for "pragmata" is just a convention. Perl
>> itself doesn't care at all about the casing (as long as it matches),
>> nor does
s> Doesn't "care" where? In the 'use' statement, it most certainly does
s> matter. And if you install every module as lowercase, you're definitely
s> going to have problems running someone else's code that uses the correct
s> casing. Saying Perl doesn't care about casing is misleading at best. In
s> general, casing in Perl matters very much.
you didn't read abigail's post correctly. he means that the case of the
name doesn't matter in regards to what kind of module (pragma or not) it
is. the convention (not a syntax or semantic requirement) is pragmas are
named in all lower case and regular modules are in StudlyCaps. this has
nothing to do with the use statement nor about case matching of file
names.
Ok, fair enough, and I am sorry, though I still find the way he wrote
that to be a bit misleading.
> the convention (not a syntax or semantic requirement) is pragmas
> are named in all lower case and regular modules are in StudlyCaps.
> this has nothing to do with the use statement nor about case
> matching of file
Got it.
--
szr
> Marc Bissonnette <dragnet> wrote:
>> Tad J McClellan <ta...@seesig.invalid> fell face-first on the keyboard.
>> This was the result: news:slrng3lpcs...@tadmc30.sbcglobal.net:
>>
>>> Marc Bissonnette <dragnet> wrote:
>
>
>>>> if ($in{hours} !~ /^-?\d+\.?\d*$/) {
>>>
>>> I'd prefer to write that as:
>>>
>>> unless ($in{hours} =~ /^-?\d+\.?\d*$/) {
>>>
>>> as that puts the "not" out where it is harder to miss...
>>
>> Maybe a silly question, but would that mean that if there are only two
>> conditions (true|false), use unless and if there are more than two, use
>> if/elsif as better code ?
>
>
> No, I wouldn't say that.
>
> I'd say if there is a single _clause_ you can choose between
> saying "if not" or "unless".
>
> If there are two clauses, use an if-else (I never use an unless-else).
Ahh, gotcha - thank you!