PHP RegExp - Newbie

ProBowlUK

May 25, 2009, 10:34:14 PM
to Regex
1) Need to check that a number consists of all digits and that the
sum of the first two digits is greater than 5

2) Validate that a person has entered their name with a capital at
the start of their surname and firstname but that the name has only a
single letter for any middle names.
( eg: "Robin J Sharp", "Robin Sharp", "Robin J B Sharp" )

3) Convert the case of the letters in someone’s name so that however
it is entered (as a single string) their first name and surname both
have a single initial upper-case character.
( eg: "ROBin sharP" is converted to "Robin Sharp" )



Can someone please explain how these patterns would be constructed ?

Eugeny Sattler

May 28, 2009, 4:25:09 PM
to re...@googlegroups.com
On May 26, 7:34 am, ProBowlUK <bobsh...@ntlworld.com> wrote:
> 1)  Need to check that a number consists of all digits and that the
> sum of the first two digits is greater than 5

Although it is possible to do this with a regexp, it would be far more elegant to check the condition with simple math. (You can still use a regexp to fetch the first and second digits; then you sum them and compare the result against 5 using the programming language's operators.)

Regex treats digits as text, which is why a regex solution will be lengthy. See for yourself.

When the first digit is zero, the second one can be 6, 7, 8 or 9. The corresponding regex is 0[6-9]
When the first digit is 1, the second one can be 5, 6, 7, 8 or 9. The corresponding regex is 1[5-9]
etc.

We unite these regexps using alternation and here is what we get:
(0[6-9]|1[5-9]|2[4-9]|3[3-9]|4[2-9]|5[1-9]|[6-9][0-9])

This regex corresponds to this matrix of digits (each column of the matrix corresponds to one branch of the alternation, except the last branch [6-9][0-9], which covers the last 4 columns together):

            60 70 80 90
          51 61 71 81 91
        42 52 62 72 82 92
      33 43 53 63 73 83 93
    24 34 44 54 64 74 84 94
  15 25 35 45 55 65 75 85 95
06 16 26 36 46 56 66 76 86 96
07 17 27 37 47 57 67 77 87 97
08 18 28 38 48 58 68 78 88 98
09 19 29 39 49 59 69 79 89 99

The digits can swap places, i.e. both 06 and 60 are acceptable under your condition. Our regex handles all such cases.
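Both approaches described above can be sketched in PHP. This is only a minimal sketch: the function names are hypothetical, and the leading ^ plus the trailing \d*$ (with the D modifier) are an addition that covers the "all digits" part of the task, which the alternation on its own does not check.

```php
<?php
// Hypothetical helper names; neither appears in the thread.

// Regex-only check: the alternation covers the first two digits,
// ^ and \d*$ make sure the whole string consists of digits.
function firstTwoSumOverFiveRegex($s)
{
    $pattern = '/^(?:0[6-9]|1[5-9]|2[4-9]|3[3-9]|4[2-9]|5[1-9]|[6-9][0-9])\d*$/D';
    return preg_match($pattern, $s) === 1;
}

// Arithmetic check: the regex only fetches the two digits,
// plain PHP does the sum and the comparison.
function firstTwoSumOverFiveMath($s)
{
    if (preg_match('/^(\d)(\d)\d*$/D', $s, $m) !== 1) {
        return false; // not all digits, or fewer than two digits
    }
    return ((int) $m[1] + (int) $m[2]) > 5;
}
```

With these definitions, '1500' passes (1 + 5 = 6) and '1400' does not (1 + 4 = 5); both functions should agree on every input.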
> 2)  Validate that a person has entered their name with a capital at
> the start of their surname and firstname but that the name has only a
> single letter for any middle names.
> ( eg:  "Robin J Sharp", "Robin Sharp", "Robin J B Sharp" )

In PHP, the preg_* functions are case sensitive unless you add the i modifier to the pattern; ereg() is case sensitive and eregi() is case insensitive. You know what to use.
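For the preg_* functions, case-insensitive matching is switched on with the i pattern modifier. A minimal sketch, with illustrative variable names:

```php
<?php
// Same pattern, without and with the i modifier.
$caseSensitive   = preg_match('/^[a-z]+$/', 'Robin');  // 0: "R" is not in [a-z]
$caseInsensitive = preg_match('/^[a-z]+$/i', 'Robin'); // 1: i ignores case
```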

I would rephrase the task this way:
  • the incoming string must contain words consisting of letters only, i.e. no digits or special symbols
  • at least 2 words (a first name and a surname are what everybody has)
  • each word must have its first letter capitalized
  • words in the middle (not the first and not the last) must be a single capital letter
  • the first and last words cannot be one letter long (I added this condition myself, but it seems logical to me)
  • there are hyphenated names that need to pass too (examples I found among files on my PC: Joby-Rome Otero, Jean-Pierre Cardin, Ryh-Ming Poon, On-Line Sales.... :) :) ). This is done by adding (?:-[A-Z][a-z]+)? after the first-name part of the regex. The final question mark makes the group optional; ?: makes it non-capturing, so nothing is stored into a temporary variable, which improves performance.

So I suggest this regex:
\A\x20*[A-Z][a-z]+(?:-[A-Z][a-z]+)?\x20+(?:[A-Z]\x20+){0,4}[A-Z][a-z]+\x20*\Z
Some comments:
(?:[A-Z]\x20+){0,4}
is responsible for middle names represented by single caps. I assume here a human being can not have more than 4 middle names. Spanish guys will argue, I know.

\A is start of input string
\Z is end of input string (it also matches just before a final newline; \z is the strict end)
\x20 is for space
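
A minimal sketch of how the suggested pattern could be used from PHP. The wrapper name isValidName is hypothetical; the pattern itself is the one above, unchanged.

```php
<?php
// Hypothetical wrapper around the suggested pattern.
function isValidName($name)
{
    // Single quotes keep \A, \x20 and \Z intact for the regex engine.
    $pattern = '/\A\x20*[A-Z][a-z]+(?:-[A-Z][a-z]+)?\x20+(?:[A-Z]\x20+){0,4}[A-Z][a-z]+\x20*\Z/';
    return preg_match($pattern, $name) === 1;
}
```

Under this pattern "Robin J Sharp", "Robin Sharp", "Robin J B Sharp" and "Jean-Pierre Cardin" are accepted, while "robin sharp", "Robin James Sharp" and "Robin" are rejected.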
...as for your 3rd question, I hope to send addition shortly...

--
Eugeny Sattler

Eugeny Sattler

May 29, 2009, 6:46:35 AM
to re...@googlegroups.com


On Tue, May 26, 2009 at 7:34 AM, ProBowlUK <bobs...@ntlworld.com> wrote:
>
> 1)  Need to
....

> 3)  Convert the case of the letters in someone’s name so that however
> it is entered (as a single string) their first name and surname both
> have a single initial upper-case character.
> ( eg:  "ROBin sharP"  is converted to  "Robin Sharp" )

Once your input string has passed the check I described in point 2, you can work with it further to apply the appropriate case.

In case insensitive mode, a word in text can be found this way \b[a-z]+\b

where \b is a boundary between a letter and a space/start of string/end of string ...Or in reverse order - a boundary between a space/start of string/end of string and a letter.

You need the first letter to be stored in one variable and the rest of the word in another variable, for further case changes.

So you use \b([a-z])([a-z]*)\b

And these parts will be stored into variables thanks to enclosing parts of the regex into round brackets. The first pair of round brackets will save into $1 variable, and the second pair of round brackets will save into $2 variable.

input word      $1      $2
Robin           R       obin
J               J       (empty string)
shaRP           s       haRP

With the contents of $1 and $2 in hand, you just apply PHP's case conversion functions, concatenate the parts back together, and you are done.
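
The steps above can be sketched with preg_replace_callback, which passes the two captured parts to a function as $m[1] and $m[2]. The helper name fixNameCase is hypothetical, and the sketch assumes plain ASCII names, as the [a-z] class does.

```php
<?php
// Hypothetical helper; assumes ASCII letters only.
function fixNameCase($name)
{
    return preg_replace_callback(
        '/\b([a-z])([a-z]*)\b/i',
        function ($m) {
            // $m[1] is the first letter of the word, $m[2] the rest of it.
            return strtoupper($m[1]) . strtolower($m[2]);
        },
        $name
    );
}
```

For example, fixNameCase('ROBin sharP') gives "Robin Sharp", and a single-letter word such as "j" becomes "J" because the second group is simply empty.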

--
Eugeny Sattler

P.S. non-regex solution is here

BobSharp

Jun 2, 2009, 8:53:07 PM
to re...@googlegroups.com, eugeny....@gmail.com
(a)  check that a number consists of all digits and that if the
      first two digits are added together the total is greater than 5
 
 
 
Have been offered this regexp solution ... 
 
   return (preg_match('/^\d{2}\d*$/', $number) && ($number[0] + $number[1]) > 5);
 
 
 
How would it be scripted into a PHP function ?
 
 
 
 

