
Zip Codes ctype? Pregmatch?


Twayne

Aug 20, 2013, 1:27:50 PM
Hi all,

I'm attempting to check for US and Canadian zip codes (postal codes).
The US is easy; mostly just be sure it's five numerics, excepting
"00000" and "99999". But Canadian is a different story because:
It consists of alternating alpha and numeric characters (AnAnAn) but
not the entire alphabet. 8 N.A. English letters are not used, namely
D, F, I, O, Q, U, W and Z; put another way, they only use 18 letters in
their postal codes.
I haven't seen a single example in all my research that checks whether
the 1st, 3rd, and 5th characters are alpha and the 2nd, 4th, and 6th
characters are numeric.

I've tried preg_match and strpos without success, likely due to my own
weakness with preg_match, and regex creates an incredibly long statement
that I'm sure isn't right to put upon the servers; it slows down even my
local server (XAMPP & PHP 5.3 on Win 7).

Might anyone have a better method?

Or know of any functions anywhere that could be modified to be used?

Martin Leese

Aug 20, 2013, 1:59:14 PM
Twayne wrote:
> Hi all,
>
> I'm attempting to check for US and Canadian zip codes (postal codes).
> The US is easy; mostly just be sure it's five numerics and except
> "00000" and "99999". But Canadian is a different story because:
> It consists of alternating alpha and numeric characters (AnAnAn) but
> not the entire alphabet. 8 N.A. English letters are not used, as in
> DFIOQUW AND Z or put another way, they only use 18 letters in their
> postal codes.

Note that postcodes must have a space in
the middle, ie, AnA nAn. In practice,
however, there is so much brain-dead
software in use that believes otherwise, it
would be prudent to make the space optional.

From the Canada Post Addressing Guide:
"Postal codes must be printed in
upper case with the first three
elements separated from the last
three by one space (no hyphens)."

--
Regards,
Martin Leese
E-mail: ple...@see.Web.for.e-mail.INVALID
Web: http://members.tripod.com/martin_leese/

Robert Heller

Aug 20, 2013, 2:08:18 PM
At Tue, 20 Aug 2013 13:27:50 -0400 Twayne <nob...@spamcop.net> wrote:

>
> Hi all,
>
> I'm attempting to check for US and Canadian zip codes (postal codes).
> The US is easy; mostly just be sure it's five numerics and except
> "00000" and "99999".

5+4:

nnnnn-mmmm

Most of the time, just the basic 5 digits is enough, but sometimes the USPS
wants the additional 4 digits as well.

Whether you include the extra 4 digits depends on what you are
using the zip code for. UPS and FedEx, for example, don't use the extra 4
digits, but the USPS does. The extra four digits are important mostly for big
city addresses, where there might be multiple branch POs and/or delivery
routes, etc. for a given post office.


--
Robert Heller -- 978-544-6933 / hel...@deepsoft.com
Deepwoods Software -- http://www.deepsoft.com/
() ascii ribbon campaign -- against html e-mail
/\ www.asciiribbon.org -- against proprietary attachments



Twayne

Aug 20, 2013, 3:01:43 PM
On 2013-08-20 1:59 PM, Martin Leese wrote:
...
>
> From the Canada Post Addressing Guide:
> "Postal codes must be printed in
> upper case with the first three
> elements separated from the last
> three by one space (no hyphens)."
>

OUCH! Thanks! After more research I found a trail about that; forgot all
the details but the space it seems is something special more than just
esthetics.

Thanks much!

Twayne

Aug 20, 2013, 3:03:48 PM
On 2013-08-20 2:08 PM, Robert Heller wrote:
> At Tue, 20 Aug 2013 13:27:50 -0400 Twayne <nob...@spamcop.net> wrote:
>
>>
>> Hi all,
>>
>> I'm attempting to check for US and Canadian zip codes (postal codes).
>> The US is easy; mostly just be sure it's five numerics and except
>> "00000" and "99999".
>
> 5+4:
>
> nnnnn-mmmm
>
> Most of the time, just the basic 5 digits is enough, but sometimes the USPS
> wants the additional 4 digits as well.

Agreed; but in this case it's only to ID a country.
>
> Whether or not you include the extra 4 digits or not depends on what you are
> using the zip code for. UPS and FexEX for example don't use the extra 4
> digits, but the USPS does. The extra four digits are important mostly for big
> city addresses, where there might be multiple branch POs and/or delivery
> routes, etc. for a given post office.
>
>
Thanks,

Twayne`

Twayne

Aug 20, 2013, 3:06:06 PM
As usual, after futzing with a problem for a couple of days, I ended up
finally finding multiple solutions.

Thanks all,

Twayne`

Robert Heller

Aug 20, 2013, 3:36:33 PM
At Tue, 20 Aug 2013 15:03:48 -0400 Twayne <nob...@spamcop.net> wrote:

>
> On 2013-08-20 2:08 PM, Robert Heller wrote:
> > At Tue, 20 Aug 2013 13:27:50 -0400 Twayne <nob...@spamcop.net> wrote:
> >
> >>
> >> Hi all,
> >>
> >> I'm attempting to check for US and Canadian zip codes (postal codes).
> >> The US is easy; mostly just be sure it's five numerics and except
> >> "00000" and "99999".
> >
> > 5+4:
> >
> > nnnnn-mmmm
> >
> > Most of the time, just the basic 5 digits is enough, but sometimes the USPS
> > wants the additional 4 digits as well.
>
> Agreed; but in this case it's only to ID a country.

You might need to 'accept' the extra 4 digits, since people are going to enter
them and will be miffed if your page rejects them. Just quietly drop the extra
digits.
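
A minimal sketch of "accept, then quietly drop", assuming the form only needs the 5-digit part. The function name normalizeUsZip is hypothetical and not from any post in this thread; it does not reject "00000" or "99999", which is handled separately further down.

```php
<?php
// Hypothetical helper: accepts "12345", "12345-6789" or "123456789"
// and returns only the 5-digit part, or null if the input is malformed.
function normalizeUsZip($input)
{
    $input = trim($input);
    // Group 1 captures the 5-digit ZIP; the optional +4 part is matched but discarded.
    if (preg_match('/^(\d{5})(?:-?\d{4})?$/', $input, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(normalizeUsZip('12345-6789'));
var_dump(normalizeUsZip('1234'));
```

The user can type either form and the stored value is always five digits, so nobody is turned away for entering ZIP+4.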

> >
> > Whether or not you include the extra 4 digits or not depends on what you are
> > using the zip code for. UPS and FexEX for example don't use the extra 4
> > digits, but the USPS does. The extra four digits are important mostly for big
> > city addresses, where there might be multiple branch POs and/or delivery
> > routes, etc. for a given post office.
> >
> >
> Thanks,
>
> Twayne`
>

Norman Peelman

Aug 20, 2013, 7:52:11 PM
US Zip code:
[0-9]{5}(-{0,1}[0-9]{4}){0,1}

Canadian zip code (all one line, don't miss the space!):
([A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}[0-9]{1})[A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}
{1}([0-9]{1}[A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}[0-9]{1})


--
Norman
Registered Linux user #461062
-Have you been to www.php.net yet?-

Norman Peelman

Aug 20, 2013, 7:53:36 PM
I responded even though you SOLVED... at least let us know what your
solution was.

Twayne

Aug 20, 2013, 8:17:24 PM
On 2013-08-20 3:36 PM, Robert Heller wrote:
> At Tue, 20 Aug 2013 15:03:48 -0400 Twayne <nob...@spamcop.net> wrote:
>
>>
>> On 2013-08-20 2:08 PM, Robert Heller wrote:
>>> At Tue, 20 Aug 2013 13:27:50 -0400 Twayne <nob...@spamcop.net> wrote:
>>>
>>>>
>>>> Hi all,
>>>>
>>>> I'm attempting to check for US and Canadian zip codes (postal codes).
>>>> The US is easy; mostly just be sure it's five numerics and except
>>>> "00000" and "99999".
>>>
>>> 5+4:
>>>
>>> nnnnn-mmmm
>>>
>>> Most of the time, just the basic 5 digits is enough, but sometimes the USPS
>>> wants the additional 4 digits as well.
>>
>> Agreed; but in this case it's only to ID a country.
>
> You might need to 'accept' the extra 4 digits, since people are going to enter
> them and will be mifted if your page rejects it. Just quietly drop the extra
> digits.
>
>>>
>>> Whether or not you include the extra 4 digits or not depends on what you are
>>> using the zip code for. UPS and FexEX for example don't use the extra 4
>>> digits, but the USPS does. The extra four digits are important mostly for big
>>> city addresses, where there might be multiple branch POs and/or delivery
>>> routes, etc. for a given post office.
>>>
>>>
>> Thanks,
>>
>> Twayne`
>>
>
I hear you, but when I specifically ask for a 5-digit US Zip Code if the
person wants to insist on 9, that's not the kind of person I want to
hear from anyway.
Besides, a 9-digit code is a lot more personal information than is
needed and very few people want to give away more information about
themselves than is necessary. A 9 digit zip code can in many cases lead
you right to a person's street address. With a name and an address, it's
only a short step to their ss# being compromised and then the field is
set for full identity theft.

Regards,

Twayne`

Twayne

Aug 20, 2013, 8:29:40 PM
On 2013-08-20 7:53 PM, Norman Peelman wrote:
> On 08/20/2013 03:06 PM, Twayne wrote:

...

>>
>
> I responded even though you SOLVED... at least let us know what your
> solution was.
>

Oh, sorry; guess I was in a hurry. I simply came across a couple of
methods, neither of which was properly coded, and used a combo of the two
methods to assemble mine. If you're interested, here's a crib sheet I
collected:
--------------
From the Canada Post Addressing Guide:
"Postal codes must be printed in
upper case with the first three
elements separated from the last
three by one space (no hyphens)."

=================================================


POSTAL CODES FOR 12 COUNTRIES

<?php
$country_code="US";
$zip_postal="11111";

$ZIPREG=array(
    "US"=>"^\d{5}([\-]?\d{4})?$",
    "UK"=>"^(GIR|[A-Z]\d[A-Z\d]??|[A-Z]{2}\d[A-Z\d]??)[ ]??(\d[A-Z]{2})$",
    "DE"=>"\b((?:0[1-46-9]\d{3})|(?:[1-357-9]\d{4})|(?:[4][0-24-9]\d{3})|(?:[6][013-9]\d{3}))\b",
    "CA"=>"^([ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ])\ {0,1}(\d[ABCEGHJKLMNPRSTVWXYZ]\d)$",
    "FR"=>"^(F-)?((2[A|B])|[0-9]{2})[0-9]{3}$",
    "IT"=>"^(V-|I-)?[0-9]{5}$",
    "AU"=>"^(0[289][0-9]{2})|([1345689][0-9]{3})|(2[0-8][0-9]{2})|(290[0-9])|(291[0-4])|(7[0-4][0-9]{2})|(7[8-9][0-9]{2})$",
    "NL"=>"^[1-9][0-9]{3}\s?([a-zA-Z]{2})?$",
    "ES"=>"^([1-9]{2}|[0-9][1-9]|[1-9][0-9])[0-9]{3}$",
    "DK"=>"^([D-d][K-k])?( |-)?[1-9]{1}[0-9]{3}$",
    "SE"=>"^(s-|S-){0,1}[0-9]{3}\s?[0-9]{2}$",
    "BE"=>"^[1-9]{1}[0-9]{3}$"
);

// isset() avoids an "undefined index" notice for unsupported countries.
if (isset($ZIPREG[$country_code])) {

    if (!preg_match("/".$ZIPREG[$country_code]."/i",$zip_postal)){
        //Validation failed, provided zip/postal code is not valid.
    } else {
        //Validation passed, provided zip/postal code is valid.
    }

} else {

    //Validation not available

}

=======================================================================
OR ...



function fnValidatePostal($mValue, $sRegion = '')
{
    // Lowercase both inputs so the comparisons are case-insensitive.
    $mValue = strtolower($mValue);
    $sFirst = substr($mValue, 0, 1);
    $sRegion = strtolower($sRegion);

    // First letter(s) of the postal code for each province/territory.
    $aRegion = array(
        'nl' => 'a',
        'ns' => 'b',
        'pe' => 'c',
        'nb' => 'e',
        'qc' => array('g', 'h', 'j'),
        'on' => array('k', 'l', 'm', 'n', 'p'),
        'mb' => 'r',
        'sk' => 's',
        'ab' => 't',
        'bc' => 'v',
        'nt' => 'x',
        'nu' => 'x',
        'yt' => 'y'
    );

    // Valid first letter, no excluded letter anywhere, and the overall
    // shape with an optional hyphen or space in the middle
    // (note that \w also admits digits and underscores).
    if (preg_match('/[abceghjlkmnprstvxy]/', $sFirst) &&
        !preg_match('/[dfioqu]/', $mValue) &&
        preg_match('/^\w\d\w[- ]?\d\w\d$/', $mValue))
    {
        if (!empty($sRegion) && array_key_exists($sRegion, $aRegion))
        {
            if (is_array($aRegion[$sRegion]) &&
                in_array($sFirst, $aRegion[$sRegion]))
            {
                return true;
            }
            else if (is_string($aRegion[$sRegion]) &&
                $sFirst == $aRegion[$sRegion])
            {
                return true;
            }
        }
        else if (empty($sRegion))
        {
            return true;
        }
    }

    return false;
}
===================================================
AND

===========================================================

Sounds like a regexp pattern like:

Code:

/^(?:[A-CEGHJ-NPR-TVX][0-9]){3}$/

could be used to validate the form of a given Canadian postal code
based on the description you gave. (Whether or not the postal code is
truly valid/used is, of course, another matter altogether.)


All that being said, I see that Canada Post has an API (and I'm
fairly sure the USPS does, too) ... you might actually check validity
with the code issuing authority at the time of submission....
------------------------



I'm most interested in using the APIs that are available, though, and when
I figure out how to access them programmatically I'll do so. Most of the
places I want to validate postal codes for turn out to have an online API,
seriously relieving me of a lot of code and nullifying the risk from
future changes, although there apparently have been few in the last
decade or so.
I'm a neophyte with little experience in these matters yet.

Cheers,

Twayne`

Jeff North

Aug 21, 2013, 7:25:53 AM
On Tue, 20 Aug 2013 13:27:50 -0400, in comp.lang.php Twayne
<nob...@spamcop.net>
Try this (I found it on the web but can't remember where, and I
haven't tried it out):
^[ABCEGHJ-NPRSTVXY]{1}\d{1}[A-Z]{1}\s?\d{1}[A-Z]{1}\d{1}$

Thomas 'PointedEars' Lahn

Aug 21, 2013, 8:53:55 AM
Norman Peelman wrote:

> On 08/20/2013 01:27 PM, Twayne wrote:
>> I'm attempting to check for US and Canadian zip codes (postal codes).
>> The US is easy; mostly just be sure it's five numerics and except
>> "00000" and "99999". But Canadian is a different story because:
>> It consists of alternating alpha and numeric characters (AnAnAn) but
>> not the entire alphabet. 8 N.A. English letters are not used, as in
>> DFIOQUW AND Z or put another way, they only use 18 letters in their
>> postal codes.
>> I haven't see a single example in all my research to check if the
>> 1st, 3rd, and 5th characters are alpha and th 2nd, 4th and 6th
>> characters are numeric.
>>
>> I've tried preg_match and strpos without succees, likely due to my own
>> weakness with preg_match, and regex creates an incredibly long statement
>> I'm sure it's not right to put upon the servers; they slow down even my
>> local server XAMPP & PHP 5.3 on win 7.
>>
>> Might anyone have a better method?
>>
>> Or know of any functions anywhere that could be modified to be used?
>
> US Zip code:
> [0-9]{5}(-{0,1}[0-9]{4}){0,1}
  ^^^^^     ^^^^^^^^^^    ^^^^^
In Perl-Compatible Regular Expressions (PCRE), as also used by PHP's preg_*
functions, the following shorthands are available:

- “*” for “{0,}”
- “?” for “{0,1}”
- “+” for “{1,}”
- “\d” for “[0-9]” (includes more numeric characters in “UTF-8 mode”)

Thus, the above expression can be simplified to

\d{5}(-?\d{4})?

However, the specification above says that “00000” and “99999” are _not_
valid U.S. ZIP codes, so to be exact you cannot just use either “[0-9]{5}”
or “\d{5}”; but you would have to use, for example, a zero-width negative
lookahead:

$possibleZips = array('00000', '00001', '99998', '99999');
foreach ($possibleZips as $possibleZip)
{
preg_match('^(?![09]{5})\\d{5}(?:-?\\d{4})?$', $possibleZip, $matches);
var_dump($possibleZip);
var_dump($matches);
}

(thanks to Anubhava: <http://stackoverflow.com/a/9609624/855543>)

> Canadian zip code (all one line, don't miss the space!):
> ([A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}[0-9]{1})[A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}
^ ^^^^^^^^
> {1}([0-9]{1}[A-C,E,G-H,J-N,P,R-T,V,X,Y]{1}[0-9]{1})

“{1}” is superfluous in all regular expression flavours (in BRE the escaped
variant is superfluous). An expression that matches, matches exactly one
time unless a following quantifier says otherwise.

In a character class expression, ranges are _not_ delimited by comma.
A comma there is a *literal* comma instead (just like most other special
characters lose, and “-” gains meaning), and repetitions are ignored:

[A-C,E,G-H,J-N,P,R-T,V,X,Y]

matches the same strings as

[A-CEG-HJ-NPR-TVXY,]

So unless you want to allow commas in ZIP codes, you need to remove them
from the respective character class.
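
The point about commas can be checked directly. This is only an illustrative sketch; the variable names are made up for the demonstration, and the anchors and single-character test are added so that each class is tested in isolation.

```php
<?php
// Character class as originally posted (with commas) and with the commas removed.
$withCommas    = '/^[A-C,E,G-H,J-N,P,R-T,V,X,Y]$/';
$withoutCommas = '/^[A-CEG-HJ-NPR-TVXY]$/';

var_dump(preg_match($withCommas, ','));    // int(1): a literal comma is accepted
var_dump(preg_match($withoutCommas, ','));  // int(0)
var_dump(preg_match($withoutCommas, 'K'));  // int(1)
var_dump(preg_match($withoutCommas, 'D'));  // int(0)
```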

Thus, the above expression would have to be changed, and can be simplified
to

^(?:[A-CEG-HJ-NPR-TVXY]\d){3}$

(The “^” makes sure that the second, fourth, and so on character must be a
digit. Let \s* follow it if you want to allow leading whitespace. Likewise
for “$” and trailing whitespace.)

Anyhow, if an expression is repeated, and this repetition cannot be handled
with a quantifier like above, in programming languages like PHP that allow
this, code is easier to read if you assign the repeated expression to a
variable, and have the variable reference expanded:

$cdn_letter = '[A-CEG-HJ-NPR-TVXY]';
$pattern = "^{$cdn_letter}\\d{$cdn_letter}\\d{$cdn_letter}\\d\$";

[In certain programming languages, libraries like my JSX:regexp.js [1] are
useful that allow you to define and use your own character class escape
sequences, eliminating the need for variable expansion: "\\p{cdnLetter}".]

Note that expansion/repetition is semantically different from expression
backreferences:

$pattern2 = "([A-CEG-HJ-NPR-TVXY])\\d\\1\\d\\1\\d";

$pattern would match "A1B2C3"; $pattern2 would match "A1A2A3", but not
"A1B2C3".

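The difference between expansion and backreference can be tried out as follows. This is a sketch under two assumptions that go beyond the fragments above: the "/" delimiters that preg_match() requires are added, and $pattern2 is anchored with "^" and "$" as well so that both patterns are compared on equal terms.

```php
<?php
$cdn_letter = '[A-CEG-HJ-NPR-TVXY]';

// Variable expansion: the same *sub-pattern* is repeated three times.
$pattern  = "/^{$cdn_letter}\\d{$cdn_letter}\\d{$cdn_letter}\\d\$/";

// Backreference: \1 must match the same *text* that group 1 captured.
$pattern2 = "/^([A-CEG-HJ-NPR-TVXY])\\d\\1\\d\\1\\d\$/";

var_dump(preg_match($pattern,  'A1B2C3')); // int(1)
var_dump(preg_match($pattern2, 'A1B2C3')); // int(0)
var_dump(preg_match($pattern2, 'A1A2A3')); // int(1)
```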

PointedEars
___________
[1] <http://PointedEars.de/scripts/test/regexp> p.
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm> (404-comp.)

Thomas 'PointedEars' Lahn

Aug 21, 2013, 8:58:31 AM
Thomas 'PointedEars' Lahn wrote:

> Norman Peelman wrote:
>> On 08/20/2013 01:27 PM, Twayne wrote:
>>> I'm attempting to check for US and Canadian zip codes (postal codes).
>>> The US is easy; mostly just be sure it's five numerics and except
>>> "00000" and "99999". […]
>>
>> US Zip code:
>> [0-9]{5}(-{0,1}[0-9]{4}){0,1}
> ^^^^^ ^^^^^^^^^^ ^^^^^
> In Perl-Compatible Regular Expressions (PCRE), as also used by PHP's
> preg_* functions, the following shorthands are available:
>
> - “*” for “{0,}”
> - “?” for “{0,1}”
> - “+” for “{1,}”
> - “\d” for “[0-9]” (includes more numeric characters in “UTF-8 mode”)
>
> Thus, the above expression can be simplified to
>
> \d{5}(-?\d{4})?
>
> However, the specification above says that “00000” and “99999” are _not_
> valid U.S. ZIP codes, so to be exact you cannot just use either “[0-9]{5}”
> or “\d{5}”; but you would have to use, for example, a zero-width negative
> lookahead:
>
> $possibleZips = array('00000', '00001', '99998', '99999');
> foreach ($possibleZips as $possibleZip)
> {
> preg_match('^(?![09]{5})\\d{5}(?:-?\\d{4})?$', $possibleZip,

Replace this line with

preg_match('/^(?![09]{5})\\d{5}(?:-?\\d{4})?$/', $possibleZip,

(add the delimiter).

> $matches);
> var_dump($possibleZip);
> var_dump($matches);
> }
>
> (thanks to Anubhava: <http://stackoverflow.com/a/9609624/855543>)

--
PointedEars

Thomas 'PointedEars' Lahn

Aug 21, 2013, 9:06:52 AM
Thomas 'PointedEars' Lahn wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Norman Peelman wrote:
>>> On 08/20/2013 01:27 PM, Twayne wrote:
>>>> I'm attempting to check for US and Canadian zip codes (postal codes).
>>>> The US is easy; mostly just be sure it's five numerics and except
>>>> "00000" and "99999". […]
>>>
>>> US Zip code:
>>> [0-9]{5}(-{0,1}[0-9]{4}){0,1}
>> ^^^^^ ^^^^^^^^^^ ^^^^^
>> […]
>> However, the specification above says that “00000” and “99999” are _not_
>> valid U.S. ZIP codes, so to be exact you cannot just use either
>> “[0-9]{5}” or “\d{5}”; but you would have to use, for example, a
>> zero-width negative lookahead:
>>
>> $possibleZips = array('00000', '00001', '99998', '99999');

$possibleZips = array('00000', '00001', '99998', '90909', '99999');

>> foreach ($possibleZips as $possibleZip)
>> {
>> preg_match('^(?![09]{5})\\d{5}(?:-?\\d{4})?$', $possibleZip,
>
> Replace this line with
>
> preg_match('/^(?![09]{5})\\d{5}(?:-?\\d{4})?$/', $possibleZip,
>
> (add the delimiter).

Which is still not correct, because '/[09]{5}/' matches "90909", which is
thus designated not valid. ISTM that alternation is required here:

preg_match('/^(?!0{5}|9{5})\\d{5}(?:-?\\d{4})?$/', $possibleZip);
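
For reference, a small self-contained run of that alternation-based pattern over the extended test list. The variable $zipPattern and the extra sample '12345-6789' are additions for illustration only.

```php
<?php
// Reject exactly "00000" and "99999" as the 5-digit part; allow an optional +4.
$zipPattern = '/^(?!0{5}|9{5})\d{5}(?:-?\d{4})?$/';

$possibleZips = array('00000', '00001', '99998', '90909', '99999', '12345-6789');
foreach ($possibleZips as $possibleZip) {
    $ok = preg_match($zipPattern, $possibleZip) === 1;
    echo $possibleZip, ' => ', ($ok ? 'accepted' : 'rejected'), "\n";
}
```

Only '00000' and '99999' are reported as rejected; '90909' now passes, which the character-class version got wrong.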


PointedEars
--
var bugRiddenCrashPronePieceOfJunk = (
navigator.userAgent.indexOf('MSIE 5') != -1
&& navigator.userAgent.indexOf('Mac') != -1
) // Plone, register_function.js:16

Twayne

Aug 21, 2013, 6:53:26 PM
lol! I'll be happy to check it out! :) I never did get my own to work,
and that's a bit different from what I used.

Twayne`

Twayne

Aug 21, 2013, 7:10:24 PM
On 2013-08-21 8:53 AM, Thomas 'PointedEars' Lahn wrote:
> Norman Peelman wrote:
>
>> On 08/20/2013 01:27 PM, Twayne wrote:
>>> I'm attempting to check for US and Canadian zip codes (postal codes).
>>> The US is easy; mostly just be sure it's five numerics and except
>>> "00000" and "99999". But Canadian is a different story because:

...
Woof! A veritable cornucopia of information which I've already dedicated
to a file on my hard drive! I wasn't aware of most of that and it's
going to be really handy soon's I understand it all, for now and the
future.

One slight correction: the Canadian valid letters are:
abc e gh jklmn p rst vxy .

Not sure where it went astray; if you need clarification visit the
Canadian Postal Code reference; don't have the URL itself. Besides, it's
always best to verify ANY information from any source on the 'net.

If you happen to know the Canadian system at all, the fuller breakdown is:
$aRegion = array(
'nl' => 'a',
'ns' => 'b',
'pe' => 'c',
'nb' => 'e',
'qc' => array('g', 'h', 'j'),
'on' => array('k', 'l', 'm', 'n', 'p'),
'mb' => 'r',
'sk' => 's',
'ab' => 't',
'bc' => 'v',
'nt' => 'x',
'nu' => 'x',
'yt' => 'y'
);

Also verifiable at the Canadian Postal website, including a map.

Thanks much!

Twayne`

BootNic

Aug 21, 2013, 10:09:25 PM
In article <kv3hd4$p9g$1...@speranza.aioe.org>, Twayne <nob...@spamcop.net>
wrote:

[snip]

> One slight correction: the Canadian valid letters are:
> abc e gh jklmn p rst vxy .

The second and third letters may also contain [WZ]

[url] https://maps.google.com/maps?q=B0W+1H0 [/url]

[url] https://maps.google.com/maps?q=A1W+4Z1 [/url]

The pattern you posted in: Message-ID: <kv11lq$8eb$1...@speranza.aioe.org>

[url]
https://groups.google.com/d/msg/comp.lang.php/D-OKMe-iZa4/ARnTsmDHSdIJ
[/url]

"^([ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ])" .
"\ {0,1}(\d[ABCEGHJKLMNPRSTVWXYZ]\d)$"

seems to work as it is.


^(?=.{3} ?.{3})(?!.{0,}[DFIOQU]|[WZ])(?:[A-Z] ?\d){3}$

• (?=.{3} ?.{3}) Positive lookahead

◦ string matches [3 characters] [optional space] [3 characters]

▪ characters are restricted in the rest of the expression

▪ remove the question mark to make the space required

• (?!.{0,}[DFIOQU]|[WZ]) Negative lookahead

◦ .{0,}[DFIOQU] string may not contain ‘DFIOQU’

◦ [WZ] string may not start with ‘W’ or ‘Z’

• (?:[A-Z] ?\d){3} Basic pattern repeat 3 times

◦ [any letter A-Z] [optional space] [any digit 0-9]

▪ spaces are restricted (index 3 or none) in the Positive lookahead

▪ letters are restricted in the Negative lookahead
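
A quick way to exercise the pattern, sketched in PHP. One caveat that the run below shows: because the positive lookahead has no end anchor, a space in another position (for example "A 1B2C3") still gets through. The $anchored variant, which adds "$" inside the lookahead, is an assumed adjustment and not part of the pattern as posted.

```php
<?php
// Pattern as posted above.
$asPosted = '/^(?=.{3} ?.{3})(?!.{0,}[DFIOQU]|[WZ])(?:[A-Z] ?\d){3}$/';
// Assumed variant: "$" inside the positive lookahead pins the space to index 3.
$anchored = '/^(?=.{3} ?.{3}$)(?!.{0,}[DFIOQU]|[WZ])(?:[A-Z] ?\d){3}$/';

$samples = array('B0W 1H0', 'A1W 4Z1', 'A1A1A1', 'W1A 1A1', 'A1D 1A1', 'A 1B2C3');
foreach ($samples as $code) {
    printf("%-8s posted=%d anchored=%d\n",
        $code, preg_match($asPosted, $code), preg_match($anchored, $code));
}
```

Both patterns agree on the first five samples; only the misplaced-space sample differs.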

[snip]





--
BootNic Wed Aug 21, 2013 10:09 pm
The human mind treats a new idea the same way the body treats a strange
protein; it rejects it.
*P. B. Medawar*

Norman Peelman

Aug 21, 2013, 10:46:01 PM
I'd say it's a bit different... it doesn't match the rules as given.

Twayne

Aug 22, 2013, 6:22:18 PM
<QUOTE>
[snip]

> One slight correction: the Canadian valid letters are:
> abc e gh jklmn p rst vxy .

The second and third letters may also contain [WZ]

[url] https://maps.google.com/maps?q=B0W+1H0 [/url]

[url] https://maps.google.com/maps?q=A1W+4Z1 [/url]

The pattern you posted in: Message-ID: <kv11lq$8eb$1...@speranza.aioe.org>

[url]
https://groups.google.com/d/msg/comp.lang.php/D-OKMe-iZa4/ARnTsmDHSdIJ
[/url]

"^([ABCEGHJKLMNPRSTVXY]\d[ABCEGHJKLMNPRSTVWXYZ])" .
"\ {0,1}(\d[ABCEGHJKLMNPRSTVWXYZ]\d)$"

seem to work as it is.
</QUOTE>

Thanks for that, but this is a perfect example of an API use instead of
other methods of checking postal code validity. I haven't mentioned a
few things because I didn't want to become involved in a long session of
e-mail about their postal codes.
There is a lot of confusion in all these areas in Canada, not the
least of them being the lawsuit over the "list" copyright being
violated so much. I pretty much at this time consider Canada Post to be
the "bible" for information, as loose and lacking as their information
is. It's pretty clear why the general consensus is to use A-Za-z for the
alpha part of the codes.
For instance, in a few pages on their site, it says "New Postal
Codes are added every month." but nowhere is there any information on
what's been added or where either.
And, a lot of postal zones have created their OWN postal codes,
without the blessing of Canada Post, and they work just fine because
that particular area is known to that office and thus delivers to it.
They're even trying to trademark "Canada Post", according to more than a
couple of sources, meaning that those carrying their "lists" couldn't say
that's where the codes originated.
Lots of miscellaneous wrong-headedness is going on too, but I have to
draw the line and be realistic.
As far as I can find, Canada Post doesn't even indicate the need for the
space in the postal code. Looking at the images on their website sure
isn't definitive. And at one time it started to be a dash, then
reverted to a space again, and so on.

This is probably the last I'll have to say on this subject. Suffice it
to say I settled on all letters and all digits with a space in the middle.

Regards,

Twayne`

Curtis Dyer

Aug 24, 2013, 6:22:52 AM
Thomas 'PointedEars' Lahn <Point...@web.de> wrote:

> Thomas 'PointedEars' Lahn wrote:
>
>> Thomas 'PointedEars' Lahn wrote:
>>> Norman Peelman wrote:
>>>> On 08/20/2013 01:27 PM, Twayne wrote:
>>>>> I'm attempting to check for US and Canadian zip codes
>>>>> (postal codes). The US is easy; mostly just be sure it's
>>>>> five numerics and except "00000" and "99999". […]

<snip>

>> Replace this line with
>>
>> preg_match('/^(?![09]{5})\\d{5}(?:-?\\d{4})?$/',
>> $possibleZip,
>>
>> (add the delimiter).
>
> Which is still not correct, because '/[09]{5}/' matches "90909",
> which is thus designated not valid. ISTM that alternation is
> required here:
>
> preg_match('/^(?!0{5}|9{5})\\d{5}(?:-?\\d{4})?$/',
> $possibleZip);

Perhaps you might also keep invalid postal codes in an array to look
up before utilizing the regular expression.

<snip>

--
Curtis Dyer
<?$x='<?$x=%c%s%c;printf($x,39,$x,39);?>';printf($x,39,$x,39);?>

Thomas 'PointedEars' Lahn

Aug 24, 2013, 6:52:10 AM
Curtis Dyer wrote:

> Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
>> Thomas 'PointedEars' Lahn wrote:
>>> Thomas 'PointedEars' Lahn wrote:
>>>> Norman Peelman wrote:
>>>>> On 08/20/2013 01:27 PM, Twayne wrote:
>>>>>> I'm attempting to check for US and Canadian zip codes
>>>>>> (postal codes). The US is easy; mostly just be sure it's
>>>>>> five numerics and except "00000" and "99999". […]
>
> <snip>
>
>>> Replace this line with
>>>
>>> preg_match('/^(?![09]{5})\\d{5}(?:-?\\d{4})?$/',
>>> $possibleZip,
>>>
>>> (add the delimiter).
>>
>> Which is still not correct, because '/[09]{5}/' matches "90909",
>> which is thus designated not valid. ISTM that alternation is
>> required here:
>>
>> preg_match('/^(?!0{5}|9{5})\\d{5}(?:-?\\d{4})?$/',
>> $possibleZip);
>
> Perhaps you might also keep invalid postal codes in an array to look
> up before utilizing the regular expression.

I did that exactly where you snipped.


PointedEars
--
Anyone who slaps a 'this page is best viewed with Browser X' label on
a Web page appears to be yearning for the bad old days, before the Web,
when you had very little chance of reading a document written on another
computer, another word processor, or another network. -- Tim Berners-Lee

Doug Miller

Aug 25, 2013, 10:31:34 AM
Curtis Dyer <dye...@gmail.com> wrote in news:kva1hr$rc0$1...@dont-email.me:

> Perhaps you might also keep invalid postal codes in an array to look
> up before utilizing the regular expression.

What happens when new postal codes are added?

Perhaps one might simply send a request to the USPS address validation API to find out if it's
a valid zip code or not. Use the "City-State Lookup" tool described here:

https://www.usps.com/business/web-tools-apis/address-information.htm

I don't know if Canada Post has similar tools.

Twayne

Aug 25, 2013, 1:17:09 PM
On 2013-08-24 6:22 AM, Curtis Dyer wrote:
> Thomas 'PointedEars' Lahn <Point...@web.de> wrote:
>
>> Thomas 'PointedEars' Lahn wrote:
>>
>>> Thomas 'PointedEars' Lahn wrote:
>>>> Norman Peelman wrote:
>>>>> On 08/20/2013 01:27 PM, Twayne wrote:
>>>>>> I'm attempting to check for US and Canadian zip codes
>>>>>> (postal codes). The US is easy; mostly just be sure it's
>>>>>> five numerics and except "00000" and "99999". […]
>
> <snip>
>
>>> Replace this line with
>>>
>>> preg_match('/^(?![09]{5})\\d{5}(?:-?\\d{4})?$/',
>>> $possibleZip,
>>>
>>> (add the delimiter).
>>
>> Which is still not correct, because '/[09]{5}/' matches "90909",
>> which is thus designated not valid. ISTM that alternation is
>> required here:
>>
>> preg_match('/^(?!0{5}|9{5})\\d{5}(?:-?\\d{4})?$/',
>> $possibleZip);
>
> Perhaps you might also keep invalid postal codes in an array to look
> up before utilizing the regular expression.
>
> <snip>
>
That's a great idea!

Thanks!

Twayne

Aug 25, 2013, 1:21:38 PM
Someone said there was a CDN API but I haven't gotten to it yet, though
I have seen it. Haven't tried to programmatically access it yet though;
want to finish up what I've been doing.

Thanks

Gordon Burditt

Aug 31, 2013, 2:17:55 AM
> I'm attempting to check for US and Canadian zip codes (postal codes).

For what purpose? The reason I ask is that the type of checking you
do may depend on the reason for the checking.

What do you want to do with the (US) ZIP code? (a) Mail them a
first-class letter, (b) Mail them a package, (c) Use it as verification
for the billing address on a credit card (which is about equivalent
to (a) but the credit card company does the actual mailing), (Note:
validating a credit card transaction against a payment processor
often costs real money, so it is often desirable to verify the Luhn
checksum on the credit card number, check the number of digits vs.
the bank prefix, and validate the address at least for a valid
country and state/province before sending it to the payment processor.)
(d) figure out how far it is for them to drive to one of your stores,
or (e) Use it to figure out the country since it's too hard to ask
for it? If you're checking because some standard requires you to,
name the standard (and preferably the section). If you're checking
because, well, er, um, I dunno, you're supposed to check user input,
aren't you? maybe you need to review what it is you're trying to
accomplish.
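
As an aside on the Luhn checksum mentioned in (c), a minimal sketch of such a pre-check might look like the following. The function name passesLuhn is hypothetical, and passing the checksum only means the digits are self-consistent, not that the card exists.

```php
<?php
// Hypothetical helper: returns true if $number passes the Luhn checksum.
function passesLuhn($number)
{
    $digits = preg_replace('/[\s-]/', '', $number); // drop spaces and hyphens
    if ($digits === '' || !ctype_digit($digits)) {
        return false;
    }

    $sum = 0;
    $double = false; // the rightmost digit is not doubled
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9; // same as adding the two digits of the product
            }
        }
        $sum += $d;
        $double = !$double;
    }

    return $sum % 10 === 0;
}

var_dump(passesLuhn('4111 1111 1111 1111')); // widely used test number
var_dump(passesLuhn('4111 1111 1111 1112'));
```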

If your requirement is to check because you want to check, you don't
care WHAT you check, you just wanna check! You WANNA! You WANNA!
You WANNA! then I have to wonder if the whole project should be
dropped.

For the USA, there are 5 types of 5-digit ZIP codes (and you
need a database that's kept up to date for determining this):

Unassigned. There are references to 42,000 codes being allocated
(as of 2011), so the majority of the 100,000 possible codes are
unassigned. 00000 and 99999 are permanently unassigned but only
the tip of a very large iceberg. (Usually, unassigned codes are
not assigned a type; they are just left out of the list entirely.)

Standard. These are geographic ZIP codes covering an area, what you'd
normally think of as a ZIP code area.

Post Office. These are ZIP codes with an area covering only part
of the inside of a USPS Post Office (PO boxes, caller services,
etc.) Typically a given PO Box has a unique 9-digit ZIP code. As
a test, I once mailed a letter to "Jeff Snerfelbot" (not anywhere
close to my name) with a 9-digit zip code of a PO box. It arrived.

Unique. Some 5-digit ZIP codes are allocated to a single entity
that generates or receives lots of mail. For example, Wal-Mart
Stores has 72716 and the CIA has 20505. It is rumored that Publisher's
Clearing House has at least 2 5-digit ZIP codes, one for "YES" and
one for "NO".

Military. These are used to route mail to US military forces,
including those stationed outside the country.

The Unassigned ZIP codes are clearly invalid.

The Standard and Unique ZIP codes are valid for most purposes.

If you are planning on using UPS or FedEx, the Post Office ZIP codes
are probably invalid, since they are filled with PO Boxes in USPS
Post Offices.

If you are planning on sending something bulky, the Military ZIP
codes may be invalid. They may have special rules on what you can
send.

A ZIP code may change. I was in 2 zip code splits near Houston,
Texas around Feb. 1976 where the apartment I moved out of and the
house I moved into both changed (5-digit) ZIP codes. 9-digit
codes hadn't been put to use yet.

Private mail box services generally appear in a Standard ZIP code
and UPS or FedEx can probably deliver there if the address looks
like a street address.

A valid address doesn't avoid the possibility that (a) it's an empty
lot, and has always been an empty lot, (b) it's unoccupied, or (c)
it burned down years ago. Someone might rebuild. The Post Office
might eventually invalidate addresses taken over for a freeway
interchange or put permanently underwater by soil erosion and/or a
hurricane.


You probably should accept a 9-digit ZIP code as well, even if you
ignore the extra 4 digits beyond checking that they are digits,
especially if part of the purpose is country recognition.

If your requirements are to (1) accept what is or may become a valid
code in the near future, and (2) not require periodic database
updates, then I suggest you figure out what the basic pattern is
and check that. Be liberal in what you accept, as you can't fully
predict the future.

The USA is easy. Rejecting 00000 and 99999 is simple, and you don't
worry about the other 57,998 or so codes that might get used in the
future.

Unless you can find a general specification for Canadian Postal
codes, *NOT* what's currently in use, you're probably better off
allowing A-Z everywhere a letter is currently allowed. Is there
anything around that says that an asterisk won't become part of a
valid Canadian Postal code? or telephone number? There are
supposedly some codes reserved for testing. Those you can exclude.
You might also exclude H0H 0H0, which is reserved for Santa Claus.
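A minimal sketch of that kind of liberal pattern check in PHP, not anyone's posted code. The function names `looks_like_us_zip` and `looks_like_ca_postal` are made up for illustration. The rules are only the ones described above: five digits with an optional four-digit extension, 00000 and 99999 rejected, A-Z allowed in every letter position of a Canadian code, an optional space in the middle, and H0H 0H0 excluded.

```php
<?php
// Hypothetical helpers: format checks only, no lookup against issued codes.

function looks_like_us_zip($zip)
{
    // Five digits, optionally followed by four more (hyphen optional).
    // The D modifier stops "$" from matching before a trailing newline.
    if (!preg_match('/^([0-9]{5})(?:-?[0-9]{4})?$/D', $zip, $m)) {
        return false;
    }
    // 00000 and 99999 are permanently unassigned.
    return $m[1] !== '00000' && $m[1] !== '99999';
}

function looks_like_ca_postal($code)
{
    // Accept lower case by upper-casing first; the middle space is optional.
    $c = strtoupper($code);
    if (!preg_match('/^[A-Z][0-9][A-Z] ?[0-9][A-Z][0-9]$/D', $c)) {
        return false;
    }
    // H0H 0H0 is reserved for Santa Claus.
    return str_replace(' ', '', $c) !== 'H0H0H0';
}
```

Both functions deliberately say nothing about whether a code has actually been issued; they only catch input that cannot be a code at all.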

On the other hand, if your requirements are to (1) eliminate as
many bad codes as possible, and (2) a database subscription to
issued codes is acceptable, and (3) an occasional rejection of brand
new codes is acceptable, then you want to find an API preferably
maintained by someone else to do the checking for you.

Fiver

Aug 31, 2013, 10:40:35 AM
On 2013-08-31 08:17, Gordon Burditt wrote:
>> I'm attempting to check for US and Canadian zip codes (postal codes).
>
> For what purpose? The reason I ask is that the type of checking you
> do may depend on the reason for the checking.
>
> What do you want to do with the (US) ZIP code?
[..snip..]
> If your requirement is to check because you want to check, you don't
> care WHAT you check, you just wanna check! You WANNA! You WANNA!
> You WANNA! then I have to wonder if the whole project should be
> dropped.

If he had no reason to collect the postal codes at all, he wouldn't have
put them in the form (I hope). As soon as he does collect and store them
somewhere (like a database), for whatever purpose, he needs to do some
basic formal/plausibility checking.

This would be true even if he didn't limit the input to US and Canadian
codes. There are some general assumptions you can make about postal
codes that are valid everywhere in the world. For example

- no valid code will start/end with white space
- codes will never be longer than 20 characters
- codes without any alphanumeric characters are never valid

This is just basic data hygiene. I would never store user input without
at least checking the known formal constraints of the field.
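Those three assumptions could be checked with something like the sketch below. The function name `plausible_postal_code` is invented for the example, and two simplifications are assumed: length is counted in bytes, and "alphanumeric" means ASCII letters and digits only.

```php
<?php
// Hypothetical helper: country-independent sanity check for a postal code field.
function plausible_postal_code($code)
{
    // No valid code starts or ends with white space.
    if ($code !== trim($code)) {
        return false;
    }
    // Codes are never longer than 20 characters (bytes, in this sketch).
    if (strlen($code) > 20) {
        return false;
    }
    // A code without any letter or digit is never valid (this also rejects '').
    return preg_match('/[A-Za-z0-9]/', $code) === 1;
}
```

Whether to reject input with surrounding white space or simply trim it before storing is a design choice; the sketch rejects it to stay close to the list above.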

The more he knows about the codes, the more he can check. This is where
the formal validation for the US/Canadian codes comes in. The general
format is known. If he has this information, he should use it, if only
to prevent some categories of typos.

Checking if a postal code is actually assigned to a real location at
this point in time, or if a certain delivery method is available to the
location - that's well outside the area of formal checks. If this level
of validation is required, he'll need an API from the delivery service
(or whoever is going to use the code). Then he needs to validate the
code at least twice: once on entry and once before sending out an actual
package.

> The USA is easy. Rejecting 00000 and 99999 is simple, and you don't
> worry about the other 57,998 or so codes that might get used in the
> future.
>
> Unless you can find a general specification for Canadian Postal
> codes, *NOT* what's currently in use, you're probably better off
> allowing A-Z everywhere a letter is currently allowed.

Good advice.

> You might also exclude H0H 0H0, which is reserved for Santa Claus.

Just wondering... why does Santa Claus have a Canadian postal code?
The geographic north pole is outside Canadian territory. If his house was
as close to the north pole as he can get while still remaining on firm
land, he should have a Danish postal code.

regards,
5er

Norman Peelman

Aug 31, 2013, 11:56:32 AM
On 08/31/2013 10:40 AM, Fiver wrote:
> On 2013-08-31 08:17, Gordon Burditt wrote:
>>> I'm attempting to check for US and Canadian zip codes (postal codes).
>>
>
>> You might also exclude H0H 0H0, which is reserved for Santa Claus.
>
> Just wondering... why does Santa Claus have a Canadian postal code?
> The geographic north pole outside Canadian territory. If his house was
> as close to the north pole as he can get while still remaining on firm
> land, he should have a Danish postal code.
>
> regards,
> 5er
>

H0H 0H0 is his Canadian mailing address... there are others.

Martin Leese

Aug 31, 2013, 11:56:33 AM
Fiver wrote:

> On 2013-08-31 08:17, Gordon Burditt wrote:
...
>> You might also exclude H0H 0H0, which is reserved for Santa Claus.
>
> Just wondering... why does Santa Claus have a Canadian postal code?

So that Canadian children can write to him.

--
Regards,
Martin Leese
E-mail: ple...@see.Web.for.e-mail.INVALID
Web: http://members.tripod.com/martin_leese/

Welsh Vanner

Aug 31, 2013, 3:23:18 PM
On Sat, 31 Aug 2013 11:56:32 -0400, Norman Peelman wrote:
>
> H0H 0H0 is his Canadian mailing address... there are others.

According to the UK Royal Mail his address is
Santa/Father Christmas,
Santa’s Grotto,
Reindeerland,
SAN TA1

:-)

Twayne

Aug 31, 2013, 8:07:56 PM
On 2013-08-31 2:17 AM, Gordon Burditt wrote:
>> I'm attempting to check for US and Canadian zip codes (postal codes).
>
> For what purpose? The reason I ask is that the type of checking you
> do may depend on the reason for the checking.

While I certainly appreciate all the data you posted, I'm aware of most
of it, not all, and my eventual course, which I'm getting around to now,
is to use their API to determine valid codes (or not).

My "real" reason? Well, I've accomplished what I've set out to do for
now and the next step is using APIs where they exist; supply a
zip/postal code and get a response for whether it's valid or not.

>
...

> maybe you need to review what it is you're trying to
> accomplish.
>

My goal is to learn, and what I've learned from this thread is a great
deal about using PHP to handle these matters for postal codes and many
other similar formats that have nothing to do with postal codes. I
believe I have picked up a good deal of the information/experience I
need now, and am better off for it.

...

>
> The Standard and Unique ZIP codes are valid for most purposes.

Agreed.
>
...
>
> A ZIP code may change. I was in 2 zip code splits near Houston,
> Texas around Feb. 1976 where the apartment I moved out of and the
> house I moved into both changed (5-digit) ZIP codes. 9-digit
> codes hadn't been put to use yet.

Which is the draw to using their automated lookups to determine the
validity or not; it's my current bent.

>

...
>
> On the other hand, if your requirements are to (1) eliminate as
> many bad codes as possible, and (2) a database subscription to
> issued codes is acceptable, and (3) an occasional rejection of brand
> new codes is acceptable, then you want to find an API preferably
> maintained by someone else to do the checking for you.

They're freely available for the US and Canada; I've used them manually
and they accomplish my goals for me. I'm very hungry for information
from the learning PHP POV and this has been an excellent thread to that
end. One of the better things is having learned ctype_ ... something
I've never used successfully before and now find it an easy thing to
handle.

Again, thanks for all that information; I appreciate a post like this
and don't mind saying so.

Regards,

Twayne`
>

Twayne

Aug 31, 2013, 8:21:50 PM
On 2013-08-31 11:56 AM, Norman Peelman wrote:
> On 08/31/2013 10:40 AM, Fiver wrote:
>> On 2013-08-31 08:17, Gordon Burditt wrote:
>>>> I'm attempting to check for US and Canadian zip codes (postal codes).
>>>
>>
>>> You might also exclude H0H 0H0, which is reserved for Santa Claus.
>>
>> Just wondering... why does Santa Claus have a Canadian postal code?
>> The geographic north pole outside Canadian territory. If his house was
>> as close to the north pole as he can get while still remaining on firm
>> land, he should have a Danish postal code.
>>
>> regards,
>> 5er
>>
>
> H0H 0H0 is his Canadian mailing address... there are others.
>

True; every major city seems to have an address and a lot more just make
up the zip codes so they seem logical to kids, and advertised locally at
that. The post office actually just looks at who it's addressed to; if
it's Santa, Kringle et al, it goes to the various bags they set aside
for them. Here you can even get letters to answer yourself if there's an
address to go along with it. They keep lists of who's 'naughty and nice'
and it's legally the registered person's responsibility to not be stupid
with their letters. Here there's even a form letter template that has to
be used. There are even charity "bags" where, once signed up, you can
anonymously send gifts to kids in need, and a lot more. It works neatly
here since we're a small rural community; I don't know how other, larger
cities work it. AFAIK there has never been a miscreant in the process;
here, at least. Oh, and they have to be sent in special envelopes, too,
that are donated for the purpose. I've done print runs for them several
times.

It reaffirms the good in people; as long as it's a full registration and
responsibility oriented.

Cheers, won't be long & it'll be here!

Twayne`

Moon Elf

Dec 16, 2013, 5:02:23 AM
A faster algorithm would be to use regexes which use .+ .* with a unique
fingerprint. The above code is grinding your system probably.

I am sure tutorials such as Mastering Regular Expressions 2nd ed. would help
out.
Usually you do not test on vars unless you don't know at runtime what they
are which is not good coding practice.

>===================================================
> AND
>
>===========================================================
>
> Sounds like a regexp pattern like:
>
> Code:
>
> /^(?:[A-CEGHJ-NPR-TVX][0-9]){3}$/
>
> could be used to validate the form of a given Canadian postal code
> based on the description you gave. (Whether or not the postal code is
> truly valid/used is, of course, another matter altogether.)
>
>
> All that being said, I see that Canada Post has an API (and I'm
> fairly sure the USPS does, too) ... you might actually check validity
> with the code issuing authority at the time of submission....
> ------------------------
>
>
>
> I'm most interested in using the API's that are available though as when
> I figure out how to access them programatically I'll do so. Most of the
> places I want to validate postal codes turn out to have an online API,
> seriously relieving me of a lot of code and nullifying the possibility
> of future changes, although there apparently have been few in the last
> decade or so.
> I'm a neophyte with little experience in these matters yet.
>
> Cheers,
>
> Twayne`

PHP is procedural if you want, which I like but you need to be clever in what
you write, procedurally or not.

ME

--
Member of the DR rogue circle.
Search and you will find.

Doug Miller

Dec 16, 2013, 7:27:51 AM
Moon Elf <moo...@moonelfsystem.net> wrote in news:slrnlatjrr....@sigtrans.org:

> On 2013-08-21, Twayne <nob...@spamcop.net> wrote:
[...]
>><?php
>> $country_code="US";
>> $zip_postal="11111";
>>
>> $ZIPREG=array(
>> "US"=>"^\d{5}([\-]?\d{4})?$",
[...]
>
> A faster algorithm would be to use regexes which use .+ .* with a unique
> fingerprint. The above code is grinding your system probably.
>
> I am sure tutorials such as Mastering Regular Expressions 2nd ed. would help
> out.

Never mind that -- the bigger problem is that it's just plain wrong. Comparing to a RegEx can
determine only if a postal code has the correct *format*, not if it is actually a valid code.

According to this,
>
>> if ($ZIPREG[$country_code]) {
>>
>> if (!preg_match("/".$ZIPREG[$country_code]."/i",$zip_postal)){
>> //Validation failed, provided zip/postal code is not valid.
>> } else {
>> //Validation passed, provided zip/postal code is valid.
>> }

00000-0000 is a "valid" US postal code. It's not. It's correctly *formatted*, but its contents do
not correspond to a valid ZIP+4.

99999 also fails; it matches the RegEx, but is not a valid ZIP code. Same problem with
11111, 22222, 33333, and 54321 -- and thousands of others. Out of the 100,000 possible
5-digit zip codes, fewer than 42,000 are actually in use, but this algorithm will say that all
100,000 of them are valid.

And since only 42K out of 100K 5-digit zip codes are actually in use, *at most* 420 million of
the one billion possible ZIP+4 codes can be valid, making *at least* 580 million *more*
invalid ZIP+4 codes that this algorithm will incorrectly declare to be valid.
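To make the format-versus-validity distinction concrete, here is a rough sketch. The pattern is the US entry quoted above; everything else is assumed for illustration: `$assigned` stands in for a maintained database of issued 5-digit codes (its two entries are just the examples mentioned earlier in the thread), and `is_assigned_zip` is a made-up helper name.

```php
<?php
// The US pattern quoted above, wrapped in delimiters the way the quoted code does.
$pattern = '/' . '^\d{5}([\-]?\d{4})?$' . '/i';

// Format check only: all three of these match, although none was named as valid.
$format_ok = array();
foreach (array('00000-0000', '99999', '11111') as $zip) {
    $format_ok[$zip] = (preg_match($pattern, $zip) === 1);
}

// Hypothetical stand-in for an up-to-date list of assigned 5-digit codes.
$assigned = array('72716' => true, '20505' => true);

function is_assigned_zip($zip, array $assigned)
{
    // Only the 5-digit part is looked up; the +4 part would need its own data.
    return isset($assigned[substr($zip, 0, 5)]);
}

// Upper bound from the figures above: 42,000 five-digit codes times
// 10,000 possible +4 suffixes each is 420,000,000 ZIP+4 codes at most.
$max_valid_plus4 = 42000 * 10000;
```

A real check would run the pattern first and then the lookup, and the lookup data would have to come from the issuing authority rather than a hard-coded array.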

It's pretty likely that similar problems exist for the other 11 nations as well.