Contact to User Correlation


Robert DiFalco

Nov 21, 2013, 10:22:55 AM
to libphonenum...@googlegroups.com
libphonenumber has allowed me to have my mobile users enter valid E.164 phone numbers upon registration and, furthermore, to store all of their well-formed contacts as E.164, given the carrier country code. This allows me to correlate the contacts of one user with other users in my system, no matter where in the world they are.

I am currently storing the national number as a long (in an indexed field) along with the E.164 string.

To query, I join User.phone_natl to Contact.phone_natl on the long field. Then, for the rows that match, I do a quick check that the E.164 strings are equal.
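A minimal sketch of that two-step match, using Python's built-in sqlite3 module. Only the phone_natl column name comes from the post; the table layout, the phone_e164, id and owner_id columns, and the sample numbers are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, phone_natl INTEGER, phone_e164 TEXT);
    CREATE TABLE contacts (owner_id INTEGER, phone_natl INTEGER, phone_e164 TEXT);
    CREATE INDEX users_natl    ON users(phone_natl);
    CREATE INDEX contacts_natl ON contacts(phone_natl);
""")

# Two users whose national numbers collide across country codes (made-up data).
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, 2079460000, "+442079460000"),
    (2, 2079460000, "+12079460000"),
])
# User 1's address book contains user 2's number.
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", [
    (1, 2079460000, "+12079460000"),
])

rows = conn.execute("""
    SELECT c.owner_id, u.id
    FROM contacts AS c
    JOIN users AS u
      ON u.phone_natl = c.phone_natl   -- cheap indexed integer comparison
     AND u.phone_e164 = c.phone_e164   -- string check discards cross-country collisions
""").fetchall()

print(rows)  # [(1, 2)]
```

Without the second condition the contact would also match user 1, whose national number is identical but whose country code differs.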

I was wondering if anyone has ever calculated the probability that national numbers will overlap across the world. I know mathematically that they will, but I was curious about the probability.

If the probability is too high, is there a quicker way to join these two tables? Maybe taking the leftmost 5 digits of an E.164-formatted phone number, turning that into a NUMBER, indexing that, and joining on it together with the national number. Or is the way I'm currently doing it the best approach?

Also, FWIW, iOS provides the carrier country code in lower case. It seems that you require it to be upper case. It's a simple matter for me to upper-case it before using your library, but I just thought I'd let you know this issue exists.

Thanks for a great library!

R.

Lara Rennie

Nov 27, 2013, 11:33:28 AM
to libphonenum...@googlegroups.com

I'm pretty sure lots of national numbers overlap, but I have never calculated the probability. I used to have a cell number in Norway whose last 7 digits were the same as my Dad's in NZ, and this caused problems with some bad caller ID systems... But there are only, say, 200 country calling codes, so in the worst case you're comparing your number against 200 others... And they vary in length from 4 to, say, 16 digits, or whatever Germany's up to these days...

Note that E.164 doesn't include extensions, so multiple contacts from the same company with different extensions would be conflated.

Re region code: Yes, instead of upper-casing everywhere in our library, we decided to mandate upper-case region codes; this matches the standard.
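A small sketch of normalizing a region code on the caller's side before handing it to the library. The helper name normalize_region and the two-letter check are assumptions for illustration, not part of libphonenumber.

```python
def normalize_region(code: str) -> str:
    """Return an upper-case two-letter region code, e.g. 'us' -> 'US'."""
    normalized = code.strip().upper()
    # Assumption: only two-letter alphabetic region codes are expected here.
    if len(normalized) != 2 or not (normalized.isascii() and normalized.isalpha()):
        raise ValueError(f"unexpected region code: {code!r}")
    return normalized


print(normalize_region("us"))  # US
```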

--
You received this message because you are subscribed to the Google Groups "libphonenumber-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to libphonenumber-di...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Robert DiFalco

Nov 27, 2013, 11:46:59 AM
to libphonenum...@googlegroups.com
Thanks, Lara. This is a mobile app, so I only care about correlating users of the app with the contacts of other users of the app, if that makes sense. So I will only be correlating mobile numbers.

It just occurred to me (and maybe you can tell me if I'm right) that I don't even need to store the E.164 string: the two-digit country code and the national number should be sufficient, giving me as much uniqueness as the E.164 string while being more efficient to query.
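One alternative to a separate country code column, sketched here as an assumption rather than anything the library prescribes, is to store the digits of the E.164 string as a single integer. Country calling codes never start with 0, so the conversion is reversible, and 15 digits fit comfortably in a signed 64-bit column. The function names and the max_digits parameter are hypothetical.

```python
def e164_to_key(e164: str, max_digits: int = 15) -> int:
    """Pack an E.164 string such as '+4790012345' into one integer key."""
    if not e164.startswith("+"):
        raise ValueError(f"missing leading '+': {e164!r}")
    digits = e164[1:]
    ok = digits.isascii() and digits.isdigit() and digits[0] != "0"
    if not ok or len(digits) > max_digits:
        raise ValueError(f"not an E.164 number: {e164!r}")
    return int(digits)


def key_to_e164(key: int) -> str:
    """Inverse of e164_to_key."""
    return f"+{key}"


# A zero that follows the country code survives the round trip,
# because it is not the first digit of the packed value.
key = e164_to_key("+390612345678")
print(key)               # 390612345678
print(key_to_e164(key))  # +390612345678
```

A single integer column can then be indexed and joined on directly, with no follow-up string comparison.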



--
You received this message because you are subscribed to a topic in the Google Groups "libphonenumber-discuss" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/libphonenumber-discuss/3ChYlOA8o9Q/unsubscribe.
To unsubscribe from this group and all its topics, send an email to libphonenumber-di...@googlegroups.com.

David Cantrell

Nov 27, 2013, 11:53:40 AM
to libphonenum...@googlegroups.com
On Wed, Nov 27, 2013 at 05:33:28PM +0100, Lara Rennie wrote:
> I'm pretty sure lots of national numbers overlap, but I have never
> calculated the probability. I used to have a cell number in Norway where
> the last 7 digits were the same as my Dad's in NZ and this caused some bad
> caller ID systems problems... But there are only say 200 country calling
> codes so worst case you're comparing your number against 200 others... And
> they vary in length from 4 to say 16 digits or whatever Germany's up to
> these days....

FWIW the longest a number can be, including the country code, is 15
digits. Germany is +49, two digits, so the longest possible national
number (at least the longest that would be accessible from outside
Germany) is 13 digits. The longest actually in use is 12.
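
The arithmetic above, as a short sketch. The 15-digit limit is the one stated in the message; the function name is made up for illustration.

```python
E164_MAX_DIGITS = 15  # country code plus national number, per the message above


def max_national_digits(country_calling_code: int) -> int:
    """Digits left for the national number once the country code is counted."""
    return E164_MAX_DIGITS - len(str(country_calling_code))


print(max_national_digits(49))   # 13 (Germany, +49)
print(max_national_digits(1))    # 14 (NANP, +1)
print(max_national_digits(381))  # 12 (a three-digit code)
```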

> > I was wondering if anyone has ever calculated the probability that
> > national numbers will overlap in the world. I know mathematically that they
> > will but was curious about the probability.

Something like a third of UK subscriber numbers will overlap with
NANP numbers. London numbers, for example, coincide with numbers in
Connecticut, Maine and Idaho. The reverse doesn't necessarily hold true
though.

However, given that a third of numbers overlap with just one other
national numbering scheme, I assume that something close to *all* UK
numbers will overlap with something somewhere in the world. The same
logic applies to all other countries. You could actually make a
reasonable stab at figuring it out based solely on the data in
PhoneNumberMetadata.xml.
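
A rough sketch of how such an estimate might be computed once each numbering plan has been reduced to ranges of national numbers. The ranges below are invented placeholders, not values from PhoneNumberMetadata.xml, and the range-based model itself is an assumption.

```python
def overlap_fraction(plan_a, plan_b):
    """Fraction of the numbers in plan_a that also exist in plan_b.

    Each plan is a list of disjoint, inclusive (low, high) integer ranges
    of national numbers.
    """
    total = sum(high - low + 1 for low, high in plan_a)
    shared = 0
    for a_low, a_high in plan_a:
        for b_low, b_high in plan_b:
            low, high = max(a_low, b_low), min(a_high, b_high)
            if low <= high:  # the two ranges intersect
                shared += high - low + 1
    return shared / total


# Invented ranges, purely to exercise the function.
plan_a = [(2_000_000_000, 2_999_999_999)]
plan_b = [(2_010_000_000, 2_099_999_999), (5_000_000_000, 5_999_999_999)]
print(overlap_fraction(plan_a, plan_b))
```

As the message notes, the result is not symmetric: swapping the two plans changes the denominator.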

--
David Cantrell | Hero of the Information Age

I think the most difficult moment that anyone could face is seeing
their domestic servants, whether maid or drivers, run away
-- Abdul Rahman Al-Sheikh, writing on 25 Jan 2004 at
http://archive.arabnews.com/?article=38558

Lara Rennie

Nov 27, 2013, 12:03:06 PM
to libphonenum...@googlegroups.com


On 27 Nov 2013 17:53, "David Cantrell" <da...@cantrell.org.uk> wrote:
>
> FWIW the longest a number can be, including the country code, is 15
> digits. Germany is +49, two digits, so the longest possible national
> number (at least the longest that would be accessible from outside
> Germany) is 13 digits. The longest actually in use is 12.

Germany ignores the standard. We have successfully dialled longer numbers from outside the country... We currently accept up to 15 digits for the NSN (national significant number).


Lara Rennie

Nov 27, 2013, 12:03:28 PM
to libphonenum...@googlegroups.com

Yes, if all the numbers are mobile, then other fields such as leading zeros can be ignored, since those shouldn't be the only difference between two numbers.

David Cantrell

Nov 27, 2013, 12:04:39 PM
to libphonenum...@googlegroups.com
On Wed, Nov 27, 2013 at 08:46:59AM -0800, Robert DiFalco wrote:

> It just occurred to me, and maybe you can tell me if I'm right, I don't
> even need to store the E.164 string, just the two digit country code and
> the national number should be sufficient and give me as much uniqueness as
> the E.164 but more efficiently for querying.

Country codes are anything from 1 to 3 digits (or 4 if you count things
like 1876 as Jamaica's code instead of it being 1 and an area code).

Also don't forget that country codes can be shared by several countries
and countries can have more than one country code. +1 and +7 are the
most obvious, but also consider odd-balls like Kosovo, which uses chunks
of the numbering space from several different countries. It's using - at
least - bits of +381 (Serbia), +377 (Monaco) and +386 (Slovenia). No
doubt places like Transnistria and Abkhazia have similarly quirky
arrangements.

--
David Cantrell | London Perl Mongers Deputy Chief Heretic

Cum catapultae proscriptae erunt tum soli proscript catapultas habebunt

Lara Rennie

Nov 27, 2013, 12:07:20 PM
to libphonenum...@googlegroups.com


On 27 Nov 2013 18:04, "David Cantrell" <da...@cantrell.org.uk> wrote:
>
> On Wed, Nov 27, 2013 at 08:46:59AM -0800, Robert DiFalco wrote:
>
> > It just occurred to me, and maybe you can tell me if I'm right, I don't
> > even need to store the E.164 string, just the two digit country code and
> > the national number should be sufficient and give me as much uniqueness as
> > the E.164 but more efficiently for querying.
>
> Country codes are anything from 1 to 3 digits (or 4 if you count things
> like 1876 as Jamaica's code instead of it being 1 and an area code).

Yes, clearly 1 to 3 digits rather than 2. (Treating it as 4 digits would overlap with the national number, and is neither helpful nor necessary.)

>
> Also don't forget that country codes can be shared by several countries
> and countries can have more than one country code.  +1 and +7 are the
> most obvious, but also consider odd-balls like Kosovo, which uses chunks
> of the numbering space from several different countries. It's using - at
> least - bits of +381 (Serbia), +377 (Monaco) and +386 (Slovenia). No
> doubt places like Transnistria and Abkhazia have similarly quirky
> arrangements.

But that shouldn't affect his use case.


David Cantrell

Nov 27, 2013, 12:29:57 PM
to libphonenum...@googlegroups.com
On Wed, Nov 27, 2013 at 06:03:06PM +0100, Lara Rennie wrote:
> On 27 Nov 2013 17:53, "David Cantrell" <da...@cantrell.org.uk> wrote:
> > FWIW the longest a number can be, including the country code, is 15
> > digits. Germany is +49, two digits, so the longest possible national
> > number (at least the longest that would be accessible from outside
> > Germany) is 13 digits. The longest actually in use is 12.
> Germany ignores the standard. We have successfully dialled longer numbers
> from outside the country... We accept up to 15 for the nsn at the moment.

That's very naughty of them. Those numbers will have odd reachability
problems.

Or part of the number is an extension. Their longest area code is five
digits, so a fifteen digit number would have at least ten digits for the
local part, which seems improbable. If you dial it slowly by hand, do
you get any odd noises at any point?

It would be good if someone who speaks better German than I do could
check with the Bundesnetzagentur.

--
David Cantrell | Bourgeois reactionary pig

Good advice is always certain to be ignored,
but that's no reason not to give it -- Agatha Christie

Lara Rennie

Nov 27, 2013, 12:34:43 PM
to libphonenum...@googlegroups.com


On 27 Nov 2013 18:30, "David Cantrell" <da...@cantrell.org.uk> wrote:
>
> On Wed, Nov 27, 2013 at 06:03:06PM +0100, Lara Rennie wrote:

> > On 27 Nov 2013 17:53, "David Cantrell" <da...@cantrell.org.uk> wrote:
> > > FWIW the longest a number can be, including the country code, is 15
> > > digits. Germany is +49, two digits, so the longest possible national
> > > number (at least the longest that would be accessible from outside
> > > Germany) is 13 digits. The longest actually in use is 12.
> > Germany ignores the standard. We have successfully dialled longer numbers
> > from outside the country... We accept up to 15 for the nsn at the moment.
>
> That's very naughty of them. Those numbers will have odd reachability
> problems.
>
> Or part of the number is an extension. Their longest area code is five
> digits, so a fifteen digit number would have at least ten digits for the
> local part, which seems improbable. If you dial it slowly by hand, do
> you get any odd noises at any point?

Didn't notice. I'm sure they consider part of it to be the extension, but there is no gap before that part is dialled.


>
> It would be good if someone who speaks better German than I do could
> check with the Bundesnetzagentur.

We had a German speaker ask but they weren't very keen on answering specifically...


Robert DiFalco

Nov 27, 2013, 2:10:02 PM
to libphonenum...@googlegroups.com
One last question on this: is the integer value you store for the country in Phonenumber.PhoneNumber a standard mapping of a country code like "US" to an integer, or is it a libphonenumber-specific mapping?

Thanks!



Robert DiFalco

Nov 27, 2013, 2:26:15 PM
to libphonenum...@googlegroups.com
I see, it is a map of two-letter ISO country codes to E.164 country code integers. I would love to have a public method for that mapping, as otherwise I am going to have to recreate it in my code.

Since you have it, I would love to have a mapping from the code 1 to "US", and from "US" to 1, etc.

Joshua Gigg

Nov 27, 2013, 2:31:05 PM
to libphonenum...@googlegroups.com


On 27 November 2013 19:26, Robert DiFalco <robert....@gmail.com> wrote:
I see, it is a map of 2 letter ISO country codes to E.164 country code integers. Would love to have a public method to that mappings as I am going to have to recreate it in my code. 

Since you have it, I would love to have a mapping of the code 1 to "US" and from "US" to get 1, etc.  




--
Joshua Gigg

Robert DiFalco

Nov 27, 2013, 2:33:11 PM
to libphonenum...@googlegroups.com
Shit, I don't know how I missed those. Thank you and I apologize for asking a question I should have seen for myself. :)



Lara Rennie

Nov 27, 2013, 2:38:29 PM
to libphonenum...@googlegroups.com

It is standard, in that we use the country calling code you use to dial that country... We have methods if you want to know what it is for a given region too.
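
To illustrate the shape of that mapping, here is a hand-built sketch covering a handful of regions. It is not the library's data or API; as far as I know, the Java library's own lookups are PhoneNumberUtil.getCountryCodeForRegion and getRegionCodeForCountryCode, which should be preferred over a hand-maintained table. The names below are hypothetical.

```python
from collections import defaultdict

# A few region -> country calling code pairs, typed in by hand for illustration.
CALLING_CODE_BY_REGION = {
    "US": 1, "CA": 1,
    "RU": 7, "KZ": 7,
    "GB": 44, "NO": 47, "DE": 49, "NZ": 64,
}


def regions_by_calling_code(table):
    """Invert the table; one calling code may serve several regions."""
    reverse = defaultdict(list)
    for region, code in sorted(table.items()):
        reverse[code].append(region)
    return dict(reverse)


REGIONS_BY_CALLING_CODE = regions_by_calling_code(CALLING_CODE_BY_REGION)

print(CALLING_CODE_BY_REGION["US"])  # 1
print(REGIONS_BY_CALLING_CODE[1])    # ['CA', 'US']
```

The reverse direction is a list because shared codes such as +1 and +7 map to more than one region, so going from 1 back to "US" is a choice rather than a lookup.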
