International number validation


Matt Shoemaker

Feb 28, 2017, 4:39:21 AM
to OpenRefine
I am new to OpenRefine and have found it useful for applying a template to imported CSV data and for easily repeating known transformations on a data set. I am currently stuck on how to clean up a column of phone numbers. The column contains various formats, such as 15555555555, 5555555555, or international numbers written as plain digits. I am wondering whether there is a way to use Twilio's Lookup (https://www.twilio.com/lookup) to validate the telephone numbers and convert them to a specific format. I have tried the code from their site together with "Edit column -> Add column by fetching URLs", but I always get an empty cell.

"https://lookups.twilio.com/v1/PhoneNumbers/" + Value+"?Type=phone_number"

It may be something simple that I'm missing. Alternatively, perhaps another project such as python-phonenumbers or Google's libphonenumber could be implemented as an extension. I know that using regex on variously formatted phone numbers is tedious and error-prone, and I am hoping to be able to use one of the above methods.

Thanks,
Matt

Owen Stephens

Feb 28, 2017, 5:26:52 AM
to OpenRefine
Hi Matt,

I'm not familiar with the Twilio lookup, but from a quick look it seems that you need to sign up and authenticate to use the service. It looks like Twilio supports "basic auth" (https://www.twilio.com/docs/api/rest/request), which means you should be able to use a URL of the form:


So once you have signed up you could try 'Add column by fetching URLs' with something like:


Note that there is no charge for a Format lookup, but other information carries a per-request charge.
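To illustrate the basic-auth idea outside OpenRefine, here is a minimal Python sketch that builds (but does not send) an authenticated request for the lookup URL from Matt's message. It assumes the Twilio Account SID is the username and the Auth Token is the password; `build_lookup_request` and the credential values are hypothetical placeholders, not something from Twilio's documentation or this thread.

```python
import base64
from urllib.parse import quote
from urllib.request import Request

# Base URL taken from the expression quoted earlier in the thread.
LOOKUP_BASE = "https://lookups.twilio.com/v1/PhoneNumbers/"


def build_lookup_request(number, account_sid, auth_token):
    """Build (without sending) a request that carries HTTP basic auth.

    Assumption: the Account SID is the username and the Auth Token is
    the password. This helper is hypothetical and only for illustration.
    """
    # Keep a leading "+" intact; percent-encode anything else unusual.
    url = LOOKUP_BASE + quote(number, safe="+")
    credentials = "{}:{}".format(account_sid, auth_token).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(url, headers={"Authorization": "Basic " + token})


# Placeholder credentials; substitute real values before sending.
req = build_lookup_request("+15555555555", "ACxxxxxxxx", "your_auth_token")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing `req` to `urllib.request.urlopen` would actually send it. An empty cell in OpenRefine would be consistent with a request that was sent without any credentials, though that is a guess and not something confirmed in this thread.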

Using the libraries that you mention makes sense as an alternative, but I'm afraid I don't have any advice on how this could be done.

There is an extension that does telephone number extraction, but I can't currently get it to work, and I'm not sure it is what you need: https://github.com/giTorto/extraCTU-plugin

Owen

Thad Guidry

Feb 28, 2017, 10:47:18 AM
to openr...@googlegroups.com
Hi Matt,

I know a lot of folks struggle with this, so I decided to take some time to write up how I and others sometimes deal with advanced, scary things like parsing phone numbers in OpenRefine and even outside of it (you'd think it would be so simple, I know!).

I've written up a step by step tutorial on our Wiki to show you some advanced techniques in OpenRefine that you and others might enjoy.


A possible enhancement to OpenRefine could eventually move some of that libphonenumber library into GREL itself and make it easier to parse and format phone numbers... but it's not that hard with existing Jython and the same expression input box...

but let us know what you think.
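For the two domestic formats Matt mentions (15555555555 and 5555555555), a rough normalization can be sketched in plain Python. This is not the libphonenumber approach from the tutorial and it does no real validation; it is a naive digit-count heuristic. The function name `normalize_to_e164` and the default country code "1" are assumptions made for illustration.

```python
import re


def normalize_to_e164(raw, default_country_code="1"):
    """Naively coerce a phone string to an E.164-style "+digits" form.

    Assumptions (illustrative only): a 10-digit number is a national
    number that needs the default country code, and an 11 to 15 digit
    number already includes its country code. Anything else is rejected.
    """
    digits = re.sub(r"\D", "", raw or "")  # drop spaces, dashes, parentheses
    if len(digits) == 10:
        return "+" + default_country_code + digits
    if 11 <= len(digits) <= 15:  # E.164 allows at most 15 digits
        return "+" + digits
    return None  # too short or too long to guess safely


for sample in ["15555555555", "5555555555", "(555) 555-5555", "12345"]:
    print(sample, "->", normalize_to_e164(sample))
```

In OpenRefine's Jython expression box the same logic would be applied to `value` and finish with a `return` statement. International numbers whose full length is 10 digits or fewer are misread by this heuristic, which is the kind of ambiguity a library such as libphonenumber is designed to resolve.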

Owen Stephens

Feb 28, 2017, 11:05:46 AM
to OpenRefine
Thanks Thad, this is a brilliant addition to the documentation. I'd previously seen the statement on this wiki page, "you can even import Java libraries", but never knew how to actually do this.

This is great and I'm sure will prove useful in many other scenarios.

Best wishes

Owen