[Blog] Geolocation in Pentaho Data Integration


Pedro Alves

Jan 10, 2017, 10:58:20 AM
to pentaho-...@googlegroups.com

Blog post at http://pedroalves-bi.blogspot.pt/2017/01/doing-geolocation-in-pdi-pentaho-data.html


------


Geolocation


Geolocation is something we often need in ETL work. We had a step for it that worked in PDI 5.x and earlier releases, but we just noticed it no longer works.

Until this morning, that is :p

I just forked Matt's initial project and applied the changes needed to make it compatible with Pentaho 6+.

The basics


Well, easy to understand... We have an IP address, we want to know where it comes from!

Geolocation transformation - Let me see if it finds out where I am...

Once I execute this, I get the following result:

Yep, this is where I am...

I am indeed in Porto Salvo, Portugal, so this is right. Can't get any easier than this!
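To make the idea concrete, here is a toy sketch of what an IP-to-location lookup boils down to: turn the address into a number and find the range that contains it. This is only an illustration, not how the plugin or the MaxMind readers are implemented. The ranges, the labels and the `locate` function are all made up for the example (the addresses are reserved documentation blocks).

```python
import bisect
import ipaddress

# Hypothetical sample ranges (reserved documentation addresses), NOT real GeoIP data.
# Each entry: (first IP as int, last IP as int, location label), sorted by first IP.
RANGES = sorted(
    (int(ipaddress.ip_address(start)), int(ipaddress.ip_address(end)), place)
    for start, end, place in [
        ("192.0.2.0", "192.0.2.255", "Example City A"),
        ("198.51.100.0", "198.51.100.255", "Example City B"),
        ("203.0.113.0", "203.0.113.255", "Example City C"),
    ]
)
STARTS = [r[0] for r in RANGES]


def locate(ip):
    """Return the location label for ip, or None if no range covers it."""
    n = int(ipaddress.ip_address(ip))
    # Find the last range whose start is <= n, then check that n is inside it.
    i = bisect.bisect_right(STARTS, n) - 1
    if i >= 0 and RANGES[i][0] <= n <= RANGES[i][1]:
        return RANGES[i][2]
    return None


if __name__ == "__main__":
    print(locate("198.51.100.42"))
    print(locate("10.0.0.1"))
```

A real database holds far more ranges and returns a full record (country, city, coordinates and so on) rather than a single label, but the question being answered is the same.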


Making it work

So, how do you make this work? First, you have to get the plugin, which is available through the PDI Marketplace. Just go ahead and install it.

PDI Marketplace - Get your goodies from here

After installing it and restarting PDI, you'll see the GeoIP Lookup step in the Lookup folder. Configuring it is straightforward: you point to the stream field containing the IP address, point to the IP database files, and specify which fields you want back:

Configuring the step
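The three settings map onto a simple row-by-row enrichment. The sketch below mimics that behaviour in plain Python so you can see what each setting controls. It is not the plugin's code: `geoip_lookup`, `FAKE_DB` and the field names are hypothetical, and a dictionary stands in for the real database files.

```python
# Hypothetical stand-in for the IP database files: IP string -> location record.
# The values are invented for the example.
FAKE_DB = {
    "192.0.2.10": {
        "country": "Exampleland",
        "city": "Example City",
        "latitude": 0.0,
        "longitude": 0.0,
    },
}


def geoip_lookup(rows, ip_field, database, output_fields):
    """Yield a copy of each row with the requested location fields appended.

    rows          -- iterable of dicts (the incoming stream)
    ip_field      -- name of the stream field holding the IP address
    database      -- mapping from IP string to a location record
    output_fields -- which location fields to add to each row
    """
    for row in rows:
        record = database.get(row.get(ip_field), {})
        enriched = dict(row)  # leave the incoming row untouched
        for field in output_fields:
            # Unknown IPs (or missing fields) yield None instead of failing.
            enriched[field] = record.get(field)
        yield enriched


if __name__ == "__main__":
    stream = [
        {"user": "a", "ip": "192.0.2.10"},
        {"user": "b", "ip": "192.0.2.99"},
    ]
    for out in geoip_lookup(stream, "ip", FAKE_DB, ["country", "city"]):
        print(out)
```

Only the fields you ask for are added, so in this run the rows come out with `country` and `city` but without the coordinates.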

Getting the IP Database files

You need to get the database files from MaxMind, and in my experience these guys do a great job here. They have some great commercial offerings, but also free GeoLite databases for country and city location. You can get them from here.

Getting the GeoIP data files

And you should be done! This even works great in a MapReduce job.






