bin/tiger_import vs Geocoder::US::Import::TIGER

Peter

May 11, 2010, 12:04:05 PM
to GeoCommons Geocoder
Hello,
After uncovering what looks like a bug caused by the mismatch between
the Ruby and non-Ruby metaphone implementations
(http://github.com/geocommons/geocoder/issues#issue/17), I started
thinking about how to fix it. The problem stems from having two ways
of accessing the data: through SQL and through Ruby. I know this has
been discussed in other contexts
(http://groups.google.com/group/geocommons-geocode/browse_thread/thread/a85e37e2bc264043),
so I was wondering what the main reason is for running the import
outside of Ruby. Using the Ruby importer would likely solve at least
two issues, potentially more. Ideally, it seems that bin/tiger_import
should be a Ruby script similar to bin/rebuild_metaphones.
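To make the failure mode concrete, here is a minimal Ruby sketch under stated assumptions: `IMPORT_KEY` and `QUERY_KEY` are hypothetical toy stand-ins (not real metaphone) for the import-side and geocode-side implementations, and `build_index` and `lookup` are hypothetical helpers. When the two functions disagree on a word, the key stored at import time never matches the key computed at query time, so the lookup silently misses.

```ruby
# Hypothetical toy key functions, NOT real metaphone. They differ on a
# single rule so the mismatch is easy to see.
IMPORT_KEY = lambda do |word|
  w = word.upcase
  w[0] + w[1..].delete("AEIOU")        # keep first letter, drop later vowels
end

QUERY_KEY = lambda do |word|
  w = word.upcase.sub(/\APH/, "F")     # extra rule: leading "PH" becomes "F"
  w[0] + w[1..].delete("AEIOU")
end

# Build an index at "import" time with one key function...
def build_index(names, key_fn)
  names.group_by { |n| key_fn.call(n) }
end

# ...and query it at "geocode" time, possibly with a different one.
def lookup(index, query, key_fn)
  index.fetch(key_fn.call(query), [])
end

streets = ["Phelps", "Main", "Oak"]
index = build_index(streets, IMPORT_KEY)

p lookup(index, "Main", QUERY_KEY)    # → ["Main"]   (keys agree)
p lookup(index, "Phelps", QUERY_KEY)  # → []         (keys disagree, street missed)
p lookup(index, "Phelps", IMPORT_KEY) # → ["Phelps"] (same function on both sides)
```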

All else being equal, I'd prefer to focus my effort on the Ruby
importer and fix any issues there, instead of worrying about
bin/tiger_import. Is there a reason not to do that?

Thanks!
Peter

Peter

May 28, 2010, 5:09:47 PM
to GeoCommons Geocoder
After working with the Ruby import, I'm willing to bet that performance
and memory usage were the main reasons for using tiger_import. So, to
fix the metaphone issue, I created a simple Ruby C extension that
exposes the metaphone function used in libsqlite3_geocoder to Ruby,
replacing the metaphone function from the Text gem. I then confirmed
that this fixes the issue I found: the metaphone keys computed when
geocoding now always match those computed during the import. Before I
go further down this path (packaging up the C extension and pushing
the changes to my branch), I wanted to run the idea by the group.
Does this seem like a reasonable way forward?
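The confirmation step can be sketched as a small helper that runs two key functions over sample words and reports any disagreement. This is a hypothetical illustration: `SHARED_KEY` stands in for the metaphone function exposed by the C extension, `OTHER_KEY` for an independent implementation such as the Text gem's, and neither is real metaphone.

```ruby
# Hypothetical stand-in for the C metaphone exposed to Ruby (NOT real metaphone).
SHARED_KEY = ->(word) { word.upcase.delete("AEIOU") }

# Hypothetical stand-in for a second, independent implementation.
OTHER_KEY = ->(word) { word.upcase.sub(/\AKN/, "N").delete("AEIOU") }

# Return [word, import_key, query_key] for every word whose keys differ.
def key_mismatches(words, import_fn, query_fn)
  words.filter_map do |w|
    i = import_fn.call(w)
    q = query_fn.call(w)
    [w, i, q] unless i == q
  end
end

sample = ["Knox", "Main", "Elm"]
p key_mismatches(sample, SHARED_KEY, OTHER_KEY)   # → [["Knox", "KNX", "NX"]]
p key_mismatches(sample, SHARED_KEY, SHARED_KEY)  # → []
```

Once the import and geocoding paths call the same function, the mismatch list is empty by construction, which is the property a single shared metaphone implementation is meant to guarantee.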

Thanks,
Peter