Google Refine can be a powerful tool for data cleansing during crisis response

Chamindra de Silva

Nov 11, 2010, 10:46:53 PM
to humanita...@yahoogroups.com, humanita...@googlegroups.com
I am sure all of you who have worked in the field have had to deal with dirty data that takes a lot of cleansing before it can be effectively aggregated, analyzed, presented and used for decision making. A lot of this type of data comes in as spreadsheets, where there is little validation on data entry. Web-based database systems like Sahana handle this better, but there is still a lot of opportunity for entering data badly. And sometimes the data is not dirty at all; rather, the same entity is known by multiple valid terms, which makes it harder to automatically analyze the information and generate reports. In this regard I believe the newly announced Google Refine can be a fantastic tool for improving the productivity of much-needed data cleansing efforts, especially when handling spreadsheets.

"Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions."

Ref Article: http://google-opensource.blogspot.com/2010/11/… 
Project page: http://code.google.com/p/google-refine/

Check out the clustering and data transformation functionality.
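
To give a rough feel for what clustering means here, below is a minimal Python sketch of one common approach: reduce each value to a normalized key and group the values whose keys collide. This is only an illustration under my own assumptions, not Google Refine's actual implementation; the function names and the sample place names are made up for the example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that near-identical spellings collide.

    Hypothetical helper for illustration; not taken from Google Refine.
    """
    # Trim, lowercase, and strip accents via Unicode decomposition.
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Drop punctuation, then sort the unique tokens so word order is ignored.
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group raw values by fingerprint; keep groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Made-up sample column, as it might appear in a hand-entered spreadsheet.
names = [
    "Colombo District",
    "colombo  district",
    "District, Colombo",
    "Galle",
    "Kandy ",
    "kandy",
]

for group in cluster(names):
    print(group)
```

Each printed group is a set of spellings that probably refer to the same entity and can be merged into one agreed value. Note that this only catches variations in case, spacing, punctuation and word order; genuinely different names for the same entity would still need a lookup table or manual review.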

Chamindra de Silva
http://chamindra-de-silva.blogspot.com | http://twitter.com/ChamindraS