Reconciliation tools

223 views
Skip to first unread message

Andrea Zanni

unread,
Feb 22, 2016, 5:23:53 AM2/22/16
to OpenRefine
Dear all,
I have some questions on OpenRefine [*] and reconciliation.
I'm simply trying to learn how to reconciliate some terms with others.
For example, I have lists and lists of language abbreviations (ita, eng, deu, esp, etc.) and I have a file in which I map them with their Italian translation ("italiano, inglese, tedesco, ecc.).
I don't understand how to do that.

I tried Nomenklatura (http://nomenklatura.okfnlabs.org/) but that public installation doesn't really work. It had APIs, so I considered it was the right way to do that (pudo, the main developer, told me it's possible installing nomenklatura on my own server, but unfortunately I'm a n00b and it's probably too difficult for me).

In the end, I don't understand how to reconciliate with external, online tools, AND with internal mappings, personal csv files or similar.

Do you have any tips, docs or pointers?
Thank you all.

Andrea Zanni
Wikimedia Italia

[*] I use actually use Google Refine 2.5;  I tried to switch to OperRefine 2.6 but it doesn't show buttons and some text. So I downgraded, but I lost all my projects! Now I recovered everything, but it would be important to say something in the docs about recovery from 2.6 to 2.5.... :-)

Joe Wicentowski

unread,
Feb 22, 2016, 8:51:29 AM2/22/16
to OpenRefine
Andrea,

> For example, I have lists and lists of language abbreviations (ita, eng,
> deu, esp, etc.) and I have a file in which I map them with their Italian
> translation ("italiano, inglese, tedesco, ecc.).
> I don't understand how to do that.

If you're trying to reference one list from another list, see the
"cross" function:

https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#crosscell-c-string-projectname-string-columnname

> In the end, I don't understand how to reconciliate with external, online tools, AND with internal mappings, personal csv files or similar.

I'm a little unclear about what, exactly, you're trying to do. Could
you give a concrete example?

Joe

Andrea Zanni

unread,
Feb 22, 2016, 10:17:52 AM2/22/16
to openr...@googlegroups.com
If you're trying to reference one list from another list, see the
"cross" function:

  https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions#crosscell-c-string-projectname-string-columnname

Thanks!
I'll study it.
 
> In the end, I don't understand how to reconciliate with external, online tools, AND with internal mappings, personal csv files or similar.

I'm a little unclear about what, exactly, you're trying to do.  Could
you give a concrete example?

I often have csv in which a column is made of language ISO standards (ita, eng, ger, ...)
I want these to be expanded into full terms, in Italian or in English (italian, english, german, etc.). I figured that this is probably a very common need, so there is some reconciliation API/online tool/whatever available. But I didn't find it.

I also figured that, instead of an online tool, another way was to create a csv myself, with all the terms translated, and find a way to replace all the ISO standards with my own term. Maybe the "cross" function is the way to do it, I'll try it asap.

So, this is what I intended with online reconciliation tool VS offline mapping.
Sorry if it's not clear, but it's difficult for me to explain it also in Italian :-)

Andrea

Joe Wicentowski

unread,
Feb 22, 2016, 10:27:20 AM2/22/16
to OpenRefine
Hi Andrea,

> I also figured that, instead of an online tool, another way was to create a
> csv myself, with all the terms translated, and find a way to replace all the
> ISO standards with my own term. Maybe the "cross" function is the way to do
> it, I'll try it asap.

Yes, I think the cross function should allow you to perform the
function you describe on data you have on your computer. If this
doesn't meet your needs, let us know - and if you mention a specific
source(s) this will help folks here understand the exact need.

Joe

Tom Morris

unread,
Feb 22, 2016, 11:18:24 AM2/22/16
to openr...@googlegroups.com
I agree with Joe's suggestion of using cross() for your immediate need.

For more general reconciliation with personal CSV files, you might want to look at OKFN Lab's reconcile-csv http://okfnlabs.org/reconcile-csv/

On Mon, Feb 22, 2016 at 4:44 AM, Andrea Zanni <zanni.a...@gmail.com> wrote:

[*] I use actually use Google Refine 2.5;  I tried to switch to OperRefine 2.6 but it doesn't show buttons and some text. So I downgraded, but I lost all my projects! Now I recovered everything, but it would be important to say something in the docs about recovery from 2.6 to 2.5.... :-)

Glad you found your projects. We've got this wiki page that describes where the data is and I think there were more explicit directions about co-existence on the mailing list when the 2.6 beta was announced, but these apparently haven't been transferred to the wiki.

In general when installing new versions, one should: a) back up all your data and b) not assume that data formats are backward compatible between versions (in this particular case we didn't change anything between 2.5 & 2.6).

Tom

p.s. The missing buttons that you saw were likely due to issues with our localization code in the early 2.6 alpha/beta releases. We think those have all been fixed, so you might want to try again.  Plus, we even have an Italian translation now, as well as Spanish, French, and Chinese.

Andrea Zanni

unread,
May 10, 2016, 4:47:48 PM5/10/16
to openr...@googlegroups.com
Hi all,
I'm going to revive this thread for a related question.

After a while, I've been using cell.cross(); it doesn't work always (I don't understand why), but when it does is fantastic.
I'm going to try reconcile-csv as well, which seems to be even better.

A question though:
both of there tool seems to work with the "entire value of the cell".
Sometimes (especially for encoding issues), I'm in need to make multiple "search and replace", but the string to be replaced is inside the cell among others. Being a total newbie I kind of work with regex and search and replace multiple times,
but I wonder if it's possible to store a set of transformations in another csv, and use it to "reconcile" the first dataset.
My issue, as I see it, is that I have OR should "search" this strings in every cell, we are not talking about whole values.

Thanks everyone (especially if you understood what I tried to say :-D)

Andrea



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thad Guidry

unread,
May 11, 2016, 11:15:03 AM5/11/16
to openrefine
Andrea,

Unfortunately we have yet to implement a Search Box to enable search and replace.
This has been requested many times and captured here: https://github.com/OpenRefine/OpenRefine/issues/213

Maybe someone on the mailing list knows if one of the OpenRefine extensions has this capability already ?

Andrea Zanni

unread,
May 11, 2016, 11:22:21 AM5/11/16
to openr...@googlegroups.com
Thanks Thad.
Yeah, basically what I'm looking for is a "search and replace" to be used in a selected set of columns.
The only difference, is that I have a long list of items A to search and to replace with a list of items B.

I usually use basic Transforms per column, but it gets tiring and boring really easily when you have more than 100 transformations to do every day by hand :-D

Aubrey

John Little

unread,
May 11, 2016, 11:28:39 AM5/11/16
to openr...@googlegroups.com
Andrea:

I'm not sure I understand your question but, if I do understand, my thoughts are as follows:

I'm not sure why your cell.cross() doesn't work consistently.  It may be data inconsistencies in the source or target project.  Maybe it's the syntax: typing the cell.cross command.  It's easy to make a typo with that long cell.cross command.  You might try loading the VIB-BITS extension which adds a menu driven approach to adding data from other projects.  That could reduce the chance of introducing typos into the cell.cross command.  

I don't know anything about the reconcile.csv extension but it appears you may also be asking questions about that extension?  They have a link to a mailing list so you may want to direct reconsile-csv question directly to them.

With cell.cross you can access strings within the value of the cells.  If you follow this tutorial (from Tony Hirst)...  \

You can chain on the .match or .replace function to the GREL.  

from this:  
cell.cross("Merge Test B", "HESA code").cells["Average Teaching Score"].value[0]  
to:
cell.cross("Merge Test B", "HESA code").cells["Institution"].value[0].match(/.*\b(\w+)\b.*/)[0]

The above function takes the second word from within each full string in the "Institution" column.  By chaining the match() function you can then use all the power of regex within match.  Alternatively you could use replace()

Then I wasn't sure if you are talking about reapplying previous GREL operations.  As to reusing multiple search and replace operations.  You can use operation history (undo/redo) to extract operations, copy those operations out to a JSON file and then apply those operations to other OpenRefine projects.  You can also find your previous GREL transformations in the history tab of the transformation GREL window:  (screen shot)

I hope that helps some.  If not, please feel free to clarify the question.

John

Thad Guidry

unread,
May 11, 2016, 12:15:09 PM5/11/16
to openrefine
Oh gosh, in that case Andrea, you might just want to export to a CSV file...and use an external tool that can support Search and Replace with scripting support across selections.

Myself personally for this type of work I use 2 things:

1. Beyond Compare (which has a cool table mode feature)  http://www.scootersoftware.com/features.php?zz=features_list
2. LibreOffice (specifically Base - similar to MS Access) which can do the virtual lookups and scripted replacing - bit of learning curve but you could do it

Thad Guidry

unread,
May 11, 2016, 12:18:38 PM5/11/16
to openrefine
Andrea,

Do you already have your mapping in a csv file (or could make one?) something like this with your original source name and the new name you want to replace to ?

Original, New

ThadOldName, ThadNewName
Apple, Fuji Apple
Mouse, PC Mouse
Apple Phone, iPhone

Andrea Zanni

unread,
May 11, 2016, 12:27:32 PM5/11/16
to openr...@googlegroups.com
Thanks everyone
for the kind and detailed responses.
I really love OR (I use it everyday, all day long), and the community (you)
is really supportive and helpful.

Tomorrow I'll read what you have written and try the several solutions proposed.

@thad: to your last question, yes, I have that csv. 

Andrea


Ettore Rizza

unread,
May 11, 2016, 3:38:16 PM5/11/16
to OpenRefine
"After a while, I've been using cell.cross(); it doesn't work always (I don't understand why), but when it does is fantastic. "

Hi Andrea,

if you often use the cell.cross() function, i recommend you the Vibits extension, which do the same with a graphical interface : http://data.bits.vib.be/hidden/g7dt6RjuUTU421dY2CwrGePGX/vib-bits.zip

 It works always fine for me. The only and rare cases when it fails, it's because i tried to match a numeric column ID with a string column ID. 
Reply all
Reply to author
Forward
0 new messages