cell.cross


Andrea Zanni

Sep 21, 2016, 2:04:39 PM
to openr...@googlegroups.com
Is it possible that "cell.cross" doesn't always work properly?
There are times when I think it should work fine,
but it doesn't find a match. Is it because different projects or different columns have similar names?
I had this issue in 2.5 and now I have it in 2.6.

Andrea

P.S. The documentation seems to have the wrong code (is there a redundant "[0]"?)



Thad Guidry

Sep 21, 2016, 6:50:14 PM
to openrefine
Hi Andrea,

Yes, many folks have reported that the problem still exists. It seems that the source cell is not always seen by the cross() function when it tries to compare it to the target cell.
I looked briefly into this last year for a few hours but never isolated the problem.  Unfortunately, neither I nor anyone else on the team has looked into the issue further.

I suspect there is some kind of bug lurking somewhere. Perhaps it's around our automatic project saving, when the source cell suddenly becomes 'unseen'; perhaps it's the second time that the cross() function is initialized; or perhaps it's somewhere further downstream.  I'm so busy with my day job that I unfortunately cannot look into this.  But feel free to test around and see if you can isolate when it happens. Just being able to reproduce the bug reliably would help us narrow down where it is lurking.  You can open a new issue for this, since the older issues don't have much detail and were filed against versions prior to Beta RC2.

Tom Morris

Sep 22, 2016, 1:42:00 AM
to openr...@googlegroups.com
The doc bug, if it's that (I didn't look) sounds like something that could be fixed with a simple, easily merged, pull request.

A potential cross bug is potentially more difficult to isolate. First of all, cross() behaves differently than pretty much every other function in that it takes a cell reference rather than a cell value, which causes a lot of user confusion. Second, the results of the multi-project join are cached for performance reasons, so cache bugs could introduce wonky behavior. <== technical term
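
To make the caching concern concrete, here is a minimal Python sketch of how a cached lookup index can go stale. It is a toy model only, not OpenRefine's actual code: the class `CrossIndex`, its `invalidate` method, and the sample rows are all hypothetical names invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a project: a list of rows, each a dict of column -> value.
Project = List[dict]


class CrossIndex:
    """Toy model of a cached cross-project lookup (NOT OpenRefine's implementation)."""

    def __init__(self, projects: Dict[str, Project]) -> None:
        self.projects = projects
        # Cache keyed by (project name, column name) -> {value: [matching rows]}
        self._cache: Dict[Tuple[str, str], Dict[object, List[dict]]] = {}

    def _build(self, project: str, column: str) -> Dict[object, List[dict]]:
        index: Dict[object, List[dict]] = {}
        for row in self.projects[project]:
            index.setdefault(row[column], []).append(row)
        return index

    def cross(self, value: object, project: str, column: str) -> List[dict]:
        key = (project, column)
        if key not in self._cache:  # built once, then reused
            self._cache[key] = self._build(project, column)
        return self._cache[key].get(value, [])

    def invalidate(self) -> None:
        """Drop cached indexes; has to happen whenever a project changes."""
        self._cache.clear()


projects = {"Other project": [{"id": "A1", "name": "Alpha"}]}
idx = CrossIndex(projects)
first = idx.cross("A1", "Other project", "id")  # match found, index now cached

# The other project gains a row, but nothing invalidates the cache.
projects["Other project"].append({"id": "B2", "name": "Beta"})
stale = idx.cross("B2", "Other project", "id")  # no match: the index is stale

idx.invalidate()
fresh = idx.cross("B2", "Other project", "id")  # match found after the rebuild
```

In this model the lookup silently returns no rows until the cache is rebuilt, which is the kind of "should match but doesn't" symptom a missed invalidation could produce. Whether that is what happens in Refine is an open question.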

As in other parts of the software universe, a solid, reproducible test case could make this much easier to fix. Bonus points (ALL the points) for a test case which is valid across Refine server restarts and other sources of inconsistency.

Tom

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Andrea Zanni

Sep 22, 2016, 4:04:26 AM
to openr...@googlegroups.com
On Thu, Sep 22, 2016 at 7:41 AM, Tom Morris <tfmo...@gmail.com> wrote:
The doc bug, if it's that (I didn't look) sounds like something that could be fixed with a simple, easily merged, pull request.

Of course, they are two different things.
In blog posts and on this mailing list I've seen

 cell.cross("Other project", "column of other project").cells["new column"].value[0]

but the docs add one more [0]:

 cell.cross("Other project", "column of other project")[0].cells["new column"].value[0]
I didn't open a pull request because I'm not sure whether it's an error or not.
 

A potential cross bug is potentially more difficult to isolate. First of all, cross() behaves differently than pretty much every other function in that it takes a cell reference rather than a cell value, which causes a lot of user confusion. Second, the results of the multi-project join are cached for performance reasons, so cache bugs could introduce wonky behavior. <== technical term

As in other parts of the software universe, a solid, reproducible test case could make this much easier to fix. Bonus points (ALL the points) for a test case which is valid across Refine server restarts and other sources of inconsistency.

 
I've never been able to isolate a test case or pin down the inconsistent behaviour.
In the past I changed column and project names, and at least once that solved the problem.

Andrea

Owen Stephens

Sep 22, 2016, 4:22:41 AM
to OpenRefine
On the documentation I believe you are correct that there is an extra [0] in the expression, but the position of the [0] can be varied (and to some extent the use of the [0] here might be questioned). 

The cross function returns an array of row objects. To get out a single value that can be stored in the cell, this array needs to be processed somehow. One option is simply to take the first result returned by the cross function - this is what using [0] does. However, because both the .cells and .value operators act on all items in the array the [0] can appear in different positions. This means the following are all equivalent:

cell.cross("Other project", "column of other project")[0].cells["new column"].value
cell.cross("Other project", "column of other project").cells["new column"][0].value
cell.cross("Other project", "column of other project").cells["new column"].value[0]

If you add an additional [0] after .value in the first two expressions above (as in the documentation) you end up simply selecting the first character of the value - which isn't usually what is wanted.
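
The behaviour described above can be modelled outside OpenRefine. The following Python sketch is only an analogy for how `.cells` and `.value` act on every item of an array; the helper functions `cells` and `value` and the sample data are hypothetical and are not part of GREL.

```python
def cells(rows, column):
    """Model of `.cells["column"]`: applies to one row or to every row in a list."""
    if isinstance(rows, list):
        return [row[column] for row in rows]
    return rows[column]


def value(cell_or_cells):
    """Model of `.value`: applies to one cell or to every cell in a list."""
    if isinstance(cell_or_cells, list):
        return [cell["value"] for cell in cell_or_cells]
    return cell_or_cells["value"]


# Stand-in for the array of rows returned by cross(): two matching rows (made-up data).
matches = [
    {"new column": {"value": "Rome"}},
    {"new column": {"value": "Roma"}},
]

form1 = value(cells(matches[0], "new column"))   # [0] right after cross()
form2 = value(cells(matches, "new column")[0])   # [0] after .cells[...]
form3 = value(cells(matches, "new column"))[0]   # [0] after .value

# A second [0] on top of form1 indexes into the string itself.
extra = form1[0]
```

All three forms select the value from the first matching row, while the extra index returns only its first character.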

I've updated the documentation to use the first of these forms, as it makes sense to me to process the array before applying further transformations. This then makes the shift to other options for processing the array (e.g. wrapping the cross function in a forEach or filter function) more intuitive IMO.

Owen


Andrea Zanni

Sep 23, 2016, 9:40:37 AM
to openr...@googlegroups.com
After trying several times,
I resolved it... with an extension.

Following this old post [1]
I re-discovered the VIB-BITS OpenRefine extension, whose
"Add column(s) from other projects..." function is exactly what I need.
It's also simpler than GREL syntax, and more user friendly
(it's easier to understand which columns to match, from which project, and which columns to add).

As much as I love GREL it sometimes can be tricky :-)

Maybe we want to link the extension from the "cell.cross" section in the wiki?

A.


Thad Guidry

Sep 23, 2016, 1:11:09 PM
to openrefine

raja kumar dash

Oct 18, 2016, 4:03:19 AM
to OpenRefine
I noticed a scenario related to cell.cross not working for me on occasion:

If I export a Refine project (or both projects) from one computer to another (in this case, always Mac to Mac, but sometimes different OS X versions), cross does not work on numeric values.

This has happened to me three times:

1. Exported Google Refine project from iMac running Yosemite to an old 17" MacBook Pro running either Yosemite or earlier (possibly Snow Leopard).

2. Exported Google Refine project from the same iMac to a new 15" MacBook Pro running El Capitan and OpenRefine 2.6 beta 1

3. Exported OpenRefine 2.6 beta 1 project from new Mac Mini running El Capitan to the same 15" MBP (i.e., both with the same configuration)

(I did not test these situations with my Linux and Windows 10 installations of OpenRefine.) What I was doing was trying to cross-join on a standardized college ID (from http://nces.ed.gov/collegenavigator). I first tried with the string versions, making sure to trim() and strip() spaces in both projects, and sorted the target project. No luck. I changed the ID with toNumber() in both projects. No luck. I tried with the college names and it worked just fine. I had this exact same problem in the other scenarios listed, always on the ID, if either or both of the projects were imported from another Mac. If I create two projects on the same Mac, I can join them.

Now here's how I solved it, just this very minute. I created a duplicate column of the college ID values in both projects and converted them with toString(). Then I tried to join on these two new columns created on the same computer, and lo and behold, it worked perfectly.

To further test my premise (numeric values don't join from projects created on other computers, whether in text or numeric mode), I then converted both new columns to numbers, did the join again, and verified that both sets of results are identical.
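
For anyone trying to reproduce this, one way to check whether two key columns differ in type or in hidden whitespace is to compare them outside OpenRefine. The Python sketch below assumes the column values have been exported into plain lists; the function names and the sample IDs are made up for illustration, and it does not claim to explain the behaviour reported above.

```python
def describe_key(value):
    """Return a (type name, repr) pair so hidden differences become visible."""
    return type(value).__name__, repr(value)


def unmatched_keys(source_keys, target_keys):
    """List source keys that have no target key of the same type and exact content."""
    target = {describe_key(k) for k in target_keys}
    return [describe_key(k) for k in source_keys if describe_key(k) not in target]


def normalize(value):
    """Convert a key to a trimmed string; whole-number floats lose the trailing '.0'."""
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return str(value).strip()


# Made-up IDs: an integer, a string with a trailing space, and a float.
source = [100654, "100663 ", 100706.0]
target = ["100654", "100663", "100706"]

before = unmatched_keys(source, target)
after = unmatched_keys(
    [normalize(k) for k in source],
    [normalize(k) for k in target],
)
```

Here `before` lists every source key together with its type, which shows why none of them matches, and `after` is empty once both sides are normalized to trimmed strings.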

Not sure if this helps you. If I knew more Java (and had more than 10 free minutes per week), I'd try to hunt down the problem in the code. Unfortunately, I can't help there.

Thad Guidry

Oct 18, 2016, 9:57:07 AM
to OpenRefine
Thanks for this update, Raja!  I will add these notes to a GitHub issue for tracking.
We can continue discussion in there.

raja kumar dash

Oct 18, 2016, 11:50:07 AM
to OpenRefine
Thanks for posting it as an issue on GitHub, Thad.

