How to go directly to row number x?


Patrick

Feb 7, 2013, 12:01:27 PM
to openr...@googlegroups.com
Hi, 
I'm building a dictionary of names with OpenRefine. One facet returns, say, 3000 matching rows. I have to scan every row of each segment to check whether the match proposed by the reconciliation service is correct. The problem is this: as soon as I accept the match proposed by Freebase, the list is refreshed and starts again from row 1 (my facet excludes the matched rows). I then have to go back manually to the row I was on when I accepted the match. It's very annoying, and I was wondering whether there is a way to avoid that behavior, or a way to go directly to my row (say 1500) without having to click the Next button many times.
Thanks a lot,
Patrick

Thad Guidry

Feb 7, 2013, 12:12:28 PM
to openr...@googlegroups.com
From our documentation:


There is also the "judgment" facet, which lets you filter for the cells that haven't been matched (pick "None" in the facet). As you process each cell, its judgment changes from "None" to "Matched" and it disappears from the view, because it no longer fits the facet's selection.



--
You received this message because you are subscribed to the Google Groups "Open Refine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 



--
-Thad
http://www.freebase.com/view/en/thad_guidry

Tom Morris

Feb 7, 2013, 12:19:42 PM
to openr...@googlegroups.com
On Thu, Feb 7, 2013 at 12:01 PM, Patrick <agin.p...@gmail.com> wrote:
> I'm building a dictionary of names with OpenRefine. One facet returns, say, 3000 matching rows. I have to scan every row of each segment to check whether the match proposed by the reconciliation service is correct. The problem is this: as soon as I accept the match proposed by Freebase, the list is refreshed and starts again from row 1 (my facet excludes the matched rows). I then have to go back manually to the row I was on when I accepted the match. It's very annoying, and I was wondering whether there is a way to avoid that behavior, or a way to go directly to my row (say 1500) without having to click the Next button many times.

There isn't a way to jump directly to an entry.  It might be more user friendly if we maintained your position in the list for small edits like this, but would require a special optimization path (it obviously wouldn't make sense for large scale edits/transformations).

The way to deal with this currently is to organize your workflow so that you don't need to do this.  This could mean using more facets; for example, I typically review things in chunks starting with the highest match score.  It could mean removing things from view by selecting "Create new topic" for those that you're sure don't reconcile.  Basically, whatever it takes so that you're always working on the first page of results.
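A minimal Python sketch of this idea, as a toy model only and not OpenRefine code: the row dicts, the `judgment` values and the `visible` and `review` helpers are all hypothetical. It assumes every reviewed row receives some judgment (a match or a new topic), so the "none" view shrinks from the top and the next row to review is always the first one.

```python
# Toy model of reviewing under a judgment facet set to "none".
# Nothing here is OpenRefine API; rows are plain dicts.

def visible(rows):
    """Rows still shown when the judgment facet is set to 'none'."""
    return [r for r in rows if r["judgment"] == "none"]


def review(rows, decide):
    """Judge the first visible row until the view is empty.

    `decide` returns "matched" or "new" for a name, so no row is skipped.
    Returns the names in the order they were reviewed.
    """
    order = []
    while visible(rows):
        row = visible(rows)[0]  # always the top of the first page
        row["judgment"] = decide(row["name"])
        order.append(row["name"])
    return order


rows = [{"name": n, "judgment": "none"} for n in ["Madonna", "Bono", "Smith"]]
# Hypothetical rule standing in for a human decision.
order = review(rows, lambda name: "new" if name == "Smith" else "matched")
print(order)
print([r["judgment"] for r in rows])
```

The point of the model is that skipping a row is what forces paging: as long as each row gets a judgment, nothing is left behind on the first page.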

Tom

Patrick

Feb 7, 2013, 2:38:26 PM
to openr...@googlegroups.com
Thanks Thad and Tom for your answers. 
@Thad: I already filtered the rows with the judgment facet (and picked the "None" rows). The problem is that as soon as I choose to match one row (say the 2354th), the list is refreshed and I'm back to row 1.
@Tom: I understand your point. My facet filters for names that contain no space (e.g. Madonna, Rockefeller, Bono, Smith). I could add a facet to filter for names that are x characters long, so that the list is not too long. Is this what you meant?
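The filtering idea in this post can be sketched in plain Python. This is only an illustration of the logic, not an OpenRefine expression; the function name and the sample list are hypothetical. It keeps names without a space and groups them by length, which is what a "length = x" facet would expose one bucket at a time.

```python
from collections import defaultdict


def bucket_single_word_names(names):
    """Group names that contain no space by their character length."""
    buckets = defaultdict(list)
    for name in names:
        if " " not in name:  # same test as the "no space" facet
            buckets[len(name)].append(name)
    return dict(buckets)


names = ["Madonna", "Rockefeller", "Bono", "Smith", "Hervé Témime"]
buckets = bucket_single_word_names(names)
for length in sorted(buckets):
    print(length, buckets[length])
```

Each bucket corresponds to one choice of x, so a long list is reviewed as several short ones.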

Tom Morris

Feb 7, 2013, 3:13:36 PM
to openr...@googlegroups.com
On Thu, Feb 7, 2013 at 2:38 PM, Patrick <agin.p...@gmail.com> wrote:

> @Tom: I understand your point. My facet filters for names that contain no space (e.g. Madonna, Rockefeller, Bono, Smith). I could add a facet to filter for names that are x characters long, so that the list is not too long. Is this what you meant?

Yes.  Any facet which reduces the number of choices will work, but since it's a reconciliation task, I like to use filters related to that.  Reconcile->Facets->Best Candidate's Score is one that I use a lot.  My setup usually looks something like this:

[Inline image 1: image.png]

Patrick

Feb 7, 2013, 3:34:34 PM
to openr...@googlegroups.com
Thank you again Tom, I understand the rationale, even if I don't see the image you attached in your last post :)


On Thursday, February 7, 2013 at 12:01:27 UTC-5, Patrick wrote:

Martin Magdinier

Feb 7, 2013, 11:10:04 PM
to openrefine


> @Thad: I already filtered the rows with the judgment facet (and picked the "None" rows). The problem is that as soon as I choose to match one row (say the 2354th), the list is refreshed and I'm back to row 1.

This is not the first time this issue has been raised (see issue 571). Should we investigate further how Refine can support this use case?

Thad Guidry

Feb 8, 2013, 10:06:09 AM
to openr...@googlegroups.com
I'm not sure where Bug #33 went during the GitHub transition, but: http://code.google.com/p/google-refine/issues/detail?id=33

This issue is addressed there.

David and I have had a chat about this in the past; actually many times.

We came to the conclusion that, for some changes, you cannot remain on the current page because such changes disturb the whole structure of the project.  We need a future way to distinguish between structural changes and local changes.

The stopgap solution in that bug does work in most scenarios. I had also offered the idea of an alert popup that notifies users after their first match click: "Hey, you might want to click _none_ here in order to see only your remaining unmatched items." There are probably lots of little hint bubbles that should be shown during the first run of a reconcile project, along with a preference to enable or disable Reconciling Hints. :)

Anyway, 2.6 needs to get released before lots of other things.




Thad Guidry

Feb 8, 2013, 10:12:53 AM
to openr...@googlegroups.com

> @Thad: I already filtered the rows with the judgment facet (and picked the "None" rows). The problem is that as soon as I choose to match one row (say the 2354th), the list is refreshed and I'm back to row 1.

Did you remove all facets after the reconcile, and then enable the Judgment facet first? (You can reorder facet panels by clicking and dragging them into the order of precedence for filtering your rows, btw.)


Tom Morris

Feb 8, 2013, 1:09:26 PM
to openr...@googlegroups.com
On Fri, Feb 8, 2013 at 10:06 AM, Thad Guidry <thadg...@gmail.com> wrote:
> I'm not sure where Bug #33 went during the GitHub transition, but: http://code.google.com/p/google-refine/issues/detail?id=33


All issues up to 400 or 500 or so have the same number.  There was a deleted issue on Google Code that causes them to be off by one after that.  Issue #33 is here https://github.com/OpenRefine/OpenRefine/issues/33

> We came to the conclusion that, for some changes, you cannot remain on the current page because such changes disturb the whole structure of the project.  We need a future way to distinguish between structural changes and local changes.

Yes, that's what I meant when I wrote "more user friendly if we maintained your position in the list for small edits like this, but would require a special optimization path (it obviously wouldn't make sense for large scale edits/transformations)."

Having said that, there's an additional inconsistency, because some forms of reconciliation (e.g. selecting one of the available recon choices) don't reset the facets while others (e.g. manually searching for a new topic) do reset the facets.

Tom

Patrick

Feb 8, 2013, 3:58:01 PM
to openr...@googlegroups.com
It also happens with judgment as the only facet selected. If I show only the rows that are not matched ("None" selected), the list is refreshed as soon as I match one row, and I'm back at row #1.

David Huynh

Feb 8, 2013, 4:09:59 PM
to openr...@googlegroups.com
You could also flag or star each row on which you want to perform the same kind of action. Then, once you've gone through all the rows, filter by flag or star and apply the action.
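As a rough illustration of this two-pass approach, here is a small Python model. It is not OpenRefine code: the row dicts, the `flagged` field and both helper functions are hypothetical, and the set of approved names stands in for decisions a person would make while paging through the list once.

```python
def flag_rows(rows, should_flag):
    """First pass: walk every row once and only mark it, changing nothing else."""
    for row in rows:
        row["flagged"] = should_flag(row)


def apply_to_flagged(rows, action):
    """Second pass: apply one action to all flagged rows; return how many."""
    count = 0
    for row in rows:
        if row["flagged"]:
            action(row)
            count += 1
    return count


rows = [{"name": n, "judgment": "none", "flagged": False}
        for n in ["Madonna", "Rockefeller", "Bono", "Smith"]]

approved = {"Madonna", "Bono"}  # hypothetical reviewer decisions
flag_rows(rows, lambda row: row["name"] in approved)


def accept_match(row):
    row["judgment"] = "matched"


changed = apply_to_flagged(rows, accept_match)
print(changed, [r["judgment"] for r in rows])
```

Because flagging does not change the judgment, the first pass does not remove rows from a judgment-based view; only the single batch action at the end does.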

David


Thad Guidry

Feb 8, 2013, 4:11:23 PM
to openr...@googlegroups.com
Right Patrick,

We understand.  The use case is different: you are manually reconciling entities in a column and working through them one by one.  You're skipping over some entities, for whatever reason or logic, and want to concentrate on entities further down the grid, like on page 5 or page 12.  It's that logic or reason that we need a way to deal with easily.  Perhaps it's the use of a flag, or setting a flag based on your logic and using a custom facet to apply that flag or star.  I can't say without knowing your reasoning for skipping some of those rows as you advance through your manual reconcile process.



Patrick

Feb 11, 2013, 12:12:58 PM
to openr...@googlegroups.com
Thanks Thad, and sorry about the delay. I'm not sure I understand what you're saying, though (English is not my first language): are you making a general statement about this annoying behavior, or are you expecting an answer from me?
Patrick

Patrick

Feb 12, 2013, 9:31:39 AM
to openr...@googlegroups.com
Maybe the question is stupid, but how do people do this kind of reconciliation work? I don't have the feeling that I'm doing anything very special or using special reasoning. I'm only scanning a huge list of names, trying to reconcile them. As soon as I accept the candidate proposed by the reconciliation service, I'm sent back to the top of the list. How do people proceed?
Patrick

Thad Guidry

Feb 12, 2013, 9:35:55 AM
to openr...@googlegroups.com
Patrick, 

For me, working from the top of the list going down is acceptable and that's how I do it.

For you, there is a reason that you do not want to work from the top of the list.  What is that reason?

Patrick

Feb 12, 2013, 10:02:28 AM
to openr...@googlegroups.com
I am working from the top of the list, Thad. The point is that (and sorry if it was not clear from the beginning), as the work progresses, I process rows that are far from the top (say row r=765). If I accept the candidate for row 765, I am sent back to row 1 and find it wearisome to have to click Next many, many times to return to row r+1 (766). If I accept the candidate for 766 too, I'm back at row 1 again and have to click many, many times to return to row 767, and so on.
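To put a number on the clicking described here, the arithmetic can be sketched in Python. The page size of 50 rows is an assumption, the function names are hypothetical, and row numbers are treated as 1-based positions in the current view.

```python
def next_clicks(row_number, page_size=50):
    """Clicks on Next needed to reach the page holding a 1-based row position."""
    return (row_number - 1) // page_size


def total_clicks(first_row, last_row, page_size=50):
    """Clicks needed if every row in the range sends the view back to page 1."""
    return sum(next_clicks(r, page_size) for r in range(first_row, last_row + 1))


print(next_clicks(765))        # 15
print(total_clicks(765, 767))  # 45
```

Under these assumptions, each accepted match around row 765 costs 15 clicks before the next row can be examined.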

Thad Guidry

Feb 12, 2013, 10:23:07 AM
to openr...@googlegroups.com
Are you working in Records mode or Row mode?

Patrick

Feb 12, 2013, 11:10:49 AM
to openr...@googlegroups.com
Row mode.

Thad Guidry

Feb 12, 2013, 11:42:42 AM
to openr...@googlegroups.com
That might explain things. Perhaps you really want to use Record mode? Perhaps your facets are ordering against records/rows rather than just rows? Could you get a screenshot of it?

Tom Morris

Feb 12, 2013, 12:03:56 PM
to openr...@googlegroups.com
On Tue, Feb 12, 2013 at 10:02 AM, Patrick <agin.p...@gmail.com> wrote:
> I am working from the top of the list, Thad. The point is that (and sorry if it was not clear from the beginning), as the work progresses, I process rows that are far from the top (say row r=765). If I accept the candidate for row 765, I am sent back to row 1 and find it wearisome to have to click Next many, many times to return to row r+1 (766). If I accept the candidate for 766 too, I'm back at row 1 again and have to click many, many times to return to row 767, and so on.


I think what people are suggesting (at least I am) is that you organize your work flow so that you *ARE* always working at the top of the list.

As you consider each item, either reconcile it or flag it as new.  If you really want to keep a "later" or "maybe" bin, use the star/flag facets: flag or star the items that you've processed and use the facet to exclude them from the display.

Does that make sense?

Tom 

Patrick

Feb 12, 2013, 12:37:32 PM
to openr...@googlegroups.com
Yes I could work with the flags. The workflow would be:

1. Starting at row 1, analyze each item and flag it if it requires some post-processing.
2. When 50 items are flagged (to be sure they all fit on one page), filter the rows by flag and process all the flagged items.
3. Return to the original list. Click Next many times to return to the row where I was when I flagged the 50th item.

It would reduce the number of times I have to click to return to the current row but it's a little bit annoying, isn't it?

I could also add a facet to filter names that begin with A (or names that have length=x, etc)... Wearisome too.

I've included a screenshot of my first page. Records mode or row mode, it's the same anyway.

Thanks again for your help guys,
Patrick
[Attachment: screenshot OpenRefine.png]

Tom Morris

Feb 12, 2013, 2:38:18 PM
to openr...@googlegroups.com
You aren't taking full advantage of the facets.  You can have as many facets as you want and combine them in different ways to filter down to a small range of rows.

I'd start by adding the score facet that I mentioned before (Reconcile->Facets->Best Candidate's Score).  Uncheck the "error" box and it will get rid of all the entries like Hervé Témime who have no candidate listed at all.  If you then narrow the selected score range to something like 90-100, you can reconcile all the high confidence matches quickly.  Then move the selection to 85-90 or whatever chunks work best.

If that's still not specific enough, perhaps use the "Best candidates edit distance" facet or one of the other facets.

As a last resort, use the star facet filtered to "not starred" and then star every entry that you've looked at.
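The chunked review described above can be modeled in a few lines of Python. This is a sketch of the filtering logic only, not OpenRefine code: the row dicts, the `score_chunk` helper and every score value are made up for illustration, and a score of `None` stands for a row with no candidate (the rows hidden by unchecking "error").

```python
def score_chunk(rows, low, high):
    """Unjudged rows whose best-candidate score lies in [low, high]."""
    return [r for r in rows
            if r["judgment"] == "none"
            and r["score"] is not None  # no candidate at all: hidden
            and low <= r["score"] <= high]


rows = [
    {"name": "Madonna", "score": 98, "judgment": "none"},
    {"name": "Bono", "score": 90, "judgment": "none"},
    {"name": "Smith", "score": 87, "judgment": "none"},
    {"name": "Rockefeller", "score": 60, "judgment": "none"},
    {"name": "Hervé Témime", "score": None, "judgment": "none"},
]

batches = []
for low, high in [(90, 100), (85, 90)]:  # highest-confidence chunk first
    chunk = score_chunk(rows, low, high)
    batches.append([r["name"] for r in chunk])
    for r in chunk:
        r["judgment"] = "matched"  # stands in for the reviewer's decision

print(batches)
```

Because judged rows drop out of the view, a row scoring exactly 90 is handled in the first chunk and does not reappear in the second.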

Tom


Patrick

Feb 12, 2013, 3:58:45 PM
to openr...@googlegroups.com
That's what I do (process items by score range, e.g. 90-100), but I have to add facets (best candidate's edit distance, flags, etc.) to narrow the result (I have thousands and thousands of names). Don't you think it would be easier to check items one by one on the entire list if the "back to row 1" problem did not occur? Having to add facets is a little bit tiring.

By the way, do not misunderstand me: OpenRefine is a great tool! :)