row indexing and future needs

Thad Guidry

Jul 20, 2021, 3:08:31 PM

to openref...@googlegroups.com

Early in the design history of OpenRefine, David Huynh considered several issues around Row indexes and Record indexes that are worth reviewing and understanding again. I've quoted David's thoughts below for current reflection. The original thread is here: https://groups.google.com/g/openrefine/c/2Zo3ui5wOfk/m/ScPAgJVTWWUJ

I definitely agree that there's a need for accessing other rows. It's a bit tricky to implement in conjunction with faceted browsing, though, because with some facets or filters applied, "the nth preceding row" might mean 2 different things: either the row addressed by subtracting n from the current row's index, or the nth preceding row that has matched the facets and/or filters. The nth *following* row is even harder because we would need to look ahead.

We might need 2 different syntaxes for the 2 meanings:

  rows[row.index - 1] // this is the first meaning
  row.neighbors[-1] // this is the second meaning
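
To make the difference between the two meanings concrete, here is a minimal Python sketch. It is not OpenRefine code: the list-of-dicts row model, the `matches` predicate standing in for the active facets/filters, and both function names are assumptions made purely for illustration.

```python
# Hypothetical model: a project is a plain list of rows, and `matches` is
# whatever test the active facets/filters impose. Not an OpenRefine API.

def preceding_by_index(rows, index, n):
    """First meaning: rows[row.index - n], ignoring facets and filters."""
    target = index - n
    return rows[target] if 0 <= target < len(rows) else None


def preceding_neighbor(rows, index, n, matches):
    """Second meaning: row.neighbors[-n], the nth preceding row that matches."""
    seen = 0
    # Walk backward from the row just before `index`, counting only matches.
    for i in range(index - 1, -1, -1):
        if matches(rows[i]):
            seen += 1
            if seen == n:
                return rows[i]
    return None


rows = [
    {"id": 0, "keep": True},
    {"id": 1, "keep": False},
    {"id": 2, "keep": False},
    {"id": 3, "keep": True},
]
is_kept = lambda r: r["keep"]

by_index = preceding_by_index(rows, 3, 1)              # the row with id 2
by_neighbor = preceding_neighbor(rows, 3, 1, is_kept)  # the row with id 0
```

With a filter that hides rows 1 and 2, "the preceding row" of row 3 is row 2 under the first meaning and row 0 under the second, which is why a single syntax would be ambiguous.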

And what you're asking is yet another meaning, which perhaps we can express as

  row.neighbors.find(10, r, r.cells["column X"].value == 5)
  // look forward up to 10 rows, bind each row to variable "r",
  // evaluate the expression, and if it's true, return that row

I'd like to understand all the use cases we know before committing to this feature. We want the fewest and simplest building blocks that would solve all those cases.

But note that, however we decide to implement this feature, there is an important complication: operations that involve looking at other rows cannot be implemented in a map/reduce manner in which rows are processed in parallel.

David

Basically, David is describing advanced lookups: searching within a row range (configurable forward or backward) with a condition expression, and returning the row within that range for which the expression evaluates to true.
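
As a rough illustration of that kind of range search, here is a minimal Python sketch. The function name `find_in_range`, its parameters, and the list-of-dicts row model are all hypothetical and not part of OpenRefine or GREL. It also scans by raw row index, so it does not address the facet/filter question David raises.

```python
def find_in_range(rows, index, limit, predicate, forward=True):
    """Scan up to `limit` rows after (or before) `index` and return the
    first row for which `predicate` is true, or None if none is found."""
    step = 1 if forward else -1
    i = index + step
    for _ in range(limit):
        if not 0 <= i < len(rows):
            break  # ran off either end of the project
        if predicate(rows[i]):
            return rows[i]
        i += step
    return None


rows = [{"column X": v} for v in [1, 5, 2, 5, 9]]
is_five = lambda r: r["column X"] == 5

ahead = find_in_range(rows, 0, 10, is_five)                  # rows[1]
behind = find_in_range(rows, 4, 10, is_five, forward=False)  # rows[3]
```

Because each call reads rows other than the current one, a function like this carries the same caveat David mentions: it cannot be evaluated row by row in isolation.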

I'm not sure whether we ever captured that in a feature request or issue, but I thought we might contemplate it here.

My view is that a search within a row/record range is a more specific faceting mechanism, or could be the basis of an advanced faceting dialog for conditional lookups.

A starting point might be new functions that allow the kind of range search David had in mind.

Thad

https://www.linkedin.com/in/thadguidry/

https://calendly.com/thadguidry/
