Facet by either is not null or value count within record?

olivia solis

unread,

Feb 19, 2018, 3:03:54 PM2/19/18

to OpenRefine

Hello all,

I'm trying to figure out something that seems like it should be relatively easy. I'm working with a dataset that has inventory items that are supposed to refer to boxes with unique barcodes. (The inventory item representing, e.g. a folder within that box). Alas, this is not always the case, and two different boxes may be listed as having the same barcode, which can't be. To figure out which barcodes have more than one box associated with them, I've moved the barcode to the beginning of the project and blanked down to give each barcode a record.

Right now my project looks like this:

barcode	box #	inventory item
0000001	1	1234
0000002	2	1235
		1236
	3	1237
		1238
0000003	4	1239
		1240

Though I can easily make it look like this:

barcode	box #	inventory item
0000001	1	1234
0000002	2	1235
	2	1236
	3	1237
	3	1238
0000003	4	1239
	4	1240

Is there a way I can create a facet on the "box #" column that either identifies if more than one row within a record is not null (how my project is now) or a facet that tells me if there is more than one value within a record (how my project could be). I want to keep each inventory item #, which has other info associated with it that can be used to figure out whether it is the box number or barcode that is the problem.

Thanks. I'm sure there must be an easy way to do this.

Olivia

olivia solis

unread,

Feb 19, 2018, 3:13:17 PM2/19/18

to OpenRefine

I also should have explained. I also temporarily joined the box number + inventory item columns, did this, and then separated the columns again so that inventory items were sorted by box within a record. Then I split the box and inventory columns, and blanked down the boxes.

Thad Guidry

unread,

Feb 19, 2018, 3:41:52 PM2/19/18

to openr...@googlegroups.com

Just add facets like crazy to get what you want. :) Your not limited to just 1 facet.

Facets -> Customized facets -> Duplicates facet

perform that on your box # column

Facets -> Customized Facet -> Facet by blank

perhaps on your barcode column ?

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

olivia solis

unread,

Feb 19, 2018, 5:27:37 PM2/19/18

to OpenRefine

Thank you, Thad, for your response! Hmmm. Well if you or anyone has any good suggestions, please let me know. I also think I didn't provide a good enough representation of my extremely messy 100k+ record dataset.

barcode	box #	inventory item
0000001	1	1234
0000002	2	1235
	2	1236
	3	1237
	3	1238
0000003	4	1239
	4	1240

0000004	5	1241
	6	1242
	6	1243
0000005	7	1244
	8	1245
0000006	8	1246
	8	1247

What I really want to do is get rid of the records that are fine. i.e. barcodes that only have one box name associated with them. In the above example, those would be 0000001 and 0000003. Faceting by blank on the barcode column (in record mode), which I'd tried, doesn't help much because clicking on "false" would also grab every other record in the set since the line with the barcode is never blank. But that really wouldn't work conceptually anyway since it would also grab 0000003, which is fine. I should have mentioned that a box # may be associated with multiple barcodes. So a duplicates facet wouldn't do the job in the above example, because box 8 appears in 0000005 and 0000006, and the latter is fine. (this is another project, but essentially the same fix)

So, I'm wondering if there is some custom facet that might that tells me if within a record there is more than one box value. Or even a column I could add that would tell me such. I'll keep playing around with the facets to see if I can come up with anything.

Thanks again!

-Olivia

olivia solis

unread,

Feb 19, 2018, 5:32:05 PM2/19/18

to OpenRefine

Sorry, the ones that are good in the third example are 0000001, 0000003, and 0000006!

olivia solis

unread,

Feb 19, 2018, 6:08:42 PM2/19/18

to OpenRefine

OK, so I found the answer to my own question:

http://kb.refinepro.com/2015/12/check-if-value-in-one-record-are-unique.html

This worked!

Thad Guidry

unread,

Feb 19, 2018, 6:33:24 PM2/19/18

to openr...@googlegroups.com

Great !

You probably would like a 1 click button for that Unique Records GREL syntax , yes ?

Added Enhancement
https://github.com/OpenRefine/OpenRefine/issues/1504

olivia solis

unread,

Feb 19, 2018, 7:35:10 PM2/19/18

to OpenRefine

That would be lovely, though I may be the only one!

Ettore Rizza

unread,

Feb 20, 2018, 12:46:08 AM2/20/18

to OpenRefine

I don't understand the sense of forEach() in this formula. This gives exactly the same result:

row.record.cells['box #'].value.uniques().length()

I suspect that Martin's blog post was referring to an older version of OR.

olivia solis

unread,

Feb 20, 2018, 8:44:32 AM2/20/18

to OpenRefine

I think it is old. Looking at the URL of the blog post, it was written in 2015.

Reply all

Reply to author

Forward