Provide the name of the keying function and the number of clusters

231 views
Skip to first unread message

Dawayne eason

unread,
Jan 26, 2021, 3:21:26 PM1/26/21
to OpenRefine
I have been asked to:
  Select the column name and apply the Text Facet (Facet → Text Facet). Cluster by using (Edit Cells → Cluster and Edit …) this opens a window where you can choose different “methods” and “keying functions” to use while clustering. Choose the keying function that produces the smallest number of clusters under the “Key Collision” method. Click ‘Select All’ and ‘Merge Selected & Close’ 

I'm performing these functions in Open Refine and the only other option under "Key Collision" is "Nearest Neighbor".  Am I supposed to try each option and see the Clusters Found on the right hand corner before merging? 

Owen Stephens

unread,
Jan 27, 2021, 5:57:38 AM1/27/21
to OpenRefine
Hi Dawayne,

I'd recommend taking a look at the documentation on using the Clustering functions in OpenRefine as a starting point 

This should help you understand how the clustering and methods work. Once you've had a look at that feel free to ask further questions about clustering here.

Best wishes

Owen

Tom Morris

unread,
Jan 27, 2021, 5:47:36 PM1/27/21
to openr...@googlegroups.com
I'll second Owen's recommendation, but I'm curious about:

On Tue, Jan 26, 2021 at 3:21 PM 'Dawayne eason' via OpenRefine <openr...@googlegroups.com> wrote:
I have been asked to:
 
Who is asking you to do this? A teacher? A client?

Tom

Dawayne Eason

unread,
Jan 27, 2021, 7:08:26 PM1/27/21
to openr...@googlegroups.com
I'm doing a certification course with my job and I don't have much experience with Open Refine.  So this is why I was asking.


--
You received this message because you are subscribed to a topic in the Google Groups "OpenRefine" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openrefine/V87nU4GtGmg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/openrefine/CAE9vqEFJxgPbNRyw%2BSaPe1a0dUR3fn4AMA4RctiBvYs9drXyQw%40mail.gmail.com
.

Dawayne eason

unread,
Jan 27, 2021, 9:24:28 PM1/27/21
to OpenRefine
I appreciate the link as I actually found that yesterday and it has been a great help.  My only now is I've completed the work. I'm getting back some errors.  If anyone has any reason why please let me know.  I can add more context if needed.
  
1. I am tasked to split a column labeled "animals" into multiple columns without removing the original column.  I did so by adding the "/" as the seperator, but I'm getting a message saying   "original column for "animal" was removed".  I see there were 7 more columns formed, but I still have the original one, but I'm not sure what is the issue.

2. I am tasked to  Replace the null values in the "food" column with the text “Unknown” (Edit Cells - > Transform).  So I used the formula if(value==null,"Unknown",value), but I get an error saying "All blank rows were not replaced with Unknown. You could also have an incorrect amount of Unkown values or incorrect number of rows in your .csv"  I thought that was the correct formula.

3. The last task I'm asked to  Create a new column "new price" with the values 0 or 1 based on the item_description column with the following conditions: If it contains the text “deal” or “offer” or “spend”, then set the value in "new price" as 1, else 0. Provide the GREL expression used to perform this. Convert the text to lowercase before you search for the terms.  So i converted the text tolowercase and the formula I ran was:
If(or(value.contains('deal'),value.contains('offer'),value.contains('spend')),1,0).  I thought that was correct but I keep getting the error: New Price column values were not created after converting csv values to lowercase.  

I have gotten some credit for my work, but I'm trying to see why these three are still showing an error.  Any ideas are appreciative.

Owen Stephens

unread,
Jan 28, 2021, 10:16:34 AM1/28/21
to OpenRefine
> 1. I am tasked to split a column labeled "animals" into multiple columns without removing the original column.  I did so by adding the "/" as the seperator, but I'm getting a message saying   "original column for "animal" was removed".  I see there were 7 more columns formed, but I still have the original one, but I'm not sure what is the issue.

The things I'd be checking:
* Using "Edit columns" -> "Split into several columns..." function
* Using the appropriate separator
* Selecting the correct options headed "After splitting" on the dialogue to split the column

> 2. I am tasked to  Replace the null values in the "food" column with the text “Unknown” (Edit Cells - > Transform).  So I used the formula if(value==null,"Unknown",value), but I get an error saying "All blank rows were not replaced with Unknown. You could also have an incorrect amount of Unkown values or incorrect number of rows in your .csv"  I thought that was the correct formula.

Things I'd be checking
* Is it definitely `null` or `blank` (null or empty string) that need to be replaced?
* Make sure no facets or filters applied when I ran the formula
* Depending if I'm looking for 'null' or 'blank' I'd use the isNull(value) or isBlank(value) functions rather than value==null

> 3. The last task I'm asked to  Create a new column "new price" with the values 0 or 1 based on the item_description column with the following conditions: If it contains the text “deal” or “offer” or “spend”, then set the value in "new price" as 1, else 0. Provide the GREL expression used to perform this. Convert the text to lowercase before you search for the terms.  So i converted the text tolowercase and the formula I ran was:
If(or(value.contains('deal'),value.contains('offer'),value.contains('spend')),1,0).  I thought that was correct but I keep getting the error: New Price column values were not created after converting csv values to lowercase.  

Things I'd be checking:
* Make sure no facets or filters applied when I ran the toLowercase formula
* Make sure no facets or filters applied when I ran the GREL to add a new column

Without seeing the original data, or understanding how the marking is being done that's the best I can do

Dawayne eason

unread,
Jan 29, 2021, 11:39:04 AM1/29/21
to OpenRefine
Thank you so much for the directions and feedback.  This is what I did per the three areas I was not getting correct.

1. I am tasked to split a column labeled "animals" into multiple columns without removing the original column.  I did so by adding the "/" as the seperator, but I'm getting a message saying   "original column for "animal" was removed".  I see there were 7 more columns formed, but I still have the original one, but I'm not sure what is the issue.

The things I'd be checking:
* Using "Edit columns" -> "Split into several columns..." function
* Using the appropriate separator
* Selecting the correct options headed "After splitting" on the dialogue to split the column

So I re-did this and the piece I did incorrectly was you have to uncheck the box which says, "remove original column" when you are updating the separator.  So I did that and it was correct.

 2. I am tasked to  Replace the null values in the "food" column with the text “Unknown” (Edit Cells - > Transform).  So I used the formula if(value==null,"Unknown",value), but I get an error saying "All blank rows were not replaced with Unknown. You could also have an incorrect amount of Unkown values or incorrect number of rows in your .csv"  I thought that was the correct formula.

Things I'd be checking
* Is it definitely `null` or `blank` (null or empty string) that need to be replaced?
* Make sure no facets or filters applied when I ran the formula
* Depending if I'm looking for 'null' or 'blank' I'd use the isNull(value) or isBlank(value) functions rather than value==null

I applied the if(value==null, "Unknown", value) and I tried the IsNull(value) and I still got the same feedack: "ll blank rows were not replaced with Unknown. You could also have an incorrect amount of Unkown values or incorrect number of rows in your .csv"  So I'm still not sure why.

> 3. The last task I'm asked to  Create a new column "new price" with the values 0 or 1 based on the item_description column with the following conditions: If it contains the text “deal” or “offer” or “spend”, then set the value in "new price" as 1, else 0. Provide the GREL expression used to perform this. Convert the text to lowercase before you search for the terms.  So i converted the text tolowercase and the formula I ran was:
If(or(value.contains('deal'),value.contains('offer'),value.contains('spend')),1,0).  I thought that was correct but I keep getting the error: New Price column values were not created after converting csv values to lowercase.  

Things I'd be checking:
* Make sure no facets or filters applied when I ran the toLowercase formula
* Make sure no facets or filters applied when I ran the GREL to add a new column

I check the things you advise to check and still get the same error: "column values were not created after converting csv values to lowercase."  So still not sure, but I did better this time then the last.

Thank you so much for all your advice and help.
Reply all
Reply to author
Forward
0 new messages