Spell Check and Replace


Jevon, Graham

Mar 8, 2021, 9:59:38 AM
to openr...@googlegroups.com

Hi

 

Has anyone ever used OpenRefine to identify and correct spelling errors? Are there any obvious methods for this?

 

I have datasets from Excel in which some cells contain lengthy sentences, and these often contain spelling mistakes. What I am looking for is something like a version of the word facet that cross-references the words against a dictionary and presents only the unknown words (words not found in the dictionary), but that doesn’t appear to exist as a menu option. An even better option would be a function like that which also worked a bit like the cluster and edit function (e.g. a spell check and edit function that lists words not found in a dictionary and offers an easy way to replace them).
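
To make the "unknown word facet" idea concrete, here is a minimal Python sketch that runs outside OpenRefine. It counts every word across a list of cell values that is missing from a dictionary, much as a word facet counts all words. The function name `unknown_word_facet`, the sample cells and the toy dictionary are all hypothetical; a real word list would have to come from elsewhere.

```python
import re
from collections import Counter

# Words are runs of letters and apostrophes; this is a simplifying assumption.
WORD_RE = re.compile(r"[A-Za-z']+")

def unknown_word_facet(cells, dictionary):
    """Count words across all cells that are not in `dictionary` (case-insensitive)."""
    known = {w.lower() for w in dictionary}
    counts = Counter()
    for cell in cells:
        for word in WORD_RE.findall(cell or ""):
            w = word.lower().strip("'")
            if w and w not in known:
                counts[w] += 1
    return counts

# Hypothetical sample data and a toy dictionary, for illustration only.
cells = [
    "The manuscript was recieved in good condition.",
    "Binding is damaged; manuscript recieved late.",
    "Good conditon overall.",
]
dictionary = [
    "the", "manuscript", "was", "received", "in", "good", "condition",
    "binding", "is", "damaged", "late", "overall",
]

facet = unknown_word_facet(cells, dictionary)
for word, count in facet.most_common():
    print(f"{word}\t{count}")
```

Sorting by count puts the most frequent unknown words first, which is roughly how a facet sorted by count would present them.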

 

As this is a fairly immediate problem, before I make a feature request I thought I’d see whether anyone already has a way of solving a similar problem.

 

OpenRefine doesn’t seem to recognise pyspellchecker, so it seems my best option at the moment is to create a Python script for this purpose outside of OR.
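
A script of that kind, run outside OpenRefine, might look roughly like the sketch below. It is an illustration only: it uses the standard library's `difflib` as a stand-in for a real spell checker such as pyspellchecker, and the helper names `suggest` and `apply_corrections`, the toy dictionary and the sample sentence are hypothetical.

```python
import difflib
import re

def suggest(word, dictionary, n=3):
    """Return up to n close dictionary matches for `word`.

    Uses difflib string similarity, not a real spell-checking model.
    """
    return difflib.get_close_matches(word.lower(), dictionary, n=n, cutoff=0.8)

def apply_corrections(text, corrections):
    """Replace whole words in `text` using a {misspelling: correction} mapping.

    Keys must be lowercase; matching is case-insensitive and an initial
    capital letter in the original word is preserved.
    """
    if not corrections:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in corrections) + r")\b",
        re.IGNORECASE,
    )

    def fix(match):
        original = match.group(0)
        replacement = corrections[original.lower()]
        if original[0].isupper():
            return replacement[0].upper() + replacement[1:]
        return replacement

    return pattern.sub(fix, text)

# Hypothetical toy dictionary and cell value, for illustration only.
dictionary = ["received", "condition", "manuscript"]
suggestions = suggest("recieved", dictionary)
corrected = apply_corrections(
    "Manuscript recieved; Recieved late.",
    {"recieved": "received"},
)
print(suggestions)
print(corrected)
```

Keeping suggestion and replacement separate leaves room for a manual review step between them, similar to confirming merges in cluster and edit.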

 

Thanks

 

Graham


 
******************************************************************************************************************
Experience the British Library online at www.bl.uk
The British Library’s latest Annual Report and Accounts : www.bl.uk/aboutus/annrep/index.html
Help the British Library conserve the world's knowledge. Adopt a Book. www.bl.uk/adoptabook
The Library's St Pancras site is WiFi - enabled
*****************************************************************************************************************
The information contained in this e-mail is confidential and may be legally privileged. It is intended for the addressee(s) only. If you are not the intended recipient, please delete this e-mail and notify the postm...@bl.uk : The contents of this e-mail must not be disclosed or copied without the sender's consent.
The statements and opinions expressed in this message are those of the author and do not necessarily reflect those of the British Library. The British Library does not take any responsibility for the views of the author.
*****************************************************************************************************************
Think before you print

Owen Stephens

Mar 9, 2021, 7:27:25 AM
to OpenRefine
Coincidentally, there's been a discussion of spell checking on a GitHub issue in the last week:
https://github.com/OpenRefine/OpenRefine/issues/3688

My comment there was: "Spellcheck seems like it might be do-able but I feel like it's on the edge of the intended functionality of OpenRefine. My first thought is that an Extension might be the best approach to adding this type of functionality."
The issue also discusses the potential overlap of a "spellcheck" function with NER (although this may depend on the exact use case). The NER extension is available at https://github.com/stkenny/Refine-NER-Extension

Another approach might be to use a spellcheck API (e.g. https://www.microsoft.com/en-us/bing/apis/bing-spell-check-api)

Despite my comment above that spellcheck feels on the edge of the intended functionality of OpenRefine, I think two independent requests within a week suggest there is some demand for it :) A full discussion would be worthwhile, to help us understand what this functionality would look like in OR (whether that's part of the core product or through an extension).

Owen

Tom Morris

Mar 9, 2021, 1:21:30 PM
to openr...@googlegroups.com
> Coincidentally there's been a discussion of spell checking on a Github issue in the last week
> https://github.com/OpenRefine/OpenRefine/issues/3688

I've reopened that issue to provide a place for discussion.

Graham - Please provide details about the form that you'd like to see this take.

Tom

Jevon, Graham

Apr 1, 2021, 11:19:40 AM
to openr...@googlegroups.com
Thanks Tom (and Owen) for flagging this other spell check feature request. It does feel like they were thinking on similar lines.

I've just added a comment outlining my initial thoughts.

Sorry it was a bit delayed. I went on annual leave and didn’t get a chance to look at this until now.

Thanks

Graham


Antonin Delpeuch (lists)

Apr 2, 2021, 3:18:41 AM
to openr...@googlegroups.com
I am really impressed by your contribution to the discussion, Graham!
The feature request is much clearer and a lot more convincing now. Very
nice!

Antonin