Separate names and ID numbers entered in the same cell

Carolina

Jul 22, 2022, 6:40:20 PM

to OpenRefine

Hello community
I have a string column and I am learning how to use OpenRefine for data cleaning, but I have run into these two cases:

I would like to apply a function that removes the 10-digit ID numbers that appear before people's names, for example: 1104512456 Maria Juana Perez Martinez Perez.
I used the function replace(value, /\d/,"") but it does not work for the other records, because it removes every digit, and some correct records legitimately contain numbers, for example Ministry of education zone 7 or Educational Unit 27 February. So I want to identify the records that have 10 digits before the person's name and keep only the two given names and the two surnames.
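The difference between removing every digit and removing only a leading ID can be sketched in Python with the standard `re` module. This is an illustration of the regex idea, not OpenRefine code: the helper name `strip_leading_id` is hypothetical, and the pattern assumes the ID is exactly 10 digits at the very start of the value, followed by whitespace. In GREL the same anchored pattern could presumably be passed to replace(), e.g. value.replace(/^\d{10}\s+/, ""), though that has not been tested here.

```python
import re

# Hypothetical helper: drop a 10-digit ID only when it sits at the very start
# of the value and is followed by whitespace. Digits elsewhere are left alone.
LEADING_ID = re.compile(r"^\d{10}\s+")


def strip_leading_id(value: str) -> str:
    return LEADING_ID.sub("", value)


samples = [
    "1104512456 Maria Juana Perez Martinez Perez",
    "Ministry of education zone 7",
    "Educational Unit 27 February",
]

for s in samples:
    # The unanchored /\d/ pattern removes every digit, damaging the last two values;
    # the anchored pattern only touches the first one.
    print(repr(re.sub(r"\d", "", s)), "->", repr(strip_leading_id(s)))
```

The `^` anchor is what keeps legitimate numbers such as "zone 7" intact.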

I also have some records that contain only numbers and should not be there. What function could I apply to identify this group and remove it?
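Identifying values made up only of digits comes down to a fully anchored pattern such as `^\d+$`. A minimal Python sketch of that test, assuming surrounding whitespace should be ignored; `is_only_digits` is a hypothetical helper name. In OpenRefine, the same pattern in a regex Text Filter should isolate these rows so they can then be deleted.

```python
import re


# Hypothetical helper: True when the whole value (ignoring surrounding
# whitespace) consists of digits only, e.g. a bare ID with no name.
def is_only_digits(value: str) -> bool:
    return re.fullmatch(r"\d+", value.strip()) is not None


rows = [
    "1104512456 Maria Juana Perez Martinez Perez",
    "1104512456",
    "Ministry of education zone 7",
]

# Keep every row that is not a bare number.
kept = [r for r in rows if not is_only_digits(r)]
print(kept)
```

`re.fullmatch` requires the pattern to cover the entire string, which is what separates "1104512456" from "1104512456 Maria ...".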

Could you please help me with any ideas or suggestions on how to do this?

Thank you very much
Regards

Thad Guidry

Jul 22, 2022, 10:26:37 PM

to openr...@googlegroups.com

You could try the GREL startsWith() function, depending on your needs, but I think what you are really looking for is something you can use in a Custom Text Facet. That way you can experiment with patterns, or even create multiple Custom Text Facets as you discover more patterns you need to filter on.

So a good starting point for your example might be:

value.partition(/^\d+/)
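To see what value.partition(/^\d+/) hands back, here is a rough Python imitation. The function below is hypothetical and only mimics GREL's partition() as understood here: split around the first regex match into [before, match, after]. The no-match result of [value, "", ""] is an assumption about GREL's behaviour, not something confirmed in this thread.

```python
import re


# Hypothetical Python imitation of GREL's value.partition(/regex/):
# split around the first match into [before, match, after].
def partition(value: str, pattern: str) -> list[str]:
    m = re.search(pattern, value)
    if m is None:
        # Assumed GREL behaviour: the whole value lands in the first slot.
        return [value, "", ""]
    return [value[: m.start()], m.group(0), value[m.end():]]


print(partition("1104512456 Maria Juana Perez Martinez Perez", r"^\d+"))
print(partition("Ministry of education zone 7", r"^\d+"))
```

Values that start with an ID produce a non-empty middle element, which is what makes the expression useful for separating the two groups in a facet.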

You can also use a simple Text Filter (with the "regular expression" option checked) for quick analysis:

^\d+

This matches values that start (^) with one or more digits (\d+).
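A small Python sketch of which values the `^\d+` filter would keep, assuming the Text Filter looks for a match anywhere in the value (so the `^` anchor is what restricts it to the start). The sample values come from the question.

```python
import re

# The Text Filter pattern: starts (^) with one or more digits (\d+).
STARTS_WITH_DIGITS = re.compile(r"^\d+")

values = [
    "1104512456 Maria Juana Perez Martinez Perez",
    "Ministry of education zone 7",
    "Educational Unit 27 February",
    "1104512456",
]

# re.search looks for a match anywhere; the ^ anchor limits it to the start.
matching = [v for v in values if STARTS_WITH_DIGITS.search(v)]
print(matching)
```

Note that this filter also catches the digit-only records from the second question; a stricter pattern such as `^\d{10}\s` would keep only IDs followed by a name.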

Thad

https://www.linkedin.com/in/thadguidry/

https://calendly.com/thadguidry/

Thad Guidry

Jul 22, 2022, 10:39:03 PM

to openr...@googlegroups.com

After analyzing with some Facets, which also let you filter so that you only view and work on those rows/records, you will probably want to modify some values.

You can use the same partition() function to do this:

1. Edit column

2. Add column based on this column...

3. Use the expression value.partition(/^\d+/)[2]

This partitions the string into an array of three parts (the text before the match, the match itself, and the text after it) and takes only the third part (index 2, counting from 0) as the value for the new column.
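The effect of taking index 2 can be sketched in Python. The block repeats a hypothetical partition() stand-in so it runs on its own; its no-match behaviour ([value, "", ""]) is an assumption about GREL. Two details show up: the third part keeps the space between the ID and the name, and values without a leading ID come out empty, which is why filtering with a facet first matters. Appending .trim() in GREL should remove the leading space, as strip() does here.

```python
import re


def partition(value: str, pattern: str) -> list[str]:
    # Hypothetical stand-in for GREL partition(): [before, match, after].
    m = re.search(pattern, value)
    if m is None:
        return [value, "", ""]
    return [value[: m.start()], m.group(0), value[m.end():]]


def new_column_value(value: str) -> str:
    # Equivalent of value.partition(/^\d+/)[2], plus strip() for the
    # leading space left between the ID and the name.
    return partition(value, r"^\d+")[2].strip()


print(repr(new_column_value("1104512456 Maria Juana Perez Martinez Perez")))
print(repr(new_column_value("Ministry of education zone 7")))
```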

Thad

Carolina

Jul 28, 2022, 9:46:03 AM

to OpenRefine

I tried it and it worked for me. Thank you very much for your help.

Thad Guidry

Jul 28, 2022, 9:56:26 AM

to openr...@googlegroups.com

You are welcome! Glad that things worked out. Also we have rpartition() if you ever need to work backwards from the end of a string.
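The first-versus-last distinction can be illustrated with Python's built-in str.partition() and str.rpartition(), which split a string at the first and the last occurrence of a plain separator respectively. This assumes GREL's rpartition() follows the same idea of splitting at the last occurrence; the Python methods are only an analogy.

```python
# str.rpartition splits at the LAST occurrence of the separator and
# returns a (before, separator, after) tuple.
value = "Ministry of education zone 7"
before, sep, after = value.rpartition(" ")
print(before)  # Ministry of education zone
print(after)   # 7

# str.partition, by contrast, splits at the FIRST occurrence.
print(value.partition(" ")[2])  # of education zone 7
```

Working from the end like this is handy when the part you want, such as a trailing number, sits at the end of the value.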

Thad

Carolina

Jul 28, 2022, 9:59:11 AM

to OpenRefine

Help with refine-client-py in Jupyter Notebook.

Now I want to try the OpenRefine Python Client Library (refine-client-py) so that I can use its clustering methods. My question is whether it is currently possible to use this library from a Jupyter Notebook.
If so, could you please tell me what process I should follow to get it working correctly?
I have already downloaded refine-client-py and have Jupyter Notebook running.

Thanks in advance

On Friday, July 22, 2022 at 21:39:03 UTC-5, thadg...@gmail.com wrote: