Separate names and ID numbers entered in the same cell


Carolina

Jul 22, 2022, 6:40:20 PM
to OpenRefine
Hello community
I have a column of string data and I am learning how to use OpenRefine for cleaning, but I am stuck on these two cases:

I would like to apply a function that removes the 10-digit ID numbers that appear before people's names, for example: 1104512456 Maria Juana Perez Martinez Perez.
I used replace(value, /\d/, ""), but it does not work for the other records, because some valid records also contain numbers, for example "Ministry of education zone 7" or "Educational Unit 27 February". So I want to identify only the records that have 10 digits before the person's name and keep just the two given names and the two surnames.

I also have records that consist only of numbers, which should not be there. What function could I apply to identify this group and remove them?

Could you please help me with any ideas or suggestions on how I can do this?
 
Thank you very much
Regards
[Attachments: Imagen 1.png, imagen2.png]

Thad Guidry

Jul 22, 2022, 10:26:37 PM
to openr...@googlegroups.com
You could try to use a GREL startsWith() function depending on your needs, but I think you are looking more for something that could be used in a Custom Text Facet, so that you can play around with the patterns, or even make multiple Custom Text Facets as you discover more patterns you need to filter with.

So a good starting point for your example might be:

value.partition(/^\d+/)

You can also use a simple Text filter (with the "regular expression" option checked) for quick analysis with:

^\d+

This matches values that start (^) with one or more digits (\d+).
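
To see how such patterns separate the cases from the original question, here is a minimal Python sketch, outside OpenRefine. The two patterns and the `classify` helper are illustrative assumptions, not part of the suggestion above: one pattern narrows "starts with digits" to exactly 10 digits followed by a name, the other catches values made up of digits only.

```python
import re

# Hypothetical patterns for illustration only:
# exactly 10 digits, then whitespace, then at least one more character
ID_THEN_NAME = re.compile(r"^\d{10}\s+\S")
# nothing but digits in the whole value
ONLY_DIGITS = re.compile(r"\d+")


def classify(value: str) -> str:
    """Label a cell value as 'only digits', 'id + name', or 'other'."""
    text = value.strip()
    if ONLY_DIGITS.fullmatch(text):
        return "only digits"
    if ID_THEN_NAME.match(text):
        return "id + name"
    return "other"


samples = [
    "1104512456 Maria Juana Perez Martinez Perez",
    "Ministry of education zone 7",
    "Educational Unit 27 February",
    "1104512456",
]
labels = {s: classify(s) for s in samples}
```

The same regular expressions should carry over to a Text filter or a Custom Text Facet, but try them on your own column first, since real data often has patterns the samples do not show.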




Thad Guidry

Jul 22, 2022, 10:39:03 PM
to openr...@googlegroups.com
After analysis with some Facets, which also let you filter down to only the rows/records you want to view and work on, you probably now want to modify some values.

You can use the same partition() function to do this:
1. Edit column
2. Add column based on this column
3. Use value.partition(/^\d+/)[2] as the expression

partition() splits the string into an array of three parts based on the pattern: the text before the match, the match itself, and the text after it. Index [2] takes the third part (indexes 0, 1, 2) as the value for the new column.
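
The three-part split can be sketched in plain Python to see what index [2] holds. The `partition` helper below is a hypothetical stand-in, not OpenRefine code. How it treats values with no match is an assumption modelled on Python's own `str.partition`, so check the actual GREL result on a non-matching row before relying on it.

```python
import re


def partition(value: str, pattern: str) -> list:
    """Split value into [before, match, after] around the first regex match.

    Hypothetical helper for illustration. When nothing matches, it returns
    [value, "", ""], mirroring Python's str.partition (an assumption here).
    """
    m = re.search(pattern, value)
    if m is None:
        return [value, "", ""]
    return [value[:m.start()], m.group(0), value[m.end():]]


parts = partition("1104512456 Maria Juana Perez Martinez Perez", r"^\d+")
# parts[0] is empty (nothing precedes the ID), parts[1] is the ID,
# parts[2] is the rest, including the space that followed the digits.
name = parts[2].strip()

no_id = partition("Ministry of education zone 7", r"^\d+")
```

Note that the third part keeps the space that followed the digits. In OpenRefine, appending .trim() to the expression should remove it. Under the assumption above, rows without a leading number give an empty third part, so it is safer to apply the transformation only to rows selected by a facet or filter.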

Carolina

Jul 28, 2022, 9:46:03 AM
to OpenRefine
I tried it and it worked for me. Thank you very much for your help.

Thad Guidry

Jul 28, 2022, 9:56:26 AM
to openr...@googlegroups.com
You are welcome!  Glad that things worked out.  Also we have rpartition() if you ever need to work backwards from the end of a string.

Carolina

Jul 28, 2022, 9:59:11 AM
to OpenRefine

Help with refine-client-py in Jupyter Notebook.

Now I want to try the OpenRefine Python Client Library to use the clustering methods. My question is whether it is currently possible to work with this library from a Jupyter Notebook.
If so, could you please tell me what process I should follow to make it work correctly?
At the moment I already have refine-client-py downloaded and Jupyter Notebook running.

Thanks in advance

