Hi Steve
Basically names are a pain in the ... neck. The problem is that even if the names follow a consistent set of patterns like the ones you've outlined here, you can still get small but significant (in terms of what you want to do) variations within the names. So
Bertillon, Marie-Claude
is a different pattern to both
Bertillon, Phillipe
AND
Bertillion, Chloé
Ettore's solution works well for names like Bertillon, Phillipe, but wouldn't work with Bertillon, Marie-Claude or Bertillion, Chloé
So - for example - if any of the names in your set contain accented or non-Latin characters, you may wish to try the following variation on Ettore's solution:
value.split(", ")[0] + ", " + value.split(", ")[1].replace(/[\p{Ll} ]/, '')
The \p{Ll} in the replace statement picks up accented and non-Latin lowercase letters, which [a-z] doesn't. This 'replace' is using a "regular expression" - if you haven't come across regular expressions before I'd recommend
http://regex.bastardsbook.com as a starting point - but basically they are a way of matching patterns in text strings (not unique to OpenRefine, but used widely in programming and other software)
However, applying this approach across the board would turn Bertillon, Marie-Claude into Bertillon, M-C, which may not be what you want.
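To see what the variation does to each kind of name, here is a minimal Python sketch of the same idea run outside OpenRefine. It is an illustration only, not the GREL itself: the function name abbreviate_forenames is made up for this example, and because Python's built-in re module does not support \p{Ll}, the sketch checks the Unicode category "Ll" (lowercase letter) directly, assuming accented letters are stored as single precomposed characters.

```python
import unicodedata


def abbreviate_forenames(value):
    """Keep the surname and strip lowercase letters and spaces from the forenames.

    Hypothetical helper that mirrors the GREL expression
    value.split(", ")[0] + ", " + value.split(", ")[1].replace(/[\\p{Ll} ]/, '')
    """
    parts = value.split(", ")
    surname, forenames = parts[0], parts[1]
    # Unicode category "Ll" is "Letter, lowercase", the class that \p{Ll} names.
    # Unlike [a-z], it also covers accented letters such as é.
    kept = [
        ch for ch in forenames
        if unicodedata.category(ch) != "Ll" and ch != " "
    ]
    return surname + ", " + "".join(kept)


for name in ["Bertillon, Phillipe", "Bertillion, Chloé", "Bertillon, Marie-Claude"]:
    print(name, "->", abbreviate_forenames(name))
```

With these three names the sketch gives Bertillon, P and Bertillion, C as hoped, but also Bertillon, M-C, because the hyphen is neither a lowercase letter nor a space and so is kept.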
A good workflow in OpenRefine is to filter your project to narrow down to a single set of problems which can be tackled together. Usually you do this using a facet or a filter to get all the rows that follow a particular pattern or exhibit a particular problem, then use a transformation to fix them. There is a trade-off with the complexity of the transformation you need to write - that is to say, the more varied the problems you try to fix with a single transformation, the more complex that transformation has to be.
In this case you could try applying a text filter so that you only transform names which your transform statement will handle successfully. For example, create a Text Filter with the case-sensitive and regular expression boxes checked, and enter a regular expression for this pattern:
Start with any number of characters followed by a comma, then a repeated pattern of: optional space, uppercase char, optionally any number of lowercase chars. Anything that matches this pattern can be transformed using the GREL suggested by Ettore.
If you then 'Invert' this filter (you need OpenRefine 2.8 or later for this - the feature was introduced in 2.8) you can see all the rows that might not work well with Ettore's suggestion, and then try to work out other transformations to fix these variants.
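As a rough illustration of the filter-then-invert idea, here is a small Python sketch. The regular expression in it is only my reconstruction from the description of the pattern above (any characters, a comma, then repeated groups of optional space, uppercase letter, optional lowercase letters), so treat it as an assumption rather than the exact expression to type into the Text Filter. The names NAME_PATTERN and partition_names are made up for the example.

```python
import re

# Hypothetical reconstruction of the filter pattern from its prose description:
# anything up to a comma, then one or more groups of
# (optional space, one uppercase letter, zero or more lowercase letters).
# Matching is case-sensitive, like ticking the case-sensitive box.
NAME_PATTERN = re.compile(r"^.*,(\s?[A-Z][a-z]*)+$")


def partition_names(names):
    """Split names into those the filter matches and those the inverted filter shows."""
    matching = [n for n in names if NAME_PATTERN.search(n)]
    inverted = [n for n in names if not NAME_PATTERN.search(n)]
    return matching, inverted


names = [
    "Bertillon, Phillipe",
    "Bertillon, Marie-Claude",
    "Bertillion, Chloé",
]
matching, inverted = partition_names(names)
print("safe to transform:", matching)
print("needs a closer look:", inverted)
```

Here only Bertillon, Phillipe lands in the first list. The hyphen in Marie-Claude and the é in Chloé both fall outside [A-Z] and [a-z], so those two names end up in the inverted set, which is the set you would then work through with other transformations.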
Owen