Hi Ed,
Well, matching data by people's names is a very difficult task, but it is required to match datasets like this. I'm not 100% certain how the OA team matches up all the data, but I do know how they try to match names: they use this rather gnarly SQL/Python function:
The first giant block normalizes the name, and the last few lines are the rather simple way of matching these normalized strings. This basically means that if two people have the same first + last name after going through the normalization process, OpenAlex will merge them into a single Author.
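To give a rough idea of what this kind of normalize-then-compare matching looks like, here is a minimal Python sketch. It is not the actual OpenAlex function: the names `normalize_name`, `match_key` and `same_author` are made up for illustration, and the normalization steps (strip accents, lowercase, drop punctuation, keep first and last token) are my assumption of a typical approach.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Hypothetical normalizer: strip accents, lowercase, drop punctuation."""
    # Decompose accented characters, then drop the combining marks
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and replace anything that is not a-z or whitespace with a space
    cleaned = re.sub(r"[^a-z\s]", " ", stripped.lower())
    # Collapse runs of whitespace
    return " ".join(cleaned.split())


def match_key(name: str) -> str:
    """Build a 'first last' key from the normalized name (middle parts ignored)."""
    parts = normalize_name(name).split()
    if not parts:
        return ""
    return f"{parts[0]} {parts[-1]}"


def same_author(name_a: str, name_b: str) -> bool:
    """Two names 'match' when their first + last keys are identical."""
    key_a, key_b = match_key(name_a), match_key(name_b)
    return key_a != "" and key_a == key_b


print(same_author("José García", "Jose M. Garcia"))
print(same_author("Jose Garcia", "Juan Garcia"))
```

Note that a rule like this also merges two genuinely different people who happen to share a first and last name, which is exactly the kind of false positive described above.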
This is something that can definitely be improved, but it's pretty difficult to do! In order to do proper automatic matching you will need to code a long, long, long list of specific exceptions to the rules.
For my own application, I perform some additional cleanup on all data I get from OpenAlex (e.g. all publications with an affiliation to my university) to filter out false positives when I retrieve works for my institution. This is way easier to do than trying to fix the entire OpenAlex dataset: not only is this a much smaller subset of items, but I can also use a master list of affiliated authors from my own university, as well as a list of all known publications we keep track of in our repository. I use those lists to grab all the items from the OpenAlex dataset that match known items. The remaining results need some more detailed attention, which can be done in various ways. I'm currently quite pleased with the results of using embeddings on normalized string representations of publications and/or authors to match them.
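As a rough illustration of that two-step workflow, here is a small sketch in plain Python. It is not my production code. To keep it self-contained it uses character-trigram count vectors with cosine similarity as a simple stand-in for real embeddings, and all function names and example titles are hypothetical.

```python
import math
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, replace non-alphanumerics with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def trigram_vector(text: str) -> Counter:
    """Stand-in for an embedding: counts of character trigrams."""
    padded = f"  {normalize(text)} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_known(titles, known_titles):
    """Step 1: exact match on normalized strings against the known list."""
    known = {normalize(t) for t in known_titles}
    matched = [t for t in titles if normalize(t) in known]
    remaining = [t for t in titles if normalize(t) not in known]
    return matched, remaining


def best_match(candidate, references):
    """Step 2: most similar reference for an item that had no exact match."""
    vec = trigram_vector(candidate)
    scored = [(cosine(vec, trigram_vector(ref)), ref) for ref in references]
    return max(scored, default=(0.0, None))


# Hypothetical example data
known_titles = ["Deep Learning for Author Disambiguation"]
retrieved = [
    "Deep learning for author disambiguation.",   # exact after normalization
    "Deep Learning for Author Disambiguaton",     # typo, needs fuzzy matching
    "Soil Erosion in Alpine Valleys",             # likely a false positive
]

matched, remaining = split_known(retrieved, known_titles)
for title in remaining:
    score, ref = best_match(title, known_titles)
    print(f"{score:.2f}  {title!r} -> {ref!r}")
```

In practice you would pick a similarity threshold by inspecting a sample of results by hand, and swap the trigram vectors for whichever embedding model works for your data.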
Cheers,
Samuel