For anyone revisiting this thread in the future: I resolved my issue with merging directors' data by using the Splink Python package. Splink links datasets on several criteria at once, in my case CUSIP, year, and director name. It worked well, requiring exact matches on CUSIP and year while matching names probabilistically between the Compustat and RiskMetrics datasets. There is an initial learning curve, but Splink's documentation is comprehensive, and this pivotal guide in particular proved invaluable.
To make matching more efficient, I split full names into first, middle, and last components and created a 'cy' column by concatenating the CUSIP and year fields, which I used as the blocking key in Splink. Blocking on this key greatly reduced the number of record pairs that had to be compared during matching. My Splink settings specified the link type and blocking rules, along with a set of comparisons and deterministic rules that use Jaro-Winkler similarity for robust name matching.
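
As a rough illustration of the idea (not the actual Splink configuration, and not Splink's API), the preprocessing and blocking steps can be sketched in plain Python. Everything here is an assumption made for illustration: the record layout, the helper names (`split_name`, `blocking_key`, `candidate_pairs`, `name_score`), the fake CUSIPs and director names, and the simple averaged Jaro-Winkler score, which stands in for Splink's probabilistic model rather than reproducing it.

```python
from collections import defaultdict


def split_name(full_name):
    """Split a full name into (first, middle, last); middle may be empty."""
    parts = full_name.strip().split()
    if not parts:
        return ("", "", "")
    if len(parts) == 1:
        return (parts[0], "", "")
    return (parts[0], " ".join(parts[1:-1]), parts[-1])


def blocking_key(cusip, year):
    """Concatenate CUSIP and year into one 'cy' key (format is an assumption)."""
    return f"{cusip}{year}"


def jaro(s1, s2):
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro-Winkler: Jaro score boosted for a shared prefix of up to 4 chars."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def candidate_pairs(left, right):
    """Yield only the record pairs that share the same 'cy' blocking key."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[blocking_key(rec["cusip"], rec["year"])].append(rec)
    for rec in left:
        for other in blocks.get(blocking_key(rec["cusip"], rec["year"]), []):
            yield rec, other


def name_score(name_a, name_b):
    """Average Jaro-Winkler over first and last names (a simplification)."""
    first_a, _, last_a = split_name(name_a.lower())
    first_b, _, last_b = split_name(name_b.lower())
    return (jaro_winkler(first_a, first_b) + jaro_winkler(last_a, last_b)) / 2


# Hypothetical records; CUSIPs and names are made up.
compustat = [
    {"cusip": "111111111", "year": 2010, "name": "John A. Smith"},
    {"cusip": "222222222", "year": 2010, "name": "Mary Jones"},
]
riskmetrics = [
    {"cusip": "111111111", "year": 2010, "name": "Jon Smith"},
    {"cusip": "111111111", "year": 2011, "name": "John Smith"},
    {"cusip": "222222222", "year": 2010, "name": "Maria Jones"},
]

pairs = list(candidate_pairs(compustat, riskmetrics))
scores = [(a["name"], b["name"], name_score(a["name"], b["name"])) for a, b in pairs]
```

With these toy inputs, blocking leaves 2 candidate pairs instead of the 6 that a full cross join would produce, which is where the efficiency gain comes from. In Splink itself the same role is played by a blocking rule on the 'cy' column, and the name comparisons feed a probabilistic model rather than a plain average.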