Hi,
I have been using the LDIF single and Hadoop instances on a test basis and need help selecting the best transformation to use in the linkspec file.
My use case is to match company names and generate sameAs links. To do so, I need to remove suffixes such as Ltd, Co, Corp, etc. from the company names.
I started with the replace transformation, but I find I can only replace one string with it and cannot give it a list.
I tried the removeValues transformation, but that does not seem to work at all.
I then used the regexReplace transformation. This works, but it removes every match of the regex anywhere in the company name, not just the suffix. For example, it does not work for "Colgate Co Ltd.", which it converts into "lgate": Co and Ltd are the patterns I specified, and Co also matches the start of "Colgate".
I also tried tokenize, but that reduces the number of matches I get in the result. Please suggest what I can use.
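
To make the behaviour I am after concrete, here is a minimal Python sketch. It is only an illustration, not the LDIF/Silk transformation itself: the suffix list, the pattern and the function name strip_suffixes are my own assumptions. The idea is to anchor the regex to the end of the name and require whitespace or a comma before each suffix, so that "Co" inside "Colgate" is left alone.

```python
import re

# Hypothetical suffix list, for illustration only.
SUFFIXES = ["Ltd", "Co", "Corp"]

# One or more suffixes at the END of the name. Each suffix must be
# preceded by whitespace or a comma and may be followed by a period.
SUFFIX_PATTERN = re.compile(
    r"(?:[\s,]+(?:" + "|".join(map(re.escape, SUFFIXES)) + r")\.?)+\s*$",
    re.IGNORECASE,
)


def strip_suffixes(name: str) -> str:
    """Remove trailing company suffixes; leave the rest of the name untouched."""
    return SUFFIX_PATTERN.sub("", name).strip()


print(strip_suffixes("Colgate Co Ltd."))  # Colgate
```

If regexReplace accepts a pattern of this shape (I have not verified that), the same anchoring with a trailing $ might be all that is needed.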
Best,
Rutvik.