N-Gram Fingerprint method: need to add stop words

Sergio Letuche

Nov 18, 2020, 7:13:17 AM
to OpenRefine Development
Dear All,

I am having trouble with my GitHub account: it does not let me create an issue.
The issue I would like to create is the following:

I need to accomplish the same thing that this issue describes:
https://github.com/OpenRefine/OpenRefine/issues/3200
but for the N-Gram Fingerprint method.
In which part of the N-Gram Fingerprint code should I make a change like the one below?
In the line of code where punctuation is removed, I also need to remove some words such as "publisher", "editor", etc.
https://github.com/OpenRefine/OpenRefine/blob/c76e2b9a461ed5b353ebf5c80e0e0cad2163331c/main/src/com/google/refine/clustering/binning/FingerprintKeyer.java#L93

In the line linked above, one can alter the string being processed, so that is where I can add my stop words. In the N-Gram Fingerprint algorithm, however, I have not found the corresponding line. I am looking for the place in the code where punctuation is removed from the string, so that I can add my stop-word removal there.
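
To make the intended change concrete, here is a minimal, self-contained sketch of what I have in mind. It is not OpenRefine code: the class name StopWordNGramSketch, the STOP_WORDS set, and the helper methods are all hypothetical, and the keying steps only approximate an n-gram fingerprint. One point it illustrates: if the n-gram keyer strips whitespace in the same step as punctuation (an assumption that should be checked against the source), the stop words have to be removed before that step, while word boundaries still exist.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical, standalone sketch; not the OpenRefine implementation.
public class StopWordNGramSketch {

    // Example stop words taken from the post; extend as needed.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("publisher", "editor"));

    static final Pattern PUNCT_OR_CNTRL = Pattern.compile("\\p{Punct}|\\p{Cntrl}");
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Drops whole tokens that are stop words; must run while spaces still exist.
    static String removeStopWords(String s) {
        StringBuilder sb = new StringBuilder();
        for (String token : WHITESPACE.split(s.trim())) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                sb.append(token).append(' ');
            }
        }
        return sb.toString().trim();
    }

    // Builds a sorted, de-duplicated n-gram key from the cleaned string.
    static String key(String s, int n) {
        String cleaned = s.toLowerCase(Locale.ROOT);
        cleaned = PUNCT_OR_CNTRL.matcher(cleaned).replaceAll(""); // punctuation out
        cleaned = removeStopWords(cleaned);                        // stop words out
        cleaned = WHITESPACE.matcher(cleaned).replaceAll("");     // whitespace out last

        TreeSet<String> grams = new TreeSet<>();
        for (int i = 0; i + n <= cleaned.length(); i++) {
            grams.add(cleaned.substring(i, i + n));
        }
        return String.join("", grams);
    }

    public static void main(String[] args) {
        // Both values reduce to "acme" before the n-grams are taken.
        System.out.println(key("Acme Publisher", 2));
        System.out.println(key("ACME, editor", 2));
    }
}
```

With this ordering, "Acme Publisher" and "ACME, editor" produce the same key, which is the clustering behaviour I am after. In OpenRefine itself, the equivalent would presumably be a call like removeStopWords placed just before the existing punctuation/whitespace removal in the n-gram keyer.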

If you think it is warranted, please create an issue on GitHub on my behalf, since my account is flagged.

I do not know why; I only just created it, from my campus network.

Thank you in advance for your kind help,
best