I've searched the forum, so apologies up front if this has already been answered:
Is there a dataset in which a unique integer or universal ID has been assigned to each token with a given POS in English, i.e. to each (token, POS) pair?
Has someone already created such a list? If so, is it updated when new words are added to the language?
The result would be a list of (token, POS, universal_id) triples:
[(token, POS, universal_id)]
For example:
[('read', 'VB', 245), ('read', 'VBD', 246), ('read', 'NN', 247), ('well', 'RB', 1124), ('well', 'JJ', 1125), ('well', 'UH', 1126), ('well', 'IN', 1127)]
From a naive, uninformed standpoint, it seems that once a POS tagger has identified the POS of a token, a universal ID or unique integer could be assigned to that token with that specific POS in a given language. Such an ID would be useful in many other kinds of software applications.
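A minimal sketch of the assignment described above, for a single private vocabulary. It assumes the input has already been tagged with Penn Treebank tags (for example, the (token, tag) pairs that NLTK's `nltk.pos_tag` returns). The class `TokenPosVocab` and its methods are hypothetical names for illustration, not an existing library. IDs are handed out in first-seen order and never reused, so adding new words later gives them new IDs without changing existing ones.

```python
from typing import Dict, Iterable, List, Tuple

# A (token, POS) pair, e.g. ("read", "VBD"), using Penn Treebank tags.
TokenPos = Tuple[str, str]


class TokenPosVocab:
    """Hypothetical append-only vocabulary mapping (token, POS) pairs to integer IDs.

    IDs are assigned in first-seen order and never reassigned, so pairs added
    later do not change the IDs of pairs that are already in the vocabulary.
    """

    def __init__(self, start_id: int = 0) -> None:
        self._ids: Dict[TokenPos, int] = {}   # (token, POS) -> ID
        self._pairs: List[TokenPos] = []      # ID offset -> (token, POS)
        self._start_id = start_id

    def get_or_add(self, token: str, pos: str) -> int:
        # Return the existing ID for this pair, or assign the next free one.
        key = (token, pos)
        if key not in self._ids:
            self._ids[key] = self._start_id + len(self._pairs)
            self._pairs.append(key)
        return self._ids[key]

    def lookup(self, id_: int) -> TokenPos:
        # Reverse mapping: integer ID back to its (token, POS) pair.
        index = id_ - self._start_id
        if not 0 <= index < len(self._pairs):
            raise KeyError(id_)
        return self._pairs[index]

    def encode(self, tagged: Iterable[TokenPos]) -> List[Tuple[str, str, int]]:
        # Turn tagger output [(token, POS), ...] into [(token, POS, id), ...].
        return [(tok, pos, self.get_or_add(tok, pos)) for tok, pos in tagged]


if __name__ == "__main__":
    vocab = TokenPosVocab()
    tagged = [("read", "VB"), ("read", "VBD"), ("read", "NN"), ("well", "RB"), ("read", "VB")]
    print(vocab.encode(tagged))
    # [('read', 'VB', 0), ('read', 'VBD', 1), ('read', 'NN', 2), ('well', 'RB', 3), ('read', 'VB', 0)]
```

The catch is that these IDs only mean something within this one vocabulary: a second program that sees the words in a different order would assign different numbers. For the IDs to be universal, everyone would need to share the same published, append-only list.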
This is commonly done in custom information retrieval engines (e.g., for an index or inverted index), but I'm wondering whether someone has already done it for the language as a whole. I believe WordNet did something similar for synsets.
Thanks,
Gerry