Interesting insights, Lin and Atro,
I was thinking about the issue with other languages. There could perhaps be two approaches: passive and active keyword identification.
Passive keyword identification
- Using simple rules to identify words or delimited phrases, perhaps excluding anything on a simple word list.
- This should be easy to implement in any language with a little knowledge of that language and its word and sentence structures.
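To make the passive idea concrete, here is a minimal sketch, written in Python for brevity rather than as TiddlyWiki code. The tiny stop word set, the choice of `[[...]]` as the phrase delimiter, and the name `passive_keywords` are all assumptions for illustration only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list; a real one would be per language.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "with"}

def passive_keywords(text, stop_words=STOP_WORDS, top=5):
    """Return the most frequent candidate keywords found in text."""
    # Delimited phrases: treat [[...]] as a single keyword (an assumed convention).
    phrases = [p.strip().lower() for p in re.findall(r"\[\[(.+?)\]\]", text)]
    # Drop the phrases so their words are not counted a second time.
    remainder = re.sub(r"\[\[.+?\]\]", " ", text)
    # Simple rule: a word is a run of letters, with apostrophes allowed inside.
    words = [w.lower() for w in re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", remainder)]
    # Exclude anything on the stop word list.
    candidates = phrases + [w for w in words if w not in stop_words]
    return [kw for kw, _ in Counter(candidates).most_common(top)]

sample = "The wiki stores the tiddler in [[Single File]] and the wiki is small."
keywords = passive_keywords(sample)
```

Only the stop word list and the word rule are language specific here. Languages that do not separate words with spaces would need a different word rule than this regular expression.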
Active keyword identification
- Using more sophisticated textual analysis and keyword databases
- More sensitive to the language in use, potentially third party solutions
- We can consider the temporary installation of a tool for keyword identification, or even a utility wiki for analysis of submitted tiddlers.
- This requires that, once a keyword is identified, the result is saved to the text or to special tiddlers, so the tool can then be removed from the wiki, reducing its size.
- I would think there should be data sources, made available after the analysis of languages using big data, that are smaller than the input (the language) to that system but reflect the machine learning about that language, and that we can use.
- Makes me wonder if grammatical information about words could be used in wordsmithing TiddlyWiki content, e.g. searching for nouns only.
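The "analyse once, save the result, then remove the tool" idea could look roughly like the sketch below. It is only an illustration: tiddlers are modelled as plain Python dictionaries, the `keywords` field name and the `identify` callback are hypothetical, and `toy_identify` is a toy stand-in, not a real third party analyser.

```python
import json

def annotate_tiddlers(tiddlers, identify, field="keywords"):
    """Save each tiddler's keywords in a field so the analyser is not needed later."""
    annotated = []
    for tiddler in tiddlers:
        copy = dict(tiddler)  # leave the input tiddlers untouched
        # Space-separated list; this assumes single-word keywords.
        copy[field] = " ".join(identify(tiddler.get("text", "")))
        annotated.append(copy)
    return annotated

def search_by_keyword(tiddlers, keyword, field="keywords"):
    """Find titles using only the saved field, without running the analyser."""
    return [t["title"] for t in tiddlers if keyword in t.get(field, "").split()]

def toy_identify(text):
    """Toy stand-in for a heavier analyser: unique words longer than five letters."""
    seen = []
    for word in text.lower().split():
        word = word.strip(".,;:!?")
        if len(word) > 5 and word not in seen:
            seen.append(word)
    return seen

tiddlers = [
    {"title": "Search", "text": "Filtering keywords improves search."},
    {"title": "Install", "text": "Plugins can be removed after analysis."},
]
annotated = annotate_tiddlers(tiddlers, toy_identify)
exported = json.dumps(annotated, indent=2)  # JSON that could be carried back to the wiki
```

Once the field is saved, `search_by_keyword(annotated, "removed")` works without the analyser present, which is the point of removing the tool afterwards.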
Lin,
Your links led me to this definition: "Stop Words are words which do not contain important significance to be used in Search Queries. Usually these words are filtered out from search queries because they return vast amount of unnecessary information. A better definition is provided below:"
See the attached stop word list in a single tiddler; it is quite short.
Regards
Tones