Julien ÉLIE
Aug 10, 2022, 4:29:30 AM
Hi all,
INN (and perhaps other servers) can provide keywords in overview data.
It advertises "Keywords:full" in response to LIST OVERVIEW.FMT and then
adds "Keywords: a,b,c,d" to OVER responses.
No Keywords header field is added to the articles themselves; if an
article already has one, its contents are kept at the beginning of the
generated value in overview.
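To make concrete what "implementations that look for it in overview" would do, here is a minimal Python sketch of a client locating the field under the current advertisement. The function name and the sample OVER line are made up for illustration. It relies only on the RFC 3977 rule that an OVER line is tab-separated, with the article number first and then one field per LIST OVERVIEW.FMT entry in the same order, and on the "Keywords: " prefix described above.

```python
from typing import List, Optional


def keywords_from_over(fmt_lines: List[str], over_line: str) -> Optional[str]:
    """Return the Keywords value from one OVER response line, or None."""
    # RFC 3977: an OVER line is tab-separated; the article number comes first,
    # then one field per LIST OVERVIEW.FMT entry, in the same order.
    names = [entry.lower() for entry in fmt_lines]
    if "keywords:full" not in names:
        return None  # the server does not advertise Keywords this way
    index = names.index("keywords:full") + 1  # +1 skips the article number
    fields = over_line.split("\t")
    if index >= len(fields):
        return None
    field = fields[index]
    # A ":full" field repeats the header name, e.g. "Keywords: a,b,c,d".
    prefix = "keywords: "
    if field.lower().startswith(prefix):
        return field[len(prefix):]
    return None


# Hypothetical sample data, for illustration only.
fmt = ["Subject:", "From:", "Date:", "Message-ID:", "References:",
       ":bytes", ":lines", "Xref:full", "Keywords:full"]
over = "\t".join(["42", "Re: naming concept of newsgroups", "user@example.org",
                  "10 Aug 2022 08:29:30 +0000", "<id@example.org>", "",
                  "1234", "20", "Xref: news.example.org news.software.nntp:42",
                  "Keywords: newsgroup,news,net"])
print(keywords_from_over(fmt, over))  # prints newsgroup,news,net
```

A client written like this matches the exact string "Keywords:full", so it would stop finding the field if the server advertised ":keywords" instead.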
I'm wondering whether:
- it should be advertised as ":keywords" rather than "Keywords:full",
since the header field is not present in the original article.
I am unsure, though, whether such a change would break implementations
that look for it in overview (but is there any such news client?);
- and, naturally, before that, whether the feature should remain in INN
at all.
Currently, it only takes all the words, removes punctuation, removes a
list of "known" words (like pronouns), strips non-ASCII characters
(sic), and sorts the result by number of occurrences.
This should obviously be made smarter (but is there any need for that?).
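As a rough illustration of the steps just described, here is a minimal Python sketch. It is not INN's code. The stop-word list, the lowercasing, the three-character minimum length and the tie-break (longer words first, then alphabetical) are assumptions, chosen only to resemble the samples below. It treats every non-ASCII character as a word separator, which is one way fragments like "crit" or "barri" can arise from French words.

```python
import re
from collections import Counter

# Hypothetical stop-word list (pronouns and other very common English words);
# INN's real list is not reproduced here.
STOP_WORDS = {
    "and", "the", "for", "but", "not", "are", "was", "you", "she", "him",
    "her", "his", "its", "our", "they", "them", "this", "that", "with",
}
MIN_LENGTH = 3  # assumption: shorter tokens are dropped


def basic_keywords(body: str) -> str:
    """Sketch of the naive keyword extraction described above."""
    # Lowercase (an assumption), then split on every character that is not an
    # ASCII letter: punctuation, digits, whitespace and non-ASCII characters
    # all act as separators, so "écrit" yields the fragment "crit".
    words = re.split(r"[^a-z]+", body.lower())
    counts = Counter(
        w for w in words if len(w) >= MIN_LENGTH and w not in STOP_WORDS
    )
    # Most frequent first; ties broken by longer word, then alphabetically.
    ranked = sorted(counts, key=lambda w: (-counts[w], -len(w), w))
    return ",".join(ranked)


print(basic_keywords("Usenet news: news servers, news clients and Usenet."))
# prints news,usenet,clients,servers
print(basic_keywords("J'ai écrit une barrière"))
# prints barri,crit,une
```

Nothing in this sketch depends on the article's language, so French function words such as "une" pass the English stop-word filter and accented words are cut into fragments.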
So I would suggest removing the code that generates such basic keywords
unless there is a real current use case for it. (That would not prevent
a possible reintegration in the future with a smarter algorithm.)
Here are examples of what is currently generated:
[from latest discussion "Re: naming concept of newsgroups"]
Keywords:
newsgroup,news,net,eagle,eyrie,used,org,hierarchies,questions,wondering,messages,general,periods,always,https,names,taken,think,back,don,dot,www,configuration,distinction,distributed,punctuation,introduced,presumably,processing,lowercase,sethhurst,although,choosing,directly,explains,original,predates,renaming,software,truscott,allbery,analogy,control,current,however,mailing,prevent,colons,daniel,dashes,domain,insead,levels,naming,picked,please,prefix,rather,scheme,trivia,aware,based
[from news.lists.filters!]
Keywords:
message,ncm,begin,body,spam,notice,pgp,googlegroups,spamassassin,signature,pasdenom,signed,usenet,info,com,end,est,lkabxolsadsxuurahpalo,trfvzkfamlybfeacgkqie,fxquozybxsows,hfcozufhlkorn,pothgfqhddwoc,tcbhokunxbviy,probablement,xkcmgcjvkghy,wzykxofxigf,xjjqrzmwsth,iezppdjfhb,referenced,tememrtmfh,utzjdlgunu,vznwaqwahg,akefkllaj,cyimrhktz,following,plmxkyvqo,satellite,xijwqvhwm,zzpginwcr,detected,ethernet,followup,koxyluau,pikaxokm,probably,english,gzkzeqt,headers,reseaux,tdczttb,version
[from a spam...]
Keywords:
drug,www,channel,running,https,com,ucdtdenqhwst,xfzsllvprc,bitchute,exorcist,military,brendon,connell,talpiot,youtube,zealand,mafias,anzus,below,dsfug,endtx,world,best,html,http,runs,bet,new,ops,org
[from an article written in French]
Keywords:
crit,nous,pas,recommencer,comptons,magicien,chaines,chanson,declara,oubliez,cessit,gestes,ubuntu,actes,barri,faire,acte,gump,joli,mage,mais,marc,pour,vous,cet,des,les,lou,non,res,sur,une
Obviously, for messages written in a language other than English, the
generated keywords are totally wrong and unusable. And even for English,
I am unsure the generated keywords are really usable (too many of them,
and not specific enough).
--
Julien ÉLIE
"I've got a friend who's a test pilot… Well, he isn't one yet;
for now, he's still trying to be a pilot!" (Raymond Devos)