Jason Evans <jse...@mailfence.com> wrote:
>For the past month, I have been downloading and sorting Usenet archives from
>a news server (with their permission) of everything from 2003 until today.
>My next step is to decide how to upload them to archive.org.
So you'd be relying upon their indexing and its likely inability to tell
the difference between the article body, the .sig, and headers?
We've already got that. Google indexed Usenet articles as if they had
been posted on the Web in the first place, because the lousy Google
Groups Web interface was treated like a real Web page. Within Google
Groups itself, searching became seriously hideous once Google stopped
devoting staff resources to maintaining the indexes. The indexing
services were never great, but they were better than what they became.
An extremely serious problem with Google Groups indexing of the article
body, when it was working, was that it didn't do a great job of
distinguishing between the author's own text and the quoted text in a
followup.
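To make the distinction concrete, here is a rough sketch of the kind of
splitting an indexer would need to do before indexing anything. It
assumes the usual conventions: headers end at the first blank line, the
.sig follows a line consisting of "-- " (dash, dash, space), and quoted
lines start with ">". The function name split_article and the returned
keys are made up for illustration, and real articles break these
conventions often enough that this is a starting point, not a solution.

```python
from email import message_from_string

SIG_DELIMITER = "-- "  # conventional signature separator: dash, dash, space


def split_article(raw):
    """Split a raw article into headers, own text, quoted text, and .sig.

    Hypothetical helper for illustration; assumes a plain-text,
    single-part article that follows the usual quoting conventions.
    """
    msg = message_from_string(raw)
    headers = dict(msg.items())
    lines = msg.get_payload().splitlines()

    # Everything after the LAST signature delimiter is treated as the .sig.
    sig = []
    for i in range(len(lines) - 1, -1, -1):
        if lines[i] == SIG_DELIMITER:
            sig = [line for line in lines[i + 1:] if line.strip()]
            lines = lines[:i]
            break

    # Lines starting with ">" are quoted; other non-blank lines are counted
    # as the author's own text. Attribution lines ("X wrote:") land in
    # "own" here, which a real indexer would want to handle separately.
    quoted = [line for line in lines if line.startswith(">")]
    own = [line for line in lines if line.strip() and not line.startswith(">")]
    return {"headers": headers, "own": own, "quoted": quoted, "sig": sig}
```

An index built only from the "own" lines would at least stop crediting
a followup's author with the text being replied to.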
Usenet archives lack decent indexes. Is there a way for you to upload a
very small archive, then work on the indexing and presentation of the
articles so that it in some way resembles walking the thread tree? Can
the index be developed along with the archive, and then tested, tested,
tested to avoid another Google Groups?
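By walking the thread tree I mean something like the sketch below,
which rebuilds threads from the Message-ID and References headers. It
assumes each article has already been reduced to a dict with a
"message_id" key and an optional "references" key holding the
space-separated header value; those key names and both function names
are invented for the example. When the immediate parent is missing from
the archive, the article is attached to its nearest known ancestor, or
treated as a thread root if none is present.

```python
from collections import defaultdict


def build_thread_tree(articles):
    """Map each parent Message-ID to the Message-IDs of its followups.

    Hypothetical helper; thread roots are stored under the key None.
    """
    known = {a["message_id"] for a in articles}
    children = defaultdict(list)
    for a in articles:
        refs = a.get("references", "").split()
        # References lists ancestors oldest first, so scan from the end
        # to find the closest ancestor that is actually in the archive.
        parent = next((r for r in reversed(refs) if r in known), None)
        children[parent].append(a["message_id"])
    return children


def walk(children, parent=None, depth=0):
    """Yield (depth, message_id) pairs in depth-first thread order."""
    for mid in children.get(parent, []):
        yield depth, mid
        yield from walk(children, mid, depth + 1)
```

A presentation layer could indent each article by its depth, which is
roughly what a threaded newsreader shows.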
>. . .
>One final note. In case you're wondering, I am not archiving any binary
>groups or any group that I think could get deleted because of the extremely
>distasteful subject matter. I think you can get my gist about what I mean.
>Everything else is here. Even the stupid spammy revenge froops.
Are you literally saying that you're archiving cancellable spam and
those various smaller-scale attacks on Usenet with articles uploaded by
the thousands from anonymizing servers that aren't preventing abuse?
Revenge froups weren't any more spammy than any other part of Usenet.
Spam is spam regardless of the newsgroup.