
A script to download our archives?


Morgoth's Curse

Jun 9, 2009, 12:30:59 AM
I just received a message from AT & T notifying me that Usenet service
will be eliminated completely on July 15. It is yet another nail in
the coffin of the Usenet, but it caused me to ponder the fearful
question of just what would happen if Google decided to discard our
archives.

Someone once suggested that I could write a script to download all of
the threads for the Tolkien newsgroups. I am not really that
proficient with computers and would prefer something that I can
download and install. Does anyone know where I might obtain such a
program? Thanks in advance!

Morgoth's Curse

Troels Forchhammer

Jun 9, 2009, 3:28:31 AM
In message <news:8mor25pff7ccr9qih...@4ax.com>
Morgoth's Curse <morgoths...@nospam.yahoo.com> spoke these
staves:
>
> I just received a message from AT & T notifying me that Usenet
> service will be eliminated completely on July 15.

B<censored>!

What are those <beep> thinking of? They're free to cut down the
binary groups, but the text groups also include thriving scholarly
groups, and this is completely insane.

I wonder if it's relevant to mention now that the Swedes have put a
member of the Pirate Party into the European Parliament -- from what
I gather their message isn't to abolish copyright protection, but
rather that the internet changes the game, and that legislation
should adapt to the internet, not the other way around. I'm not sure
I agree in every detail, but I certainly hope that they can make a
difference.

As for yourself, I hope you have alternatives? There are a number of free
services out there (sometimes associated with various kinds of
advertising), and also some rather cheap ones with some service (I
use individual.net, which costs 10 Euros per year and has excellent
spam filtering, but they don't have the binary groups).

> Someone once suggested that I could write a script to download all
> of the threads for the Tolkien newsgroups. I am not really that
> proficient with computers and would prefer something that I can
> download and install. Does anyone know where I might obtain such
> a program? Thanks in advance!

The only archive that I know of which would be reasonably complete
would be Google -- does anyone know of other, perhaps more easily
accessible, archives with the same (or better) degree of
completeness?

Having recently written a script to download from Google not the
messages, but merely the thread IDs, I have some vague idea of what
this would entail, and though it is possible (in particular if you
take my list of thread-URLs as input) it is not a task that I would
think easy. Basically, what you need to do is to visit each
individual thread, then parse the tree-list frame to get the IDs for
the individual messages, and then you can download the individual
messages in raw format using their IDs. I do suppose it's doable, but
I suspect that the automated HTML parsing (to get the individual
message IDs) will be beyond my meagre skills in that department. If
someone else would help with the HTML parsing, I could easily write
the code to retrieve the thread and the code to retrieve the message
-- when I did the thread collecting script, I simply used regular
expressions to retrieve the thread information without parsing the
HTML, but I'm not sure that the same is possible in the other case.
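A rough Python sketch of the message-ID step described above, for anyone who wants to try. It parses the thread page with the standard library's html.parser instead of bare regular expressions, and only uses a regex on each link's href. The `/msg/<id>` link pattern, the `RAW_URL` template and its placeholder host are hypothetical, since nobody here has said what Google's tree-list markup actually looks like; they would need adjusting against a real page.

```python
import re
from html.parser import HTMLParser

# HYPOTHETICAL link pattern: the real Google Groups markup is not known here,
# so adjust this regex to whatever the tree-list frame actually uses.
MSG_HREF = re.compile(r"/msg/([0-9a-f]+)")

# HYPOTHETICAL raw-message URL template; placeholder host, not a real endpoint.
RAW_URL = "https://groups.example.invalid/msg/{msg_id}?output=raw"


class MessageIdCollector(HTMLParser):
    """Collect message IDs from the <a href="..."> links on a thread page."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = MSG_HREF.search(href)
        # Keep first-seen order, but skip repeated links to the same message.
        if match and match.group(1) not in self.ids:
            self.ids.append(match.group(1))


def extract_message_ids(html):
    """Return the message IDs linked from one thread page, in page order."""
    parser = MessageIdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


def raw_message_urls(html):
    """Turn a thread page into the (hypothetical) raw-format URLs to fetch."""
    return [RAW_URL.format(msg_id=m) for m in extract_message_ids(html)]
```

Feeding it one thread page at a time (e.g. taken from a list of thread URLs) gives the list of raw messages to download for that thread.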

And of course the actual running will take ages. Google stops working
if you connect too often, so I use a random delay in my script to
avoid that, but it also means that it takes forever to get anywhere.
Getting the information on RABT took about 18 hours (some twenty-seven
thousand six hundred and something threads), and you'd need quite a bit
of disk space.
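The random-delay trick might look something like this minimal Python sketch. The 5 to 20 second range, the function names and the swappable `fetch`/`sleep` parameters are assumptions for illustration, not the actual script mentioned above. For scale, 18 hours over roughly 27,600 threads works out to an average of about 2.3 seconds per thread (64,800 s / 27,600).

```python
import random
import time
import urllib.request


def fetch_url(url, timeout=30):
    """Download one URL and return the body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def polite_download(urls, fetch=fetch_url, min_delay=5.0, max_delay=20.0,
                    sleep=time.sleep, uniform=random.uniform):
    """Fetch each URL in turn, pausing a random interval between requests.

    The 5-20 second range is an assumed default, not a known safe limit.
    `fetch`, `sleep` and `uniform` are parameters so they can be replaced,
    e.g. for testing or to add retries.
    """
    results = {}
    for i, url in enumerate(urls):
        if i > 0:
            # Random jitter so requests don't arrive at a fixed rhythm.
            sleep(uniform(min_delay, max_delay))
        results[url] = fetch(url)
    return results
```

The pause comes before every request except the first, so there is no pointless wait after the last URL.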

Next step would be the presentation -- and who knows, we might even
set up our own NNTP server for AFT and RABT if we put our minds to it
;-) But now I'm dreaming.

--
Troels Forchhammer
Valid e-mail is <troelsfo(a)gmail.com>
Please put [AFT], [RABT] or 'Tolkien' in subject.

Relativity applies to physics, not ethics.
- Albert Einstein (1875-1955)

hen...@swirve.com

Jun 15, 2009, 11:52:48 AM
On Jun 9, 9:28 am, Troels Forchhammer <Tro...@ThisIsFake.invalid>
wrote:
> In message <news:8mor25pff7ccr9qih...@4ax.com>
> Morgoth's Curse <morgothscurse2...@nospam.yahoo.com> spoke these
> staves:

> Next step would be the presentation -- and who knows, we might even
> set up our own NNTP server for AFT and RABT if we put our minds to it
> ;-)  But now I'm dreaming.

Kill Gagool! That is to say, kill Googal! That is to say, kill
Googool... I give up.

Horus Engels

Noel Q. von Schneiffel

Jun 15, 2009, 1:24:28 PM
On 15 Jun., 17:52, heng...@swirve.com wrote:
>
> Kill Gagool! That is to say, kill Googal! That is to say, kill
> Googool... I give up.

Kill a Googol heretics!

Googal, as everyone knows, is the westernmost Irish province. It is
where the inhabitants of Donegal go to take their dumps.

Noel

Morgoth's Curse

Jul 14, 2009, 7:30:06 PM
On Tue, 09 Jun 2009 07:28:31 GMT, Troels Forchhammer
<Tro...@ThisIsFake.invalid> wrote:

>In message <news:8mor25pff7ccr9qih...@4ax.com>
>Morgoth's Curse <morgoths...@nospam.yahoo.com> spoke these
>staves:
>>
>> I just received a message from AT & T notifying me that Usenet
>> service will be eliminated completely on July 15.
>
>B<censored>!
>
>What are those <beep> thinking of? They're free to cut down the
>binary groups, but the text groups also include thriving scholarly
>groups, and this is completely insane.

Welcome to the 21st Century, Troels, where the motto of every
corporation is "Charge more, provide less."

On a more positive note, twenty years from now when your grandchildren
are paying ten Euros per millisecond for Internet access, you will be
able to blow their minds by describing how _you_ were able to access
the Internet free of charge!

>As for yourself, I hope you have alternatives? There are a number of free
>services out there (sometimes associated with various kinds of
>advertising), and also some rather cheap ones with some service (I
>use individual.net, which costs 10 Euros per year and has excellent
>spam filtering, but they don't have the binary groups).

Yes, I'll have to shop around tomorrow, but I wanted to take advantage
of this access as long as I could since I paid for it.


>
>> Someone once suggested that I could write a script to download all
>> of the threads for the Tolkien newsgroups. I am not really that
>> proficient with computers and would prefer something that I can
>> download and install. Does anyone know where I might obtain such
>> a program? Thanks in advance!
>
>The only archive that I know of which would be reasonably complete
>would be Google -- does anyone know of other, perhaps more easily
>accessible, archives with the same (or better) degree of
>completeness?

Well, I am really only interested in the text messages. I have
something like 200,000 posts from four different newsgroups in my
archives now and that only requires a few hundred megabytes of space.
External hard drives are also cheap now, so I could buy one
specifically for the archives or I could spread it across several
DVDs. I just need a script to download all of the messages from a
specific newsgroup with perhaps a few filters to weed out the obvious
spam.
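A hedged Python sketch of the "few filters" part: a crude header check for obvious spam, with the survivors appended to a plain mbox file (a common format that many mail clients can import) via the standard `mailbox` module. The keyword list, the cross-posting threshold and the function names are hypothetical; real Usenet spam would need tuning.

```python
import mailbox
from email import message_from_string

# HYPOTHETICAL filter settings: tune these to the spam the groups actually get.
SPAM_WORDS = ("make money fast", "casino", "viagra")
MAX_GROUPS = 5  # heavy cross-posting is a common sign of spam


def looks_like_spam(raw):
    """Crude header-based check on one raw Usenet article (a str)."""
    msg = message_from_string(raw)
    subject = str(msg.get("Subject", "")).lower()
    groups = [g for g in str(msg.get("Newsgroups", "")).split(",") if g.strip()]
    if len(groups) > MAX_GROUPS:
        return True
    return any(word in subject for word in SPAM_WORDS)


def archive(raw_articles, path):
    """Append the non-spam articles to an mbox file; return how many were kept."""
    box = mailbox.mbox(path)
    kept = 0
    try:
        for raw in raw_articles:
            if not looks_like_spam(raw):
                # Store as UTF-8 bytes so non-ASCII posts don't raise errors.
                box.add(raw.encode("utf-8"))
                kept += 1
        box.flush()
    finally:
        box.close()
    return kept
```

One mbox file per newsgroup keeps the archive easy to copy to an external drive or split across DVDs.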

I believe that this is a project that we should tackle (either as a
group or individually) as soon as possible. As AT & T just
demonstrated, corporations cannot be trusted. It's possible that
Google will decide that the archives are not worth the expense now
that the Usenet is dying.

Morgoth's Curse

Morgoth's Curse

Jul 23, 2010, 2:08:48 AM

I am just resurrecting this thread to find out if anyone has had
any new ideas in the year since I originally asked the question. I
sure hope so: My own archives only extend back to 2002 and I would
hate to lose all of the thousands of posts from the previous ten
years.

Morgoth's Curse

Morgoth's Curse

Dec 11, 2013, 10:38:55 AM
* sigh *
It has been three years and I still haven't found anyone who can
supply the script that I need to download messages prior to 2002. I
am still convinced that it is just a matter of time before Google
discards the Usenet archives entirely.

Morgoth's Curse