Future of Deja's Usenet archive?

Rich Lafferty

Oct 17, 2000, 3:00:00 AM
Well, That Which Many Suspected appears to be coming true. This may be
old news (it's been a couple DAYS!) but seeing no mention of it here
or in the Other Place I thought I'd bring it up.

The story from which I excerpt is at
<http://www.zdnet.com/intweek/stories/news/0%2C4164%2C2640446%2C00.html>.

"Another dot-com, this time Deja.com, is
seeking shelter from the Internet economy by way of a merger. The New
York company's business units - its longstanding Deja News Usenet
discussion business and its newer Precision Buying Service buying
guide - will be sold separately, according to sources familiar with
the negotiations."

[...]

"However it has not grown fast enough, or done well enough, to support
the buying service that Deja has been developing for the past two
years, sources said. The company claimed about 5 million unique users,
90 percent of whom use the Usenet service. About 10 percent to 20
percent use the Precision Buying Service."

Insert standard grumbles about why Usenet archives need to be
maintained without profit-motive. I wonder what sort of resource
consumption it uses up; somewhere in the back of my mind is the idea
that it might be worthwhile to start a nonprofit Usenet archive *now*
(although that's probably something that should've been done ages ago)..

-Rich

--
Rich Lafferty ----------------------------------------
Nocturnal Aviation Division, IITS Computing Services
Concordia University, Montreal, QC
ri...@bofh.concordia.ca -------------------------------

Russ Allbery

Oct 17, 2000, 3:00:00 AM
Rich Lafferty <ri...@bofh.concordia.ca> writes:

> Insert standard grumbles about why Usenet archives need to be maintained
> without profit-motive. I wonder what sort of resource consumption it
> uses up; somewhere in the back of my mind is the idea that it might be
> worthwhile to start a nonprofit Usenet archive *now* (although that's
> probably something that should've been done ages ago)..

Depends on how much you want to archive, but text Usenet is probably
running around 2GB a day, very roughly. Other people likely have better
numbers.

So figure 500GB a year would do fine for a while at least with
compression.

50GB a year would probably be enough to handle the Big Eight at least
right now.

--
Russ Allbery (r...@stanford.edu) <http://www.eyrie.org/~eagle/>
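
A rough check on the figures above, assuming a 365-day year and the
2GB/day estimate as given. The compression ratio here is only what the
500GB figure would imply, not a measured number.

```python
# Back-of-envelope check of the yearly storage estimate.
GB_PER_DAY = 2               # rough text-Usenet volume quoted above
DAYS_PER_YEAR = 365          # assumption: no leap-year handling
COMPRESSED_TARGET_GB = 500   # yearly figure quoted above

raw_gb_per_year = GB_PER_DAY * DAYS_PER_YEAR
implied_ratio = COMPRESSED_TARGET_GB / raw_gb_per_year

print(f"raw volume: {raw_gb_per_year} GB/year")
print(f"implied compressed/raw ratio: {implied_ratio:.2f}")
```

So 500GB a year corresponds to compressing the raw feed to roughly
two-thirds of its size, which is a conservative target for plain text.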

Rob Partington

Oct 17, 2000, 3:00:00 AM
On 17 Oct 2000 09:23:34 -0700, Russ Allbery <r...@stanford.edu> wrote:
>50GB a year would probably be enough to handle the Big Eight at least
>right now.

Would it be possible to do this in some kind of distributed manner
like Pluribus or Napster? So, say, I could volunteer to archive a set
of groups that comes to around 4G or so, whilst someone with more
resources could archive a greater set? That way the load is spread,
and by careful overlapping, you get redundancy free.
--
r...@frottage.org
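
One way the "careful overlapping" above might be sketched: a greedy
placement that puts every group on a fixed number of volunteers, largest
groups first. The function name, the group sizes, and the volunteer
capacities are all hypothetical; nothing here comes from an existing
tool.

```python
from typing import Dict, List


def assign_groups(group_sizes: Dict[str, float],
                  capacities: Dict[str, float],
                  copies: int = 2) -> Dict[str, List[str]]:
    """Place each group on `copies` distinct volunteers.

    Sizes and capacities are in GB. Groups are placed largest first,
    each time on the volunteers with the most free space remaining.
    """
    free = dict(capacities)
    plan: Dict[str, List[str]] = {name: [] for name in capacities}
    for group, size in sorted(group_sizes.items(), key=lambda kv: -kv[1]):
        # Volunteers that can still hold this group, roomiest first.
        candidates = sorted((v for v in free if free[v] >= size),
                            key=lambda v: -free[v])
        if len(candidates) < copies:
            raise ValueError(f"not enough space for {copies} copies of {group}")
        for volunteer in candidates[:copies]:
            plan[volunteer].append(group)
            free[volunteer] -= size
    return plan


# Hypothetical volunteers and group sizes, for illustration only.
sizes = {"comp.lang.c": 3.0, "sci.math": 2.0, "news.groups": 1.0}
space = {"small-a": 4.0, "big": 10.0, "small-b": 4.0}
plan = assign_groups(sizes, space, copies=2)
for volunteer, groups in plan.items():
    print(volunteer, groups)
```

A greedy placement like this is not optimal, but it keeps the rule easy
to explain to volunteers: nobody is asked for more than they offered,
and every group lives in at least two places.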

Russ Allbery

Oct 17, 2000, 3:00:00 AM

The only real problem as near as I can tell is the interface. If someone
can design a decent interface that doesn't involve having to run a
full-blown web server on each archiving box, that would go a long way.
NNTP isn't the right tool, unfortunately.

Something like traditional spool but breaking up each directory so it
doesn't get too large would be a nice, convenient storage format, although
not ideal for compression purposes. If you ran an rsync daemon, people
could easily grab copies of the groups you archived so that there would be
some redundancy....

Rich Lafferty

Oct 17, 2000, 3:00:00 AM
In net.subculture.usenet,

Russ Allbery <r...@stanford.edu> wrote:
> Rich Lafferty <ri...@bofh.concordia.ca> writes:
>
> > Insert standard grumbles about why Usenet archives need to be maintained
> > without profit-motive. I wonder what sort of resource consumption it
> > uses up; somewhere in the back of my mind is the idea that it might be
> > worthwhile to start a nonprofit Usenet archive *now* (although that's
> > probably something that should've been done ages ago)..
>
> Depends on how much you want to archive, but text Usenet is probably
> running around 2GB a day, very roughly. Other people likely have better
> numbers.

I suspect disk would be the easy part of the equation, though. I
shudder to think of what Deja uses to *search* through that as quickly
as they do.

> 50GB a year would probably be enough to handle the Big Eight at least
> right now.

Awfully tempting, although obtaining 50GB/year plus cycles to search
would be Slightly Nontrivial. *sigh* It'd be hard to pick and choose
non-big-8 groups, too. I can see why Deja just took anything they
could get that wasn't a binary.

Russ Allbery

Oct 17, 2000, 3:00:00 AM
Rich Lafferty <ri...@bofh.concordia.ca> writes:
> Russ Allbery <r...@stanford.edu> wrote:

>> Depends on how much you want to archive, but text Usenet is probably
>> running around 2GB a day, very roughly. Other people likely have
>> better numbers.

> I suspect disk would be the easy part of the equation, though. I shudder
> to think of what Deja uses to *search* through that as quickly as they
> do.

Yes, that's harder. The nice thing about that, though, is that if you
have the content, you can deal with the search at your leisure, try
different indexes, wait for someone else to come along and do it for you,
etc. As long as the content is there, the search problem will eventually
solve itself.

> Awfully tempting, although obtaining 50GB/year plus cycles to search
> would be Slightly Nontrivial. *sigh* It'd be hard to pick and choose
> non-big-8 groups, too. I can see why Deja just took anything they could
> get that wasn't a binary.

The cooperative approach would be the best, I think. Each person takes a
little chunk. I could probably find resources around here to archive some
bits, particularly in hierarchies like comp.* and sci.* that have
long-term educational merit....

Jim Kingdon

Oct 17, 2000, 3:00:00 AM
> http://www.zdnet.com/intweek/stories/news/0%2C4164%2C2640446%2C00.html

OK, so here's one scenario:

The usenet archive is a cash cow but instead of focusing on that they
are taking all the profits and dumping them into The Suits Latest Pet
Project. (That's what happened at Cygnus for many years, with
s/usenet archive/open source/ and s/buying guide/closed source/).

Or the other is that:

Both are losing money, but the buying guide is somewhat more
doomed than the usenet archive.

Either way, separating the two strikes me as a good thing for the
usenet archive. Although whether it is going to go bankrupt anyway is
less clear to me.

> it might be worthwhile to start a nonprofit Usenet archive *now*

Well, the idea of looking into whether this can be distributed, a la
freenet, strikes me as an interesting avenue to pursue (I might start
out by reading some of the freenet/gnutella/&c docs to see what
they're doing about things like searching and keeping one node from
getting overloaded and such). Might be possible to have the storage
be existing news servers (with some way of limiting how much
bandwidth/CPU/IO they are volunteering) with the user interface being
the new part (not that I've tried to design this in any detail...).

Rich Lafferty

Oct 17, 2000, 3:00:00 AM
In net.subculture.usenet,
Jim Kingdon <kin...@panix.com> wrote:
>
[Rich wrote:]

> > it might be worthwhile to start a nonprofit Usenet archive *now*
>
> Well, the idea of looking into whether this can be distributed, a la
> freenet, strikes me as an interesting avenue to pursue

I'm left with a feeling that it doesn't need to be that Massive. It
seems as though time would be better put into getting friends of Usenet
to offer a certain amount of resources which are known to be reliable
than into worrying about redundancy and dialups and users
with fragile systems, especially when it becomes important to ensure
that the content of a particular message isn't changed (and while
there are means by which we can tell if a message changes, I'm not
aware of any that'd let us change it back). It's the difference
between storage and an archive, in other words.
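
The "tell if a message changes" half of that can be sketched with a
content digest recorded when the article is first archived. SHA-256 and
the function names are assumptions chosen for illustration; as noted
above, a digest only detects a change and cannot undo one, so restoring
the article still needs an intact copy held elsewhere.

```python
import hashlib


def article_digest(raw_article: bytes) -> str:
    """Return a hex digest of the article exactly as stored."""
    return hashlib.sha256(raw_article).hexdigest()


def is_unchanged(raw_article: bytes, recorded_digest: str) -> bool:
    """Compare a stored article against the digest recorded at archive time."""
    return article_digest(raw_article) == recorded_digest


# Hypothetical article text, for illustration only.
original = b"Subject: test\r\n\r\nHello, Usenet.\r\n"
recorded = article_digest(original)

print(is_unchanged(original, recorded))
print(is_unchanged(original + b"tampered", recorded))
```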

Of course, magnetic platters aren't a particularly *reliable* way of
making archives; electronic archiving is to a large extent something I
don't entirely understand, although our head archivist has explained
the general problems to me a few times. :-) I'd say that the first
priority would be a system that puts Usenet into a reliable storage
medium after which we can start worrying about providing access to
it. :-/ Unfortunately, that doesn't have many bells and whistles
attached.

I don't think the problem with Deja was centralization, though -- just
that their interests were in making money from a Usenet archive,
rather than being in maintaining an archive of Usenet for its own
sake. They *do* have 10TB of data kicking around, though. When did
they start archiving?

Meg Worley

Oct 17, 2000, 3:00:00 AM

Rich writes:
>... somewhere in the back of my mind is the idea
>that it might be worthwhile to start a nonprofit Usenet archive *now*
>(although that's probably something that should've been done ages ago)..

Brewster Kahle's Internet Archive
(http://www.archive.org/collections/index.html#Usenet)
is working on this currently. I actually encouraged him
to try to hire Russ away from Stanford, bastard that I am.
Last I heard, though, he was hiring, so maybe it's time
for People With Clue (which would not be me) to put their
money where their mouths are.

Or not.

Rage away,

meg


--

Meg Worley _._ m...@steam.stanford.edu _._ Comparatively Literate

-dsr-

Oct 17, 2000, 3:00:00 AM
In article <yl7l77x...@windlord.stanford.edu>,

Russ Allbery <r...@stanford.edu> writes:
> Rob Partington <ne...@frottage.org> writes:
>> On 17 Oct 2000 09:23:34 -0700, Russ Allbery <r...@stanford.edu> wrote:
>
>> Would it be possible to do this in some kind of distributed manner like
>> Pluribus or Napster? So, say, I could volunteer to archive a set of
>> groups that comes to around 4G or so, whilst someone with more resources
>> could archive a greater set? That way the load is spread, and by
>> careful overlapping, you get redundancy free.
>
> The only real problem as near as I can tell is the interface. If someone
> can design a decent interface that doesn't involve having to run a
> full-blown web server on each archiving box, that would go a long way.
> NNTP isn't the right tool, unfortunately.

Interface is the job of the client. HTTP is a perfectly reasonable
transport agent, and any of the lightweight servers would do. Web servers
start getting bulky when they add features beyond the basic listen-
parse-return cycle.

> Something like traditional spool but breaking up each directory so it
> doesn't get too large would be a nice, convenient storage format, although
> not ideal for compression purposes. If you ran an rsync daemon, people
> could easily grab copies of the groups you archived so that there would be
> some redundancy....

$NEWSROOT/hierarchy/YYYY/MM/DD/ for instance? High traffic groups *need*
to split daily (e.g. rasfw) to maintain a reasonable number of articles/
directory. And it's all same-language text (or close enough); compression
will be perfectly reasonable.

No, I've thought better of it. Use $NEWSROOT/hierarchy/YYYY/MM/DD.tgz instead.
You get a number of files any filesystem can handle, an individual download
which is not too large, and your interface tool can handle the threading and
correlation and so forth. Side benefit, all of the tools to manipulate this
sort of thing are easy to build.

It would be nice to build a Subject index at the end of each month, too.

-dsr-
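
The YYYY/MM/DD.tgz layout above could be computed along these lines.
This is a sketch under one assumption the post does not spell out: that
"hierarchy" means the group name split on dots into directories, as in a
traditional spool. The function name and the example root are
hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath


def daily_archive_path(newsroot: str, group: str, day: date) -> PurePosixPath:
    """Build $NEWSROOT/<group as directories>/YYYY/MM/DD.tgz for one day."""
    # Assumption: comp.lang.c is stored under comp/lang/c, spool-style.
    group_dirs = group.split(".")
    return PurePosixPath(
        newsroot,
        *group_dirs,
        f"{day.year:04d}",
        f"{day.month:02d}",
        f"{day.day:02d}.tgz",
    )


# Hypothetical root and group, for illustration only.
path = daily_archive_path("/var/news", "comp.lang.c", date(2000, 10, 17))
print(path)
```

Zero-padding the month and day keeps a plain directory listing in
chronological order, which also makes a monthly Subject index easy to
build by walking one MM directory.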

Rich Lafferty

Oct 17, 2000, 3:00:00 AM
In net.subculture.usenet,

Meg Worley <m...@steam.stanford.edu> wrote:
>
> Rich writes:
> >... somewhere in the back of my mind is the idea
> >that it might be worthwhile to start a nonprofit Usenet archive *now*
> >(although that's probably something that should've been done ages ago)..
>
> Brewster Kahle's Internet Archive
> (http://www.archive.org/collections/index.html#Usenet)
> is working on this currently.

Hey, those guys stole my idea! :-) That could be very useful. I wonder
if they know about the Deja situation.

Jim Kingdon

Oct 17, 2000, 3:00:00 AM
> Brewster Kahle's Internet Archive
> (http://www.archive.org/collections/index.html#Usenet) is working on
> this currently.

Well, it is for scholarship and research purposes only and you need a
password to access the materials. At least assuming the generic
policies at http://www.archive.org/proposal.html#Terms apply to the
usenet collection.

While that's still of some use, it does strike me as a somewhat
different service than a fully public one like what deja.com provides.

Mark Atwood

Oct 17, 2000, 3:00:00 AM
Jim Kingdon <kin...@panix.com> writes:
>
> Either way, separating the two strikes me as a good thing for the
> usenet archive. Although whether it is going to go bankrupt anyway is
> less clear to me.

If Deja is going bankrupt, maybe we/someone could form a non-profit to
purchase their database for "pennies on the dollar"?

And stashing the data in Freenet or MojoNation would work. All that
is "needed" is to convert every article into a single unique "pathname".
Heck, /usenet/messageid would work perfectly.

--
Mark Atwood | Freedom from want, freedom from fear, freedom from choice.
m...@pobox.com | Is that the freedom you want?
http://www.pobox.com/~mra
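
The /usenet/messageid idea above might look like this in practice. The
percent-encoding step is an assumption added here so that characters
such as "/" or "@" in a Message-ID cannot be mistaken for path
separators; the function name is hypothetical, and neither Freenet nor
MojoNation is known to require this particular form.

```python
from urllib.parse import quote


def message_id_to_key(message_id: str) -> str:
    """Map a Message-ID header value to a single /usenet/... pathname."""
    inner = message_id.strip()
    # Drop the surrounding angle brackets if present.
    if inner.startswith("<") and inner.endswith(">"):
        inner = inner[1:-1]
    # Encode everything outside letters, digits and "_.-~", including "/".
    return "/usenet/" + quote(inner, safe="")


# Hypothetical Message-IDs, for illustration only.
print(message_id_to_key("<abc123@example.com>"))
print(message_id_to_key("<a/b@example.com>"))
```

Since Message-IDs are meant to be globally unique, the key is unique
too, and any node can derive it from the article alone without
consulting a central index.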

Kate Wrightson

Oct 18, 2000, 3:00:00 AM
In article <m3r95fp...@flash.localdomain>,
Mark Atwood <m...@pobox.com> wrote:

>If Deja is going bankrupt, maybe we/someone could form a non-profit to
>purchase their database for "pennies on the dollar"?

They're not going bankrupt. They're being acquired. The article on
ZDNet that started all this implied that the deal was pretty much done
but that the details weren't public yet, at least for the archiving
segment.


--
-------------------------------------------------------------------------
ka...@eyrie.org | Please do not e-mail me copies of material posted
Kate Wrightson | to newsgroups. I read the groups to which I post.


Jim Kingdon

Oct 20, 2000, 3:00:00 AM
> The usenet archive is a cash cow but instead of focusing on that they
> are taking all the profits and dumping them into The Suits Latest Pet
> Project.

Sorry for follow up to a post of mine, and a somewhat old one at that,
but a closer reading of the story confirms this.

The key word is "profitable".

"The company believes the profitable Usenet business unit"....
http://www.zdnet.com/intweek/stories/news/0,4164,2640446,00.html

justg...@gmail.com

Dec 8, 2013, 7:33:06 AM
Usenet rules forever!

Ok, just kidding. Usenet will be around forever, though.

I found that Google Wave was similar to usenet's glory days (most of the discussions were intelligent and came from intelligent people). It is closed now, however.