
size of comp.os.linux.* hierarchy


Edward Buck

May 25, 2003, 8:55:30 PM
Hello all,

Not sure if this is off or on topic but...

I'm thinking of setting up a free news server (accessible to anyone on
the net) dedicated to linux newsgroups. These would primarily be
newsgroups in the comp.os.linux hierarchy plus maybe a few others.
The server would only take these particular feeds so in theory,
bandwidth and server load should be reasonable. Of course, this is
only in theory and since I am new to INN, I have no idea what the
actual server load and bandwidth will be.

I'm wondering if anyone here can quickly check their news spool and
report on what they see as the total size of the comp.os.linux tree.
If it's not too big, I'll proceed with my idea.

Also, if there are people willing to supply feeds to my server, I
would be extremely grateful. Please send me an e-mail or just reply
to this post. It probably won't be an equal exchange of feeds since
I'll only be carrying linux newsgroups but it's for a good cause!!

Best regards,
Edward

Russ Allbery

May 25, 2003, 10:31:39 PM
Edward Buck <edwar...@netscape.net> writes:

> I'm wondering if anyone here can quickly check their news spool and
> report on what they see as the total size of the comp.os.linux tree. If
> it's not too big, I'll proceed with my idea.

news:~/spool/articles/comp/os> du -sh linux
50M linux

(21 day retention.)

--
Russ Allbery (r...@stanford.edu) <http://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<http://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Edward Buck

May 26, 2003, 1:29:15 AM
Russ Allbery wrote ...

> Edward Buck writes:
>
> > I'm wondering if anyone here can quickly check their news spool and
> > report on what they see as the total size of the comp.os.linux tree. If
> > it's not too big, I'll proceed with my idea.
>
> news:~/spool/articles/comp/os> du -sh linux
> 50M linux
>
> (21 day retention.)

Thanks Russ for the quick check. Wow, that's much less than I
thought. That translates into average daily traffic of about 2.4 MB.
Assuming each message is approximately 2-3 KB, that's 800-1200
messages a day. I'm pretty sure my server and bandwidth can handle
that. Not sure how well INN scales for concurrent newsreader access
though.
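
(Back of the envelope: 50 MB / 21 days is about 2.4 MB a day, and at
roughly 2.5 KB per article that's 2.4 MB / 2.5 KB, or on the order of
960 articles a day, so 800-1200 looks like the right range.)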

In setting up the server, I'm considering leaving the server open to
external posting. A free news server just for reading IMHO is too
clumsy to be of any use. If I leave the server open to external posts
by anyone, is that akin to having an open relay mail server? I really
don't want to invite abuse by spammers. User authentication via
username/password seems overkill for something like this. Thoughts?

Edward

Russ Allbery

May 26, 2003, 1:36:08 AM
Edward Buck <edwar...@netscape.net> writes:

> Thanks Russ for the quick check. Wow, that's much less than I thought.
> That translates into average daily traffic of about 2.4 MB. Assuming
> each message is approximately 2-3 KB, that's 800-1200 messages a day.
> I'm pretty sure my server and bandwidth can handle that. Not sure how
> well INN scales for concurrent newsreader access though.

If you're doing text-only on modern hardware, the power of the hardware is
so much more than the demands that unless you have a *ton* of readers, you
generally don't have to worry about it.

> In setting up the server, I'm considering leaving the server open to
> external posting. A free news server just for reading IMHO is too
> clumsy to be of any use. If I leave the server open to external posts
> by anyone, is that akin to having an open relay mail server?

Yup, pretty much.

> I really don't want to invite abuse by spammers. User authentication
> via username/password seems overkill for something like this. Thoughts?

The only really good alternative is really strong filtering, like holding
posts for approval until someone looks at them.

Edward Buck

May 26, 2003, 2:21:41 AM
Russ Allbery wrote:

> Edward Buck writes:
>>In setting up the server, I'm considering leaving the server open to
>>external posting. A free news server just for reading IMHO is too
>>clumsy to be of any use. If I leave the server open to external posts
>>by anyone, is that akin to having an open relay mail server?
>
>
> Yup, pretty much.
>
>
>>I really don't want to invite abuse by spammers. User authentication
>>via username/password seems overkill for something like this. Thoughts?
>
>
> The only really good alternative is really strong filtering, like holding
> posts for approval until someone looks at them.
>

Hmm. The idea of moderating an unmoderated forum is pretty
unexciting. Not to mention resource-intensive.

So, if I leave my server open and hope that spammers don't find me,
will other news admins get upset? I'm pretty familiar with how things
work on the mail server side of things and there are few things worse
to a spam fighter than an open relay. Not sure if open news servers
are similarly loathed.

I guess in theory, if a spammer posted to all the newsgroups I carry,
the spam would eventually go upstream to my peers and eventually
become permanent relics of Usenet. There must be a workable solution
to this problem beyond authentication? Are most public news servers
read-only or authentication-based?

Thanks.
Edward

J.B. Moreno

May 26, 2003, 3:50:20 AM
In article <ylwugel...@windlord.stanford.edu>,
Russ Allbery <r...@stanford.edu> wrote:

> Edward Buck <edwar...@netscape.net> writes:
-snip-


> > In setting up the server, I'm considering leaving the server open to
> > external posting. A free news server just for reading IMHO is too
> > clumsy to be of any use. If I leave the server open to external posts
> > by anyone, is that akin to having an open relay mail server?
>
> Yup, pretty much.
>
> > I really don't want to invite abuse by spammers. User authentication
> > via username/password seems overkill for something like this. Thoughts?
>
> The only really good alternative is really strong filtering, like holding
> posts for approval until someone looks at them.

Has anyone done anything with delaying post approval based upon the
number of posts/groups posted to?

I.e. 1 post is delayed 3 minutes; another post within that three minutes
and both posts are delayed 5 minutes; another post and all 3 are
delayed 10 minutes; a 4th post and everything is delayed 15 minutes --
keep on like that, and at some point it gets kicked to a person to approve.

Or the same thing except instead of basing everything on the number of
posts, include the number of groups in the calculation.

This wouldn't be fool-proof, but it might be good enough to let more
people run text-only servers without drastically increasing the amount
of spam.
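
Roughly, in code, something like this -- just a sketch in Perl, not tied
to any particular server's filter hooks, and the thresholds are only the
made-up ones above:

#!/usr/bin/perl
# Sketch of the escalating hold: the more posts seen from one poster in
# the current window, the longer everything from them is held.
use strict;
use warnings;

my @hold_minutes = (3, 5, 10, 15);    # 1st post, 2nd, 3rd, 4th

sub hold_for {
    my ($posts_in_window) = @_;
    return 'kick it to a person' if $posts_in_window > @hold_minutes;
    return "$hold_minutes[$posts_in_window - 1] minutes";
}

print "$_ post(s) -> hold ", hold_for($_), "\n" for 1 .. 5;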

--
J.B. Moreno

Russ Allbery

May 26, 2003, 4:06:27 AM
Edward Buck <edwar...@netscape.net> writes:

> So, if I leave my server open and hope that spammers don't find me, will
> other news admins get upset? I'm pretty familiar with how things work
> on the mail server side of things and there are few things worse to a
> spam fighter than an open relay. Not sure if open news servers are
> similarly loathed.

If the server can *only* post to those groups and you can stop it quickly
if someone starts doing something they shouldn't, you may be okay
*provided* that you don't allow any crossposting outside of
comp.os.linux.* and you don't allow any control messages at all. Either
of the latter will cause a lot of problems.

> I guess in theory, if a spammer posted to all the newsgroups I carry,
> the spam would eventually go upstream to my peers and eventually
> become permanent relics of Usenet. There must be a workable solution
> to this problem beyond authentication?

Not really.

> Are most public news servers read-only or authentication-based?

Yup.

Jeffrey M. Vinocur

May 26, 2003, 9:54:25 AM
In article <ylfzn2k...@windlord.stanford.edu>,

Russ Allbery <r...@stanford.edu> wrote:
>Edward Buck <edwar...@netscape.net> writes:
>
>> I guess in theory, if a spammer posted to all the newsgroups I carry,
>> the spam would eventually go upstream to my peers and eventually
>> become permanent relics of Usenet. There must be a workable solution
>> to this problem beyond authentication?
>
>Not really.

It's funny that mailing lists can get away with restricting
posting to subscribers using only a confirmatory email mechanism.
I wonder if something similar could be worked out for public news
servers.


>> Are most public news servers read-only or authentication-based?
>
>Yup.

To be more specific, almost all of the public servers carrying
standard usenet hierarchies are read-only; one requires free
registration. There aren't so many of these left, though. Most
people have usenet access through their ISP or equivalent, and
those who don't but care enough about usenet to notice are
willing to register for news.cis.dfn.de.

I'm not sure what servers carrying internal hierarchies (e.g. for
users tech support forums) do any more, although I think they
were unrestricted at one point.


--
Jeffrey M. Vinocur
je...@litech.org

Russ Allbery

May 26, 2003, 3:41:15 PM
Jeffrey M Vinocur <je...@litech.org> writes:

> I'm not sure what servers carrying internal hierarchies (e.g. for users
> tech support forums) do any more, although I think they were
> unrestricted at one point.

Generally they don't allow crossposting and they don't make the control.*
groups available for reading or send any outgoing feeds, so the suck feeds
that people use with them can't pull any control messages down. They
occasionally still have problems with Supersedes (which you also really
want to reject in that situation).

Edward Buck

May 26, 2003, 4:14:50 PM
Russ Allbery wrote in message news:<ylfzn2k...@windlord.stanford.edu>...

> Edward Buck writes:
>
> > So, if I leave my server open and hope that spammers don't find me, will
> > other news admins get upset? I'm pretty familiar with how things work
> > on the mail server side of things and there are few things worse to a
> > spam fighter than an open relay. Not sure if open news servers are
> > similarly loathed.
>
> If the server can *only* post to those groups and you can stop it quickly
> if someone starts doing something they shouldn't, you may be okay
> *provided* that you don't allow any crossposting outside of
> comp.os.linux.* and you don't allow any control messages at all. Either
> of the latter will cause a lot of problems.

So, regarding my INN 2.3.2 configuration, I would have only the
comp.os.linux hierarchy in my 'active' file. My newsfeeds file would
include:

ME:!*,comp.os.linux*/!local,!collabra-internal::

Incoming feeds in incoming.conf and outgoing feeds in newsfeeds would
limit feeds to comp.os.linux* on a server basis as well. If I only
have comp.os.linux* in my active list, does it matter whether I
exclude/include these newsgroups specifically in newsfeeds and
incoming.conf? What happens when a newsfeed is pulled/pushed to a
server via incoming.conf or newsfeeds that is NOT listed in the local
active file?
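
For concreteness, I'm picturing per-peer entries roughly like the
following -- I'm guessing at the exact syntax from the docs, and the peer
name is made up, so corrections are welcome:

# incoming.conf -- accept only these groups from the peer
peer upstream {
    hostname: news.upstream.example
    patterns: "comp.os.linux.*"
}

# newsfeeds -- outgoing file feed for nntpsend, same pattern
news.upstream.example:comp.os.linux.*:Tf,Wnm:news.upstream.example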

I'm not clear on the default permissions for cross-posting. My sense
is that cross-posting is not possible as long as newsfeeds and
incoming.conf are set properly. Also, the default control message
settings in control.ctl seem reasonable to me. Most settings default
to mail, drop or PGP verify. If I'm wrong in this assessment, please
let me know.

If I do all the above and have cleanfeed do some basic spam filtering,
do you think I can reasonably allow external posting without
authentication?

Thanks for all the help. I've done quite a bit of web-based forum
setup (i.e., OOoForum.org), but INN is new to me. I have to admit
there's something really cool and even mysterious about the way Usenet
works. By offering free linux newsgroups, I'm hoping we'll have more
good searchable content for Google Groups, which I've found to be an
invaluable search tool.

Best,
Edward

Curt Welch

May 26, 2003, 6:17:44 PM
edwar...@netscape.net (Edward Buck) wrote:
> So, if I leave my server open and hope that spammers don't find me,
> will other news admins get upset?

No one is likely to care until you have an abuse problem. How much they
care depends on how fast you spot it and deal with the issue.

If the only groups you carry are the linux groups and you configure your
server to prevent people from posting to other groups, then spammers won't
be interested in your server even if they do find it.

> I'm pretty familiar with how things
> work on the mail server side of things and there are few things worse
> to a spam fighter than an open relay. Not sure if open news servers
> are similarly loathed.
>
> I guess in theory, if a spammer posted to all the newsgroups I carry,
> the spam would eventually go upstream to my peers and eventually
> become permanent relics of Usenet. There must be a workable solution
> to this problem beyond authentication? Are most public news servers
> read-only or authentication-based?

Yeah, pretty much.

The classic way was to limit access by IP address so only local users could
access the server. And any site/ISP which did access control that way
would have other ways to track down the user given the IP address.

If you can't use that, then name/password authentication is normally used.

You can't leave a full-text server open for posting with no way to block
abuse and not run into an abuse problem some day. But if you limit it to
only the linux groups, the odds of abuse are much lower, though it's still
likely to bite you one day.

And it's not just spammers you have to look out for. Someone might just
get pissed off at one of your groups in general and then flood it just for
revenge.

But, for such a small set of groups, I think the number of problems you are
likely to run into will be very low.

--
Curt Welch http://CurtWelch.Com/
cu...@kcwc.com Webmaster for http://NewsReader.Com/

Curt Welch

May 26, 2003, 6:40:27 PM
edwar...@netscape.net (Edward Buck) wrote:
> Russ Allbery wrote ...
> > Edward Buck writes:
> >
> > > I'm wondering if anyone here can quickly check their news spool and
> > > report on what they see as the total size of the comp.os.linux tree.
> > > If it's not too big, I'll proceed with my idea.
> >
> > news:~/spool/articles/comp/os> du -sh linux
> > 50M linux
> >
> > (21 day retention.)
>
> Thanks Russ for the quick check. Wow, that's much less than I
> thought. That translates into average daily traffic of about 2.4 MB.
> Assuming each message is approximately 2-3 KB, that's 800-1200
> messages a day.

That matches what my stats show as well:

Group Pattern: comp.os.linux.*
Groups: 35
Articles/Day: 1042
KBytes/Day: 2450
Art Size: 2407

> I'm pretty sure my server and bandwidth can handle
> that.

Any machine that can run INN will have no problem at all with that. You
are talking about less than one article per minute. The bandwidth is less
than 300 bits/sec for the feed.
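
(That works out to roughly: 2450 KB/day x 8 bits is about 19.6 megabits
per day, spread over 86,400 seconds, or around 230 bits/sec averaged out.)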

> Not sure how well INN scales for concurrent newsreader access
> though.

INN can deal with a lot of readers, but the more readers, the bigger the
machine you will need (especially memory).

Because the groups are easy to find everywhere, I suspect you will have
problems finding people who will even want to bother using your server.
You will be lucky to have 10 concurrent users I'd guess. Any small PC will
work fine for that.

Most small PCs would have no problem getting all the text groups.

To do a normal news feed however, you need a static IP. If you don't have
a static IP, you will probably have to suck the articles from another
server.

Russ Allbery

May 26, 2003, 7:47:50 PM
Edward Buck <edwar...@netscape.net> writes:

> So, regarding my INN 2.3.2 configuration, I would have only the
> comp.os.linux hierarchy in my 'active' file. My newsfeeds file would
> include:

> ME:!*,comp.os.linux*/!local,!collabra-internal::

> Incoming feeds in incoming.conf and outgoing feeds in newsfeeds would
> limit feeds to comp.os.linux* on a server basis as well. If I only have
> comp.os.linux* in my active list, does it matter whether I
> exclude/include these newsgroups specifically in newsfeeds and
> incoming.conf?

No.

> What happens when a newsfeed is pulled/pushed to a server via
> incoming.conf or newsfeeds that is NOT listed in the local active file?

The articles are rejected.

> I'm not clear on the default permissions for cross-posting.

By default, INN allows any crossposting. There isn't a way to configure
it directly to stop crossposting; you have to use the embedded filters to
do that. Unfortunately.

> Also, the default control message settings in control.ctl seem
> reasonable to me. Most settings default to mail, drop or PGP verify.
> If I'm wrong in this assessment, please let me know.

Yes, if you don't want to add new groups beyond those particular groups,
you should strip out control.ctl to be basically an empty file.

> If I do all the above and have cleanfeed do some basic spam filtering,
> do you think I can reasonably allow external posting without
> authentication?

You also want to block reading of control and control.* in readers.conf,
and in the Perl filter you want to block all crossposting outside of
comp.os.linux.* and block all messages with Supersedes headers. (Although
that may lose you some FAQs; you may have to tune that a touch.) With
those precautions, it should be safe.
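
As a rough sketch of those two pieces (assuming the stock readers.conf
access-group syntax and the standard filter_nnrpd.pl hooks, with headers
in %hdr and an empty string returned to accept -- untested, so adjust to
taste):

# readers.conf, in the access block: hide the control groups from readers
newsgroups: "*,!control,!control.*"

# filter_nnrpd.pl
sub filter_post {
    foreach my $group (split /\s*,\s*/, $hdr{'Newsgroups'}) {
        return "Crossposting outside comp.os.linux.* is not allowed"
            unless $group =~ /^comp\.os\.linux/;
    }
    # Rejecting Supersedes may lose some FAQ reposts; tune as needed.
    return "Supersedes is not accepted here"
        if defined $hdr{'Supersedes'} && $hdr{'Supersedes'} ne '';
    return '';    # accept
}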

Edward Buck

May 27, 2003, 2:03:33 PM
Russ Allbery wrote:
> By default, INN allows any crossposting. There isn't a way to configure
> it directly to stop crossposting; you have to use the embedded filters to
> do that. Unfortunately.

If my server has only comp.os.linux* groups in active and a user
connects and tries to cross-post to alt.binaries.xxx, will that
cross-post fail? I assume it should definitely fail if I have only
comp.os.linux* groups listed for each peer in my newsfeeds file.

> You also want to block reading of control and control.* in readers.conf,
> and in the Perl filter you want to block all crossposting outside of
> comp.os.linux.* and block all messages with Supersedes headers. (Although
> that may lose you some FAQs; you may have to tune that a touch.) With
> those precautions, it should be safe.
>

Cool. Thanks so much for all the tips. I'll post an update to the
list when I've got everything working.

By the way Russ, in reading up on INN (I've read quite a bit over the
course of the last week), I came across your "rant" on the usenet2
page. That was thoroughly enjoyable reading! Inspiring too.

Best,
Edward

Edward Buck

May 27, 2003, 2:28:16 PM
cu...@kcwc.com (Curt Welch) wrote in message news:<20030526184027.373$o...@newsreader.com>...

> INN can deal with a lot of readers, but the more readers, the bigger the
> machine you will need (especially memory).
>
> Because the groups are easy to find everywhere, I suspect you will have
> problems finding people who will even want to bother using your server.

The fewer users, the better, right? :-)

Seriously though, it's not for those who are willing to "find" access.
It's for those who don't have the time to look. It's about
simplicity. Yes, there are ISPs that offer newsgroup access. But
there are lots of ISP's that don't. Mine doesn't. All my posting
(including this one) is done via Google Groups, which is wonderful for
searching but not so good for reading and posting.

So, yes I can fix this problem I have. I can sign up at
news.cis.dfn.de and post using that. But do I want to spend time
doing that? Not really. Why not offer a content specific server that
allows you to read and post without doing a thing?

If a server is set up for open access, it's possible to provide a
near-seamless user experience using OE or Mozilla. Click on a newsgroup
link on a website and voila, OE's or Mozilla's newsreader pops up
magically and you're browsing and posting in seconds. Frankly, this
is what I want. Where can I get this today????

Russ Allbery

May 27, 2003, 3:12:37 PM
Edward Buck <edwar...@netscape.net> writes:

> If my server has only comp.os.linux* groups in active and a user
> connects and tries to cross-post to alt.binaries.xxx, will that
> cross-post fail?

No. Crossposts to groups that your server doesn't carry are still
allowed. (I vaguely remember there may be some knob somewhere to tweak
that, but I don't remember where.)

> By the way Russ, in reading up on INN (I've read quite a bit over the
> course of the last week), I came across your "rant" on the usenet2 page.
> That was thoroughly enjoyable reading! Inspiring too.

Thanks! Usenet 2 is at this point completely defunct, but I think the
rant still applies pretty well.

J.B. Moreno

May 27, 2003, 3:20:05 PM
In article <e5c11262.03052...@posting.google.com>,
Edward Buck <edwar...@netscape.net> wrote:

> In setting up the server, I'm considering leaving the server open to
> external posting. A free news server just for reading IMHO is too
> clumsy to be of any use. If I leave the server open to external posts
> by anyone, is that akin to having an open relay mail server? I really
> don't want to invite abuse by spammers. User authentication via
> username/password seems overkill for something like this. Thoughts?

I don't know how much pre-processing/filtering you can do, but you
might limit posts to followups where the parent is present in the group
(and disallow xposting completely).

That would almost certainly prevent it from being used to spam
(although it might create a problem of its own).
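
A very rough sketch of that check, assuming an nnrpd-style Perl hook with
headers in %hdr and that grephistory exits non-zero when a message-ID is
not in the history file (worth double-checking both assumptions):

sub filter_post {
    return "Crossposting is not allowed" if $hdr{'Newsgroups'} =~ /,/;
    my $refs = $hdr{'References'}
        or return "Only followups to existing articles are accepted";
    my ($parent) = $refs =~ /(<[^>]+>)\s*$/;    # last message-ID listed
    return "Can't parse the References header" unless $parent;
    system('grephistory', '-q', $parent);
    return "Parent article not found on this server" if $? != 0;
    return '';    # accept
}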

--
J.B. Moreno

Curt Welch

May 27, 2003, 5:54:44 PM
edwar...@netscape.net (Edward Buck) wrote:
> cu...@kcwc.com (Curt Welch) wrote in message
> news:<20030526184027.373$o...@newsreader.com>...
> > INN can deal with a lot of readers, but the more readers, the bigger
> > the machine you will need (especially memory).
> >
> > Because the groups are easy to find everywhere, I suspect you will have
> > problems finding people who will even want to bother using your server.
>
> The less users, the better, right? :-)
>
> Seriously though, it's not for those who are willing to "find" access.
> It's for those who don't have the time to look. It's about
> simplicity. Yes, there are ISPs that offer newsgroup access. But
> there are lots of ISP's that don't. Mine doesn't. All my posting
> (including this one) is done via Google Groups, which is wonderful for
> searching but not so good for reading and posting.

Yeah, don't get me wrong. Running a news server is fun. And being able to
offer something to the net for free is great. I was just trying to point
out that I wouldn't expect you to have such a huge user load that it would
cause you a problem.

> So, yes I can fix this problem I have. I can sign up at
> news.cis.dfn.de and post using that. But do I want to spend time
> doing that? Not really. Why not offer a content specific server that
> allows you to read and post without doing a thing?

Sure, go for it.

But there are services that are basically "free" or have low one-time
sign-up fees and work great for getting access to all the text groups, so
there are plenty of reasonable options out there for people. But, if you
offer a service for free, people will find it and use it, so you will get
users, there's no doubt about that.

> If a server is setup for open access, it's possible to provide a near
> seamless user experience using OE or Mozilla. Click on a newsgroup
> link on a website and voila, OE or Mozilla's newsreaders pops up
> magically and you're browsing and posting in seconds. Frankly, this
> is what I want. Where can I get this today????

Well, that's just the point. If someone were to program in your server,
which only has the linux groups, most of the things they click on won't work.
So it's far better overall to find a server which carries all the groups if
you really want that to work.

If you want near-free access for text, teranews has free access for a
one-time setup charge of $3.95.

news.readfreenews.net is free but read-only. They have many months of
retention.

bubbanews.com seems to have a free account, but I don't know if you can
post with that. I'm guessing you can.

If you read news:alt.free.newsservers you can find out more.

Jeffrey M. Vinocur

May 27, 2003, 6:09:47 PM
In article <yladd8c...@windlord.stanford.edu>,

Russ Allbery <r...@stanford.edu> wrote:
>Edward Buck <edwar...@netscape.net> writes:
>
>> If my server has only comp.os.linux* groups in active and a user
>> connects and tries to cross-post to alt.binaries.xxx, will that
>> cross-post fail?
>
>No. Crossposting allows crossposts to groups that your server doesn't
>carry. (I vaguely remember there may be some knob somewhere to tweak
>that, but I don't remember where.)

It probably suffices to put

newsgroups: "@*,comp.os.linux.*"

in readers.conf, although I'm not sure about that. (As long as
the @* is included in any outgoing entries in newsfeeds, no spam
will get out to the world, which is probably an acceptable first
step -- after all, only your own users will get mad at you for
that.)

Russ Allbery

May 27, 2003, 6:29:17 PM
Jeffrey M Vinocur <je...@litech.org> writes:

> It probably suffices to put

> newsgroups: "@*,comp.os.linux.*"

> in readers.conf, although I'm not sure about that.

Nope, @ isn't supported in readers.conf currently.

Nicholas Suan

May 27, 2003, 7:54:49 PM
Curt Welch wrote in news:20030527175444.363$h...@newsreader.com

> bubbanews.com seems to have a free account, but I don't know if you can
> post with that. I'm guessing you can.
>

You can, and the account is limited to 50MB/day

Edward Buck

May 28, 2003, 1:15:52 AM
Russ Allbery wrote:
> Thanks! Usenet 2 is at this point completely defunct, but I think the
> rant still applies pretty well.
>

It's unfortunate that Usenet 2 is now defunct. It was a great idea
that sounded quite promising. And yes, your rant definitely still
applies.

It's interesting to look at the history of Usenet. It was essentially
born out of the dial-up world (uucp connections). NNTP and modern
hardware now allow for faster connections but the fundamental
technology behind usenet has pretty much stayed the same. One server
has a bunch of content and connects to another server to share all
that content. I wonder if there's a way to leverage the new
peer-to-peer technology being developed to make a
better, faster, more manageable Usenet?

I see two major problems in current Usenet technology. One, all the
content is essentially being mirrored on every major Usenet server.
This is in my opinion a terrible waste of bandwidth and storage.
There is also no correlation between storage and usage. All the least
accessed content maintains the same storage preference as the most
accessed content. Second, manual peering arrangements are a bit at
odds with the self-service, always on, always connected paradigm of
the modern Internet. This reality makes setting up and maintaining a
news server very expensive. Combine that with the huge upfront
storage requirement of setting up a "complete" server and it's no
wonder that many ISPs have dropped Usenet as a basic service.

For both of these problems, I like the idea of a Usenet registry of
sorts. The basic hierarchy and naming paradigm of Usenet would stay
the same, but there might be a central server somewhere that helps
Usenet be more efficient. Here's an example of how this might work:

Anyone who wishes to provide a news server can register their host/ip
address to this central database. You might think of this as the
Usenet equivalent of the dns root server paradigm. When a nntp server
goes online, the daemon would first ping the registry and see what
news peers are available within a particular geographic range. The
news daemon would connect to those local peers and do a database dump
on available groups and available posts. The body of the posts would
be kept separate. As users post messages, the post header info would
be updated in the local database and all connected databases in real
time, but the post body would stay on the local server until it was
"requested" by somebody on the Internet. The header information could
include a path to where the content can be downloaded. So unless the
body is requested, it never goes anywhere, saving bandwidth and
storage. Once downloaded, the header is updated to include the new
server carrying a copy of the body. Now, when the updated header is
shared with other news servers on the net, the body can be downloaded
from the most recent server in the path rather than going all the way
back to the original server. Basically, the registry would provide
direction and peers would provide smart caches of where everything can
be found. If the content is never requested, it never leaves the
original server.

Just thinking out loud...

Edward

Curt Welch

May 28, 2003, 3:01:20 AM
edwar...@netscape.net (Edward Buck) wrote:
> I see two major problems in current Usenet technology. One, all the
> content is essentially being mirrored on every major Usenet server.

Yes, this "problem" has been looked at a lot. And every possible way to
"fix" it has probably been explored and many implmented. What you find is
that the "problem" is there for a reason.

> This is in my opinion a terrible waste of bandwidth and storage.

As it turns out, it's worth the expense.

> There is also no correlation between storage and usage. All the least
> accessed content maintains the same storage preference as the most
> accessed content.

As it turns out, it's worth the expense.

> Second, manual peering arrangements are a bit at
> odds with the self-service, always on, always connected paradigm of
> the modern Internet.

Not at all.

The internet is millions of wires, and _every one of them_ was manually put
into place and manually set up as "peering" arrangements between all the
routers which make up the internet. Many of those connections even come
with large legal contracts which take months to put into place. Usenet
peering is not at odds with that at all.

More importantly, however, the cost of setting up and maintaining Usenet
peering is so small compared to the cost of the hardware and bandwidth
that it's not relevant.

> For both of these problems, I like the idea of a Usenet registry of
> sorts.

> Anyone who wishes to provide a news server can register their host/ip
> address to this central database. When a nntp server


> goes online, the daemon would first ping the registry and see what
> news peers are available within a particular geographic range. The
> news daemon would connect to those local peers

Yeah, we've talked about a lot of ideas like that. Technically, it could
work fine. But because of practical matters, it would fail.

The problem is exactly what you see happening with P2P. It doesn't work as
well as you would like because you can't trust other people to run their
servers correctly or to run "high quality" servers. The quality of the
service any end-user receives is a fucntion of not one server, or one
company, but hundreds of them. If the article you want happens to be on
only one server (not a very popular article), and that server is down, or
happens to be 4000 ms away from you on the other side of the world, it may
take forever to get a copy of it. The current structure of usenet fixes
that. It allows each server to provide whatever level of service they can
justify.

Then there's the cost structure issue. When you sell access to Usenet, it
works fairly well, because most of your cost is consumed by the people who
are paying you to run the server. If your server doesn't work well, it's
your customers that notice - and choose to go elsewhere if you are not
providing a competitive service.

If I instead ran a server which was providing articles to any peer in the
world that wanted them, what motivation would I have to make that work
well? I'd have almost none. And because of that, most servers would end
up being configured not to provide "good" service to the "freeloaders". Or,
like most "free" servers on usenet, they are speed limited and/or almost
always overloaded.

If peering was "free" like that with any client, then it would also be
simple to get free access to usenet just by setting up a server. That's
fine in theory (just like P2P), but it has problems in practice because
then you have to trust that everyone else is providing a good quality
server.

Usenet servers are expensive to run. But that cost is offset by the number
of users you service from each server. As it gets more expensive, you
simply build a larger userbase. If the cost goes up and the total number
of people using usenet does not go up, then this means you need fewer
servers in the world. This is why you don't have thousands of small ISPs
running their own servers anymore. They go in with others and simply share
a fewer number of larger servers. Why is it better to run 10,000,000 small
servers instead of 100,000 large servers?

The distributed idea works OK when one administrative domain runs
the entire distributed set of servers, or when the "leaf" nodes are paying
for the services of the "net". This is already being done in many places
using suck-on-demand, and caching servers, and head-only feeds to diablo
front-end servers. What won't work is expecting the entire net to
transform to a model like that, because if you don't pay for the back-end
service you can't control the quality and you will end up with stuff
that never works very well.

So, for distributed usenet to work as well as usenet works today, you end
up having to pay for your "peering" (i.e. access to backbone servers). And
people are already running servers like that today. But even that is
limited because the performance of your server is not only a function of
the "quality" of the backbone server you pay for, it's a function of the
network quality between the servers. And since adding more network latency
and unpredictability into the equation tends to only create a server which
is slower and more problem-prone than a standard full-feed server, it's not
done a lot. You end up with a more competitive offering if you run a large
server and offer it to thousands of users instead of a small caching server
of some type and offering it to only hundreds of users.

> Just thinking out loud...

Yeah, it's fun to do that. I do it a lot here.

John Miller

May 28, 2003, 8:52:23 AM
Edward Buck wrote:
<snip>

> For both of these problems, I like the idea of a Usenet registry of
> sorts. The basic hierarchy and naming paradigm of Usenet would stay
> the same, but there might be a central server somewhere that helps
> Usenet be more efficient. Here's an example of how this might work:
<snip>

I enjoyed reading your article, Edward. Sometimes we need to get a fresh
perspective on things.

However, anything that would create a single point of failure would be a
fatal flaw, as far as Usenet is concerned.

Actually, from the point of view of the original ethic and aim of Usenet,
the present system is pretty well evolved.

--
John Miller

The ideal voice for radio may be defined as showing no substance, no sex,
no owner, and a message of importance for every housewife.
-- Harry V. Wade

Edward Buck

May 28, 2003, 12:34:17 PM
Curt Welch wrote:

> edwar...@netscape.net (Edward Buck) wrote:
>>Anyone who wishes to provide a news server can register their host/ip
>>address to this central database. When a nntp server
>>goes online, the daemon would first ping the registry and see what
>>news peers are available within a particular geographic range. The
>>news daemon would connect to those local peers
>
> The problem is exactly what you see happening with P2P. It doesn't work as
> well as you would like because you can't trust other people to run their
> servers correctly or to run "high quality" servers. The quality of the
> service any end-user receives is a function of not one server, or one
> company, but hundreds of them.

I agree that many of today's P2P networks have problems, but P2P is
still relatively new and evolving at a very fast pace. Some of the
newer technologies handle bandwidth and cached data more effectively,
improving overall performance and throughput. BitTorrent is a great
example of this. I agree that "low quality" servers could become a
bottleneck but there are workarounds to this. The news daemon could
ping the current list of peers on a regular basis to determine QoS to
those servers. If it ever drops below a certain threshold, you could
automatically query the central registry for a new set of servers. As
long as there are enough good servers out there, it should always be
possible to maintain a decent level of service. The good servers end
up helping each other, which is what happens today anyway.

> If the article you want happens to be on
> only one server (not a very popular article), and that server is down, or
> happens to be 4000 ms away from you on the other side of the world, it may
> take forever to get a copy of it. The current structure of usenet fixes
> that. It allows each server to provide whatever level of service they can
> justify.

If the server that originally posted the article is offline, then I
would argue that's a good thing. That content should be ignored.
It's no different from a 404 message at a website. Since the user who
posted that message will be a subscriber on that server, that user
will complain that nobody read his brilliant post and take his
business elsewhere. This provides enough incentive for people to
maintain their servers. Otherwise, it's their users who suffer. If
the message does indeed propagate once or twice to other servers
(which in theory should always happen on popular groups), then you
might have a number of servers listed in the download path, providing
alternate routes to the content. This is similar to having multiple
redundant dns servers that can be queried for any given domain name.
It's unlikely that all servers will be down at the same time.

> Then there's the cost structure issue. When you sell access to Usenet, it
> works fairly well, because most of your cost is consumed by the people who
> are paying you to run the server. If your server doesn't work well, it's
> your customers that notice - and choose to go elsewhere if you are not
> providing a competitive service.

I don't think the cost structure necessarily changes. Users will
still be paying for the service, whether via their ISP or via a
third-party. But it would lower the bar to entry for anyone who wants
to offer service. That means more service providers, more servers,
more bandwidth, more distributed content, and ultimately more users.

> If I instead ran a server which was providing articles to any peer in the
> world that wanted them, what motivation would I have to make that work
> well? I'd have almost none. And because of that, most servers would end
> up being configured not to provide "good" service to the "freeloaders". Or,
> like most "free" servers on usenet, they are speed limited and/or almost
> always overloaded.

I think in practice, there will be very few "freeloaders" who aren't
giving something back in the way of storage or bandwidth. The central
registry would ensure that participating peers are online and
available. Normal consumers are NOT going to setup Usenet servers.
It's just too difficult.

> Usenet servers are expensive to run. But that cost is offset by the number
> of users you service from each server. As it gets more expensive, you
> simply build a larger userbase. If the cost goes up and the total number
> of people using usenet does not go up, then this means you need fewer
> servers in the world. This is why you don't have thousands of small ISPs
> running their own servers anymore. They go in with others and simply share
> a fewer number of larger servers. Why is it better to run 10,000,000 small
> servers instead of 100,000 large servers?

This is also why the percentage of Internet users who use Usenet has
gone down considerably over the last few years. Many of my "savvy"
online friends have NEVER used Usenet and don't even know what it is!
We need to look for ways to expand the community, and making it
easier for ISPs to get their users online is one way.

>>Just thinking out loud...
>
>
> Yeah, it's fun to do that. I do it a lot here.
>

I agree it's fun. Thanks for taking the time to read my post. All
your points are well taken. But I think work-arounds are possible.
Now, if only I had some time to code...

Best,
Edward

Curt Welch

May 28, 2003, 4:53:21 PM
Edward Buck <edwardbuc...@netscape.net> wrote:
> Curt Welch wrote:

> I agree that many of today's P2P networks have problems, but P2P is
> still relatively new and evolving at a very fast pace. Some of the
> newer technologies handle bandwidth and cached data more effectively,
> improving overall performance and throughput. BitTorrent is a great
> example of this. I agree that "low quality" servers could become a
> bottleneck but there are workarounds to this.

Yeah, there's a lot of potential for smart networking algorithms to
identify and deal with the problem sites so that everyone can maximize the
use of what is out there (by not wasting time trying to contact or pull
articles from an overloaded server when other servers out there already
have the article and are not overloaded). But I think anyone who believes
that will "solve" the problems is missing the big picture. All that does
is drag down all the "public" resources equally instead of allowing only
10% of the resources to create a bottleneck. It doesn't address the supply
and demand problem that happens any time you give stuff away for "free".
Making it work twice as well will never remove the fact that demand will
always outstrip supply. The better you make it work, the worse the problem
actually gets because you are offering something of more value yet you are
still giving it away for free.

The only thing left to bring the supply and demand back into balance is
performance. More and more people will use it until they drag it down to
the point that the demand is reduced to meet the actual supply. This is
the guaranteed "stable state" of any free P2P network. They will always
suck in terms of performance and quality of service.

Some people have argued with me that the real problem is the asymmetric
bandwidth - the fact that consumers are trying to build P2P networks using
bandwidth with higher download rates than upload rates - i.e. what they can
suck from the net is always higher than what they can provide.

Once again, thinking that ADSL is the "problem" is missing the big picture.
Anyone who wants to make P2P work can buy a T1 and serve as much content
as they suck (or more). But most don't. Why? Because no one is paying
them to spend their money to help others, so they choose not to do it.
Human nature guarantees that people as a group will always try to get more
than they give.

If you want to "get" high quality, there must be something that forces the
people to "give" something of equal value in return. Maybe P2P clients can
attempt to enforce that through software - i.e. limit what you can download
based on what you provide to the net. But in the long run, I suspect people
will simply create clients that violate that and break the system again.

> The news daemon could
> ping the current list of peers on a regular basis to determine QoS to
> those servers. If it ever drops below a certain threshold, you could
> automatically query the central registry for a new set of servers. As
> long as there are enough good servers out there, it should always be
> possible to maintain a decent level of service.

Yeah, but my thesis is that there will never be enough good servers out
there.

> > If the article you want happens to be on
> > only one server (not a very popular article), and that server is down,
> > or happens to be 4000 ms away from you on the other side of the world,
> > it may take forever to get a copy of it. The current structure of
> > usenet fixes that. It allows each server to provide whatever level of
> > service they can justify.
>
> If the server that originally posted the article is offline, then I
> would argue that's a good thing.

Most usenet users, I think, would not. That's because when you try to read
the article, your client won't be able to tell you that the server is
off-line. Instead, it will simply hang for many seconds first and then
tell you it's down - just like what happens with the web. That type of
user interaction is such a pain that I'm sure most usenet users would choose
the current system over that system if given a choice. i.e., if the server
tells me there's an article, it better be able to give it to me quickly
when I ask for it, or tell me quickly that it's gone. Having to "hang" for
30 seconds to find out whether an article is available would really suck -
even more so if it happened on every 10th article you tried to read.

> That content should be ignored.
> It's no different from a 404 message at a website. Since the user who
> posted that message will be a subscriber on that server, that user
> will complain that nobody read his brilliant post and take his
> business elsewhere. This provides enough incentive for people to
> maintain their servers.

It provides an incentive for sure. But I don't think it provides "enough".

> Otherwise, it's their users who suffer. If
> the message does indeed propagate once or twice to other servers
> (which in theory should always happen on popular groups), then you
> might have a number of servers listed in the download path, providing
> alternate routes to the content. This is similar to having multiple
> redundant dns servers that can be queried for any given domain name.
> It's unlikely that all servers will be down at the same time.

Yeah, but do you know what happens when dns servers go down? Every access
starts to take an extra 30 seconds or so. I think on a widely distributed
"free" version of usenet, you would have a very high percentage of articles
which were unavailable but which would cause your user interface to hang
for many seconds as your server tries to "find" the article you asked for.

This is all a part of why the current system of running servers which cache
all the articles is the most popular.

> > Then there's the cost structure issue. When you sell access to Usenet,
> > it works fairly well, because most of your cost is consumed by the
> > people who are paying you to run the server. If your server doesn't
> > work well, it's your customers that notice - and choose to go elsewhere
> if you are not providing a competitive service.
>
> I don't think the cost structure necessarily changes. Users will
> still be paying for the service, whether via their ISP or via a
> third-party. But it would lower the bar to entry for anyone who wants
> to offer service. That means more service providers, more servers,
> more bandwidth, more distributed content, and ultimately more users.

Well, this already works in some ways. Lots of stuff has already been
created for doing stuff like this. The mynews server is a small personal
server designed to be used much like that already. You can archive and
share a small set of groups which you are interested in with other mynews
users. And I'm not sure, but I think mynews has an automated system for
finding peers so it addresses your ideas about the problems of finding and
setting up peers.

And anyone is free to set up a traditional server which only carries a
small set of groups and peer with any other similar small servers
interested in the same groups. It requires the admin overhead of finding
and setting up peers, but that overhead is not really what stops more
people from doing that. It's the fact that $10 per month for access to a
full usenet server is far easier for most people to justify than spending
even 1 hour a month running their own server no matter how automated it is.

> > If I instead ran a server which was providing articles to any peer in
> the world that wanted them, what motivation would I have to make that
> work well? I'd have almost none. And because of that, most servers
> would end up being configured not to provide "good" service to the
> > "freeloaders". Or, like most "free" servers on usenet, they are speed
> > limited and/or almost always overloaded.
>
> I think in practice, there will be very few "freeloaders" who aren't
> giving something back in the way of storage or bandwidth. The central
> registry would ensure that participating peers are online and
> available. Normal consumers are NOT going to setup Usenet servers.
> It's just too difficult.

Yeah, but there has to be some type of system in place to force people to
supply as much as (or more than) they take from the network. Human nature
guarantees that if you aren't being watched, people won't do it.

Money is just one way to make that happen. Peer pressure is another. But
to make peer pressure work, you have to expose your network usage to the
public - i.e. document what you supply and what you take from the
"community". That seems to go against what most of these networks are
about. Maybe there's some way to make it work other than just making the
"quality" suck to the point that only the truly desperate are willing to
put up with it. But I haven't seen it yet.

> > Usenet servers are expensive to run. But that cost is offset by the
> > number of users you service from each server. As it gets more
> > expensive, you simply build a larger userbase. If the cost goes up and
> > the total number of people using usenet does not go up, then this means
> > you need fewer servers in the world. This is why you don't have
> > thousands of small ISPs running their own servers anymore. They go in
> > with others and simply share a fewer number of larger servers. Why is
> > it better to run 10,000,000 small servers instead of 100,000 large
> > servers?
>
> This is also why the percentage of Internet users who use Usenet has
> gone down considerably over the last few years. Many of my "savvy"
> online friends have NEVER used Usenet and don't even know what it is!
> We need to look for ways to expand the community, and making it
> easier for ISPs to get their users online is one way.

Well, the big difference between P2P and Usenet is that Usenet was designed
for transporting text messages and not large files. P2P was designed to
share files. P2P, even with its supply and demand issues, does a much
better job at sharing files than raw Usenet does. Usenet also is based on
the idea of all articles expiring after a short time, whereas many P2P
networks have files that can live on forever.

The problems with usenet are not the cost of running servers or the huge
bandwidth issues, it's the fact that it just doesn't work very well for
file sharing. If we improve that (and a lot of the stuff I've talked about
with xBin is a path to doing that), then I suspect you would find a lot
more people using Usenet for file sharing.

> >>Just thinking out loud...
> >
> >
> > Yeah, it's fun to do that. I do it a lot here.
> >
>
> I agree it's fun. Thanks for taking the time to read my post. All
> your points are well taken. But I think work-arounds are possible.
> Now, if only I had some time to code...

Yeah, just my thought. :)

> Best,
> Edward

Edward Buck

May 29, 2003, 6:20:17 PM
Curt Welch wrote:
> The only thing left to bring the supply and demand back into balance is
> performance. More and more people will use it until they drag it down to
> the point that the demand is reduced to meet the actual supply. This is
> the guaranteed "stable state" of any free P2P network. They will always
> suck in terms of performance and quality of service.

This is certainly possible. I tend to be a little more optimistic when
it comes to networking technology and the sharing of digital content.
There are some trends worth noting:

* Bandwidth is doubling every year

There are many indications that bandwidth is increasing even faster than
Moore's law of CPU speed doubling every 18 months. It's already
possible to move 2.56 terabits of data per second over a distance of
2,500 miles. That's more than enough bandwidth for Usenet and then some.

* Content compression algorithms are getting better

mpeg4 is a big step up from mpeg2 (the current dvd standard). As
compression technology improves, the files that carry multimedia content
(music, video, etc.) are getting smaller not bigger. Of course, the
next format to replace dvd will probably have more data packed in there.
But a fast rate of increase in bandwidth combined with a slower rate
of increase in content size should contribute to a better P2P experience
over time.

> This is all a part of why the current system of running servers which cache
> all the articles is the most popular.

Certainly, caching all the data will ensure an improved quality of
service. Perhaps that can be a configuration option. Some servers can
opt to cache everything and others can opt to cache only when requested.

Regards,
Edward

Curt Welch

May 30, 2003, 12:30:51 AM
Edward Buck <edwardbuc...@netscape.net> wrote:
> Curt Welch wrote:
> > The only thing left to bring the supply and demand back into balance is
> > performance. More and more people will use it until they drag it down
> > to the point that the demand is reduced to meet the actual supply.
> > This is the guaranteed "stable state" of any free P2P network. They
> > will always suck in terms of performance and quality of service.
>
> This is certainly possible. I tend to be a little more optimistic when
> it comes to networking technology and the sharing of digital content.
> There are some trends worth noting:
>
> * Bandwidth is doubling every year
>
> There are many indications that bandwidth is increasing even faster than
> Moore's law of CPU speed doubling every 18 months. It's already
> possible to move 2.56 terabits of data per second over a distance of
> 2,500 miles. That's more than enough bandwidth for Usenet and then some.

:)

Usenet doubles in size every 10 months. It's currently about 80 Mbps, so
2.56 Tbps is roughly 32,000 times that. It will take Usenet only about 5
years to grow to the point that a 2.56 Gbps pipe will be too small to hold
a full feed.
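
(Working that through: 80 Mbps doubling every 10 months passes 80 x 32 =
2,560 Mbps after 5 doublings, about 50 months, so a 2.56 Gbps pipe is
outgrown in roughly five years; reaching 2.56 Tbps takes 15 doublings,
roughly 12.5 years.)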

However, the reduced cost of technology is the primary factor that allows
usenet to grow at these rates. So if for some reason bandwidth and
computer costs didn't keep dropping as quickly, Usenet's growth would slow
or even stop. But like you say, there's not a lot of reason to suspect
that the growth of technology is going to slow any time soon.

What happens is that as the technology improves, it just opens the door to
store and share larger things. There's really no limit to what you can
share as a digital file. The huge video files of today will become small
things that copy in seconds, just as the JPEGs of yesterday, which were so
huge and took so long to download, are nothing now.

100 GB video files with multiple camera angles (tracks) and multiple audio
tracks could be created and traded.

One day, 3D videos may be created where multiple camera angles are
digitally processed to recreate a 3D model of what was filmed and then as
you play it back, you can position the virtual camera anywhere you want.
With the help of some type of 3D goggles, you could walk around the room as
the film was playing back to see the action from any angle inside of the
scene. These files of course will be TBs in size for the high-res version.

And of course software has no limits. The more memory and CPU you have to
work with, the larger programs of all types will become.

As the price of technology comes down, and the amount of CPU speed and
bandwidth you have available to you goes up, it doesn't make things work
better, it simply allows you to do more. The supply and demand problem
with the current P2P model will still be there no matter how much bandwidth
we have access to or how cheap it becomes. Yes, the P2P networks of the
future may work great for swapping the type of files people have problems
trying to swap today, but in the future, they won't use P2P to swap those
because those are so cheap to swap they will simply be put on web sites and
sent as instant messages. In the future, people will be trying to use P2P
networks to swap those 100 TB files and waiting hours for them to download.

> * Content compression algorithms are getting better

Once again, this is just another improvement in technology that allows us
to do more for less. Cheaper bandwidth, cheaper computers, better
software, it's all the same thing. What's important is that the price
never drops to zero, and the speed and size never hit infinity. It always
costs something and it always has its limits. Our ability to think up new
uses for the technology however is infinite. The demand will always
outstrip the supply.

> But a fast rate of increase in bandwidth combined with a slower rate
> of increase in content size should contribute to a better P2P experience
> over time.

Even if bandwidth increases faster than content size, it's not going to
change anything. We will simply use the excuse to move more content.
Years ago, the only real files posted to usenet were pictures. As
bandwidth and storage and scanners and color monitors and high-res color
display systems came down in price, and jpg and mpg and divx were created,
it just gave everyone the power and excuse to move content in new ways
which they had never even dreamed of trying to move in the past. I worked
with computers for 10 years before I ever saw a color digital picture
displayed on a computer screen. It was a hell of a thing to see for the
first time. And I knew one day we would be watching TV quality video on a
home computer, though at the time, it seemed impossible (a small number of
image files tended to fill up entire hard drives in those days).

These advancements will make moving the stuff we move today simple and
easy, but in the future, the files of today will be as uninteresting as a
single 100x100 pixel image file is to us today.

> > This is all a part of why the current system of running servers which
> > cache all the articles are the most popular.
>
> Certainly, caching all the data will ensure an improved quality of
> service. Perhaps that can be a configuration option. Some servers can
> opt to cache everything and others can opt to cache only when requested.

DNews already works like that. And there are plenty of news servers which
act as caching proxy servers. If you want to run one of those instead of a
normal server, you can. But in the end, very few sites do that. If a
distributed version of usenet where servers cached only the articles that
were requested made sense, it would be used more.

However, after saying all this, I think there might be more justification
for a distributed version of Usenet when it comes to the large files that
are moved across Usenet. If you decide to download a large file (as
opposed to a text article), you are already forced to wait a long time for
the file to show up. So the delays and problems of finding the parts of
the file from multiple sites on the net is not a big change from what the
user is already dealing with. To some extent, the current download tools
which allow people to suck articles from multiple servers are already doing
a form of this. But with how files are posted to usenet, it's very hard
for servers to do any type of intelligent caching of only the files which
were requested by their users. This is because Usenet servers move
articles, not files. The files are hidden inside the articles and the
servers for the most part know nothing about the files hidden in the
articles. There's no way for a client to ask a server: "do you have any
parts of file XYZ available"? or "can you find file XYZ for me?". The xBin
posting convention I talk about is an attempt to move Usenet to a place
where this type of thing will become possible. When Usenet servers know
more about the files they are transporting, it might be possible and
reasonable to do more of the type of file caching you are talking about.

For small text articles, it's easy and better to move the entire text file
around instead of wasting bandwidth asking multiple servers to see which
one might have a copy of the article. It makes little sense to broadcast
1K of header data about an article and spend 20 seconds waiting for the
server to query multiple remote sites to see who has a copy of the article
which was only 2K total in size. And this holds for larger articles as
well. But when you start getting up to 1 MB and larger files, there's
plenty to be gained by not copying all the files to all the servers and
since the download of a 500MB file is going to take hours anyway, who cares
if the software has to spend 5 minutes to find all the parts first?

But the reason it doesn't work well for Usenet today is because Usenet
servers are not moving hundreds of 500MB files (as far as they know), they
are moving millions of 400KB articles instead. To get to the place where a
distributed caching system might make sense, I believe you first need to
transform Usenet into a system that knows about the files it's moving.
Once the servers become "file aware" then I think it would be more likely
that distributed caching schemes could start to make sense.
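The multi-server download tools mentioned above boil down to asking several
servers for the same article by message-ID. A minimal sketch of that idea,
not anyone's actual tool; the host names and the message-ID are placeholders:

  #!/usr/bin/perl
  # Fetch one article by message-ID, trying several servers in turn.
  use strict;
  use warnings;
  use Net::NNTP;

  my $msgid   = '<example-part-17@some.host>';           # hypothetical part
  my @servers = qw(news.example.com fill.example.net);   # placeholders

  foreach my $host (@servers) {
      my $nntp = Net::NNTP->new($host, Timeout => 30) or next;
      if (my $lines = $nntp->article($msgid)) {   # ref to array of lines
          print @$lines;
          $nntp->quit;
          exit 0;
      }
      $nntp->quit;
  }
  die "no server had $msgid\n";

The point of the xBin-style proposals discussed above is to let clients ask
in terms of files rather than having to know every part's message-ID up front.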

Kai Henningsen

unread,
May 31, 2003, 9:58:00 AM5/31/03
to
cu...@kcwc.com (Curt Welch) wrote on 28.05.03 in <20030528165321.545$6...@newsreader.com>:

> The problems with usenet are not the cost of running servers or the huge
> bandwidth issues, it's the fact that it just doesn't work very well for
> file sharing. If we improve that (and a lot of the stuff I've talked about
> with xBin is a path to doing that), then I suspect you would find a lot
> more people using Usenet for file sharing.

So I should fight your stuff to get a Usenet that's better suited to *my*
needs (i.e. not used for file sharing)?

Just kidding.

Anyway, I happen to think that I get fairly high-quality Usenet, and I pay
nobody for news. (Instead, I pay for some generic net infrastructure (such
as an ADSL link and a dedicated server) and work on keeping those running.
Of course, it *does* mean I have to know how all this stuff works. OTOH,
that is what makes it fun.)

Kai
--
http://www.westfalen.de/private/khms/
"... by God I *KNOW* what this network is for, and you can't have it."
- Russ Allbery (r...@stanford.edu)

J.B. Moreno

unread,
Jun 2, 2003, 11:49:57 AM6/2/03
to
In article <8mtRa...@khms.westfalen.de>,
Kai Henningsen <kaih=8mtRa...@khms.westfalen.de> wrote:

> cu...@kcwc.com (Curt Welch) wrote on 28.05.03 in
> <20030528165321.545$6...@newsreader.com>:
>
> > The problems with usenet are not the cost of running servers or the huge
> > bandwidth issues, it's the fact that it just doesn't work very well for
> > file sharing. If we improve that (and a lot of the stuff I've talked about
> > with xBin is a path to doing that), then I suspect you would find a lot
> > more people using Usenet for file sharing.
>
> So I should fight your stuff to get a Usenet that's better suited to *my*
> needs (i.e. not used for file sharing)?
>
> Just kidding.

But there's a serious answer -- making usenet more binary aware should
have the effect of making it a better medium for text.

At the moment it is doing two things, but pretending it is only doing
one. This is inefficient, especially considering that it is /mainly/
doing what it is pretending it isn't.

Admitting that it is actually doing two things will open it up to
improvements in both.

--
J.B. Moreno

Kai Henningsen

unread,
Jun 2, 2003, 1:34:00 PM6/2/03
to

> In article <8mtRa...@khms.westfalen.de>,
> Kai Henningsen <kaih=8mtRa...@khms.westfalen.de> wrote:
>
> > cu...@kcwc.com (Curt Welch) wrote on 28.05.03 in
> > <20030528165321.545$6...@newsreader.com>:
> >
> > > The problems with usenet are not the cost of running servers or the huge
> > > bandwidth issues, it's the fact that it just doesn't work very well for
> > > file sharing. If we improve that (and a lot of the stuff I've talked
> > > about with xBin is a path to doing that), then I suspect you would find
> > > a lot more people using Usenet for file sharing.
> >
> > So I should fight your stuff to get a Usenet that's better suited to *my*
> > needs (i.e. not used for file sharing)?
> >
> > Just kidding.
>
> But there's a serious answer -- making usenet more binary aware should
> have the effect of making it a better medium for text.

I *really* don't think so.

J.B. Moreno

unread,
Jun 4, 2003, 12:03:08 AM6/4/03
to
In article <86brxfz...@number6.magda.ca>,
David Magda <dmagda+tr...@ee.ryerson.ca> wrote:

> "J.B. Moreno" <pl...@newsreaders.com> writes:
> [...]


> > But there's a serious answer -- making usenet more binary aware
> > should have the effect of making it a better medium for text.

> [...]
>
> What problems do you perceive in Usenet's current handling of text?

The two most obvious problems are binaries in text only groups, and
text messages getting lost in primarily binary groups. Retention could
be better...(for instance, what's the oldest message on your server for
alt.binaries.sounds.mp3.d?).

--
J.B. Moreno

J.B. Moreno

unread,
Jun 4, 2003, 12:08:23 AM6/4/03
to
In article <8n77N...@khms.westfalen.de>, Kai Henningsen
<kaih=8n77N...@khms.westfalen.de> wrote:

> pl...@newsreaders.com (J.B. Moreno)
-snip-


> > But there's a serious answer -- making usenet more binary aware should
> > have the effect of making it a better medium for text.
>
> I *really* don't think so.

How could knowing the real nature of what it is doing /not/ improve
things?

--
J.B. Moreno

Kai Henningsen

unread,
Jun 4, 2003, 6:22:00 PM6/4/03
to

By shifting resources even further to the dark side.

J.B. Moreno

unread,
Jun 4, 2003, 7:10:27 PM6/4/03
to
In article <3edd80c9$0$633$a186...@newsreader.visi.com>,
Mike Horwath <drec...@visi.com> wrote:

> J.B. Moreno <pl...@newsreaders.com> wrote:
> : The two most obvious problems are binaries in text only groups, and


> : text messages getting lost in primarily binary groups. Retention
> : could be better...(for instance, what's the oldest message on your
> : server for alt.binaries.sounds.mp3.d?).
>

> I have 12/25/2002 as the oldest in mine...
>
> What do you have?

Depends upon the server -- two have it set to about a week, another to about
two days.

--
J.B. Moreno

bill davidsen

unread,
Jul 7, 2003, 5:10:23 PM7/7/03
to

| ME:!*,comp.os.linux*/!local,!collabra-internal::
|
| Incoming feeds in incoming.conf and outgoing feeds in newsfeeds would
| limit feeds to comp.os.linux* on a server basis as well. If I only
| have comp.os.linux* in my active list, does it matter whether I
| exclude/include these newsgroups specifically in newsfeeds and

| incoming.conf? What happens when a newsfeed is pulled/pushed to a


| server via incoming.conf or newsfeeds that is NOT listed in the local
| active file?

An unwanted group should be dropped, but a crosspost including an
unwanted group will get through unless you prevent it.

| If I do all the above and have cleanfeed do some basic spam filtering,
| do you think I can reasonably allow external posting without
| authentication?

You really want your posting filter to block all cross-posting and any
group you don't carry. However, that would limit the ability to reply to
external posts. You *can* block posts unless they are replies to an
existing message already on the server, with the original newsgroups.
Takes a small bit of perl added to cleanfeed, but doable if your volume
is low.
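A minimal sketch of the posting check described here; the subroutine name
and the way it would be wired into cleanfeed are assumptions, not
cleanfeed's real API, and %carried is assumed to be filled from the active
file once at startup (see the follow-up below about the persistent filter):

  use strict;
  use warnings;

  our %carried;    # group name => 1 for every group in the active file

  # Return a rejection reason, or undef to accept the post.
  sub check_post_groups {
      my ($newsgroups) = @_;                  # Newsgroups: header value
      my @groups = split /\s*,\s*/, $newsgroups;
      return 'crossposting not allowed here' if @groups > 1;
      return "group $groups[0] not carried"  unless $carried{$groups[0]};
      return undef;
  }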

If you run an open server you will be used to send spam, sad fact of
life. There are no painless ways to prevent it; strong filtering will
probably keep you off blocking lists, but stock cleanfeed alone probably
won't be enough.

--
Bill Davidsen <davi...@tmr.com> CTO, TMR Associates
As we enjoy great advantages from inventions of others, we should be
glad of an opportunity to serve others by any invention of ours; and
this we should do freely and generously.
-Benjamin Franklin (who would have liked open source)

bill davidsen

unread,
Jul 7, 2003, 5:20:53 PM7/7/03
to
In article <bb0nnb$gb2$1...@puck.litech.org>,

Doesn't work that way. However, since the perl filter is persistent, you
can take a startup overhead hit (only once) and load the active file
into an associative array so you can filter on only the groups carried. As
noted, this prevents replies to external posts; to allow that you need to do
far more and actually check that the "reply" is to a currently existing post
and has the same newsgroups. That's somewhat harder, and the simple way
is fairly high overhead. Since this server is going to be low use, that's
not a problem.

SMOP.
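A sketch of the two pieces just described: loading the active file once,
and checking that a "reply" refers to an article the server already has.
Checking via a local NNTP connection and the file path shown are
assumptions, not INN's or cleanfeed's actual hook API:

  use strict;
  use warnings;
  use Net::NNTP;

  # Load the carried groups once, when the persistent filter starts.
  our %carried;
  open my $active, '<', '/usr/local/news/db/active' or die $!;
  while (<$active>) {
      my ($group) = split ' ';
      $carried{$group} = 1;
  }
  close $active;

  # Return the parent article's Newsgroups header, or undef if we don't
  # have the parent; the caller compares it to the new post's newsgroups.
  sub parent_newsgroups {
      my ($references) = @_;                   # References: header value
      my ($parent) = $references =~ /(<[^>]+>)\s*$/ or return undef;
      my $nntp = Net::NNTP->new('localhost') or return undef;
      my $head = $nntp->head($parent);         # undef if not on the spool
      $nntp->quit;
      return undef unless $head;
      my ($ng) = grep { /^Newsgroups:/i } @$head;
      if (defined $ng) { $ng =~ s/^Newsgroups:\s*//i; chomp $ng; }
      return $ng;
  }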

bill davidsen

unread,
Jul 7, 2003, 5:40:06 PM7/7/03
to

| It's interesting to look at the history of Usenet. It was essentially
| born out of the dial-up world (uucp connections). NNTP and modern
| hardware now allow for faster connections but the fundamental
| technology behind usenet has pretty much stayed the same. One server
| has a bunch of content and connects to another server to share all
| that content. I wonder if there's a way to leverage the new
| technology being developed in peer-to-peer technology to make a
| better, faster, more manageable Usenet?

There is not. You can develop something new, but it isn't usenet, even
if you use some existing technology. See below.


|
| I see two major problems in current Usenet technology. One, all the
| content is essentially being mirrored on every major Usenet server.

usenet <> internet


| This is in my opinion a terrible waste of bandwidth and storage.
| There is also no correlation between storage and usage. All the least
| accessed content maintains the same storage preference as the most
| accessed content.

Every server decides what content to accept, what to keep, and for how long.
A server can have its own groups, its own peers, and its own policies.
There is no single point of failure unless you choose to set up peering
that way.

| Second, manual peering arrangements are a bit at
| odds with the self-service, always on, always connected paradigm of
| the modern Internet.

usenet <> internet

| This reality makes setting up and maintaining a
| news server very expensive. Combine that with the huge upfront
| storage requirement of setting up a "complete" server and it's no
| wonder that many ISP's have dropped Usenet as a basic service.

The problem started the moment we started allowing multipart binary
content. And I'm guilty, I moderated comp.binaries.ibm.pc and wrote
shar2, the first program (AFAIK) distributed via comp.sources which
would break up a binary file into the parts needed to post.

. . .

Your idea is not bad in any way, it's just not usenet. Usenet just isn't
the internet, and isn't just the internet. It can be fed by many protocols
other than NNTP, the most common being compressed batches sent by
uucico. The advantage is that even dialup sites can pull these over the
network with one interaction per batch, and bandwidth is low. Satellite
feeds with NNTP are painful due to latency; uucp streams don't have that
problem. Compressed (or even raw) batches can be put on CD, DVD, or
old-tech mag tape. Did a feed to a secure site for several years; nothing
that went in ever came back out.

Please don't try to make usenet something else, make a new something
which works differently. Call it something else. Don't try to bind
usenet to the limits of what can be done with an end-to-end connection,
or a client server model.

BTW: you can get some (most?) of what you want with Diablo, which will
chain back to fetch articles not online and pull them from a server.

If you want a service with centralized control, limited to the Internet,
by all means create one. Usenet is anarchy, let's not give it up!

bill davidsen

unread,
Jul 7, 2003, 5:54:54 PM7/7/03
to
In article <020620031149577152%pl...@newsreaders.com>,
J.B. Moreno <pl...@newsreaders.com> wrote:

| But there's a serious answer -- making usenet more binary aware should
| have the effect of making it a better medium for text.
|
| At the moment it is doing two things, but pretending it is only doing
| one. This is inefficient, especially considering that it is /mainly/
| doing what it is pretending it isn't.
|
| Admitting that it is actually doing two thing will open it up to
| improvements in both.

My personal usenet is very binary aware, anything which is detected as
binary gets dropped ;-)

But you're on the right track here, what's needed is to stop people from
using usenet as a binary transport mechanism. The cost is high, the
return low, and there are better ways to do large binary files which
don't depend on pretending they're bundles of text messages.

The nice thing is that if we (the net collectively) provide such a
thing, use of usenet for binary will go away by itself, perhaps hastened
by a bit of prodding caused by major sites dropping groups.
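Not bill's actual filter and not cleanfeed, just a sketch of the kind of
heuristic a binary-dropping text server can apply to an article body:

  use strict;
  use warnings;

  sub looks_like_binary {
      my ($body) = @_;                         # article body as one string
      return 1 if $body =~ /^=ybegin /m;       # yEnc header line
      return 1 if $body =~ /^begin \d{3} \S/m; # uuencode header line
      # Many long lines with no whitespace smell like an encoded blob:
      my $dense = () = $body =~ /^\S{60,}$/mg;
      return 1 if $dense > 50;
      return 0;
  }

Real filters such as cleanfeed are considerably more careful than this,
which is why the false positives discussed later in the thread matter.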

John Miller

unread,
Jul 7, 2003, 7:19:36 PM7/7/03
to
bill davidsen wrote:
> But you're on the right track here, what's needed is to stop people from
> using usenet as a binary transport mechanism. The cost is high, the
> return low, and there are better ways to do large binary files which
> don't depend on pretending they're bundles of text messages.
>
> The nice thing is that if we (the net collectively) provide such a
> thing, use of usenet for binary will go away by itself, perhaps hastened
> by a bit of prodding caused by major sites dropping groups.

We can only hope. Not only is it a spectacular misapplication of
technology, using Usenet to move big binaries; everyone ends up unhappy:
the users who don't get "completion," and the ISPs who get loaded down with
both traffic and complaints.

Just out of curiosity, what would be the size of a full feed today without
the binary groups? Awareness of that alone should be enough to spur some
action.

--
John Miller, here since bnews

Captain Penny's Law:
You can fool all of the people some of the time, and
some of the people all of the time, but you Can't Fool Mom.

Road Warrior

unread,
Jul 7, 2003, 9:10:01 PM7/7/03
to
On Mon, 07 Jul 2003 21:54:54 GMT, bill davidsen wrote:

>| But there's a serious answer -- making usenet more binary aware should
>| have the effect of making it a better medium for text.
>|
>| At the moment it is doing two things, but pretending it is only doing
>| one. This is inefficient, especially considering that it is /mainly/
>| doing what it is pretending it isn't.
>|
>| Admitting that it is actually doing two thing will open it up to
>| improvements in both.

> My personal usenet is very binary aware, anything which is detected as
> binary gets dropped ;-)

And I'll bet that your filters are just as bad as those used by some
major ISPs. I've seen ISPs drop messages just because they had a
filename in the subject line, or because they had something (usually a
date) that resembled the standard multipart indicators in the subject
line.

> But you're on the right track here, what's needed is to stop people from
> using usenet as a binary transport mechanism. The cost is high, the
> return low, and there are better ways to do large binary files which
> don't depend on pretending they're bundles of text messages.

Sorry, but the cost (at least on my end) is paid for by my monthly
subscription to my premium Usenet server. As far as "better" ways of
transferring binary files, you are right, and fortunately yEnc brought
us a little closer to that (thanks Jurgen!).

> The nice thing is that if we (the net collectively) provide such a
> thing, use of usenet for binary will go away by itself, perhaps hastened
> by a bit of prodding caused by major sites dropping groups.

"WE" have been down that road already, and lots of people already use
the technology that you propose (it's called P2P, if you've been under a
rock for the last 3 or 4 years). Unfortunately, for the users, it has
more drawbacks than advantages, such as the speed that you can download
something is entirely dependent on the UPLOAD speed of the person
sending the file, and how many other people are trying to get files from
that same person. You seem to be speaking from the point of view of
someone who has never used either binary usenet nor P2P programs.

As for "major" sites dropping groups, which sites would you be talking
about? I know you couldn't possibly be referring to any of the premium
servers, because the day they stop carrying binary groups is the day
they go out of business.

--
Road Warrior
Please read the following before complaining about yEnc.
http://www.roadwarriorcomputers.com/yenc/yencfacts.html (link for most
of the world)
http://roadwarriorcomputers.netfirms.com/yenc/yencfacts.html (mirror for
UK & Europe)

Juergen Helbing

unread,
Jul 8, 2003, 2:02:20 AM7/8/03
to
John Miller <m...@privacy.net> wrote:

>We can only hope. Not only is it a spectacular misapplication of
>technology, using Usenet to move big binaries; everyone ends up unhappy:
>the users who don't get "completion," and the ISPs who get loaded down with
>both traffic and complaints.

We've been discussing this a lot here in n.s.nntp.
Basically Usenet _is_ a brilliant way to distribute binaries:
reducing the traffic for a broadcast network to the lines
between the huge servers is a good - and very efficient - approach.
The flood algorithm even gives the fastest possible distribution.
I've been designing an "FTP-Usenet" - but it always ends up
with the same strategies NNTP is using (groups, overviews, expiry, ...).

We are currently ending up with three simple problems:
* labelling binaries & multiparts
* creating additional 'fill paths' to guarantee completeness
* avoiding the huge header traffic for split multiparts
(yesterday I fetched 500,000 headers from 15 hosts just to find T3 ;-)

>Just out of curiosity, what would be the size of a full feed today without
>the binary groups? Awareness of that alone should be enough to spur some
>action.

AFAIK we are over 800 Gigs/day for the entire Usenet, and the
'Text-Usenet' is less than 1% of this.

But we already have a de-facto split of Usenet today:
* Plain text servers (almost free)
* ISP/institutional server with historical binary groups
* Full binary servers (almost pay-services)

The only group of servers involved in the real binary
flood problems nowadays are the pay NSPs - and they are making
money with Binary Usenet. So they are happy with Binary Usenet
as it is.

IMHO the only thing we can (and should) do would be to have more
reliable paths between the "full servers" - and more convenience
(and reliability) for the users.

--
Juergen

Road Warrior

unread,
Jul 8, 2003, 5:01:41 AM7/8/03
to

> John Miller <m...@privacy.net> wrote:

Well said. I may not spend much time here, but I do get rather upset
every time someone comes around and suggests that binary Usenet should
be abandoned in favor of P2P type programs. Anyone who has ever used
both on a regular basis, especially for very large files, knows that
Usenet is FAR more reliable and, depending on the newsreader used,
easier to use.

Francois Petillon

unread,
Jul 8, 2003, 5:18:01 AM7/8/03
to
On Tue, 08 Jul 2003 08:02:20 +0200, Juergen Helbing wrote:
> IMHO the only thing we can (and should) do would be to have more
> reliable paths between the "full servers" - and more convenience
> (and reliability) for the users.

I am afraid it is not that simple. Most servers already have redundant
feeds but are not able to keep up with today's usenet traffic. I
stopped/limited backlogging for some of my peers as their queues are
always filled up (backlogging is really I/O costly).

François

J.B. Moreno

unread,
Jul 8, 2003, 12:13:54 PM7/8/03
to
In article <O_lOa.2641$ji1....@newssvr17.news.prodigy.com>,
bill davidsen <davi...@tmr.com> wrote:

> J.B. Moreno <pl...@newsreaders.com> wrote:
>
> | But there's a serious answer -- making usenet more binary aware should
> | have the effect of making it a better medium for text.
> |
> | At the moment it is doing two things, but pretending it is only doing
> | one. This is inefficient, especially considering that it is /mainly/
> | doing what it is pretending it isn't.
> |
> | Admitting that it is actually doing two thing will open it up to
> | improvements in both.
>
> My personal usenet is very binary aware, anything which is detected as
> binary gets dropped ;-)

But it probably mis-identifies things rather frequently.

> But you're on the right track here, what's needed is to stop people from
> using usenet as a binary transport mechanism. The cost is high, the
> return low, and there are better ways to do large binary files which
> don't depend on pretending they're bundles of text messages.

No, that's not what is needed. What is needed is to clearly separate
text and binaries -- then those that want to pay for binaries can, and
those that don't won't.

There *are* problems with binaries (the most obvious being the huge
header traffic for multiparts), but the solution isn't to just say "do
away with them", as people obviously find it desirable even with the
existing problems.

--
J.B. Moreno

Mirco Romanato

unread,
Jul 8, 2003, 1:09:26 PM7/8/03
to
"Road Warrior" <roadwarrio...@yahoo.com> ha scritto nel
messaggio news:129lbrk8886t3.1shu31mtuth9a$.dlg@40tude.net...

Usenet must not be abandoned for P2P.
But it can be integrated with P2P.
If you look at the xBin threads in this group, you can see that the two
are paths that are converging.

Usenet can transport a lot of data quickly and proxy it near the user.
P2P (Gnutella/Gnutella2/eDonkey/etc.) can manage the "lost posts"
problem, because those networks are designed for untrusted, insecure,
unreliable environments.

I think that from the moment the first Usenet-aware Gnutella
application emerges, we will see a gargantuan surge in the file
sharing trend.

This requires only a moderately skilled programmer to implement a rough
utility to post binary files with the needed metadata, and to retrieve
the post and the metadata and feed the file chunks to the Gnutella
servent.

99% of the code is there.
It needs only a few lines of glue to put it all together.

Sorry, I'm not a developer.
I tried to assemble this, but I lack even minimal programming ability
and I'm short of time.


Mirco

Juergen Helbing

unread,
Jul 8, 2003, 2:11:27 PM7/8/03
to
"Mirco Romanato" <painl...@yahoo.it> wrote:

>99% of the code is there.
>It need only a few lines to glue all together.

I wrote the remaining 1% and called it 'MyNews':
This is the P2P app for Usenet.
Users are sharing Usenet messages there (by message-id).
And they are using simple NNTP - because this is the only
way to share missing parts today.

But overall Usenet is still too perfect - and netizens are
not the type of people who are 'sharing P2P'.
They are posting - and paying.
(Something the copyright fanatics will never understand ;-)

So long
--
Juergen

Juergen Helbing

unread,
Jul 8, 2003, 2:13:14 PM7/8/03
to
"Francois Petillon" <fan...@proxad.net> wrote:

>I am afraid this is not as simple. Most servers already have redundant
>feeds but are not able to keep up with nowadays usenet traffic. I
>stopped/limited backlogging for somes of my peers as their queues are
>always filled up (backlogging is really I/O costy).

Thanks for the info.
I'm pretty sure that the standard flooding mechanism does its
very best. But there are more possibilities in the pipeline.
So much to do - so little time...

CU
--
Juergen


Kai Henningsen

unread,
Jul 8, 2003, 2:11:00 PM7/8/03
to
roadwarrio...@yahoo.com (Road Warrior) wrote on 07.07.03 in <ez7rw34iqxg3$.g40q3umt...@40tude.net>:

> On Mon, 07 Jul 2003 21:54:54 GMT, bill davidsen wrote:
>
> > In article <020620031149577152%pl...@newsreaders.com>,
> > J.B. Moreno <pl...@newsreaders.com> wrote:
>
> >| But there's a serious answer -- making usenet more binary aware should
> >| have the effect of making it a better medium for text.
> >|
> >| At the moment it is doing two things, but pretending it is only doing
> >| one. This is inefficient, especially considering that it is /mainly/
> >| doing what it is pretending it isn't.
> >|
> >| Admitting that it is actually doing two thing will open it up to
> >| improvements in both.
>
> > My personal usenet is very binary aware, anything which is detected as
> > binary gets dropped ;-)
>
> And I'll bet that your filters are just as bad as those used by some
> major ISP's. I've seen ISP's drop messages just because they had a
> filename in the subject line or because they had something (usually a
> date) that resembeled the standard multipart indicators in the subject
> line.

Well, we all know there are incompetently run news servers.

The most trivial out-of-the-box filtering solution - cleanfeed - never had
these problems, to my knowledge. It's pretty good at binary filtering.

Why anyone would willingly run something worse ...

> > But you're on the right track here, what's needed is to stop people from
> > using usenet as a binary transport mechanism. The cost is high, the
> > return low, and there are better ways to do large binary files which
> > don't depend on pretending they're bundles of text messages.

I've stopped hoping for that. I *am* hoping that we get a cleaner
separation between text and binary Usenet, so the latter doesn't put so
much stress on the former. And so I can forget that binary Usenet even
exists, outside of standardization working groups.

> Sorry, but the cost (at least on my end) is paid for by my monthly
> subscription to my premium Usenet server. As far as "better" ways of
> transferring binary files, you are right, and fortunately yEnc brought
> us a little closer to that (thanks Jurgen!).

It is pretty clear to me that binary Usenet works best with a small number
of big servers, all of which need to get paid by their users, whereas text
Usenet works better with a large number of much smaller, heavily
crosslinked servers, many of which can be free, or essentially free.

Binary Usenet needs serious muscle. Text Usenet doesn't.

> As for "major" sites dropping groups, which sites would you be talking
> about? I know you couldn't possibly be referring to any of the premium
> servers, because the day they stop carrying binary groups is the day
> they go out of business.

To me, major sites are stuff like the news.cis.dfn.de (*the* largest-by-
count-of-users text Usenet server world-wide, I believe) or news.t-
online.de (*the* largest German ISP), which run binary-free and have for
as long as I've known about them - that is, years. Or perhaps Google (the
best known access to Usenet). Also binary-free.

Hmm. I think I will expand my analysis script to include a path entry
counter, and see which sites occur most in the subset of text Usenet I
get.

Marco d'Itri

unread,
Jul 8, 2003, 5:15:35 PM7/8/03
to
kaih=8pUI4...@khms.westfalen.de wrote:

>It is pretty clear to me that binary Usenet works best with a small number
>of big servers, all of which need to get paid by their users, whereas text

I do not think that separating text and binaries is a big deal, because
misplaced binaries are already considered net.abuse (or close) and there
are plenty of good technological tools (cleanfeed, the diablo feed
classification engine, etc.) which allow you to get a feed without binaries.

Nowadays the big problem is differentiating between single part and
multipart binaries. INN does not support classifying articles, there are
no canonical "multipart binaries newsgroups", and even filtering by size
is hard because there are assholes posting them in 200-300 KB parts to
avoid filters.

--
ciao, |
Marco | * The Internet is full. Go away. -- Joel Furr *

Kai Henningsen

unread,
Jul 8, 2003, 5:12:00 PM7/8/03
to
ka...@khms.westfalen.de (Kai Henningsen) wrote on 08.07.03 in <8pUI4...@khms.westfalen.de>:

> Hmm. I think I will expand my analysis script to include a path entry
> counter, and see which sites occur most in the subset of text Usenet I
> get.

And here's the unfiltered top 20, as seen from colo:

68142 newsfeed-east.nntpserver.com
71127 cox.net
72650 supernews.com
75811 nntpserver.com
79453 uni-berlin.de
83159 fu-berlin.de
84848 53ab2750
102074 elnk-pas-nf2
106051 postnews1.google.com
111773 newsfeed.earthlink.net
134994 newsfeed.fjserv.net
134994 newsfeed.icl.net
158316 newshub.sdsu.edu
168497 newsfeed.stanford.edu
259841 headwall.stanford.edu
617950 news-spur1.maxwell.syr.edu
624366 news.maxwell.syr.edu
737642 news.litech.org
904240 not-for-mail
1030876 Path: colo.khms.westfalen.de

(Obviously, 100% of all articles I see must have that last path element,
so this was a spool of 1030876 articles. About 90% obviously ended in not-
for-mail; about 10% were posted at Google. The only "premium servers" I
recognize in there are in the 7-8% range. Hmm, my link to Russ seems
b0rken.)
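A sketch of the kind of tally described above, not Kai's actual script; it
reads article files named on the command line, takes the Path: header of
each, and counts every entry (folded Path headers are ignored for brevity):

  #!/usr/bin/perl
  use strict;
  use warnings;

  my %count;
  foreach my $file (@ARGV) {
      open my $art, '<', $file or next;
      while (my $line = <$art>) {
          last if $line =~ /^\s*$/;               # end of the headers
          next unless $line =~ /^Path:\s*(.*)/i;
          $count{$_}++ for split /!/, $1;
          last;
      }
      close $art;
  }
  my @top = (sort { $count{$b} <=> $count{$a} } keys %count)[0 .. 19];
  printf "%8d %s\n", $count{$_}, $_ for grep { defined } @top;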

Russ Allbery

unread,
Jul 8, 2003, 5:37:41 PM7/8/03
to
Kai Henningsen <kaih=8pVY5...@khms.westfalen.de> writes:

> (Obviously, 100% of all articles I see must have that last path element,
> so this was a spool of 1030876 articles. About 90% obviously ended in
> not- for-mail; about 10% were posted at Google. The only "premium
> servers" I recognize in there are in the 7-8% range. Hmm, my link to
> Russ seems b0rken.)

My test server has been up and down a lot lately. I need to upgrade it to
the latest stable version of INN and that should help a lot.

You'll see Stanford hosts on a lot of the text Usenet because I appear to
be Google's primary outgoing feed currently. (I also peer with
fu-berlin.)

--
Russ Allbery (r...@stanford.edu) <http://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<http://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Kai Henningsen

unread,
Jul 9, 2003, 2:55:00 AM7/9/03
to
ab...@1.2.0.1.0.0.1.e.f.f.3.ip6.int (Marco d'Itri) wrote on 08.07.03 in <befc9n$mho$1...@wonderland.linux.it>:

> Nowadays the big problem is differentiating between single part and
> multipart binaries.

Aah, but that's in the part of Usenet I don't want to get anyway.

Mirco Romanato

unread,
Jul 9, 2003, 4:38:33 AM7/9/03
to
"Juergen Helbing" <arch...@i3w.com> ha scritto nel messaggio
news:2711k....@archiver.winews.net...
> "Mirco Romanato" <painl...@yahoo.it> wrote:

> >99% of the code is there.
> >It need only a few lines to glue all together.

> I wrote the remaining 1% and called it 'MyNews':
> This is the P2P app for Usenet.

How many people use it?

> Users are sharing Usenet messages there (by message-id).
> And they are using simple NNTP - because this is the only
> way to share missing parts today.

I don't write about today.
I write about tomorrow.

Can I use a Magnet link or a CDM4 link (or the like) from a web service
like Sharereactor or Sharelive to search and DL a file?
Can I be sure that the file I DL is the same I requested?
Can I detect bad part and re-DL it from a different source
automatically?

> But overall Usenet is still too perfect - and neticens are
> not the type of people who are 'sharing P2P'.
> They are posting - and paying.
> (Something the copyright fanatics will never understand ;-)

ISPs' news servers are paid for by their customers.
And it is in their interest to reduce outbound P2P traffic.

Mirco


Road Warrior

unread,
Jul 9, 2003, 5:57:12 AM7/9/03
to
On Wed, 09 Jul 2003 08:38:33 GMT, Mirco Romanato wrote:

> Can I detect bad part and re-DL it from a different source
> automatically?

FWIW, that is one of the amazing features of yEnc, when it is fully
implemented in a multi-server newsreader.

GregMo

unread,
Jul 9, 2003, 12:56:35 PM7/9/03
to
Marco d'Itri <ab...@1.2.0.1.0.0.1.e.f.f.3.ip6.int> wrote in
news:befc9n$mho$1...@wonderland.linux.it:

> and even filtering by size is hard because there are assholes posting
> them in 200-300 KB parts to avoid filters.

Eh? Not to say that some people don't get long winded, and there's the
occasional bad poster after bad poster that doesn't snip, but really... How
often do you think a text article will go over 150k? Over 200k? I'm not
saying it doesn't happen, but imho, anything over 200k is either in bad need
of snipping, or else would probably be better served as a web page somewhere.
I personally kill everything over 180k myself and I've not noticed anything
my filters killed that was text in the last 2 months, and I could very likely
lower that another 30k and still be safe.

Cheers,
GregMo

--
UNIX Sex
{look;find;gawk;talk;date;grep;touch;finger;find;flex;unzip;head;strip;
top;mount;workbone;fsck;yes;gasp;fsck;more;fsck;more;more;yes;tail;
fsck;eject;umount;make clean;zip;done;split;exit;uptime>/var/log/brag}

Juergen Helbing

unread,
Jul 9, 2003, 1:55:29 PM7/9/03
to
"Mirco Romanato" <painl...@yahoo.it> wrote:

>> I wrote the remaining 1% and called it 'MyNews':
>> This is the P2P app for Usenet.
>
>How many people use it?

In the last four weeks I had > 1000 active hosts
contacting one of the root servers.

A few dozens are using MyNews as a filler heavily.

>> Users are sharing Usenet messages there (by message-id).
>> And they are using simple NNTP - because this is the only
>> way to share missing parts today.
>
>I don't write about today.
>I write about tomorrow.

Yes, I understand.
Combining Usenet with alternative paths might be possible.
But there must be a lot of work done before it will be possible.

>Can I use a Magnet link or a CDM4 link (or like) from a web service
>like Sharereactor or Sharelive to search and DL a file?

news://<msgid> ?

>Can I be sure that the file I DL is the same I requested?

LOL, this depends on the MsgId - and the quality of the fakes.
The 'xBin' solution makes it pretty secure.

>Can I detect bad part and re-DL it from a different source
>automatically?

Good news-servers will not deliver xBin protected messages
at all if their md5 code does not match ;-)
Reloading bad parts is easy to do on Usenet (I'm doing this
all the time). The problem is to detect corruption properly,
and to have news-readers which support this feature.

>> But overall Usenet is still too perfect - and neticens are
>> not the type of people who are 'sharing P2P'.
>> They are posting - and paying.
>> (Something the copyright fanatics will never understand ;-)
>
>ISP's newsserver are paid from customer.
>And it is their gain to reduce outbound P2P traffic.

Yes, really.
Their interest in P2P for Usenet Binaries is limited.

CU
--
Juergen

Juergen Helbing

unread,
Jul 9, 2003, 1:58:38 PM7/9/03
to
drec...@visi.com (Mike Horwath) wrote:

>Road Warrior <roadwarrio...@yahoo.com> wrote:
>: FWIW, that is one of the amazing features of yEnc, when it is fully


>: implemented in a multi-server newsreader.
>

>Huh?
>
>Sure, the newsreader has something to do with it, but yEnc has
>*nothing* to do with it.

Reloading a 'bad part' requires proper detection that a message
has been corrupted. yEnc is the only encoding in use which offers
a checksum per message.

Perhaps xBin will replace it one day - with an even better method.
But "the next round" is still waiting for gzip-8bit ;-)
The next 'official' Binary Usenet Message Format will be a big hit
if we succeed in combining all the work which is already in progress.

CU
--
Juergen

Dr.Ruud

unread,
Jul 9, 2003, 4:55:56 PM7/9/03
to
Juergen Helbing skribis:

> news://<msgid> ?

It's <URL:news:<msgid>>
and <URL:news://news-server.domain.tld/news.group.name>

--
Affijn, Ruud


Gene Mat

unread,
Jul 9, 2003, 5:33:33 PM7/9/03
to
>>
>> And I'll bet that your filters are just as bad as those used by some
>> major ISP's. I've seen ISP's drop messages just because they had a
>> filename in the subject line or because they had something (usually a
>> date) that resembeled the standard multipart indicators in the subject
>> line.

You can just filter on article byte size. If it's over 1 MB then it must be
binary :-).

-Gene Mat


Marco d'Itri

unread,
Jul 9, 2003, 5:44:01 PM7/9/03
to
Greg-no...@wyld-please-ryde.net wrote:

>> and even filtering by size is hard because there are assholes posting
>> them in 200-300 KB parts to avoid filters.
>Eh? Not to say that some people don't get long winded, and there's the
>occasional bad poster after bad poster that doesn't snip, but really... How
>often do you think a text article will go over 150k? Over 200k? I'm not

The point is not text articles, but being able to sort single-part
binaries (pr0n pics, which lusers like a lot) from the multipart
ones (which are what is really expensive to carry).

GregMo

unread,
Jul 9, 2003, 6:24:57 PM7/9/03
to
Marco d'Itri <ab...@1.2.0.1.0.0.1.e.f.f.3.ip6.int> wrote in
news:bei2b1$8oo$1...@wonderland.linux.it:

>>> and even filtering by size is hard because there are assholes posting
>>> them in 200-300 KB parts to avoid filters.
>>
>> Eh? Not to say that some people don't get long winded, and there's the
>> occasional bad poster after bad poster that doesn't snip, but really...
>> How often do you think a text article will go over 150k? Over 200k?
>

> The point is not text articles, but being able to sort single part
> binaries (pr0n pics, which lusers like a lot) among the multipart
> ones (which are what is really expensive to carry).

Hrmmm, my last post came out to roughly 390kb. Given that pics can often
exceed this number, I guess that makes me an asshole too, huh? They're not
doing it to get past filters. What they're aiming for is better
propagation. I could sit here all day long posting 4mb articles and they
would get to EasyNews, SuperNews, NewsReader, and the like just dandy, but
I'd have people from RR, AT&T, etc., bitching like hell because their
servers bounced the articles.

If someone was really posting small just for the sake of getting past
filters, why do they stop at 200k? Why not 50k article sizes? Why not 10k?

Marco d'Itri

unread,
Jul 9, 2003, 7:01:39 PM7/9/03
to
Greg-no...@wyld-please-ryde.net wrote:

>Hrmmm, my last post came out to roughly 390kb. Given that pics can often
>exceed this number, I guess that makes me an asshole too huh? They're not
>doing it to get pass filters. What they're aiming for is better
>propergation. I could sit here all day long posting 4mb articles and it
>would get to EasyNews, SuperNews, NewsReader, and the like, just dandy, but
>I'd have people from RR, AT&T, etc, bitching like hell because their servers
>bounced the articles.

Now try to think about *why* these sites are limiting the size of
incoming articles.
BTW, the it.* newsgroups carrying multipart binaries use 1 MB parts and
this has never been a problem for local sites (i.e. the ones
interested in this content).

>If someone was really posting small just for the sake of getting past filters
>why do they stop at 200k? Why not 50k article sizes? Why not 10k?

Given enough time I do not doubt that more people will try to do this.

Road Warrior

unread,
Jul 9, 2003, 9:17:48 PM7/9/03
to
On 09 Jul 2003 13:24:02 GMT, Mike Horwath wrote:

> Road Warrior <roadwarrio...@yahoo.com> wrote:
>: FWIW, that is one of the amazing features of yEnc, when it is fully


>: implemented in a multi-server newsreader.

> Huh?

> Sure, the newsreader has something to do with it, but yEnc has
> *nothing* to do with it.

Sorry, but you are showing your ignorance of yEnc here. yEnc has CRC
checks both for the entire file, and also for each part of a multipart
binary. UUencode does not have CRC checks at all. What the person I
replied to had asked (which you snipped when replying to me) was this:
"Can I detect bad part and re-DL it from a different source
automatically?". The "detect bad part" is only possible with yEnc, and
that is what I was pointing out. A good, multi-server newsreader, on
detecting a bad CRC value in a yEnc encoded part of a multipart binary,
can request that same part from a different server. A UUencoded post
doesn't have this option. The only way you can tell that it is bad is
if the poster provides a SFV or PAR file, but that only will check the
entire file, and only after it has been completely downloaded.
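A sketch of that per-part check, not taken from any particular newsreader:
decode one yEnc part and compare its CRC32 with the pcrc32 value from the
=yend trailer (single-part posts use crc32= instead):

  use strict;
  use warnings;
  use Compress::Zlib ();

  sub yenc_part_ok {
      my (@lines) = @_;                      # raw article body lines
      my ($decoded, $stated) = ('', undef);
      my $in_data = 0;
      foreach my $line (@lines) {
          $line =~ s/\r?\n\z//;
          if ($line =~ /^=ybegin /) { $in_data = 1; next; }
          next if $line =~ /^=ypart /;
          if ($line =~ /^=yend /) {
              ($stated) = $line =~ /\bpcrc32=([0-9a-fA-F]+)/;
              last;
          }
          next unless $in_data;
          my @bytes = map { ord } split //, $line;
          for (my $i = 0; $i < @bytes; $i++) {
              my $b = $bytes[$i];
              $b = ($bytes[++$i] - 64) & 0xff if $b == ord('=');  # escaped
              $decoded .= chr(($b - 42) & 0xff);
          }
      }
      return 0 unless defined $stated;
      return lc(sprintf '%08x', Compress::Zlib::crc32($decoded)) eq lc $stated;
  }

On a mismatch, a multi-server reader can re-request just that part from
another server, which is the behaviour being described here.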

Road Warrior

unread,
Jul 9, 2003, 9:19:28 PM7/9/03
to

> -Gene Mat

That's hardly accurate. What about picture newsgroups where the
standard file is under 512KB?

Mirco Romanato

unread,
Jul 10, 2003, 8:58:07 AM7/10/03
to
"Road Warrior" <roadwarrio...@yahoo.com> ha scritto nel
messaggio news:1n0pzq5h19fhu.113jcqvpkyfpu$.dlg@40tude.net...

Road Warrior and Mike Horwath are both right, at the same time.
The problem is that they are right about different aspects of the
problem.

yEnc has CRCs for the message and the full file (I trust Road Warrior
on this, because I don't know yEnc well enough).
But yEnc messages cannot be checked against the full file before
the full file has been downloaded.
For that, a Merkle hash tree type of hash is needed.
One example of this is TigerTree, which is adopted by Gnutella
servents.
With it they can check that a downloaded file chunk matches
the tree and that the tree matches the root value of the tree.

magnet:?xt=urn:bitprint:XUIKIUJXWXWRY4YK6OSYSSPPEB6CINZG.U75CG2ZRH5CFX
BPAVPAPZF2EYHBWUJPMPJXNZIY&dn=Eve_Burst_Error.CD2of3.HentaiHeaven.dr.a
g.ShareReactor.rar

With this I can check the SHA-1 of the file (the part from "bitprint:"
to ".") when I have downloaded the full file, and I can request the
tree data about the file by requesting the root TigerTreeHash value (the
part from "." to "&dn").
Then I can check the tree data against the TTH and be sure they
match.
Then I can check downloaded chunks against their hashes and be sure
the data is not corrupted (intentionally or not).


With a click of my mouse on a hyperlink, I want to start a search
from my newsreader on the news server, download chunks of the desired
file, be sure that the data is right, and then recover lost parts
via P2P file sharing (Gnutella, Freenet, or others).

This is not Filesharing+Usenet;
this is Filesharing*Usenet in the most conservative view, or
Filesharing^Usenet in the most optimistic.

Mirco
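A sketch of the tree-hash idea described in this post. Real TigerTree
(THEX) uses the Tiger hash and prefixes leaf and internal nodes before
hashing; this illustration just uses SHA-1 over fixed-size chunks to show
how a root value lets you verify individual chunks:

  use strict;
  use warnings;
  use Digest::SHA1 qw(sha1);

  sub tree_root {
      my ($data, $chunk_size) = @_;
      my @level;
      for (my $off = 0; $off < length $data; $off += $chunk_size) {
          push @level, sha1(substr($data, $off, $chunk_size));  # leaf hashes
      }
      while (@level > 1) {                 # combine pairwise up to the root
          my @next;
          while (my @pair = splice @level, 0, 2) {
              push @next, @pair == 2 ? sha1($pair[0] . $pair[1]) : $pair[0];
          }
          @level = @next;
      }
      return $level[0];   # compare this against the advertised root value
  }

A downloaded chunk can be checked as soon as it arrives by hashing it and
comparing against the corresponding leaf in the published tree, which is
the property being contrasted with yEnc's per-part CRCs above.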


Juergen Helbing

unread,
Jul 10, 2003, 9:22:58 AM7/10/03
to
"Mirco Romanato" <painl...@yahoo.it> wrote:

>With a click of my mouse on an hyperlink, I want start a search
>from my newsreader in the newsserver, to DL chunks of the desidered
>file, then be sure that the data is right, then recorver lost parts
>via P2P filesharing (Gnutella, Freenet, or other).

What do you believe I have been doing for more than four years now?
I'm downloading everything I get from my main news-servers.
And whatever is left comes from other news-servers
(secondary sources or other MyNews users)....
And I'm doing all this for a long time now without any P2P
app which always requires that people post in
a specified way - with some kind of hash-trees.....

(Running a personal news-server is really easy today ;-)

>This is not Filesharing+Usenet;
>this is Filesharing*Usenet in the most conservative OR
>Filesharing^Usenet in the most optimistic.

You would be basically right IF:

* netizens would not split their huge binaries in so many ways.
* netizens would not RAR/ZIP/ACE the split files.

Today you need to download all segments of a multipart
message. These are the RAR parts. Then you need to UNRAR
them (RAR is mostly used - it is not freeware/public domain).
Then you need to calculate the necessary hash-trees to make
the file available on other P2P systems.
People who are missing single messages are STILL not able
to get the entire binary together, because it is posted differently
to Usenet. And while yEnc would let you find out which
bytes you are missing (if netizens posted in 'raw' format,
which they don't), you still have to solve the problem that
huge multiparts must be double-split: due to the current
limitations of Usenet, a multipart cannot have more than 999
segments, and a segment cannot be larger than 500 kBytes.
So everything larger than about 500 Megs (most videos) must be
double-split.
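A back-of-the-envelope illustration of the double split, using only the
limits quoted above (999 segments of at most 500 kBytes, about 487 MB per
single-level post); the 700 MB figure is just an example:

  use strict;
  use warnings;
  use POSIX qw(ceil);

  my $seg_bytes = 500 * 1024;
  my $max_segs  = 999;
  my $file      = 700 * 1024 * 1024;      # e.g. a 700 MB video

  my $segments = ceil($file / $seg_bytes);
  if ($segments <= $max_segs) {
      print "single split: $segments segments\n";
  } else {
      my $volumes = ceil($segments / $max_segs);   # RAR/split volumes needed
      printf "double split: %d volumes of at most %d segments each\n",
          $volumes, $max_segs;
  }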

I did ask some netizens whether they would post differently.
And the answer was always: why? It works perfectly with
RAR+Multiparts+PAR/Par2.....

Creating a 'new binary message format' which supports
all the necessary features to be "really better" would mean:
* Split down to 999 segments.
* Split huge videos into raw parts.
* Use a public domain compressor (gzip).
* Create proper positioning information BEFORE compressing.
* Create proper MD5/SHA for segments and the entire file.
Last but not least:
* Be compatible with existing newsreaders.

I've been working on solutions for this problem for a long time
now, and whenever I believe I have found a solution
it is invalidated by something new.
And the worst thing: netizens are happy with Split+RAR+Seg+Par.
(And it really works pretty well - too well to make any changes.)


I'm not seeing good chances for a marriage of Usenet and P2P.
The basic approach (and purpose) is too different - and Usenet
is too archaic.

Just a personal opinion.
--
Juergen

Juergen Helbing

unread,
Jul 10, 2003, 9:23:54 AM7/10/03
to
"Dr.Ruud" <'t.leven.is.fijn@life.i$.great> wrote:

>Juergen Helbing skribis:
>
>> news://<msgid> ?
>
>It's <URL:news:<msgid>>
>and <URL:news://news-server.domain.tld/news.group.name>

Thanks for the correction.
--
Juergen

Juergen Helbing

unread,
Jul 10, 2003, 9:37:53 AM7/10/03
to

>The point is not text articles, but being able to sort single part
>binaries (pr0n pics, which lusers like a lot) among the multipart
>ones (which are what is really expensive to carry).

A simple filter on the multipart indicator (m/n) does this easily.
Of course this requires a check on the sending side - and the
information that multiparts are not wanted by that downstream.
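A naive sketch of that (m/n) test; as others point out later in this thread,
a subject like "FAQ (3/7)" will match too, so real filters combine it with
size and other checks:

  use strict;
  use warnings;

  # Returns (part, total) if the subject carries an (m/n) marker, else nothing.
  sub multipart_indicator {
      my ($subject) = @_;
      if ($subject =~ /\((\d+)\s*\/\s*(\d+)\)/) {
          my ($part, $total) = ($1, $2);
          return ($part, $total) if $total > 1 && $part <= $total;
      }
      return;
  }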

But the problem of carrying just 'pictures' and no movies/sounds
is unsolvable: today pics are often posted in 2-5 parts, and
I've often seen HQ scans of 500-2000 kBytes.

Solution 1:
Hosts which want to filter binaries aggressively cannot use a
pushed feed - they must act like a normal newsreader with good
filtering on the XOVER data. Done.

Solution 2:
Hosts which receive POSTs create appropriate
headers and/or msgids themselves.
(Would never work - too many old hosts.)

Solution 3:
Someone analyses a full binary feed and makes the
reliable what-is-what info available on a secure path
(signed articles or special routing) for the downloading
client of a small ISP who wants just 'filtered binaries'.


brain in a storm
--
Juergen

Marco d'Itri

unread,
Jul 10, 2003, 12:12:14 PM7/10/03
to
arch...@i3w.com wrote:

>A simple filter on the multipart indicator (m/n) does this easily.

This is highly unreliable. Guess why people have tried to suggest to you
that MIME is the solution?

Gene Mat

unread,
Jul 10, 2003, 4:17:17 PM7/10/03
to
Road Warrior <roadwarrio...@yahoo.com> wrote in <17qzktslmfdj1
$.7i6e30f56ptl$.d...@40tude.net>:


>
>That's hardly accurate. What about picture newsgroups where the
>standard file is under 512KB?
>

That's the idea: I want to keep text, pictures, and maybe small multimedia
posts on my server, without any DVDs, CDs, and large apps.

I think a < 512KB filter should be adequate. Do you think large binaries can
still slip under 512KB? 256KB?

-Gene Mat


GregMo

unread,
Jul 10, 2003, 5:05:15 PM7/10/03
to
geneSPA...@yahoo.com (Gene Mat) wrote in
news:Xns93B4A5B5A7411ge...@167.206.3.2:

I think you're missing something here. You can't filter out DVD posts from
pic posts with certainty, because the post of a DVD is broken down into
articles, and those articles can be of any size that the posting user sets
them to. As an example, if I were so inclined, in theory I could make an
image of my 80gb HD and post that image to usenet in many thousands of
articles, any one of which could be no bigger than 10k. Granted, this is
an extreme, but it accurately illustrates the issue.

Andrew - Supernews

unread,
Jul 10, 2003, 5:50:39 PM7/10/03
to
In article <Xns93B4A5B5A7411ge...@167.206.3.2>, Gene Mat wrote:
> That's the idea, I want to keep text, pictures, and maybe small multimedia
> posts on my server. Without ant DVD, CD, and large apps.
>
> I think < 512KB filter should be adequate. You think large binaries can
> still slip under 512KB? 256KB?

large binaries are more often than not posted with part sizes under 512k.
We see part sizes as small as 40k (we filter those out; we impose a minimum
part size on multiparts). A 256k cutoff will drop a noticeable number of
pictures (and some text posts, e.g. FAQs) while still letting quite a lot of
big multiparts through.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

Jesper Harder

unread,
Jul 10, 2003, 6:12:56 PM7/10/03
to
"Dr.Ruud" <'t.leven.is.fijn@life.i$.great> writes:

> It's <URL:news:<msgid>>
> and <URL:news://news-server.domain.tld/news.group.name>

No, the last one is _not_ a valid URL. From RFC 1738:

| A news URL takes one of two forms:
|
| news:<newsgroup-name>
| news:<message-id>
|
| [...]
| A nntp URL take the form:
|
| nntp://<host>:<port>/<newsgroup-name>/<article-number>


Dr.Ruud

unread,
Jul 10, 2003, 8:14:22 PM7/10/03
to
Jesper Harder skribis:
> Dr.Ruud:

>> It's <URL:news:<msgid>>
>> and URL:news://news-server.domain.tld/news.group.name

> No, the last one is _not_ a valid URL.

I assume you are right there.


> From RFC 1738:

>> A news URL takes one of two forms:
>>
>> news:<newsgroup-name>
>> news:<message-id>

I read there that <message-id> should be without the enclosing
"<" and ">", but that wording is a bit strange because they
are part of the message-id, so not really 'enclosing'.
We could start calling it the naked form.


>> [...]
>> A nntp URL take the form:
>>
>> nntp://<host>:<port>/<newsgroup-name>/<article-number>

My current Windows PC does not support that; it does, however,
support <URL:news://fb1.euro.net/nl.announce>,
which results (in Outlook Express) in a new account pointing
to that host (if that account was not there yet), and a
subscription to that newsgroup, which all together is quite
handy.

--
Affijn, Ruud


Jesper Harder

unread,
Jul 10, 2003, 11:00:55 PM7/10/03
to
"Dr.Ruud" <'t.leven.is.fijn@life.i$.great> writes:

> Jesper Harder skribis:


>
>> From RFC 1738:
>
>>> A news URL takes one of two forms:
>>>
>>> news:<newsgroup-name>
>>> news:<message-id>
>
> I read there that <message-id> should be without the enclosing
> "<" and ">", but that wording is a bit strange because they
> are part of the message-id, so not really 'enclosing'.

That's because I didn't quote the entire section from RFC 1738 -- It's
explained in the part that I omitted.

Genemat

unread,
Jul 11, 2003, 4:04:58 PM7/11/03
to
Andrew - Supernews <andrew...@supernews.com> wrote in
news:slrnbgrntf.ev6...@trinity.supernews.net:

> In article <Xns93B4A5B5A7411ge...@167.206.3.2>, Gene
> Mat wrote:
>> That's the idea, I want to keep text, pictures, and maybe small
>> multimedia posts on my server. Without ant DVD, CD, and large apps.
>>
>> I think < 512KB filter should be adequate. You think large binaries
>> can still slip under 512KB? 256KB?
>
> large binaries are more often than not posted with part sizes under
> 512k. We see part sizes as small as 40k (we filter those out, we
> impose a minimum part size on multiparts). A 256k cutoff will drop a
> noticable number of pictures (and some text posts, e.g. FAQs) while
> still letting quite a lot of big multiparts through.
>

Andrew, on my server most multi-part binaries are from 1MB - 4MB. I do still
want the small binaries (pictures, multimedia) on my server. So I thought
that 512KB would be adequate.

-GeneMat

Juergen Helbing

unread,
Jul 12, 2003, 2:26:03 AM7/12/03
to

>arch...@i3w.com wrote:
>
>>A simple filter on the multipart indicator (m/n) does this easily.

>This is highly unreliable.

? What ?
The entire Binary Usenet is currently based on that multipart indicator.
Why should a filter on this not be reliable?

>Guess why people tried to suggest you that
>MIME is the solution?

I agree - but everything which makes binary identification easier
is actually not very popular among netizens (RIAA).
Even the best proposals and solutions cannot be introduced
until we've also found a solution to protect the netizens.

Some people might wonder why I was so inactive in the last months:
I'm waiting for gzip-8, xBin and for an inspiration how to give
netizens privacy without opening Usenet entirely to spam and trolls.


A related question (concerning privacy) to you:
You are using a funny mail-address: ab...@1.2.0.1.0.0.1.e.f.f.3.ip6.int
Are you meanwhile using 'throw away' mail addresses to post to Usenet?
I'm also looking for a solution how a news-server can generate
the FROM header automatically so that reply mail is routed anonymously
through the news+mail-server of the ISP/NSP.
If you don't want to explain this in public then I'd be happy for a mail.

CU
--
Juergen

Andrew - Supernews

unread,
Jul 12, 2003, 8:49:38 AM7/12/03
to
In article <03268....@archiver.winews.net>, Juergen Helbing wrote:
> Marco d'Itri <ab...@1.2.0.1.0.0.1.e.f.f.3.ip6.int> wrote:
>>This is highly unreliable.
>
> ? What ?
> The entire Binary Usenet is currently based on that multipart indicator.
> Why should a filter on this not be reliable?

because it isn't. Lots of non-binary posts, or non-multipart binaries, have
content in the subject line which is indistinguishable from the (m/n)
part indicator. We get a trickle of very-hard-to-prevent false positives
from our too-many-parts/parts-too-small filter, which is the only filter we
have that pays much attention to the (m/n).
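
A tiny illustration of the ambiguity (the subjects are hypothetical, just to
show the collision):

    # Sketch: the same pattern matches a genuine part indicator and a date.
    import re

    PART_RE = re.compile(r'\((\d+)/(\d+)\)')

    subjects = [
        'holiday.mpg (03/27) yEnc',         # real multipart binary piece
        'comp.example mini-FAQ (7/12)',     # date or section number in a FAQ
    ]
    for s in subjects:
        print(s, '->', bool(PART_RE.search(s)))   # both print True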

Marco d'Itri

unread,
Jul 12, 2003, 8:29:21 AM7/12/03
to
arch...@i3w.com wrote:

>>>A simple filter on the multipart indicator (m/n) does this easily.
>
>>This is highly unreliable.
>
>? What ?
>The entire Binary Usenet is currently based on that multipart indicator.
>Why should a filter on this not be reliable?

Because it does not guarantee that an article with (m/n) in the subject
is really a multipart binary; it could be a long FAQ too...

>>Guess why people tried to suggest you that
>>MIME is the solution?
>I agree - but everything which makes binary identification easier
>is actually not very popular among netizens (RIAA).

This is stupid. If downloaders can find binaries then the RIAA can too,
and they even have many more resources.

>You are using a funny mail-address: ab...@1.2.0.1.0.0.1.e.f.f.3.ip6.int
>Are you meanwhile using 'throw away' mail addresses to post to Usenet?

I do not understand your comment. ab...@1.2.0.1.0.0.1.e.f.f.3.ip6.int is
a working email address, if this is what you are asking.

>I'm also looking for a solution how a news-server can generate
>the FROM header automatically so that reply mail is routed anonymously
>through the news+mail-server of the ISP/NSP.

This can be done trivially with the perl hooks of INN, some kind of
database and query access to the ISP accounting servers.
Details will depend on the mail server and accounting server used.
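
A very rough sketch of the mapping idea only (deliberately independent of INN's
actual hook interface; the HMAC scheme, domain and names are assumptions, not a
description of an existing setup):

    # Sketch: derive a stable, opaque reply address for an authenticated user.
    # A companion mail gateway holding the same secret (or a lookup table)
    # would route replies back to the real mailbox.
    import hmac, hashlib

    SECRET = b'site-local secret shared with the mail gateway'
    ANON_DOMAIN = 'anon.example.net'

    def anonymous_from(username):
        tag = hmac.new(SECRET, username.encode(), hashlib.sha1).hexdigest()[:16]
        return 'u-%s@%s' % (tag, ANON_DOMAIN)

    # The posting hook would then overwrite the From: header with
    # anonymous_from(authenticated_user) before accepting the article.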

Andrew - Supernews

unread,
Jul 12, 2003, 9:06:22 AM7/12/03
to
In article <Xns93B5A393D149...@167.206.3.2>, Genemat wrote:
> Andrew - Supernews <andrew...@supernews.com> wrote in
> news:slrnbgrntf.ev6...@trinity.supernews.net:
>> large binaries are more often than not posted with part sizes under
>> 512k. We see part sizes as small as 40k (we filter those out, we
>> impose a minimum part size on multiparts). A 256k cutoff will drop a
>> noticeable number of pictures (and some text posts, e.g. FAQs) while
>> still letting quite a lot of big multiparts through.
>
> Andrew, on my server most multi-part binaries are from 1MB - 4MB. I do still
> want the small binaries (pictures, multimedia) on my server. So I thought
> that 512KB would be adequate.

either your server is highly atypical or you're looking at the size of
reassembled parts rather than the sizes of the actual articles.

Here are our stats for yesterday (counting only apparent multiparts):

size in KB articles gigabytes
0-50 17848 0.26
50-100 10112 0.75
100-150 47246 5.50
150-200 106973 18.46
200-250 181437 41.40
250-300 116027 29.52
300-350 488029 144.49
350-400 195816 70.55
400-450 150679 60.78
450-500 63253 28.42
500-550 33915 16.66
550-600 10396 5.72
600-650 301087 182.93
650-700 3488 2.22
700-750 9078 6.31
750-800 39149 28.65
800-850 1747 1.35
850-900 2923 2.46
900-950 2855 2.49
950-1000 7619 6.99
1000-1050 786 0.77
1050-1100 9 0.01
1100-1150 11 0.01
1150-1200 151 0.17
1200-1250 28 0.03
1250-1300 1733 2.11
1350-1400 59 0.08
1400-1450 220 0.30
1500-1550 513 0.74
1550-1600 107 0.16
1600-1650 1 0.00
1800-1850 17 0.03
1850-1900 1 0.00
1900-1950 1772 3.27
2050-2100 605 1.20
3450-3500 16 0.05
3700-3750 41 0.15
3800-3850 117 0.43
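
For what it's worth, a per-bucket summary like the one above can be produced
with a few lines of bucketing (a sketch; article sizes are assumed to be given
in bytes):

    # Sketch: bucket article sizes into 50 KB bins, totalling bytes per bin.
    from collections import defaultdict

    def summarize(sizes_in_bytes, bin_kb=50):
        bins = defaultdict(lambda: [0, 0])       # bin -> [article count, bytes]
        for size in sizes_in_bytes:
            b = (size // 1024) // bin_kb
            bins[b][0] += 1
            bins[b][1] += size
        for b in sorted(bins):
            count, total = bins[b]
            print('%5d-%-5d %9d %8.2f' % (b * bin_kb, (b + 1) * bin_kb,
                                          count, total / 1e9))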

J.B. Moreno

unread,
Jul 12, 2003, 4:15:00 PM7/12/03
to
In article <slrnbh00v2.ev6...@trinity.supernews.net>,
Andrew - Supernews <andrew...@supernews.com> wrote:

> In article <03268....@archiver.winews.net>, Juergen Helbing wrote:
> > Marco d'Itri <ab...@1.2.0.1.0.0.1.e.f.f.3.ip6.int> wrote:
> >>This is highly unreliable.
> >
> > ? What ?
> > The entire Binary Usenet is currently based on that multipart indicator.
> > Why should a filter on this not be reliable?
>
> because it isn't. Lots of non-binary posts, or non-multipart binaries, have
> content in the subject line which is indistinguishable from the (m/n)
> part indicator. We get a trickle of very hard to prevent false-positives
> from our too-many-parts/parts-too-small filter which is the only filter we
> have that pays much attention to the (m/n).

Exactly -- this is one of the big things xBin is supposed to fix,
making it possible, sometime in the future, to identify binary
posts with 100% certainty. This will allow a clean split between the two
types of information being sent, with all of the advantages that has.

A slash between two numbers just doesn't cut it (there was a poster
either here or in nsr where someone was having problems with a FAQ
because it had a date in the Subject, and it was being hit by binary
filters).

--
J.B. Moreno

Road Warrior

unread,
Jul 13, 2003, 10:05:10 PM7/13/03
to
On 12 Jul 2003 08:26:03 +0200, Juergen Helbing wrote:

> Marco d'Itri <ab...@1.2.0.1.0.0.1.e.f.f.3.ip6.int> wrote:

>>arch...@i3w.com wrote:
>>
>>>A simple filter on the multipart indicator (m/n) does this easily.

>>This is highly unreliable.

> ? What ?
> The entire Binary Usenet is currently based on that multipart indicator.
> Why should a filter on this not be reliable?

Well, unfortunately, I know of one specific example. There are a number
of text groups where the users (mainly OE users) have no clue about
multipart binaries, and regularly have the month and day in their
subject lines in the (##/##) format, which has been causing some
problems lately for some users of newsreaders like Dialog, which
automatically combine messages that they THINK are multipart binaries.

Filtering on the multipart indicator would therefore filter out these
posts.

David Magda

unread,
Jul 14, 2003, 3:03:44 PM7/14/03
to
Road Warrior <roadwarrio...@yahoo.com> writes:
[...]

> Filtering on the multipart indicator would therefore filter out
> these posts.

So you're saying that filtering out posts of clueless people is bad?

;>

--
David Magda <dmagda at ee.ryerson.ca>, http://www.magda.ca/
Because the innovator has for enemies all those who have done well under
the old conditions, and lukewarm defenders in those who may do well
under the new. -- Niccolo Machiavelli, _The Prince_, Chapter VI

Heiko Studt

unread,
Jul 16, 2003, 4:34:32 AM7/16/03
to
Wanted: Juergen Helbing; Last seen: 10 Jul 2003

> Solution2:
> Hosts which are receiving POSTs are creating appropriate
> headers and/or msgids themselves.

Changing Message-IDs is bad...


IMHO there should be a mixture of solutions:

Command:

| IHAVE <MID> [<content-type> [<compressed>]]

'content-type' is for example "image/gif" (==MIME-Type)
 If it is empty it's 'text/plain'
'compressed' Hmm... perhaps the compression method? So 'gzip2' or something.

(Same with TAKETHIS)

Response:
(added)
4xy Binaries not allowed. (short: BNA)

This would reduce the overhead of binary feeding by far and increase the
plain-text overhead of IHAVE only slightly. The 'content-type' is
generated from the message by the side that does the push.
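
To picture what a receiving server would do with such a line, a minimal parsing
sketch (this only illustrates the proposed syntax above; it is not an existing
extension):

    # Sketch: split an extended "IHAVE <MID> [content-type [compression]]" line.
    def parse_ihave(line):
        parts = line.split()
        if not parts or parts[0].upper() != 'IHAVE':
            raise ValueError('not an IHAVE command')
        message_id = parts[1]
        content_type = parts[2] if len(parts) > 2 else 'text/plain'
        compression = parts[3] if len(parts) > 3 else None
        return message_id, content_type, compression

    # parse_ihave('IHAVE <abc@example.invalid> image/gif gzip')
    # -> ('<abc@example.invalid>', 'image/gif', 'gzip')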


Perhaps even this would be available if there'll be a change at all:
| IHAVE <MID> (["CANCEL" [<auth/pgp/*>]] | [<content-type> [<compressed>]])

This would be our new "CANCEL" command; the 'auth' method should be defined
beforehand. For example we can use the MD5 (SHA-1) checksum used for SASL
authentication, so the cancels are trusted too.
If the reaction is something like '2xy send it', the PGP-signed cancel
message is sent.


In my eyes, now, this all looks nice and useful. Perhaps I got into an old
discussion again -- what do you think?
Is something like the above compatible with the latest INNs/Diablos?


MFG, HTH

Rene

unread,
Jul 16, 2003, 7:56:03 AM7/16/03
to
arch...@i3w.com (Juergen Helbing) wrote:
> I agree - but everything which makes binary identification easier
> is actually not very popular among netizens (RIAA).
> Even the best proposals and solutions cannot be introduced
> until we've also found a solution to protect the netizens.

Hm, how would you like to protect them? In today's P2P networks the binaries
are quite easily identifiable and nothing big has happened up to now (it's
probably about to change but it hasn't happened yet)

> Some people might wonder why I was so inactive in the last months:
> I'm waiting for gzip-8, xBin and for an inspiration how to give
> netizens privacy without opening Usenet entirely to spam and trolls.

Well and I'm waiting for free time in which to continue :-(
I think my xBin implementation is at about 40% finished but I don't see any
realistic time to continue it until October. However, IIRC somebody else
has long since announced that he has written a C-library for xBin.

I wanted to have my implementation so that we can cross-verify both of
them, but as I said, it won't happen soon.

> A related question (concerning privacy) to you:
> You are using a funny mail-address: ab...@1.2.0.1.0.0.1.e.f.f.3.ip6.int

That's a valid IPv6 "encoded" address.

> I'm also looking for a solution how a news-server can generate
> the FROM header automatically so that reply mail is routed anonymously
> through the news+mail-server of the ISP/NSP.

If you happen to control your own mail server (which is also quite easy
today) AND you own a domain, you can do that easily. I've been doing it
for quite some time.

CU

Rene

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service New Rate! $9.95/Month 50GB

David Magda

unread,
Jul 16, 2003, 2:43:03 PM7/16/03
to
Heiko Studt <XNew...@goldpool.org> writes:

> Wanted: Juergen Helbing; Last seen: 10 Jul 2003
>
> > Solution2:
> > Hosts which are receiving POSTs are creating appropriate
> > headers and/or msgids themselves.
>
> Changing Message-IDs is bad...
>
>
> IMHO there should be a mixture of solutions:
>
> Command:
>
> | IHAVE <MID> [<content-type> [<compressed>]]
>
> 'content-type' is for example "image/gif" (==MIME-Type)
> If it is empty it's 'text/plain'
> 'compressed' Hmm... perhaps the compression method? So 'gzip2' or something.

[...]

I would do compressed=TYPE; where type is "compress", "gzip",
"bzip2" for the algorithms used in those utilities.

Mirco Romanato

unread,
Jul 16, 2003, 3:09:29 PM7/16/03
to
"Rene" <inv...@email.addr> ha scritto nel messaggio
news:20030716075603.155$Z...@newsreader.com...
> arch...@i3w.com (Juergen Helbing) wrote:

> > Some people might wonder why I was so inactive in the last months:
> > I'm waiting for gzip-8, xBin and for an inspiration how to give
> > netizens privacy without opening Usenet entirely to spam and trolls.

> Well and I'm waiting for free time in which to continue :-(
> I think my xBin implementation is at about 40% finished but I don't see any
> realistic time to continue it until October. However, IIRC somebody else
> has long since announced that he has written a C-library for xBin.

> I wanted to have my implementation so that we can cross-verify both of
> them, but as I said, it won't happen soon.

I hope you will support TigerTree and SHA-1 as implemented in
Gnutella (Shareaza/LimeWire/BearShare/GnucDNA/etc.) and the related
release sites.
eDonkey hashes could be a useful add-on.

As I wrote in this group, a way to bridge the two networks could be
very useful for the users and the networks.
And the work required is not very much (the code is mostly already
around - Tiger hash/TigerTree/etc.);
it only requires a programmer with time and a bit of skill.

Mirco

Heiko Studt

unread,
Jul 16, 2003, 3:47:34 PM7/16/03
to
Hi David Magda,

On 16.07.2003 20:43:03 these lines were written down onto papyrus:

> > | IHAVE <MID> [<content-type> [<compressed>]]

> > 'compressed' Hmm... perhaps the compression method? So 'gzip2' or something.

> I would do compressed=TYPE; where type is "compress", "gzip",
> "bzip2" for the algorithms used in those utilities.

Yes, that would be fail-safe, but it would be a waste of bandwidth.
IMHO IHAVE should be as small as possible.

Like:
IHAVE <a...@def.ggg.invalid> text/plain BZIP2

BTW: Why is it needed to know which compression method is used? Or are we
talking about sending *in* compressed mode?
For the latter we should use a way which can stream more than one article per IHAVE.

Like this:

MOREIHAVE <mid1> <mid2> <mid3> <mid4>
200 I understand
IHAVE <mid1>
200 Yes I want
IHAVE <mid2>
200 Yes I want
IHAVE <mid3>
400 No, I have it already
IHAVE <mid4>
200 Yes I want
MORESEND
<Article1>
"."
<Article2>
"."
<Article4>
"."
200 Ok, I got all your articles. Thank you.

(With TAKETHIS this could even be enhanced to 'MORETAKE [...]' and *then*
send all articles in line)
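
For comparison, the existing streaming extension (MODE STREAM with
CHECK/TAKETHIS) already lets a feeder pipeline its offers without waiting for
each answer. A rough sketch of that pattern, assuming `conn` is some file-like
NNTP connection object (illustrative only):

    # Sketch: pipeline CHECK commands, then read the responses in one pass.
    def offer_articles(conn, message_ids):
        for mid in message_ids:
            conn.write('CHECK %s\r\n' % mid)      # queue all offers first
        conn.flush()
        wanted = []
        for mid in message_ids:
            code = conn.readline().split()[0]     # 238 = send it, 438 = not wanted
            if code == '238':
                wanted.append(mid)
        return wanted                             # these get sent via TAKETHIS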


MFG, HTH

--
Heiko Studt
Heiko...@gmx.de
PS: opto tu pulchra regina!! ;-)


Jeffrey M. Vinocur

unread,
Jul 16, 2003, 5:55:18 PM7/16/03
to
In article <bf4h5n...@privat.goldpool.org>,

Heiko Studt <Heiko...@gmx.de> wrote:
>Hi David Magda,
>
>On 16.07.2003 20:43:03 these lines were written down onto papyrus:
>
>> > | IHAVE <MID> [<content-type> [<compressed>]]

I should state at this point that I think this whole approach is
silly (at least at the present time). Filtering on the source
side is *always* going to be more powerful, because more
information is available. And because peering links on Usenet
are negotiated manually with email between administrators (to
discuss what hierarchies are wanted and what other restrictions
should be in effect), there's no reason that admins can't
negotiate desired filtering rules in the same fashion.

It would be much easier, and not require changing NNTP, to design
a simple "feed description" language. One news admin would send
the desired "feed description" to peering news admins, who would
look it over and then input it to the news server (I had been
brainstorming a few weeks ago that INN could do nicely with Perl
hooks into the newsfeeds file, so this would tie in nicely with
that).
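
As a concrete (and entirely hypothetical) illustration of what such a feed
description might look like once parsed, and how cheaply it could be evaluated
at the source:

    # Sketch: a tiny declarative feed description, evaluated per article.
    FEED_DESCRIPTION = {
        'hierarchies': ['comp.', 'news.', 'sci.'],   # what the peer wants
        'max_bytes': 128 * 1024,                     # size ceiling
        'binaries': False,                           # no binary multiparts
    }

    def wanted(desc, newsgroups, size_in_bytes, looks_binary):
        if size_in_bytes > desc['max_bytes']:
            return False
        if looks_binary and not desc['binaries']:
            return False
        return any(g.startswith(h)
                   for g in newsgroups for h in desc['hierarchies'])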

And if some day peers are not negotiated manually over email,
then it would be extremely simple to add an NNTP extension for --
at the beginning of the connection -- querying the remote host
for the "feed description" desired.


>IMHO IHAVE should be as small as possible.

*shrug* It should get out of hand, but I think the amount of
protocol-related traffic is generally dwarfed by the quantity of
news being exchanged.


>For the latter we should use a way which can stream more than one article per IHAVE.

Um, what does the CHECK / TAKETHIS mechanism lack?


--
Jeffrey M. Vinocur
je...@litech.org

Heiko Studt

unread,
Jul 16, 2003, 8:08:33 PM7/16/03
to
Hi Jeffrey M. Vinocur,

On 16.07.2003 23:55:18 these lines were written down onto papyrus:

> >> > | IHAVE <MID> [<content-type> [<compressed>]]
> I should state at this point that I think this whole approach is
> silly (at least at the present time). Filtering on the source

Ok.

> side is *always* going to be more powerful, because more

Yes and no.
First, yes: more information is available; there would be a much better and more
efficient way to filter out posts before they go through to the peering partner.
No: policies are going to change. If they are changed, *all* peering partners
have to be contacted, all have to change their approach, and all can make mistakes.
No: the power needed for scoring (etc.) *every* article for *every*
peering partner would be very high AFAIK.
No: it's IMHO not needed. You want to have *those* hierarchies text-only
(HTML wanted?) and *those* groups/hierarchies with binaries of such type.
All other information can't be filtered beforehand very efficiently -- the source
can't find out for sure(!) whether a posting is really a multipart if the next
articles of this multipart aren't available. (Then it is a multipart, of course.)
There are other restrictions of course, but IMHO(!) those won't affect
as many posts as the cost in power (see above) that has to be spent.

Perhaps I am too text-oriented, but in my eyes Usenet should be text-only and a
'second' Usenet (with another name) can be binary-oriented. But we shouldn't
warm up this discussion again...

> information is available. And because peering links on Usenet
> are negotiated manually with email between administrators (to
> discuss what hierarchies are wanted and what other restrictions
> should be in effect), there's no reason that admins can't
> negotiate desired filtering rules in the same fashion.

Sure -- filtering on a hierarchy basis is not very hard work and can be done
"on the fly". Filtering based on the post itself has to go through a complete and
long filtering pass. If such a script (what else do you want to use for
filtering?) is badly written it can take seconds of wasted power for one post.
If *all* peering partners get the same information there won't be such a waste,
because the article only has to be scanned once -- it can perhaps even be
done while expanding the Path...

> It would be much easier, and not require changing NNTP, to design
> a simple "feed description" language. One news admin would send

Every 'simple' 'language' of this kind can't handle everything. So it either
grows or dies out, because no one 'can' use it. If it grows it becomes unhandy
and wastes CPU time. I can remember one 'bug' (a CPU leak) in Hamster
I found while testing a *long* filter: for every single line Hamster created up to
5 'Article' objects. This creation alone needed up to 3 seconds/article.
I could get this object to be created only once that day and after that
one article was filtered in 500 ms... (on ~5000 regexp lines)

> And if some day peers are not negotiated manually over email,
> then it would be extremely simple to add an NNTP extension for --
> at the beginning of the connection -- querying the remote host
> for the "feed description" desired.

Yes, you're right here -- it would be relatively simple -- and crazy! If every admin
can change 'his' rules on the source servers so easily, they won't mind whether they
need one millisecond or one second to get through the filter, as long as they
don't have to pay for the traffic.
You have to think of 10k lines of filters for 10-100 peers (at least).
All have to be applied to every article...

> >IMHO IHAVE should be as small as possible.
> *shrug* It should get out of hand, but I think the amount of

'get out of hand'? What do you mean?

> protocol-related traffic is generally dwarfed by the quantity of
> news being exchanged.

I think it does matter: it is not sent only between two servers; every peer
sends one to every peer -- it will add up to hundreds of thousands of times *for
every article*...

> >Latter we should use a way which can stream more than one article by IHAVE.
> Um, what does the CHECK / TAKETHIS mechanism lack?

You misunderstood the whole. It got along the idea, you didn't follow...;-)


MFG

Juergen Helbing

unread,
Jul 18, 2003, 12:58:31 AM7/18/03
to
Rene <inv...@email.addr> wrote:

>> Even the best proposals and solutions cannot be introduced
>> until we've also found a solution to protect the netizens.
>
>Hm, how would you like to protect them?

Anonymous posting servers with third party identification.
(Therefore I also need a good way to route e-mail ;-)
Or by removing path entries and splitting the upstream
to a million users...

>In today's P2P networks the binaries
>are quite easily identifiable and nothing big has happened up to now (it's
>probably about to change but it hasn't happened yet)

You are right: nothing big has happened UP TO NOW.
But being prepared is always the best way.
Usenet is already a fortress - and adding another wall
which could be raised within a few days might be the
best guarantee to keep it the best free communication
channel in the world (for all kinds of bits).


>> Some people might wonder why I was so inactive in the last months:
>> I'm waiting for gzip-8, xBin and for an inspiration how to give
>> netizens privacy without opening Usenet entirely to spam and trolls.
>
>Well and I'm waiting for free time in which to continue :-(
>I think my xBin implementation is at about 40% finished but I don't see any
>realistic time to continue it until October. However, IIRC somebody else
>has long since announced that he has written a C-library for xBin.
>
>I wanted to have my implementation so that we can cross-verify both of
>them, but as I said, it won't happen soon.

I did implement some nice MD5 routines and I am currently working
on "auto-indexing" many Usenet groups. This way I can create
'third-party' hash-messages also for multiparts which are not sent
out with xBin. This will replace the need to download xover...

CU
--
Juergen

Juergen Helbing

unread,
Jul 18, 2003, 1:03:09 AM7/18/03
to
"Mirco Romanato" <painl...@yahoo.it> wrote:

>I hope you will support TigerTree and SHA-1 like implemented in
>Gnutella (shareaza/limewire/bearshare/GnucDNA/etc.) and release
>related sites.
>eDonkey hashes could be an useful add-on.

Do you have links to the algorithms / sourcecode ?
I'm currently creating "indexes" for some binary newsgroups
and it would not be too hard to add more values to a file index.

A good question might be:
How do you want to 'combine' the networks ?
Should a newsreader also contact P2P networks -
or should it transfer the partial binary to another
program and start the download there ?

I'm not too familiar with the internal operation of these
P2P programs. But I doubt that they are prepared to
"receive jobs" from another application (as a newsreader).


CU
--
Juergen

Juergen Helbing

unread,
Jul 18, 2003, 1:10:55 AM7/18/03
to
Heiko Studt <XNew...@goldpool.org> wrote:

>> Solution2:
>> Hosts which are receiving POSTs are creating appropriate
>> headers and/or msgids themselves.
>
>Changing Message-IDs is bad...

Yes - of course.
But creating a new type of IHAVE command is not realistic.

>IMHO there should be a mixture of solutions:
>Command:
>| IHAVE <MID> [<content-type> [<compressed>]]
>'content-type' is for example "image/gif" (==MIME-Type)
> If it is empty it's 'text/plain'
>'compressed' Hmm... perhaps the compression method? So 'gzip2' or something.


<vbg> - basically we would do exactly the same thing:
Extending the MsgId by binary information.
You are adding the info behind the '<...>' - I would add it
within the <...>

Does anybody know what happens if we are using such
Message-IDs?

Message-ID: <......@xxx.yyy> binary 700 meg

My own news-server is actually scanning the msgid-header
for the '<', '@' and '>' - and discards everything in front of '<'
and after '>'. So a binary 'postfix' to the msgid would work
easily.
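
A minimal sketch of that scanning behaviour (illustrative only): keep whatever
sits between '<' and '>' as the Message-ID proper, and treat anything after the
closing '>' as the proposed binary postfix.

    # Sketch: split a Message-ID header value into the <...> part and a postfix.
    def split_msgid(header_value):
        start = header_value.index('<')
        end = header_value.index('>', start)
        msgid = header_value[start:end + 1]
        postfix = header_value[end + 1:].strip()
        return msgid, postfix

    # split_msgid('<abc@xxx.yyy> binary 700 meg')
    # -> ('<abc@xxx.yyy>', 'binary 700 meg')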

CU
--
Juergen

Heiko Studt

unread,
Jul 18, 2003, 9:26:38 AM7/18/03
to
Hi Juergen Helbing,

On 18.07.2003 06:58:31 these lines were written down onto papyrus:

> best guarantee to keep it the best free communication
> channel in the world

I don't think anyone would be able to kill Usenet -- except perhaps the ISPs
themselves. If one country kills all the Usenet servers it has, all other
countries will still provide as many Usenet servers as they had before.

And how do you want to 'anonymize' Usenet without allowing SPAMMERS
and...

> (for all kinds of bits).

...p*rn*graphy (or even kid-p*rn*) to be posted? This will be a REAL problem if this
medium becomes anonymous.
IMHO it is even an advantage ('Vorteil') to have *no* real anonymity in Usenet,
because then everyone posting here has to think about what he says.
It's like a marketplace where you don't know who you are speaking to, but if someone
does bad things you can catch him and bring him to the police.
I don't know how it is in (English) law, but everything you post here is
'officially' spoken, as if you were speaking on that imagined market.

There are groups that don't even allow anonymous remailers or missing
real names, because of the problem that some think they can do
*everything* because they are 'anonymous', without a name.
I don't think we should provide that.

> I did implement some nice MD5 routines and I am currently working

[...]

Of course you can build up a new system of your own (as you already did), but IMHO
Usenet should be kept completely separate.

Heiko Studt

unread,
Jul 18, 2003, 9:15:51 AM7/18/03
to
Hi Juergen Helbing,

On 18.07.2003 07:10:55 these lines were written down onto papyrus:

> >Changing Message-IDs is bad...
> Yes - of course.
> But creating a new type of IHAVE command is not realistic.

It's by far more realistic than to change the opinion(?)
'Never change a News-MID! NEVER!'

> <vbg> - basically we would do exactly the same thing:
> Extending the MsgId by binary information.
> You are adding the info behind the '<...>' - I would add it
> within the <...>

Yes, and IMHO my method is better, because...

> Does anybody know what happens if we are using such
> Message-IDs?

...dupes. A *REAL* danger of dupes.
Perhaps someone else posted a posting with that MID (without binary data).
Differences between implementations (programming failures and things like that, you
know) will of course generate extremely nasty ping-pong games.
Think of:

POST <mid1> on Server1

Server1 generates <mid1_2> and peer to Server2
Server2 (failure or so) generates <mid1_2_3> and sends to Server3
Server3 (of any type) peers to Server1
Server1 changes MID To <mid1_2_3_2> and sends to Server2
[and so on]

Of course this bug in Server2 appears only in ten posts a year or so, perhaps
even less. Now we have a problem -- because of the speed of Usenet this
system will bring up millions of dupes before the first servers are stopped
(because of Murphy it will happen at 3 o'clock at night -- no one will notice).
Now all those millions of articles are peered to other hosts. Those can also have
failures, so they generate new articles ---- and so on.

Ok, now you will argue that this is why the 'Path' header exists.
(argh, what English ;-))

Why should we break the golden rule of never changing MIDs? If the binary camp wants
to change them, all others will want to change them too (for example to place 'ads'
there, or to generate a MID in order to be sure it is unique).

On uniqueness -- how should *I* (as poster) know if the MID is unique? *I* have to
make sure that it will be, but perhaps *I* sent such a MID before and the
posting server didn't get that article before generating the MID.

IMHO the fail-safe system based on never-changed MIDs is too important.

> Message-ID: <......@xxx.yyy> binary 700 meg

> My own news-server is actually scanning the msgid-header
> for the '<', '@' and '>' - and discards everything in front of '<'
> and after '>'.

So you are safe. I think many of the server implementations (if not all) discard
the characters in front of '<' and after '>'. So the 'new' system would even be
compatible with the old one.

> So a binary 'postfix' to the msgid would work
> easily.

Your system is IMHO too vulnerable to failures, but let us see if anyone else has
an opinion... :)

Jeffrey M. Vinocur

unread,
Jul 19, 2003, 10:10:27 AM7/19/03
to
In article <bf50f2...@privat.goldpool.org>,

Heiko Studt <Heiko...@gmx.de> wrote:
>Hi Jeffrey M. Vinocur,
>
>On 16.07.2003 23:55:18 these lines were written down onto papyrus:
>
>> Filtering on the source side is *always* going to be more powerful

>
>No: policies are going to change. If they are changed, *all* peering partners
>have to be contacted, all have to change their approach, and all can make mistakes.

This is a valid point -- but not one that concerns me too much,
honestly. Just on gut instinct...anybody else who wants to
weigh in here, feel free.

(If it ends up being important, I think the best way to fix it is
to still do source-side filtering but to provide an NNTP
extension for querying the recipient about what filter is
desired.)


>No: the power needed for scoring (etc.) *every* article for *every*
>peering partner would be very high AFAIK.

I don't understand why you say this. Several points:

- In the news server world, CPU tends to be plentiful -- the
bottlenecks are disk and memory and bandwidth. If the server
can filter, it saves some of its own resources.

- The filters we're proposing may in fact be quite simple. Keep
in mind that many servers are already analyzing headers and
body for their own purposes.

- The scoring is going to be done at one end or the other. And
in fact, a lot of the filtering constraints (e.g. checking for
binaries) are likely to be shared among many peers, so the
source can compute the answer once and use it to the benefit of
many recipients. So while it would certainly be more CPU usage
for the source than not filtering at all, the total CPU usage
of all the servers together would go down, and since news
servers are cooperative anyway, this should make everyone
happy. And of course, it's not entirely selfless -- if the
source can save the recipient some burden, then the recipient
can use the CPU to filter articles that it would otherwise be
sending the opposite direction.


>No: it's IMHO not needed. You want to have *those* hierarchies text-only
> (HTML wanted?) and *those* groups/hierarchies with binaries of such type.

Hmm, I think if we're going to implement a major change to the
way peering links exchange articles, it needs to be at least
reasonably generalizable.


> All other information can't be filtered beforehand very efficiently -- the source
> can't find out for sure(!) whether a posting is really a multipart if the next
> articles of this multipart aren't available.

I'm confused.

You wanted to add additional information to IHAVE commands so
that the receiving server would be able to filter. I'm saying,
why not just have the source server do the filtering with that
same information. But if the information isn't available yet,
then how can it be included in IHAVE?


>Perhaps I am too text-oriented, but in my eyes Usenet should be text-only and a
>'second' Usenet (with another name) can be binary-oriented. But we shouldn't
>warm up this discussion again...

If you want text-only Usenet, then you should peer only with
people who can provide you a prefiltered feed. Then you wouldn't
have to be concerned about it at all.


>Sure -- filtering on a hierarchy basis is not very hard work and can be done
>"on the fly". Filtering based on the post itself has to go through a complete and
>long filtering pass. If such a script (what else do you want to use for
>filtering?) is badly written it can take seconds of wasted power for one post.

I don't find "if poorly implemented, X will not work well" to be
a convincing argument that X is a bad idea.


>> It would be much easier, and not require changing NNTP, to design
>> a simple "feed description" language. One news admin would send
>
>Every 'simple' 'language' of this kind can't handle everything. So it either
>grows or dies out, because no one 'can' use it.

...

I don't think this is true, but I don't know how to convince you
of that.

(Actually, I was thinking XML might not be a bad technology for
this purpose. Anybody here know enough about it to comment?)


>If it grows it becomes unhandy and wastes CPU time.

See the above comment about "X".


>> And if some day peers are not negotiated manually over email,
>> then it would be extremely simple to add an NNTP extension for --
>> at the beginning of the connection -- querying the remote host
>> for the "feed description" desired.
>
>Yes, you're right here -- it would be relatively simple -- and crazy! If every admin
>can change 'his' rules on the source servers so easily, they won't mind whether they
>need one millisecond or one second to get through the filter, as long as they
>don't have to pay for the traffic.

Keep in mind two points:

- If the filter language is sufficiently declarative rather than
computational, it may be possible for the source server admin
to configure restrictions on what sort of filters are
acceptable.

- All server-server links are negotiated manually, and the
recipient remains aware that if his link becomes too costly,
the source server admin is free to simply drop him as a peer.

[ Interlude: can you please fix your newsreader not to generate
lines longer than 80 characters? ]


>You have to think of 10k lines of filters for 10-100 peers (at least).

I think you're thinking of letting the filters be arbitrary code.
I'm not sure that's the best idea; I really think we can
formalize a filter language that will be sufficiently powerful
but still fast.


>> >IMHO IHAVE should be as small as possible.
>> *shrug* It should get out of hand, but I think the amount of
>
>'get out of hand'? What do you mean?

I meant "should not get out of hand" -- but if you're asking for
a definition of the term, "get out of hand" means "become
unreasonable".


>> protocol-related traffic is generally dwarfed by the quantity of
>> news being exchanged.
>
>I think it does matter: it is not sent only between two servers; every peer
>sends one to every peer -- it will add up to hundreds of thousands of times *for
>every article*...

By the way, this is exactly my point above; it's much more
efficient to filter on the source side than to try to give the
recipient enough information to filter in IHAVE.

But anyway, your point may be right (does anybody -- Curt? --
quantify how much bandwidth is spent on protocol vs articles?),
but your reasoning is wrong. Yes, there will be many IHAVEs for
each article...but there will also be many times the article is
sent over the network. Here, let's work it out:

If there are N servers in the network, each with L links to
other servers, each article is uniformly X bytes long, an
IHAVE exchange is uniformly I bytes in total traffic, and 0
<= P <= 1 is the fraction of servers that want each article
(or must receive it in order to decide they don't want it),
then the traffic due to articles is N * P * X, and the
traffic due to IHAVEs is N * L * I. Thus, as long as P * X
is much bigger than L * I, the traffic due to IHAVEs is
perhaps not so important.

So, if X is 100 000, I is 100, and L is 100, then P needs to
be at least 0.1 -- and I hope that it is! (Because right
now, I don't think any server can afford to receive a full
feed, determine that it doesn't want the great majority of
the articles, and drop them on the floor.)
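
The same arithmetic, spelled out with those illustrative numbers (N is
arbitrary here, since it cancels out of the comparison):

    # Sketch: per-article traffic from article copies vs. from IHAVE offers.
    N = 1000        # servers in the network
    L = 100         # links per server
    I = 100         # bytes per IHAVE exchange
    X = 100_000     # bytes per article
    P = 0.1         # fraction of servers that take the article

    article_traffic = N * P * X     # copies actually transferred
    ihave_traffic = N * L * I       # offers made on every link
    print(article_traffic, ihave_traffic)   # 10000000 vs 10000000: break-even
    # break-even condition: P * X == L * I, i.e. P == L * I / X == 0.1 here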


>> >For the latter we should use a way which can stream more than one article per IHAVE.
>> Um, what does the CHECK / TAKETHIS mechanism lack?
>
>You misunderstood the whole. It got along the idea, you didn't follow...;-)

Can you clarify? The above doesn't make sense in English.

Jeffrey M. Vinocur

unread,
Jul 19, 2003, 10:12:53 AM7/19/03
to
In article <bf92v7...@privat.goldpool.org>,

Heiko Studt <Heiko...@gmx.de> wrote:
>
>On 18.07.2003 07:10:55 these lines were written down onto papyrus:
>
>> But creating a new type of IHAVE command is not realistic.
>
>It's by far more realistic than to change the opinion(?)
>'Never change a News-MID! NEVER!'

Wait wait!

You seem to be discussing the fact that a server should NEVER
modify a Message-ID in transport. This is absolutely true and
fundamental to the nature of the network.

But the people who want to encode information in the Message-ID
(e.g. xBin) want the *poster's software* to do that. That is,
they want to modify the method the newsreader uses to *generate*
the Message-ID -- not modify the Message-ID of an existing
article.
