
[FNTP] 'in-completeness'


Juergen Helbing

Sep 23, 2002, 2:06:58 AM
Hi netizens, developers, admins.

After solving the overhead and corruption problem
(even if you don't like the way it was done :-)
it might be time to take the next step for Binary Usenet:


NNTP does not guarantee perfect distribution (by definition).
And there are even more obstacles.
Missing a few messages now and then is not a big problem.
But whenever single messages are part of a larger series
(i.e. of a multipart or of a multi-post), this often enough
causes confusion, unusability, reposts, waste of bandwidth....


It is _not_ the target of FNTP to reduce the size of Usenet
- on the contrary: if this worked properly, then a _boost_
of new users must be expected, because it would make
Usenet far more attractive.
It is _not_ the target of FNTP to solve the completeness
problem for all messages - but at least for the 'urgent'
binary multiparts and multiposts. The "rest" might be
handled as a side effect. ;-)


These are the current reasons for incompleteness:

* NNTP fails to distribute messages across Usenet.
This is mainly caused by downtime of transit servers,
overload, server malfunction.....
The message(s) _do_ enter Usenet, but are badly distributed.

* The user does not complete the posting.
This is mainly caused by a malfunction of the user's computer,
an aborted dialup connection, a timeout during the POST
command, or a timeout during the confirmation of
successful posting.
The message(s) do _not_ enter Usenet.

* Some messages of a multipart are cancelled.
This is caused by malfunctioning antispam bots
or other kinds of third-party cancelling.
Someone does not _want_ the message to be on Usenet.

* Messages are filtered by transit servers or hosts.
An admin does not _want_ to have or relay the messages.

* Messages are malformed.
The end user's newsreader cannot identify them as a multipart.
Or an incredibly small part size (100 lines --> 10,000 parts)
is used - and often enough the (newbie) end user stops posting
as soon as he recognizes what he did.


The last time I started a discussion about 'completeness'
I've got the answer that "adding more peering" would solve
the completeness problem. But I'm afraid that the 'conventional'
approaches would not be enough.
Perhaps "thinking out of band" - using new, unconventional ideas
might be necessary to solve these issues.


The development of "PAR" files has already entered the next stage.
End users _will_ find their own ways to solve the incompleteness.
And perhaps it would even be the best idea to let _them_ do the job.
But PAR files can solve only 3 of the 5 issues listed above.

If this topic is of "general interest" then we could either start
a kind of brainstorming about it. This could be the source of some
new ideas (which are perhaps necessary). Or we could discuss
some specific proposals. I have some ideas of my own - and I am
sure that most of you could also open your notepads and share
your imagination with us.


If there is work actually being done on this topic (invisible to the public)
which might be affected or damaged by this "call for ideas",
then please let me know.
I really don't want to be accused again of 'picking raisins'.


I am convinced that it is necessary to discuss - and act upon -
the incompleteness. Further development of Usenet must not
sleep - or be done only whenever someone can afford a few
hours for the next eMail.
news.software.nntp should be _full_ of discussion
about the further development of Usenet.


I know that everybody is very busy all the time.
But I'm doing it again. Perhaps this time it works better.


TIA for your participation.
--
Juergen

Greg Andrews

Sep 23, 2002, 2:28:47 PM
arch...@i3w.com (Juergen Helbing) writes:
>
>The last time I started a discussion about 'completeness'
>I've got the answer that "adding more peering" would solve
>the completeness problem. But I'm afraid that the 'conventional'
>approaches would not be enough.
>Perhaps "thinking out of band" - using new, unconventional ideas
>might be necessary to solve these issues.
>

As long as the ideas really are new, and not just the same ideas
dreamed up by others years ago and then abandoned as ineffectual
or impractical.

What have you done to research the past debates conducted on this
topic, so you're not merely rediscovering those old ideas?

-Greg
--
::::::::::::: Greg Andrews ::::: ge...@panix.com :::::::::::::
I have a map of the United States that's actual size.
-- Steven Wright

Curt Welch

Sep 24, 2002, 1:12:22 AM
arch...@i3w.com (Juergen Helbing) wrote:
> Hi neticens, developers, admins.

As is mostly the case with me, I seldom have time to get involved in a
project like this to the level the task at hand really calls for. But I am
willing to throw out ideas....

> I am convinced that it is necessary to discuss - and act - upon
> the incompleteness.

The main issue I believe with incompletes starts with the very nature
of how Usenet works, i.e., no one is in charge, and no one can make
anyone fix a broken server.

This sets up a problem where making 50,000 servers "work" when no one is in
charge is close to impossible. Articles will be lost in parts of the network
that you have no control over and no access to.

The second part of the problem is that anyone running a server has a lot of
power to get "most" of the articles by first making their server work, and
second, adding enough peers to get as close to "perfect" as you would like.

Both of those things become very expensive. Making a server work, and
keeping it working as Usenet expands, requires a large and steady flow of
cash. Adding peers likewise costs money for every peer you add. Transit
servers can only handle so many peers, so the more peers you try to add,
the more money you have to spend on transit server technology, on
bandwidth, and on administration costs. The cost of running the server
gets exponentially higher as you get closer and closer to "perfect"
completion.

On top of this basic dynamic of how Usenet works, users don't really care
that much about missing articles - or, looking at it from the other side,
they are only willing to pay so much for "completion". There is a point of
diminishing returns where a Usenet provider will just be wasting money if
they work any harder to prevent lost posts.

None of these issues are problems that I feel can be fixed by improving the
basic technology of Usenet. The technology is not what's causing Usenet to
be in a constant state of missing articles. Most providers know how to get
near perfect completion. What they lack is the business justification for
spending the money to do it.

So, what can be done?

I can think of three ways to approach the problem.

One, is to look for ways to make the users care more.

Two, look for ways to give Usenet providers the business justification they
need to do a better job.

And three, is to look for technology changes which might allow the missing
articles to be less of a problem when it does happen.

For example, if there was a daily Usenet weather report available to users
so they could find out just how good, and how bad, the various services
were, this would help motivate the users to care, and to direct more money
to the servers that worked and less to the ones that didn't. Giving the
users information like this makes them care, and in turn gives the
companies the incentive they need to make the investment in making their
servers work. However, creating this report costs money, and if the users
really don't care, then who's going to pay for it? And publishing data
like this without the permission of the provider could get you in trouble.

Along the lines of "learning to live with it", we could explore ways to
make it work better even though there are servers dropping posts.

One idea that I've talked a lot about is to just get multipart posting
programs to split files into equal size articles instead of the common
practice today, which is to allow the last article to be smaller than the
others. This helps servers which are trying to thin out the feed by size
of article to drop entire sets of a multipart post instead of dropping all
but the last article. Every "tail" article that sneaks through a thinning
filter like that just wastes bandwidth which could have been put to better
use.
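
A tiny sketch of that effect (Python, made-up numbers - this is not
any real server's feed code, just an illustration of a "< size"
thinning filter hitting the two splitting styles):

  THIN_LIMIT = 300000   # hypothetical: feed only carries articles under ~300 KB

  def split_uneven(total_bytes, part_bytes):
      """Common style: full-size parts plus one smaller tail part."""
      full, tail = divmod(total_bytes, part_bytes)
      return [part_bytes] * full + ([tail] if tail else [])

  def thinned(sizes, limit=THIN_LIMIT):
      """What survives a '< limit' size filter on an outgoing feed."""
      return [s for s in sizes if s < limit]

  uneven = split_uneven(10050000, 400000)   # 25 x 400 KB + one 50 KB tail
  equal = [386539] * 26                     # same file in 26 equal parts

  print(thinned(uneven))   # [50000] -> only the useless tail sneaks through
  print(thinned(equal))    # []      -> the whole set is dropped consistently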

Other ideas, which are harder to address, look for ways to prevent
partial posts. Some approaches to that problem include special "posting
servers" that would allow users to upload the raw file, and then, only
after the entire file was uploaded, the server would create all the parts
and post them closer together in time. This could be done as a different
service, or as a modification to NNTP with a new NNTP command, or it could
be done through a web interface (like NewsReader.Com already does - but the
file size is limited so it doesn't work for the really huge files which are
the ones that need it the most).

Some tools, like multi-server newsreaders that allow users to find missing
posts across multiple servers, only make the missing article problem worse.
That's because a user who has those tools no longer cares if the server
he pays for has missing articles - or he doesn't care as much. So instead
of canceling his subscription to a bad server, he instead pays for a
subscription to more "bad" servers (the cheap ones) in order to create one
"good" server for himself. But in the process, he encourages the bad
servers to stay in operation, and that only makes it worse for everybody
else.

Use of PAR files is likewise something that only tends to make things worse
instead of better in terms of how many missing files there are.

But both of those techniques might overall make things "better" for the
users because a cheap server that has missing parts combined with tools to
allow the user to "live with it" might be a cheaper and better overall
solution than what it would cost to make servers run with near-perfect
completion.

A more complete investigation into the "missing article problem" should start
with a better understanding of just how bad the problem is, or isn't.
i.e., how good are the best servers, and how bad are the rest? How much do
you have to pay to get a "good" server, and how much do you save by going
with a "poor" server?

And from the user side, how much is "completeness" really worth?

In order to have any type of intelligent discussion about the "need" to fix
this problem, you have to have hard facts on how much of a problem it
really is, and some hard facts on how much people really care about it.

--
Curt Welch http://CurtWelch.Com/
cu...@kcwc.com Webmaster for http://NewsReader.Com/

Juergen Helbing

Sep 24, 2002, 6:49:16 AM
drec...@yuck.net (Mike Horwath) wrote:

>: After solving the overhead and corruption problem
>: (even if you don't like the way it was done :-)
>: it might be time to take the next step for Binary Usenet:
>

>Make it standalone? Call it Busenet? Then we can just add an 'a' to
>the front?

We did discuss this already.
'Freenet' would be a base - but it might not be powerful enough
for so many hosts.
Worse - there would be no chance to create a new powerful
structure like Usenet. No time, no money, no interest.

>: It is _not_ the target of FNTP to reduce the size of Usenet
>
>WTF is FNTP?
>Fast Network Time Protocol?

AFAIR I specified that abbreviation as:
Future News Transport Protocol.

>Yah, just picking on you a little bit, I look forward to the
>discussion.

Meanwhile I am far more patient than 1/2 years ago... :-)

--
Juergen

Juergen Helbing

Sep 24, 2002, 6:53:20 AM
ge...@panix.com (Greg Andrews) wrote:

>>Perhaps "thinking out of band" - using new, unconventional ideas
>>might be necessary to solve these issues.
>>
>
>As long as the ideas really are new, and not just the same ideas
>dreamed up by others years ago and then abandoned as ineffectual
>or impractical.

Even the "old ideas" are not really bad.
But they must be _realized_ one day.

>What have you done to research the past debates conducted on this
>topic, so you're not merely rediscovering those old ideas?

In the last 18 months we have discussed a lot of possible changes
to Usenet here. Most of the ideas stalled - or are continued 'behind
closed doors'.
I personally stopped the discussion when the solution
"give us more money" was strongly favoured - and other
projects absorbed my entire time 'reserved for Usenet' :-)


It is also the purpose of this discussion to get a list of
things which should NOT be done - or which have already been
tried and did not succeed.

--
Juergen

Juergen Helbing

Sep 24, 2002, 6:55:20 AM
Bas Ruiter <lord...@home.nl> wrote:

>It may be a good idea to go over what has already been suggested
>and dismissed in the past. That way everyone gets "up to speed"
>and in the right frame of mind.

Good idea. I will try to find some older threads in my archive.

>Besides, the reasons WHY things were dismissed yesterday may
>no longer be true today.

Optimism is very welcome here :-)

--
Juergen

Juergen Helbing

Sep 24, 2002, 6:56:17 AM
Bas Ruiter <lord...@home.nl> wrote:

>This is your 2nd attempt at FNTP... do you have a webpage
>relating to the first try?

No, sorry, I am not good at web design.
I prefer the newsgroups (and the archive folder) over any website.

--
Juergen

Juergen Helbing

Sep 24, 2002, 8:47:31 AM
Warning - this is a long reply.
There is a short summary at the end ;-)


cu...@kcwc.com (Curt Welch) wrote:

>The main issue I believe with incompletes starts with the very nature
>of how Usenet works, i.e., no one is in charge, and no one can make
>anyone fix a broken server.

There are other protocols than NNTP which work properly without
anyone being 'in charge'. So perhaps there is something basically broken.

>This sets up a problem where making 50,000 servers "work" when no one is in
>charge is close to impossible. Articles will be lost in parts of the network
>that you have no control over and no access to.

We need an approach closer to TCP/IP with dynamic routing.
I always thought this would be the Internet.
Even working after an atom bomb ruined a large part of it :-))


>The second part of the problem is that anyone running a server has a lot of
>power to get "most" the articles by first making their server work, and
>second, add enough peers, to get as close to "perfect" as you would like.
>
>Both of those things become very expensive.

If increasing completeness were an exponential function of money
then the system would be a shame for the computing specialists. I could
accept n*log(n) - but not more.


>On top of this basic dynamic of how usenet works, users don't really care
>that much about missing articles, or looking from the other side, they are
>only willing to pay so much for "completion".

One of my basic fears is that "hunting the incompletes" is a part
of the FUN on Usenet - and a part of social life.

A lot of binary newsgroups I know would have no social life
without repost and problem discussion.

Some years ago I tried to create "perfect picture collections" for some
artists in a newsgroup. I had the strong feeling that the failure of that
'fun project' had to do with the same (social) problem.

But unfortunately the endless reposts are not only fun for the users.
They are also a PITA for the admins - and the bandwidth/overall load.


>None of these issues are problems that I feel can be fixed by improving the
>basic technology of Usenet. The technology is not what's causing Usenet to
>be in a constant state of missing articles. Most providers know how to get
>near perfect completion. What they lack is the business justification for
>spending the money to do it.

I still dont agree with you :-)))


>One, is to look for ways to make the users care more.

Yes, I also believe that the users must be involved in the solution.

>Two, look for ways to give Usenet providers the business justification they
>need to do a better job.

Usenet has been losing importance on the Internet all the time.
Making it more attractive might be one way.

But 'attractiveness' also requires being easy for daily usage.

Normally 'web boards' and 'workflow desktops' are far more
difficult and time-consuming than a newsreader. A lot of alternatives
lack even the 'basic' functions of Usenet.

I thought that the need for efficiency would cause a move to Usenet.
Until now I was wrong. Either people have no idea how good it could
be for their purposes - or its use and setup is far from being
easy enough for wider use.

And of course most 'alternatives' are making money. There is a
budget - and there are professional people doing the work.

>And three, is to look for technology changes which might allow the missing
>articles to be less of a problem when it does happen.

>[...]

Users are using "new technology" already.
Unfortunately this does not make Usenet any better.

>Along the lines of "learning to live with it", we could explore ways to
>make it work better even though there are servers dropping posts.

People are very happy with PAR files today.
And I am afraid we will never remove them from Usenet again.

>One idea that I've talked a lot about is to just get multipart posting
>programs to split files into equal size articles instead of the the common

>[...]


>Every "tail" article that sneaks though thining
>filter like that just wastes bandwidth which could have been but to better
>use.

I don't understand why a few 'tail' articles could affect completeness.
The only effect I see would be "subjective" incompleteness:
the users _see_ that there must be more - but this type of
incompleteness is _wanted_ by their admin.

This becomes of course sick as soon as larger transit servers
are 'filtering' this way.

>Other ideas, which are harder to address, look for ways to prevent
>partial posts. Some approaches to that problem include special "posting
>servers" that would allow users to upload the raw file, and then, only
>after the entire file was uploaded, the server would create all the parts
>and post them closer together in time. This could be done as a different
>service, or as a modification to NNTP with a new NNTP command, or it could
>be done through a web interface (like NewsReader.Com already does - but the
>file size is limited so it doesn't work for the really huge files which are
>the ones that need it the most).

If we were able to "mark a series of posts" as belonging together
then every news server which receives a POST could verify whether
all the parts have already arrived.

I don't even see a need for WWW ports.
The existing servers could do this - and the number of host software
packages is small - and easy to update.

This was one of my own "basic" ideas for FNTP:
the first Usenet host which receives a multipart post is
_responsible_ for having it complete before it forwards
it to Usenet.

Even the response codes to POST could be modified in a way
that the newsreader knows it has to post "specific things"
before it sends new stuff.
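
A very rough sketch of what I mean (Python; the "(i/N)" subject
convention is the only grouping key assumed here, and 'forward'
stands for whatever pushes an accepted article out to the peers -
this is not meant as a finished design):

  import re
  from collections import defaultdict

  # subjects like "foo.r03 (07/50)"
  PART_RE = re.compile(r'^(?P<base>.+)\((?P<num>\d+)/(?P<total>\d+)\)\s*$')

  pending = defaultdict(dict)    # base subject -> {part number: article}

  def on_post(subject, article, forward):
      """Called for every accepted POST on the first (posting) host."""
      m = PART_RE.match(subject)
      if not m:                  # not a multipart: forward immediately
          forward(article)
          return
      base = m.group('base')
      num, total = int(m.group('num')), int(m.group('total'))
      parts = pending[base]
      parts[num] = article
      if all(i in parts for i in range(1, total + 1)):
          for i in range(1, total + 1):   # set complete: release in order
              forward(parts[i])
          del pending[base]
      # a real host would also need a timeout which either rejects the
      # incomplete set or forwards whatever did arrive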


The POST command is one of the worst parts of NNTP
(from my personal point of view) - for multiparts.

>Some tools, like multi-server newsreaders that allow users to find missing
>posts across multiple servers, only make the missing article problem worse.
>That's because a user who has those tools no longer cares if the server
>he pays for has missing articles - or he doesn't care as much. So instead
>of canceling his subscription to a bad server, he instead pays for a
>subscription to more "bad" servers (the cheap ones) in order to create one
>"good" server for himself. But in the process, he encourages the bad
>servers to stay in operation, and that only makes it worse for everybody
>else.

The entire Usenet is decentralized - but the connection of the end user
to Usenet is a monopoly. This is _one_ of the reasons why Usenet
is incomplete.

If ALL users had redundant connectivity to Usenet
(similar to TCP/IP with automatic route finding) then single servers
could no longer be a main problem for us.

And my own fantasy goes even one step further:
the user does not have only one downstream NSP - no - he also has
more than one UPSTREAM. And he uses time-shifting
for feeding upstream.

I now have three years of experience with this technology
- and it works wonderfully.

And - even better - it is easily possible without any changes
to NNTP. (Just a few problems with the organisation must be fixed ;-)

>Use of PAR files is likewise something that only tends to make things worse
>instead of better in terms of how many missing files there are.

The 'current use' of PAR files is bad (from my point of view),
but PAR files are today the best way for the _end users_.

If we could finally stop breaking up large posts into 50
RAR parts of 80 messages each, then it would get better.
Today people post 10 PAR files - which protect only entire RAR parts
- and each PAR needs another 50 messages and could itself be incomplete.

It would be better to post such a huge binary _really_
as 4000 parts - and add 100 PAR files. Then a lot of loss
could be easily - and very efficiently - corrected.

However, the PAR people are currently creating PAR V2.0
- and I hope that this will solve that problem.

Basically I like the idea of adding "redundancy" information
to an unreliable medium. CD-ROMs do it, satellites are doing
it,..... why not Usenet :-)

>But both of those techniques might overall make things "better" for the
>users because a cheap server that has missing parts combined with tools to
>allow the user to "live with it" might be a cheaper and better overall
>solution than what it would cost to make servers run with near-perfect
>completion.

From "information-transport" theory you are surely right.

However this would fail completely for one _important_ part of Usenet:
The "picture" Usenet - and the "sounds" Usenet.
The (2-10) multiparts are very often used - and PAR files dont make
any sense. We would need a new "transfer-encoding" which
includes the PARITY information in both/all parts.

But I have a better solution for this:

News admins _urgently_ need to change their "transfer filters":

We NEED urgently the general permission to transfer larger
articles than today. 10 Meg might be a good limit for binaries.
Then all this 'multipart' problem can be ended for a lot of binaries.

And all 'not-as-binary-labelled' posts must be strictly limited
to 10 kBytes. It is necessary to prevent users from "fumbling
around" just to circumvent the filters which should prevent
binaries from entering non-binary groups.


I believe that it would be _very_ easy for the servers which
receive the POSTed articles to find out what is a binary
- and what is not.

And it would also be extremely easy for them to add
a new header which indicates binary content.


One of my basic ideas is/was to use the message-id
(which is today an endless waste of bandwidth) for
proper labelling of multiparts/binaries.
It could even be delivered by the posting newsreader
- and could be easily verified by the receiving POST host.


>A more complete investigation into the "missing article problem" should start
>with a better understanding of just how bad the problem is, or isn't.

Feel free to visit the binary newsgroups and make a search for
the word "Repost" (ROTFL).....

>i.e., how good are the best servers, and how bad are the rest?

My own 'completeness measurement equipment' (which is still
running all the time) does not show larger changes compared
to Nov2001-Feb2002 (when I created some reports for a.b.n-s-c).

>How much do
>you have to pay to get a "good" server and how much do you save by going
>with a "poor" server.

The "good" servers are meanwhile all the pay-servers.
The "bad" servers are almost the 'free' servers located at the ISPs.

Both with some exorbitant exceptions :-)


>And from the user side, how much is "completeness" really worth.

Those users who are using their ISP's news server for
"sporadic picture viewing" with OE (or the AOL software :-) would not
pay anything. They take (for free) what they can get.

But the "power" users - and the frequent netizens - are already
paying today. For multi-host software - and for NSP access.


---<Fantasy>

Of course we could easily make some changes which would force nearly
all small ISP hosts to close - and leave only the huge NSP hosts
(I am talking about BINARIES here !!!)
This could also be a "de-facto split" of Usenet.
Only those ISPs which are able to offer their newsgroups
_clean_, _properly_ and _complete_ would still be accepted.


Together with a download fee and a dedicated payment system
also for CONTENT - which is paid automatically - Usenet could
also become the biggest (legal) seller of eBooks, audio and video
content.

There is an existing, _huge_ infrastructure with a lot of potential.
It only depends on us what we do with it.

---</Fantasy>


>In order to have any type of intelligent discussion about the "need" to fix
>this problem, you have to have hard facts on how much of a problem it
>really is, and some hard facts on how much people really care about it.


So you want to tell me that "incompleteness" is not a problem at all?
Perhaps you are right:

On the text newsgroups completeness is not a problem.

On the picture newsgroups completeness is not important.
If someone is missing a few pics out of a series of 100 then
he asks directly for a repost. Most requests are answered
- and the amount of picture reposts is low.

On the music newsgroups completeness becomes more
important: there are repost requests for practically _every_
incomplete multipost. An MP3 file is posted as 3-20 parts
- and whenever one part is not there, the full song is usually
reposted. If a full song is missing from an album (or from the
'12/1958 charts') then that is also fully reposted.

I see it directly in my newsreader whenever a message
is badly distributed on Usenet. (My newsreader is currently
monitoring 30 servers :-)
Be assured: whenever one message is badly distributed
then it is either spam - or there is a request for a repost.

One part of the insanity here is that reposting
single messages is nearly impossible.
The fatal tendency of some news admins to HIDE incomplete
multiparts has the catastrophic result that FULL songs are
reposted all the time. This insanity must be stopped (!)


The same thing as for MP3 applies basically to the multimedia
(and warez) newsgroups.

Two days ago I saw one message (out of an 800-msg video)
not leaving NewsGuy. There was a request for a repost of exactly
this one part even before the post was fully finished.
(I did a re-feed to Usenet manually - which helped, of course :-)

But in these newsgroups people are now using PAR files
regularly. 10-20% of the volume is meanwhile PAR files.

There are still many users (especially the newbies) who
have problems understanding and using them - but I hope that
PAR V2 - together with direct newsreader support -
could improve the situation drastically.
In these newsgroups we could 'solve' the incompleteness
with redundancy information - for the price of higher volume.

If we created a proper "binary Usenet format" which
gives news servers the information that files and PARs
belong together, then reader servers could even EXPIRE
PAR files as soon as they know that they have the full
binary correctly (yEnc ;-) available. Of course transits
would have to forward them all.

Good golly.
As usual your huge answers result in enormous replies
from my side. Please accept my apologies.


Here a short summary:

* We can add "redundancy" information to all binary content.
* We can label binary content properly.
* We can involve the users - giving them more down and upstreams.
* Servers which receive POSTS can perform completeness tests.
* Usenet must become more attractive to new (paying) users.


CU
--
Juergen

magical truthsaying bastard roney!

Sep 24, 2002, 1:03:26 PM
In article <2053c....@archiver.winews.net>,
Juergen Helbing <arch...@i3w.com> wrote:
>Even the "old ideas" are not really bad.
>But they must be _realized_ one day.

Old ideas MUST be realized one day? Are you mad?

rone
--
{Reagan's} presidency always reminded me of a remark made by a woman to
Heywood Broun following Secretariat's victory in the Triple Crown. After the
trauma of Vietnam and Watergate, she said, Secretariat had "restored her faith
in humanity." I like to think Reagan was the Secretariat of the eighties.
- Garry Trudeau

Simon Lyall

Sep 24, 2002, 6:08:45 PM
Juergen Helbing <arch...@i3w.com> wrote:

> cu...@kcwc.com (Curt Welch) wrote:
>>This sets up a problem where making 50,000 servers "work" when no one is in
>>charge close to impossible. Articles will be lost in parts of the network
>>that you have no control over and no access to.

What I don't understand is *why* we are getting incompletes at all.
Assuming a series of articles is posted correctly, a full set will be
on the posting server. This server will then forward to a local
transit server (or two), which will then forward to dozens of other
servers and then in a heavy-duty mesh to a few hundred (even, I assume, for
large binary articles, certainly for small articles) servers.

Each machine in the mesh should be offered the article by most or all of
its peers; this should completely guarantee that it gets through. The only
bottlenecks should be at the posting or the final reader end.

Why doesn't it work then?

- Too few machines handling full feeds?
- Incomplete peering
- Timeouts moving large articles which are then not resent (and other
peers don't resend since the receiver said it already had them)
- Physical lack of bandwidth/cpu/disk speed to handle the full feed by
transit servers?
- Something else?

> Even working after an atom bomb ruined a large part of it :-))

IMHO news is much more robust than email. For one thing it isn't as
vulnerable to short-term DNS outages and also has multiple paths of
delivery built in and the ability to scale well.

> And all 'not-as-binary-labelled' posts must be strictly limited
> to 10 kBytes. It is necessary to prevent users from "fumbling
> around" just to circumvent the filters which should prevent
> binaries from entering non-binary groups.

Won't work, there are plenty of articles in text groups larger than 10
kilobytes, for example the article of yours I'm following up to is 15
kilobytes.

I think the current anti-binary filters are good enough for most people
and if a new posting method comes along it's easy enough to upgrade.

> I believe that it would be _very_ easy for the servers which
> receive the POSTed articles to find out what a binary is
> - and what not.

Well this blocks 95% of them:

*,!*bina*,!alt.mag*:\
Tm,Ap,H8,<80000:\
innfeed!

and cleanfeed gets 95% of the rest.


> The "good" servers are meanwhile all the pay-servers.
> The "bad" servers are almost the 'free' servers located at the ISPs.

A lot of people don't read binary newsgroups (or only picture groups). A
"bad" news server is good enough for those people. An ISP is faced with an
equation like (broad numbers):


"Bad" news server
=================

Hardware: $10,000 (upgraded every 3-4 years)
Bandwidth: 2-5Mb/s
Admin Time: 2 Hours per week (averaged over years, ie including upgrades)

"Good" news server
==================

Hardware: $100,000 (upgraded every year)
Bandwidth: 40-50Mb/s (doubles every year)
Admin Time: 20 Hours per week (averaged over years, ie including upgrades)


For an ISP to go for a "good" news server represents a big investment for
that ISP. Even for a large ISP with broadband users it might not be worth
it. I see people complaining about completion at EVERY news server
including the NSPs and very large ISPs (like Earthlink), obviously getting
a "good" news server is a huge task that is beyond just about everyone.

Not to mention that perhaps ISPs don't actually *want* customers who max
out their broadband links 24x7; sensible pricing and limits are a good fix
here.

--
Simon Lyall. | Newsmaster | Work: simon...@ihug.co.nz
Senior Network/System Admin | Postmaster | Home: si...@darkmere.gen.nz
ihug, Auckland, NZ | Asst Doorman | Web: http://www.darkmere.gen.nz

Curt Welch

Sep 24, 2002, 10:15:18 PM
Simon Lyall <simon...@ihug.invalid> wrote:
> What I don't understand is *why* we are getting incompletes at all.

To really be able to discuss this, research needs to be done to answer
that.

From running a server, I know a lot about why it happens, but I don't have
hard facts about what's going on in the rest of the net. I think nobody
does. And if you are going to try and debate how to fix something, you
better understand the problem first or else everything we talk about is
just mostly hot air. But I like creating hot air, so I'll continue...

> Assuming a series of articles is posted correctly, a full set will be
> on the posting server. This server will then forward to a local
> transit server (or two), which will then forward to dozens of other
> servers and then in a heavy-duty mesh to a few hundred (even, I assume, for
> large binary articles, certainly for small articles) servers.
>
> Each machine in the mesh should be offered the article by most or all of
> its peers; this should completely guarantee that it gets through. The only
> bottlenecks should be at the posting or the final reader end.
>
> Why doesn't it work then?
>
> - Too few machines handling full feeds?

I think too few with "working" full feeds is true.

> - Incomplete peering

If you get one part, then you should get all.

> - Timeouts moving large articles which are then not resent (and other
> peers don't resend since the receiver said it already had them)

That's not true for servers that are working correctly. But this touches
on the fact that 1) the correct way for NNTP to work is not documented
_anywhere_ and 2) there could be many servers on Usenet that are broken,
and no one would know it.

In regards to 1), what I'm talking about is the fact that none of the RFCs
attempt to even touch on the correct algorithm for dealing with a "try
again later" response. How long should you keep trying it again later?
How often should you try it again later? Some servers I suspect never try
it again later, but I know the major ones do.

> - Physical lack of bandwidth/cpu/disk speed to handle the full feed by
> transit servers?

For sure, this is the #1 problem. But a more interesting question is how
many servers are broken because they are not large enough to keep up, but
the admin of the server doesn't know it's broken? This is a real
problem on Usenet, but I don't have any facts about how many of the broken
servers are broken and the admin knows it, or broken and the admin
doesn't know it.

Generally speaking, it's very very hard to know when your server is broken.
They never look broken - they get lots of news, they have free CPU time,
they "look" the same as they did when they were working.

> - Something else?

The redundancy of Usenet gives admins a false sense of security. Just like
you outlined above: if you have lots of peers, how is it possible I'm not
getting all the articles?

Most larger sites run multiple transit servers these days as well. That
adds an even higher level of redundancy.

But what happens is you see a "feed" problem and you ignore it, because you
know you have "lots of redundancy", and so what if 1 out of 50 connections is
having problems. In fact, I think there are a lot of links dropping
articles and no one knows it. Or they find out about it and fix it, but
only after it has been going on for a month. If you put enough of these
bad links into a mesh, then articles will get dropped.

Problem two is that when an outgoing feed is either 1) using up too much
bandwidth, or 2) constantly backing up, or 3) overloading your server
because the peer is accepting too much news, the first thing all news
admins do is reduce the size of that outgoing feed. News admins don't have
many options on how to do that. The single most common way to do it is to
put an article size limit on the feed. You might drop a feed all the way
down to 4K articles, or you might set it at 10K or 200K or 500K etc etc.
Admins pick random numbers based on how much they feel they need to reduce
the size of the feed.

When they do this, they seldom tell their peers. So the guy that thinks
he's getting a full feed from you might in fact be getting a size-limited
feed.

So admins may think they have 50 peers, but in fact only 3 of them are
sending them a full feed of large articles. And if all 3 of those are
falling behind and dropping articles, then you end up with lost articles.
And if you aren't using other tools to watch for the problem, you may have
no clue it's going on, because all your transit servers look like they are
running fine and for 47 of your peers the feeds are working fine (they
just aren't sending you a full feed).

> > Even working after an atom bomb ruined a large part of it :-))
>
> IMHO news is much more robust than email. For one thing it isn't as
> vulnerable to short-term DNS outages and also has multiple paths of
> delivery built in and the ability to scale well.

For sure. But I think that robustness is working against it.

If you don't assign responsibility to someone, and instead try to assign it
to a group, then everyone will assume that someone else will do the work,
and in the end no one does the work.

This, I think, is one factor in why the lost article problem happens a lot.
People put too much faith in the "group" of peers - surely somebody will
offer me the article.

> > The "good" servers are meanwhile all the pay-servers.
> > The "bad" servers are almost the 'free' servers located at the ISPs.
>
> A lot of people don't read binary newsgroups (or only picture groups). A
> "bad" news server is good enough for those people. An ISP is faced with
> an equation like (broad numbers):
>
> "Bad" news server
> =================
>
> Hardware: $10,000 (upgraded every 3-4 years)
> Bandwidth: 2-5Mb/s
> Admin Time: 2 Hours per week (averaged over years, ie including upgrades)
>
> "Good" news server
> ==================
>
> Hardware: $100,000 (upgraded every year)
> Bandwidth: 40-50Mb/s (doubles every year)
> Admin Time: 20 Hours per week (averaged over years, ie including
> upgrades)

The "Cidera Factor" falls in there to and I've always suspected they were
actually a major cause of missing articles.

Their sat feed makes it real affordable for an ISP to run a server which is
"good enough for most users", but yet has thousands of missing parts.

> Not to mention that perhaps ISPs don't actually *want* customers who max
> out their broadband links 24x7 , sensible pricing and limits is a good
> fix here.

The very fact that running a "good server" works out to be bad business for
many ISPs is clearly a factor in the missing article problem.

Curt Welch

Sep 24, 2002, 11:48:11 PM
arch...@i3w.com (Juergen Helbing) wrote:
> We NEED urgently the general permission to transfer larger
> articles than today. 10 Meg might be a good limit for binaries.
> Then all this 'multipart' problem can be ended for a lot of binaries.

This isn't a "permission" thing. It's a technical thing. Servers are
optimized for moving smaller articles, they don't work well when moving
10MB articles. Usenet is optimized for moving small articles.

The entire internet is built on 1500 byte packets. Everything you do on
the internet is likely to be chopped into 1500 byte or smaller packets.

Why aren't you screaming to get that fixed if you think article size is
such a huge problem on Usenet? Why do you think the internet works that
way?

The internet, like Usenet, is a store and forward system. Store and
forward systems work much much better if you keep the packets small.

For example, with a 1Mbit link, how long does it take to transmit a 10MB
article? 80 seconds. With a store and forward system, a news server will
not send the article to the next server until it's done receiving it. If
you have 20 hops, and each hop takes 80 seconds, that means it takes over
26 minutes for that article to travel through the network.

But if you chop it into 100 articles of 100KB each, how long does it take?
Each article takes only .8 seconds to move from hop to hop now. So the
first article only takes .8 * 20 or 16 seconds to move through the net. The
rest of the articles follow that out, and you end up with a total time of 16
+ 99 * .8 or 95.2 seconds for all the pieces to move through the net.

1.5 minutes if you cut it up vs 26 minutes if you don't.
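
The same arithmetic as a throwaway script, if you want to play with
the numbers (the 1 Mbit/s links and 20 hops are just the assumptions
from above, not measurements of anything real):

  LINK_MBIT = 1.0   # assumed link speed, Mbit/s
  HOPS = 20         # assumed store-and-forward hops

  def transit_seconds(total_mb, article_kb):
      """Time for a file to cross the net when every hop must finish
      receiving an article before it relays it."""
      per_hop = (article_kb * 8) / (LINK_MBIT * 1000)   # sec per article per hop
      n_articles = (total_mb * 1000.0) / article_kb
      # the first article crosses all hops; the rest pipeline in behind it
      return HOPS * per_hop + (n_articles - 1) * per_hop

  print(transit_seconds(10, 10000))   # one 10 MB article : about 1600 s (~26 min)
  print(transit_seconds(10, 100))     # 100 x 100 KB parts: about 95.2 s (~1.5 min)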

And for the slow way, the servers are also required to have 100 times more
memory to store articles before they get sent on.

This is significant for news servers because they cache articles in memory
to make it fast to re-send to peers. The larger the articles, the more
memory you need (or the more likely that the article won't still be cached
in memory when it comes time to send it).

Because Usenet is a store and forward system, it works much much better with
smaller articles. It would be very very wrong to try and get everyone to
post 10MB articles (which, by the way, some people are trying to do now).
Hey, was that you posting those 10 MB articles to Newshosting the other
day?

Juergen Helbing

Sep 25, 2002, 2:29:53 AM
Bas Ruiter <lord...@home.nl> wrote:

>I was brainstorming about this yesterday, and one way to
>limit reposts and/or identify posts, and where (what newsgroups) they
>are available in is to use a central database server - like cdds (or
>whatever that thing is called for collecting CD info).

That "central cddb" is a nice feature. In fact this are (AFAIK)
a few dozen de-central hosts. (And more than three in
different countries is a must).

However, you are coming _very_ close to my own ideas.
Having a bunch of "header/msgid" servers which don't
hold articles - but just header/binary/msgid information -
would be a great help.

I am running such a server already - fetching headers from
30 larger Usenet hosts. Whenever I am missing any multipart
header, I know that something went _seriously_ wrong.

>A poster could submit what he's posting, and to which newsgroups. He
>would also send along additional info.. some of which is required, and
>some of which is optional.
>Anyone looking for something could make queries to the same server,
>and find out what's available.

All this information could even be stored in the actual headers.
Having better "overview" commands might be an interesting idea
for making Usenet more attractive.

>What if NNTP and FTP were brought together. FTP provides the files and
>8-bit transmission, and NNTP distributes the info?

There is already an "External-Body" header in a spec.
But this idea is very close to "moving Binary Usenet to Freenet":
We cannot expect that 10,000 news admins will install such stuff.

>My own provider (@home NL) even states that News is a service they throw
>in for free. Customers can't even expect it to work, according to
>@home.. never mind expecting posts to be complete. Retention is
>somewhere between 12 and 24 hours.. *puke*

WOW - what an attitude.

>> Usenet has lost importance on the Internet all the time.
>> Making it more attractive might be one way.
>

>This is mainly because of the software -- News is, for many, too
>technical. Give them a shiny client - large colourfull buttons,
>easy to set up and use... voila!

Yes - and there are already some nice "Binary Grabbers".
Unfortunately most people on the Internet don't know about them.

>> And my own fantasy goes even one step further: The users does not
>> only have one downstream NSP - no - he has also more than one UPSTREAMS.
>> And he is using time-shift for feeding upstream.
>

>As a user I have no objections to being a source for others upstream
>from me... but I'm sure @home minds. I have a 1GB limit/day, so if I
>d/load a DivX that doesn't leave much room at all. Besides that,
>individual users have uploads speeds which is gonna make people's
>eyes water.

I am _not_ talking about 'full secondary upstream posts'.

Most users do check whether their post was appreciated by the newsgroup.
Some users even download their own posts completely
- just to be sure that they were complete and correct.

Those users need a button "Auto-Verify my last posts" in their
newsreader/autoposting software. This function would check whether
all parts/messages arrived properly on Usenet.
If yes - OK.
If not - then the 'missing parts' are sent to another NSP
- or reposted automatically.
The senders would fill up their own holes :-)

Imagine you are uploading 600 msgs for an album or video.
Three messages did not make it up to the large Usenet servers.
With a single click (or even fully automatically) your newsreader
or autoposter 'fixes' the problem.


I have been using this technique for two years now,
by IHAVE and even by POST.
It works perfectly.
My own multipart posts are always 100% on Usenet.
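
For those who want to see how simple such an "Auto-Verify" could be,
here is a minimal sketch (Python, raw NNTP over a socket, STAT per
RFC 977; the server names are placeholders and authentication and
error handling are left out):

  import socket

  CHECK_SERVERS = ['news.example-nsp-a.com', 'news.example-nsp-b.com']

  def article_exists(server, message_id, timeout=30):
      """True if the server answers 223 to 'STAT <message-id>'."""
      with socket.create_connection((server, 119), timeout=timeout) as sock:
          f = sock.makefile('rb')
          f.readline()                                  # greeting (200/201)
          sock.sendall(('STAT ' + message_id + '\r\n').encode('ascii'))
          reply = f.readline().decode('ascii', 'replace')
          sock.sendall(b'QUIT\r\n')
          return reply.startswith('223')                # 430 = not there

  def verify_upload(message_ids):
      """Return the Message-IDs that none of the checked servers carry."""
      return [mid for mid in message_ids
              if not any(article_exists(s, mid) for s in CHECK_SERVERS)]

  # missing = verify_upload(ids_of_my_600_part_post)
  # ...then repost (or IHAVE to a second upstream) only those parts.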

>> We NEED urgently the general permission to transfer larger
>> articles than today. 10 Meg might be a good limit for binaries.
>> Then all this 'multipart' problem can be ended for a lot of binaries.
>

>As a result of Curt's earlier postings this week I did a little
>Googling to see what else he discussed wrt NNTP development etc. and
>he said that having large articles is a bad thing. If an article would
>need to be reposted, you'd be reposting 10MB's instead of something
>like 500kB. And it gets worse if even bigger articles are used.

If we succeed in solving the "completeness" problems, then we can
leave the size of binaries where it is.

But if we are NOT able to fix this problem, then we should try
to avoid multiparts whenever possible.

Meanwhile a lot of pictures are also posted as "high quality" JPEGs.
They are ALL split into multiparts (600 kB...2000 kB).

Perhaps 10 MB is too large - but I believe that it would still be
better to distribute the large binaries more slowly than as split
messages with missing parts.

>> And all 'not-as-binary-labelled' posts must be strictly limited
>> to 10 kBytes. It is necessary to prevent users from "fumbling
>> around" just to circumvent the filters which should prevent
>> binaries from entering non-binary groups.
>

>There are texts (FAQs etc) that easily reach several 100 kBs.

Create a newsgroup called "news.binaries.faq".
All Usenet admins would accept binaries there.
Crosspost _all_ FAQs to that group.
Done.

We would have the best FAQ overview group of all.
Abusing such a group would cause real trouble :-)

>Then again.. you could just simply state that from now on, any post
>above xyz kB is going to be treated as being binary -

Yes - and it must follow the "Binary Message Format Guidelines".
Else it would be rejected.

>and will be
>distributed or filtered as such. If you have a large text which someone
>wants to post he will HAVE TO split it up in smaller segments.

There is no "large text" on Usenet.
Large FAQs are 'documents' - so they are binaries/files.

Large discussions (50 levels of top-quoting) could even be
stopped this way.
Sending a one-line follow-up as a top-post above 5000 quoted lines
of an encoded message would also be stopped.

>That way you don't have to "mark" articles as being binary.. you simply
>assume by size.

Any kind of change to Usenet takes its time - and during the transition
we would _need_ a specific labelling of binaries.

Using only the message size is not enough.
Some people would even post 10,000 messages of 10 kB each
to circumvent new policies. We must _prevent_ abuse.

btw.: The 'size' is currently not included in the MSG-ID,
so transit servers don't know the size ahead of time - which they SHOULD.

I will make a proposal for a new "Binary-Message-ID" format soon.
This might explain more.


Thanks for your creative participation.

--
Juergen

Juergen Helbing

Sep 25, 2002, 2:51:15 AM
cu...@kcwc.com (Curt Welch) wrote:

>arch...@i3w.com (Juergen Helbing) wrote:

>> We NEED urgently the general permission to transfer larger
>> articles than today. 10 Meg might be a good limit for binaries.
>> Then all this 'multipart' problem can be ended for a lot of binaries.
>
>This isn't a "permission" thing. It's a technical thing. Servers are
>optimized for moving smaller articles, they don't work well when moving
>10MB articles. Usenet is optimized for moving small articles.

Yes - this is a part of the problem.

>The entire internet is built on 1500 byte packets. Everything you do on
>the internet is likely to be chopped into 1500 byte or smaller packets.

AFAIR the Internet acknowledges every single packet - and it retransmits
missing ones :-)
Implementing this in Usenet would also be very simple......

>Why aren't you screaming to get that fixed if you think article size is
>such a huge problem on Usenet? Why do you think the internet works that
>way?

If we don't do anything about completeness then we should at least
avoid multiparts as often as possible.

If we _succeed_ in fixing incompleteness then we can
leave the maximum size as it is.

I don't know how powerful the large "transit" servers are today.
But the limit of "500 kB" has now been in use for more than 5 years.
Perhaps it would be possible to raise it?

Please take it as a "request" to your technical skills:
I want to transmit larger articles (up to 10 MB) through your
feeders. How can you fulfill my request?

>For example, with a 1Mbit link, how long does it take to transmit a 10MB
>article? 80 seconds. With a store and forward system, a news server will
>not send the article to the next server until it's done receiving it. If
>you have 20 hops, and each hop takes 80 seconds, that means it takes over
>26 minutes for that article to travel though the network.

I am sure that the "Usenet backbone" is not working with 1 Mbit links.
AFAIK you currently need a 30-50 Mbit line.
This would reduce your 20 hops to 5-10.

btw.: Today some "small parts" take hours to be propagated.

>And for the slow way, the servers are also required to have 100 times more
>memory to store articles before they get sent on.

If the situation is so bad, then I slowly begin to understand why Usenet
is notoriously incomplete: any case of failure between two large
transits would cause a backlog which could not be "spooled out"
again. Any kind of failure _must_ result in loss of messages.

Large sites with multiple feeders and peers might still have a chance.
But if this picture is correct, then the peering of "smaller sites"
must urgently be reworked (with better redundancy).

>This is significant for news servers because they cache articles in memory
>to make it fast to re-send to peers. The larger the articles, the more
>memory you need (or the more likely that the article won't still be cached
>in memory when it comes time to send it).

I don't see any reason why you need to keep the full article
in memory and only start the transfer to the next host after it has
been received completely.

You could start sending "instantly" - every proxy server does this.
And sending _one_ article of 10 MB instead of 20 of 500 kB
would even reduce the accesses to the history - and reduce the
number of necessary IHAVE (or whatever) requests - with their
associated latency.

I assume you are one of the specialists for high volume transits.
So _you_ should tell me the benefits and the consequences :-))


>Because Usenet is a store and forward system, it works much much better with
>smaller articles. It would be very very wrong to try and get everyone to
>post 10MB articles.

There have already been proposals in earlier discussions
to send out 500 MB as one message :-)))

>(which by the way some people are trying to do now).
>Hey, was that you posting those 10 MB articles to Newshosting the other
>day?

No - I know that it does not work at the moment.
But AFAIR I have already downloaded 5 MB messages from
news.newsreader.com. It seems that you have no "size limits"
on your own site. Why?

Thanks for being engaged in this discussion
- especially with so much other work on your plate....


CU
--
Juergen

Juergen Helbing

Sep 25, 2002, 2:57:21 AM
drec...@yuck.net (Mike Horwath) wrote:

>: My own 'completeness measurement equipment' (which is still
>: running all the time) does not show larger changes compared
>: to Nov2001-Feb2002 (when I created some reports for a.b.n-s-c).
>

>I would love to see those again.

I am currently in the process of redesigning my own toy (MyNews).
After this I hope to be able to "verify" and "monitor" a large
part of Usenet. And if I am able to get access to a few more
major hosts (I've lost some of them since Jan 2002) then such
statistics would be created automatically - and could be
posted automatically as well.....

But this all depends on resources... (as usual).

--
Juergen

Juergen Helbing

Sep 25, 2002, 3:25:23 AM
Simon Lyall <simon...@ihug.invalid> wrote:

>What I don't understand is *why* we are getting incompletes at all.
>Assuming a series of articles is posted correctly then a full set will be
>on the posting server.

A lot of incomplete posts exist because the connection
fails during POSTing to that first host.
Most posting programs do not have any kind of retry.

>This server will then forward to the a local
>transit server (or two)

Bottleneck #1
Often enough some messages are only available on the NSP
where the messages were posted.
(I have access to some of them - I see this all the time).

>which will then forward to dozens of other servers

Bottleneck #2

Some (even huge) NSPs don't have enough 'outgoing' peers.

>and then in a heavy duty mesh to a few hundred (even I assume for
>large binary articles, certainly for small articles) servers.

As soon as a message has reached some of the "major" NSPs
it is usually perfectly distributed on Usenet.

(And it is one of my ideas to use these "major" NSPs - as redundancy -
for getting better distribution overall).

>Each machine in the mesh should be offered the article by most or all of
>its peers; this should completely guarantee that it gets through. The only
>bottlenecks should be at the posting or the final reader end.

These seem to be truly the most important sources of trouble.
We all know that >90% of the Usenet hosts are not administered
the way they should be.

>Why doesn't it work then?
>
>- Too few machines handling full feeds?
>- Incomplete peering
>- Timeouts moving large articles which are then not resent (and other
> peers don't resend since the receiver said it already had them)
>- Physical lack of bandwidth/cpu/disk speed to handle the full feed by
> transit servers?
>- Something else?

My personal experience:

* Downtime
* Inability to handle spools after downtime.

The available resources are typically calculated to keep up with a
full feed. But ANY case of downtime requires MORE resources.
Because they are not available, the backlogs are simply "dropped".

>> Even working after an atom bomb ruined a large part of it :-))
>
>IMHO news is much more robust than email.

I believe you are wrong. eMail has two important features which
are NOT (yet) available for Usenet:

"Retry 1-4-8-24 hours later"
"Use an alternative eMail relay if the target is not reachable".


>> And all 'not-as-binary-labelled' posts must be strictly limited
>> to 10 kBytes. It is necessary to prevent users from "fumbling
>> around" just to circumvent the filters which should prevent
>> binaries from entering non-binary groups.
>
>Won't work, there are plenty of articles in text groups larger than 10
>kilobytes, for example the article of yours I'm following up to is 15
>kilobytes.

And it is already far too long for a discussion.
In an FAQ discussion I recently asked people to
send several follow-ups to _single_ parts of it.
This worked nearly perfectly.

<Dream>

These "500 lines" discussion threads are a 'meta-problem'
of TEXT-Usenet which is not solved yet.
However we could solve it easily by enforcing a limit of
10 KBytes per message.
If your newsreader tells you that your follow-up is too long
then we could even make discussion on Usenet more usefull...

</Dream>

>I think the current anti-binary filters are good enough for most people
>and if a new posting method comes along it's easy enough to upgrade.

Transition is always difficult - but I want us to keep it in mind.


>> I believe that it would be _very_ easy for the servers which
>> receive the POSTed articles to find out what a binary is
>> - and what not.
>
>Well this blocks 95% of them:
>
> *,!*bina*,!alt.mag*:\
> Tm,Ap,H8,<80000:\
> innfeed!

I'm afraid you are missing my point:
I don't want to _block_ binaries - but separate them from text
(and identify them).

>> The "good" servers are meanwhile all the pay-servers.
>> The "bad" servers are almost the 'free' servers located at the ISPs.
>
>A lot of people don't read binary newsgroups (or only picture groups). A
>"bad" news server is good enough for those people.

My short spec of "good" and "bad" applies only to _binaries_.
Usenet does not have problems with the few text bytes.

>For an ISP to go for a "good" news server represents a big investment for
>that ISP. Even for a large ISP with broadband users it might not be worth
>it.

It would be nice to know how ISPs can find out how much traffic
is caused by NNTP. Of course the costs of maintaining a "good"
newsserver must stay comparable to bandwidth and outsourcing
costs.
However, I have no idea about the details.....

>Not to mention that perhaps ISPs don't actually *want* customers who max
>out their broadband links 24x7 , sensible pricing and limits is a good fix
>here.

At least not for a free service such as newsgroups....


CU
--
Juergen

Juergen Helbing

unread,
Sep 25, 2002, 3:37:41 AM9/25/02
to
drec...@yuck.net (Mike Horwath) wrote:

>:>Make it standalone? Call it Busenet? Then we can just add an 'a' to
>:>the front?
>
>: We did discuss this already.
>: 'Freenet' would be a base - but might be not powerful enough
>: for so many hosts.
>: Worse - there would be no chance to create a new powerful
>: structure like Usenet. No time, no money, no interest.
>

>Not true, it would be Usenet but with just bunnies, same software,
>same server sets, etc.
>Companies (loose term) could then decide what they wanted.

We already have this.
A few hundred Usenet hosts are already forming the "Binary Usenet".
For them the "text part" is just a minor problem.

I don't believe that any admin would set up two servers
- one for text and one for binaries.
If he has one with binaries then it can handle text easily.
This would be just "double pain".

>Maybe a new 'approved' header so that things can be classified
>accordingly.

I believe we have two options:

1) Adding a new header - or using the MIME headers.
The disadvantage would be that this header is not available
for transits - and not on XOVER/XHDR today.
It would require a major redesign of a lot of software.

2) Using the message-id for more information.
This would also help transit servers - and could be used
very quickly as it does not affect Usenet at all.

There is an old "FNTP" document in my archive which
describes a "new" message-id:
<bin-info> . <unique> @ <host> . bin

I will make more proposals about this soon.
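
For illustration, a minimal Python sketch of how a posting agent might build
such a message-id, and how a transit server could recognise one from the
string alone. The "p<part>o<total>s<kB>" layout for <bin-info> is an
assumption of this sketch, not something the old FNTP document specifies.

# A minimal sketch, assuming a "p<part>o<total>s<kB>" layout for <bin-info>.

import random
import time

def make_binary_msgid(part, total, size_kb, host):
    bin_info = "p%do%ds%d" % (part, total, size_kb)        # assumed layout
    unique = "%x.%x" % (int(time.time()), random.getrandbits(32))
    return "<%s.%s@%s.bin>" % (bin_info, unique, host)

def is_binary_msgid(msgid):
    # all a transit server sees during CHECK/IHAVE is this string
    return msgid.rstrip(">").endswith(".bin")

example = make_binary_msgid(17, 580, 240, "server.example")
print(example, is_binary_msgid(example))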

>: AFAIR I specified that abbreviation as:
>: Future News Transport Protocol.
>Wouldn't FNNTP work better, though?

I don't like 5-letter abbreviations. :-)
NTP might be even better... or: BTP....

CU
--
Juergen

Juergen Helbing

unread,
Sep 25, 2002, 3:38:48 AM9/25/02
to
"magical truthsaying bastard roney!" <^#*&$@ennui.org> wrote:

>>Even the "old ideas" are not really bad.
>>But they must be _realized_ one day.
>
>Old ideas MUST be realized one day? Are you mad?

A lot of these "old" ideas have not been realised because
there were not enough resources. Many of them are
worth reconsidering.

That's all I wanted to say.

Sorry.
--
Juergen

Brian Truitt

unread,
Sep 25, 2002, 10:20:26 AM9/25/02
to
> >For an ISP to go for a "good" news server represents a big investment for
> >that ISP. Even for a large ISP with broadband users it might not be worth
> >it.
>
> It would be nice to know how ISPs can find out how much traffic
> is caused by NNTP. Of course the costs of maintaining a "good"
> newsserver must stay comparable to bandwidth and outsourcing
> costs.
> However, I have no idea about the details.....

We track NNTP traffic via MRTG, Cricket, or other various packages that pull
info from the switch via SNMP. All our news equipment is on a dedicated
switch, so it's separate from any other traffic we'd generate as an ISP.
Separating traffic NNTP generated from traffic generated by the feed to the
reader box is a rough estimate, but I know how much the internal feed/filter
server is pushing so can get close. External feed is about 50-60Mb/s,
customer outgoing traffic is anywhere from 100-150Mb/s peak depending on the
day. Have no idea how that compares with an NSP, probably tiddlywinks in
comparison. And I only consider us an "ok" newsserver at this point, as
completion issues have been showing up the last couple weeks and retention
needs a LONG overdue upgrade.
We only allow connections to our server from IPs directly on our internal
backbone, which is much cheaper than bandwidth to the internet-at-large.
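
For readers wondering what such measurement boils down to, here is a rough
sketch; read_if_octets() is a placeholder for whatever SNMP/MRTG poller is
actually in use, not a real library call.

# Hypothetical sketch of the rate calculation behind MRTG-style graphs.

import time

def read_if_octets(interface):
    raise NotImplementedError("poll the switch port via SNMP/MRTG here")

def nntp_rate_mbps(interface, interval=300):
    first = read_if_octets(interface)
    time.sleep(interval)
    second = read_if_octets(interface)
    octets = (second - first) % 2**64          # tolerate 64-bit counter wrap
    return octets * 8 / interval / 1e6         # octets -> bits -> Mb/s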


Thomas

unread,
Sep 25, 2002, 11:24:02 AM9/25/02
to
Juergen Helbing wrote:

> AFAIR the Internet acknowledges every single packet - and it retransmits
> missing ones :-)
> Implementing this into Usenet would be also very simple......

Not that simple. A TCP stream has sequence numbers, and this makes it possible
to discover which pieces are missing.

While a certain stream of news posts may be in sequence, the articles get mixed
with other streams and the sequence will get lost.

> I dont know how powerful the large "transit" servers are today.
> But the limit of "500 kB" is now used for more than 5 years.
> Perhaps it would be possible to raise it ?
>
> Please take it as a "request" to your technical skills:
> I want to transmit larger article (up to 10 MB) through your
> feeders. How can you fulfill my request ?

Is 10 MB enough? Or should it go up to 9 GB at once? But, if this 9 GB post gets
interrupted, and your news server has been sending it out before it had received
it all, the effect is the same as an incomplete multipart!

> You could start sending "instantly" - every proxy server does this.

It should be possible to write software that intercepts multiparts, stores them
in a spool, waits until the thing is complete, and only then feeds the whole
thing out. That would increase customer satisfaction a lot, I suppose. And it
could be done relatively easily if people would encode their posts accordingly.
That could even be in the subject line.

A modified overchan-like program could assemble lists, and spit out the IDs of
the messages to be fed out when a multipart is complete.
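
As a rough illustration of the bookkeeping such a helper would need, assuming
(as suggested above) that the part counter can be read from the Subject line
in the usual "(17/580)" style:

# Sketch only: track multipart completeness and release message-ids for
# feeding once every part has been seen.

import re
from collections import defaultdict

PART_RE = re.compile(r"\((\d+)/(\d+)\)\s*$")

# (poster, subject without the part counter) -> {part number: message-id}
pending = defaultdict(dict)

def article_seen(poster, subject, msgid):
    """Return the list of message-ids to feed out, or None to keep waiting."""
    m = PART_RE.search(subject)
    if not m:
        return [msgid]                       # not a multipart: feed at once
    part, total = int(m.group(1)), int(m.group(2))
    key = (poster, PART_RE.sub("", subject))
    parts = pending[key]
    parts[part] = msgid
    if len(parts) < total:
        return None
    del pending[key]
    return [parts[n] for n in sorted(parts)]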

Note that this would dramatically slow down multiparts. But if the goal is to
make stuff more efficient, this is the only way. Maybe there can be a sister
hierarchy to alt.binaries named slowprop.binaries....

Now make the slowprop servers be able to fetch missing parts through nnrp from
servers they usually do not peer with.

But you are still chugging great masses of stuff across the net. Making a chain
of squid-like proxies seems so much neater... post just the file description and
a key, and use the Path: header to trace back through the proxies.

Again, that could be a new hierarchy. And the client for fetching the body would
not use nnrp. But who cares.


Thomas

Andrew - Supernews

unread,
Sep 25, 2002, 11:52:55 AM9/25/02
to
In article <15518....@archiver.winews.net>, Juergen Helbing wrote:
> cu...@kcwc.com (Curt Welch) wrote:
>>The entire internet is built on 1500 byte packets. Everything you do on
>>the internet is likely to be chopped into 1500 byte or smaller packets.
>
> AFAIR the Internet acknowledges every single packet - and it retransmits
> missing ones :-)

"the Internet" does nothing of the sort - in fact at the IP level, there
are no acknowledgements and no retransmissions, and data transfer is
assumed to be unreliable.

The TCP protocol (which is merely one of many transport protocols, albeit
a very widely used one) uses sequence numbers, acknowledgements and
retransmissions to build a reliable _point to point_ connection between
two hosts over the unreliable IP layer. Providing reliability over a
multicasting link is very much harder, and is likely to even be impractical
in the case where there is no means for producer and consumers to communicate
directly.

> I dont know how powerful the large "transit" servers are today.
> But the limit of "500 kB" is now used for more than 5 years.

by whom?

The default limit for INN was 1000000 bytes last time I looked, and the
Diablo default appears to be "unlimited" now. Last time I looked at Cyclone
it had a 4MB hard limit, but I don't know what the shipped default was.

We currently apply a 4MB limit to local and transit traffic.

> I dont see any reason why you need to keep the full article
> in memory and start transfer to the next host after it was
> sent completely.

what if the article doesn't successfully arrive?

what if you reject the article after looking at the body?

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

JanC

unread,
Sep 25, 2002, 12:01:43 PM9/25/02
to
arch...@i3w.com (Juergen Helbing) schreef:

> I dont believe that any admin would setup two servers
> - one for text and one for binaries.
> If he has one with binaries then it can handle text easily.
> This would be just "double pain"..

My ISP did. The people that only (or primarily) used the text newsgroups
didn't like it when the server got unresponsive too often because of the
up- & downloads in the *.binaries.* groups...

--
JanC

"Be strict when sending and tolerant when receiving."
RFC 1958 - Architectural Principles of the Internet - section 3.9

Dr.Ruud

unread,
Sep 25, 2002, 3:56:39 PM9/25/02
to
Juergen Helbing skribis:

> I dont believe that any admin would setup two servers
> - one for text and one for binaries.
> If he has one with binaries then it can handle text easily.
> This would be just "double pain"..

xs4all.nl does it, and we like it. The text-groups are also
on the binary server.
http://www.xs4all.nl/helpdesk/nieuws/news.html

--
Affijn, Ruud


Kjetil Torgrim Homme

unread,
Sep 25, 2002, 4:22:57 PM9/25/02
to
[Mike Horwath]:
>

I find it distressing that you are using X-No-Archive in a technical
forum such as this.

> : 2) Using the message-id for more information.
> : This would also help transit servers - and could be used
> : very quickly as it does not affect Usenet at all.
>
> : There is an old "FNTP" document in my archive which
> : describes a "new" message-id:
> : <bin-info> . <unique> @ <host> . bin
>
> : I will make more proposals about this soon.
>

> That means that everyone's software has to change, far harder than
> getting the transit systems to filter based on a single header.

a new header requires changes in NNTP to be used effectively during
transit. putting the information in the message-id enables filtering
to happen during the CHECK stage, with no compatibility problems. I
think it is easier to get the authors of Power Post et al to change
their software than to upgrade all the news servers on this planet.
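
To make the CHECK-stage point concrete, a hedged sketch of the kind of
decision a transit server could take from the message-id alone. The ".bin"
pseudo-domain and the "s<kB>" size field are assumptions borrowed from the
proposals in this thread, not an existing NNTP feature.

# Sketch of a message-id-only filter usable at CHECK/IHAVE time.

import re

SIZE_FIELD = re.compile(r"s(\d+)")            # assumed: size in kB
MAX_BINARY_KB = 4096                          # this server's own policy

def want_article(msgid, accept_binaries=True):
    domain = msgid.strip("<>").rsplit("@", 1)[-1]
    if not domain.endswith(".bin"):
        return True                           # unlabelled: treat as text
    if not accept_binaries:
        return False                          # text-only feed: refuse early
    m = SIZE_FIELD.search(msgid)
    return bool(m) and int(m.group(1)) <= MAX_BINARY_KB

print(want_article("<p17o580s240.3f2a1b@poster.example.bin>"))    # True
print(want_article("<p1o2s999999.3f2a1c@poster.example.bin>"))    # False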

--
Kjetil T. ==. ,,==. ,,==. ,,==. ,,==. ,,==
::://:::://:::://:::://:::://::::
=='' `=='' `=='' `=='' `=='' `== http://folding.stanford.edu

Simon Lyall

unread,
Sep 25, 2002, 8:55:00 PM9/25/02
to
Mike Horwath <drec...@yuck.net> wrote:
> Simon Lyall <simon...@ihug.invalid> wrote:
> : Not to mention that perhaps ISPs don't actually *want* customers who max
> : out their broadband links 24x7 , sensible pricing and limits is a good fix
> : here.
> BS!
> I want my DSL users to max out their links...on my internal network.
> I don't want them going outside of my network for their news.
> It is in my best interest that news stays local.

I said "sensible pricing and limits" .

Can you really afford 20,000 (50,000? 100,000? ) customers on 1Mb/s DSL
each maxing out their link downloading from your News server 24x7 ?

Let's assume 1000 of them are instead using the p2p protocol of the week to
download (and upload) 1Mb/s. I assume you are charging your customers
enough to cover that Gb/s of bandwidth for just 1000 customers? Hmm,
visi.com charges just $84.00 for 1Mb/s DSL, so I guess their peering costs
are much less than that.

In this part of the world bandwidth costs too much to give flatrate DSL
(nobody can even make money on flatrate 128Kb/s DSL) to home customers.

Over here we put a 10GB/month limit for international traffic on all DSL
customers. That means I need 1350 (450GB/day * 30 / 10) customers using
their entire limit for nothing but news in order to break even just on
news bandwidth. Not to mention I'd have to have a really good server those
customers would be fully confident in for them to give up their
subscriptions with remote NSPs.
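
Written out, the break-even arithmetic above (using only the figures quoted
in this post) is:

# 450 GB/day of news, 10 GB/month international cap per customer.
feed_gb_per_month = 450 * 30                     # about 13,500 GB a month
cap_gb_per_customer = 10
print(feed_gb_per_month / cap_gb_per_customer)   # 1350.0 customers to break even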

Curt Welch

unread,
Sep 25, 2002, 11:25:07 PM9/25/02
to
arch...@i3w.com (Juergen Helbing) wrote:
> Warning - this is a long reply.
> There is a short summary at the end ;-)
>
> cu...@kcwc.com (Curt Welch) wrote:
>
> >The main issue I belive with incomplets starts with the very nature
> >of how Usenet works. i.e., no one is in charge, and no one can make
> >anyone fix a broken server.
>
> There are other protocols than NNTP which are working properly
> 'without charge'. So perhaps there is something basically broken.

Maybe. Do those other protocols push 50 Mbit/s of data 24x7? Or do they
push so little that any geek can pay for it with their pocket change?

The problem with NNTP is that it costs real money, not pocket change,
to make it work (for a full feed).

> >This sets up a problem where making 50,000 servers "work" when no one is
> >in charge close to impossible. Articles will be lost in parts of the
> >network that you have no control over and no access to.
>
> We need an approach closer to TCP/IP with dynamic routing.

Usenet routing is far better and far more dynamic than anything TCP is
doing. If you look at all the possible paths an article could take
to get from one host to the next, you end up with thousands of options.
Usenet routing finds the optimal path for _every_ article moved between
those hosts. For every article, Usenet calculates which of the thousands
of possible paths will be the fastest at that moment in time, and routes
the article along that path. TCP has nothing like that working for it.

I'll say it again. The technology is great. When it doesn't work, it's
because of management and cost issues, not because of the technology.

However, I will agree that there might be a technology "fix" to the
management and cost issues. So maybe, some change in how usenet works
could "fix" the fact that so many people don't care about it, or address
the cost issues so people can justifying the money needed to support it.

Speaking of cost, the other way to "fix" cost is to improve the value of
usenet. If usenet had more value, then people would be willing to pay more
for it.

> Good Golly.
> As usual your huge answers result in enormous replies
> from my side. Please apologize.

So huge I can't deal with it. I keep trying to reply, but next thing you
know I've written another book and I find I've only responded to one
comment in your post.

> Here a short summary:
>
> * We can add "redundancy" information to all binary content.
> * We can label binary content properly.
> * We can involve the users - giving them more down and upstreams.
> * Servers which receive POSTS can perform completeness tests.
> * Usenet must become more attractive to new (paying) users.

Most people don't like the "wild west" of Usenet. Either they are put off
by the way some people act (and the fact that there is no one in charge to
control this crazy bunch), or they feel there's way too much noise and way
too little content (i.e. they want someone to edit the crap for them).
There really is only a small subset of the people of the world that find
themselves drawn to this strange place. You can't change that and still
have it be Usenet.

Curt Welch

unread,
Sep 26, 2002, 12:34:53 AM9/26/02
to
arch...@i3w.com (Juergen Helbing) wrote:

> Large sites with multiple feeders and peers might still have a chance.
> But if the situation is correct then the peering to "smaller sites"
> must be reworked (with better redundancy) urgently.

Fork over $100K for each "small" site on usenet to update their feeders
and you could fix usenet. This month. Next month, the new "fixed" usenet
would have attracted so many users as to require a re-fix.

The real heart of the problem here is that the people causing usenet to
"break" are not being correctly charged for their damage (i.e. the posters).
They are only charged indirectly for their damage. It's much like what all
of us do when we use up the world's natural resources and are never
"charged" for the damage we do to the world - the tragedy of the commons.
The only thing that keeps posters under control are the lost articles.
That's what makes it "too hard" and causes them to stop posting.

Anything you do to make it easier for people to post and read large
articles will just cause more pressure on the other side of the equation.
That in turn will make volume increase until the lost-article problem gets
bad enough to equalize the pressure.

> >This is significant for news servers because they cache articles in
> >memory to make it fast to re-send to peers. The larger the articles,
> >the more memory you need (or the more likely that the article won't
> >still be cached in memory when it comes time to send).
>
> I dont see any reason why you need to keep the full article
> in memory and start transfer to the next host after it was
> sent completely.

Think about what happens when someone in the tree breaks the transmission.

> You could start sending "instantly" - every proxy server does this.

Proxy servers are single level. Usenet is a 50,000 node proxy server.
Very different requirements.

> And sending _one_ article with 10 MB instead of 20 with 50 MB
> would even reduce the access to the history

True, but history file access is not the problem.

> - and reduce the
> amount of necessary IHAVE (or whatever) requests - with their
> necessary latency time).
>
> I assume you are one of the specialists for high volume transists.
> So _you_ should tell me the benefits and the consequences :-))
>
> >Because usenet is a store and forward system, it works much much beter
> >with smaller articles. It would be very very wrong to try and get
> >everyone to post 10MB articles.
>
> There have been already proposals in earlier discussions
> to sent out 500 MB as one message :-)))

Yeah, but 500 MB articles get transmitted as 300,000 packets, not one.

My point here with all this chat about packets is that you are
addressing the wrong problem. The problem has nothing to do with
the fact that the articles are chopped up, the problem is that
news servers don't know they are supposed to be kept together
and "act" like the one article it is.

They don't know that when they must drop an article, they
should drop the entire 500MB post, not just some small 300K chunk
in the middle of it.

News servers _must_ drop articles. That's a fact of life with usenet since
we have no other way to control posters. But if we moved the logic of
transferring large binary files into the news server, instead of hiding it
from the news server like we do now, then the news server would have the
option of dropping the entire file, or sending the entire file.

Your idea about sending it as one 500MB post is correct in terms of getting
the server to do "what it should" - drop the whole thing or send the whole
thing. But, doing it as one "article" on NNTP is not a good answer.

My entire _point_ about posting equal-size articles and removing the
"tails" is to make them work just like you are asking for - send the whole
thing or drop the whole thing. It's not the true and correct solution to
this, but it's a quick fix that gets us headed in the right direction.
Servers know how to filter by article size.

Of course, the problem there is posters learn to post more parts to get
under the filter.

But what about this: If binary articles had a hash of the entire file in
the message-Id (which we have talked about for many reasons), servers which
wanted to "thin" a binary feed down to a level it could handle, could
randomly toss a fixed percent of all binary posts based on the file's hash.
If every server which chose to do that used its own random selection of
files to pass, then posters could have no way of knowing how far their
binary file would actually go and would not be trained to "do the wrong
thing" like what happens when feeds are thinned by article size.

> >(which by the way some people are trying to do now).
> >Hey, was that you posting those 10 MB articles to Newshosting the other
> >day?
>
> No - I know that it does not work actually.
> But AFAIR I've downloaded alread 5 MB messages from
> news.newsreader.com. It seems that you have no "size limits"
> on your own site. Why ?

My servers have hard-coded 4MB article size limits. I don't know how you
could have downloaded anything large. It's just not possible. Is it? Do
I need to check my code? Did I increase the size limit and forget that I
did it?

Nope, it's still 4MB...

#define MAXDATASIZE (4*1024*1024)  /* Max size of data on input (i.e. max article size) */

> Thanks for being engaged in this discussion
> - especially during such a lot of work....

Hey, the servers are moved and seem to be working! I'm hard at work doing
QA testing by reading and posting news articles. :). Only one disk tried
to scare me by playing dead for about an hour, but all on its own it
decided to spin back to life. (well, two disks if you count that one on
feed1 that died back on Monday).

> CU

Curt Welch

unread,
Sep 26, 2002, 12:40:44 AM9/26/02
to
arch...@i3w.com (Juergen Helbing) wrote:
> Simon Lyall <simon...@ihug.invalid> wrote:
>
> >What I don't understand is *why* we are getting incompletes at all.
> >Assuming a series of articles is posted correctly then a full set will
> >be on the posting server.
>
> A lot of incomplete posts exists because the connection
> fails during POSTing to that first host.
> Most posting programs do not have any kind of retry.

This could be fixed on the server side if the server had a good
way of knowing that those last 20 articles were 20 parts of one
large file. Which just gets back to my point that a big
part of this problem is the fact that servers only move articles
and are only tricked into moving files. If you stop sending an
article halfway through, servers are smart enough to know that
you don't want the first half sent, but they are not smart enough
to know this for files.

News servers tend to retry every few minutes for days. Much better
than what e-mail does.

Kjetil Torgrim Homme

unread,
Sep 26, 2002, 5:38:26 AM9/26/02
to
[Curt Welch]:

>
> News servers tend to retry every few minutes for days. Much
> better than what e-mail does.

is it really common for feeders to have enough spool space to do that?
our feeder will only retry binary messages for something like 8 hours
before the messages expire.

Curt Welch

unread,
Sep 26, 2002, 1:21:46 PM9/26/02
to
Kjetil Torgrim Homme <kjet...@haey.ifi.uio.no> wrote:
> [Curt Welch]:
> >
> > News servers tend to retry every few minutes for days. Much
> > better than what e-mail does.
>
> is it really common for feeders to have enough spool space to do that?
> our feeder will only retry binary messages for something like 8 hours
> before the message expire.

Ok, "days" is probably stretching it. Boy I hate it when someone ruins
a good argument with facts. :)

For reference, one of my transit servers has a 4-hour spool at the moment
(a disk just died, which cut its spool in half), and the other is currently
running at 1.6 days - but once I get some load shifted, it will fall to
something in the "hours" range as well.

My real point above was that spool servers tend to try a lot harder to
reconnect than mail servers because they can't afford to get behind. If
they can't connect to a feed, some will keep trying every few seconds at
first. 5 minutes before a retry is long in the land of feed servers.

Alan Shackelford

unread,
Sep 26, 2002, 4:03:06 PM9/26/02
to
I may be miles off target here, but is there anything in the KAZAA,
Morpheus, or Napster models that could be put to use? It seems these
were (marginally) successful at swapping binaries, albeit controversial
ones. Seems like there might be some useful knowledge there, though.


Alan

Klaas

unread,
Sep 27, 2002, 12:52:14 AM9/27/02
to
After careful consideration, Juergen Helbing muttered:


> btw.: The 'size' is actually not included in the MSG-ID.
> So transit servers don't know the size ahead of time - which they SHOULD.
>
> I will make a proposal for a new "Binary-Message-ID" format soon.
> This might explain more.

We had a rather extensive discussion about this 5-6 months ago on abnsc.
You should check it out before coming up with something.

-Mike

Jeremy

unread,
Sep 27, 2002, 2:23:26 AM9/27/02
to
Juergen Helbing <arch...@i3w.com> wrote:

> If we succeed to solve the "completeness" problems, then we could
> continue to leave the size of binaries where they are.
>
> But if we are NOT able to fix this problem, then we should try
> to avoid multiparts whenever it is possible.

Interesting -- that sounds backwards to me. If we solve the completeness
problem, then the main reason for smaller parts goes away -- there are no
reposts to worry about, so you don't have to be concerned with how much
data you need to post if a part goes missing. Thus, why not post parts
of 4 megs or even larger?

> Meanwhile also a lot of pictures are posted as "high quality" JPEGs.
> They are ALL split into multiparts (600kB...2000 kB).

That's just a bad idea, if you ask me.

--
Jeremy | jer...@exit109.com

Juergen Helbing

unread,
Sep 27, 2002, 1:05:06 AM9/27/02
to
drec...@yuck.net (Mike Horwath) wrote:

>: I will make more proposals about this soon.
>
>That means that everyone's software has to change, far harder than
>getting the transit systems to filter based on a single header.

Yes, I know that people would complain again because
Outlook Express does not support these new features :-)

It would not just be a question of "harder" - but also of
"general purpose":
All changes for transits must be
based on the message-id - because this is the only thing
the receiving host knows _before_ it receives the entire
article.
And all changes must also be based on XOVER, because
all newsreaders have only this information before fetching
articles or headers.

--
Juergen

Juergen Helbing

unread,
Sep 27, 2002, 1:07:23 AM9/27/02
to
Klaas <spam...@klaas.ca> wrote:

>> I will make a proposal for a new "Binary-Message-ID" format soon.
>> This might explain more.
>
>We had a rather extensive discussion about this 5-6 months ago on abnsc.
>You should check it out before coming up with something.

Could you please provide us with a short "summary" of that discussion?
Checking "rather extensive" discussions carries the danger
of missing your point.

TIA
--
Juergen


Juergen Helbing

unread,
Sep 27, 2002, 1:17:56 AM9/27/02
to
Andrew - Supernews <andrew...@supernews.com> wrote:

>The TCP protocol (which is merely one of many transport protocols, albeit
>a very widely used one) uses sequence numbers, acknowledgements and
>retransmissions to build a reliable _point to point_ connection between
>two hosts over the unreliable IP layer.

This is what I'm talking about. Reliable streams on an unreliable medium.
I believe that far more "packet losses" are recovered on TCP
than between Usenet transits.
Perhaps we can learn from that protocol?
We should also talk about unconventional methods.

>> I dont know how powerful the large "transit" servers are today.
>> But the limit of "500 kB" is now used for more than 5 years.
>
>by whom?

By the users. Because all larger posts have poor propagation.

>> I dont see any reason why you need to keep the full article
>> in memory and start transfer to the next host after it was
>> sent completely.
>
>what if the article doesn't successfully arrive?

The same thing you are doing today in that case.

>what if you reject the article after looking at the body?

I was told that no transit server has the performance
to scan the body on the fly.

But I agree that it would be _nice_ to have a new kind
of IHAVE for huge binary articles.
And - btw - NNTP/NNRP lacks a STOP (CTRL+C)
which would permit cancelling _all_ multiline transmissions from the
_sending_ and the _receiving_ side.

--
Juergen

Juergen Helbing

unread,
Sep 27, 2002, 1:22:28 AM9/27/02
to
Thomas <z...@spam.invalid> wrote:

>> Please take it as a "request" to your technical skills:
>> I want to transmit larger article (up to 10 MB) through your
>> feeders. How can you fulfill my request ?
>
>Is 10 MB enough? Or should it go up to 9 GB at once? But, if this 9 GB post gets
>interrupted, and your news server has been sending it out before it had received
>it all, the effect is the same as an incomplete multipart!

10 MB would be enough for today - and the next few years.
This would cover 100% of the larger picture posts and 80% of
the MP3 posts (20% are full CD tracks).
The larger binaries could be protected by PARITY information.


>Now make the slowprop servers be able to fetch missing parts through nnrp from
>servers they usually do not peer with.

Great idea. I was also thinking about such approaches.

>But you are still chugging great masses of stuff across the net. Making a chain
>of squid-like proxies seems so much neater... post just the file description and
>a key, and use the Path: header to trace back through the proxies.
>Again, that could be a new hierarchy. And the client for fetching the body would
>not use nnrp. But who cares.

This would be a "special binary news network".
Perhaps the best way overall.

--
Juergen

Juergen Helbing

unread,
Sep 27, 2002, 1:42:38 AM9/27/02
to
cu...@kcwc.com (Curt Welch) wrote:

>> Large sites with multiple feeders and peers might still have a chance.
>> But if the situation is correct then the peering to "smaller sites"
>> must be reworked (with better redundancy) urgently.
>
>Fork over $100K for each "small" site on usenet to update their feeders
>and you could fix usenet. This month.

We know that this will not happen.

>Next month, the new "fixed" usenet
>would have attracted so many users as to require a re-fix.

The purpose of [FNTP] would be to attract many new users.
Don't you want this?
Do you expect that there are 10 million new users -
but that they are only lurkers?

>The only thing that keeps posters under control are the lost articles.
>That's what makes it "too hard" and causes them to stop posting.

You are out of date.
The only thing which keeps posters "under control" is their lack
of upstream bandwidth - especially with ADSL.
And - of course - the posting limits defined in most newsgroups
by the users (and FAQs) themselves.

>Anything you do to make it easier for people to post and read large
>articles will just cause more pressure on the other side of the equation.
>That in turn will make volume increase until the lost-article problem gets
>bad enough to equalize the pressure.

In the opposite direction we could implement "auto-drops" which
make it impossible to send out large binaries. Would this solve
your problem? :->>

>My point here with all this chat about packets is that you are
>addressing the wrong problem. The problem has nothing to do with
>the fact that the articles are chopped up, the problem is that
>news servers don't know they are supposed to be kept together
>and "act" like the one article it is.

If a transit server knew that messages "belong together",
then it would again have to treat them as one entity. It would have
to store them - and send them as a bunch. I don't see any
difference from the "one huge file".

>Your idea about sending it as one 500MB post [...]

This is NOT my idea. I am asking for 10 MB posts
to have an easy fix for the multipart problem for pictures
and MP3 - a huge part of the "common interest".

That "500 MB" proposal was from third party.
And I dont believe that it could work.

But you did not answer my question what must be
_done_ to permit 10 MB articles as an "easy fix".

>My entire _point_ about posting equal size articles and removing the
>"tails" is to make them work just like you are asking for - send the whole
>thing or drop the whole thing. It's not the true and correct solution to
>this, but it's a quick fix that gets us headed in the right direction.
>Severs know how to filter by artical size.

You will love my "Msg-ID-proposal" :-)

>But what about this: If binary articles had a hash of the entire file in
>the message-Id

This is not possible. Identifying binaries as identical before
they are downloaded is a bad idea. This is why yEnc does
not offer the "crc32" in the headers.

But I can offer you this:
<p17o580s5...@server.bin>
Part 17 of 580 size is 587 MB.

>(which we have talked about for many reasons), servers which
>wanted to "thin" a binary feed down to a level it could handle, could
>randomly toss a fixed percent of all binary posts based on the file's hash.
>If every server which chose to do that used its own random selection of
>files to pass, then posters could have no way of knowing how far their
>binary file would actually go and would not be trained to "do the wrong
>thing" like what happens when feeds are thinned by article size.

Any "thinning out" other than based on newsgroup/hierarchy is sick.

>> No - I know that it does not work actually.
>> But AFAIR I've downloaded alread 5 MB messages from
>> news.newsreader.com. It seems that you have no "size limits"
>> on your own site. Why ?
>
>My servers have hard-coded 4MB article size limits.

Alzheimer !


>> Thanks for being engaged in this discussion
>> - especially during such a lot of work....
>
>Hey, the servers are moved and seem to be working!

Congratulations!
Another bright star is going to appear on Usenet heaven....


CU
--
Juergen

Juergen Helbing

unread,
Sep 27, 2002, 1:44:34 AM9/27/02
to
Alan Shackelford <asha...@jhmi.edu> wrote:

>I may be miles off target here, but is there anything in the KAZAA,
>Morpheus, or Napster models that could be put to use? It seems these
>were (marginally) successful at swapping binaries, albeit controversial
>ones. Seems like there might be some useful knowledge there, though.

I am developing and using this method for three years now.
Check: www.winews.net
This is "P2P" for newsgroups.
Power users dont have problems with incompleteness.
It is the huge "OE" crowd which is the problem.

--
Juergen

Juergen Helbing

unread,
Sep 27, 2002, 1:54:14 AM9/27/02
to
cu...@kcwc.com (Curt Welch) wrote:

>> There are other protocols than NNTP which are working properly
>> 'without charge'. So perhaps there is something basically broken.
>
>Maybe. Do those other protocols push 50 Mbits of data 24x7? Or do they
>push so little that any geek can pay for it with their pocket change?

These "geek applications" have millions of users online simultanously.
The amount of data they are swapping all the time is far higher
than Usenet.

>The problem with NNTP is that it costs real money, not pocket change,
>to make it work (for a full feed).

Usenet has already lost _many_ customers to those "geek apps".
And if we don't act in the next 12-24 months then these "geek apps"
will replace Binary Usenet quickly.

It is a question of _your_ survival as an NSP to offer solutions.
If we cannot solve the problems - with users paying for access -
then you'll face the same problem as the media industry:
Why pay for "bad service" when the free service is even better...

>> We need an approach closer to TCP/IP with dynamic routing.
>
>Usenet routing is far better and far more dynamic than anything TCP is
>doing. [...]


>Usenet routing finds the optimal path for _every_ article moved between
>those hosts. For every article, Usenet calculates which of the thousands
>of possible paths will be the fastest at that moment in time, and routes
>the article along that path. TCP has nothing like that working for it.

If you were right (and this sounds of course fantastic)
then there would be no problem at all. But we ARE having problems.

>I'll say it again. The technology is great. When it doesn't work, it's
>because of management and cost issues, not because of the technology.

Should we stop the discussion?

>Speaking of cost, the other way to "fix" cost is to improve the value of
>usenet. If usenet had more value, then people would be willing to pay more
>for it.

I don't agree. I would only agree that "those users who are using free
services today would start to pay".

>Most people don't like the "wild west" of Usenet. Either they are put off
>by the way some people act (and the fact that there is no one in charge to
>control this crazy bunch), or they feel there's way too much noise and way
>too little content (i.e they want someone to edit the crap for them).

My project to "moderate most binary groups" failed,
because the _contributors_ don't accept it.

>There really is only a small subset of the people of the world that find
>themselves drawn to this strange place. You can't change that and still
>have it be Usenet.

Again, I don't agree.
Already today there are hosts which ignore cancels - and hosts
with close-to-perfect spam filters. Most users love the despammed
groups. Ask why?

CU
--
Juergen


Juergen Helbing

unread,
Sep 27, 2002, 1:57:15 AM9/27/02
to
cu...@kcwc.com (Curt Welch) wrote:

>> I believe you are wrong. eMail has two important features which
>> are NOT (yet) available for Usenet:
>>
>> "Retry 1-4-8-24 hours later"
>
>News servers tend to retry every few minutes for days. Much better
>than what e-mail does.

Then it must be the "hierarchical level" of mail forwarders
which is used in case of unavailability of a mail host.

Perhaps this would be a good solution:
If a host is not reachable for a short period, then a sequence of KNOWN
replacements is used instead - which guarantees delivery to the
broken host.

Just fooling around...

--
Juergen


Kjetil Torgrim Homme

unread,
Sep 27, 2002, 11:35:05 AM9/27/02
to
[Juergen Helbing]:

>
> >My point here with all this chat about packets is that you are
> >addressing the wrong problem. The problem has nothing to do with
> >the fact that the articles are chopped up, the problem is that
> >news servers don't know they are supposed to be kept together
> >and "act" like the one article it is.
>
> If a transit server would know that messages "belong together"
> then it would again have to treat them as one entity. It had to
> store them - and send them as a bunch. I dont see any differences
> to the "one huge file".

the transit server might not care (if you extend NNTP), but the reader
server can leave out incomplete posts (I guess you'll object to that,
since you are using multiple servers to make them complete), and the
news administrator can get completeness statistics much more easily.

> But you did not answer my question what must be _done_ to permit
> 10 MB articles as an "easy fix".

NNTP needs a mechanism for sending partial articles, and for aborting
transfer. or you can just use small articles (<1 MiB), the way you do
today. what is the downside to using 1 MiB articles?

> You will love my "Msg-ID-proposal" :-)

encoding it in the Message-ID is nice since it requires no change to
NNTP.

> > But what about this. If binary articles had a hash of the entire
> > file in the message-Id
>
> This is not possible.

why not?

> Identifying binaries as identical before they are downloaded is a
> bad idea.

why?

> But I can offer you this:
> <p17o580s5...@server.bin>
> Part 17 of 580 size is 587 MB.

yes, something like that'd be nice. the pseudo top domain is good,
perhaps we could have IANA reserve it officially. I'd use a currently
unusual character in the Message-ID, and specify the part size, the
offset and the total size (all in octets).

I did a quick frequency count on 113637 Message-IDs from a text feed.
the most uncommon characters were:

0 ' legal
0 ` legal
1 ^ legal
1 { legal
1 } legal
2 ; ILLEGAL
3 ? legal
4 & legal
4 | legal
4 ~ legal
5 ! legal
7 < ILLEGAL
16 space ILLEGAL
18 : ILLEGAL
47 * legal
: : :
96602 $ legal (homage to Rich $alz)
: : :
136129 2 legal
167120 1 legal
173337 0 legal

(I include the last three as a curiosity -- they all occur more times
than the total number of Message-ID's. this scheme will not help in
that regard :-)

anyway, ' seems like a good candidate.

my proposal:

<offset'part-size'total-size'id-left@id-right.nntpbin>

offset, part-size and total-size should be represented in hexadecimal
(without any prefix like "0x") to make it easy to code and understand,
while keeping it slightly more efficient spacewise than decimal. the
sizes and offsets should refer to the _unencoded_ data. id-left and
id-right are as defined in RFC 2822. nntpbin is a text constant.

if the partsize is (relatively) constant, it's easy to work out how
many parts there are in total, and which part number this is. you
only have a problem if the tail part is smaller than the others, and
there is no reason it should.
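
for concreteness, a small Python sketch of building and parsing a message-id
in exactly that shape (hex fields, "'" separators, ".nntpbin" pseudo-domain);
the example values are made up.

# Sketch of the proposed format; nothing here is an existing standard.

def make_msgid(offset, part_size, total_size, id_left, id_right):
    return "<%x'%x'%x'%s@%s.nntpbin>" % (offset, part_size, total_size,
                                         id_left, id_right)

def parse_msgid(msgid):
    local, _, domain = msgid.strip("<>").partition("@")
    if not domain.endswith(".nntpbin"):
        return None                              # not a labelled binary part
    offset, part_size, total_size, _id_left = local.split("'", 3)
    offset, part_size, total_size = (int(x, 16) for x in
                                     (offset, part_size, total_size))
    return {"offset": offset, "part_size": part_size,
            "total_size": total_size,
            "part_number": offset // part_size + 1}

example = make_msgid(0x500000, 0x80000, 0x2300000, "abc123", "news.example.org")
print(parse_msgid(example))     # part 11 of a 36,700,160-byte file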

Klaas

unread,
Sep 27, 2002, 4:06:10 PM9/27/02
to
After careful consideration, Juergen Helbing muttered:

> Klaas <spam...@klaas.ca> wrote:

My point is that we've already gone into many of the details needed to
implement a method of encoding binary data into the msgid. Stuff as
obvious as exactly what to include, as well as details as subtle as how many
bits of message hash could be included without inflating the size
unnecessarily (since message-ids are transmitted so many times anyway).
It was for that last issue that I whipped up the hashing chart I posted
here a few weeks ago (that is why it concentrates on smaller bit sizes).

A summary won't do you much good--if you really want to develop a robust
format you'll have to read past discussions in their entirety.

-Mike

Klaas

unread,
Sep 27, 2002, 11:56:31 PM9/27/02
to
After careful consideration, Juergen Helbing muttered:

>>But what about this: If binary articles had a hash of the entire file
>>in the message-Id
>
> This is not possible.

Why not?

> Identifying binaries as identical before
> they are downloaded is a bad idea. This is why yEnc does
> not offer the "crc32" in the headers.
>
> But I can offer you this:
> <p17o580s5...@server.bin>
> Part 17 of 580 size is 587 MB.

You should have a key on the LHS that indicates it is a binary in
addition to the pseudo-domain. Also, you can fit a lot more in there if
you use a better encoding scheme, like base64. Stuff like the actual
size of the file in bytes. Just having MiB is useless for error
checking. Also, why bother with parts when you can encode the byte
offset? This will enable programs to combine parts from different
posters into one file.

>>hash. If every server which chose to do that used its own random
>>selection of files to pass, then posters could have no way of knowing
>>how far their binary file would actually go and would not be trained
>>to "do the wrong thing" like what happens when feeds are thinned by
>>article size.
>
> Any "thinning out" other than based on newsgroup/hierarchy is sick.

What about misplaced binaries on text servers (on any server, actually)?
How about if the algorithm drops entire files instead of parts?

-Mike

ZoSo

unread,
Sep 28, 2002, 2:05:46 AM9/28/02
to
In news:14547....@archiver.winews.net,
Juergen Helbing scribed:

> [snip]


> Usenet has already lost _many_ customers to those "geek apps".
> And if we dont act in the next 12-24 months then these "geek apps"
> will replace Binary Usenet quickly.

> [mo' snip]

If that's your motivation, Juergen... try some other perspectives:

Our saddle company has already lost _many_ customers to those "horseless carriages".
And if we dont act in the next 12-24 months then these "horseless carriages"
will replace saddles quickly.

Our telegraph company has already lost _many_ customers to those "telephones".
And if we dont act in the next 12-24 months then these "telephones"
will replace telegrams quickly.

Our ice company has already lost _many_ customers to those "electric refrigerators".
And if we dont act in the next 12-24 months then these "electric refrigerators"
will replace ice boxes quickly.

Our movie theatre has already lost _many_ customers to those "VCRs".
And if we dont act in the next 12-24 months then these "VCRs"
will replace movie theatres quickly.

Our VCR company has already lost _many_ customers to those "DVDs".
And if we dont act in the next 12-24 months then these "DVDs"
will replace VCRs quickly.

et cetera, ad nauseam.

So, did you stop to think maybe Binary Usenet *should* be replaced by P2P networks?

A Linux box, a download from http://opennap.sourceforge.net, and a DSL line can handle 1000 nap clients, easy.
Use dynamic domain services and don't advertise on Napigator. Far more secure for users than IRC is, IMHO.


--
["reader machines". <laughter>. That is rich.] - TLO, 18 Sept 2002, in abs.mp3.d

Curt Welch

unread,
Sep 28, 2002, 10:54:23 AM9/28/02
to
arch...@i3w.com (Juergen Helbing) wrote:
> cu...@kcwc.com (Curt Welch) wrote:

> >The only thing that keeps posters under control are the lost articles.
> >That's what makes it "too hard" and causes them to stop posting.
>
> You are out of date.
> The only thing which keeps poster "under control" is their lack
> of upstream bandwidth - especially with ADSL.
> And - of course - the posting limits defined in most newsgroups
> by the users (and FAQs) themselves.

Yes, that's my point as well: usenet volume is limited (and created) by the
technology available to us.

But the "too hard" issue I talk about doesn't affect the people posting as
much as it affects the number of people willing to post. No matter what
the group FAQ says, if there are only two people posting to a group you get
a lot less volume than if you have 10,000 people posting to the group.

> But I can offer you this:
> <p17o580s5...@server.bin>
> Part 17 of 580 size is 587 MB.

I really think it would be wrong to use part numbers like that.
You need to describe file segments by byte range, not by part numbers.
It's fine to continue to use part numbers in the subject to make multipart
posts compatible with current usage (and easy to understand), but
when you encode information in the headers for the software to use,
byte ranges give you so much more power to do advanced things like
reposting a single segment in multiple smaller parts, or to take
two different posts of the same file (done with different line counts),
and let the software correctly merge all the pieces back together.
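
A small sketch of why byte ranges compose so well: segments from two
different posts of the same file (even with different part sizes) can be
merged by a simple interval sweep, which part numbers alone cannot express.
The numbers below are made up.

# Sketch: given downloaded (offset, length) segments, find what is missing.

def missing_ranges(total_size, segments):
    gaps, pos = [], 0
    for offset, length in sorted(segments):
        if offset > pos:
            gaps.append((pos, offset))       # hole before this segment
        pos = max(pos, offset + length)
    if pos < total_size:
        gaps.append((pos, total_size))
    return gaps

# one post in 250 kB parts, plus a partial repost in 100 kB parts
have = [(0, 250_000), (250_000, 250_000), (750_000, 250_000),
        (500_000, 100_000), (600_000, 100_000)]
print(missing_ranges(1_000_000, have))       # [(700000, 750000)]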

> Any "thinning out" other than based on newsgroup/hierarchy is sick.

That's true. From a user perspective that's easy to say. Try running
your own server which supports thousands of users and then try to pick
which groups or hierarchies to delete. And try to maintain a set of peers,
who provide you feeds only out of the goodness of their hearts, where every
few weeks you have to tell them to change the list because volume has once
again grown too large. News servers that can't get everything, but want to
get as much as they can, need a dynamic system to keep their pipes filled
to whatever level they can afford. Thinning by hierarchy is totally
unworkable for that. You might block some larger hierarchies to get your
feed down to the range of what you want, but then you need a dynamic system
to get as much as you can within your bandwidth/cost limits.

> >> No - I know that it does not work actually.
> >> But AFAIR I've downloaded alread 5 MB messages from
> >> news.newsreader.com. It seems that you have no "size limits"
> >> on your own site. Why ?
> >
> >My servers have hard-coded 4MB article size limits.
>
> Alzheimer !

So do INN and cyclone and typhoon, and most servers (i.e., they have
hard-coded article size limits).

My server caches articles in memory. One 10MB article sitting in the memory
cache takes up space that could have been used to hold over 2000 text
articles. If instead it comes in as 100 100K posts, then they can be
expired from the cache as they are processed. Most servers are written to
assume an entire article can not only fit in memory at once, but that you
can hold a lot of these articles in memory at the same time.

Processing "unlimited" size messages would be impossible with my design and
I think that's probably true of all systems. I can change my hard coded
limit to 100MB and it would work, but the throughput on my transit servers
would really suck, and I'd have to add a lot more memory to all my servers
to keep performance at the same level (assuming people were actually
posting articles like that).

You can tune machines to move high volumes of data easier if the data is
all the same size. It's hard enough to deal with 1K cancel messages all
the way up to 1MB posts as it is. Extending it another order of magnitude
by taking the "packet" size all the way up to 10MB only makes it harder to
make the news server move this stuff efficiently.

Increasing the size of posts will not "fix" anything. If we instead
created a cleaner system for chopping large files into lots of little
parts, things could be made to work much better in the long run.

For example, imagine if we got to the point where news servers understood
that they were moving large files, and not just text articles. When you
download headers, the server could list the file parts separately from the
"real" text articles. It could combine parts together and simply list byte
ranges available for download for each file, and if all the parts were
available, it would show just a single header-like entry for the file. So
instead of downloading 100,000 article headers and having the news client
guess at what was what, the news server would give the newsreader one list
of all the file segments, and a separate (old-style, XOVER-like) list of the
non-binary posts to the group.

Or, maybe if the data was encoded in the correct format, the news server
could actually combine multipart posts together on the fly, and make them
look as if they were single huge articles.

There are just so many things that could work so much better if the news
servers knew they were transporting files and were able to deal with files
at that level, instead of what we do now: hiding binary files inside of
text articles and forcing the news client to do all the work of finding all
the file parts and putting them back together again.

And though all the details have not been worked out, it seems to me that we
can transition from the current system to this new "binary file support"
system in ways that keep things backward compatible to make the transition
easier.

Curt Welch

unread,
Sep 28, 2002, 12:04:07 PM9/28/02
to
arch...@i3w.com (Juergen Helbing) wrote:
> cu...@kcwc.com (Curt Welch) wrote:
>
> >> There are other protocols than NNTP which are working properly
> >> 'without charge'. So perhaps there is something basically broken.
> >
> >Maybe. Do those other protocols push 50 Mbits of data 24x7? Or do they
> >push so little that any geek can pay for it with their pocket change?
>
> These "geek applications" have millions of users online simultanously.
> The amount of data they are swapping all the time is far higher
> than Usenet.

I doubt that. The volume they move is probably higher than a full usenet
news feed, but if you count everything they move, then you have to count
the amount of data moved between NNTP servers and end-users as well. i.e.
a 10MB file that gets downloaded 10000 times needs to be counted as 100GB of
data moved, not 10MB.

I could easily be wrong about the other applications however, because I
don't know much about what is going on with those.

You can bet that a lot more data is moved by HTTP than all of these other
applications combined however. But that's not the point.

> >The problem with NNTP is that it costs real money, not pocket change,
> >to make it work (for a full feed).
>
> Usenet has already lost _many_ customers to those "geek apps".
> And if we dont act in the next 12-24 months then these "geek apps"
> will replace Binary Usenet quickly.

We don't need to save Usenet that way. If the geek apps work better for
sharing files, they should use them. I would actually be happy to see all
the binary data move from Usenet to the geek apps and let Usenet go back to
text. That might be by far the best way to "save" Usenet.

> It is a question of _your_ survival as an NSP to offer solutions.

To be honest, I'd rather save Usenet than save my own business. And if
saving Usenet means getting rid of the binary files, and going back to a
model where the ISPs all run their own small Usenet text servers, that's
fine by me. But to be honest, I don't see that happening.

It really is very unclear if Usenet can ever be made to work well for
transporting very large files. It was never designed to do that.

The heart of the Usenet technology is the fact that messages must have
unique names (the message-id) for the flooding algorithm to work. But
like I've said, it's very hard to deal with variable-sized messages
when the range of sizes can span so many orders of magnitude. Systems
tuned for moving 2K messages work very differently than systems tuned
for moving 2MB messages. You end up having to do all sorts of complex
stuff in a news server (like setting up multiple feeds for different
size ranges of articles) to keep it working smoothly.

As technology advances, the size of the files we want to move keeps going
up, but small text articles stay the same size. So this means we have an
ever-expanding range of article sizes to deal with. Like you have said,
people keep parts small (500Kish) to make sure they move well through
usenet, but then we have 1000s of headers to download, which is just stupid.
Imagine if you got a different header for every IP packet sent.

Posting large articles won't work because we can't move them efficiently,
but breaking them up into 10,000 parts and forcing newsreaders to download
millions of headers and try to match them all up is getting completely out
of hand.

To keep it working, we need to build the transport of "files" into Usenet.

Each file being transported needs to be given a unique (message-id-like)
name. And then we need to move the segments as if each segment were just
part of the file.

For example, when you post a file to Usenet, you should upload one article
header, with one Message-Id, and then upload the file.

The server could then chop the 10GB file article into small pieces and send
them out to Usenet by saying IHAVE <file-id-whatever> bytes 0 to 9999, do
you want it?, then bytes 10000-19999, do you want that?, etc.

In the headers, the post would show up as a single entry, not 10,000
entries. But it might have support to show that only certain segments of
the file were currently available for download.
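
As a hedged sketch of that transfer model: the byte-range IHAVE is the
hypothetical extension being discussed here, not an existing NNTP command,
and peer.ihave() / peer.send_body() are placeholder calls.

# Sketch: offer one file-level id to a peer piece by piece, as byte ranges.

def offer_file(file_id, path, peer, chunk=100_000):
    with open(path, "rb") as f:
        offset = 0
        while True:
            data = f.read(chunk)
            if not data:
                break
            # hypothetical exchange: "IHAVE <file_id> bytes offset..end"
            if peer.ihave(file_id, offset, offset + len(data)):
                peer.send_body(file_id, offset, data)
            offset += len(data)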

In other words, the news servers could do a lot more in terms of breaking
the files up and combining them back together again so that clients didn't
have to deal with that so much. I think basic support for this belongs in
the servers and not in the clients, and the larger the files get, the more
obvious it is that we have this being done in the wrong place.

> If we cannot solve the problems - with users paying for access -
> then you'll face the same problem as the media industry:
> Why paying for "bad service" when free service is even better...

It could be that Usenet just isn't the answer, in which case we would be
doing everyone a dis-service by encourging them to stick with the wrong
technology long after better solutions had been created.

For example, before the web, we used Usenet a lot to try and distribute
static data (in the forms of FAQs mostly). But it didn't work very well
because stuff had to be constantly re-posted, and it couldn't be updated
very quickly, and how often it needed to be re-posted was a function of
server retention which was of course different for every server, so we
tried to fix it by adding stuff like Superceeds: and people came up
with all sorts of complex systems to make the distribution of documents
over Usenet work better.

But then the web was invented. And it works so so much better for that
type of thing than usenet ever did that it would be stupid to try to
force people to keep using Usenet for that when you have the web.

If a better way to distribute files (which maintains the other advantages
of Usenet) comes along, it would be stupid to play games to try and make
people stay with a "broken" technology just because we wanted them to.

> >> We need an approach closer to TCP/IP with dynamic routing.
> >
> >Usenet routing is far better and far more dynamic than anything TCP is
> >doing. [...]
> >Usenet routing finds the optimal path for _every_ article moved
> >between those hosts. For every article, Usenet calculates which of the
> >thousands of possible paths will be the fastest at that moment in time,
> >and routes the article along that path. TCP has nothing like that
> >working for it.
>
> If you would be right (and this sounds of course fantastic)
> then there would be no problem at all. But we are HAVING problems.

Right. My point is that I think you are barking up the wrong tree in
your attempt to identify the cause and the solution for those
problems. i.e., we don't need an approach closer to TCP/IP. We already
have one that's better than that in terms of its "dynamic routing
ability" which you talked about.

> >I'll say it again. The technology is great. When it doesn't work, it's
> >because of management and cost issues, not because of the technology.
>
> Should we stop discussion ?

No. First, just because I think something doesn't make it a fact. That's
why we talk about it.

And, even if I'm right, there might be technology solutions to management
issues. It's just important to understand that what we need to change is
not the underlying transport system in terms of its ability to detect and
correct errors, or to find better "routing". We need to understand why
usenet servers are constantly mis-managed and try to understand what change
we could make to improve that (either reduce management cost or find a way
to motivate people to manage their servers correctly).

> >Speaking of cost, the other way to "fix" cost is to improve the value of
> >usenet. If usenet had more value, then people would be willing to pay
> >more for it.
>
> I don't agree. I would just agree with "those users who are using free
> services today would start to pay".
>
> >Most people don't like the "wild west" of Usenet. Either they are put
> >off by the way some people act (and the fact that there is no one in
> >charge to control this crazy bunch), or they feel there's way too much
> >noise and way too little content (i.e they want someone to edit the crap
> >for them).
>
> My project to "moderate most binary groups" failed,
> because the _contributors_ don't accept it.

Of course. Usenet isn't "broken". Usenet is Usenet. We shouldn't try to
change that. Usenet is a place where you can mostly do what you want and
say what you want. That's what attracts so many to it in the first place
but at the same time keeps so many people away from it.

> >There really is only a small subset of the people of the world that find
> >themselves drawn to this strange place. You can't change that and still
> >have it be Usenet.
>
> Again I don't agree.
> Already today there are hosts which ignore cancels - and hosts
> with close-to-perfect spam filters. Most users love the despammed
> groups. Ask why?

I don't see how your comment has much to do with what I said.

Just because you disable cancels and install spam filters doesn't mean you
have changed the basic characteristic of Usenet. It's still people just
blowing a lot of hot air. There's no control over quality and very little
accountability for content. That puts Usenet at the entirely different end
of the scale from something such as an edited publication. Spam and rogue
cancels are just the worst of the worst. Fighting them just tries to take
usenet from a score of -1 back to a score of 0 on a scale of 1 to 10 where
10 is a well written, edited publication.

Usenet isn't useless, but it's close to it. It's mostly just a place for
people to hang out and have fun, much like the geek equivalent of the
corner pub. At the pub you drink (which is how the pub stays in business),
on Usenet you download files (which is how the servers pay to keep
running).

There's only a limited segment of the population that would rather hang out
on Usenet than go down to the corner pub, and nothing we do to make Usenet
work better is going to change that. Usenet is not a viable "tool" for
sharing information. Anyone that has made the mistake of thinking that has
gone out of business. Most people that "need" information, such as
technical help, do much better by using search tools to find answers on web
sites and elsewhere instead of going to usenet. Usenet is good for keeping
in touch with what's going on in some segment of the world, but it's mostly
a waste of time when trying to get any type of real work done.

Usenet is mostly just a geek entertainment system that far too many people
try to pretend has some type of "real" value to society.

Curt Welch

unread,
Sep 28, 2002, 12:42:51 PM9/28/02
to
Kjetil Torgrim Homme <kjet...@ifi.uio.no> wrote:

> my proposal:
>
> <offset'part-size'total-size'id-...@id-right.nntpbin>

Yes, that's heading in the right direction. We discussed that
at some point in the last year and I had created what I thought
was a workable Message-ID format.

The other thing that would be helpful to put in is a small hash of the
file contents. Not long enough to identify the file (that would make
the Message-Id just too long) but one
which was long enough to greatly narrow down the number of articles
to search when trying to match up parts, or when trying to spot
a re-post of a file which you had missing segments for.

A large hash would be included in the headers so for any article which
looked like it might be the segment which was needed, the newsreader
could download the full header and see if the file hash matched. The
small hash in the Message-ID would greatly reduce the number of
full HEADs that had to be done to identify articles.

> offset, part-size and total-size should be represented in hexadecimal
> (without any prefix like "0x") to make it easy to code and understand,
> while keeping it slightly more efficient spacewise than decimal. the
> sizes and offsets should refer to the _unencoded_ data. id-left and
> id-right are as defined in RFC 2822. nntpbin is a text constant.
>
> if the partsize is (relatively) constant, it's easy to work out how
> many parts there are in total, and which part number this is. you
> only have a problem if the tail part is smaller than the others, and
> there is no reason it should.

Well, it's an estimate at best because there are different ways to split
up a file. For example, a 31 byte file could be broken into pieces of
5 and 4 bytes like this:

5 5 5 4 4 4 4

or into pieces of 5 bytes and 1 byte like this:

5 5 5 5 5 5 1

The first way is better, but when looking at a single segment, it's complex
to know for sure how it's divided up because there are other options than
these.

More important is the simple fact that you don't give a shit about the part
count or the part number if you have offset and size information. For
backward compatibility, part counts can be used in the subject for
newsreaders matching up articles by the subject, but newsreaders that
knew how to use the information in the Message-ID would ignore the
subject.

If people match articles by hand, part counts are very useful; trying to
comprehend offset and size data would be a real bitch. But the point is that
you want the computer to do this for you, and for the computer, offset and
size is what makes it easy.
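
To show what I mean by "easy for the computer", here's a toy Python sketch
(not from any real newsreader; the names are made up):

  # Toy sketch: reassemble a file from segments described only by byte
  # offset.  Part numbers never come into it; odd sizes and overlaps
  # are handled the same way.

  def reassemble(total_size, segments):
      """segments: iterable of (offset, data) pairs, in any order."""
      buf = bytearray(total_size)
      have = set()
      for offset, data in segments:
          buf[offset:offset + len(data)] = data
          have.update(range(offset, offset + len(data)))
      missing = total_size - len(have)
      return bytes(buf), missing

  if __name__ == "__main__":
      parts = [(5, b"world"), (0, b"hello")]
      data, missing = reassemble(10, parts)
      print(data, missing)        # b'helloworld' 0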

J.B. Moreno

unread,
Sep 28, 2002, 2:57:35 PM9/28/02
to
In article <20020924221518.021$G...@newsreader.com>,
Curt Welch <cu...@kcwc.com> wrote:

> Generally speaking, it's very very hard to know when your server is broken.
> They never look broken - they get lots of news, they have free CPU time,
> they "look" the same as they did when they were working.

I hadn't considered it from that angle before, but the message-id hack
could help there as well -- aiding in automatic group stats. The
server software could be modified to inform the admin when there's a
problem.

--
J.B. Moreno

Curt Welch

unread,
Sep 28, 2002, 3:48:57 PM9/28/02
to
Klaas <spam...@klaas.ca> wrote:
> After careful consideration, Juergen Helbing muttered:
>
> >>But what about this. If binary articles had a hash of the entire file
> >>in the message-Id
> >
> > This is not possible.
>
> Why not?

A hash is simple. There's plenty of room in the Message-ID for a one-bit
hash of the entire file, for example. But a hash large enough to reduce
collisions to a point where you could accurately use the hash to identify
segments of the file you were looking for would push the size of the
Message-ID to questionable lengths (though not past the point of working).

Message-IDs get moved over the net and stored more times than all other
parts of an article by at least a factor of 10. Any byte you add to a
Message-ID is just as bad as adding 10 bytes elsewhere in the article. A
good hash would be at least 128 bits (MD5), and in hex you need 32
characters. Add to that the other stuff like offset, and file size, and
you end up with a message-id of questionable length:

<bfcc67b15898b4bc55831cd6acf2f85f$b0031b$a2dc$ran...@something.bin>

This is still shorter than some already on Usenet.

If we were simply adding this stuff to allow newsreaders to match up parts
more accurately, then the entire hash does not need to be in the Message-ID.
We can use a much shorter hash (say the first 8 bytes from above), and put
the full 128 bit (or higher) hash in the article header. So a newsreader
looking for a file with a hash of:

bfcc67b15898b4bc55831cd6acf2f85f

Could look for messages-IDs which match:

<bfcc67$*bin>

And then to find out if it was the same file, do a HEAD on the article to
get the full hash.
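
Roughly like this (a Python sketch only; the exact Message-ID layout below
is invented just to show the short-prefix-then-full-check idea, it's not a
proposal):

  import hashlib

  def full_hash(data):
      """Full MD5 of the unencoded file, as 32 hex characters."""
      return hashlib.md5(data).hexdigest()

  def make_msgid(data, offset, size, rhs="something.bin"):
      """Hypothetical Message-ID carrying only a 6-character hash prefix."""
      return f"<{full_hash(data)[:6]}${offset:x}${size:x}$random@{rhs}>"

  def could_match(msgid, wanted_hash):
      """Cheap test on the prefix; a real match still needs a HEAD to
      compare the full hash carried in some other header."""
      return wanted_hash.startswith(msgid[1:7])

  if __name__ == "__main__":
      data = b"example file contents"
      mid = make_msgid(data, 0, len(data))
      print(mid, could_match(mid, full_hash(data)))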

However, if news servers start to deal with these as file segments and
special case how file segments are dealt with (vs normal articles), then
there might be some logic in putting a large hash in the Message-IDs and to
expect the servers to be able to use that to store the file. Maybe. Or
maybe it's still fine even for the NNTP server if the hash is in the header
elsewhere and not in the Message-ID.

Bill Cole

unread,
Sep 28, 2002, 5:06:45 PM9/28/02
to
In article <20020928154857.814$t...@newsreader.com>,
cu...@kcwc.com (Curt Welch) wrote:

> A
> good hash would be at least 128 bits (MD5), and in hex you need 32
> characters.

But why use hex??? I think 6 bits per character is a better idea than 4,
and quite workable. That's 22 characters.

--
Bill Cole
This message is the official opinion of the voices in my head.
I do not channel my bosses, their lawyers, or Andy Kaufman

Bill Cole

unread,
Sep 28, 2002, 5:16:46 PM9/28/02
to
In article <14547....@archiver.winews.net>,
arch...@i3w.com (Juergen Helbing) wrote:

> Usenet has already lost _many_ customers to those "geek apps".
> And if we dont act in the next 12-24 months then these "geek apps"
> will replace Binary Usenet quickly.

Cool. How can we speed this? Would injecting noise into discussion of
fixes help? What about sending a new willing and attractive woman to
your door every day? How bribable are you and others trying to make
news more workable for binaries?

> It is a question of _your_ survival as an NSP to offer solutions.

Usenet would be a better place if the binaries went away. The protocols
for transport and the logical model of the data set are absolutely
absurd as a mechanism for distributing and storing anything other than
conversations. The death of binary news and the ensuing collapse of the
binary-focused NSP's would be a Very Good Thing for Usenet. An event
worth speeding and assisting, even if it costs the employment of some
fine people like Curt and Jeremy and Andrew.

Kjetil Torgrim Homme

unread,
Sep 28, 2002, 7:35:01 PM9/28/02
to
[Curt Welch]:

>
> Kjetil Torgrim Homme <kjet...@ifi.uio.no> wrote:
>
> > my proposal:
> >
> > <offset'part-size'total-size'id-...@id-right.nntpbin>
>
> Yes, that's heading towards the right direction. We discussed
> that at some point in the last year and I had created what I
> thought was a workable Message-ID format.

hmm, I thought I had followed that debate, but I can't recall your
format. this was part of a suggestion of "misusing" References,
wasn't it?

> The other thing that would be helpful to put in is small hash of
> the file contents. Not long enough to identify the file (that
> would make the Message-Id just too long) but one which was long
> enough to greatly narrow down the number of articles to search
> when trying to match up parts, or when trying to spot a re-post a
> file which you had missing segments for.

hmm, a 16-bit CRC like in sum(1)? of course, like sum(1), the
total-size can be used for narrowing down the list of candidates as
well.

> > if the partsize is (relatively) constant, it's easy to work out
> > how many parts there are in total, and which part number this
> > is. you only have a problem if the tail part is smaller than
> > the others, and there is no reason it should.
>
> Well, it's an estimate at best because there are different ways to
> split up a file. For example, a 31 byte file could be broken into
> size of 5 and 4 bytes each like this:
>
> 5 5 5 4 4 4 4
>
> or 5 and 1 byte files like this:
>
> 5 5 5 5 5 5 1

surely you're not suggesting a part size of 5 bytes? :-)

in reality, you will have a Ogg file, like "01_Angels Turn to
Devils.ogg" which 5967032 octets. to keep it under the common 512k
transit limit, we choose a maximum part size of 500 000 (unencoded),
leading to 12 parts, each consisting of 497253 octets, but the last
part will be just 497249. the msgids will be

<0'79665'5b0cb8'id-...@id-right.nntpbin>
<79665'79665'5b0cb8'id-...@id-right.nntpbin>
<f2cca'79665'5b0cb8'id-...@id-right.nntpbin>
...
<537657'79661'5b0cb8'id-...@id-right.nntpbin>

the last part N will never be more than N octets smaller than the
first 1..N-1 parts. it is therefore trivial to calculate the part
number.
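
in Python the whole thing is only a few lines (just a sketch of the
algorithm, not a reference implementation; id-left and id-right are
placeholders):

  # sketch: split into equal parts plus a slightly smaller tail, emit
  # msgids with hex fields, and recover the part number from the offset.
  # assumes the part size is much larger than the part count.

  def split(total_size, max_part=500_000):
      nparts = -(-total_size // max_part)        # ceiling division
      part = -(-total_size // nparts)            # size of parts 1..N-1
      sizes = [part] * (nparts - 1) + [total_size - part * (nparts - 1)]
      return [(part * i, sizes[i]) for i in range(nparts)]

  def msgid(offset, size, total, left="id-left", right="id-right"):
      return f"<{offset:x}'{size:x}'{total:x}'{left}@{right}.nntpbin>"

  def part_number(offset, size):
      return offset // size + 1

  if __name__ == "__main__":
      total = 5_967_032                          # the Ogg file above
      for off, size in split(total):
          print(msgid(off, size, total), part_number(off, size))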

> The first way is better, but when looking at a single segment,
> it's complex to know for sure how it's divided up because there
> are other options than these.

this algorithm should be part of the spec.

> > More important is the simple fact that you don't give a shit about the
> > part count or the part number if you have offset and size
> > information.

right. it's easier to calculate the percentage downloaded directly
from the octet counts than to go via part numbers.

Klaas

unread,
Sep 28, 2002, 11:20:19 PM9/28/02
to
After careful consideration, Bill Cole muttered:

> In article <20020928154857.814$t...@newsreader.com>,
> cu...@kcwc.com (Curt Welch) wrote:
>
>> A
>> good hash would be at least 128 bits (MD5), and in hex you need 32
>> characters.
>
> But why use hex??? I think 6 bits per character is a better idea than
> 4, and quite workable. That's 22 characters.
>

Not only that, but is a 128 bit hash really necessary in the MsgID? A
64-bit hash would work very well (especially when coupled with file
size). That's only 11 characters.

I realize that msg-id bits are precious, but I don't see that precluding
the inclusion of a hash in the msg-id (not as the only reason, anyway).

-Mike

Kjetil Torgrim Homme

unread,
Sep 29, 2002, 7:10:13 AM9/29/02
to
[Klaas]:
> [Bill Cole]:
> > [Curt Welch]:

> > > A good hash would be at least 128 bits (MD5), and in hex you
> > > need 32 characters.
> >
> > But why use hex??? I think 6 bits per character is a better idea than
> > 4, and quite workable. That's 22 characters.
>
> Not only that, but is a 128 bit hash really necessary in the
> MsgID? A 64-bit hash would work very well (especially when
> coupled with file size). That's only 11 characters.

one collision for every 4 billion files (birthday paradox) seems
reasonable for this use, yeah. and let's go for BASE64 (which for
reference uses A..Z a..z 0..9 + /).

<offset'part-size'total-size'hash'id-...@id-right.nntpbin>

total-size ranges from 0 to 1 GiB. today's typical size is quite a
bit smaller since people rely on chopping into multiple RAR-files.
(30 bits => 5 six-bit characters)
offset is on average the same size as total-size, minus one bit (half
the parts will have an offset larger than total-size/2).
(29 bits => 5 six-bit characters)
part-size ranges from 0 to 1 MiB.
(20 bits => 4 six-bit characters)
hash is 32 bits.
(32 bits => 6 six-bit characters)
fixed characters: ' ' ' ' .nntpbin
(12 characters)

in total, 32 characters added to the original Message-ID.
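
a throwaway Python sketch of the packing, to check the arithmetic (the
helper is made up, only the field widths above matter):

  # sketch: pack each field into ceil(bits/6) characters from the
  # base64 alphabet and glue them together with apostrophes.
  ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz0123456789+/")

  def enc(value, nbits):
      nchars = -(-nbits // 6)                    # ceiling division
      return "".join(ALPHABET[(value >> (6 * (nchars - 1 - i))) & 0x3F]
                     for i in range(nchars))

  def msgid(offset, part_size, total_size, hash32, left="abcd", right="id-right"):
      fields = "'".join([enc(offset, 29), enc(part_size, 20),
                         enc(total_size, 30), enc(hash32, 32)])
      return f"<{fields}'{left}@{right}.nntpbin>"

  if __name__ == "__main__":
      print(msgid(0x79665, 0x79665, 0x5b0cb8, 0xDEADBEEF))
      # overhead: 5 + 4 + 5 + 6 field chars, 4 apostrophes, ".nntpbin" = 32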

we can reduce the hash to 30 bits, saving one octet, or increase it to
36 bits at no extra cost.

id-left can be reduced in size, though. 4 random characters,
conforming to dot-atom-text, should suffice -- that gives you 40
million possible values. some posters will want to add their own tag
to make it possible to score all articles easily, though, and that
should be allowed.

perhaps .nntp is a better RHS constant than .nntpbin. I feel .bin is
too likely to be used as a TLD in DNS proper.

> I realize that msg-id bits are precious, but I don't see that
> precluding the inclusion of a hash in the msg-id (not as the only
> reason, anyway).

the above shows that the hash is 7 octets out of 32 octets of
"overhead", which seems reasonable to me, but we have to keep in mind
the story about the straw that breaks the camel's back.

Dr.Ruud

unread,
Sep 29, 2002, 7:35:36 AM9/29/02
to
Bill Cole wrote:
> Curt Welch:

>> A good hash would be at least 128 bits (MD5), and in hex you
>> need 32 characters.

> But why use hex??? I think 6 bits per character is a better idea than
> 4, and quite workable. That's 22 characters.

If you allow uppercase, lowercase and numbers, then you have 62
different characters. Throw in two more, like [$_], and you
have 64. To express 2^128-1 with BASE64 also takes 22 chars.
With 5 more characters [.-+()] you would need 21, etc.
Just use all (widely) supported characters in a Message-ID.
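
The counts are easy to verify (a throwaway Python snippet, nothing more):

  import base64, hashlib, math

  digest = hashlib.md5(b"example").digest()          # 16 bytes = 128 bits

  print(len(digest.hex()))                           # 32 hex characters
  print(len(base64.b64encode(digest).rstrip(b"=")))  # 22 once padding is dropped
  print(math.ceil(128 / math.log2(69)))              # 21 with a 69-character set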

--
Affijn, Ruud


Curt Welch

unread,
Sep 29, 2002, 11:09:10 AM9/29/02
to
Bill Cole <bi...@scconsult.com> wrote:
> In article <20020928154857.814$t...@newsreader.com>,
> cu...@kcwc.com (Curt Welch) wrote:
>
> > A
> > good hash would be at least 128 bits (MD5), and in hex you need 32
> > characters.
>
> But why use hex??? I think 6 bits per character is a better idea than 4,
> and quite workable. That's 22 characters.

It was just an example to try and make my point look better. But I see
it didn't fool you. :) Of course, even at 6 per character you end up
with a very long Message-ID.

Curt Welch

unread,
Sep 29, 2002, 11:27:57 AM9/29/02
to
Kjetil Torgrim Homme <kjet...@haey.ifi.uio.no> wrote:
> [Curt Welch]:
> >
> > Kjetil Torgrim Homme <kjet...@ifi.uio.no> wrote:
> >
> > > my proposal:
> > >
> > > <offset'part-size'total-size'id-...@id-right.nntpbin>
> >
> > Yes, that's heading towards the right direction. We discussed
> > that at some point in the last year and I had created what I
> > thought was a workable Message-ID format.
>
> hmm, I thought I had followed that debate, but I can't recall your
> format. this was part of a suggestion of "misusing" References,
> wasn't it?

I forget, but we started to talk about it at about the same time
we were talking about putting fake IDs in References: as a way
to add more data into the headers that would show up in XOVER.

The actual format I came up with is not important. It's the same
concept as yours with the addition of the small hash. I think.
I'd have to go find the article on google to be sure I'm not
forgetting something.

>
> > The other thing that would be helpful to put in is small hash of
> > the file contents. Not long enough to identify the file (that
> > would make the Message-Id just too long) but one which was long
> > enough to greatly narrow down the number of articles to search
> > when trying to match up parts, or when trying to spot a re-post a
> > file which you had missing segments for.
>
> hmm, a 16-bit CRC like in sum(1)?. of course, like sum(1), the
> total-size can be used for narrowing down the list of candidates as
> well.

If you have a large checksum in the header elsewhere (SHA-1, MD5 etc), then
the small checksum should just be N bits from the large checksum instead of
a completely different algorithm.

> > > if the partsize is (relatively) constant, it's easy to work out
> > > how many parts there are in total, and which part number this
> > > is. you only have a problem if the tail part is smaller than
> > > the others, and there is no reason it should.
> >
> > Well, it's an estimate at best because there are different ways to
> > split up a file. For example, a 31 byte file could be broken into
> > size of 5 and 4 bytes each like this:
> >
> > 5 5 5 4 4 4 4
> >
> > or 5 and 1 byte files like this:
> >
> > 5 5 5 5 5 5 1
>
> surely you're not suggesting a part size of 5 bytes? :-)

:)

> the last part N will never be more than N octets smaller than the
> first 1..N-1 parts. it is therefore trivial to calculate the part
> number.

Yes, you are right. But, with this new information in the Message-ID,
it's both possible and useful to be able to repost partial segments
of a file. For example, you post a 1MB file in 10 parts of 100K each, and
then decide to repost only part 8. But when you repost it, you post the
segment not as one part, but as 2 parts of 50K each (to help make sure the
guy that lost the first part gets this one). If you tried to recalculate
the "part numbers" of those two parts, the rough estimate would show
you that they were parts 15 and 16 of a 20 part post but the exact byte
numbers would not match up correctly as if you had done the original
post as 20 parts (though it might be very close).

If you were to repost the file segment which covered parts 2 and 3,
but this time repost the segment in 3 parts instead of 2, then the
reverse calculation of the "part number" wouldn't work at all.

It's best to migrate to a system where the software keeps track of
parts by using byte ranges, even when it gets so complex that the
user can't deal with it (just like the user doesn't have to deal
with byte ranges in a TCP stream today).
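
The bookkeeping the software needs is trivial. Something like this Python
sketch would do (invented names, and a real newsreader would persist the
ranges somewhere):

  # Sketch: track which byte ranges of a file have arrived, no matter
  # how the original post or any repost happened to be split up.

  def add_range(have, start, length):
      """Insert [start, start+length) and merge overlapping/adjacent ranges."""
      ranges = sorted(have + [(start, start + length)])
      merged = [ranges[0]]
      for s, e in ranges[1:]:
          if s <= merged[-1][1]:
              merged[-1] = (merged[-1][0], max(merged[-1][1], e))
          else:
              merged.append((s, e))
      return merged

  def missing(have, total_size):
      """Byte ranges still needed, as (start, end) pairs."""
      gaps, pos = [], 0
      for s, e in have:
          if s > pos:
              gaps.append((pos, s))
          pos = max(pos, e)
      if pos < total_size:
          gaps.append((pos, total_size))
      return gaps

  if __name__ == "__main__":
      have = []
      have = add_range(have, 0, 100_000)        # original part 1
      have = add_range(have, 800_000, 50_000)   # half of a reposted part 8
      print(missing(have, 1_000_000))           # what still has to be fetched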

> > The first way is better, but when looking at a single segment,
> > it's complex to know for sure how it's divided up because there
> > are other options than these.
>
> this algorithm should be part of the spec.

That wouldn't hurt.

>
> > More important is the simple fact that you don't give a shit about the
> > part count or the part number if you have offset and size
> > information.
>
> right. it's easier to calculate the percentage downloaded directly
> from the octet counts than to go via part numbers.

--

Howard Swinehart

unread,
Sep 29, 2002, 5:08:56 PM9/29/02
to
If reducing server bandwidth can increase reliability, using external
bodies could help.

MIME articles may point to a URL (RFC2017) rather than contain the
data in the body. Upload the file to a web server. Then post an
article using the message/external-body content-type (RFC2046, section
5.2.3) with an access-type of URL, which points to the file location.
This could be done either in addition to or instead of posting the file
to Usenet.
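
Roughly, a posted article would look like the sketch below (built here with
a few lines of Python just to show the shape; the URL, group and Message-ID
are placeholders):

  # Sketch of a news article whose body is external (RFC 2017 access-type
  # URL inside an RFC 2046 section 5.2.3 message/external-body entity).

  ARTICLE = "\r\n".join([
      "From: poster@example.invalid",
      "Newsgroups: alt.binaries.test",
      "Subject: bigfile.zip (external body)",
      "Message-ID: <external-body-test@example.invalid>",
      "MIME-Version: 1.0",
      "Content-Type: message/external-body; access-type=URL;",
      ' URL="http://www.example.com/pub/bigfile.zip"',
      "",
      # body of a message/external-body entity: the headers of the
      # encapsulated message, followed by a "phantom" body that is ignored
      "Content-Type: application/octet-stream",
      "Content-Transfer-Encoding: binary",
      "",
      "THIS IS NOT REALLY THE BODY!",
  ])

  if __name__ == "__main__":
      print(ARTICLE)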

Disadvantages: Requires changes to many binary downloaders and
posters. Requires the poster to have access to a web site. A popular
or very large file would hit web site bandwidth limits quickly.

Advantages: No changes to the server. HTTP downloads can stop/resume.
100% MIME compatible. Efficient because the file remains in raw binary
format.

--
How to fill your hard drive with movies, music and pictures while you
sleep.
http://www.binaryboy.com/


Curt Welch

unread,
Sep 29, 2002, 7:18:59 PM9/29/02
to
"Howard Swinehart" <how...@supernet.com> wrote:
> If reducing server bandwidth can increase reliablity, using external
> bodies could help.

We have talked a lot about the idea of not shipping the binary file along
with the article. There are many variations on that theme that sound
interesting, but the problems seem to outweigh the advantages.

First off, anyone that wants to do that can do it already. They just put
the file on a web site and post the URL. Yet, it's seldom done. People
use usenet when for some reason putting it on a web site is not already the
better option. If putting it on a web site works, then they seldom care
about Usenet anyway. I think these ideas are all functionally equivalent
to the idea of "don't use usenet to distribute binaries".

If you put it on a web site, then whoever pays for the web site ends up
paying for the download bandwidth. If you put it on Usenet, then the file
is downloaded from the news server, and the people that pay for that news
server end up paying those download costs. Usenet in many ways is just one
huge distributed web site.

The other reason of course is the issue of taking responsibility for content.
A lot of stuff gets posted to Usenet because the person posting the file
does not want to take responsibility for the content. That's not a good
thing to me, but it's a fact of life about why people use Usenet.

There have been ideas where the posting site stores the binary on a server
(web or nntp or other type of special server) and allows remote users to
access it from there. That way, the user doesn't have to have access to
their own web site. But that is silly because it removes the
distributed cache function that makes Usenet work in the first place, and
makes the posting site pay for all the bandwidth. A single popular file
(even a legal one) could end up costing the posting site thousands of
dollars. No NSP is going to risk that. And if they pass the risk to the
user by making them pay for the bandwidth, no user would use that site to
post. It also makes the site more responsible for content hosted on their
site. Most NSPs are looking for ways to remove themselves from the content,
not take more responsibility for it.

Then there are the fetch-through-the-path-on-demand ideas. Those should work
technically, if all the ISPs made sure their servers worked, but this gets
back to the problem that when you try to fetch a file, it's not just a
function of how good your local server is, but also a function of how well
multiple servers in the path are working at that moment. The odds of
being able to fetch a file through a network like that are even worse than
what they are today. At least today you know what you can and cannot get. In
the system where you distribute headers and fetch the binary through the
path, every server will end up looking "complete", but some percentage of
the files will never be fetchable. It doesn't seem like a system that
would be workable or, if workable, would be any better than what we already
have today.

Juergen Helbing

unread,
Sep 30, 2002, 12:33:11 AM9/30/02
to
Jeremy <jer...@exit109.com> wrote:

>> If we succeed to solve the "completeness" problems, then we could
>> continue to leave the size of binaries where they are.
>> But if we are NOT able to fix this problem, then we should try
>> to avoid multiparts whenever it is possible.
>
>Interesting -- that sounds backwards to me.

Not really. It is more a "two way strategy".
And there is no good reason to do just one thing
and forget about the other one if both can reduce the problem.

>Thus, why not post parts of 4 megs or even larger?

You are an admin. You should tell me.

>> Meanwhile also a lot of pictures are posted as "high quality" JPEGs.
>> They are ALL splitted as multiparts (600kB...2000 kB).
>
>That's just a bad idea, if you ask me.

Unfortunately this is not an "idea", it is the actual reality.
Users are helping themselves - because admins are
refusing to transport larger messages properly.

If you are interested in "really bad ideas":
There are people asking for video posts with 100 kByte segments.
Because their admins don't accept larger messages in _binary_
newsgroups.......
And - worse - there is always someone fulfilling such requests.

--
Juergen


Juergen Helbing

unread,
Sep 30, 2002, 12:34:27 AM9/30/02
to
Klaas <spam...@klaas.ca> wrote:

>A summary won't do you much good--if you really want to develop a robust
>format you'll have to read past discussions in their entirety.

Without any kind of pointer it will be a lot of work
digging through the ten thousand messages there since Jan.2002. :-(

--
Juergen


Juergen Helbing

unread,
Sep 30, 2002, 12:53:00 AM9/30/02
to
Kjetil Torgrim Homme <kjet...@ifi.uio.no> wrote:

>the transit server might not care (if you extend NNTP), but the reader
>server can leave out incomplete posts (I guess you'll object to that,
>since you are using multiple servers to make them complete), and the
>news administrator can get completeness statistics much more easily.

The next consequence of "leaving out incomplete posts" could only
be that users stop labelling their posts as multiparts.

>> But you did not answer my question what must be _done_ to permit
>> 10 MB articles as an "easy fix".
>
>NNTP needs a mechanism for sending partial articles,

We have this today - splitting messages.
It does not work.

>and for aborting transfer.

You mean "aborting incomplete transfers" ?

>or you can just use small articles (<1 MiB), the way you do
>today. what is the downside to using 1 MiB articles?

It results in incompleteness.

>> > But what about this. If binary articles had a hash of the entire
>> > file in the message-Id
>> This is not possible.
>why not?
>> Identifying binaries as identical before they are downloaded is a
>> bad idea.
>why?

Most binaries are unwanted - by someone.
There is _always_ at least one party which believes that
a post "does not belong here" - "is spam" - "is illegal"
- "is whatever.... but bad...."
Giving people the opportunity to have a "quick scan" of Usenet
would make the situation worse than ever.

When I wanted to include the CRC32 of yEncoded messages
into the subject line I was warned by a lot of people.

It is necessary to identify multiparts as belonging together,
not to identify their content. This is a completely different task.

>yes, something like that'd be nice. the pseudo top domain is good,
>perhaps we could have IANA reserve it officially.

Yes but this is not too important.

>I'd use a currently
>unusual character in the Message-ID, and specify the part size, the
>offset and the total size (all in octets).

>[...]

>my proposal:
> <offset'part-size'total-size'id-...@id-right.nntpbin>

You are "over-specifying" - and using "special characters"
will be subject to objections.

A soon as we are using ".bin" - or in your example: ".nntpbin"
we can rely on proper formating and could also use the 'dot'
as seperator.

>offset, part-size and total-size should be represented in hexadecimal
>(without any prefix like "0x") to make it easy to code and understand,
>while keeping it slightly more efficient spacewise than decimal. the
>sizes and offsets should refer to the _unencoded_ data. id-left and
>id-right are as defined in RFC 2822. nntpbin is a text constant.

I don't see the need for offset and partsize.
This belongs in the message / MIME headers.

>if the partsize is (relatively) constant, it's easy to work out how
>many parts there are in total, and which part number this is. you
>only have a problem if the tail part is smaller than the others, and
>there is no reason it should.

You cannot "find out" anything. It is necessary to have the total
number of parts - and the number of this part specified directly.
"Finding out" results is disaster (I've learned this from yEnc).

--
Juergen

Juergen Helbing

unread,
Sep 30, 2002, 1:16:39 AM9/30/02
to
cu...@kcwc.com (Curt Welch) wrote:

>The other thing that would be helpful to put in is small hash of the
>file contents.

I don't see the need for a hash.
It makes things very complicated.

>Not long enough to identify the file (that would make
>the Message-Id just too long) but one
>which was long enough to greatly narrow down the number of articles
>to search when trying to match up parts, or when trying to spot
>a re-post a file which you had missing segments for.

My example was perhaps easy to misunderstand.
Here is an example for a three-part message:

<p1o3s4.abcd...@server.com.bin>
<p2o3s4.abcd...@server.com.bin>
<p3o3s4.abcd...@server.com.bin>

The msgid is _identical_ for all parts - except for
the part-counter. This way we don't need any hash.
We don't need any kind of identification
(which is generally unwanted).

As soon as a reader or server has seen one:
<p7o47s2...@server.bin> then it knows
that there must be also <p1 <p2 ... <p47

>A large hash would be included in the headers so for any article which
>looked like it might be the segment which was needed, the newsreader
>could download the full header and see if the file hash matched. The
>small hash in the Message-ID would greatly reduce the number of
>full HEADs that had to be done to identify articles.

In my example it would be possible to discard _all_ the headers
for parts 2-n. This would also reduce the amount of multipart
headers down to _one_ (the first one).

Newsreaders could find all the parts of a message by
seeking by msgid. And so could news-servers if they are
missing single parts: They could use additional "bypasses"
to find missing parts.
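
A newsreader needs only a few lines to use this. A Python sketch (my
example msgids above are abbreviated, so the full msgid below is made up;
'p' is the part number, 'o' the total number of parts, 's' the total size
in MB):

  import re

  # sketch: parse a <pXoYsZ.tag@host.bin> message-id and enumerate the
  # msgids of all the sibling parts.

  MSGID = re.compile(r"<p(\d+)o(\d+)s(\d+)\.(?P<rest>[^>]+\.bin)>")

  def parts_of(msgid):
      m = MSGID.match(msgid)
      if not m:
          return None
      part, total, megs = (int(m.group(i)) for i in (1, 2, 3))
      rest = m.group("rest")
      siblings = [f"<p{n}o{total}s{megs}.{rest}>" for n in range(1, total + 1)]
      return part, total, megs, siblings

  if __name__ == "__main__":
      part, total, megs, sibs = parts_of("<p2o3s4.abcd@server.com.bin>")
      print(part, total, megs)     # 2 3 4
      print(sibs[0], sibs[-1])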

>More important is the simple fact that you don't give a shit about the part
>count or the part number if you have offset and size information. For
>backward compatibility, part counts can be used in the subject for
>newsreaders matching up articles by the subject, but newsreaders that
>knew how to use the information in the Message-ID would ignore the
>subject.

I don't agree that 'positioning' information belongs in the message id.

>If people match articles by hand, part counts are very useful, trying to
>comprehend offset and size data would be a real bitch. But the point is that
>you want the computer to do this for you, and for the computer, offset and
>size is what makes it easy.

If I am understanding right what Kjetil and you are discussing here
then you want to combine multiple solutions into the message id.
From the "simple and easy" approach of identifying multiparts as being
an entity you now also want to solve "the rest of the problems"....

This would result in an "over-complexified" message-id - which cannot
be its purpose. The message id would be great to help transits, hosts
and readers to know that 150 parts belong together. But all the
"rest" should be done with other headers.


It seems that we are starting the same thing as before:
Every little change and improvement is instantly used to
become the "total solution" of all problems. This results in
very complex definitions.

Actually we are discussing _completeness_ here.
And we are finding a way to identify multiparts properly.
Mixing this topic with a lot more "nice features" might be
possible but it would slow down the process - and lead in the
wrong direction.

The goal is to improve multipart distribution - and/or to filter out
(these) binaries perfectly. Attempts to make "fills" and "reposts"
easier cannot work because it would take years to change user
behavior and newsreader functionality.


I believe that we should concentrate on (and solve) incompleteness.
Then all the other "nice tries" become obsolete.

--
Juergen

Juergen Helbing

unread,
Sep 30, 2002, 1:27:00 AM9/30/02
to
Klaas <spam...@klaas.ca> wrote:

>You should have a key on the LHS that indicates it is a binary in
>addition to the pseudo-domain.

The ".bin" extension indicates that this _is_ a binary.
No text messages must use this.
And if it uses the BIN extension then it will be dropped
automatically - because it is malformatted.

>Also, you can fit a lot more in there if
>you use a better encoding scheme, like base64. Stuff like the actual
>size of the file in bytes.

The size of the file is included in the message headers or bodies.

>Just having MiB is useless for error checking.

Error checking is done with the full message.
All the necessary information is available there.

The MBytes information is for _filtering_ purposes.
This is my 'candy' for the transit-admins: Those who
don't want large binaries can filter them out based on Megs now.
Any transit which has specific options can easily filter
on that information.

>Also, why bother with parts when you can encode the byte
>offset? This will enable programs to combine parts from different
>posters into one file.

That feature is already available. It is included in yEnc.
And I don't want to expand the msg-id more than necessary.

We _need_ the information about the total size - and about
the parts (for completeness). Nothing else.


>> Any "thinning out" other than based on newsgroup/hierarchy is sick.
>
>What about misplaced binaries on text servers (on any server, actually)?
>How about if the algorithm drops entire files instead of parts?

Of course all text servers would reject all <xxxx.bin> messages.
In fact BIN messages would be permitted only in the alt.bin hierarchy.
(And perhaps the alt.mag hierarchy :-)
This _is_ filtering by group/hierarchy.

--
Juergen

Juergen Helbing

unread,
Sep 30, 2002, 2:07:08 AM9/30/02
to
cu...@kcwc.com (Curt Welch) wrote:

>> But I can offer you this:
>> <p17o580s5...@server.bin>
>> Part 17 of 580 size is 587 MB.
>
>I really think it would be wrong to use part numbers like that.

This would finally replace the odd and unreliable method of using
part-numbers in the subject (which has been criticized since its
beginning). It solves _your_ problem by easy identification
of multiparts which belong together. And it offers the urgently
required information about the full size for transit filtering.

>You need to describe file segments by byte range, not by part numbers.

Your transit will have a lot of fun calculating all this shit.
And the size of the message id is inflated without _any_ good reason.

>It's fine to continue to use part numbers in the subject to make multipart
>posts compatible with current usage (and easy to understand),

No - it is not fine.

>but
>when you encode information in the headers for the software to use,
>byte ranges give you so much more power to do advanced things like
>reposting a single segement in multiple smaller parts, or to take
>two different posts of the same file (done with different line counts),
>and let the software correctly merge all the pieces back together.

I don't need this - because yEnc offers all this already.
Jeremy's proposals would place all this into the MIME headers.

If you want a few hundred bytes which describe a file then
you can easily place them into any header - or into a (0/n) message.


>> Any "thinning out" other than based on newsgroup/hierarchy is sick.
>
>That's true. From a user perspective that's easy to say. Try running
>your own server which supports thousands of users and then try to pick
>which groups or hierarchies to delete. And try to maintain a set of peers,
>who provide you feeds only out of the goodness of their heart, where every
>few weeks you have to tell them to change the list because volume has once
>again grown too large. News servers that can't get everything, but want to
>get as much as they can, need a dynamic system to keep their pipes filled
>to whatever level they can afford. Thinning by hierarchy is totally
>unworkable for that. You might block some larger hierarchies to get your
>feed down to the range of what you want, but then you need a dynamic system
>to get as much as you can within your bandwidth/cost limits.

You want to solve the fundamental problems of IHAVE (the lack of
management tools and automation of feed-selection) this way?

How do you do this with "hashes and byte-ranges"?
I don't see any advantages for your problems.

We should not mix up things again.

>> >My servers have hard-coded 4MB article size limits.
>> Alzheimer !
>So does INN and cyclone and typhoon, and most servers (i.e., they have
>hard coded article size limits).

There seem to be thousands of hosts running with lower limits.
The distribution of articles with 15000 UU-lines is very poor
- and the "1-4 meg" articles can be found only on a few commercial
NSP hosts.


>My server caches articles in memory. One 10MB article sitting in the memory
>cache takes up space that could have been used to hold over 2000 text
>articles. If instead it comes in as 100 100K posts, then they can be
>expired from the cache as they are processed. Most servers are written to
>assume an entire article can not only fit in memory at once, but that you
>can hold a lot of these articles in memory at the same time.

I understand this very well. But please answer me this question:

As soon as one 10 Meg article came in and is completely in your
memory then _all_ your outgoing feeds will now try to send this
article out. If they all succeed then the 10 Meg article is also
instantly removed from the cache. It is not necessary to have
1000 of these 10 Meg articles in memory. The size of your cache
depends on the time lag between the incoming stream and the
"accepted" message from your outgoing feedee.
And I am pretty sure that one 10 meg article would be transferred
even _faster_ than 100x100k articles because the feedee
has less work with it.

The only reason for "larger caches" would be that a feedee fails.
But in this case you are filling up the cache at exactly the
same speed with 100x100k articles as with 10.000k articles.

And again: You have a "latency" time for 100 articles
which in sum is far longer than for one 10 Meg article.

Those people who are working on "transport theory"
always prefer "bulks as large as possible" over "small".
They can even prove that the average transfer capability
is better.

And - btw - the difference between your actual "4 Meg" limit
and my desired "10 Meg" limit is just a factor of 2.5.


>Processing "unlimited" size messages would be impossible with
>my design and I think that's probably true of all systems.

I DON'T want to transfer 500 Meg articles.
I want to reduce the number of split messages for
_pictures_ and _mp3_ !

>Increasing the size of posts will not "fix" anything.

You don't believe in my statement:
"If multiparts are a problem then avoid multiparts"?

LOL !

>If we instead
>created a cleaner system for chopping large files into lots of little
>parts, things could be made to work much better in the long run.

Yes - I agree. Having perfect distribution for multiparts
would also solve the problem for the 500k-10.000k files.

But I am afraid that we are _far_ away from a solution.
Perhaps we could increase the current limit of 500k posts
to 3999k. Then at least the splitting of pictures would end.

I am just afraid that 'Cidera' is technically not able to
transfer these messages - am I right?

>For example, imagine if we got to the point where news server understood
>that they were moving large files, and not just text articles. When you
>download headers, the server could list the file parts separately from the
>"real" text articles. It could combine parts together and simply list byte
>ranges available for download for each file, and if all the parts were
>available, it would show just a single header-like entry for the file. So
>instead of downloading 100,000 article headers and having the news client
>guess at what was what, the news server would give the newsreader one list
>of all the file segments, and a different (old style XOVER-like list of the
>non-binary posts to the group).
>
>Or, maybe if the data was encoded in the correct format, the news server
>could actually combine multipart posts together on the fly, and make them
>look as if they were single huge articles.

Nobody prevents news-servers from doing this already today.
On the MyNews network you can already see subjects like:
- filename.avi (all/590) :-)

And - by the way - I don't see any progress if a news-server tells
me that the range 50.000.000 -> 51.000.000 is NOT available.

>There are just so many things that could work so much better if the news
>servers knew they were transporting files and were able to deal with files
>at that level, instead of what we do now of hiding binary files inside of
>text articles and forcing the news client to do all the work of finding all
>the file parts and putting them back together again.

Yes - I agree.
But the big question is how to make this happen _now_ and _easily_.
I don't see any chance that reader-servers will be changed to treat
"files" differently within the next five years.
But "my way" of labelling multiparts simply in the message id
offers everything you need.
I am already downloading only message-ids from 30 news-servers.
The full XOVER line is only downloaded for those headers which
I don't already have.
As soon as I KNOW that <p4o567s56......> is just part 4 of the
message <p1o567s56....> then I will save a lot of header
download. And all this can be done without the slightest
modification to news-servers. Changing three autoposters
and a few newsreader/binary grabbers would already have
an enormous effect.


>And though all the details have not been worked out, it seems to me that we
>can transition from the current system to this new "binary file support"
>system in ways that keeps things backward compatible to make the transition
>easier.

Yes - all our ideas have the benefit that they are "backward compatible".

CU
--
Juergen


Juergen Helbing

unread,
Sep 30, 2002, 2:17:44 AM9/30/02
to
Bas Ruiter <<lord...@home.nl>> wrote:

>From a users point-of-view, being able to positively identify a file
>BEFORE downloading any part of it is HIGHLY desireable!

A binary hash or CRC code does not help a bit.
It would only prevent you from downloading an identical
file twice.
Today pictures are "manipulated" to make users download
them again. There are faked filenames - and it would even be
very easy to "fool" an MD5 by changing just one bit.

One of my servers is 'identifying' binaries already perfectly
- and creates such "checksums" - but this does not help.

You can believe me: I am working with "binary identification"
for three years now - and my actual "file database" for just one
group is larger than 150 Megs (filename+size+crc). There are
"false duplicates" all the time - and there are "false new triggers"
all the time.

All this stuff is just a part of the problem - and a whole bunch of
problems follows.

You know perhaps that "Morpheus", "Kazaa" and "eDonkey"
already permit "ahead identification" - by title, artist, album,
even by CRC. This does not help in any way. On the contrary.
People started to "fool" these systems long ago.

One of the _biggest_ advantages of Usenet is that people are
posting _comments_ on binaries. Only INFO files and user
comments are helpful. There is no "automated" solution.

CU
--
Juergen

Juergen Helbing

unread,
Sep 30, 2002, 2:22:25 AM9/30/02
to
"Brian Truitt" <btr...@rcn.com> wrote:

>> It would be nice to know how ISPs can find out how much traffic
>> is caused by NNTP. Of course the costs of maintaing a "good"
>> newsserver must stay comparable to bandwidth and outsourcing
>> costs.
>> However I have no idea about details.....
>
>We track NNTP traffic via MRTG, Cricket, or other various packages that pull
>info from the switch via SNMP. All our news equipment is on a dedicated
>switch, so it's separate from any other traffic we'd generate as an ISP.
>[...]

Thanks for this helpful information.
It seems that you are able to find out the NNTP-traffic on your
own server - and the amount of data your users are fetching
from it.

But the _really_ good question would be:

How much NNTP traffic do your users create to _external_
news-servers? Do you have any way to count this volume?

An ISP who is offering a "text-only" news-server might have no idea
how many terabytes are fetched by his users from
Super-Giga-Hosting-Reader news....

TIA
--
Juergen


Juergen Helbing

unread,
Sep 30, 2002, 2:26:36 AM9/30/02
to
"ZoSo" <zo...@voyager.net> wrote:

>So, did you stop to think maybe Binary Usenet *should* be replaced by
>P2P networks?

In the past most people agreed with the statement that the end of
the binary newsgroups would also be the end of the text newsgroups.

You could try to forbid binary attachments to eMail.
The eMail volume would be really reduced.
Then let's see how long eMail would survive :-))

--
Juergen

Juergen Helbing

unread,
Sep 30, 2002, 3:10:17 AM9/30/02
to
cu...@kcwc.com (Curt Welch) wrote:

>I could easilly be wrong about the other applications however because I
>don't know much about what is going on with those.

All the people in my "friends-family-company" circle are _heavy_ downloaders
on these "geek" apps. Not a single one of more than 100 people
is using Usenet.

>You can bet that a lot more data is moved by HTTP than all of these other
>applications combined however. But that's not the point.

Those 80 million people using Kazaa are downloading far more by
P2P than by HTTP.

>> It is a question of _your_ survival as an NSP to offer solutions.
>
>To be honest, I'd rather save Usenet than save my own business. And if
>saving Usenet means getting rid of the binary files, and going back to a
>model where the ISPs all run their own small Usenet text servers, that's
>fine by me. But to be honest, I don't see that happening.

Already today the ISPs are offering only TEXT servers.
The "Binary Party" is already handled mainly by pay servers.


>Posting large articles won't work because we can't move then effeciently,
>but breaking them up into 10,000 parts and forcing newsreaders to download
>millions of headers and try to match them all up is getting completely out
>of hand.

We are working on this issue.
For me we are very close to a solution.

>To keep it working, we need to build the transport of "files" into Usenet.

>[...]
>Each file being transported needs to be given a unique [...]
>For example, when you post a file to Usenet, you should upload one [...]
>The server could then chop the 10GB file article into small pieces and [...]
>In the headers, the post would show up as a single entry, [...]

I remember very well the time when I wanted to introduce "full binary
transport of files" by Usenet. As you know I have completely given up.
It was possible to introduce yEnc instead - because it does not affect
the server/transit operation at all. The resistance of the users is already
hard to crack - but the resistance of the Usenet admins (plus their limited
resources) is an obstacle we cannot overcome.

I _agree_ that it would be really nice to have "News+FTP" combined.
But from the experience I am now having with changes on Usenet
I am afraid that it would _never_ happen.

Am I too realistic ?
Are there any reasons why you believe that a complete redesign
of Binary Usenet could ever work ?

>> If we cannot solve the problems - with users paying for access -
>> then you'll face the same problem as the media industry:
>> Why paying for "bad service" when free service is even better...
>
>It could be that Usenet just isn't the answer, in which case we would be
>doing everyone a dis-service by encourging them to stick with the wrong
>technology long after better solutions had been created.

But we all agree that Usenet is the most efficient way to broadcast
huge amounts of data around the world, don't we?

Why should we leave this track if it is possible to solve the few problems?

>For example, before the web, we used Usenet a lot to try and distribute

>static data (in the forms of FAQs mostly). [...]


>But then the web was invented. And it works so so much better for that
>type of thing than usenet ever did that it would be stupid to try to
>force people to keep using Usenet for that when you have the web.

Meanwhile Usenet has come to a point where web-links are counted
as spam-traps.......

>If a better way to distribute files (which maintains the other advantages
>of Usenet) comes along, it would be stupid to play games to try and make
>people stay with a "broken" technology just because we wanted them to.

The "better" way to distribute files by P2P (which is today better in
terms of redundancy) causes traffic aroud the world. People are sending
500 Meg CDs from Australia to Iceland....... and back to South Africa...


>Right. My point is that I think you are barking up the wrong tree in
>your attempt to identify the cause and the solution for those
>problems. i.e, we don't need an approach closer to TCP/IP. We already
>have one that's better than that in terms of it's "dynamic routing
>ability" which you talked about.

Are you so sure that the "dynamic routing" does not apply to our problem?
If I know the URL of a web-page then I _will_ get the page - unless the
delivering server is down. Dynamic routing _will_ find a way (even if
half the Internet is down) to deliver me the requested page.

Imagine this: If I have the URL (msg-id) of a Usenet message
then my application _will_ find a server which has it - and will deliver
it to me - even if half of Usenet is down.
If I want to post 500 messages (one cd) to Usenet then my application
_will_ find a path - even if my own ISP - and all his upstreams - are down.

>> Should we stop discussion ?
>
>No. First, just because I think something doesn't make it a fact. That's
>why we talk about it.

<vbg>


>We need to understand why
>usenet servers are constantly mis-managed and try to understand what change
>we could make to improve that (either reduce management cost or find a way
>to motivate people to manage their servers correctly).

Any ideas from your side ?
One way I've already seen is that ISPs are closing their news-servers
and are offering additional accounts on their 'new binary server'.


>Usenet is not a viable "tool" for sharing information.

>[...]


>Usenet is mostly just a geek entertainment system that far too many people
>try to pretend as some type of "real" value to society.

Sorry, but I don't agree with you.

CU
--
Juergen

Juergen Helbing

unread,
Sep 30, 2002, 3:10:54 AM9/30/02
to
drec...@yuck.net (Mike Horwath) wrote:

>: And if we dont act in the next 12-24 months then these "geek apps"
>: will replace Binary Usenet quickly.
>

>Why is this bad?
>Just playing devil's advocate.

Because the end of the Binary Usenet would also be the end of
Text Usenet.

--
Juergen

Juergen Helbing

unread,
Sep 30, 2002, 3:21:20 AM9/30/02
to
Bill Cole <bi...@scconsult.com> wrote:

>Cool. How can we speed this? Would injecting noise into discussion of
>fixes help?

No - we are used to noise - and we are able to stay on track.

>Usenet would be a better place if the binaries went away.

At first it would be a dead place.
Remove all the people who are on Usenet for binaries PLUS text
and you will drop below the critical mass.

>The protocols
>for transport and the logical model of the data set are absolutely
>absurd as a mechanism for distributing and storing anything other than
>conversations.

Yes, it would be high time to find a better way to transport
these few text-message-bytes. They are only disturbing
the binary flow :->>>

>The death of binary news and the ensuing collapse of the
>binary-focused NSP's would be a Very Good Thing for Usenet. An event
>worth speeding and assisting, even if it costs the employment of some
>fine people like Curt and Jeremy and Andrew.

Binary Usenet is one of the very few Internet services which is successful
in terms of payment. And the only reason why text usenet still exists
is that there are people who are paying for the binaries.


Some unconventional answers?
There is no reason to think your approach is any better than mine.
But at least I am offering reasonable arguments rather than fatalism.

--
Juergen

Juergen Helbing

unread,
Sep 30, 2002, 3:27:13 AM9/30/02
to
"Howard Swinehart" <how...@supernet.com> wrote:

>If reducing server bandwidth can increase reliablity, using external
>bodies could help.

Yes.

>MIME articles may point to a URL (RFC2017) rather than contain the
>data in the body. Upload the file to a web server. Then post an
>article using the message/external-body content-type (RFC2046, section
>5.2.3) with an access-type of URL, which points to the file location.
>This could be done either in additon to or instead of posting the file
>to Usenet.

Yes, having write access to ftp.giganews.com only as a customer
but read access for everybody would be really great.

Are you running a server where we could try this?

>Disadvantages: Requires changes to many binary downloaders and
>posters.

No - most binary downloaders are able to display URLs and launch
them with a single click.

>Requires the poster to have access to a web site.

No, the NSPs could offer FTP servers.

>A popular
>or very large file would hit web site bandwidth limits quickly.

If the NSPs would offer cascading FTP proxies then this could
be solved easily.

>Advantages: No changes to the server. HTTP downloads can stop/resume.
>100% MIME compatible. Efficent because the file remains in raw binary
>format.

Yes - this would be all very good.


And now the question: Which NSP will offer the servers ?
How hard would it be to create a system of cascading FTP proxies ?

CU
--
Juergen

Kjetil Torgrim Homme

unread,
Sep 30, 2002, 6:35:18 AM9/30/02
to
[Juergen Helbing]:

>
> Kjetil Torgrim Homme <kjet...@ifi.uio.no> wrote:
>
> >the transit server might not care (if you extend NNTP), but the reader
> >server can leave out incomplete posts (I guess you'll object to that,
> >since you are using multiple servers to make them complete), and the
> >news administrator can get completeness statistics much more easily.
>
> The next consequence of "leaving out incomplete posts" could only
> be that users stop to label their posts as multiparts.

and how would reading tools cope with that? surely no one is going to
piece together a hundred posts manually.

> >NNTP needs a mechanism for sending partial articles,
>
> We are having this today - splitting messages. It does not work.

it does work.

> >and for aborting transfer.
>
> You mean "aborting incomplete transfers" ?

yes. for large articles, you just can't wait until it has been
transferred completely (see Curt's calculation of propagation time).

A -> B -> C

so, what happens if the originating server A crashes? today, there is
only one possibility for B. it cannot end the article with . CR LF,
since that would destroy the article. it must drop the connection to
C, and clear the history entry for the article in question, hoping
that someone else will offer it. if that was a 50 MiB file, a lot of
bandwidth may have been wasted. so NNTP needs a mechanism to abort a
command. but this is _very_ hard to get deployed globally.

the alternative, with small articles, just works.

> >or you can just use small articles (<1 MiB), the way you do
> >today. what is the downside to using 1 MiB articles?
>
> It results in incompleteness.

large files will not help, unless you do away with multiparts
completely.

it leads to incompleteness only when the system is overloaded. what
we need is a throttle and priorities in the transit software, so that
it can prioritise files which are almost complete. see below.

> Most binaries are unwanted - by someone.
> There is _always_ at least one party which believes that
> a post "does not belong here" - "is spam" - "is illegal"
> - "is whatever.... but bad...."
> Giving people the opportunity to have a "quick scan" of Usenet
> would make the situation worse than ever.

sorry, but protecting pirates is not on my agenda.

> >yes, something like that'd be nice. the pseudo top domain is good,
> >perhaps we could have IANA reserve it officially.
>
> Yes but this is not too important.

so what do you do when .bin is allocated to something else?

> >my proposal:
> > <offset'part-size'total-size'id-...@id-right.nntpbin>
>
> You are "over-specifying" - and using "special characters"
> will be subject to objections.

period is actually a more "special character" than apostrophe, if you
look at the grammar.

> As soon as we are using ".bin" - or in your example: ".nntpbin"
> we can rely on proper formatting and could also use the 'dot'
> as separator.

are there any advantages to using a period?

> I dont see the need for offset and partsize.

how do you suggest transit servers can implement priorities without
offset and partsize? consider:

server receives a message, which it will send to its peers. one
of its peers is backlogged. the server knows how many octets it has
sent to that peer of the stream identified by the tuple (total-size id-left
id-right). the completion ratio determines what gets sent first.
when that is done, start sending messages which are complete on the
server's spool. when that is done, send the newest message.

offset and partsize are the information required to implement something
like this. the cost is low, you only need a small additional history
database. (notice that you don't need to keep track of which
intervals have arrived or been sent, since that is handled by the
normal history database.)
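
a sketch of that bookkeeping (python), assuming the Message-ID has
already been parsed into a small dict (the field names are mine, not
part of any spec):

from collections import defaultdict

# octets already sent to this peer, per stream (total-size, id-left, id-right)
octets_sent = defaultdict(int)

def stream_key(f):
    return (f["total_size"], f["id_left"], f["id_right"])

def record_sent(f):
    octets_sent[stream_key(f)] += f["part_size"]

def priority(f):
    # higher completion ratio -> send first (min() below picks the smallest value)
    return -octets_sent[stream_key(f)] / f["total_size"]

def next_to_send(backlog):
    """backlog: list of parsed-ID dicts queued for the slow peer."""
    return min(backlog, key=priority)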

> This belongs to the message / MIME-headers.

useless for transit.

> >if the partsize is (relatively) constant, it's easy to work out how
> >many parts there are in total, and which part number this is. you
> >only have a problem if the tail part is smaller than the others, and
> >there is no reason it should.
>
> You cannot "find out" anything. It is necessary to have the total
> number of parts - and the number of this part specified directly.
> "Finding out" results is disaster (I've learned this from yEnc).

parts are irrelevant. the same file can be divided any number of ways,
and an offset,size based news reader can piece them together
regardless of part numbers in the repost.

Kjetil Torgrim Homme

unread,
Sep 30, 2002, 6:39:34 AM9/30/02
to
[Juergen Helbing]:

>
> The ".bin" extension indicates that this _is_ a binary. No text
> messages must use this. And if it uses the BIN extension then it
> will be dropped automatically - because it is malformatted.

bzzzt. wrong, can't be done, it would be breaking the RFCs. articles
with a Message-ID which can't be parsed must be processed as normal
articles.

Kjetil Torgrim Homme

unread,
Sep 30, 2002, 6:40:10 AM9/30/02
to
[Juergen Helbing]:

>
> It seems that we are starting the same thing as before: Every
> little change and improvement is instantly used to become the
> "total solution" of all problems. This results in very complex
> definitions.

this makes me very sad. you didn't learn ANYTHING from the yEnc
disaster, did you?

Kjetil Torgrim Homme

unread,
Sep 30, 2002, 6:48:29 AM9/30/02
to
[Juergen Helbing]:

>
> Because the end of the Binary Usenet would also be the end of
> Text Usenet.

the people using text groups and the people using binary groups are to
a large extent distinct. I don't think the death of binary Usenet would
impact text Usenet much at all. have you looked at the Top 1000? it
is IMO dominated by academic servers and servers offering text Usenet.

Klaas

unread,
Sep 30, 2002, 10:13:07 AM9/30/02
to
After careful consideration, Juergen Helbing muttered:

I would give you one if I had access to an archive.

Try searching for this link, which I mentioned in the discussion:

http://usenet.klaas.ca/hash.jpg

-Mike

Klaas

unread,
Sep 30, 2002, 10:17:34 AM9/30/02
to
After careful consideration, Juergen Helbing muttered:

> Klaas <spam...@klaas.ca> wrote:

Here is a Msg-ID: <a7gf9o$l3a4n$1...@ID-26705.news.dfncis.de>

Google unfortunately does not have the complete thread since only bits of
it were crossposted here.

-Mike

Andrew - Supernews

unread,
Sep 30, 2002, 10:54:21 AM9/30/02
to
In article <39167....@archiver.winews.net>, Juergen Helbing wrote:
> My example was perhaps easy to misunderstand.
> Here is an example for a three-part message:
>
><p1o3s4.abcd...@server.com.bin>
><p2o3s4.abcd...@server.com.bin>
><p3o3s4.abcd...@server.com.bin>
>
> The msgid is _identical_ for all parts - except for
> the part-counter. This way we don't need any hash.

and you lay yourself completely open to ID prediction attacks; anyone
who wants to interfere with propagation of the file simply needs to
preemptively post an article matching one of the later part IDs.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

Curt Welch

unread,
Sep 30, 2002, 11:47:38 AM9/30/02
to
arch...@i3w.com (Juergen Helbing) wrote:
> cu...@kcwc.com (Curt Welch) wrote:
>
> >The other thing that would be helpful to put in is small hash of the
> >file contents.
>
> I dont see the need for a hash.
> It makes things very complicated.

You don't see the need or you don't believe in the need?

The value of the hash is manifold. One BIG end-user feature is that
download tools could do something they have _NEVER_ been able to do in the
past: they can take advantage of reposts. If you download a 4000 part
file, but you are missing 20 parts, your download tool could continue to
scan Usenet for those missing parts. If someone were to repost that 60
minute video again 2 months later, but this time in 5000 parts, and with a
different file name, your download tool could still spot it as being the
same file, find the byte ranges it was missing from the first post, and
complete the missing segments for you - all automatically without the user
having to do anything.

For example, suppose you saw a file you really wanted, but it was only one
part out of 50 (the other parts had already expired and you missed them). You
could still mark it for download, and just let it sit in your list of
partial downloads. Then, if the same file was reposted weeks later, your
tool would find all the parts and download them for you without you having
to scan thousands of headers every day looking for something good to
download. You could easily just set up the tool to scan multiple servers
every hour looking for your missing file segments.

You don't see that as something users would like?

I see it as something they couldn't live without once they learned its
true value.

But, before the users could learn the glory of this, someone would have to
take the time to write this new type of download tool. It's (probably) not
a simple addition to any of the current tools because it works in a
completely different way than the other tools. It has to maintain a
database of incomplete downloads and constantly scan a set of groups for
new posts that help to complete the download. It needs a UI which shows the
list of files waiting to be completed, and it needs a UI to show the list
of files which could be downloaded to help complete the partial downloads,
etc. It's a lot of work to create this tool but I'm sure the users would
love it, and if it was available for free, worked well, and supported all
the other expected advanced features (multiple servers, download
priorities, dealing with various file types and encoding types, etc), then
it would take over Usenet in no time.
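
To give a feel for it, here's a toy sketch of the core matching step
(Python). The hash value, byte ranges and field names are all made up
for illustration:

# file hash -> list of (offset, size) byte ranges still needed
missing = {
    "9f2c51aa": [(1048576, 262144), (52428800, 524288)],
}

def overlaps(a_off, a_size, b_off, b_size):
    return a_off < b_off + b_size and b_off < a_off + a_size

def wanted(file_hash, offset, size):
    """True if this freshly-seen part covers any byte range we still lack."""
    return any(overlaps(offset, size, m_off, m_size)
               for m_off, m_size in missing.get(file_hash, []))

# The scanner runs wanted() over every new part found in the group
# overview data and queues only the overlapping parts for download.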

Though, there is the other side which downloaders might not like. It might
also make it very easy for copyright holders to spot re-posts of their work
and automatically cancel them or collect evidence for a court case, etc. That
is a good thing, but it might be something that would prevent the idea from
being accepted on Usenet. If that turns out to be the case, then it makes
me just want to not help anyone do anything with binaries on Usenet.

Kjetil Torgrim Homme

unread,
Sep 30, 2002, 11:47:47 AM9/30/02
to
[Andrew - Supernews]:

>
> In article <39167....@archiver.winews.net>, Juergen Helbing wrote:
> > The msgid is _identical_ for all parts - except for
> > the part-counter. This way we don't need any hash.
>
> and you lay yourself completely open to ID prediction attacks; anyone
> who wants to interfere with propagation of the file simply needs to
> preemptively post an article matching one of the later part IDs.

very good point. to group the articles, we need a constant element
common to all the parts of a file. a hash could play that role.

Kjetil Torgrim Homme

unread,
Sep 30, 2002, 12:07:14 PM9/30/02
to
(aha, so that's why your contributions were unavailable on Google,
you're using X-No-Archive. to honour your choice, I will not quote
from your post.)

I agree with your suggestions, i.e. to allow tagging the information
for extensibility and to leave the RHS alone. nonetheless, taking a leaf
off XOVER, it may be a good idea to hardcode a standard selection for
efficiency. that is, the first three positions are always offset,
segment size and total size.

<nntpbin'offset'segment'total[more tags]''random@server>

tags would be the way you suggest:

'MXXXXXXXXXXXXXXXXXXXXXX
MD5-128 hash, BASE64 coded.

'mXXXXXX
MD5-128 hash, BASE64 coded, 36 LSB.

'iXXX
random ID which is common among parts (anything goes, but can
not contain ') this should help the pirates a little...

I still think it's a good idea to use an uncommon character as
separator to reduce the number of false matches. as I showed, the
dollar sign is among the more common characters used in Message-IDs.
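
for what it's worth, a throwaway parser for that layout (python).
since the format is only a proposal, the handling of the optional tags
is just one guess at a workable interpretation:

import re

NNTPBIN = re.compile(
    r"^<nntpbin'(\d+)'(\d+)'(\d+)((?:'[Mmi][^'@>]*)*)''([^@>]+)@([^>]+)>$")

def parse(msgid):
    m = NNTPBIN.match(msgid)
    if m is None:
        return None                 # not an nntpbin ID: treat as a normal article
    tags = {t[0]: t[1:] for t in m.group(4).split("'")[1:]}
    return {"offset":     int(m.group(1)),
            "segment":    int(m.group(2)),
            "total_size": int(m.group(3)),
            "tags":       tags,     # 'M'/'m' = MD5 variants, 'i' = common random ID
            "random":     m.group(5),
            "server":     m.group(6)}

# made-up example:
print(parse("<nntpbin'0'262144'10485760'mQ29mZmU''x7k3f9@news.example.com>"))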

Juergen Helbing

unread,
Sep 30, 2002, 10:13:31 AM9/30/02
to
Andrew - Supernews <andrew...@supernews.com> wrote:

>> The msgid is _identical_ for all parts - except for
>> the part-counter. This way we don't need any hash.
>
>and you lay yourself completely open to ID prediction attacks; anyone
>who wants to interfere with propagation of the file simply needs to
>preemptively post an article matching one of the later part IDs.

Now I am really curious how you want to prevent such "attacks"
by using a hash and/or offset/size. ;-))
It would be real fun for _everybody_ to trick out the complex
newsreader (and transit) routines which are desperately trying
to get all byte-ranges together.... And this would not even be msg-id
forging because everybody could easily use a different TLD.

It seems that we are close to sending all MIME headers as a
message id now - so perhaps we could even add PGP for security :->>


But you are raising a valid question:
As soon as we send out "additional" information (in which part
of a message ever) it might be subject to attacks.

Today this is extremely easy - because sending out messages
with an identical subject but confusing part identifiers can be
done by anybody who is able to type. And I have never even
noticed such an attack myself. A program which intercepts
msg-ids and automatically creates 'confusing' messages might
be possible

- but I don't believe that this would be a realistic scenario.

Do you ?

--
Juergen

Juergen Helbing

unread,
Sep 30, 2002, 10:14:33 AM9/30/02
to
Klaas <spam...@klaas.ca> wrote:

>Try searching for this link, which I mentioned in the discussion:
>http://usenet.klaas.ca/hash.jpg

That graph activated some unused parts of my brain.
Thanks a lot.

--
Juergen


Juergen Helbing

unread,
Sep 30, 2002, 10:23:49 AM9/30/02
to
Kjetil Torgrim Homme <kjet...@haey.ifi.uio.no> wrote:

>> It seems that we are starting the same thing as before: Every
>> little change and improvement is instantly used to become the
>> "total solution" of all problems. This results in very complex
>> definitions.
>
>this makes me very sad. you didn't learn ANYTHING from the yEnc
>disaster, did you?

If you believe that you can solve the case with complexity then
feel free to continue. I'm watching carefully what's going on.


btw: even the computation of a CRC32 is already a very complex task.

--
Juergen

Juergen Helbing

unread,
Sep 30, 2002, 11:17:42 AM9/30/02
to
Kjetil Torgrim Homme <kjet...@haey.ifi.uio.no> wrote:

>> The next consequence of "leaving out incomplete posts" could only
>> be that users stop to label their posts as multiparts.
>
>and how would reading tools cope with that? surely no one is going to
>piece together a hundred posts manually.

Usenet tends to find its own ways to bypass administrative limits.

>> >NNTP needs a mechanism for sending partial articles,
>> We are having this today - splitting messages. It does not work.
>it does work.

Nice to know. Today I missed two parts out of 2600.
They did not appear on any of the 30 Usenet servers I have access to.

>[...]


>so NNTP needs a mechanism to abort a
>command. but this is _very_ hard to get deployed globally.

OK, then let's drop changes to transits.

>the alternative, with small articles, just works.

I recommend that you visit alt.binaries.startrek.
The start of the new "Enterprise" episodes might be
a very good example of how "small articles just work".

Unlike the "usual Usenet noise", these are binaries
which the users _really_ want to have.
And not just users at the few dozen HQ NSPs.


>large files will not help, unless you do away with multiparts
>completely.

OK - I'm going to bury the idea of raising the posting limits
to avoid multipart problems.

>> >yes, something like that'd be nice. the pseudo top domain is good,
>> >perhaps we could have IANA reserve it officially.
>> Yes but this is not too important.
>so what do you do when .bin is allocated to something else?

OK - let's use .b or .$ - the exact combination is not important,
I don't insist on BIN. It seems even better to use something
which cannot be assigned at all.


>are there any advantages to using a period?

They look very similar in all fonts.

>> I dont see the need for offset and partsize.
>
>how do you suggest transit servers can implement priorities without
>offset and partsize?

If you want priorities at all then a transit could also identify
'important' messages by a simple serial number (which my
own 'partnumber' could be):

>offset and partsize is required information to implement something
>like this.

The "offset" is the 1-4 byte part-number. And the partsize - well
a transit should know how large the messages are.
You want to add a lot of information (and size) to the message
id - and I beleive that simpler methods - with less bytes -
could offer the same amount of required information.

I personally dont want to blow up the msg-id by more bytes
than absolutely necessary. And - btw - every transit or
news-server _always_ has the full message - including
all headers - available for all kind of "nice to have".


>> This belongs to the message / MIME-headers.
>useless for transit.

I could understand that offset/size would be nice for
those parties who want to join - or combine - or manage.
But all those applications will always need the FULL message.
(And I know the importance of offset/size. They are a part of yEnc!)

The check for "completeness" - and anything beyond it - does not
need this information (in the msg-id).


I am really not a specialist in high-performance transit servers.
So please work out the necessary details with Curt and those
who can discuss it at the same level.
But I myself am not convinced.


>parts are irrelevant. the same file can divided any number of ways,
>and an offset,size based news reader can piece them together
>regardless of part numbers in the repost.

Parts are relevant for the first distribution of a binary.
And reposts (even smaller ones) are - and will always be - made
based on parts - at best (today people are reposting RAR segments).

Sorry, I cannot agree with you.
Including reposts of varying sizes in the current debate about
completeness goes too far for me personally.
A future "Full Binary Usenet" might use such approaches.
But I see no path to it.


--
Juergen


Juergen Helbing

unread,
Sep 30, 2002, 11:21:31 AM9/30/02
to
Kjetil Torgrim Homme <kjet...@haey.ifi.uio.no> wrote:

>> The ".bin" extension indicates that this _is_ a binary. No text
>> messages must use this. And if it uses the BIN extension then it
>> will be dropped automatically - because it is malformatted.
>
>bzzzt. wrong, can't be done, it would be breaking the RFCs. articles
>with Message-ID which can't be parsed must be processed as a normal
>article.

There must be a strange mechanism which bypasses this RFC.
There are people who don't process binaries in text groups.

I believe the same strange mechanism would permit us
to prevent text from entering Usenet as binaries.

--
Juergen

Andrew - Supernews

unread,
Sep 30, 2002, 2:14:48 PM9/30/02
to
In article <3113g....@archiver.winews.net>, Juergen Helbing wrote:
> Andrew - Supernews <andrew...@supernews.com> wrote:
>>and you lay yourself completely open to ID prediction attacks; anyone
>>who wants to interfere with propagation of the file simply needs to
>>preemptively post an article matching one of the later part IDs.
>
> Now I am really curious how you want to prevent such "attacks"
> by using a hash and/or offset/size. ;-))

you can't. The only way to avoid the attack is to use unpredictable IDs.
Each message-id that you generate should have enough random data in it
(16 or so bits of entropy is enough) to make the attack impractical.

That means, unfortunately, that it is never possible to construct a safe
scheme where you can deduce the message-ids of following parts given the
first part.
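
A sketch of what I mean (Python): keep whatever markers you like for
grouping, but give every part its own random element. Everything here
is illustrative, not a proposed standard:

import secrets

def part_msgid(part, total, file_tag, host="news.example.com"):
    """Each part gets its own random element, so seeing part 1's
    Message-ID tells an attacker nothing about part 2's."""
    rand = secrets.token_hex(4)        # 32 bits of entropy, well above 16
    return f"<p{part}o{total}.{file_tag}.{rand}@{host}>"

# the three parts still share file_tag (so readers can group them), but
# each individual ID is unpredictable
ids = [part_msgid(i, 3, "abcd1234") for i in (1, 2, 3)]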

Brian Truitt

unread,
Sep 30, 2002, 3:39:07 PM9/30/02
to

"Juergen Helbing" <arch...@i3w.com> wrote in message >

>
> But the _real_ good question would be:
>
> How much NNTP traffic do your users create to _external_
> news-servers ? Do you also have any chance to count this volume ?
>

That'd be a bit of a nightmare to try to track. We'd have to monitor all
egress points of our network and examine all the packets to see if they were
going to a Usenet server. As several servers let you connect on multiple ports
to get around firewalls, this would also defeat just looking for port
119. Not to mention newsfeeds that go to port 119, or web-based newsreaders.
It would be nice to know though =)


Klaas

unread,
Sep 30, 2002, 5:25:20 PM9/30/02
to
After careful consideration, Juergen Helbing muttered:

> Klaas <spam...@klaas.ca> wrote:

You're welcome... I think.

-Mike

Klaas

unread,
Sep 30, 2002, 5:41:18 PM9/30/02
to
After careful consideration, Juergen Helbing muttered:

> cu...@kcwc.com (Curt Welch) wrote:
>
>>The other thing that would be helpful to put in is small hash of the
>>file contents.
>
> I dont see the need for a hash.
> It makes things very complicated.

No, it makes things marginally more complicated, while increasing
usability immensely.

>>Not long enough to identify the file (that would make
>>the Message-Id just too long) but one
>>which was long enough to greatly narrow down the number of articles
>>to search when trying to match up parts, or when trying to spot
>>a re-post of a file which you had missing segments for.
>
> My example was perhaps easy to misunderstand.
> Here is an example for a three-part message:
>
> <p1o3s4.abcd...@server.com.bin>
> <p2o3s4.abcd...@server.com.bin>
> <p3o3s4.abcd...@server.com.bin>

If we're going to maintain so many bits of like data among the msg-ids,
why /not/ make it a hash, when it would be so useful? (You'd want some
random element as well, ofc).

>>A large hash would be included in the headers so for any article which
>>looked like it might be the segment which was needed, the newsreader
>>could download the full header and see if the file hash matched. The
>>small hash in the Message-ID would greatly reduce the number of
>>full HEADs that had to be done to identify articles.
>
> In my example it would be possible to discard _all_ the headers
> for the part 2-n. This would also reduce the amount of multipart
> headers down to _one_ (the first one).

Sure, if you wanted to break compatibility with existing readers and
guarantee that the parts will be unusable on any server that didn't get
the first part.

> Newsreaders could find all the parts of a message by
> seeking by msgid. And so could news-servers if they are
> missing single parts: They could use additional "bypasses"
> to find missing parts.

A hash would allow servers and clients to search multiple posts for a
file, even if posted by different users in different groups. A server
could compare hashes and instead of storing a multipost twice, simply
store the headers for one and point both to the same location in the
spool. Imagine that!
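
A toy sketch (Python) of how little machinery that needs; the hash choice
and layout here are arbitrary:

import hashlib, os, tempfile

spool_dir = tempfile.mkdtemp()
bodies = {}        # content hash -> path of the single stored body
articles = {}      # message-id -> (headers, content hash)

def store(msgid, headers, body):
    h = hashlib.sha1(body).hexdigest()
    if h not in bodies:
        path = os.path.join(spool_dir, h)
        with open(path, "wb") as fh:
            fh.write(body)
        bodies[h] = path
    # a multipost costs only a second header entry, not a second body
    articles[msgid] = (headers, h)
    return bodies[h]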

>>If people match articles by hand, part counts are very useful, trying
>>to comprend offset and size data would be a real bitch. But the point
>>is that you want the computer to do this for you, and for the
>>computer, offset and size is what makes it easy.
>
> If I am understanding right what Kjell and you are discussing here
> then you want to combine multiple solutions into the message id.
> From the "simple and easy" approach to identify multiparts as being
> an entity you now also want to solve "the rest of the problems"....
>
> This would result in an "over complexified" message-id - which cannot
> be its purpose. The message id would be great to help transits, hosts
> and readers to know that 150 parts belong together. But all the
> "rest" should be done with other headers.

Why, necessarily? I can see things like the hash being used in
intelligent peering logic. While not every problem can be solved in one
go, there's no point in restricting ourselves to the narrowest possible
enhancement, which you seem intent on doing. If we're going to develop a
standard for encoding binary part data into the message id, we should
make it as comprehensive as possible while being as efficient as possible.


> It seems that we are starting the same thing as before:
> Every little change and improvement is instantly used to
> become the "total solution" of all problems. This results in
> very complex definitions.

If necessary, sure.

> Actually we are discussing _completeness_ here.
> And we are finding a way to identify multiparts properly.
> Mixing this topic with a lot more "nice features" might be
> possible but it would slow down the process - and lead in the
> wrong direction.

Juergen, the goal is not to patch one part of the problem as quickly as
possible. That type of thinking is why we're stuck with yEnc.

> The goal is to improve multipart distribution - and/or to filter out
> (these) binaries perfectly. Attempts to make "fills" and "reposts"
> easier cannot work because it would take years to change user
> behavior and newsreader functionality.

It didn't take years for users and developers to adopt yEnc. Where's
your argument now?

> I believe that we should concentrate on (and solve) incompleteness.
> Then all the other "nice trys" are even obsolete.

I don't understand what you mean by the last sentence.

-Mike

Klaas

unread,
Sep 30, 2002, 5:54:19 PM9/30/02
to
After careful consideration, Juergen Helbing muttered:

> "ZoSo" <zo...@voyager.net> wrote:


>
>>So, did you stop to think maybe Binary Usenet *should* be replaced by =
>>P2P networks?
>
> In the past most people agreed to the statement that the end of
> the binary newsgroups would be also the end of the text newsgroups.

I wouldn't mind a reference to that survey!

> You could try to forbid binary attachments to eMail.
> The eMail volume would be really reduced.
> Then let's see how long eMail would survive :-))

Plenty long, I wager. I only very rarely use mail for file transfer, and
I suspect I'm not the only one...

-Mike

Klaas

unread,
Sep 30, 2002, 5:56:47 PM9/30/02
to
After careful consideration, Juergen Helbing muttered:


> You can believe me: I am working with "binary identification"
> for three years now - and my actual "file database" for just one
> group is larger than 150 Megs (filename+size+crc). There are
> "false duplicates" all the time - and there are "false new triggers"
> all the time.

You should consider a better hash algorithm. CRC blows.

-Mike

Kjetil Torgrim Homme

unread,
Sep 30, 2002, 7:22:32 PM9/30/02
to
[Juergen Helbing]:

>
> Kjetil Torgrim Homme <kjet...@haey.ifi.uio.no> wrote:
>
> >> We are having this today - splitting messages. It does not work.
> > it does work.
>
> Nice to know. Today I missed two parts out of 2600.
> They did not appear on any of the 30 Usenet servers I have access to.

you snipped away this part (which came later):

it leads to incompleteness only when the system is overloaded.

okay, so Usenet has always been overloaded. that is a fact of life,
and we need to deal with it. (yEnc has shown that improving transfer
efficiency doesn't really help servers.)

> OK, then let's drop changes to transits.

good.

> >the alternative, with small articles, just works.
>
> I recommend that you visit alt.binaries.startrek. The start of
> the new "Enterprise" episodes might be a very good example of how
> "small articles just work".

yes, I know. I also run a small department server. I do offer a few
binary groups, but large multiparts will very seldom be complete. the
hardware simply can't keep up, and it's not a priority to offer
binaries, especially since very few groups with large files contain
files which can be distributed without breaking copyrights.

more annoyingly, since I have set the maximum article size to 512 KiB,
I'm getting

Fleetwood Mac Live at BBC part01.rar (55/55)
Fleetwood Mac Live at BBC part02.rar (55/55)
Fleetwood Mac Live at BBC part03.rar (55/55)
Fleetwood Mac Live at BBC part04.rar (55/55)
Fleetwood Mac Live at BBC part05.rar (55/55)
...

since only the tail part is smaller than my limit. this wouldn't be
so annoying if there was only one such tail part, but there are 69 of
them... so, I would love to be able to filter out these messages
without wasting I/O and CPU time. this is a real problem on my
server, it's old, and I can't get money to upgrade it.
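
the filter I'd like to run at feed time is trivial, assuming parsed
Message-ID fields like the ones proposed in this thread (a sketch in
python, not code for any real server):

MAX_FILE = 512 * 1024       # same figure as my article-size cap

def accept(fields):
    # refuse every part of a file whose *total* size is over the limit,
    # so the small tail parts of huge files no longer get stored alone
    return fields["total_size"] <= MAX_FILE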

> > so what do you do when .bin is allocated to something else?
>
> OK - let's use .b or .$ - the exact combination is not
> important, I don't insist on BIN. It seems even better to use
> something which cannot be assigned at all.

okay, .$ is good. Curt may be right that we don't really need to add
anything to the RHS, though.

> > are there any advantages to using a period?
>
> They look very similar in all fonts.

:-)

> If you want priorities at all then a transit could also identify
> 'important' messages by a simple serial number (which my own
> 'partnumber' could be):

you are right. it's not as flexible, though.

> I could understand that offset/size would be nice for those
> parties who want to join - or combine - or manage. But all those
> applications will always need the FULL message.

this is where you are wrong! many applications want to make the
decision without even downloading the headers. and worse, a transit
feed either gets just the Message-ID, or the complete message. my own
server throws away half the number of articles offered, and 90% of the
bandwidth is wasted since they turn out to be unwanted.

> > parts are irrelevant. the same file can divided any number of
> > ways, and an offset,size based news reader can piece them
> > together regardless of part numbers in the repost.
>
> Parts are relevant for the first distribution of a binary.
> And reposts (even smaller ones) are - and will always be - made
> based on parts - at best (today people are reposting RAR segments).
>
> Sorry, I cannot agree with you. Including reposts of varying
> sizes in the current debate about completeness goes too far for me
> personally. A future "Full Binary Usenet" might use such
> approaches.

if you can see it in the future, why should we not add support for it
today? remember that if this comes to fruition, it will be used for
many years.

Curt Welch

unread,
Oct 1, 2002, 12:45:00 AM10/1/02
to
arch...@i3w.com (Juergen Helbing) wrote:
> cu...@kcwc.com (Curt Welch) wrote:
> I understand this very well. But please answer me this question:

You never asked your question.

> As soon as one 10 Meg article came in and is completely in your
> memory then _all_ your outgoing feeds will now try to send this
> article out. If they all succeed then the 10 Meg article is also
> instantly removed from the cache. It is not necessary to have
> 1000 of these 10 Meg articles in memory. The size of your cache
> depends on the time lag between the incoming stream and the
> "accepted" message from your outgoing feedee.
> And I am pretty sure that one 10 meg article would be even
> _faster_ transferred than 100x100k articles because the feedee
> has less work with it.

It gets complex, but to simplify...

With a single 10Meg article, you need 10Meg of cache, i.e. enough room
to hold one article. With a stream of incoming 10Meg articles, you
need at least 20Meg of cache (one for the incoming, one for the outgoing).

With 1000 10K articles, coming in one at a time, and sent out as it comes
in, with no backlog, you need 20K of cache.

Same stream, same speed, in one case you need 20MB, the other 20KB. See
the difference?

Now what happens when you have 50 streams coming in and 50 going out (more
typical of a real transit server)? With 10MB articles, you need 1GB of
cache. With 10K articles, you need 1MB of cache.

With the small articles, you get to delete the first 10K article once it is
sent to all the peers, with the 10MB article, you can't really free up
space as it's sent, you have to wait until the last peer finishes sending
the entire thing before you free up any space.
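
Putting the same back-of-the-envelope numbers in one place (these are the
simplified worst-case estimates used above, not measurements):

def cache_bytes(article_size, streams_in=1, streams_out=1):
    # one article buffered per incoming stream and one per outgoing stream
    return article_size * (streams_in + streams_out)

print(cache_bytes(10 * 2**20))            # 1 in / 1 out, 10MB articles -> ~20MB
print(cache_bytes(10 * 2**10))            # 1 in / 1 out, 10KB articles -> ~20KB
print(cache_bytes(10 * 2**20, 50, 50))    # 50 in / 50 out, 10MB articles -> ~1GB
print(cache_bytes(10 * 2**10, 50, 50))    # 50 in / 50 out, 10KB articles -> ~1MB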

> The only reason for "larger caches" would be that a feedee fails.

Nope. Larger articles create larger caches as well.

> But in this case you are filling up the cache with exactly the
> same speed with 100x100 articles than with 10.000 articles.

That's mostly true. It fills up the same, but the one dealing with 10MB
articles will be about 9990KB larger than the one dealing with 10KB
articles for the same amount of backlog.

> And again: You are having a 'latency' time for 100 articles
> which is in summary far longer than for one 10 Meg article.

Not sure what your point is there. 10MB of data takes about the same amount
of time to send over a TCP connection whether it's one 10MB article or 1000
10KB articles. We are talking streaming feeds here so the acks for the
articles aren't a real factor like they are for IHAVE if that is what you
are getting at.

> Those people who are working on "transport theory" are
> always preferring "bulks as large as possible" over "small".
> They can even prove that the medium transfer capability
> is better.

It's the store and forward nature of the system that changes the dynamic to
"smaller is better".

> You dont believe in my statement:
> "If multiparts are a problem then avoid multiparts ?"
>
> LOL !

I don't believe increasing the size to 10MB and getting people to post
articles that size is going to help anything. All it will do is make the
10MB articles the first to be dropped.

> I am just afraid that 'Cidera' is technically not able to
> transfer these messages - am I right ?

The 512K limit is not a technical limit. It's a management limit. They
picked a random number which kept the feed size below the 20Mbits they were
willing to devote to Usenet. There are many many feeds on usenet that have
limits like that - but all at different values - for the sole purpose of
reducing feed volume.

> >Or, maybe if the data was encoded in the corret format, the news server
> >could actually combine multipart posts together on the fly, and make
> >them look as if they were single huge articles.
>
> Nobody prevents news-servers from doing this already today.

True, it could be done today.

> On the MyNews network you can already see subjects as:
> - filename.avi (all/590) :-)

I was talking about reducing XOVER data so the user doesn't have to
download 10,000 headers for a 10,000 part file. Of course any good UI will
not force the user to look at them all. Does MyNews combine the 590
articles together so that you only have to download a single header
from XOVER and a single huge article over NNTP?

> Yes - I agree.
> But the big question is how to make this happen _now_ and _easily_.

Yes, that's the hard one.

> I dont see any chance that reader-servers are changed to treat
> "files" differently within the next five years.
> But "my way" to label multiparts simply in the message id
> offers everything you need.
> I am already downloading only message-ids from 30 news-servers.
> The full XOVER line is only downloaded for those headers which
> I dont already have.
> As soon as I KNOW that <p4o567s56......> is just part 4 of the
> message <p1o567s56....> then I save a lot of header
> downloads. And all this can be done without the slightest
> modification to news-servers. Changing three autoposters
> and a few newsreader/binary grabbers would already have
> enormous effect.
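
Something like this would indeed be enough on the client side, taking the
<p1o567s56....> pattern from your example at face value (the pattern and
field meanings are assumptions, not a spec):

import re

PART = re.compile(r"^<p(\d+)o(\d+)(\S+)>$")

def group_parts(msgids):
    files = {}                  # (total, shared tail) -> {part number: message-id}
    for msgid in msgids:
        m = PART.match(msgid)
        if not m:
            continue
        part, total, tail = int(m.group(1)), int(m.group(2)), m.group(3)
        files.setdefault((total, tail), {})[part] = msgid
    return files

def complete(files):
    # a file is complete once every part number from 1..total has shown up
    return {k: v for k, v in files.items() if len(v) == k[0]}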

Fine. Maybe usenet wants to take another baby-step instead of making
real progress.

Curt Welch

unread,
Oct 1, 2002, 12:51:19 AM10/1/02
to
arch...@i3w.com (Juergen Helbing) wrote:
> Bas Ruiter <<lord...@home.nl>> wrote:
>
>From a user's point-of-view, being able to positively identify a file
>BEFORE downloading any part of it is HIGHLY desirable!
>
> A binary hash or CRC code does not help a bit.
> It would only prevent you from downloading an identical
> file twice.
> Today pictures are "manipulated" to make users download
> them again.

Why on earth would they do that? Are you talking about spam or what?

> You know perhaps that "Morpheus", "Kazza" and "eDonkey"
> already permit "ahead identification" - by title, artist, album,
> even by CRC. This does not help in any way. On the contrary.
> People started to "fool" these systems long ago.

Why do they do that? If the users want to collect files, why would they
break the systems that make it work better? What does someone posting
a file have to gain by forcing people to download it again?

I can understand why spammers do this to prevent their spam from being
spotted. But I don't understand why people trading and collecting pictures
would do that.

Curt Welch

unread,
Oct 1, 2002, 1:02:50 AM10/1/02
to
arch...@i3w.com (Juergen Helbing) wrote:
> In the past most people agreed to the statement that the end of
> the binary newsgroups would be also the end of the text newsgroups.

Only the over-inflated ego of a binary-addict would think that. The text
groups are so low-cost to support that they would be free for everyone that
wanted to use them (as they mostly are now anyway).

itsy bitsy meowbot

unread,
Oct 1, 2002, 1:11:51 AM10/1/02
to
Curt Welch wrote:

> arch...@i3w.com (Juergen Helbing) wrote:
>> You know perhaps that "Morpheus", "Kazza" and "eDonkey"
>> already permit "ahead identification" - by title, artist, album,
>> even by CRC. This does not help in any way. On the contrary.
>> People started to "fool" these systems long ago.

> Why do they do that? If the users want to collect files, why would they
> break the systems that make it work better? What does someone posting
> a file have to gain by forcing people to download it again?

One reason files are mislabeled is to try to circumvent some of the DRM
attempts the P2P networks have tried to use.

A second set of mislabelings comes from people working for copyright
holders, attempting to frustrate collectors.


> I can understand why spammers do this to prevent their spam from being
> spotted. But I don't understand why people trading and collecting pictures
> would do that.

Trolls and religious fanatics (or possibly trolls posing as religious
fanatics) are known to play games like that from time to time.

Juergen Helbing

unread,
Oct 1, 2002, 12:23:43 AM10/1/02
to
Andrew - Supernews <andrew...@supernews.com> wrote:

>> Now I am really curious how you want to prevent such "attacks"
>> by using a hash and/or offset/size. ;-))
>
>you can't. The only way to avoid the attack is to use unpredictable IDs.
>Each message-id that you generate should have enough random data in it
>(16 or so bits of entropy is enough) to make the attack impractical.
>
>That means, unfortunately, that it is never possible to construct a safe
>scheme where you can deduce the message-ids of following parts given the
>first part.

And what would be the conclusion ?
Is it worth caring about ?
Would it be an argument against multipart identification ?

CU
--
Juergen
