
False Threading


Isis

Jan 9, 2007, 2:26:21 PM
Occasionally when I download a group of binary headers, one will at
first appear to be missing.

Example: I just refreshed headers & Agent said there were 13 new
unread headers. I only saw 12.

I found the 13th header by displaying unread headers & it was at the
bottom. It turned out to be threaded under an old header nearly three
months old. Since the subject & author bore no relation to each other,
it patently didn't belong there.

Turning off threading & re-sorting by different parameters will not
dislodge it. The only way I have found to 'fix' it is to delete the
parent header, which is not desirable.

This happens only occasionally (once a week?), so I doubt it is a
setting.

Isis

No Body

Jan 9, 2007, 2:21:53 PM

That's "hash collision". Agent creates a "hash" for the subject line and
uses that to decide which messages should be threaded together.
Unfortunately, in a group with a lot of messages, two completely different
headers will sometimes generate the same hash, so Agent thinks they belong
together.
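
A minimal sketch of how such a collision can arise. This is not Agent's actual code: the 16-bit width, the truncated CRC-32, and the `short_hash` and `first_collision` names are all assumptions made for illustration, and the header values are made up.

```python
import zlib

HASH_BITS = 16  # hypothetical width, kept small so a collision turns up quickly


def short_hash(key: str) -> int:
    """Truncate CRC-32 to HASH_BITS bits (a stand-in, not Agent's real hash)."""
    return zlib.crc32(key.encode("utf-8")) & ((1 << HASH_BITS) - 1)


def first_collision(keys):
    """Return the first pair of distinct keys that share a hash, or None."""
    seen = {}
    for key in keys:
        h = short_hash(key)
        if h in seen and seen[h] != key:
            return seen[h], key
        seen[h] = key
    return None


# Made-up header values. With 2**16 + 1 distinct keys and only 2**16 possible
# hash values, at least two keys must land on the same hash.
keys = (f"<msg{i}@example.invalid>" for i in range(2**HASH_BITS + 1))
pair = first_collision(keys)
print(pair)
```

A threader that trusts the short hash alone would treat the two strings in `pair` as the same message, which is the kind of false parent described above.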

Isis

Jan 9, 2007, 11:41:07 PM
On Tue, 09 Jan 2007 14:21:53 -0500, No Body <No...@Nowhere.null> wrote:

: That's "hash collision". Agent creates a "hash" for the subject line and
:uses that to decide which messages should be threaded together.
:Unfortunately, in a group with a lot of messages, two completely different
:headers will sometimes generate the same hash, so Agent thinks they belong
:together.

Thanks for the info. The group in question has only ~10K headers; the
reason I notice it is I refresh often & the new header count is
manageable.

On a larger group with numerous new headers, the problem might well go
unnoticed. I suspect this is happening more than one might think.

A more unique (lengthy) hash ID might be an improvement.

Isis

Ralph Fox

Jan 10, 2007, 4:37:55 AM
On Tue, 09 Jan 2007 22:41:07 -0600, in message <v7r8q2luoom9i2g5f...@4ax.com>,
Isis wrote:

> On Tue, 09 Jan 2007 14:21:53 -0500, No Body <No...@Nowhere.null> wrote:
>
> : That's "hash collision". Agent creates a "hash" for the subject line

Not for the subject line: Agent creates the hash for the MID in the
"Message-ID" header, and for each of the MIDs in the References header.

> :and
> :uses that to decide which messages should be threaded together.
> :Unfortunately, in a group with a lot of messages, two completely different
> :headers will sometimes generate the same hash, so Agent thinks they belong
> :together.
>
> Thanks for the info. The group in question has only ~10K headers; the
> reason I notice it is I refresh often & the new header count is
> manageable.
>
> On a larger group with numerous new headers, the problem might well go
> unnoticed. I suspect this is happening more than one might think.


See this message for a table of probabilities
http://groups.google.com/groups?selm=eiusft4mchja40rpfhdf3019o59sgtooj9%404ax.com

If you are interested in how such probabilities are calculated,
see http://en.wikipedia.org/wiki/Birthday_paradox#Generalization
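
As a rough sketch of that calculation, the usual birthday approximation is p ≈ 1 - exp(-n(n-1)/(2N)) for n hashed items and N possible hash values. The hash widths below are illustrative assumptions only; Agent's actual hash width is not stated in this thread, and the function name is made up.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2N)), N = 2**hash_bits.
    """
    buckets = 2.0 ** hash_bits
    # expm1 keeps precision when the probability is very small.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * buckets))


# ~10K hashed items, as in the group mentioned above; widths are hypothetical.
for bits in (16, 32, 64):
    print(f"{bits}-bit hash: {collision_probability(10_000, bits):.3g}")
```

Since each MID in a References header is hashed as well, the number of hashed items can be larger than the number of headers in the group.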


> A more unique (lengthy) hash ID might be an improvement.

Indeed so.
However, this poses a technical obstacle for users who upgrade.
  A. If the database is not upgraded to have the new hashes,
     then new messages won't thread with old.
  B. Yet, if the database contains headers without bodies, then
     it cannot be upgraded -- the information just is not there
     in the database to calculate all the new hashes for
     "References" MIDs.
This isn't an insurmountable problem, but it may mean that
a fix will not come without a change to the database format
including the IDX files.


--
Cheers,
Ralph


Isis

Jan 10, 2007, 6:56:24 AM
On Wed, 10 Jan 2007 09:37:55 +0000, Ralph Fox <-...@-.invalid> wrote:

:On Tue, 09 Jan 2007 22:41:07 -0600, in message <v7r8q2luoom9i2g5f...@4ax.com>,

Thanks for the relevant info. The 'Birthday Problem' had already
occurred to me, & I had intended to mention it in any future response,
i.e., the odds are better than 50% that in a random group of 23 people
at least one pair will share the same birthday.
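
The 23-person figure can be checked with a short exact calculation. This sketch assumes 365 equally likely birthdays and ignores leap years; the function name is made up.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Exact chance that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for k in range(people):
        # Person k must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for n in (22, 23):
    print(n, round(shared_birthday_probability(n), 4))
```

With 22 people the chance is still below one half; 23 is the first group size that crosses it.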

There is a simple magic trick based on this, where you give two people
each a deck of cards and have them turn over one card at a time. At
some point they will almost invariably each turn up the same card. You
then show them a third deck where their matching card is mysteriously
the only face-up card in that deck. (That last bit involves a bit of
trickery, but that they will share a pair of matching cards is almost
a certainty.)

I guess the only near-term solution to this problem is to always view
unread headers, so as not to miss a mis-threaded one.

Isis

Tom Plunket

Jan 10, 2007, 4:51:49 PM
Isis wrote:

> I found the 13th header by displaying unread headers & it was at the
> bottom. It turned out to be threaded under an old header nearly three
> months old. Since the subject & author bore no relation to each other,
> it patently didn't belong there.

Did the references line refer to the "parent" message? That's something
that irritates me to no end, and I've always wished that Agent would
allow the user to edit the received message headers to fix up stuff like
this.

There's also the option to "start new thread when subject changes,"
which would likely solve your problem entirely.


-tom!

--

Loren Pechtel

Jan 13, 2007, 2:08:07 PM
On Tue, 09 Jan 2007 14:21:53 -0500, No Body <No...@Nowhere.null> wrote:

> That's "hash collision". Agent creates a "hash" for the subject line and
>uses that to decide which messages should be threaded together.
>Unfortunately, in a group with a lot of messages, two completely different
>headers will sometimes generate the same hash, so Agent thinks they belong
>together.

Bad engineering!

Yes, hashes are very nice--but you check the results to see if they
make sense!
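
One way to sketch that check: use the hash only to narrow the search, then compare the full Message-ID before accepting a parent. The `ThreadIndex` class and `short_hash` helper are hypothetical, not Agent's design, and the sketch assumes the full Message-ID is stored and available for comparison, which Ralph's post suggests is not always the case for "References" MIDs.

```python
import zlib
from collections import defaultdict


def short_hash(mid: str, bits: int = 16) -> int:
    """Truncated CRC-32; a stand-in for whatever hash the newsreader uses."""
    return zlib.crc32(mid.encode("utf-8")) & ((1 << bits) - 1)


class ThreadIndex:
    """Hypothetical index: hash -> full Message-IDs stored under that hash."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, mid: str) -> None:
        bucket = self._buckets[short_hash(mid)]
        if mid not in bucket:
            bucket.append(mid)

    def find_parent(self, referenced_mid: str):
        """Return the stored Message-ID only on an exact match, else None."""
        for candidate in self._buckets.get(short_hash(referenced_mid), []):
            if candidate == referenced_mid:  # the sanity check on the hash hit
                return candidate
        return None
```

With this layout a hash collision costs one extra string comparison instead of producing a false parent.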
