triples distance analysis

Rudi Cilibrasi

unread,

Mar 26, 2007, 6:45:40 AM3/26/07

to Computer-Assisted Stemmatology Challenge

I had a thought about the "triples distance" defined on
the contest website. I notice that it counts any sign discrepancy
identically. But isn't it true that a disagreement of
(0,1) or (0,-1) is less than a disagreement of (1,-1)? Shouldn't the
former count less than the latter? What is wrong with a formula like
abs(sign(x) - sign(y)) / 2.0
it seems like it would be a more sensitive distance measure to me. In
that "close calls" count less than "far misses" for each triple. What
do you think?

troos

unread,

Mar 26, 2007, 7:32:58 AM3/26/07

to Computer-Assisted Stemmatology Challenge

On Mar 26, 1:45 pm, "Rudi Cilibrasi" <cilib...@gmail.com> wrote:
> I had a thought about the "triples distance" defined on
> the contest website. I notice that it counts any sign discrepancy
> identically. But isn't it true that a disagreement of
> (0,1) or (0,-1) is less than a disagreement of (1,-1)? Shouldn't the
> former count less than the latter? What is wrong with a formula like
> abs(sign(x) - sign(y)) / 2.0

Yes, you are right. I have changed the score to your suggestion. It
seems to be somewhat more sensitive.

Thanks for the suggestion!

Teemu

ilias

unread,

Apr 2, 2007, 6:50:09 AM4/2/07

to Computer-Assisted Stemmatology Challenge

Good day,
can you please provide your implementation of the distance calculation
algorithm?
You have not specified whether the A,B,C nodes should be distinct,
whether a triplet (A,B,C) is considered a set or an ordered tuple.
These two parameters alter the outcome of the algorithm, as the sum
differs.

In our implementation we have used the distinct, ordered case and the
result for the given example is different from the supplied similarity
percentage. In addition, there is no baseline method defined, so we
have implemented a method that creates random graphs and we have
measured that method's performance over a set of iterations. If you
help us make sure what the actual algorithm for the distance
calculation is, we shall be able to have a result over baseline
performance as well, and test if all methods do better than that
baseline.

Thank you!

troos

unread,

Apr 2, 2007, 7:15:59 AM4/2/07

to Computer-Assisted Stemmatology Challenge

Hi ilias,

Good questions! I have now uploaded my implementation of the distance
function on the Challenge page (under Example). The program is written
is C. (The parsing of the input graphs is in a terrible shape and will
only work for some files conforming to the DOT format. Apologies for
that.)

The average is taken over all *distinct* *ordered* triplets, so for
three nodes the triplets are: (A,B,C), (A,C,B), (B,A,C), (B,C,A),
(C,A,B), (C,B,A).

Let me know if you still get a different score (from 61.8 %) in the
example case.

The program includes an option (just leave the second parameter empty)
to create certain kind of random graphs: the nodes are randomly
permuted and connected so as to form a complete circle passing through
each node. Other kind of 'random' graphs may produce different
results. This one gives about 50% score when the true graph of Data-
set #2 is used as the correct solution.

Teemu

George Giannakopoulos, aka PCKid

unread,

Apr 11, 2007, 11:06:32 AM4/11/07

to Computer-Assisted Stemmatology Challenge

Hello Teemu.

Ilias and I have used your distance metric calculation implementation
in our experiments and a problem seems to have emerged.
I have (manually) written dot files representing the example result
graph as well the correct graph. The comparison between the two files
using the rankdistance executable does not returned the expected
(68...%) result. It returns 72.649%. I have double checked the graph
and it is definitely correct. I will send you be e-mail the files for
you to check. It will be a problem if the distance metric calculation
is buggy, so please check as soon as you can.

Thank you a priori.
George G.

troos

unread,

Apr 11, 2007, 1:40:05 PM4/11/07

to Computer-Assisted Stemmatology Challenge

George,

The program is OK but I had been sloppy enough to use the wrong graph
file. I fixed the problem and corrected the score to 72.649%, exactly
as you had obtained. (I should have noticed that something was wrong:
61.8% is worse than an empty graph gives!)

Big thanks for pointing out my mistake! Apologies for the confusion.

Teemu

On Apr 11, 6:06 pm, "George Giannakopoulos, aka PCKid"

Reply all

Reply to author

Forward