http://206.65.59.246/VTES/member.tpl
--
Aaron
The Nosferatu Stuff
>Hey, what did they do with the stats. I just went to prove to someone that
>James had a crappy record, and I find that he is at zero....and I'm at zero.
>New look, new design. Whats that all about? INFO?
>http://206.65.59.246/VTES/member.tpl
New set, new stats.
As requested.
When the person who volunteered to deal with the ratings is allowed to
begin entering tournament results again, you will have a much better
picture of the players and of their histories. You will also have a
much better picture of the people running your events as you will be
able to tell at a glance if they sent in the tournament results on
time or not.
Three cheers for Conrad & the ratings updater!!
Carpe noctem.
Lasombra
The timing is just a coincidence but one that I'm happy about.
The old rating system required that each tournament be calculated in
chronological order. If tournament "C" didn't turn in their Archon reports in a
timely manner, tournaments "A" and "B" couldn't yet be updated. Also, the old
rating system required a lot of manual work. The number of tournaments has
grown so much that keeping up with the ratings had the potential to overwhelm
those processing them. Lastly, as was mentioned in this newsgroup, there were
flaws with the old rating calculation itself.
The new system is stats-based. It is not dependent on chronological processing
of results. It is something that we can maintain even as we experience more
growth.
-Robert
Robert Goudie
Chairman, V:EKN
rob...@vtesinla.org
I don't understand. Are they not going to do Elo any more?
Fred
Nope. No more Elo. Players will be ranked based on their Game Wins per Game.
It's not much more complex than that. Tie-breakers, minimum games played, and
other details can be found by following the link to the Ratings FAQ from the
main ratings page.
http://206.65.59.246/VTES/member.tpl
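For the curious, the ranking as described is simple enough to sketch in a few lines. This is only an illustration of "Game Wins per Game"; the player names and numbers are made up, and the tie-breaker details live in the Ratings FAQ, not here:

```python
# A minimal sketch of the new "Game Wins per Game" ranking described above.
# Names and numbers are invented for illustration.

def gw_per_game(game_wins, games_played):
    """The rating is simply game wins divided by games played."""
    return game_wins / games_played if games_played else 0.0

# (name, game wins, games played)
players = [("A", 6, 20), ("B", 3, 8), ("C", 6, 24)]

ranked = sorted(players, key=lambda p: gw_per_game(p[1], p[2]), reverse=True)
# B (0.375) ranks ahead of A (0.300), who ranks ahead of C (0.250)
```

Note that, unlike Elo, the order in which the games were played never enters into it.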
-Robert
Arg! That makes it a lot less meaningful. Well, it _was_ meaningless before, due
to the fact that the selection of constants meant that only the most recent games
mattered. This has the opposite problem: a game you played as a
rank beginner is just as important as yesterday's game. But what's really going
to prevent me from putting much stock in this one is that it makes no adjustment
whatsoever for quality of competition. A game between you, your two children and
your neighbor's dog (playing a CL deck, no doubt) will count the same as playing
in Continental Finals.
Not worthless, I suppose. At least we'll more or less know who's top dog in
each of the various localities...
Fred
Think baseball stats. Career averages. ERA. Batting Avg. etc. None are adjusted
for quality of opponent, weighted based on improvements in construction of
equipment, altitude of the parks, etc.
> Not worthless, I suppose. At least we'll more or less know who's top dog
> in each of the various localities...
And limited rankings too! :)
-Robert
The most common ones discussed are annual statistics, not lifetime. (At least,
that's my perception.) Will these be reset from year to year?
The other things matter relatively little compared to the VEKN situation. Even
with unbalanced schedules, all the teams in the league will play a certain amount
of games against one another, and in each other's home stadiums. American League
stats are seldom compared directly to National League, as the two are essentially
different games.
Contrast these vs. VEKN ratings, where many players will seldom or never play
anyone except those in their home cities. Some will travel occasionally. And
Jeff Thompson will eventually play on other planets when he runs out of new
cities to visit. Like the previous ratings, these numbers will have some
arguable meaning but on whole, not really tell us very much.
Fred
::
:: But what's really going to
:: prevent me from putting much stock in this one is that it makes no
:: adjustment whatsoever for quality of competition. A game between
:: you, your two children and your neighbor's dog (playing a CL deck,
:: no doubt) will count the same as playing in Continental Finals.
I'm very happy with this new system.
With the previous one it was very difficult to be on top of the rankings if
you were a seasoned player. I was the French player(*) who entered the
biggest number of tournaments (47), and I was also at the bottom of the
rankings, not really due to my poor playing skills, but because a lot of
other guys had been involved in fewer tournaments (say, 15 to 25
participations). A lot of players also won a handful of tournaments, then sat
in the top three and refused to play anymore to keep their (illusory) lead on
the pack. Now things will change. The more you play, the better your odds
of gaining places in the ranking. And that's quite good.
:: Not worthless, I suppose. At least we'll more or less know
:: who's top dog in each of the various localities...
mmm... expect big surprises here ;)
:: Fred
(*) no, Mr Rebstock, you're not a French player anymore ;)
> And limited rankings too! :)
When does this go into effect? Will GenCon be rated? If so, I
want to rate the draft at WoN! :) I love draft.
Ratings are still pretty pointless, but I guess it's good that we got a
reset. Is that going to happen every year? Maybe then it should be
lifetime stats and this year's stats. I think baseball does something like
that?
It's in effect now. GenCon will be rated. WoN Draft sounds fine to me.
-Robert
Great...and right after I win my first tournament ever...
Me? 2948 last time I checked it.
Message-ID: <Vk9xS1Qb...@gratiano.zephyr.org.uk>
Life keeps getting in the way, so it shouldn't have changed since then,
which is a pity as I'd been doing a bit better at that point. My cards
live in Birmingham (where I am currently), but I spend over half the
year in Cambridge, without a car because of University regulations. As
a result, getting home is about 4 hours by train due to the wonders of
Dr Beeching (Cambridge->Birmingham New Street->Sutton Coldfield - allow
about 20 minutes for delays on the Cambridge->Birmingham leg).
Then most of the tournaments are down south, which is even more
travelling, so going to tournaments during term-time - unless it's
sealed deck or pre-built and I brought it with me to Cambridge - takes
a pretty huge chunk of time out when I have commitments in
Cambridge (some of which I'm trying to whittle down). And then outside
of term-time, life intervenes again and the two obvious tournaments
outside of term-time which I could play in are the European Qualifiers
and GenConUK - both of which I judge.
Why do I judge them? Well, I'm a purely average player. My rating
would be a bit higher than it is, if I could play a bit more regularly,
but I'm never going to be a top-ranked player - the main reason for
which is that I don't adapt fast enough during play. It's a problem I
have in a number of such games - like Civilisation and such. I want to
play how I want to play, and I don't see the problems as fast as I need
to, to be a significantly above-average player. I can do most other
things reasonably enough.
Since I only get to play in tournaments infrequently, fiddling with
decks to oblivion can get a bit frustrating. So I end up playing with
decks I haven't had a chance to test enough, which is frustrating.
Hopefully, at some point, I'll get my life back. Or work out how to
move thousands of cards back and forth to Cambridge easily.
--
James Coupe
PGP 0x5D623D5D You don't need to hear it but I'm dried up
EBD690ECD7A1FB457CA2 and sick to death of love.
13D7E668C3695D623D5D
I doubt that. I'm pretty sure it was because you weren't doing very well in
your recent performances. There was no penalty for playing lots of games in
the previous system. Indeed, with the constants set as they were, it hardly
made much difference how many games you'd played. All that mattered was
how you'd done and who you'd played in your most recent few games.
I certainly wasn't defending the previous rating system. I'm just
complaining that a previously flawed system has been exchanged - with a
great deal of labor and fanfare, it appears - for a new flawed system,
when a simple change to the old system would have fixed the problem.
It's the sort of thing that makes me bang my head on my computer table.
> A lot of players also won a handful of tournaments, then sat
> in the top three and refused to play anymore to keep their (illusory) lead on
> the pack. Now things will change. The more you play, the better your odds
> of gaining places in the ranking.
I'm sorry, you're seeing something in these rankings I'm not seeing. Ranking
is dependent on game wins PER GAME. Actually, the problem you cite still
exists. If someone gets a few excellent results and cares about their ranking,
they would do well to retire immediately.
Fred
I admit my performances were not very good lately...
But sometimes in a tournament you get crushed by a very skilled predator. Why
should other players who have nothing to do with your defeat (except by not
helping you) gain ranking points on your dead body? I had some tournaments
where I put in good performances (last year, in LA, I went to the finals), but it
still didn't raise my ranking...
(snip)
::: A lot of players also won a handful of tournaments, then sat
::: in the top three and refused to play anymore to keep their
::: (illusory) lead on the pack. Now things will change. The more
::: you play, the better your odds of gaining places in the ranking.
::
:: I'm sorry, you're seeing something in these rankings I'm not seeing.
:: Ranking is dependent on game wins PER GAME. Actually, the problem
:: you cite still exists. If someone gets a few excellent results and
:: cares about their ranking, they would do well to retire immediately.
If he retires, someone with the same number of VPs but more participations
will go above him in the ranking =)
It's that simple.
You're still on the idea that the constants affected that problem?
As previously countered, they do not. Changing the K value (that's
the 32 in the "Change in Rating" formula) changes the magnitude of
the swings, but doesn't change anything w.r.t. relative ratings
(except inasfar as it allows roundoff error to creep in).
Changing the S value (that's the 400 in the exponent of the
denominator of the probability formula) has a similar effect.
Starting at 3000 and then swinging between 3100 and 2900 based on
recent performance is no different than starting at 3000 and then
swinging between 4000 and 2000 or between 3006 and 2994 based on
recent performance (under the assumption that the constant is
the same for everyone so everyone is swinging in the same range).
Elo systems inherently are not affected by number of games played.
They're zero-sum systems.
If you win your rating goes up.
If you lose your rating goes down.
How much it goes up or down depends on the ratings of your opponents
and the constants.
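The zero-sum update just described can be sketched like this. This is a simplified two-player form with the K = 32, S = 400 constants named above; the real system paired each player against every other player at the table:

```python
# Sketch of the old Elo-style update: K = 32, S = 400, zero-sum.
# Two-player form only, for illustration.

def expected(r_a, r_b, S=400):
    # Modeled probability that player A beats player B.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / S))

def update(r_a, r_b, a_won, K=32, S=400):
    # Whatever A gains, B loses (and vice versa): zero-sum.
    delta = K * ((1.0 if a_won else 0.0) - expected(r_a, r_b, S))
    return r_a + delta, r_b - delta

# Two players at the 3000 starting rating: winner gains 16, loser drops 16.
```

The number of games played never appears in the formula, which is the point being made: only the opponents' ratings and the constants set the size of each swing.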
> I certainly wasn't defending the previous rating system. I'm just
> complaining that a previously flawed system has been exchanged - with a
> great deal of labor and fanfare, it appears - for a new flawed system,
> when a simple change to the old system would have fixed the problem.
>
> It's the sort of thing that makes me bang my head on my computer table.
Please supply values for the constants (or whatever your "simple change"
is) that would fix that particular flaw.
--
LSJ (vte...@white-wolf.com) V:TES Net.Rep for White Wolf, Inc.
Links to V:TES news, rules, cards, utilities, and tournament calendar:
http://www.white-wolf.com/vtes/
Well, any rating system will probably be flawed in one respect or
another. If there's a perfect rating system I don't know what
it would be.
But the reason the ELO system was done away with wasn't *just*
because it was flawed in weighting most-recent games too heavily.
It was also (so I'm told) a huge amount of work to run the ELO
system: there was a lot of manual inputting of results, plus it
required results to be entered in order of event date (so if
results went unreceived, ratings couldn't be updated until that
tournament organizer sent them in, unless the event was to be
entirely ignored).
I think it'd be cool to have an ELO-rating (with improved
constants) available in addition to the "wins per game" that
we're apparently going to be seeing soon. But I can certainly
live without it, if it would be too much effort to be practical.
Even an improved-constant ELO system would have flaws of its
own.
And, the new system can (I would think) easily be expanded to
include more statistics, like "current season GW percentage"
and so on.
Josh
buried in this bare bed
I thought you had agreed that the constants did affect the problem.
If either factor is flattened, that makes it easier for your
rating to go up again after winning, because the system doesn't
assume you "ought to" win as often - probably appropriate in a
game with a substantial luck factor as well as a "hot or cold"
effect in that if you win a five-player game, you beat four
people, but if you lose one you may have lost to four people.
Curt Adams even did some empirical testing and found that
reducing the "K" helped quite a bit in ranking players more
consistently with their "underlying skill".
http://groups.google.com/groups?selm=20020131013456.28722.00000519%40mb-cv.aol.com
http://groups.google.com/groups?selm=20020204235941.11248.00002036%40mb-cg.aol.com
Josh
not that i'm hugely attached to elo ratings
The K is a scale.
Changing from 32 to 4 is, mathematically, exactly the same as
changing the 3000 starting point to 24000 and the 400 to 3200 and
rounding the results to the nearest multiple of 8.
I suppose the key could lie in the ratio of K to S. But I have
no idea what sort of argument would mathematically bear that out.
I don't think it matters whether you do or don't change the 3000.
You could equally well make it 0; the only time "existing rating"
is used is to figure the difference between two players' ratings,
so any starting point is equivalent.
As you say (and Curt Adams wrote in those posts), changing K or
S should have equal effects when changed in the same ratios.
> I suppose the key could lie in the ratio of K to S. But I have
> no idea what sort of argument would mathematically bear that out.
S tells you what your "win probability" is supposed to be when
you play against some opponent.
K tells you how much your rating can change - which means how
much your expected win probability can change - from any single
result.
In combination, they determine how much a single result affects
"how good we think you are", eg how much winning once should
imply that you will win again. The smaller the S/K ratio, the
more any individual result affects your expected win probability.
The higher the S/K ratio, the less a single result affects your
expected win probability.
In my attempts to understand the formula, it seems that the old
formula produced so much volatility because it implied that a
person with a rating 200 points above starting position should
beat a person at starting position 75.9% of the time, gaining
8 points if he won or losing 24 if he lost. In a game with this
much luck/seating-position/etc influence that's not (IMO)
realistic.
Increasing the S/K ratio (to eg 3200/32) would reduce that
expected win percentage to 53.6%, and gaining 15 points or losing
17. It's possible that that's too much reduction, but it's hard
for me to tell without seeing actual outcomes.
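Those percentages are easy to check numerically; this just plugs the numbers above into the standard Elo expectation formula:

```python
# Verifying the figures above for a player rated 200 points over an opponent.

def expected(diff, S):
    # Expected win probability given a rating difference and the S constant.
    return 1.0 / (1.0 + 10 ** (-diff / S))

K = 32
e_old = expected(200, 400)                       # about 0.759 with S = 400
e_new = expected(200, 3200)                      # about 0.536 with S = 3200
gain_old, loss_old = K * (1 - e_old), K * e_old  # about +8 / -24
gain_new, loss_new = K * (1 - e_new), K * e_new  # about +15 / -17
```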
You might say that if S/K was 3200/32, all that would do is
make people's ratings fluctuate in a narrower band, but Curt
Adams' results seem to indicate that that's not the case - that
in fact increasing the S/K ratio would allow "good" players'
ratings to more accurately reflect that they are "better" than
the "bad" players.
Josh
unsung and unsaid
(I'm assuming we're using "K" to represent the multiplier and "S"
to represent the divisor of the difference in player ratings.)
That is precisely what I've been saying all this time. No one has
convinced me that I'm wrong about it and I'm pretty sure most of the
people who've followed my reasoning have agreed with it.
The ratings swing, per result, is based on the differences between
the two players' ratings and the constants. If you reduce the value
of K with respect to S, you reduce the influence of the difference
between the two players' ratings in each ratings swing. This is
good because it allows ratings to go higher (for the better players)
and lower (for guys like me) before the difference in ratings swings
starts to allow me to get enough extra points back for times I get
lucky to make up for all the drubbings I take from being bad. The
reason this is important is that it means a single recent game means
less to the overall ratings.
The downside of reducing this ratio is that it takes more results for
the players (especially the extremely good and bad ones) to get to
their true places on the spectrum. Reduce the ratio too far and it
will take hundreds of games, further and it will take thousands of
games. As the ratio approaches zero, the effect of differences between
two players' ratings will not matter and ratings will never reach
equilibrium (assuming "K" itself is not zero). So you don't want to
overdo it.
However, the nature of the game being what it is, and the way the
results are gained (six results in a four-player game, ten in a five-
player game), I would argue that it's appropriate that it take a while
to reach equilibrium. As it was, we were reaching equilibrium in two
or three games and bouncing so much higher and lower than our true
approximate ratings that we were constantly bouncing above and below
players of much differing abilities.
By the way, the 3000 number is really meaningless. You can make it
anything you want. It's really just window-dressing to compare to the
"real numbers" - which are the positive and negative rating changes
the formula spits out. I would say you want to set it higher than
the worst negative rating would ever be. In our old system, since
ratings almost never went much below -300, 500 would have been fine.
If you were to reduce "K", 500 would still be fine. If you increased
"S", 500 times the increase of the "S" factor would be fine.
Fred
The one has little to do with the other. In a multiplayer game, people will
gain advantages they don't deserve at times due to an interaction they had
nothing to do with. Can't be helped. But your good performances likely helped
your rankings just fine. It's just that your rating then rose for a moment,
making one bad slip-up cost you far more than you deserved - a flaw that fixing
the constants would solve.
> ::: A lot of players also won a handful of tournaments, then sat
> ::: in the top three and refused to play anymore to keep their
> ::: (illusory) lead on the pack. Now things will change. The more
> ::: you play, the better your odds of gaining places in the ranking.
> ::
> :: I'm sorry, you're seeing something in these rankings I'm not seeing.
> :: Ranking is dependent on game wins PER GAME. Actually, the problem
> :: you cite still exists. If someone gets a few excellent results and
> :: cares about their ranking, they would do well to retire immediately.
>
> If he retires, someone with the same number of VPs but more participations
> will go above him in the ranking =)
> It's that simple.
Again, I think you are mistaken. Ranking appears to me to be dependent on
game wins PER GAME. Not total game wins or total victory points.
Fred
But, GW percentage doesn't tell you anything meaningful about how good someone
is. So, expanding the stats isn't going to be of any more use.
GW percentage has too much to do with how often games are won at all to be any
sort of meaningful indicator of ability.
This is one reason I find JOL to be satisfying actually... I can play at my
own pace and think about my actions and reactions for an hour if I want
before replying.
But I think this has only helped my game in real-time. I think being used to
a slower pace has made me more aware of the situation at any given table or
something. It's like I can scan the table and see potential threats more
clearly than I used to or something... that sounds weird but I can't explain
it better than that. I suppose it's almost like those Tai Chi people knowing
how to defend themselves in *other* than slow-motion :)
Cheers,
WES
Right. The old ELO system, with adjusted constants, would still have all the
other problems. Not much good having a "perfect" rating system if it's too
cumbersome to maintain.
-Robert
Robert Goudie
rob...@vtesinla.org
Chairman, V:EKN
I'm not sure why it should be too cumbersome to maintain. To me it seems
like the system ought to be smart enough to rewind and refigure results
sent out of order. Of course, it has to be programmed to maintain histories,
but that appears to be part of the new system anyway. The programming's a
little more complex, but nothing major. I'm flummoxed. What's supposed to be
more cumbersome about ELO than the new system?!?
Ultimately, if there is some real reason the old system was tougher than
the new one then there's nothing to be done if WW just simply doesn't want
to do it anymore. But it seems silly, doing it for two or three years when
the old system was spitting out meaningless numbers and giving up when
something worthwhile could actually be done with the data.
I guess I can't help but let my disappointment show through, here. This
set of statistics they're proposing to show and call "ratings" are just
completely uninteresting to me. To give you an idea of how much quality
of opponents (the part that's totally ignored) matters, one of the guys
in the final in Tortured Confession: Phoenix played the ASSAMITE PRECON
DECK, for crying out loud! Contrast this with the quality of the regular
players in Los Angeles and you'll understand why I think statistics
that ignore context tell us little of value.
Fred
For the system to be that smart...well, first of all that would require a system
to start with. :) I'm surprised we maintained the ruse so long. There was no
database, no system, and no histories were kept. Individual brave souls took
care of updating the ratings. One at a time, and in order. All the while,
harassing coordinators to turn in their stuff, correcting mistakes, and
manually entering results when players didn't have Archon spreadsheets to send.
BTW, a huge thanks to Todd Bannister for doing this for so long. His devotion
and steadfastness were incredible.
> Ultimately, if there is some real reason the old system was tougher than
> the new one then there's nothing to be done if WW just simply doesn't want
> to do it anymore.
Another misconception. The ratings were started by the V:EKN in the absence of
real support by WotC. We've always done it on our own. WW actually supported us
quite a bit in creating this system. However, it is still maintained by
volunteers.
> But it seems silly, doing it for two or three years when
> the old system was spitting out meaningless numbers and giving up when
> something worthwhile could actually be done with the data.
I started doing the ratings in, I believe, 1998. There were maybe 5-10 a month.
I used to get behind. :)
Correct.
That makes sense. [and snip more stuff that makes good sense]
Actually, according to many, the ratio should be pretty close to zero,
due to the table interaction/seating/other activity beyond the players'
control.
Probably even 1:100 is too large.
But, of course, then players would complain that, after sweeping 4
games in a tournament and watching their rating go up by 5 points,
the system doesn't "swing" enough.
Solid argument (as I said in the previous follow-up).
But that doesn't explain Curt's results.
If Curt had used a formula for his win expectation that matched the
32:400 formula's win expectation, then, regardless of the real-world
validity of the 32:400 formula, the statistical analysis should've
borne out the parallel.
But it didn't. Instead it indicated that his formula for expectation
more closely (but not completely) matched a 4:400 formula.
http://groups.google.com/groups?selm=20020131013456.28722.00000519%40mb-cv.aol.com
Quote:
> To produce the desired set of two-player paired outcome probabilities from
> multiplayer games, I calculate a performance for each player of 10^(Skill/D).
> The chance of each player winning the game is their performance divided by the
> sum of performances of all players in the game.
So the statistical results "showing" that the 4:400 formula more closely
follows the skills of the players is actually just showing that it more
closely follows the above-cited method of analysing the skills of the
players.
It's still useful information, but it doesn't disprove the validity of
the 32:400 formula. Some other performance calculation could've been
found that more closely matches the 32:400 formula.
That is, if Curt had calculated the chance of each player "winning the
game" (not sure if that means sweeping or what) so that it worked out
to 1/(10^((X-Y)/400)+1) for each pairing of player X and player Y, then
it would've supported the current formulation (and would've been just
as arbitrary in doing so as his current method was in supporting a
4:400 ratio).
Calculating the percentage of each player "winning the game" as merely
that player's rating divided by the sum of all ratings at the table
would support yet another rating calculation method.
And so on.
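To make the quoted model concrete, here's a rough simulation sketch of it. The D value and skill numbers below are illustrative stand-ins, not Curt's actual parameters, and the finish-order loop is my reading of "the #2 player is then determined by a similar procedure among the remaining players":

```python
import random

# Sketch of the quoted performance model: performance = 10**(skill / D),
# and each player's chance of winning the game is their performance
# divided by the sum of performances at the table.

def pick_winner(skills, D=400, rng=random):
    # Draw one "winner" with probability proportional to performance.
    perfs = [10 ** (s / D) for s in skills]
    r = rng.random() * sum(perfs)
    for i, p in enumerate(perfs):
        r -= p
        if r <= 0:
            return i
    return len(perfs) - 1

def finish_order(skills, D=400, rng=random):
    # Repeatedly pick a winner among the remaining players, as the quote
    # describes, to produce a full finish order for the table.
    remaining = list(range(len(skills)))
    order = []
    while remaining:
        i = pick_winner([skills[j] for j in remaining], D, rng)
        order.append(remaining.pop(i))
    return order
```

Pairwise "X beat Y" outcomes can then be read off the finish order, which is the data the Elo-style update would consume.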
Curt - do you still have the program you used to get those results?
Even though it won't matter now that the stats-based method is in
place, I'd still like to take a look at it.
I thought that *I* was... But you're right, Reyda, you beat me by ONE game =)
> (*) no, Mr Rebstock, you're not a French player anymore ;)
Don't agree with that, Pierre is definitely a French player, even on English land =)
Miller Delmardigan
http://sabbatinfrance.free.fr/en
I may be wrong (due to limited tournament experience) but this seems to be
more representative of how tournaments are ranked.
My $0.02
--
Dave
Sysop: Ottawa City Grid
The Black Sun: <http://www.ncf.carleton.ca/~bl740/blacksun.html>
"...And never fear. Fear is for the enemy. Fear and bullets..." - J. O'Barr
It's potentially doable.
Throw in each tournament, in date order. Keep the details saved. Press
the big button, calculate the rankings, which parses through each
tournament in date order.
When an older tournament comes in, plug it in, press the big button, new
rankings.
When all the sanctioned tournaments before a certain date are in, throw
away the details and start with a new baseline for all later
calculations.
The difficulties - for me - would come with integrating it with the
Archon spreadsheet, since I know nothing about interoperating with Excel
and the like under 'doze. Still, it's potentially doable.
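Something like the "big button" above might look like this in practice. The data layout and names here are my own guesses, not anything from the actual Archon format, and the update is the simplified two-player Elo form:

```python
from dataclasses import dataclass

# Sketch of the "big button" recompute: store every event, then rebuild
# all ratings from scratch in date order whenever a late Archon arrives.

@dataclass
class Tournament:
    date: str      # ISO date, e.g. "2002-08-01"
    results: list  # pairwise results as (winner, loser) name tuples

events = []

def add_event(t):
    events.append(t)  # arrival order doesn't matter

def recompute(K=32, S=400, start=3000):
    # Rebuild every rating from the stored events, processed by date.
    ratings = {}
    for t in sorted(events, key=lambda e: e.date):
        for w, l in t.results:
            rw = ratings.setdefault(w, start)
            rl = ratings.setdefault(l, start)
            e = 1.0 / (1.0 + 10 ** ((rl - rw) / S))
            ratings[w] = rw + K * (1 - e)
            ratings[l] = rl - K * (1 - e)
    return ratings
```

When an older tournament comes in late, you just add it and press the big button again; the sort by date puts it where it belongs.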
Sure it does. It tells you how often they win games. Why isn't
that meaningful? Because of the "2 VP minimum, ties don't win"
restriction? But that applies to everyone everywhere, so should
(in theory) balance out.
> GW percentage has too much to do with how often games are won at all to
> be any sort of meaningful indicator of ability.
Well, clearly the assumption is that games should be won. If
games are not won, then yes, there's a problem with this scoring
method in that, where games are not won, everyone scores a loss.
From the rating-system POV, though, everyone *deserves* a loss if
no one wins.
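As a concrete sketch of that rule (this is my reading of "2 VP minimum, ties don't win", not official wording):

```python
# Sketch of the game-win rule discussed above: the player with the most VPs
# gets the game win only if they have at least 2 VP and are not tied for
# the lead; otherwise no one gets a game win.

def game_winner(vps):
    """vps maps player name -> victory points; returns the winner or None."""
    best = max(vps.values())
    leaders = [p for p, v in vps.items() if v == best]
    if best >= 2 and len(leaders) == 1:
        return leaders[0]
    return None  # no game win: from the rating's POV, everyone records a loss
```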
If this happens a lot more in some areas than others, scores
won't be comparable across areas. But that's more or less true
anyway, since these GW scores will mainly reflect people playing
against others in their area. And it was more or less true of
the old system too - yes, the ELO system can "shift points" from
one region to another when people travel to play in other regions,
but probably not enough points or fast enough to make comparisons
truly meaningful.
Josh
meaningless
[snip]
> When all the sanctioned tournaments before a certain date are in, throw
> away the details and start with a new baseline for all later
> calculations.
>
> The difficulties - for me - would come with integrating it with the
> Archon spreadsheet, since I know nothing about interoperating with Excel
> and the like under 'doze. Still, it's potentially doable.
I'm sure it's doable. I think the problem is not having anyone
to do it. If there were a volunteer to code something that could
extract data from Archon spreadsheets and automatically calculate
ratings like this, I imagine the VEKN would be more than happy to
have it.
So, volunteers? :-)
Josh
probably don't even need to throw away the details ever, unless
you have a very small hard drive or something
On this specific example, probably no rating system can ever
take into account the quality of opposing *decks*. Opposing
*players*, yes, in theory. But if Rob Treasure shows up with
a deck made of 90 Save Face, the ELO system will still assume
that you should get lots of points for beating him.
On the general topic of "quality of opposition", yes, the ELO
system is supposed to take that into account. But even there
it's far from perfect; everyone starts at 3000 for example, so
even if they're terrible players and play terrible decks they'll
be worth a fair amount of points for quite a while (and for
longer than they used to be, with "improved" constants).
Ultimately, this game may be luck- and seating-dependent
enough that no ratings system can possibly tell us all that
much about who's the best. The "percent of games won" system
has the advantage that it's easy to calculate and won't (after
a few games) fluctuate very wildly. It does have the
disadvantage of being hard to move at all after a while, but
that can (I think) be easily solved by tracking a "current
season's GW percentage" as well.
Likewise, issues with the GW score itself could be addressed
by adding a "VPs per game" stat, a "tournaments won" stat, and
so on.
Josh
multidimensional
Thanks. :-)
> If Curt had used a formula for his win expectation that matched the
> 32:400 formula's win expectation, then, regardless of the real-world
> validity of the 32:400 formula, the statistical analysis should've
> borne out the parallel.
I'm not sure that's exactly what he did. As I read his article,
it appears that he assumed a "true skill value" for each of his
30 "test players". That "skill value" was expressed in terms
of the 32:400 formula and set along a normal distribution with
a particular standard deviation (90) derived from looking at
actual ratings results (the ratings being an attempt to estimate
"true skill").
He then set the "chance of winning" for each game to be based
on the "true skill" of each player in the game. However, the
amount their *ratings* shift after the game is based on their
current *rating*, ie the estimate of their skill.
> But it didn't. Instead it indicated that his formula for expectation
> more closely (but not completely) matched a 4:400 formula.
>
>
http://groups.google.com/groups?selm=20020131013456.28722.00000519%40mb-cv.aol.com
> Quote:
> > To produce the desired set of two-player paired outcome probabilities
> > from multiplayer games, I calculate a performance for each player of
> > 10^(Skill/D). The chance of each player winning the game is their
> > performance divided by the sum of performances of all players in the
> > game.
>
> So the statistical results "showing" that the 4:400 formula more closely
> follows the skills of the players is actually just showing that it more
> closely follows the above-cited method of analysing the skills of the
> players.
Right. But his method of describing the skills of the players -
I think - is to assume a normal distribution of skills. Whether
the standard deviation he gave that distribution is appropriate
is hard to tell, but it was at least derived from historical
results.
> It's still useful information, but it doesn't disprove the validity of
> the 32:400 formula. Some other performance calculation could've been
> found that more closely matches the 32:400 formula.
Probably true, but if I'm understanding it correctly, the version
that would support the 32:400 formula would involve a higher
standard deviation of skills rather than a different method of
calculating each player's chance of winning.
> That is, if Curt had calculated the chance of each player "winning the
> game" (not sure if that means sweeping or what) so that it worked out
> to 1/(10^((X-Y)/400)+1) for each pairing of player X and player Y, then
> it would've supported the current formulation (and would've been just
> as arbitrary in doing so as his current method was in supporting a
> 4:400 ratio).
I'm not sure that's correct. But I'm not sure I follow the math
enough to say so for sure, either. :-)
BTW, "The chance of each player winning the game is their performance
divided by the sum of performances of all players in the game. The #2
player is then determined by a similar procedure among the remaining
players, and so on. Each pairwise outcome is determined by the first
time either of the players beats the remaining players; each player
beats each other with the appropriate expected chance."
I'm not sure exactly what that means about who beats who, but I
think it doesn't just assume one sweeping player per game.
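The quoted rule (performance share for the winner, then #2 among the remainder, and so on) starts from per-player win chances that are easy to write down. A minimal sketch; the skill values and D=400 below are illustrative assumptions, not Curt's actual parameters:

```python
# Curt's quoted rule: performance = 10^(skill/D); a player's chance of
# winning the game is their performance divided by the table's total.
# Skills and D here are made-up values for illustration.

def win_chances(skills, D=400):
    """Chance of each player taking first place at one table."""
    perfs = [10 ** (s / D) for s in skills]
    total = sum(perfs)
    return [p / total for p in perfs]

# Five equally skilled players: everyone wins one game in five.
chances = win_chances([3000, 3000, 3000, 3000, 3000])
print([round(c, 3) for c in chances])  # [0.2, 0.2, 0.2, 0.2, 0.2]
```

Per the quote, the #2 finisher would then be drawn the same way from the remaining players' performances.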
> Calculating the percentage of each player "winning the game" as merely
> that player's rating divided by the sum of all ratings at the table
> would support yet another rating calculation method.
Right. We may not know enough about the underlying "true chances"
of winning to come up with any really good model of who should
win how often.
Josh
asleep in the deep
It'd take me (more than) a bit of research, but I could do it if no-one
else can.
A friend's been doing similar parsing of Word docs, hopefully I could
cadge some of his thoughts.
It was expressed in "ratings units" (or vice-versa), correct.
>
> He then set the "chance of winning" for each game to be based
> on the "true skill" of each player in the game. However, the
> amount their *ratings* shift after the game is based on their
> current *rating*, ie the estimate of their skill.
Right. And there's where the assumptions creep in to skew the
results.
The Elo system computes a chance of X beating Y at a table (via
the formula stated in the VEKN rules).
Curt used a different formulation:
Quoth Curt:
> To produce the desired set of two-player paired outcome probabilities
> from multiplayer games, I calculate a performance for each player of
> 10^(Skill/D). The chance of each player winning the game is their
> performance divided by the sum of performances of all players in the
> game.
End Quoth Curt.
> > So the statistical results "showing" that the 4:400 formula more closely
> > follows the skills of the players is actually just showing that it more
> > closely follows the above-cited method of analysing the skills of the
> > players.
>
> Right. But his method of describing the skills of the players -
> I think - is to assume a normal distribution of skills. Whether
> the standard deviation he gave that distribution is appropriate
> is hard to tell, but it was at least derived from historical
> results.
That's a separate issue. Whether you assign the players' skill
levels by that method, or just assign them 100, 200, 300, and so
on, might matter at some point on some level, but it's not part
of the problem in computing win expectancy.
> > It's still useful information, but it doesn't disprove the validity of
> > the 32:400 formula. Some other performance calculation could've been
> > found that more closely matches the 32:400 formula.
>
> Probably true, but if I'm understanding it correctly, the version
> that would support the 32:400 formula would involve a higher
> standard deviation of skills rather than a different method of
> calculating each player's chance of winning.
I disagree. A version whose win expectancy matched the win expectancy
of the ratings system would yield a better parallel of observed
results.
> > Calculating the percentage of each player "winning the game" as merely
> > that player's rating divided by the sum of all ratings at the table
> > would support yet another rating calculation method.
>
> Right. We may not know enough about the underlying "true chances"
> of winning to come up with any really good model of who should
> win how often.
Another very good, but separate, point.
Even if the "skill level" being modeled incorporated deck choices (including
the meta-game-ascertaining insights of the player), playing skill,
table talking skills (intimidation, cajoling, deal-making, etc.) and
all, it would still fail to account for seating position problems,
"random" (beyond meta-game) actions/decks made/played by one's
opponents, etc.
So that one's "win-expectancy" could never be accurately modeled by
comparing skill values alone, given the large uncertainties introduced
by the latter aspects of multi-player gaming (and another reason
not to put too much weight on any rating/ranking system).
Well, sure, it's doable. I was just making it clear to Fred that it's not
a minor change to the existing database and rating system. There is
no existing ELO database to modify.
-Robert
Because, for starters, how often someone wins games doesn't correlate that
well with how well someone does. Would you equate two players who both won
the same number of games, but where one had 1.5 VPs in every other game
played and the other had 0 VPs in every other game played?
>> GW percentage has too much to do with how often games are won at all to
>> be any sort of meaningful indicator of ability.
>
>Well, clearly the assumption is that games should be won. If
>games are not won, then yes, there's a problem with this scoring
>method in that, where games are not won, everyone scores a loss.
>From the rating-system POV, though, everyone *deserves* a loss if
>no one wins.
Why? Not winning a game when no one else does either doesn't display lack of
skill. (Mmm, many negatives are fun.)
How often games are won at all varies depending upon prevailing style -
offensive v. defensive - and relative skill of the players. My experience is
that the better the players, the less chance a game will be won.
>If this happens a lot more in some areas than others, scores
>won't be comparable across areas. But that's more or less true
>anyway, since these GW scores will mainly reflect people playing
>against others in their area. And it was more or less true of
>the old system too - yes, the ELO system can "shift points" from
>one region to another when people travel to play in other regions,
>but probably not enough points or fast enough to make comparisons
>truly meaningful.
Okay, the old system was crap. Why put in a new one that's no better?
I don't see why it's so hard to come up with a system that has some relevance
to skill.
Here's one:
You get 1 point for winning a tournament. You get another for winning a major.
System keeps track of how many events you've played in (otherwise, you
wouldn't know someone's average) and how many points you have.
Too simple? Fine. But, I'd be a lot more impressed with someone who has a
bunch of these points than someone who has a high GW%. And, isn't that what a
rating system is supposed to accomplish - determining who is better than
someone else?
Well, I agree that more metrics would be better. I suggested
"VPs per game" and "tournaments won" (or "tournaments won per
tournament") in my reply to Fred Scott too.
That said, the 1.5 VP and the 0 VP player could both be coming
in second in all their other games with those scores. And
from a certain point of view, not winning makes you a loser
regardless of how much you don't win by. (You'll find that many
of the truths we cling to depend greatly on our own point of
view. ;-)
> >From the rating-system POV, though, everyone *deserves* a loss if
> >no one wins.
>
> Why? Not winning a game when no one else does either doesn't display
> lack of skill. (Mmm, many negatives are fun.)
From the rating-system POV, if no one wins, everyone has lost.
That's sort of nonsensical from a human POV in that we expect
every game to have a winner, but not entirely; there are some
games where everyone can lose. Now VTES is one of them. :-)
> How often games are won at all varies depending upon prevailing style -
> offensive v. defensive - and relative skill of the players. My
> experience is that the better the players, the less chance a game will
> be won.
Mine isn't. Hence my comments about regionality... :-)
Style can change to fit scoring methods, if people care about
scoring. Isn't that what people feared/hoped would happen with
the change to the GW system for tournaments?
> Okay, the old system was crap. Why put in a new one that's no better?
Cause it's a lot easier to maintain. Also, it's supposed to
provide a more stable metric, even if not everyone likes the
new metric.
> I don't see why it's so hard to come up with a system that has some
> relevance to skill.
The game win percentage has *some* relevance to skill...
> Here's one:
> You get 1 point for winning a tournament. You get another for winning
> a major.
> System keeps track of how many events you've played in (otherwise, you
> wouldn't know someone's average) and how many points you have.
>
> Too simple? Fine. But, I'd be a lot more impressed with someone who
> has a bunch of these points than someone who has a high GW%. And, isn't
> that what a rating system is supposed to accomplish - determining who is
> better than someone else?
This system is so simple that it could be created just from Jeff
Thompson's TWD webpage. (Modulo the small tournaments.) Which isn't
a bad thing at all, to my mind. But only a few people can be ranked
at all by it, which is a pretty major drawback. Not, IMO, enough
of one to dismiss it altogether, but I'd want to have several other
stats to go along with it. Which seems to be the point I keep
coming back to: more stats is better. Rankings by each of them
independently would be even more better.
Josh
can it be done?
Actually, I suspect it *wouldn't* better match the results we
observed in practice over the past few years (since apparently
the underlying chances of winning do not match the ELO-calculated
chances). It *should* better match what the ELO scoring system
expects to happen, I think you're right about that. Though again,
I'd have to see results of simulations; I can't feel sure about it
just from turning over the formulas in my head.
[skill vs other factors]
> So that one's "win-expectancy" could never be accurately modeled by
> comparing skill values alone, given the large uncertainties introduced
> by the latter aspects of multi-player gaming (and another reason
> not to put too much weight on any rating/ranking system).
Well, it can always be modeled by including a large enough
"random factor". But yes, if there's that much "randomness"
involved it certainly implies that we shouldn't take ratings
too seriously.
Also, we should have more axes to rank people on, because the
more things we look at, the more likely that everybody will find
at least one of them interesting. :-)
Josh
axiomatic
Right. By observed, I mean output of the system (matching the
skill-level input to the system).
Works like this:
placement stat = 1 - (finishing rank/# of participants)
It doesn't reward winning bigger events more than smaller ones -
kind of the opposite, sort of - but winning big events, as Gomi
said at the time, gets you a reputation all by itself.
Josh
right ben? ;-)
Hey, that's people for you. No one's totally happy.
But the important part is that the numbers behave in as reasonable
manner as possible. Don't forget, what was really causing the problem
of the old system wasn't how small or large the swing number was
exactly but how large it was compared to the effective range of
possible ratings. When ratings seldom go outside the range of 2750 to
3250, a swing of greater than 100 points in a single sitting is surely
pretty reckless. The problem is caused by how quickly a guy gets up
to a point where he's risking three times as many points as he might
potentially win.
Fred
What's the difference? The problem with either is that neither adjusts for
quality of competition. I'm not sure what problem you're trying to solve but
it's not the main one.
Fred
More precisely, it means we have to take a larger sample of
results to allow the randomness to even out over time. FAILING
THAT (as obviously there's a limit to how many games you can get
people to play), we shouldn't take ratings too seriously. But I
intuitively suspect a "constant-corrected" ELO system should be
reasonably informative for players that have a few dozen games
under their belt.
Fred
Sure it does reward bigger events, in that 1-(1/50) is a higher
number than 1-(1/10), no?
Not sure how you'd composite this ranking over several events -- I'm
suspicious of just averaging them, but that certainly is simple, at least.
gomi
--
Blood, guts, guns, cuts
Knives, lives, wives, nuns, sluts
OK, sure. Let's not overstate your case here. I don't see the harassment or
mistake-correcting nor manually entering results going away just because a
different system is in place. In short, I agree about the "brave souls" thing
but it appears we still need them. The only thing that's changed is the need
for consecutive entry of tournament results.
Fred
That was precisely what I was envisioning. The only question in my mind is
about the calculation time. If you've got the number-crunching power to
push a button and spit out provisional results that require computing one or
two dozen tournaments (back to the last hole) without too much trouble, then it
should work just fine. If there's a question about it, then you need to do
away with provisional results and just calculate up to the first hole. Even so,
you can still certainly enter results as they come in.
Fred
Yes, quite true. It makes a difference whether mighty Casey shows up at
a tournament gunning to kill or just to have fun with his new trick deck.
In theory, a player's rating should in part reflect his proclivity to do
one or the other, but that's small consolation to the victims who were in Casey's
path the day he decides to turn mercenary.
> On the general topic of "quality of opposition", yes, the ELO
> system is supposed to take that into account. But even there
> it's far from perfect; everyone starts at 3000 for example, so
> even if they're terrible players and play terrible decks they'll
> be worth a fair amount of points for quite a while (and for
> longer than they used to be, with "improved" constants).
Imperfect, true. But "far from perfect" depends on how often this
happens. You're only unrated in your first game (or the first game
of each instance of adopting a new disguise, I guess), after that
the ratings should start to do the right thing. I don't consider
this to be a major problem.
> Ultimately, this game may be luck- and seating-dependent
> enough that no ratings system can possibly tell us all that
> much about who's the best. The "percent of games won" system
> has the advantage that it's easy to calculate and won't (after
> a few games) fluctuate very wildly. It does have the
> disadvantage of being hard to move at all after a while, but
> that can (I think) be easily solved by tracking a "current
> season's GW percentage" as well.
I guess it depends on what you want. As ratings, these clearly
suck. As statistics, they might be interesting to some - though
the lack of context makes them mostly random digits to my eyes.
Fred
I don't agree. Only good stats are worthwhile and only in small quantities.
I like baseball stats, but most are bogus. They only serve to distract from
worthwhile ones.
Besides, it's more work for someone to keep track of a multitude of official
stats.
I dunno. More stats are definitely more interesting. :)
Anyway, waaaaaaaaaay back at the beginning, didn't LSJ (or Robert) mention
something about tiebreakers and the like? I firmly believe that we won't just
be going on GW/games played.
Robert wrote:
"Players will be ranked based on their Game Wins per Game.
It's not much more complex than that. Tie breakers, minimum games played, and
other details can be found by following the link to the Ratings FAQ from the
main ratings page.
http://206.65.59.246/VTES/member.tpl"
I dunno. Works for me.
Xian
Goddamit, Josh!
You totally blew my seekrit-maximize-GenCon-ratings-boost tech! Well, that and
the no more ELO thing.
I was totally going to play Tremere with 40 Save Face, 40 Apportation, and 10
Walk of Flame in every tournament prior to the NA Championships!
Arrrgh! ;)
Xian
maximizin'
It shouldn't be that difficult. None of the ratings calculations are
generally *that* difficult.
> Anyway, waaaaaaaaaay back at the beginning, didn't LSJ (or Robert) mention
> something about tiebreakers and the like? I firmly believe that we won't just
> be going on GW/games played.
First tie-break is VPs. Second tie-break is career games played.
-Robert
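Robert's ranking rule (GW per game, ties broken first by VPs, then by career games played) sorts straightforwardly. A sketch with invented player records, and assuming (as a guess, since the direction isn't stated) that more career games ranks higher on the last tiebreak:

```python
# Rank players by game wins per game, breaking ties first by VPs, then
# by career games played. The records below are hypothetical.

players = [
    # (name, game_wins, games_played, victory_points)
    ("Ann", 4, 10, 14.0),
    ("Bob", 4, 10, 11.5),
    ("Cal", 6, 12, 14.0),
]

def rank_key(p):
    name, gw, games, vps = p
    return (gw / games, vps, games)

ranked = sorted(players, key=rank_key, reverse=True)
print([p[0] for p in ranked])  # ['Cal', 'Ann', 'Bob']
```

Cal leads on GW/game (0.50 vs 0.40); Ann and Bob tie there, so Ann's higher VP total breaks the tie.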
Sure. All of the above issues don't go away with the new system. On the other
hand, that's not an exhaustive list of the problems with the old system either.
:)
-Robert
> > (*) no, Mr Rebstock, you're not a french player anymore ;)
>
> Don't agree with that, Pierre is definitely a french player, even on english land =)
Nope, sorry, we have 'claimed' him now. He is lost to you forever and
no, you don't have any say in this, Pierre :o)
Rob
If anyone wishes to pursue this (as James has mentioned he might), then
once a working system is unveiled, I'm sure it wouldn't be too much trouble
to have the Archons that WW receives forwarded to the maintainer of the
"improved" rating/ranking system.
I could. However, if people want to implement a different ratings
system anyway, there's probably little point to it all.
I guess this depends on taste; some people like stats in mass
quantities and others don't. :-)
What's a "good" stat seems to be hugely subjective in the realm
of VTES.
> I like baseball stats, but most are bogus. They only serve to distract
> from worthwhile ones.
>
> Besides, it's more work for someone to keep track of a multitude of
> official stats.
True, but given that (apparently) not everyone can be satisfied
by one single stat, and that (as I understand it) it wouldn't be
too much more work to calculate rankings for several (easily-derived-
from-Archon) stats, it seems worthwhile to me.
As far as "the most universally meaningful single stat" goes, I'd
guess that the "tournament-size-adjusted-placement" stat is probably
it. But even that has its issues.
"placement stat" = 1 - (tournament ranking/# of tournament participants)
so:
1st out of 50 = 0.98
1st out of 10 = 0.90
50th out of 50 = 0.00
10th out of 10 = 0.00
49th out of 50 = 0.02
9th out of 10 = 0.10
For multiple events you could average your "placement stats"
together, or do something else more clever, if there is such a
thing.
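The stat and the plain-average composite can be written out directly; the numbers just reproduce the worked examples above:

```python
def placement_stat(rank, participants):
    """Tournament-size-adjusted placement: 1 - rank/participants."""
    return 1 - rank / participants

print(round(placement_stat(1, 50), 2))   # 0.98
print(round(placement_stat(1, 10), 2))   # 0.9
print(round(placement_stat(50, 50), 2))  # 0.0

# The simplest composite over several events: a plain average.
def average_placement(results):
    """results is a list of (rank, participants) pairs."""
    return sum(placement_stat(r, n) for r, n in results) / len(results)
```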
The 'issues' I see with this stat are:
It is entirely linear, so there's only 0.02 difference between 1st
and 2nd in a 50-player tournament, or 0.10 between 1st and 2nd in
a 10-player.
It only fairly slightly rewards "winning bigger tournaments" vs
"winning small tournaments" (0.98 vs 0.90).
It gives the same result whether the winner of the event "won" by
being first-ranked (potentially tied and resolved by coin-flip)
going into the finals and then timing out in the finals with tied
VPs, or by sweeping three tables to get to the finals and then
sweeping the final round too.
Of course, all of these things are benefits from another point of
view. ;-)
Which is why it seems to me that no one stat can hope to be
universally hailed as "the right one".
Josh
off the scale on the bogometer
Well, I can't speak for all people (ha!), but: at least me and Fred
Scott would love to see an "improved" ELO system. (It might not
sound like it from how much I've talked up the "other" methods of
rating people, but I like the ELO system too.)
Josh
eloriffic
Are there any actual bona-fide games like Chess (i.e. games with a long
fixed (or close to) rules-set and play set, because they don't get
expansions released for them) which are genuinely multi-player?
Whilst Diplomacy's multi-player system sucks in many ways, is there any
long term rating system, based on who went out first, who had fewest
units/locations at the end, or whatever? Or any similar games. Say,
Star Fleet Battles (Mr PDB6, are you there?)?
Ratings systems aren't something I know that much about. Are there any
decent primers anywhere?
Try Pete's Ludicrously Overcomplicated Points System for 2- to 6-player
games (which doesn't consider opponents' skill/ratings levels):
http://www.personal.dundee.ac.uk/~pjclinch/PLOPS.htm
It might have some ideas of use, at any rate.
Or try the "RACE" section of
http://wolff.to/area/calcexp.html
It looks a bit like the old VEKN system, except with the Elo part
replaced by a simpler pairwise system, but it might work.
For a primer (decent? I don't know), try
http://www.gamasutra.com/features/20000209/kreimeier_pfv.htm
It seems tailored for online deathmatch style games, but certainly
has enough theory explanation built in to be useful.
Bridge, but it probably doesn't count because it's a team
game.
Poker, but I don't know if it has ratings. Its money-betting
system makes it "self scoring" in an admirably simple way
that VTES probably can't emulate. :-)
Hearts; again I don't know if it has ratings. Likewise
spades, rummy.
Monopoly - might have ratings?
Tennis - doubles matches might have some applicability? But
it is still a "two sides" thing.
> Whilst Diplomacy's multi-player system sucks in many ways, is there any
> long term rating system, based on who went out first, who had fewest
> units/locations at the end, or whatever? Or any similar games. Say,
> Star Fleet Battles (Mr PDB6, are you there?)?
Star Fleet Battles is normally only played two-player in
tournament settings, as far as I know. Or two-side (team)
if not, I'd think.
> Ratings systems aren't something I know that much about. Are there any
> decent primers anywhere?
Nor do I...
The ones LSJ pointed to look interesting; I haven't finished
reading them yet though.
Josh
mala fide
But with "improved constants" they'll start doing the right
thing fairly slowly.
Plus, if you look at the VEKN player database lots of the players in
it have (or had) very few rated games under their belts. In the
first twenty results for "USA" (not a random sample, of course),
thirteen of the entries have 0-3 games played. Unknown quantities
galore. :-)
Of course you (and I) might expect that if somebody has never
played a tournament game, they're probably not going to be all
that great at the tournament game. Though I'm not sure what my
point is, here.
> I guess it depends on what you want. As ratings, these clearly
> suck. As statistics, they might be interesting to some - though
> the lack of context makes them mostly random digits to my eyes.
Well, I'm not yet convinced that "GW percentage" does suck as a
rating method. How often you win tournament games seems to me
quite intuitive as a measure of how good a tournament player
you are. Sure, it doesn't give a bonus for winning the final,
supposedly-most-important game, but the mere fact of having
won an additional game does improve the winner's rating.
Josh
player of games
Come on. I cleverly said "Rob Treasure". No one was going to expect
it from *you*. You would have been fine if you hadn't just gone and
spilled it all over the newsgroup. ;-)
Josh
besides, you can still use your "save face" tech to impress
everyone at the week of nightmares with how much your decks
are going to suck...
>Nope. No more Elo. Players will be ranked based on their Game Wins per Game.
>It's not much more complex than that. Tie breakers, minimum games played, and
>other details can be found by following the link to the Ratings FAQ from the
>main ratings page.
If you want to use game wins for rankings, straight percentages are a bad
system, since variability is so high with even moderate numbers of games.
The highest-rated players will usually not be the best but rather the luckiest
of those who've just crossed the minimum-games threshold. Better would
be a Bayesian approach: assume players start with an expected skill
from some baseline distribution. Modify the distribution as the player
gets games, and report the mean of each player's distribution as his rating.
Curt Adams (curt...@aol.com)
"It is better to be wrong than to be vague" - Freeman Dyson
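Curt's suggestion (start everyone at a baseline distribution, update per game, report the mean) can be sketched with a Beta prior on the win probability. The baseline Beta(2, 8) below, i.e. a 20% expected win rate as if at a five-player table, is an assumption for illustration, not Curt's actual choice:

```python
# Minimal Bayesian rating along the lines Curt describes: treat each
# player's game-win probability as Beta-distributed, start from a common
# baseline, update with each game, report the posterior mean.
# The prior strength (2 wins in 10 pseudo-games) is an assumption.

def bayes_rating(wins, games, prior_wins=2.0, prior_games=10.0):
    """Posterior mean win probability under a Beta prior."""
    return (prior_wins + wins) / (prior_games + games)

# A lucky 2-for-2 newcomer rates *below* a 10-for-20 veteran,
# unlike with raw GW percentage (1.00 vs 0.50):
print(bayes_rating(2, 2))    # ~0.33
print(bayes_rating(10, 20))  # 0.4
```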
Sure.
> Plus, if you look at the VEKN player database lots of the players in
> it have (or had) very few rated games under their belts. In the
> first twenty results for "USA" (not a random sample, of course),
> thirteen of the entries have 0-3 games played. Unknown quantities
> galore. :-)
Yea, but the people who haven't played many games haven't played
many games. No, that's not just my fingers keystroke-repeating
themselves because my mind has gone out to lunch (for the third
time today, before 11 AM no less). I mean the influence of such
things depends on how often it happens, and I expect if you multiply
people by games, you discover that those who have played a dozen
games or more (arbitrary number plucked from mid-air) influence
the rankings a lot more often than those with fewer games. Obviously,
you can wipe the players with 0 games right off the list, as they
haven't done a thing to the rating system.
> > I guess it depends on what you want. As ratings, these clearly
> > suck. As statistics, they might be interesting to some - though
> > the lack of context makes them mostly random digits to my eyes.
>
> Well, I'm not yet convinced that "GW percentage" does suck as a
> rating method. How often you win tournament games seems to me
> quite intuitive as a measure of how good a tournament player
> you are. Sure, it doesn't give a bonus for winning the final,
> supposedly-most-important game, but the mere fact of having
> won an additional game does improve the winner's rating.
Winning games is always good. But without the adjustment for
skill, there's no saying how good. My experience is that some
games are just a heckuva lot easier to win than others, merely
due to the competition. Certain players will undoubtedly eternally
play relatively mediocre competition. Perhaps through no fault of
their own, but it still affects their winning percentage.
Fred
>Quote:
>> To produce the desired set of two-player paired outcome probabilities from
>> multiplayer games, I calculate a performance for each player of 10^(Skill/D).
>> The chance of each player winning the game is their performance divided by
>> the sum of performances of all players in the game.
>So the statistical results "showing" that the 4:400 formula more closely
>follows the skills of the players is actually just showing that it more
>closely follows the above-cited method of analysing the skills of the
>players.
Actually in 2-player games my system produces identical outcome probabilities
to the ELO system. It just generalizes to multiplayer systems.
I did make a mistake in the stuff I posted. I assumed 20 *games* per player
on average. Actually it should have been 20 *seatings* per player. The
optimal S/K ratio should be about 9:400, not 4:400. 4:400 is optimal for
players who have played in 100 games on average. 32 is still way too high,
though.
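For reference, the ratios being compared (32:400, 4:400, 9:400) pair the maximum per-game rating swing with the rating-difference divisor in a standard Elo update. A generic sketch of that update, not the exact VEKN implementation:

```python
# Generic Elo-style pairwise update, to make the "swing : divisor"
# ratios concrete. swing=32, divisor=400 is the old system's pairing;
# Curt argues for a swing nearer 9 at this game count.

def expected_score(rating, opp_rating, divisor=400):
    """Chance of beating the opponent under the Elo curve."""
    return 1 / (1 + 10 ** ((opp_rating - rating) / divisor))

def update(rating, opp_rating, result, swing=32, divisor=400):
    """result is 1 for a win, 0 for a loss, 0.5 for a draw."""
    return rating + swing * (result - expected_score(rating, opp_rating, divisor))

# Equal ratings: a win moves you up by half the swing.
print(update(3000, 3000, 1))  # 3016.0
```

A smaller swing means each result moves the rating less, so ratings converge more slowly but fluctuate less once settled, which is the trade-off the simulations are probing.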
Right.
And it generalizes to multiplayer in a manner different than the manner
in which the VEKN rules generalized to multiplayer, as near as I can tell
by the oft-quoted quote above.
>I'm sure it's doable. I think the problem is not having anyone
>to do it. If there were a volunteer to code something that could
>extract data from Archon spreadsheets and automatically calculate
>ratings like this, I imagine the VEKN would be more than happy to
>have it.
Long-term, the best thing to do is to keep the raw data as much
as possible. People can go back and reanalyze as they see fit
later that way. In 10 years we'll probably have AI that can
read handwriting so you could just dump file drawers of
tournament reports into a scanner, so even hardcopy is useful.
I'm not sure if I'm "getting" the difference between "seatings"
and "games". But I think you might be saying: you ran 600 games
for 30 players thinking "20 games per player". But since you
were running 5-player games you actually selected players for
the games 3000 times = 100 games per player. Is that right? If
so I guess I see the problem - you don't want to assume that
people will play in 100 games each to reach a somewhat-appropriate
rating since historically only a few people have played that many
rated games. :-)
Josh
too many games
not enough players
in fact while we're at it, could your system be rerun with,
say, 300 players and 1200 games? might be interesting...
Wow, that last one is *fascinating.* Plus, it discusses the
meanings of "T" and "max_gain" (our S and K)!
I highly recommend it to anyone interested in ratings methods,
or specifically in Elo-like methods (what it's focused on, as
its premise seems to be that Elo-like methods are the "best"
ratings for its purposes).
Josh
impressed
>> I guess it depends on what you want. As ratings, these clearly
>> suck. As statistics, they might be interesting to some - though
>> the lack of context makes them mostly random digits to my eyes.
>
>Well, I'm not yet convinced that "GW percentage" does suck as a
>rating method. How often you win tournament games seems to me
>quite intuitive as a measure of how good a tournament player
>you are. Sure, it doesn't give a bonus for winning the final,
>supposedly-most-important game, but the mere fact of having
>won an additional game does improve the winner's rating.
>
I don't know if the Powers That Be of V:EKN can or should agree to
this, but I for one would be VERY interested in more robust tournament
play statistics of various sorts. The easiest way to implement this,
it seems to me, would be to have a volunteer(s) archive Archon reports
and make the results available for database queries through a web
site. Or just post four or five different sets of stats on a regular
basis...
For example, I would be interested in knowing
(1) How often a given player wins/makes the final/scores a
GW/scores VPs in a tournament
(2) How often a given CLAN (defined as 50% of crypt) wins/makes the
finals/scores GW/scores VP
(3) The average size of the field in tournaments in a given
region/time period
(4) Which players appear in tournaments in the most different venues
The perfect situation for me would be one where "Josh is a .500 player
when facing opponents who seeded higher than him in a final, but wins
.68 when he is the top seed. He has never come lower than 3rd when
scoring more than 0 VP in a final." is an achievable result.
I would be willing to archive the results, and even to do some work on
designing a database structure, but implementation of any kind of
interface is beyond my pitiful skills.
Mark Woodhouse
Prince Of Minneapolis
(aka Data Geek)
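Queries like Mark's (1) and (2) could be served by something as simple as one flat record per player per tournament. A sketch; the field names and sample rows are invented for illustration, not an actual Archon layout:

```python
# A minimal flat archive of Archon-derived results, enough to answer
# queries like "how often does a player make the final" or "how many
# game wins has a clan scored". All rows below are hypothetical.

results = [
    # (player, clan, tournament, made_final, game_wins, victory_points)
    ("Josh", "Nosferatu", "Origins", True, 2, 6.0),
    ("Josh", "Malkavian", "GenCon", False, 0, 1.5),
    ("Fred", "Nosferatu", "Origins", False, 1, 3.0),
]

def finals_rate(player):
    """Fraction of a player's tournaments in which they made the final."""
    rows = [r for r in results if r[0] == player]
    return sum(r[3] for r in rows) / len(rows)

def clan_game_wins(clan):
    """Total game wins recorded for decks of a given clan."""
    return sum(r[4] for r in results if r[1] == clan)

print(finals_rate("Josh"))        # 0.5
print(clan_game_wins("Nosferatu"))  # 3
```

A real version would presumably live in a database with one table per tournament plus a player table, but flat records are enough to show the queries.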
>Right.
>And it generalizes to multiplayer in a manner different than the manner
>in which the VEKN rules generalized to multiplayer, as near as I can tell
>by the oft-quoted quote above.
No; each 2-player matchup within an n-player game produces the
expected outcome probabilities.
>I'm not sure if I'm "getting" the difference between "seatings"
>and "games". But I think you might be saying: you ran 600 games
>for 30 players thinking "20 games per player". But since you
>were running 5-player games you actually selected players for
>the games 3000 times = 100 games per player. Is that right? If
>so I guess I see the problem - you don't want to assume that
>people will play in 100 games each to reach a somewhat-appropriate
>rating since historically only a few people have played that many
>rated games. :-)
Ding! Right on the money, Josh.
One odd thing came up in the sims, though. If the rating movements
are "too small" then the ratings aren't accurate. Oddly, though,
the rankings actually *improve* a little bit! Any movement was
adequate for ranking purposes, down to 0.2. I expect this is only
due to my assumption that each player is expected to play
the same number of games. If an ELO-like rating system is
reimplemented I'll redo the sims with differing play frequencies
and look for an optimum. Still, it's interesting - the cost of
undershoot is less than one would expect.
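For reference, a minimal sketch (my wording, not Curt's actual simulation code) of the Elo-style update being discussed, where the "shift" is the K-factor scaling each adjustment:

```python
def expected_score(ra, rb):
    # Chance that a player rated ra outscores a player rated rb
    return 1 / (1 + 10 ** ((rb - ra) / 400))

def elo_update(ra, rb, score, k=32):
    # score is 1 for a win, 0.5 for a tie, 0 for a loss;
    # k is the "shift" discussed above (32 vs. 4 vs. 0.2)
    return ra + k * (score - expected_score(ra, rb))
```

With k=32, beating an equally rated opponent moves you from 1600 to 1616; with k=0.2 the same win moves you only 0.1 point, which is the "undershoot" case that still ranked players surprisingly well.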
Oh. The oft-quoted part indicates otherwise:
[BEGIN OFT_QUOTED TEXT]
> The chance of each player winning the game is their performance divided by
> the sum of performances of all players in the game.
[END]
So your system didn't round ratings to the nearest integer?
That (obviously) will help improve performance.
D'oh! :)
> besides, you can still use your "save face" tech to impress
> everyone at the week of nightmares with how much your decks
> are going to suck...
Boy, are you going to laugh when I actually do. :)
Xian
>> No; each 2-player matchup within an n-player game produces the
>> expected outcome probabilities.
>Oh. The oft-quoted part indicates otherwise:
>[BEGIN OFT_QUOTED TEXT]
>> The chance of each player winning the game is their performance divided by
>> the sum of performances of all players in the game.
>[END]
The ELO system only specifies the outcome of each two-player match.
It actually has no explicit system for determining the "winner" of a
game and theoretically would permit impossibilities (e.g A beats B
beats C beats D in a 4-player game). The way I did it (or an equivalent)
is the only way you can get the numbers right if you assume games
generate ordinal outcomes.
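One way to realize this (a sketch of my own that I believe matches the description, not Curt's code) is to build a full transitive finish order by repeatedly picking the next finisher among the remaining players with probability proportional to performance, where performance is 10^(rating/400):

```python
import random

def finish_order(ratings):
    # ratings: {player: Elo rating}; returns players from 1st to last.
    # Each draw picks the next finisher with chance performance / sum of
    # remaining performances, so every 2-player matchup inside the game
    # reproduces the standard Elo expectation, and the result is always
    # a transitive ranking.
    remaining = dict(ratings)
    order = []
    while remaining:
        players = list(remaining)
        perfs = [10 ** (remaining[p] / 400) for p in players]
        order.append(random.choices(players, weights=perfs)[0])
        del remaining[order[-1]]
    return order
```

Restricted to two players this is exactly the Elo formula: performance(A) / (performance(A) + performance(B)) = 1/(10^((Rb-Ra)/400) + 1).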
Technically, I suppose you're right; the ELO system does permit
non-transitive outcomes and mine doesn't. But since V:tES doesn't
permit intransitive outcomes, is that a problem?
>So your system didn't round ratings to the nearest integer?
>That (obviously) will help improve performance.
Yes, but the 32-shift system I was comparing it to was also
done without rounding. The improvement is relatively small,
though, as the 32-shift puts in so much noise the additional
rounding error is minimal. Obviously rounding on a 4-shift
is a big problem but that's not relevant - the computations
are simple enough, and if somebody has a conniption about
invisible rounding you can always go the (equivalent) other
way and increase the range to 1400 or so.
I don't know. It's just that, by the VEKN system, the chance of each player
winning the game is *not* their performance divided by the sum of the
performances of all other players in the game, in general, TTBOMK.
I'm not 100% sure that my approach here is correct, and I'm pretty
sure it's incomplete. But, suppose you have a 5-player game under
"the old system".
A: rating 3400
B: rating 3200
C: rating 3000
D: rating 2800
E: rating 2600
Expected chance of each player beating all other players,
according to the "chance = 1/(10^(diff/400) + 1)" formula:
A: 0.75975 * 0.90909 * 0.96934 * 0.99009 = 0.66283
B: 0.24025 * 0.75975 * 0.90909 * 0.96934 = 0.16085
C: 0.09091 * 0.24025 * 0.75975 * 0.90909 = 0.01509
D: 0.03065 * 0.09091 * 0.24025 * 0.75975 = 0.00051
E: 0.00990 * 0.03065 * 0.09091 * 0.24025 = 0.00001 (rounded)
Adding up these probabilities, I only get a 0.83929
or 83.93% chance of any player beating all other players.
I don't know if this means that the formula expects there to
be a tie for first the other 16.07% of the time, or if it
just isn't feasible to use that formula this way. (The former
seems somewhat unlikely given how much more likely it is for
player A to win than for anyone else to win.)
Looking at it from the perspective of a single player seems
to fit the formula better:
chance for various outcomes for A:
A > BCDE : 0.66283
B > A > CDE : 0.20961
C > A > BDE : 0.06629
D > A > BCE : 0.02096
E > A > BCD : 0.00663
BC > A > DE : 0.02096
BD > A > CE : 0.00663
CD > A > BE : 0.00209
total for these: 0.99600 or 99.6%, IOW this probably covers
all the reasonably-likely possible outcomes for A (according
to the formula).
These probabilities might be usable for Curt's program, but
I don't know if they would be adequate or not? They don't
specify, for example, who beats whom among the non-A players.
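The arithmetic above is easy to check mechanically; a short script (just mirroring the hand calculation, under the same independence assumption) reproduces both totals:

```python
ratings = {"A": 3400, "B": 3200, "C": 3000, "D": 2800, "E": 2600}

def p_beats(ra, rb):
    # Standard Elo expectation: chance = 1/(10^((rb - ra)/400) + 1)
    return 1 / (10 ** ((rb - ra) / 400) + 1)

# Chance of each player beating all four others, assuming every
# pairwise outcome is independent
win_all = {}
for p, rp in ratings.items():
    prob = 1.0
    for q, rq in ratings.items():
        if q != p:
            prob *= p_beats(rp, rq)
    win_all[p] = prob
total = sum(win_all.values())   # ~0.8393, leaving ~16% unaccounted for

# The eight placements for A listed above, same assumption
others = ["B", "C", "D", "E"]
def a_outcome(above):
    # 'above' = players finishing ahead of A; everyone else loses to A
    prob = 1.0
    for q in others:
        p_q = p_beats(ratings[q], ratings["A"])
        prob *= p_q if q in above else (1 - p_q)
    return prob

listed = [(), ("B",), ("C",), ("D",), ("E",),
          ("B", "C"), ("B", "D"), ("C", "D")]
total_a = sum(a_outcome(set(s)) for s in listed)  # ~0.9960
```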
Josh
unsure
Aha, good point there, I hadn't thought of that.
Though, if you have a lot more 3-game-players than 50-game-players
(and I expect you do), and they generally lose points in those
three games, they could contribute "ratings creep" to the people
who do play a lot - ie there will be "more points to go around"
among the "play in lots of tournaments" crowd since those people
will have picked up extra points from the ones who only ever
played in one tournament and lost points in those.
I'm not sure if that issue is really related, but it is another
"flaw" (probably unavoidable, though correctable-for with the
"ratings decay" idea) in the old Elo system.
(re ratings decay - as described in the article LSJ mentioned at
http://www.gamasutra.com/features/20000209/kreimeier_pfv.htm,
if you're concerned about maintaining the "zero-sum nature" of
an Elo-type system, you can/should "decay" ratings toward the
starting point over time (not sure if they were saying you should
do this for everybody or just for people who haven't played a
rated game within the last X days/weeks/months). So, you assume
that a high-rated player who hasn't played for a while isn't
actually as good as his rating, and a low-rated player who hasn't
played for a while isn't as bad as his rating - that is, since
our information about them is old and therefore potentially out
of date, we assume that they have become closer to "an average
player" since then.)
[re GW percentage scoring]
> Winning games is always good. But without the adjustment for
> skill, there's no saying how good. My experience is that some
> games are just a heckuva lot easier to win than other, merely
> due to the competition. Certain players will undoubtedly eternally
> play relatively mediocre competition. Perhaps through no fault of
> their own, but it still affects their winning percentage.
This is true, and that article was pretty convincing (to me)
that an Elo-type system is a really good way to rate people
compared to less-"handicap"-based scoring systems.
Josh
never knew Elo was a person
I'm not seeing where you account for the possibilities
of passing Kindred Restructures and Dramatic Upheavals
mid-game. Did I miss the "political deck coefficient"?
:)
Fred
[re having more analysis-of-tournament-data available]
> The perfect situation for me would be one where "Josh is a .500
> player when facing opponents who seeded higher than him in a final,
> but wins .68 when he is top seed. He has never come less than 3rd
> when scoring more than 0 VP in a final." is an achievable result.
I agree. That would rock. Especially for me to be winning
that much. :-)
> I would be willing to archive the results, and even to do some work on
> designing a database structure, but implementation of any kind of
> interface is beyond my pitiful skills.
It's probably feasible, the "forward Archons to somebody who
wants to maintain a comprehensive database" part. I don't know
who'd be up for doing interface, though.
Josh
not a coder
>I don't know. It's just that, by the VEKN system, the chance of each player
>winning the game is *not* their performance divided by the sum of the
>performances of all other players in the game, in general, TTBOMK.
It's just a method for generating the VEKN probabilities. Just like
using a pseudorandom number generator to make a die roll.
>Adding up these probabilities, I only get a 0.83929
>or 83.93% chance of any player beating all other players.
>I don't know if this means that the formula expects there to
>be a tie for first the other 16.07% of the time, or if it
>just isn't feasible to use that formula this way. (The former
>seems somewhat unlikely given how much more likely it is for
>player A to win than for anyone else to win.)
You're assuming that every possible player-player interaction
is independent. The missing 16.07% is the intransitive
outcomes (e.g.
Yes, but it doesn't, TTBOMK, generate (match) the VEKN probabilities. It
generates some other, possibly similar, probabilities.
I don't really understand what your concern is, here. I realize that there's
a lot of different capacities and aptitudes people have: deckbuilding, card
playing, diplomacy, and so forth. Things will cause players to do better in
certain situations relative to how well they do in others, such as being
better at (or having decks built better for) winning the two-man game at the
end vs. surviving the initial oustings to get into that position. Ultimately,
I'm not sure why the rating system has to account for all these. There's a
very simple thing that the VEKN Elo system tests: on average, will player A
do better than, worse than, or as well as player B in terms of victory points.
That's all it has to do and it's not a test that really cares which aptitudes
cause which result. As long as the prediction is basically accurate over time,
that's fine.
Is there something about the nature of Curt's model that you fear is skewing
his results enough to make his prediction of the optimal constants
substantially off? If so, how?
Fred
Yes.
As stated, his estimation for the chance of a given player winning is
not the same as what the VEKN Elo system computes as that player's
chances.
To use the pseudo-random number for a die roll as an exaggerated
example, it would be like using a pseudo-random integer from 1 to
10 and returning
1 for 1
2 for 2
3 for 4 and 5
4 for 6 and 7
5 for 8 and 9
6 for 10
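Spelled out in code (the table as posted skips input 3; assigning it to face 3 is my assumption, made so the ten inputs are all covered):

```python
from collections import Counter

# The exaggerated "wrong die" above: a uniform 1-10 integer mapped
# unevenly onto six faces. Input 3 -> face 3 is an assumed gap-fill.
wrong_die = {1: 1, 2: 2, 3: 3, 4: 3, 5: 3, 6: 4, 7: 4, 8: 5, 9: 5, 10: 6}
face_probs = {f: n / 10 for f, n in Counter(wrong_die.values()).items()}
# Faces 1, 2 and 6 come up with probability 0.1, face 3 with 0.3, and
# faces 4 and 5 with 0.2 - none of them the fair 1/6.
```

The point of the analogy: the generator produces die rolls, but with the wrong distribution, just as a method can produce game outcomes without matching the VEKN probabilities.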
>Adding up these probabilities, I only get a 0.83929
>or 83.93% chance of any player beating all other players.
>I don't know if this means that the formula expects there to
>be a tie for first the other 16.07% of the time, or if it
>just isn't feasible to use that formula this way. (The former
>seems somewhat unlikely given how much more likely it is for
>player A to win than for anyone else to win.)
You're assuming that every possible player-player interaction
is independent. The missing 16.07% is the intransitive
outcomes (e.g. A beats B beats C beats A) that can't happen
in a V:tES game.
The ELO definition is designed for 2-player and is indeterminate
when applied to multiplayer, since hypothetically there can
be correlations amongst player outcomes (if A beats B and
B beats C, A must beat C). If you assume players must
come out in a transitive ranking but there are no other correlations
(e.g. A beating B doesn't change C's chance against D) then
you'll come out with something isomorphic to my system.
I haven't proven it, but it seems pretty obvious.
(Sorry about the abortive post. #$%&*$# AOL)
>Yes, but it doesn't, TTBOMK, generate (match) the VEKN probabilities. It
>generates some other, possibly similar, probabilities.
It *does* generate identical probabilities. Calculate the chance
of A beating B using my system and it's identical to the VEKN system.
That's what the business about summing them up and dividing by the
number of players is about (see the oft-quoted text).
I think what Curt's saying is: the VEKN Elo-based scoring
system doesn't have any inherent method for calculating the
chance of winning the game for each player in a multiplayer
game. It only has a formula for determining the expected
chance of one player beating one other player.
If you use Curt's "performance" stat to calculate the chance
of one player beating one other player - ie look at the oft-
quoted text and assume a two-player game - it's equivalent
to the VEKN expected-chance formula.
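That two-player equivalence is easy to verify numerically (using performance = 10^(rating/400), per the oft-quoted definition; function names are mine):

```python
def performance(rating):
    return 10 ** (rating / 400)

def curt_p(ra, rb):
    # Curt's rule restricted to a two-player game
    return performance(ra) / (performance(ra) + performance(rb))

def vekn_p(ra, rb):
    # Standard VEKN/Elo expected-score formula
    return 1 / (1 + 10 ** ((rb - ra) / 400))
```

Dividing the numerator and denominator of curt_p by 10^(ra/400) turns it into vekn_p exactly, so the two agree for every rating pair.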
But as he explained in response to the message where I
tried to calculate "chances of beating all other players"
using the VEKN formula, there are about 16% of outcomes
"unaccounted for" after calculating "chance for A to beat
D-E, chance for B to beat A&C-E, etc". This is, he proposes,
because the VEKN formula assumes that it's possible for A
to beat C and C to beat B and B to beat A.
It seems plausible to me, given that the probabilities
I was getting for "A beats all, B beats all, etc" weren't
going to reach 1 even when adding "A ties with B, A ties
with C, etc". And since the VEKN system has no way to tell
us anything about the results within non-A players if we
just look at A's chances against each other player, we have
to (I think) use some other system to interpret what their
"skills" mean in terms of chances for each outcome.
So: I'm not 100% sure that Curt's method of generalizing to
multiplayer is the best way to match the VEKN expected-win
formulas. But it seems very plausible, given that it does
match for the 2-player case and that as far as I can see
there is no way to use the exact VEKN formulas to produce
a result.
Josh
might hack more numbers later
It does, once you define which "winning the game" you're talking about.
That's why I'm asking for clarification.
What "chance of winning the game" (what "winning" is being computed)?
Simply summing and dividing will not produce the same
chances as are actually calculated by the VEKN system.
If one begins with the assumption that it cannot be done
meaningfully, then why do it?
I'm confused. How are you saying the VEKN system can compute
a chance for "winning the game"? Can you supply a formula?
As far as I can tell, the VEKN system *doesn't* calculate a
chance for "winning the game" (= "scoring the highest number
of VPs in the game, beating all other players").
> Simply summing and dividing will not produce the same
> chances as are actually calculated by the VEKN system.
I can't see where the VEKN system calculates chances for
"winning the game". It only calculates pairwise chances for
beating players individually.
> If one begins with the assumption that it cannot be done
> meaningfully, then why do it?
It seems to me still meaningful to model chances of winning,
based on what the ratings imply about skill, even if the
existing system doesn't provide us enough guidance to use
only it as the basis for the model. No?
Josh
modeling clay
If you mean, "What chance does a given player have of receiving more
victory points than any other player in a game of Jyhad?", I guess *I'm*
confused why this is relevant. (At least why it's directly relevant.)
The ratings formula says nothing about predicting the outcome in that
manner. It only rates players' chance of finishing with more victory
points than one another, not what their chances are of finishing with
more points than any other player.
Fred