Hi everyone. This topic was discussed a lot 10 years ago.
I wonder. Does anyone here play against Jellyfish as well as either GNUBG or Snowie? What is your experience? Do you feel Snowie and GNU play fair while Jellyfish cheats? Lets have a vote.
I am convinced Jellyfish cheats. I have played 100s of games with automatic dice and 100s with manual dice. My record is 40% wins with automatic dice but a hugely superior result of 55% wins with manual dice (I play the free version of Jellyfish and I play against level 5).
2. A little meta discussion may be in order since (while you didn't read the jellyfish documentation) you actually did provide some data to base your suspicion on:
> I have played 100s of games with automatic dice and 100s with manual dice. My record is 40% wins with automatic dice but a hugely superior result of 55% wins with manual dice (I play the free version of Jellyfish and I play against level 5).
That's not an exact quantitative analysis, and we also don't know how diligently you performed the test (e.g. did each and every game really go into the analysis?). But still, it would probably make me suspicious too -- after all, it's hundreds of games! So let's assume the data is valid, somebody did the maths, and the result supports the assumption that JellyFish cheats at a 95% confidence level. Would I be convinced? No. Why not? Because the 95% is only valid for a phenomenon *we do not know anything else about*. But
-- I do know how software is being produced
-- there is no conceivable incentive for a programmer to implement a cheat
-- many other people checked and didn't find evidence
-- many other people (not many really good players) *thought* that JellyFish cheats but were unable to provide evidence
-- the psychological mechanisms that make people feel they are cheated are pretty obvious
-- the cheat would probably have leaked by now (how long ago was JellyFish 3.5 published?).
Did Americans set foot on the moon? I strongly believe yes. The psychological mechanisms leading to believing they didn't are obvious, it would have leaked by now, many scientists, engineers and astronauts wouldn't have played that game but blown the whistle instead. America *did* have a strong incentive to cheat, but still I don't believe it. In two words, the idea just doesn't fit the picture. Even if somebody presented pretty strong evidence I'd still look to disprove that evidence.
Of course that was the attitude of many people back when Einstein presented his theory of relativity. Every now and then, say every 500 years, the big picture changes, and that's because a tiny little detail (e.g. constant speed of light) can't be made to fit, not at any cost. But these events are rare. To speak in terms of medical diagnosis: the frequent is frequent, and the rare is rare.
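The argument about prior knowledge can be put in numbers with Bayes' rule. The sketch below is only an illustration: the function name `posterior_cheat`, the prior of 1 in 1000, and both likelihoods are assumed values picked for the example, not anything measured in this thread.

```python
def posterior_cheat(prior, p_data_if_cheat, p_data_if_fair):
    """Bayes' rule: probability that the bot cheats, given the observed data."""
    numerator = prior * p_data_if_cheat
    denominator = numerator + (1.0 - prior) * p_data_if_fair
    return numerator / denominator

# Assumed numbers, for illustration only:
#   prior belief that the bot cheats ............ 1 in 1000
#   chance of data this lopsided if it cheats ... 80%
#   chance of data this lopsided if it is fair .. 5%
post = posterior_cheat(prior=0.001, p_data_if_cheat=0.8, p_data_if_fair=0.05)
print(f"posterior probability of cheating: {post:.4f}")
```

With these assumed inputs the posterior stays below 2%, which is the sense in which a "95%" result alone need not be convincing when the prior is small. A different prior gives a different answer.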
On Mar 2, 8:06 am, Jay Allin <jaymari...@gmail.com> wrote:
> Hi everyone. This topic was discussed a lot 10 years ago.
> I wonder. Does anyone here play against Jellyfish as well as either GNUBG or Snowie? What is your experience? Do you feel Snowie and GNU play fair while Jellyfish cheats? Lets have a vote.
> I am convinced Jellyfish cheats. I have played 100s of games with automatic dice and 100s with manual dice. My record is 40% wins with automatic dice but a hugely superior result of 55% wins with manual dice (I play the free version of Jellyfish and I play against level 5).
One possible reason for this disparity is that your reasons for rerolling manual dice are not inflexible enough.
Maybe it sometimes happens that you're not sure if the dice are cocked or not. Then it would be natural (with no human opponent to negotiate with you) to reroll if the dice are bad for you, and leave the roll standing otherwise.
Even if only 0.5% of rolls are in this uncertain state, that could certainly explain your disparity in results.
In article <a5cd6977-c559-4bfe-b2ce-5aa692bf1...@l24g2000prh.googlegroups.com>, Jay Allin <jaymari...@gmail.com> wrote:
>Hi everyone. This topic was discussed a lot 10 years ago.
>I wonder. Does anyone here play against Jellyfish as well as either GNUBG or Snowie? What is your experience? Do you feel Snowie and GNU play fair while Jellyfish cheats? Lets have a vote.
>I am convinced Jellyfish cheats. I have played 100s of games with automatic dice and 100s with manual dice. My record is 40% wins with automatic dice but a hugely superior result of 55% wins with manual dice (I play the free version of Jellyfish and I play against level 5).
I think all this shows is that YOU cheat.
Kees (God Eet tofu, want you, lj pronounced polisie in Amsterdamn op 76, nu pijnlijk overkomen als mietjesgedrag spoort al FANMEEL over op etiquette verzoekt de kleding van Beton, of stereotyping entire paragraph is alleged cheating, can't tell any aces and bannans!)
In article <a5cd6977-c559-4bfe-b2ce-5aa692bf1...@l24g2000prh.googlegroups.com>, Jay Allin <jaymari...@gmail.com> wrote:
>I am convinced Jellyfish cheats. I have played 100s of games with automatic dice and 100s with manual dice. My record is 40% wins with automatic dice but a hugely superior result of 55% wins with manual dice (I play the free version of Jellyfish and I play against level 5).
One reason your results cannot be trusted is that you have not "blinded" yourself to whether manual dice are being used. For example, perhaps you play more confidently when you use manual dice, and allow yourself to get upset with automatic dice because you are convinced that the program cheats. This would explain the disparity in your results without proving that the program cheats.
If you believe that the program cheats by, for example, giving itself doubles more often, then you can test the hypothesis more rigorously by having an assistant play games against Jellyfish and record the automatic dice. The same assistant can also generate sequences of manual dice rolls. The assistant can then give you both sets of dice without giving any hint as to which was automatically generated. You could then enter both sets of dice manually and compare the results.
This won't, however, test whether Jellyfish cheats by giving itself rolls that are favorable *depending on the position*. I'm not sure offhand how to design a rigorous test of this hypothesis (assuming you're not willing to go to the effort of reverse engineering the object code to figure out how the algorithm works).
--
Tim Chow tchow-at-alum-dot-mit-dot-edu
The range of our projectiles---even ... the artillery---however great, will never exceed four of those miles of which as many thousand separate us from the center of the earth. ---Galileo, Dialogues Concerning Two New Sciences
Jay Allin wrote:
> Hi everyone. This topic was discussed a lot 10 years ago.
> I wonder. Does anyone here play against Jellyfish as well as either GNUBG or Snowie? What is your experience? Do you feel Snowie and GNU play fair while Jellyfish cheats? Lets have a vote.
> I am convinced Jellyfish cheats. I have played 100s of games with automatic dice and 100s with manual dice. My record is 40% wins with automatic dice but a hugely superior result of 55% wins with manual dice
The easiest way to determine if a bot cheats with the dice is to set up two computers with the bot software on each. Have the bot play itself with one "player" controlling the dice and the other using manual dice. If the instance with dice control wins a statistically significant percentage of games/matches you know that it manipulates the dice.
This is an easy experiment to do. Get back to us if you have any interesting results to report.
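Whether a win count is "statistically significant" in the experiment above can be checked with an exact one-sided binomial test. This is a minimal sketch under simplifying assumptions: each game counts as an independent win or loss, cube values and match scores are ignored, and the 60-of-100 result is a made-up example, not a real measurement.

```python
from math import comb

def binomial_tail(wins, games, p=0.5):
    """P(X >= wins) for X ~ Binomial(games, p): exact one-sided p-value."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )

# Hypothetical outcome: the instance controlling the dice wins 60 of 100 games.
p_value = binomial_tail(60, 100)
print(f"chance of 60+ wins out of 100 with fair dice: {p_value:.4f}")
```

For this hypothetical result the p-value comes out just under 3%, so it would pass the usual 5% threshold. A single run like that is still only an indication, not a proof.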
In article <z4kjn.927$Vq1....@en-nntp-03.dc1.easynews.com>,
Walt <walt_ask...@yahoo.com> wrote:
>The easiest way to determine if a bot cheats with the dice is to set up two computers with the bot software on each. Have the bot play itself with one "player" controlling the dice and the other using manual dice. If the instance with dice control wins a statistically significant percentage of games/matches you know that it manipulates the dice.
Very nice! Wish I'd thought of it.
Thanks Peter. About the seeded pseudo-random number generator .. I agree with you. Jellyfish cannot interfere with the dice. As you say, the numbers are fixed by the mathematical formula.
So I agree .. Jellyfish does not cheat by manipulating the dice. No. Jellyfish seems to be cheating by looking ahead one or two rolls. That's its method.
Why do I suspect this? Unless you know what dice rolls are coming, why would you double with only a 60% chance of winning? You don't want to put the cube in the opponent's hands for just 10%. Sometimes Jellyfish offers the cube too optimistically, i.e. its chances are not high enough. I suspect Jellyfish has looked ahead and seen the next two dice rolls. If they are good, Jellyfish will double. So .. in those cases, if I take the cube, I switch immediately to manual dice (just to be safe).
Sometimes I use a "dice rolls file" instead of the generator. When Jellyfish makes a doubling play that is suspiciously optimistic, I switch to manual, play it through .. and then I check the file afterwards to see what the next two rolls WOULD have been. And too many times the next two rolls are suspiciously strong in favor of Jellyfish.
> That's not an exact quantitative analysis, and we also don't know how > diligently you performed the test (e.g. did really each and every game go
True, there is no way to prove cheating conclusively. Even your friend Einstein did not have proof 100% about his theory of General Relativity .. It was only proven completely some decades later when new measurement techniques allowed the phenomena to be observed :-)
Unfortunately absolute proof is not possible. Only anecdotal evidence is possible and "home-made statistics" (unless the writer of Jellyfish releases the program code publicly then we will know for sure). But just because we cannot PROVE it, doesn't mean it isn't so. Yes, there is no PROOF 100% of cheating. But, there is no proof 100% that Jellyfish is always fair either :-)
It all boils down to this question: Do we trust the programmer is playing fair ?? I know you trust him (or her). But see my answer below to your statement about there being "no conceivable incentives to cheat"
In the absence of perfect statistics, let me offer you this logic:
ASSUMPTIONS
(a) Computer_A (A for short) has a superior neural net and plays better than Computer_B (B for short)
(b) Both A and B (both having internal dice generators) play fair (assumption is that the programmer has no incentive to cheat).
Therefore A SHOULD get better results than B
OBSERVATIONS
(a) But it is observed that against human opponents, A and B get the same results.
Therefore one of our assumptions must be faulty.
Now we know from people who have done large manual tests comparing GNUBG, Snowie and Jellyfish, that GNU and Snowie are very close (50.5% to 49.5%), but Jellyfish is far behind (45% against GNUBG and just 43% against Snowie). Therefore assumption "a" seems correct (that Computer_A is stronger). Therefore the other assumption must be the faulty one -- about cheating. Ie, Computer_B must cheat with the dice :-)
While I applaud your scientific approach (wanting solid statistics to back a claim), the above approach is also valid.
If you have ever spent time watching Jellyfish play, it is visibly inferior to GNUBG. It gets into trouble much more often and plays quite a bit more wildly than GNUBG. Is this possible? They are both neural net programs?? Yes, it is possible. The neural net is only as strong as the players it has learnt from. I believe Jellyfish was trained by overly aggressive games. Therefore the AI "thinking" is slightly wrong. What is a programmer to do when his baby which he hoped would be world no.1 is actually only winning 43% games against Snowie?? Why not a little bit of cheating to balance the books :-)
> -- many other people (not many really good players) *thought* that JellyFish cheats but were unable to provide evidence
> -- the psychological mechanisms that make people feel they are cheated are pretty obvious
Once again we agree. Weaker players always complain about their luck. But I am quite a strong player. I have never complained about luck against humans. Also I don't complain about luck against GNUBG !! Even though GNUBG beats me just as much as Jellyfish does. I fully understand that luck can sometimes be extreme. A one in 36 event will happen once in a game. A one in 1000 event will happen once in 27 games. Beginners think it is strange, I agree with you there.
But I am not looking at the gross luck that Jellyfish is getting. I am recognising and commenting on a PATTERN of luck, not an amount of luck.
Note: I get similar results against GNUBG and Jellyfish. Therefore you might imagine I rate them as equally lucky ?? No .. GNUBG seems to get its result fairly. Jellyfish gets into trouble much more regularly than GNUBG, but mysteriously seems to sneak out of trouble with great rolls just after the Cube comes into play. It's a pattern of luck, dear Juggler. Very hard to analyse statistically, but perfect for the human mind to identify since the brain is fabulously designed to recognise patterns.
> -- there is no conceivable incentive for a programmer to implement a cheat
Ah, there we disagree. I am glad of this as so far we have been agreeing too much :-) There is every incentive to cheat.
(a) If you write a neural net AI backgammon player -- a massive project -- and it comes up short and loses to Snowie and GNUBG, how can you sell it for a lot of money? Maybe cheat a tiny bit to improve the statistics. A tiny bit of cheating on just a few important rolls .. nobody could prove that :-)
(b) If you believe most players accept the cube too optimistically, and you want to train people to be more cautious before accepting a double .. why not cheat a little. If beginners are bitten a few times, then they might learn not to accept the cube so eagerly :-)
It's been nice to get your comments "Peter the juggler". You offer solid statistical arguments. But I am not convinced :-)
> This won't, however, test whether Jellyfish cheats by giving itself rolls > that are favorable *depending on the position*. I'm not sure offhand how > to design a rigorous test of this hypothesis
Just repeating JF's web site -- like with any other pseudo random number generator, the sequence of numbers from JellyFish's pseudo random number generator is defined beforehand, given the seed (which JF's web site suggests to jot down when in doubt about cheating). That obviously precludes rolling to one's favour. This discussion ends here.
It would make it possible, however, to adapt one's play in order to exploit the knowledge of future rolls. Such play would (again paraphrasing http://www.jellyfish-backgammon.com/dice.htm) be manifest in the game log by "officially" suboptimal moves which just happened to fit very nicely with the rolls to come. Alas, such moves do not occur: JellyFish always plays what it thinks is optimal considering all possible rolls.
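The statement that a seed fixes the whole roll sequence is easy to demonstrate with any seeded generator. The sketch below uses Python's standard `random.Random` purely as a stand-in. It is not JellyFish's generator, and the helper name `roll_sequence` and the seed value are made up for the example.

```python
import random

def roll_sequence(seed, n):
    """Return n dice rolls from a generator seeded with `seed`.

    Uses Python's built-in generator as a stand-in; JellyFish's own
    generator is not reproduced here.
    """
    rng = random.Random(seed)
    return [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(n)]

first = roll_sequence(12345, 10)
second = roll_sequence(12345, 10)

# Same seed, same rolls: the sequence cannot depend on the position on the board.
print(first == second)
```

Because the rolls are a pure function of the seed, anyone who noted the seed can replay the sequence afterwards and compare it with the rolls that actually appeared in the game.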
See my other post for why it is impossible that JellyFish cheats by tailoring its play to future rolls. It's plainly impossible. (I had not read your new post before I wrote my other post).
> What is a programmer to do when his baby which he hoped would be world no.1 is actually only winning 43% games against Snowie?? Why not a little bit of cheating to balance the books :-)
JF was a huge success. It was stunning. It *was* no. 1. It was so good at its time that it changed the way humans play backgammon. Snowie and gnubg came (much) later, so for a while JF didn't have serious competition. Fredrik Dahl, JF's creator, has all reason to be proud of his baby, and I don't think he inserted a cheat function after it turned out that Snowie plays a little better.
> Also I don't complain about luck against GNUBG !! Even though GNUBG beats me just as much as Jellyfish does.
Oh. Maybe then you are underestimating the strength of JF. From a human perspective I believe they are about equally strong (i.e. the skill difference between most human players and JF is much larger than that between JF and gnubg).
> Its a pattern of luck dear Juggler. Very hard to analyse statistically, but perfect for the human mind to identify since the brain is fabulously designed to recognise patterns.
> So I agree .. Jellyfish does not cheat by manipulating the dice. No. Jellyfish seems to be cheating by looking ahead one or two rolls. That's its method.
> Why do I suspect this? Unless you know what dice rolls are coming, why would double with only a 60% chance of winning?
Edit the position in Gnubg and do a rollout. If Gnubg differs significantly from the Jellyfish doubling decision and evaluates the double as bad or very bad, I would get suspicious.
If you like to know more about doubling strategy in matches read this:
> One reason your results cannot be trusted is that you have not "blinded" yourself to whether manual dice are being used. For example, perhaps you play more confidently when you use manual dice, and allow yourself to get
Nice observation :-) That is very possible I agree. I hope to do some other tests to get around that problem.
> If you believe that the program cheats by, for example, giving itself doubles more often, then you can test the hypothesis more rigorously by having an assistant play games against Jellyfish and record the automatic dice. The same assistant can also generate sequences of manual dice rolls. The assistant can then give you both sets of dice without giving any hint as to which was automatically generated. You could then enter both sets of dice manually and compare the results.
> This won't, however, test whether Jellyfish cheats by giving itself rolls that are favorable *depending on the position*. I'm not sure offhand how to design a rigorous test of this hypothesis (assuming you're not willing to go to the effort of reverse engineering the object code to figure out how the algorithm works).
You can test this with a 1-ply analysis. That will analyse all the outcomes one move ahead. Choose the best move for each dice roll. Then order the dice from best to worst (assign numbers 1 to 36). Then you get a statistic of how good Jellyfish's dice are for each situation. With average luck, you should see 18.5 as the average result.
Or even better, you can use the "cubeless equities" value. Why? Because sometimes the best dice is FAR ahead of the second best. Luck is measured better this way, so you can catch Jellyfish out even if it is only cheating on one or two very important rolls (which I think it is)
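The equity-based luck measure described above can be sketched as follows. The assumption is that some evaluator (for example a GNUBG analysis) has already supplied the equity after the best play for each of the 21 distinct rolls. The function names and the equity numbers are invented for the illustration. Non-doubles get weight 2 because each can be rolled two ways out of 36.

```python
from itertools import combinations_with_replacement

def expected_equity(equity_by_roll):
    """Average equity over all 36 dice outcomes; non-doubles count twice."""
    if len(equity_by_roll) != 21:
        raise ValueError("need an equity for each of the 21 distinct rolls")
    total = 0.0
    for (a, b), equity in equity_by_roll.items():
        total += (1 if a == b else 2) * equity
    return total / 36.0

def luck(equity_by_roll, rolled):
    """Luck of one roll: its equity minus the average over all rolls."""
    key = tuple(sorted(rolled))  # (2, 1) and (1, 2) are the same roll
    return equity_by_roll[key] - expected_equity(equity_by_roll)

# Made-up position: every roll is neutral except 6-6, which is a big swing.
equities = {r: 0.0 for r in combinations_with_replacement(range(1, 7), 2)}
equities[(6, 6)] = 0.36

print(round(luck(equities, (6, 6)), 4))
print(round(luck(equities, (2, 1)), 4))
```

Summed over many rolls, this luck figure should hover around zero for fair dice. A bot that only "finds" its big rolls right after doubling would show a positive average on exactly those rolls.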
On Mar 3, 5:05 pm, "Peter Schneider" <schneiderp_REMOVET...@gmx.net> wrote:
> ... the sequence of numbers from JellyFish's pseudo random number generator is defined beforehand, given the seed ...
> It would make it possible, however, to adapt one's play in order to exploit the knowledge of future rolls. Such play would (again paraphrasing http://www.jellyfish-backgammon.com/dice.htm) be manifest in the game log by "officially" suboptimal moves which just happened to fit very nicely with the rolls to come. Alas, such moves do not occur: JellyFish always plays what it thinks is optimal considering all possible rolls.
I play "free Jellyfish", so I cannot see the log. Does the log show the "cubeless equities" of each move? This will only show suboptimal MOVES .. not suboptimal offering (or over-optimistic offering) of the doubling cube. I am saying Jellyfish appears to cheat only in doubling. It doubles too early sometimes (that is suboptimal but will not appear in the log), but it only doubles when some great dice rolls are waiting.
The way to test is not to look in the Jellyfish log. The way to test is to take the game, transfer to GNUBG, and see if GNU offers the cube in the same situations.
... Juggler ... I too see the appeal of Occam's razor. "Most players are beginners. Most players wrongly complain about luck. Therefore Jellyfish does not cheat, beginner players are only imagining it."
Then we go ahead and squish all arguments in to fit that conclusion. Until we do a test as suggested by "Walt", nothing can be definitely decided. Certainly a discussion entitled "I do not cheat" on the Jellyfish website, cannot provide the complete answer :-)
> I am saying Jellyfish appears to cheat only in doubling. It doubles too early sometimes (that is suboptimal but will not appear in the log), but it only doubles when some great dice rolls are waiting.
So I think we agree that the sequence of rolls is predefined by the seed; but you think that JF uses this knowledge by giving cubes that are no cubes because it knows that great rolls lie ahead. If you cannot enter a position into JF and check it (because you are using the player version) I can see no direct way to disprove that suspicion. If you could check a position with a different seed or manual dice you would see that JF indeed thought it's a legitimate double, and that JF would always double in that position, no matter what the seed is and hence what rolls lie ahead. Of course you can check the cubes in question with gnubg but there is no guarantee that the two programs agree.
> Certainly a discussion entitled "I do not cheat" on the Jellyfish website, cannot provide the complete answer :-)
It's not a discussion, it's a detailed recipe for proving that you are wrong, written by Bill Robertie. It's just that you cannot do it because you are using the crippled freeware version.
Btw, while Bill Robertie provides a way to *prove* that JF is not cheating, Walt's recipe is *not* a perfect proof. Even if you gained a statistical 95% confidence that JF is cheating I would still contest it -- 1 out of 20 trials you would reach that result although JF is not cheating, even if there is no bias whatsoever in the trial (and people here suggested a few valid ideas how bias could be unintentionally introduced). That's roughly the probability of rolling 16 from the bar and hitting my slotted bar pt checker. Happens all the time. Statistics are never proofs of anything -- they are indications, if done properly, which is not always easy.
Peter Schneider wrote:
> Walt's recipe is *not* a perfect proof. Even if you gained a statistical 95% confidence that JF is cheating I would still contest it -- 1 out of 20 trials you would reach that result although JF is not cheating, even if there is no bias whatsoever in the trial
Yep, it is possible that you could get unlucky and run a trial that hit the one-in-twenty possibility. Such a result would imply that it cheats *slightly*, perhaps winning an extra one or two percent of games when it has the dice. But the people who claim that it cheats are not whining about a one or two percent edge - they're claiming a 15% or more disparity (see the original post in this thread where he claims 55% vs 40% winning rate disparity).
If the hypothesis is "jf cheats" then you just might get a trial that confirms this hypothesis - it's a one-in-twenty chance as you say. But if your hypothesis is "jf cheats to give it an advantage greater than 10%" (or some other reasonable percentage) you're not going to get anywhere close to confirming that result.
So, while my recipe might fail to demonstrate that "jellyfish doesn't cheat at all", it will clearly refute the wild claims that it cheats enough to matter. If the typical "it cheats" claims were true, it would be readily apparent after only a few trials of my test.
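How quickly a large edge would show up can be estimated with a rough rule of thumb. The sketch below uses the normal approximation to ask how many games are needed before a win rate of 50% plus the edge sits about two standard errors above 50%. The function name `games_needed` is made up, 1.96 is the usual 95% critical value, and the calculation ignores statistical power, so treat the results as order-of-magnitude figures only.

```python
import math

def games_needed(edge, z=1.96):
    """Approximate number of games for a win rate of 0.5 + edge to lie
    z standard errors above 0.5 (normal approximation, fair-dice variance)."""
    if edge <= 0:
        raise ValueError("edge must be positive")
    # Standard error of a win fraction under fair dice is 0.5 / sqrt(n);
    # solve z * 0.5 / sqrt(n) = edge for n.
    return math.ceil((z * 0.5 / edge) ** 2)

for edge in (0.10, 0.05, 0.01):
    print(f"edge {edge:.0%}: about {games_needed(edge)} games")
```

Under these assumptions a 10-point edge needs on the order of a hundred games, while a 1-point edge needs close to ten thousand. This matches the point that a disparity as large as the one claimed would show up quickly.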
In article <b7a44d52-4edd-46f9-ba41-053b942f9...@s36g2000prf.googlegroups.com>, Jay Allin <jaymari...@gmail.com> wrote:
>So I agree .. Jellyfish does not cheat by manipulating the dice. No. Jellyfish seems to be cheating by looking ahead one or two rolls. That's its method.
I hope you are not running Jellyfish on the same computer as where you do your online banking. If you do, make sure to often check all your accounts for unexplained withdrawals.
Kees (Te dashuroj Alentejano language software, CD's, kennelijke staat: Blaas jij klaar is cheating to redeem himself is gevormd.)
On Mar 3, 10:05 pm, "Peter Schneider" <schneiderp_REMOVET...@gmx.net> wrote:
> Of course you can check the cubes in question with gnubg but there is no gurantee that the two programs agree.
> Even if you gained a statistical 95% confidence that JF is cheating I would still contest it
> Statistics are never proofs of anything --
Let's face it Peter. Nothing will convince you. You already have your beloved hypothesis and it's burned into your brain. But your stout belief that Jellyfish does NOT cheat is based purely on your pre-conceptions, ie (a) you don't believe programmers have any incentive to cheat, (b) you have seen many beginner players complain about luck and they are always wrong, and (c) you like Jellyfish so you will not question its integrity.
You state plainly that no amount of statistical evidence will convince you. I expect a large amount of evidence would not even raise your suspicion. You even have written an excuse in advance (that GNUBG might not necessarily agree with Jellyfish), just in case that test shows against Jellyfish.
I (on the other hand) choose to keep an open mind on the subject and allow for the possibility that Jellyfish cheats. It would not be the first time. Many computer games cheat in order to make the game more balanced / more challenging.
Sure, it may well turn out that Jellyfish plays fair. Tim Chow (who made some very astute comments earlier) might well be right (about the bias in my tests).
But I don't believe you should dig your heels in and refuse to be moved. Wait until the test results come in and let's discuss further. Don't say "even if there is a 95% confidence in the statistics, I will still not agree". If strong evidence comes in on the subject, it should be met with good logical criticism, not just stubbornly dismissed without discussion
I'll add some more comments after devising a good test (hopefully I can automate a test to get a large sample set of results)
In article <b7a44d52-4edd-46f9-ba41-053b942f9...@s36g2000prf.googlegroups.com>, Jay Allin <jaymari...@gmail.com> wrote:
>Why do I suspect this? Unless you know what dice rolls are coming, why would double with only a 60% chance of winning? You don't want to put the cube in the opponent's hands for just 10%.
It looks like you don't know that much about correct cube strategy. In a match, it can sometimes be correct to double even when you are very likely to *lose*. For example, say your opponent has two points to go and you're on the bar waiting for a last-ditch shot. If you get a shot it may be correct to double *before* hitting the shot even though you're an underdog to hit the shot, because if you don't hit the shot you're going to get gammoned anyway and lose the match, so you lose nothing by doubling if you miss. On the other hand, perhaps you're a favorite to win if you hit, so you have a correct double.
In other situations, having only a 60% chance of winning is easily enough to double if you have excellent chances of winning a gammon, but if you wait a roll to double your opponent will drop. By waiting to double you increase your chances to win 1 point, but you pass up the opportunity to win 4 points. So you should double.
In article <ed4ae800-3ef7-48b2-b176-0ad761d6d...@t17g2000prg.googlegroups.com>, Jay Allin <jaymari...@gmail.com> wrote:
>I (on the other hand) choose to keep an open mind on the subject and allow for the possibility that Jellyfish cheats. It would not be the first time. Many computer games cheat in order to make the game more balanced / more challenging.
This is not plausible in the case of Jellyfish. When I first started surfing the web for articles about backgammon, I was *astounded* by the number of complaints people made about cheating bots. To me this is enormously strong evidence that the bots *don't* cheat, because no programmer would do something to increase the number of complaints, which is already astronomical. It's not proof, of course, but it's very strong evidence.
The only bot I've seen that I suspect cheats is a backgammon program that I played on an airplane. It seemed to me that the bot was cheating in order to make it easier for the *player to win*. That is, it was giving the player good dice and itself bad dice, and making dumb moves, even on the "hard" level. Here I think it is easy to see why someone would program the bot to cheat in this way. The airline doesn't want customers complaining that the computer cheats. Therefore it wants a very weak backgammon program. Nobody complains about cheating bots when they *win*. But a strong bot will provoke complaints even if it is not cheating.
> Hi everyone. This topic was discussed a lot 10 years ago.
> I wonder. Does anyone here play against Jellyfish as well as either GNUBG or Snowie? What is your experience? Do you feel Snowie and GNU play fair while Jellyfish cheats? Lets have a vote.
> I am convinced Jellyfish cheats. I have played 100s of games with automatic dice and 100s with manual dice. My record is 40% wins with automatic dice but a hugely superior result of 55% wins with manual dice (I play the free version of Jellyfish and I play against level 5).
oh no, is this paranoia still rampant?
Listen, brainiac, it's simple - just take any position, and switch colors. Jellyfish will always play the same move, regardless of which side it's playing.
what I wanted to say is that with statistics you only get probabilities. No matter what the confidence interval is you choose, the result can (and will occasionally) be wrong. Yes, I assume that we will -- provided the test we perform is well set up -- quickly reach a high confidence that JF doesn't cheat. But will Jay be convinced? Dunno. It's not a proof: it's just probability.
But then Jay is claiming that he *does* have statistical backing already that *supports* his suspicion. In his favour I had -- as a thought experiment -- assumed that his test was well set up and the maths were correct. So we arrived at a, say, 95% probability that JF is indeed cheating, *if the only thing we know is the data*. But hell, we know a good deal more. Some of the readers may even have had a beer with the programmer. So I'm not convinced. I would not be convinced if the confidence were 99.9% -- I have rolled consecutive boxes in my life.
> Peter Schneider wrote:
>> Walt's recipe is *not* a perfect proof. Even if you gained a statistical 95% confidence that JF is cheating I would still contest it -- 1 out of 20 trials you would reach that result although JF is not cheating, even if there is no bias whatsoever in the trial
> Yep, it is possible that you could get unlucky and run a trial that hit the one-in-twenty possibility. Such a result would imply that it cheats *slightly*, perhaps winning an extra one or two percent of games when it has the dice. But the people who claim that it cheats are not whining about a one or two percent edge - they're claiming a 15% or more disparity (see the original post in this thread where he claims 55% vs 40% winning rate disparity).
> If the hypothesis is "jf cheats" then you just might get a trial that confirms this hypothesis - it's a one-in-twenty chance as you say. But if your hypothesis is "jf cheats to give it an advantage greater than 10%" (or some other reasonable percentage) you're not going to get anywhere close to confirming that result.
> So, while my recipe might fail to demonstrate that "jellyfish doesn't cheat at all", it will clearly refute the wild claims that it cheats enough to matter. If the typical "it cheats" claims were true, it would be readily apparent after only a few trials of my test.
> The only bot I've seen that I suspect cheats is a backgammon program that I played on an airplane. It seemed to me that the bot was cheating in order to make it easier for the *player to win*. That is, it was giving the player good dice and itself bad dice, and making dumb moves, even on the "hard" level. Here I think it is easy to see why someone would program the bot to cheat in this way. The airline doesn't want customers complaining that the computer cheats. Therefore it wants a very weak backgammon program. Nobody complains about cheating bots when they *win*.
True! I played probably the same bot and I was very satisfied with myself. Darn ;-).
Not true. I suggested a proof which would settle the matter immediately and completely once and for all. If you show me suboptimal cubes by JF *at its own standard* I'm convinced.
> You already have your beloved hypothesis and it's burned into your brain.
It's not hardwired. It's derived from a mix of diverse specific information (I'm a programmer, we know who programmed JF and some of his reasonings, I play backgammon myself) and general knowledge about human psychology (we *see* patterns) and how the world works (we were on the moon). I didn't mention all this before as pure chatter -- it's part of the input that shapes my opinion about this particular matter. I do not have an arbitrary hard-wired idea.
> But your stout belief that Jellyfish does NOT cheat is based purely on your pre-conceptions, ie (a) you don't believe programmers have any incentive to cheat, (b) you have seen many beginner players complain about luck and they are always wrong, and (c) you like Jellyfish so you will not question its integrity.
All true, except that I cannot see any pre-conceptions here. Everything you mention is reproducible empirical experience (if we translate "I like JF" into "it's a good program").
> You state plainly that no amount of statistical evidence will convince you.
At least no amount that can be produced by manual means, true. I may get suspicious at some automatically generated data that reproducibly supports your hypothesis.
> You even have written an excuse in advance (that GNUBG might not necessarily agree with Jellyfish), just in case that test shows against Jellyfish.
That was no excuse but the observation that if we check cubes with gnubg we are leaving the realm of proofs and are back at statistics, which prove nothing (gnubg will often, but not always concur, which means we'll do statistics again).
> I (on the other hand) choose to keep an open mind on the subject and allow for the possibility that Jellyfish cheats. It would not be the first time. Many computer games cheat in order to make the game more balanced / more challenging.
Do they? I don't know any, except if you call the piece of inept programming Tim mentioned elsewhere a cheat.
> Don't say "even if there is a 95% confidence in the statistics, I will still not agree".
But I do. It would be insane to be convinced by 95%, given what we know about the subject. 95% confidence means it goes wrong one in 20 trials! I'd never roll boxes if I were convinced by 95% CIs.
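One way to put numbers on "given what we know about the subject" is Bayes' rule. The sketch below is only an illustration: the prior probabilities that a commercial bot cheats, and the assumed 95% chance that a real cheat would show up in the test, are made-up inputs, not measurements.

```python
def posterior_cheat(prior, power, alpha):
    """P(bot cheats | test came out 'significant'), by Bayes' rule.

    prior: assumed probability of cheating before seeing any data
    power: assumed chance the test flags a bot that really cheats
    alpha: chance the test flags an honest bot (0.05 for a 95% level)
    """
    true_pos = prior * power
    false_pos = (1 - prior) * alpha
    return true_pos / (true_pos + false_pos)

# Illustrative priors only: from "no idea" down to "very unlikely".
for prior in (0.5, 0.01, 0.001):
    post = posterior_cheat(prior, power=0.95, alpha=0.05)
    print(f"prior {prior:>5}: posterior {post:.3f}")
```

With no prior knowledge (a 50/50 prior) a significant result does leave you 95% convinced. If everything else you know makes cheating very unlikely to begin with, the same result moves you far less.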
> If strong evidence comes in on the subject, it should be met with good logical criticism, not just stubbornly dismissed without discussion
Well now.
1. No strong evidence has been produced yet.
2. To suggest that I didn't discuss the matter logically, in depth and from different angles and didn't lay out the reasons that shape my opinion in painstaking detail is a bit surprising.
tc...@lsa.umich.edu wrote:
> The only bot I've seen that I suspect cheats is a backgammon program that I played on an airplane. It seemed to me that the bot was cheating in order to make it easier for the *player to win*. That is, it was giving the player good dice and itself bad dice, and making dumb moves, even on the "hard" level. Here I think it is easy to see why someone would program the bot to cheat in this way. The airline doesn't want customers complaining that the computer cheats. Therefore it wants a very weak backgammon program. Nobody complains about cheating bots when they *win*.
> It looks like you don't know that much about correct cube strategy. In a match, it can sometimes be correct to double even when you are very likely to *lose*. For example, say your opponent has two points to go and you're on the bar waiting for a last-ditch shot.
> In other situations, having only a 60% chance of winning is easily enough to double if you have excellent chances of winning a gammon, but if you
Yes, thanks for that Tim. All that you wrote is good basic doubling theory. I should have been clearer. Don't worry. I know my basic doubling theory. It's pretty basic stats / game theory stuff. I was talking about Jellyfish appearing to double at 60% in mid stages of a match with both players fairly equal and no special situation like high chance of gammon etc.
The question is: Have I assessed the board "equity" wrong? It looks to me like 60% with no great danger of gammon. Therefore Jellyfish should not be doubling (and subsequently it seems suspicious when the next 2 or 3 rolls go unusually lucky for Jellyfish). But ... my perception of the odds may be wrong in these situations.
A good way to check my suspicions is to use a game analyser, something which tells you the "equity" of the current board position. Then I will know for sure what the real odds are, and I can check the validity of my claim. Is Jellyfish doubling at 60%, which is suspiciously optimistic? Or is it really a 70% board and I am just estimating poorly?
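For orientation, here is a rough sketch of how a cubeless money-game equity can be worked out from outcome probabilities of the kind an analyser reports. The function name and the example probabilities are illustrative assumptions. It also deliberately ignores the cube and the match score, which is exactly where a bare "60%" can be misleading.

```python
def cubeless_equity(win, win_gammon=0.0, win_bg=0.0,
                    lose_gammon=0.0, lose_bg=0.0):
    """Cubeless money-game equity in points per game.

    All inputs are cumulative probabilities:
      win         - winning at all (including gammons and backgammons)
      win_gammon  - winning a gammon or a backgammon
      win_bg      - winning a backgammon
      lose_gammon - losing a gammon or a backgammon
      lose_bg     - losing a backgammon
    """
    lose = 1.0 - win
    return (win - lose) + (win_gammon - lose_gammon) + (win_bg - lose_bg)

# Hypothetical positions, both "60% to win":
plain = cubeless_equity(0.60)                          # no gammons either way
gammonish = cubeless_equity(0.60, win_gammon=0.30,
                            lose_gammon=0.05)          # many gammon wins
print(f"60%, no gammons:   {plain:+.2f}")
print(f"60%, many gammons: {gammonish:+.2f}")
```

Two positions with the same 60% winning chance can have quite different equities once gammons are counted, so comparing the analyser's full output with my own estimate is more informative than comparing win percentages alone. In match play the decision also depends on the score, which this sketch does not model.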