What are possible rewards when checkpoints are enabled?

Piotr Januszewski

Feb 28, 2020, 9:26:28 AM
to Google Research Football
Hi!

1. In academy_empty_goal_close, over 32 random rollouts with a time limit of 100, I observe the following reward histogram (reward/count): 0.0/2474; 0.1/6; 0.2/4; 0.9/21; 1.1/1; 2.0/4. Is it expected behavior that the agent can get a reward of 0.2? I also observed a reward of 0.4 once.
2. Because in empty goal close we start behind most of the checkpoints, for the first action '4' (possibly move forward, right?) we get a reward of 0.9, which is pretty clear. However, for actions '5'/'6'/'7' in the first step (possibly moving in other directions, right?) we get 0.9 only from time to time. For all other actions in the first step the reward is 0. Is this expected? My intuition is that, because we start behind the checkpoints, we should collect all of them after the first action, no matter what it is.
3. Last but not least: does the academy scenario end when the opponent scores too, or only when the player scores?
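
A minimal sketch of how a per-step reward histogram like the one in point 1 could be collected. It assumes a Gym-style environment whose `step` returns `(observation, reward, done, info)` with a scalar reward and which exposes `action_space.n`; the helper name `reward_histogram` and its `make_env` argument are hypothetical names chosen for this illustration. With gfootball installed, `make_env` could be something like `lambda: gfootball.env.create_environment(env_name='academy_empty_goal_close', rewards='scoring,checkpoints')`, though that is an assumption and not necessarily the configuration used above.

```python
import random
from collections import Counter


def reward_histogram(make_env, episodes=32, max_steps=100, seed=0):
    """Tally per-step rewards over random rollouts of a Gym-style env.

    `make_env` is any zero-argument callable returning an environment with
    `reset()`, `step(action)` and `action_space.n` (hypothetical helper,
    not part of gfootball).
    """
    rng = random.Random(seed)
    counts = Counter()
    env = make_env()
    for _ in range(episodes):
        env.reset()
        for _ in range(max_steps):
            # Uniformly random discrete action.
            action = rng.randrange(env.action_space.n)
            _, reward, done, _ = env.step(action)
            # Round so that float noise (0.30000000000000004) shares a bucket.
            counts[round(float(reward), 1)] += 1
            if done:
                break
    return counts
```

Printing `sorted(reward_histogram(make_env).items())` gives reward/count pairs in the same form as the list above.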

Thanks,
Piotr

Anton Raichuk

Mar 2, 2020, 4:22:17 AM
to Piotr Januszewski, Google Research Football
Hey Piotr,

Thanks for the report.
There indeed seems to be an issue with the checkpoint reward, caused by incorrect detection of which team owns the ball at the beginning of the episode.
We'll look into this.

As for the end condition of academy scenarios: just check the scenario definition files and it should be clear which scenario ends when.
In particular, look at lines like this:

--
You received this message because you are subscribed to the Google Groups "Google Research Football" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-research-fo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-research-football/94b7ca61-4380-4b65-9c8d-9c2ba3cff8af%40googlegroups.com.

Piotr Januszewski

Mar 2, 2020, 1:12:57 PM
to Google Research Football
Thanks for the clarification! Waiting for the fix then :) Can I help somehow?

Anton Raichuk

Mar 3, 2020, 6:21:24 AM
to Piotr Januszewski, Google Research Football
I have bad news for you here.

The problem is that the player passes the ball backwards, and that pass happens in between time steps (we perform a step every 100 ms).
So the player has ball ownership neither at the step before nor at the step after. That's why the checkpoint reward is not granted.
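
The effect can be illustrated with a simplified model of a checkpoint reward that is gated on ball ownership. This is a sketch, not the environment's actual wrapper: the constants (ten checkpoints worth 0.1 each, distance thresholds spaced evenly from 0.99 down to 0.19, opponent goal at x = 1) and the names `checkpoint_threshold` and `step_reward` are assumptions made for the illustration.

```python
import math

NUM_CHECKPOINTS = 10      # assumed number of checkpoints
CHECKPOINT_REWARD = 0.1   # assumed reward per checkpoint
GOAL_X = 1.0              # assumed x-coordinate of the opponent goal


def checkpoint_threshold(k):
    """Distance to goal at which checkpoint k (0-based) counts as reached.

    Assumed spacing: evenly from 0.99 (k = 0) down to 0.19 (k = 9).
    """
    return 0.99 - 0.8 / (NUM_CHECKPOINTS - 1) * k


def step_reward(ball_xy, owns_ball, scored, collected):
    """Return (reward, collected) for one step of the simplified model."""
    if scored:
        # A goal pays 1 plus every checkpoint not collected so far.
        bonus = CHECKPOINT_REWARD * (NUM_CHECKPOINTS - collected)
        return 1.0 + bonus, NUM_CHECKPOINTS
    if not owns_ball:
        # No ownership observed at this step: nothing is granted.
        return 0.0, collected
    d = math.hypot(ball_xy[0] - GOAL_X, ball_xy[1])
    reward = 0.0
    # Collect every remaining checkpoint the ball has already passed.
    while collected < NUM_CHECKPOINTS and d <= checkpoint_threshold(collected):
        reward += CHECKPOINT_REWARD
        collected += 1
    return reward, collected
```

Under these assumptions, a ball at distance 0.23 from the goal (for example at an assumed position of (0.77, 0)) yields nine checkpoints at once, about 0.9, when ownership is observed, and 0 when it is not. A later goal then pays about 1.1 in the first case and 2.0 in the second, which matches the 0.9, 1.1 and 2.0 buckets in the reported histogram.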

There does not seem to be an easy fix that preserves backwards compatibility (we would have to publish a new version of the env, rerun all experiments, and update the paper).
Given the limited scope of the problem and the fact that it affects only the checkpoint reward (a custom reward that we came up with to simplify learning), we are leaning towards leaving things as they are right now.

Piotr Januszewski

Mar 3, 2020, 9:10:50 AM
to Anton Raichuk, Google Research Football
I understand, thanks.

Greetings,
Piotr Januszewski