What are possible rewards when checkpoints are enabled?

Piotr Januszewski

Feb 28, 2020, 9:26:28 AM

to Google Research Football

Hi!

1. In academy_empty_goal_close, when playing 32 random rollouts with a time limit of 100, I observe the following reward histogram (reward/count): 0./2474; 0.1/6; 0.2/4; 0.9/21; 1.1/1; 2./4. Is it expected behavior that the agent can get a reward of 0.2? I also observed a reward of 0.4 once.

2. Because in empty goal close we start behind most of the checkpoints, for the first action '4' (possibly move forward, right?) we get a reward of 0.9, which is pretty clear. However, for actions '5'/'6'/'7' in the first step (possibly moving in other directions, right?) we get 0.9 only from time to time. For all other actions in the first step the reward is 0. Is this expected? My intuition is that, because we start behind the checkpoints, we should collect all of them after the first action no matter what it is.

3. Last but not least: does an academy episode end when the opponent scores too, or only when the player scores?
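
For anyone who wants to reproduce a histogram like the one in question 1, here is a minimal sketch. It assumes a Gym-style environment whose `step` returns `(observation, reward, done, info)`. `reward_histogram` and `StubEnv` are hypothetical names used only for illustration, and the default of 19 actions is an assumption about the default action set. The stub only lets the sketch run without gfootball installed; it does not reproduce the rewards reported above.

```python
import random
from collections import Counter


def reward_histogram(make_env, num_rollouts=32, time_limit=100,
                     num_actions=19, seed=0):
    """Counts per-step rewards over random rollouts of a Gym-style env."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_rollouts):
        env = make_env()
        env.reset()
        for _ in range(time_limit):
            action = rng.randrange(num_actions)  # uniformly random action id
            _, reward, done, _ = env.step(action)
            counts[round(reward, 1)] += 1  # bucket rewards to one decimal
            if done:
                break
    return counts


class StubEnv:
    """Hypothetical stand-in: pays 1.0 on the fifth step, then ends."""

    def __init__(self):
        self._t = 0

    def reset(self):
        self._t = 0
        return None

    def step(self, action):
        self._t += 1
        done = self._t >= 5
        return None, (1.0 if done else 0.0), done, {}


# With gfootball installed, make_env could instead be something like:
#   lambda: gfootball.env.create_environment(
#       env_name='academy_empty_goal_close', rewards='scoring,checkpoints')
hist = reward_histogram(StubEnv)
for value, count in sorted(hist.items()):
    print(f"{value}/{count}")
```

Rounding each reward to one decimal keeps floating-point noise from splitting a single checkpoint value into several buckets.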

Thanks,

Piotr

Anton Raichuk

Mar 2, 2020, 4:22:17 AM

to Piotr Januszewski, Google Research Football

Hey Piotr,

Thanks for the report.

There indeed seems to be an issue with the checkpoint reward, which is due to wrong detection of which team owns the ball at the beginning of the episode.

We'll look into this.
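
To make the link between ball ownership and the checkpoint reward concrete, here is a minimal sketch of a distance-based checkpoint reward gated on ownership. It is not the library's implementation. The number of checkpoints (10), the reward per checkpoint (0.1), the threshold formula, and the starting ball position near x = 0.77 are all assumptions, chosen so that the example reproduces the 0.9 mentioned in the question. `new_checkpoints` is a hypothetical name.

```python
import math

NUM_CHECKPOINTS = 10     # assumed number of checkpoints
CHECKPOINT_REWARD = 0.1  # assumed reward per checkpoint


def new_checkpoints(ball_x, ball_y, collected, owns_ball):
    """Returns how many new checkpoints are collected at this step."""
    if not owns_ball:
        # No ownership registered at this step, so nothing is granted.
        return 0
    # Distance from the ball to the opponent goal, assumed to sit at x = 1.
    distance = math.hypot(ball_x - 1.0, ball_y)
    gained = 0
    while collected + gained < NUM_CHECKPOINTS:
        # Assumed thresholds: evenly spaced from 0.99 down to 0.19.
        threshold = 0.99 - 0.8 / (NUM_CHECKPOINTS - 1) * (collected + gained)
        if distance > threshold:
            break
        gained += 1
    return gained


# Ball assumed to start close to the goal, at roughly x = 0.77.
with_ball = new_checkpoints(0.77, 0.0, collected=0, owns_ball=True)
without_ball = new_checkpoints(0.77, 0.0, collected=0, owns_ball=False)
print(round(with_ball * CHECKPOINT_REWARD, 1))     # 0.9
print(round(without_ball * CHECKPOINT_REWARD, 1))  # 0.0
```

Under these assumptions, the same first step yields either 0.9 or 0 depending only on whether ownership is detected.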

As for end condition for academy scenarios: just check the scenario definition files and it should be clear which scenario ends when.

In particular look at lines like this:

  builder.config().end_episode_on_score = True
  builder.config().end_episode_on_out_of_play = True
  builder.config().end_episode_on_possession_change = True
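
As a rough illustration of how these three flags could combine, here is a minimal sketch. `EndConditions` and `episode_done` are hypothetical names, not part of the library. The sketch also assumes that `end_episode_on_score` reacts to a goal by either team, which is worth verifying against the environment code.

```python
from dataclasses import dataclass


@dataclass
class EndConditions:
    """Mirrors the three flags shown above; all default to off."""
    end_episode_on_score: bool = False
    end_episode_on_out_of_play: bool = False
    end_episode_on_possession_change: bool = False


def episode_done(cfg, goal_scored, ball_out_of_play, possession_changed):
    """True if any enabled end condition fired during this step."""
    return bool(
        (cfg.end_episode_on_score and goal_scored)
        or (cfg.end_episode_on_out_of_play and ball_out_of_play)
        or (cfg.end_episode_on_possession_change and possession_changed)
    )


# A scenario with all three flags enabled, as in the lines quoted above.
all_on = EndConditions(True, True, True)
print(episode_done(all_on, goal_scored=False, ball_out_of_play=False,
                   possession_changed=True))   # True
# With every flag off, the same event does not end the episode.
print(episode_done(EndConditions(), goal_scored=False, ball_out_of_play=False,
                   possession_changed=True))   # False
```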

--
You received this message because you are subscribed to the Google Groups "Google Research Football" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-research-fo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-research-football/94b7ca61-4380-4b65-9c8d-9c2ba3cff8af%40googlegroups.com.

Piotr Januszewski

Mar 2, 2020, 1:12:57 PM

to Google Research Football

Thanks for the clarification! Waiting for the fix then :) Can I help somehow?

Anton Raichuk

Mar 3, 2020, 6:21:24 AM

to Piotr Januszewski, Google Research Football

I have bad news for you here.

So the problem is that the player passes the ball backwards, and that pass happens in between the time steps (we perform a step every 100ms).

So the player has ball ownership neither at the step before nor at the step after. That's why the checkpoint reward is not granted.
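
The effect can be illustrated with a toy timeline in which ownership is sampled only at step boundaries. This is a minimal sketch: `owner_at_steps` and the timelines are hypothetical, and only the 100 ms step length comes from the explanation above.

```python
STEP_MS = 100  # the environment performs a step every 100 ms


def owner_at_steps(ownership_intervals, num_steps):
    """Samples the ball owner only at step boundaries (0 ms, 100 ms, ...)."""
    samples = []
    for step in range(num_steps):
        t = step * STEP_MS
        owner = None
        for start_ms, end_ms, who in ownership_intervals:
            if start_ms <= t < end_ms:
                owner = who
        samples.append(owner)
    return samples


# Hypothetical: the player touches the ball only between 20 ms and 80 ms,
# so the touch falls entirely between two consecutive samples.
short_touch = owner_at_steps([(20, 80, "player")], num_steps=3)
# Hypothetical: a longer touch that spans the 100 ms boundary is observed.
long_touch = owner_at_steps([(20, 120, "player")], num_steps=3)
print(short_touch)  # [None, None, None]
print(long_touch)   # [None, 'player', None]
```

Any reward that depends on the sampled owner never sees the short touch, which matches the behavior described above.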

There does not seem to be an easy fix that does not break backwards compatibility (we would have to publish a new version of the env, rerun all experiments, and update the paper).

Given the limited scope of the problem and the fact that it affects only the checkpoint reward (a custom reward that we came up with to simplify learning), we are leaning towards leaving things as they are right now.

Piotr Januszewski

Mar 3, 2020, 9:10:50 AM

to Anton Raichuk, Google Research Football

I understand, thanks.

Greetings,
Piotr Januszewski
