> I think that we should have the following schedule:
>
> In Stage I (S1):
> RIPS
> Follow Me
> Go Get It
> Demo Challenge (with evaluation by the team leaders, normalized score)
> Who's Who?
> General Purpose Service Robot
> Open Challenge (with evaluation by the team leaders, normalized score)
>
> Best 50% go to stage II
>
> In Stage II (S2):
> Enhanced Who's Who (good teams can change Who's Who into EWW or
> have already prepared this)
> Shopping Mall (requires good performance, too dangerous otherwise)
>
That doesn't leave many tests in stage II.
> My reasoning is the following:
> The demo challenge in S1:
> This is a good test for good and not-so-good teams to give a nice
> demonstration. The demo challenge is often an inspiring test for the
> teams. But with 32 teams it is impossible, time wise, to have every
> team leader evaluate every team. So the evaluation by the team leaders
> should be per group. If we normalize the score (for example, 1500 for
> the best team, 1400 for the 2nd best, etc) then we don't have any
> problems between the two groups. A problem could arise if in one group
> there is a bias towards giving higher or lower ratings. If the average
> of points in one group is much lower than the other group this would
> be unfair.
>
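The per-group normalization suggested above (1500 for the best team, 1400 for the second best, and so on) can be sketched as follows. This is only an illustration: the function name `normalize_by_rank`, the 100-point step, the floor at 0, the team names and the tie handling (tied teams keep their input order) are assumptions, not anything agreed in this thread.

```python
def normalize_by_rank(raw_scores, top=1500, step=100):
    """Map each team's raw average score to rank-based points within one group.

    raw_scores: dict of team name -> raw average given by the team leaders.
    Assumption: ties are broken by input order, and points never go below 0.
    """
    ranked = sorted(raw_scores, key=raw_scores.get, reverse=True)
    return {team: max(top - step * i, 0) for i, team in enumerate(ranked)}


# Hypothetical raw averages; group B's leaders rate lower across the board.
group_a = {"team1": 7.9, "team2": 6.1, "team3": 8.4}
group_b = {"team4": 5.2, "team5": 4.0, "team6": 3.1}

points_a = normalize_by_rank(group_a)
points_b = normalize_by_rank(group_b)
```

With these made-up numbers the best team of each group gets 1500 points, even though the raw averages in group B are lower, so a general rating bias in one group no longer shows up in the points.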
Up to now, Demo Challenge evaluation has been done by the TC, not by the
team leaders. And since we wanted to see really good out-of-scope stuff
(completely unrelated to any other test) and to give only some kind of
bonus score for it, we should keep it as it is, in my opinion. 1500 is
also something that should only be given in exceptional cases, e.g.,
max 500 pts: a team shows a nice demo that 1.) corresponds to the
challenge description and 2.) shows some nice stuff by maybe combining
abilities from other tests.
max 1000 pts (cum laude): 1.) + shows some robot abilities that are not
addressed in any other test, e.g., really ironing the clothes etc.
up to 1400 pts (magna cum laude): really good demo, awesome performance,
really new nice features etc.
up to 1500 pts (summa cum laude): exceptionally good performance. I do
not expect any team to ever reach this score. I'd give it to a robot that
I would buy on the spot just to have it in my own household ...
TC evaluation plus the above (actually what we had in the last years) is
what we should try to keep as "demo challenge" _and_ in Stage II, where
the top teams can show the really good and new stuff they are doing.
> GPSR in S1:
> I think that it is so important to get away from the state like
> programming that every team should focus on these capabilities or at
> least think about it. If it is S2 then I'm afraid that only a very few
> teams are willing to tackle this problem.
>
I would like to keep GPSR in Stage II, especially considering all the
new stuff that we might add for 2012 (and given that all new Stage I
things are automatically in GPSR). Furthermore, there are the changes to
GPSR itself, like getting mental age tests in there etc. Whatever we do
with the other tests, I think that GPSR should always be a Stage II test.
> Keeping EWW in S2:
> EWW is a good example where many capabilities have to be integrated
> into a difficult test. I think that this is a really complex test and
> should remain in S2
>
If we do not merge Enhanced Who is Who and Who is Who, then YES. If the
tests are not merged, they definitely belong in two different stages.
> Shopping Mall in S2:
> Too dangerous for poor performing teams.
>
Totally agree.
But in a similar fashion, I'd also like to keep GPSR in stage II. In
GPSR teams may be told to solve three tests at the same time or
sequentially (but at least within 10 minutes). A low scoring team in
stage I might not be able to solve a single one and according to the
scoring (which I do not want to change), score comes at 50% and more.
> So what do you think? Which test(s) should go to stage I, and why?
>
Hard decision. Looking at my comments above, I'd like to keep the stage
II tests in stage II. A Who is Who merger is OK; if not, Enhanced Who is
Who should also stay in Stage II. We could add an additional (easy) test
to stage I, but that's a step backwards, and that is _really_ something
that I want to avoid.
> Cheers,
>
> Tijn
Yes, that's something that I proposed last year already: to have some
mini challenges within a first Stage I test. Something like having the
robot enter the scenario and then accomplish some simple tasks, like
"follow me", "stop", "move to the kitchen table". Basically, GPSR
arose from that, but I agree. We could come up with some minimal set of
capabilities that every robot should have and test them in something
like a technical challenge. The most important thing here is (see the
examples above) to have very simple commands (no GPSR here) and, more
importantly, tasks that can be done independently of each other. For
example, recognizing a person doesn't make sense without training on at
least two persons (and that's more or less a complete test again, just
like Who is Who). But if we manage to extract some minimal set of tasks
(defining the "challenges" described by Jesus above) that can easily be
tested and evaluated as is done in other leagues, then "YES!", let's
do it!
> Best
>
> Jesus
>
>
Stage I:
RIPS
Follow Me
Go Get It
Who's Who?
Open Challenge (with evaluation by the team leaders, normalized score)
General Purpose Service Robot
Stage II (Best 50%):
Enhanced Who's Who
Demo Challenge (with evaluation by the team leaders, normalized score)
Shopping Mall
This also satisfies the RCF request of allowing teams to participate in
more tests.
With this scheme every team participates in 1 registration test, 3
standard tests, 1 general test, and 1 open test, which is surely worth
the trip to the competition.
It is certainly true that GPSR is the most difficult test and that we
will have several 0-point scores; however, I am sure that being in
Stage I will be a benefit for the overall performance.
GPSR will be the last test of Stage I for two reasons:
1) in this way all the basic capabilities will be tested before it, and
the only difference from having it in Stage II is that we allow for
just a few additional time slots.
2) during GPSR the OC will have more time to put together the Open
Challenge results, which is quite time consuming, as we experienced
last year.
This year we will have two similar and fully equipped apartments, each
15 m x 7.5 m, which is enough to run tests in parallel. So we should
have enough time slots to do more tests in Stage I.
Finally, before defining this as final, I suggest waiting a few days to
get an idea of the number of pre-registered teams.
Happy Holidays,
Luca.
Last year, we had 31 preregistered teams, 26 teams were qualified, and
24 teams participated in the competition. So, do we have to divide the
qualified teams? Why not a single group? If two arenas were used, we
would have some problems:
- noise from the other arena (moderation, robots' utterances, ...)
- fairness (some referees gave a 50 cm bonus to every team using an
on-board microphone in 2009)
Also, I prefer to keep the stage structure as it is. Namely, GPSR should
be in Stage II in my opinion, because GPSR is:
- time consuming for referees, since the referees collect and distribute
objects listed in the team's manipulable objects whenever a sentence
is randomly generated.
- difficult even for high-level teams; only a couple of teams got a
non-zero score in 2010. If GPSR took place in Stage I, most teams
would get 0 points, which is not attractive for audiences.
Thus, I guess the following structure is better, since major changes are
made every two years in RoboCup@Home, and we should not make them this
year. What do you think?
Stage I:
RIPS
Follow Me
Go Get It
Who's Who?
Open Challenge (with evaluation by the team leaders, normalized score)
Stage II (Best 50%):
Enhanced Who's Who
General Purpose Service Robot
Demo Challenge (with evaluation by the team leaders, normalized score)
Shopping Mall
Best,
Komei
2011/1/13 Luca Iocchi <luca....@dis.uniroma1.it>:
>
> Also, I prefer to make the stage structure as it is. Namely, GPSR should
> be in Stage II in my opinion, because GPSR is:
> - time consuming for referees since the referees collect and distribute
> objects listed in the team's manipulatable objects whenever the sentence
> is randomly generated.
> - difficult even for high level teams, and only a couple of teams got
> non-zero score in 2010. If GPSR took place in Stage I, most teams
> would get 0 points, which is not attractive for audiences.
>
> Thus, I guess the following structure is better since major changes are
> made every two years in RoboCup@Home, and we should not make them this
> year. What do you think?
> Stage I:
> RIPS
> Follow Me
> Go Get It
> Who's Who?
> Open Challenge (with evaluation by the team leaders, normalized score)
>
>
> Stage II (Best 50%):
> Enhanced Who's Who
> General Purpose Service Robot
> Demo Challenge (with evaluation by the team leaders, normalized score)
> Shopping Mall
>
I completely agree. GPSR is a test for stage II because of the
above points and the fact that it explicitly includes "all the
capabilities" from stage I. Having GPSR in stage I would lead to some
recursive definition :)
It is designed as a stage II test, and it should stay a stage II test
imho.
Cheers,
Dirk
>> Last year, we had 31 preregistered teams, 26 teams were qualified, and
>> 24 teams participated in the competition. So, do we have to divide
>> qualified teams? Why not single group? If two arenas were used, we would
>> have some problems:
>> - noise from the other arena (moderation, robot's utterances,...)
>> - fairness (some referees gave a 50 cm bonus to every team using an
>> on-board microphone in 2009)
>>
> I guess we do not need to decide now whether or not we split up the
> teams in two groups, but could do that in Istanbul once we see how many
> teams made it to RoboCup. So we should just add both possibilities in
> the rulebook (having one group of teams and having two groups). That
> also makes it more "compatible" with local competitions like
> japan/german/iran etc. open.
>
We have to test with 2 groups, because in the future we will have 32
teams. Also, we have about 30 pre-registered teams now, and last year the
schedule was too crowded and mistakes were made. I do not like mistakes,
especially if we can avoid them. So let's have a learning organization.
Also, the argument of noise does not hold. Whether there is one area
which is used almost 100% of the time or 2 areas interleaving about 50%
of the time --> there is always noise.
About the referees --> better referee instructions and training should
solve this. If the referees for both areas are trained at the same time,
there should be no differences.
>>
>> Thus, I guess the following structure is better since major changes are
>> made every two years in RoboCup@Home, and we should not make them this
>> year. What do you think?
>> Stage I:
>> RIPS
>> Follow Me
>> Go Get It
>> Who's Who?
>> Open Challenge (with evaluation by the team leaders, normalized score)
>>
>>
>> Stage II (Best 50%):
>> Enhanced Who's Who
>> General Purpose Service Robot
>> Demo Challenge (with evaluation by the team leaders, normalized score)
>> Shopping Mall
This is not a solution. The trustees want more tests in Stage I. So I
guess it is either the Demo Challenge in stage I, or the GPSR. Pick any
one, but one needs to be chosen...
Cheers,
--Tijn
As for the structure of tests, let me summarize my point of view:
1) we have an explicit request from the RoboCup Federation to increase
the minimum number of tests for each team
2) the two-group scheme with <= 30 pre-registered teams should not
cause any problem in allowing more teams to do one test (i.e. in moving
a test from Stage II to Stage I)
3) moving a test from Stage II to Stage I is not a major change in the
rules, since all teams are expected to prepare all tests
4) having GPSR as the last test of Stage I or the first of Stage II does
not make any difference in terms of preparation for the teams. It will
be done in any case after basic functionalities have been tested.
The only difference is in the number of teams that will do this test.
It is true that we will have more zero scores if we allow all teams to
participate; however, I think that if teams know that this test is in
Stage I, they will prepare it more carefully and the average score of
this test will be better than if it is declared as a Stage II test.
5) GPSR is very important from the scientific viewpoint, and an increase
of performance in it will be very valuable for the teams (I can see many
publications coming out of it) and for the RoboCup@Home League (nobody
else in the world is doing this kind of test!)
Anyway, we have to come to a conclusion, because it is urgent to start
writing the rulebook.
I propose that the TC vote on these three options:
1) Leave as in 2010
2) Move GPSR to Stage I
3) Move Demo Challenge to Stage I
My vote is for 2
Best,
Luca.
1- On one hand, we have an explicit request from the RoboCup Federation to increase the minimum number of tests for each team; in other words, the trustees want more tests in Stage-I.
2- On the other hand, we have a RoboCup@Home legacy that major modifications to the rulebook are made every two years, and this is a year in which we shouldn't make major modifications to the rulebook.
So we have a problem and we need a solution. One suggestion was that we move one of the tests from Stage-II to Stage-I, and that test could be either the GPSR or Demo-Challenge. Personally, I don't really like this solution! But we want people to do more tests, so here's a suggestion:
Instead of moving a test from Stage-II to Stage-I, how about we let the teams decide to try another test? What I have in mind is that this test could be either from Stage-I or Stage-II. Officially, we have 5 tests in Stage-I (RIPS, Follow me, Go Get it, Who is who, Open-Challenge), and each team will have 6 chances in the first stage. Maybe a team performs poorly in a test (maybe by bad luck); then they will have another chance to do better and earn more points (of course, this will replace the poor score, instead of being added to the total score so far). Or, if a team has done well in all tests of Stage-I, they can choose a test from Stage-II to give it a try and earn more (which will be added to the total score so far).
Note that in Stage-II, teams should only participate in 3 tests (from 4 available choices: GPSR, Demo, Enhanced who is who, Shopping mall). Another note is that if a team has tried a test from the 2nd stage in Stage-I, they cannot repeat that test in Stage-II and thus they will have to participate in the other 3 tests of Stage-II.
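The scoring effect of the proposed sixth slot can be sketched like this. It is only a rough illustration under assumptions: the function name `total_with_extra_slot` and all point values are made up, and a retry is assumed to replace the earlier score unconditionally (the proposal does not say what happens if the retry is worse).

```python
STAGE1_TESTS = {"RIPS", "Follow Me", "Go Get It", "Who's Who", "Open Challenge"}


def total_with_extra_slot(scores, extra_test, extra_score):
    """Stage I total after the sixth slot has been used.

    scores: dict of Stage I test name -> points from the first attempt.
    If the extra slot repeats a Stage I test, the new score replaces the
    old one; if it is a Stage II test, its score is added to the total.
    """
    total = sum(scores.values())
    if extra_test in STAGE1_TESTS:
        # Retry: swap the earlier score for the new one (assumed unconditional).
        return total - scores[extra_test] + extra_score
    # Stage II test tried early: added on top of the five Stage I scores.
    return total + extra_score


# Hypothetical first-attempt scores of one team.
scores = {"RIPS": 500, "Follow Me": 0, "Go Get It": 300,
          "Who's Who": 400, "Open Challenge": 700}

retry_total = total_with_extra_slot(scores, "Follow Me", 600)
bonus_total = total_with_extra_slot(scores, "GPSR", 250)
```

In this made-up example the five first attempts add up to 1900 points; repeating Follow Me for 600 points gives 2500, while trying GPSR early for 250 points gives 2150.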
I prefer to have GPSR in stage I. The main reason is to give all teams the possibility to start working on this test. Otherwise, teams that normally don't go to stage II will not consider this test as an important one, and will not focus any important effort on it.
About dividing the teams in Stage-I into 2 parallel groups, I'm concerned about fairness. We have two tests in Stage-I that are scored by the team leaders (RIPS + Open-Challenge) ... dividing into two groups means two different refereeing audiences, which could have quite different levels of satisfaction! This is an important issue. I really don't have a good solution right now, but it should be considered by the TC.
We have been thinking about this and have not come up with a final solution. There are several options. One thing we usually do is to eliminate the (two) lowest and highest scores (as in, for example, ice skating).
Then, for example, after calculating the score we could normalize the points using the ranking, and distribute the points based on rank.
But I think that this will not be needed. A team gets scores from a dozen or more other team leaders. With this number, statistics start to work, and I doubt that the average scores of the two parallel groups will be very far apart.
Two different 'levels of satisfaction' could also be because on average one group had better performances than the other...
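The ice-skating style scoring mentioned above (drop the two lowest and the two highest scores, then average the rest) could look roughly like this. The function name `trimmed_mean` and the example votes are made up for illustration; the number of scores to drop is a parameter because nothing has been fixed yet.

```python
def trimmed_mean(scores, drop=2):
    """Average the scores after discarding the `drop` lowest and `drop` highest."""
    if len(scores) <= 2 * drop:
        raise ValueError("not enough scores left after trimming")
    kept = sorted(scores)[drop:len(scores) - drop]
    return sum(kept) / len(kept)


# Hypothetical votes from the 13 other team leaders in a group of 14 teams.
votes = [1, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 10, 10]

plain_average = sum(votes) / len(votes)
trimmed_average = trimmed_mean(votes)
```

Here the single very low vote pulls the plain average down, while the trimmed average only uses the nine middle votes, so one unusually harsh or generous team leader has less influence.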
Regards,
Tijn
Dear Seyed,
thank you for your post.
I have two replies to your suggestion:
1) I do not think that moving a test from Stage II to Stage I is a major change in the rules.
Can you explain why you think so?
2) adding a new free slot where teams can decide what to do is very nice (we had this a couple
of years ago), but also very difficult from an organizational viewpoint: can you imagine
30 teams each deciding on a different test, and having to organize referees, arrange the arenas,
etc. on the fly? This also substantially increases the number of slots needed, and I am not sure
that we will have enough.
> As we are all engineers, we well know that nothing is achievable
> without cost. And the more you want to achieve, the greater the cost
> is. The suggestion of additional time slots is more flexible than
> moving the GPSR to the first stage and also I believe that an
> additional time slot will truly provide the teams with more
> opportunities, while I believe that the GPSR will not (mainly because
> of its difficulty).
>
It would be nice to have additional time slots for teams to redo a test.
But it is really not possible to organize (we've tried). Also that would
imply that we have to actually skip another test, because the schedule
is already very full. So although the idea is good, we simply can't
implement it.
--Tijn
btw, I'm not an engineer, but an AI guy ;-)
@parallel group
> We have to test with 2 groups, because in the future we will have 32
> teams. Also we have about 30 pre-registered teams now and last year
> the schedule was too crowded and mistakes were made. I do not like
> mistakes especially if we can avoid them. So let's have a learning
> organization.
I have a different opinion here. The point is that this year there are
not going to be 32 teams.
I completely agree with having a learning organization. Imho, thinking
about potential problems is our duty. So, please clarify the mistakes
so that we can discuss the matter.
> Also the argument of noise does not hold. Whether there is one area
> which is used almost 100% of the time or 2 areas interleaving about
> 50% of the time --> there is always noise.
I don't understand the point. By the word "noise" I mean any sound
source other than the user's utterance, such as ambient noise,
announcements, the moderator's speech, etc.
It is clear that most teams failed to handle noise in Singapore. Even
the referees sometimes failed to catch the robot's words since the arena
was too noisy.
>> *About dividing the teams in Stage-I into 2 parallel groups, I'm
>> concerned about **fairness*. We have two tests in Stage-I that are
>> scored by the team-leaders (RIPS + Open-Challenge) ... dividing into
>> two groups means: two different refereeing audience! Which could have
>> quite different levels of satisfaction!! This is an important issue. I
>> really don't have a good solution right now but it should be
>> considered by the TC.
> We have been thinking about this and have not come up with a final
> solution. There are several options. One thing we usually do is to
> eliminate the (two) lowest and highest score (as in, for example, ice
> skating).
I'm also anxious about RIPS (and GPSR) since TC members give partial
scores in RIPS and GPSR. I believe TC members will try to be fair but
it is going to be difficult since the population is small.
@GPSR
> Otherwise, teams that
> normally don't go to stage II will not consider this test as an
> important one, and will not focus any important effort on it.
That's true, Javier. But last year, Dirk, David and I were refereeing
in the GPSR test, and we felt re-arranging the objects was quite
time-consuming. Maybe my explanation was not so clear, though...
My vote is
> 1) Leave as in 2010
since
- GPSR is designed for Stage II
- Last year most teams scored 0 points.
- Major changes would be necessary since GPSR is a ten-minute, 2000-point test.
Best,
Komei
> - Major changes would be necessary since GPSR is a ten-minute, 2000-point test.
I do not really understand the arguments about the major changes. The
"not making major changes except for the demo challenge" is about the
tests. It is not even a rule but a general agreement.
Fiddling with the schedule and allowing teams to either do the demo
challenge or the GPSR test in stage I is not a major change to any of
the tests. So let's get past this part of the discussion and decide to
either do the demo challenge or the GPSR in stage one, since we have to
do more in stage I. I have not seen another solution which is also
organizable. The proposal of not having an extra test in stage I is not
a solution to the problem of needing an extra test in stage I...
Best,
Tijn
> @parallel group
>
>> We have to test with 2 groups, because in the future we will have 32
>> teams. Also we have about 30 pre-registered teams now and last year
>> the schedule was too crowded and mistakes were made. I do not like
>> mistakes especially if we can avoid them. So let's have a learning
>> organization.
>
> I have a different opinion here. The point is that this year there's
> not going to be 32 teams.
> I completely agree to have a learning organization. Imho, thinking
> about potential problems is our duty. So, please clarify the mistakes
> so that we can discuss the matter.
The mistakes (which were all solved) were that we were totally
overwhelmed with the organizational aspect. So to relieve the
organization and make it manageable, from our experience we know that up
to 16 teams is no problem. So the best solution is to have 2 times 16
teams as a maximum, in two different groups.
>> Also the argument of noise does not hold. Whether there is one area
>> which is used almost 100% of the time or 2 areas interleaving about
>> 50% of the time --> there is always noise.
>
> I don't understand the point. I mean by the word "noise" any source
> which is not the user utterance, such as ambient noise, announcement,
> moderator's speech, etc.
> It is clear that most teams failed to handle noise in Singapore. Even
> referees sometimes failed to catch robot's words since the arena was
> too noisy.
That's a problem that we always have at the RoboCup. So what the
technical committee should ensure is that there is a good sound system
and write down the requirements for the local organization so that we
can get the sound system and hear the robots. Perhaps we need several
wireless head sets like they have on those "silent parties". Would that
be a solution?
>>> *About dividing the teams in Stage-I into 2 parallel groups, I'm
>>> concerned about **fairness*. We have two tests in Stage-I that are
>>> scored by the team-leaders (RIPS + Open-Challenge) ... dividing into
>>> two groups means: two different refereeing audience! Which could have
>>> quite different levels of satisfaction!! This is an important issue. I
>>> really don't have a good solution right now but it should be
>>> considered by the TC.
>> We have been thinking about this and have not come up with a final
>> solution. There are several options. One thing we usually do is to
>> eliminate the (two) lowest and highest score (as in, for example, ice
>> skating).
>
> I'm also anxious about RIPS (and GPSR) since TC members give partial
> scores in RIPS and GPSR. I believe TC members will try to be fair but
> it is going to be difficult since the population is small.
Small? We have 30 pre-registered teams and will probably end up with
approx. 26-28 teams. This will create two groups of 13-14 teams. This
means that the @Home league is the largest senior league. But apart from
this point: do you have a better solution? It's good to worry, but please
provide a solution or alternative so we can weigh the pros and cons.
> @GPSR
>
>> Otherwise, teams that
>> normally don't go to stage II will not consider this test as an
>> important one, and will not focus any important effort on it.
>
> That's true, Javier. But last year, Dirk, David and I were refereeing
> in the GPSR test, and we felt re-arranging the objects was quite
> time-consuming. Maybe my explanation was not so clear, though...
And that is why we have to organize ourselves and create a schedule
where teams have to provide assistants, whether they are referees or
people who move objects around is not important. But a schedule is
something we definitely need :-) This is one of the things we learned
from last year.
Cheers,
Tijn
----------------------
Javier @ iPhone
The current status of voting is
1) Leave as in 2010
Komei
2) Move GPSR to Stage I
Luca, Tijn, Javier, Mohan
3) Move Demo Challenge to Stage I
Jesus
Dirk and Anne-Lise are missing
If option 1 is dropped, can Komei vote again for either 2) or 3)?
Please let's close this discussion as soon as possible, because we have
to work on the rulebook, which is also important.
Best,
L.
I want to add anyway that other leagues have the same problems with
parallel groups, but they live with them.
For example, in the soccer leagues, since there are no walls between the
fields, in some cases robots can see the elements (e.g., goals) of
another field. This is sensor noise that teams have to take care of.
In @Home, we can certainly try as much as possible to minimize the sound
noise when a test is running, by using the two @Home fields and
interleaving the actual runs of the tests, or by defining a schedule
such that when a test critical for speech is running in Field A, Group B
will do something not so noisy.
As for the votes of the TC in the tests, I believe that we can guarantee
that all TC members who are required to vote will do so in both groups.
So in this matter, there will be no difference between a single group
and parallel groups.
Best regards,
Luca.