Divergence after many steps in OffPACWorld?


Sergio Donal

Aug 25, 2013, 2:12:25 PM
to github...@googlegroups.com
Hi,


look at the console output:

Episodes 28935: 36, -39.326523
Episodes 28936: 37, -42.037025
Episodes 28937: 42, -46.569968
Episodes 28938: 37, -40.950535
Episodes 28939: 48, -60.572556
Episodes 28940: 34, -38.598512
Episodes 28941: 36, -40.962410
Episodes 28942: 38, -41.586507
Episodes 28943: 39, -43.514350
Episodes 28944: 40, -45.953217



So far, so good. Nevertheless, at some point it starts to oscillate and finally diverges:


Episodes 29106: 42, -45.893919
Episodes 29107: 50, -53.907965
Episodes 29108: 36, -38.827420
Episodes 29109: 58, -62.801208
Episodes 29110: 82, -87.256460
Episodes 29111: 364, -368.908211
Episodes 29112: 90, -92.501503
Episodes 29113: 657, -660.819735
Episodes 29114: 85, -89.145785
Episodes 29115: 163, -165.658204
Episodes 29116: 212, -216.848964
Episodes 29117: 119, -123.520506
Episodes 29118: 105, -108.844825
Episodes 29119: 152, -155.527153
Episodes 29120: 173, -177.658955
Episodes 29121: 66, -70.910970
Episodes 29122: 234, -238.532306
Episodes 29123: 54, -88.992188
Episodes 29124: 116, -147.048541
Episodes 29125: 87, -140.181902
Episodes 29126: 163, -239.334263
Episodes 29127: 79, -134.087640
Episodes 29128: 87, -143.696483
Episodes 29129: 52, -57.820330
Episodes 29130: 138, -145.859647
Episodes 29131: 244, -285.744016
Episodes 29132: 41, -47.632453
Episodes 29133: 56, -110.209197
Episodes 29134: 317, -357.575993
Episodes 29135: 353, -380.840344
Episodes 29136: 246, -278.514386
Episodes 29137: 100, -172.252488
Episodes 29138: 476, -515.261099
Episodes 29139: 274, -342.670195
Episodes 29140: 89, -154.757309
Episodes 29141: 102, -182.129436
Episodes 29142: 1646, -1698.989758
Episodes 29143: 1116, -1143.391348
Episodes 29144: 792, -845.076645
Episodes 29145: 924, -951.942184
Episodes 29146: 828, -868.843309
Episodes 29147: 356, -406.858048
Episodes 29148: 240, -260.518609
Episodes 29149: 83, -100.535152
Episodes 29150: 60, -85.593015
Episodes 29151: 55, -102.150341
Episodes 29152: 57, -62.662194
Episodes 29153: 105, -129.756835
Episodes 29154: 57, -98.186359
Episodes 29155: 46, -74.074748
Episodes 29156: 171, -238.600049
Episodes 29157: 210, -249.549994
Episodes 29158: 80, -145.171590
Episodes 29159: 56, -95.148255
Episodes 29160: 83, -127.316200
Episodes 29161: 74, -184.301768
Episodes 29162: 52, -121.180119
Episodes 29163: 92, -113.140587
Episodes 29164: 40, -45.957784
Episodes 29165: 127, -147.280709
Episodes 29166: 157, -193.841512
Episodes 29167: 159, -188.190773
Episodes 29168: 301, -338.664405
Episodes 29169: 65, -150.858337
Episodes 29170: 402, -452.067564
Episodes 29171: 44, -101.912887
Episodes 29172: 65, -81.679612
Episodes 29173: 135, -167.505204
Episodes 29174: 147, -213.386848
Episodes 29175: 98, -228.166775
Episodes 29176: 64, -129.673298
Episodes 29177: 41, -85.009894
Episodes 29178: 39, -76.114247
Episodes 29179: 38, -65.666042
Episodes 29180: 70, -128.941685
Episodes 29181: 166, -322.085175
Episodes 29182: 64, -107.393295
Episodes 29183: 54, -88.495207
Episodes 29184: 153, -234.070123
Episodes 29185: 37, -57.774455
Episodes 29186: 135, -228.806690
Episodes 29187: 38, -44.402223
Episodes 29188: 775, -3639.323611
Episodes 29189: 1331, -11395.605650
Episodes 29190: 37, -41.543704
Episodes 29191: 93, -130.825559
Episodes 29192: 50, -57.294886
Episodes 29193: 43, -46.603071
Episodes 29194: 44, -48.708545
Episodes 29195: 83, -114.571961
Episodes 29196: 1186, -8455.643462
Episodes 29197: 35, -54.776266
Episodes 29198: 175, -292.019967
Episodes 29199: 48, -90.252699
Episodes 29200: 101, -104.736710
Episodes 29201: 37, -65.203110
Episodes 29202: 84, -88.040081
Episodes 29203: 128, -131.577459
Episodes 29204: 51, -53.849120
Episodes 29205: 64, -89.699576
Episodes 29206: 152, -303.458464
Episodes 29207: 50, -95.862733
Episodes 29208: 41, -45.448978
Episodes 29209: 39, -56.989206
Episodes 29210: 45, -74.366947
Episodes 29211: 84, -142.853811
Episodes 29212: 43, -47.836418
Episodes 29213: 40, -63.451722
Episodes 29214: 38, -42.378296
Episodes 29215: 53, -56.679479
Episodes 29216: 38, -61.816397
Episodes 29217: 85, -89.007109
Episodes 29218: 49, -82.426780
Episodes 29219: 45, -71.530368
Episodes 29220: 35, -65.438954
Episodes 29221: 51, -73.112579
Episodes 29222: 40, -55.906790
Episodes 29223: 39, -58.428326
Episodes 29224: 424, -1082.156444
Episodes 29225: 37, -54.646532
Episodes 29226: 41, -64.005652
Episodes 29227: 117, -120.291723
Episodes 29228: 1084, -5426.609170
Episodes 29229: 145, -148.702575
Episodes 29230: 38, -50.749924
Episodes 29231: 608, -6334.831548
Episodes 29232: 41, -44.099021
Episodes 29233: 577, -1237.240096
Episodes 29234: 5000, -5016.534455
Episodes 29235: 37, -107.817493
Episodes 29236: 1923, -1931.040553
Episodes 29237: 5000, -5645.784331
Episodes 29238: 5000, -5328.357479
Episodes 29239: 5000, -5478.798385
Episodes 29240: 5000, -5861.022496
Episodes 29241: 5000, -5857.057924
Episodes 29242: 5000, -6589.932192
Episodes 29243: 5000, -7339.777974
Episodes 29244: 5000, -7988.745483
Episodes 29245: 5000, -8401.377098
Episodes 29246: 5000, -7981.067250
Episodes 29247: 5000, -7335.949505



Is this normal?
What could be the reason?

Thanks!
Sergio


Saminda Abeyruwan

Aug 25, 2013, 7:40:26 PM
to github...@googlegroups.com
Hi Sergio,

I ran OffPACPuddleWorld for a long time and reproduced the same problem around:




Sergio Donal

Aug 25, 2013, 8:24:34 PM
to github...@googlegroups.com
Hi Sam,

Thanks for your response!

I am running the "Demos" plugin in a fresh installation of Zephyr as a standalone application, and the problem is consistent.

I have also tried with a different seed:
private final Random random = new Random(13214);

but it follows the same pattern (although it takes longer to diverge):

Episodes 50556: 34, -53.009268
Episodes 50557: 37, -50.084672
Episodes 50558: 37, -45.664548
Episodes 50559: 35, -47.383328
...

Episodes 50622: 388, -444.812407
Episodes 50623: 284, -383.584361
Episodes 50624: 106, -197.427692
Episodes 50625: 5000, -5072.025183
Episodes 50626: 5000, -5045.348153
Episodes 50627: 5000, -5037.837733
Episodes 50628: 5000, -5027.581984
Episodes 50629: 5000, -5029.631330
Episodes 50630: 5000, -5128.543086
Episodes 50631: 5000, -5183.745050


Could it be related to the 48-bit seed used by the default setSeed method of the java.util.Random generator?

Thanks!
Sergio

Thomas Degris

Aug 26, 2013, 5:32:58 AM
to github...@googlegroups.com
Hi Sergio,

you have been busy this past weekend! Let me answer your questions in order:

1. Divergence after many steps

Divergence is probably due to the value of the step-sizes. Finding good values for the step-size parameters is difficult: one would like them to be as high as possible (so that learning is as fast as possible), but not so high that the algorithm diverges. In the Off-PAC paper, I used a parameter sweep to determine the best step-sizes on the first 5000 episodes, but there is no guarantee that such values are still good for longer runs.

While not fully satisfying, you could decrease the step-size over time to slow down learning and prevent divergence. See the discussion about setting the step-size in the Off-PAC paper.
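
For instance, here is a minimal sketch of the kind of decaying schedule I mean; the class name, the initial value alpha0, and the time constant tau are placeholders I made up, not anything taken from RLPark:

// Minimal sketch of a decaying step-size schedule (hypothetical values).
// alpha_t = alpha0 * tau / (tau + t): starts at alpha0 and slowly decays towards 0.
public class DecayingStepSize {
  private final double alpha0; // initial step-size
  private final double tau;    // decay time constant, in learning updates
  private long t = 0;

  public DecayingStepSize(double alpha0, double tau) {
    this.alpha0 = alpha0;
    this.tau = tau;
  }

  // Call once per learning update and use the returned value as the step-size.
  public double next() {
    double alpha = alpha0 * tau / (tau + t);
    t++;
    return alpha;
  }
}

With tau large enough (say, several million steps), the step-size stays close to alpha0 during the first few thousand episodes and only then starts to shrink, which keeps early learning fast while damping the late oscillations.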

Finally, I would like to mention that step-size adaptation is an active topic of research (see for instance "Adaptive Step-Size for Online Temporal Difference Learning" by William Dabney and Andrew G. Barto, AAAI 2012). I have been working on this topic for some time now and I am optimistic that I will be able to propose a nice solution very soon.

2. Discouraged access: The type OffPolicyAgentEvaluable is not accessible due to restriction on required library.
 
This warning means you are using an interface or class internal to RLPark. There is no problem with that. It just means that I am not satisfied with the current design/implementation of that class/package. So, it is likely to change in the future.

3. !MESSAGE Exception launching the Eclipse Platform:
!STACK
java.lang.ClassNotFoundException: org.eclipse.core.runtime.adaptor.EclipseStarter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)

That means that Zephyr is trying to use a class implemented in a plugin that has not been loaded. To fix that, you can either go to the Plug-ins tab of your Run Configuration and select the missing plugin, or click on the "Add Required Plug-ins" button, which uses the Dependencies declared in META-INF/MANIFEST.MF to select missing plugins.
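
For reference, those dependencies are listed under Require-Bundle in META-INF/MANIFEST.MF. The entry looks roughly like the lines below; the bundle names are only examples, not the exact list the demo needs:

Require-Bundle: org.eclipse.core.runtime,
 zephyr.plugin.core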

4. Exception in thread "ZephyrRunnable-0" java.lang.UnsupportedClassVersionError: my/example/MyExample01 : Unsupported major.minor version 51.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:615)

That message usually means that a jar compiled with one version of Java was loaded by an older version: major version 51.0 corresponds to Java 7, so here the class was compiled for 1.7 but loaded by 1.6.
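
If you are not sure which version a given class file was compiled for, a small stand-alone check like the one below works; it simply reads the class-file header (major version 50 means Java 6, 51 means Java 7) and is not specific to RLPark or Zephyr:

import java.io.DataInputStream;
import java.io.FileInputStream;

public class ClassVersion {
  public static void main(String[] args) throws Exception {
    DataInputStream in = new DataInputStream(new FileInputStream(args[0]));
    if (in.readInt() != 0xCAFEBABE)
      System.err.println("Not a class file: " + args[0]);
    int minor = in.readUnsignedShort();
    int major = in.readUnsignedShort(); // 50 = Java 6, 51 = Java 7
    in.close();
    System.out.println(args[0] + ": major.minor = " + major + "." + minor);
  }
}

If it prints 51 (or higher) while the JVM running Zephyr is Java 6, that is the mismatch.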

Best regards,

Thomas 

Sergio Donal

Aug 27, 2013, 7:49:29 PM
to github...@googlegroups.com
Hi Thomas,

Thank you very much for your explanations.

Please, let me ask a new bundle of questions:


1) Reward function.

I am trying to reproduce the reward function plot in Figure 1 of the Off-PAC paper, but I think I am not understanding it well.

I am using the following Matlab script:

numSamples = 1e3;

px = 0:.01:1;
py = 0:.01:1;
reward = zeros(length(px), length(py));

for i=1:length(px)
    disp(px(i))
    for j=1:length(py)
        mu1x = .3; % mean
        sd1x = .1; % std deviation
        n1x = px(i) - mu1x + sd1x*randn(numSamples,1);

        mu1y = .6;
        sd1y = .03;
        n1y = py(j) - mu1y + sd1y*randn(numSamples,1);

        mu2x = .4;
        sd2x = .03;
        n2x = px(i) - mu2x + sd2x*randn(numSamples,1);

        mu2y = .5;
        sd2y = .1;
        n2y = py(j) - mu2y + sd2y*randn(numSamples,1);

        mu3x = .8;
        sd3x = .03;
        n3x = px(i) - mu3x + sd3x*randn(numSamples,1);

        mu3y = .9;
        sd3y = .1;
        n3y = py(j) - mu3y + sd3y*randn(numSamples,1);

        reward(i,j) = mean(-1 + -2.*(n1x.*n1y + n2x.*n2y + n3x.*n3y));
    end
end

surf(px, py, reward)

which yields the following saddle point reward function.



What am I doing wrong?

Thanks!
Sergio

Sergio Donal

Aug 27, 2013, 7:52:19 PM
to github...@googlegroups.com
This is the saddle-point shape of the reward function I get following the script above:



Thanks!
Sergio

Thomas Degris

Aug 28, 2013, 5:08:27 AM
to github...@googlegroups.com
The reward function used in the Off-PAC paper is implemented in the method:
static private PuddleWorld createEnvironment(Random random)
of the PuddleWorld-OffPAC demo.
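
If you want to compare it with your Matlab surface, one simple option is to sample the reward on a grid and dump it to CSV for surf. The sketch below is only illustrative: rewardAt is a hypothetical placeholder, not an RLPark method, and you would replace it with a call into the environment returned by createEnvironment.

import java.io.FileWriter;
import java.io.IOException;

public class RewardGrid {
  // Hypothetical placeholder: replace with the reward actually computed by
  // the environment returned by createEnvironment(Random random).
  static double rewardAt(double x, double y) {
    return -1.0; // dummy value
  }

  public static void main(String[] args) throws IOException {
    FileWriter out = new FileWriter("reward.csv");
    for (double x = 0.0; x <= 1.0; x += 0.01) {
      StringBuilder row = new StringBuilder();
      for (double y = 0.0; y <= 1.0; y += 0.01) {
        if (row.length() > 0)
          row.append(',');
        row.append(rewardAt(x, y));
      }
      row.append('\n');
      out.write(row.toString());
    }
    out.close();
  }
}

Loading reward.csv with csvread and plotting it with surf should then show the surface actually used by the demo.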

Thomas