value network?


SeongHyeok Kang

Nov 15, 2016, 4:40:05 AM
to Oakfoam
Hi.

I want to understand the value network in Oakfoam. Is it the same as the value network in AlphaGo, or is it different?

Detlef Schmicker

Nov 15, 2016, 4:50:21 AM
to oak...@googlegroups.com
It is the same, but I put it into the same network as the policy.

One call on a position gives the results of both the policy and the value network.

The idea was that part of the policy network might be helpful for the value as well, but it turned out not to help much for high playing strength :(

Detlef
--
You received this message because you are subscribed to the Google Groups "Oakfoam" group.
To unsubscribe from this group and stop receiving emails from it, send an email to oakfoam+u...@googlegroups.com.
To post to this group, send email to oak...@googlegroups.com.
Visit this group at https://groups.google.com/group/oakfoam.
For more options, visit https://groups.google.com/d/optout.

SeongHyeok Kang

Nov 15, 2016, 6:26:49 AM
to Oakfoam
So the policy network includes the value network?

SeongHyeok Kang

Nov 15, 2016, 6:31:03 AM
to Oakfoam
Or does Oakfoam have the same value and policy networks as AlphaGo?

Detlef Schmicker

Nov 15, 2016, 6:54:52 AM
to oak...@googlegroups.com
Structurally they are the same (or at least you can use policy and value networks that are the same as AlphaGo's, but you can also use a value network that depends on layers of the policy network. It all depends on your Caffe network definitions, which are independent of Oakfoam using them).

Technically both networks are called at once in Oakfoam, so every time you call the policy network the value network is "called" too! It is realized as a network with two output layers, one for policy and one for value. At the moment I use a network which has two independent sub-nets, so conceptually it is the same as AlphaGo...
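As a rough illustration, a shared-input net with two output heads could look like this in Caffe prototxt (layer names, shapes, and the bottom blobs are made up for illustration; this is not Oakfoam's actual network definition):

```
# Shared input "data"; two heads end in separate output layers.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 64 kernel_size: 3 pad: 1 }
}
# ... further layers, possibly two independent sub-nets ...
layer {
  name: "policy_out"        # move-probability head
  type: "Softmax"
  bottom: "policy_conv"
  top: "policy_out"
}
layer {
  name: "value_out"         # winrate head
  type: "Sigmoid"
  bottom: "value_ip"
  top: "value_out"
}
```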

Detlef


SeongHyeok Kang

Nov 15, 2016, 8:41:55 AM
to Oakfoam
Hmm... Oakfoam calls just one network, but the network contains both a policy net and a value net.
So Oakfoam uses both the policy net and the value net, right?

Detlef Schmicker

Nov 15, 2016, 8:53:09 AM
to oak...@googlegroups.com
Yes, and it supports using only the policy network as well... When loading the net you tell Oakfoam whether it has both or only the policy net...

Detlef


SeongHyeok Kang

Nov 15, 2016, 9:02:00 AM
to Oakfoam
Thanks a lot.

Can I use only the policy network?

Is this the parameter?
param cnn_value_lambda 0.25 

Detlef Schmicker

Nov 15, 2016, 9:20:08 AM
to oak...@googlegroups.com
Yes, it must be 0 to use only the policy net.

Let me know if you need a pre-trained net...
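If I read the posts correctly, cnn_value_lambda linearly blends the playout winrate with the value-net output, so 0 means "playouts only" and 1 means "value net only". This is my reading, not a quote from Oakfoam's source; the function and variable names below are made up:

```python
def blended_winrate(playout_winrate, value_net_winrate, cnn_value_lambda):
    """Hypothetical blend controlled by cnn_value_lambda:
    0.0 -> playouts only (policy-net-only mode), 1.0 -> value net only."""
    return (1.0 - cnn_value_lambda) * playout_winrate \
        + cnn_value_lambda * value_net_winrate

print(blended_winrate(0.6, 0.4, 0.0))  # 0.6: value net ignored
print(blended_winrate(0.6, 0.4, 1.0))  # 0.4: playouts ignored
```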

Detlef

SeongHyeok Kang

Nov 15, 2016, 9:24:27 AM
to Oakfoam
OK, I'll try it.

By the way, what is the prediction network?

Detlef Schmicker

Nov 15, 2016, 9:57:07 AM
to oak...@googlegroups.com
Policy; sorry, I used the word "prediction net" before the AlphaGo paper introduced the name "policy net" :)

Detlef


SeongHyeok Kang

Nov 15, 2016, 10:15:41 AM
to Oakfoam
OK. Anyway,

param cnn_value_lambda 1.0

Is setting this parameter all it takes to use only the value network?
Anything else?

Detlef Schmicker

Nov 15, 2016, 11:44:10 AM
to oak...@googlegroups.com
In principle yes, but Oakfoam does playouts in the time needed to get the value CNN result, and as you will usually have set param expand_after (or a similar name, I don't remember it at the moment),

you will see the effect that, e.g., 20 playouts are done at a node, and then, once the value becomes available, the result is replaced by the value net. All nodes not yet having 20 playouts will use playouts...
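The behavior described above can be sketched roughly like this (a simplified toy model of my reading of the post, not Oakfoam's actual code; class and attribute names are invented):

```python
class Node:
    """Toy tree node: accumulates playout results until the value-net
    evaluation arrives, which then replaces the playout-based winrate."""

    def __init__(self, expand_after=20):
        self.expand_after = expand_after
        self.playout_wins = 0
        self.playouts = 0
        self.value_net_result = None  # set once the CNN evaluation arrives

    def add_playout(self, win):
        self.playouts += 1
        self.playout_wins += win

    def winrate(self):
        # Nodes with a value-net result use it; others fall back to playouts.
        if self.value_net_result is not None:
            return self.value_net_result
        if self.playouts == 0:
            return 0.5  # no information yet
        return self.playout_wins / self.playouts

n = Node()
for w in [1, 0, 1, 1]:
    n.add_playout(w)
print(n.winrate())        # 0.75, from playouts
n.value_net_result = 0.6  # value net becomes available
print(n.winrate())        # 0.6, playout result replaced
```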

Complicated, I know...

Detlef

SeongHyeok Kang

Nov 20, 2016, 9:04:28 AM
to Oakfoam
Hi Detlef.
I want to use only the value net, so I changed the lambda to 1.

But in "lenet_train_test.prototxt", do I have to add some layers, such as "TanH"?
I read about the hyperbolic tangent on the Caffe site, and there is a sample there:
layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "TanH"
}

Do I have to add this layer, or what should I do...?

And I changed "lenet_train_test.prototxt" to "lenet_train_test_value_original.prototxt"!
Could you read my file...?

Thank you:)
lenet_train_test_value_original.prototxt

Detlef Schmicker

Nov 20, 2016, 9:12:18 AM
to Oakfoam
This is OK.

I use the SIGMOID function, which is similar to TanH but gives a result between 0 and 1, so it can be interpreted directly as a winrate...
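For reference, the Sigmoid variant looks just like the TanH sample above with only the type changed (layer and blob names here are again just placeholders):

```
layer {
  name: "value_out"
  bottom: "value_ip"
  top: "value_out"
  type: "Sigmoid"
}
```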

SeongHyeok Kang

Nov 27, 2016, 6:23:58 AM
to Oakfoam
Thanks as always.

Anyway, can Oakfoam do reinforcement learning? Is it impossible, or does it need something very difficult?

Detlef Schmicker

Nov 27, 2016, 6:53:19 AM
to oak...@googlegroups.com
Hmm, at the moment I am trying to train with only the moves of the winners of the pro games.

I will see how close this comes to reinforcement.

Reinforcement is not difficult, but I do not have the computational resources to do it.

You need some millions of self-play games the way AlphaGo did it; just trying it would take half a year for me...

Detlef

SeongHyeok Kang

Nov 27, 2016, 8:49:29 AM
to Oakfoam
I was wondering if I could get your data on reinforcement learning?

SeongHyeok Kang

Nov 27, 2016, 9:02:29 AM
to Oakfoam
I hope to do reinforcement learning. I would like to try reinforcement learning in Oakfoam now; can you tell me how?

And I want to see the source code.

Detlef Schmicker

Nov 27, 2016, 10:46:46 AM
to oak...@googlegroups.com
Reinforcement learning is not implemented yet, but it should not be too difficult.

But as I won't do it in the next few weeks, you will have to do it yourself :(

I did not read the AlphaGo paper very carefully, but my impression was that they did not document all the details; e.g. I did not find anything about learning rates...

My current source is at

Detlef
