value network?


SeongHyeok Kang

Nov 15, 2016, 4:40:05 AM
to Oakfoam
Hi.

I want to understand the value network in Oakfoam. Is it the same as the value network in AlphaGo, or is it different?

Detlef Schmicker

Nov 15, 2016, 4:50:21 AM
to oak...@googlegroups.com
It is the same, but I put it into the same network as the policy.

One call on a position gives the results of both the policy and the value network.

The idea was that part of the policy network might be helpful for the value as well, but it turned out not to help much for high playing strength :(

Detlef
--
You received this message because you are subscribed to the Google Groups "Oakfoam" group.
To unsubscribe from this group and stop receiving emails from it, send an email to oakfoam+u...@googlegroups.com.
To post to this group, send email to oak...@googlegroups.com.
Visit this group at https://groups.google.com/group/oakfoam.
For more options, visit https://groups.google.com/d/optout.

SeongHyeok Kang

Nov 15, 2016, 6:26:49 AM
to Oakfoam
So the policy network includes the value network?

SeongHyeok Kang

Nov 15, 2016, 6:31:03 AM
to Oakfoam
Or does Oakfoam have the same value and policy networks as AlphaGo?

Detlef Schmicker

Nov 15, 2016, 6:54:52 AM
to oak...@googlegroups.com
Structurally they are the same (or at least you can use policy and value networks that are the same as AlphaGo's, but you can also use a value network that depends on layers of the policy network. It all depends on your Caffe network definitions, which are independent of Oakfoam using them).

Technically both networks are called at once in Oakfoam, so every time you call the policy network the value network is "called" too! It is realized as a network with two output layers, one for policy and one for value. At the moment I use a network which has two independent sub-nets, so conceptually it is the same as AlphaGo...
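As a rough illustration, a shared-input net with two output heads could look like this in Caffe prototxt (layer names, shapes, and the bottom blobs are made up for illustration; this is not Oakfoam's actual network definition):

```
# Shared input "data"; two heads end in separate output layers.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param { num_output: 64 kernel_size: 3 pad: 1 }
}
# ... further layers, possibly two independent sub-nets ...
layer {
  name: "policy_out"        # move-probability head
  type: "Softmax"
  bottom: "policy_conv"
  top: "policy_out"
}
layer {
  name: "value_out"         # winrate head
  type: "Sigmoid"
  bottom: "value_ip"
  top: "value_out"
}
```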

Detlef


SeongHyeok Kang

Nov 15, 2016, 8:41:55 AM
to Oakfoam
Hmm... Oakfoam calls just one network, but the network contains both a policy net and a value net.
So Oakfoam uses both the policy net and the value net, right?

Detlef Schmicker

Nov 15, 2016, 8:53:09 AM
to oak...@googlegroups.com
Yes, and it supports using only the policy network as well... When loading the net you tell Oakfoam whether it has both or only the policy net...

Detlef


SeongHyeok Kang

Nov 15, 2016, 9:02:00 AM
to Oakfoam
Thanks a lot.

Can I use only the policy network?

Is this the parameter?
param cnn_value_lambda 0.25 

Detlef Schmicker

Nov 15, 2016, 9:20:08 AM
to oak...@googlegroups.com
Yes, it must be 0 to use only the policy net.

Let me know if you need a pre-trained net...
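If I read the posts correctly, cnn_value_lambda linearly blends the playout winrate with the value-net output, so 0 means "playouts only" and 1 means "value net only". This is my reading, not a quote from Oakfoam's source; the function and variable names below are made up:

```python
def blended_winrate(playout_winrate, value_net_winrate, cnn_value_lambda):
    """Hypothetical blend controlled by cnn_value_lambda:
    0.0 -> playouts only (policy-net-only mode), 1.0 -> value net only."""
    return (1.0 - cnn_value_lambda) * playout_winrate \
        + cnn_value_lambda * value_net_winrate

print(blended_winrate(0.6, 0.4, 0.0))  # 0.6: value net ignored
print(blended_winrate(0.6, 0.4, 1.0))  # 0.4: playouts ignored
```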

Detlef

SeongHyeok Kang

Nov 15, 2016, 9:24:27 AM
to Oakfoam
OK, I'll try it.

By the way, what is the prediction network?

Detlef Schmicker

Nov 15, 2016, 9:57:07 AM
to oak...@googlegroups.com
Policy; sorry, I used the word "prediction net" before the AlphaGo paper introduced the name "policy net" :)

Detlef


SeongHyeok Kang

Nov 15, 2016, 10:15:41 AM
to Oakfoam
OK. Anyway,

param cnn_value_lambda 1.0

Is setting this parameter all it takes to use only the value network?
Anything else?

Detlef Schmicker

Nov 15, 2016, 11:44:10 AM
to oak...@googlegroups.com
In principle yes, but Oakfoam does playouts in the time needed to get the value CNN result, and as you will usually have set param expand_after (or a similar name, I don't remember it at the moment),

you will see the effect that, e.g., 20 playouts are done at a node, and then, once the value becomes available, the result is replaced by the value net. All nodes not yet having 20 playouts will use playouts...
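The behavior described above can be sketched roughly like this (a simplified toy model of my reading of the post, not Oakfoam's actual code; class and attribute names are invented):

```python
class Node:
    """Toy tree node: accumulates playout results until the value-net
    evaluation arrives, which then replaces the playout-based winrate."""

    def __init__(self, expand_after=20):
        self.expand_after = expand_after
        self.playout_wins = 0
        self.playouts = 0
        self.value_net_result = None  # set once the CNN evaluation arrives

    def add_playout(self, win):
        self.playouts += 1
        self.playout_wins += win

    def winrate(self):
        # Nodes with a value-net result use it; others fall back to playouts.
        if self.value_net_result is not None:
            return self.value_net_result
        if self.playouts == 0:
            return 0.5  # no information yet
        return self.playout_wins / self.playouts

n = Node()
for w in [1, 0, 1, 1]:
    n.add_playout(w)
print(n.winrate())        # 0.75, from playouts
n.value_net_result = 0.6  # value net becomes available
print(n.winrate())        # 0.6, playout result replaced
```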

Complicated, I know...

Detlef

SeongHyeok Kang

Nov 20, 2016, 9:04:28 AM
to Oakfoam
Hi Detlef.
I want to use only the value net, so I changed the lambda to 1.

But in "lenet_train_test.prototxt", do I have to add some layers, such as "TanH"?
I read about the hyperbolic tangent on the Caffe site, and there is a sample there:
layer {
  name: "layer"
  bottom: "in"
  top: "out"
  type: "TanH"
}

Do I have to add this layer, or what should I do...?

And I changed "lenet_train_test.prototxt" to "lenet_train_test_value_original.prototxt"!
Could you read my file...?

Thank you:)
lenet_train_test_value_original.prototxt

Detlef Schmicker

Nov 20, 2016, 9:12:18 AM
to Oakfoam
This is OK.

I use the SIGMOID function, which is similar to TanH but gives a result between 0 and 1, so it can be interpreted directly as a winrate...
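For reference, the Sigmoid variant looks just like the TanH sample above with only the type changed (layer and blob names here are again just placeholders):

```
layer {
  name: "value_out"
  bottom: "value_ip"
  top: "value_out"
  type: "Sigmoid"
}
```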

SeongHyeok Kang

Nov 27, 2016, 6:23:58 AM
to Oakfoam
Thanks as always.

Anyway, can Oakfoam do reinforcement learning? Is it impossible, or does it need something very difficult?

Detlef Schmicker

Nov 27, 2016, 6:53:19 AM
to oak...@googlegroups.com
Hmm, at the moment I am trying to train with only the moves of the winners of the pro games.

I will see how close this comes to reinforcement.

Reinforcement is not difficult, but I do not have the computational resources to do it.

You need some millions of self-play games the way AlphaGo did it; just trying it would take half a year for me...

Detlef

SeongHyeok Kang

Nov 27, 2016, 8:49:29 AM
to Oakfoam
I was wondering if I could get your data on reinforcement learning?

SeongHyeok Kang

Nov 27, 2016, 9:02:29 AM
to Oakfoam
I hope to do reinforcement learning. I would like to try reinforcement learning in Oakfoam now; can you tell me how?

And I want to see the source code.

Detlef Schmicker

Nov 27, 2016, 10:46:46 AM
to oak...@googlegroups.com
Reinforcement learning is not implemented yet, but it should not be too difficult.

But as I won't do it in the next few weeks, you will have to do it yourself :(

I did not read the AlphaGo paper very carefully, but my impression was that they did not document all the details; e.g. I did not find anything about learning rates...

My current source is at

Detlef
