Thinking this through, I arrived at 5 brain regions :
1. region : learns to map state-action pairs to a "successor state" (i.e. it learns the underlying physics)
2. region : learns to map a state to a reward
3. region : the agent, which learns to map a state-reward pair to an action
4. region : learns to map a state to an action
Each region can be run in a separate thread.
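To make the four mappings concrete, here is a minimal numpy sketch. The sizes, the linear form of each region and all names are my own assumptions, just to pin down the input/output signatures:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2   # hypothetical sizes

# Region 1: (state, action) -> successor state (the "physics" model)
W1 = rng.normal(size=(STATE_DIM + ACTION_DIM, STATE_DIM))
def region1(state, action):
    return np.concatenate([state, action]) @ W1

# Region 2: state -> reward
w2 = rng.normal(size=STATE_DIM)
def region2(state):
    return float(state @ w2)

# Region 3: the agent, (state, reward) -> action
W3 = rng.normal(size=(STATE_DIM + 1, ACTION_DIM))
def region3(state, reward):
    return np.concatenate([state, [reward]]) @ W3

# Region 4: state -> action (the policy we ask for behaviour)
W4 = rng.normal(size=(STATE_DIM, ACTION_DIM))
def region4(state):
    return state @ W4

s = rng.normal(size=STATE_DIM)
a = region4(s)            # ask region 4 for behaviour
s_next = region1(s, a)    # region 1 predicts the successor state
r = region2(s_next)       # region 2 predicts its reward
```

Since the regions only exchange fixed-size vectors, each one could indeed live in its own thread and communicate through queues.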
If we want to reuse the brain for multiple domains,
we need an autoencoder (which I do not understand completely)
5. region : if we have 3 domains with n, m and k elements in the state vector:
for domain 1 : use an autoencoder which encodes an n-element vector into a min(m,n,k)-element vector
for domain 2 : use an autoencoder which encodes an m-element vector into a min(m,n,k)-element vector
for domain 3 : use an autoencoder which encodes a k-element vector into a min(m,n,k)-element vector
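The "one encoder per domain, shared code size" idea can be sketched like this (linear, untrained encoders; the sizes n, m, k and the dispatch-by-vector-length trick are my own assumptions):

```python
import numpy as np

n, m, k = 6, 4, 8             # hypothetical state sizes for the 3 domains
latent = min(m, n, k)         # shared code size, here min(m,n,k) = 4

rng = np.random.default_rng(1)
# One (untrained) linear encoder per domain, all mapping into the same latent space
encoders = {dim: rng.normal(size=(dim, latent)) for dim in (n, m, k)}

def encode(state):
    # Pick the encoder that matches the state's domain by its length
    return state @ encoders[len(state)]

for dim in (n, m, k):
    code = encode(rng.normal(size=dim))
    assert code.shape == (latent,)   # every domain lands in the same 4-d space
```

After these encoders, regions 1-4 only ever see min(m,n,k)-element vectors, so the same brain can be reused across the three domains.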
Input and output of regions 1 and 2 are connected to the real world.
The input of region 3 is connected to the outputs of region 1 and 2.
The input of region 4 is connected to the outputs of region 1 and 3.
When we want behaviour, we simply query region 4, during or after training.
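The wiring described above can be written down with tiny stubs (the stub functions are placeholders of my own, only there to make explicit which output feeds which input):

```python
# Stub regions, just to show the wiring (not real learners)
region1 = lambda s, a: [x + y for x, y in zip(s, a)]  # world model: (s, a) -> s'
region2 = lambda s: sum(s)                            # reward model: s -> r
region3 = lambda s, r: [r - x for x in s]             # agent: (s, r) -> a
region4 = lambda s: [-x for x in s]                   # policy: s -> a

s = [1.0, 2.0]
a = region4(s)                              # behaviour query goes only to region 4
s_next = region1(s, a)                      # region 1's output feeds...
agent_a = region3(s_next, region2(s_next))  # ...region 3, together with region 2
# Region 4's training input comes from regions 1 and 3:
training_pair = (s_next, agent_a)
```

Read this way, region 4 is a distilled copy of region 3: it learns from (state, action) pairs produced by regions 1 and 3, but at query time it answers from the state alone.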
I'm not sure how to connect region 5. Maybe between 1 and 3 ?
But if so, how do we train region 5 ?
For training region 5, we would need input-output pairs to present to it, but we only know the input.
You people here use autoencoders. How do they work ? Where do we get the output portion, which must be presented
to region 5, which needs both input and output for training ?
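If I understand autoencoders correctly, the answer is that the training target is the input itself: the net is trained on pairs (x, x), and the narrow middle layer forces it to compress. A minimal linear sketch, where the sizes and the plain gradient-descent loop are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, latent = 8, 3                        # hypothetical state size and code size
X = rng.normal(size=(200, n))           # unlabeled states from the domain

# Linear autoencoder: encode n -> latent, decode latent -> n
W_enc = rng.normal(size=(n, latent)) * 0.1
W_dec = rng.normal(size=(latent, n)) * 0.1

initial_loss = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

lr = 0.01
for _ in range(500):
    code = X @ W_enc                    # bottleneck representation
    X_hat = code @ W_dec                # reconstruction
    err = X_hat - X                     # the target IS the input itself
    # gradient descent on the mean squared reconstruction error
    W_dec -= lr * code.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

loss = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

So no external labels are needed: once trained, the decoder is thrown away and only the encoder (region 5) is kept to feed the compressed state onward.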
Conclusion : 1 RL agent and 4 abstractors (i.e. neural nets or Fourier bases)
Agree ?