Hi Everyone,
This is my first post in this group, so please go easy on me if it's a basic question.
I am trying to build a network that reproduces the work in this paper:
GazeFollow. I have attached the prototxt for this network. The network has three input branches:
1) The first takes the raw full image, which I provide with an ImageData layer. This layer produces two tops: "data" and "label".
2) The second takes the cropped head image, also provided through an ImageData layer. Here the tops are "face" and "label1".
3) The third takes the position of the eyes in the original image, quantized on a 13 x 13 grid, so the value is between 0 and 168. This is provided with a Data layer backed by an lmdb. The tops are "eyes_grid" and "label2". A minimal sketch of how the three branches are wired is below.
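For reference, here is a rough sketch of how the three input layers are declared; the source paths and the batch size are placeholders, not my actual values:

layer {
  name: "data"
  type: "ImageData"
  top: "data"          # full original image
  top: "label"
  image_data_param { source: "train_images.txt" batch_size: 32 }
}
layer {
  name: "face"
  type: "ImageData"
  top: "face"          # cropped head image
  top: "label1"
  image_data_param { source: "train_heads.txt" batch_size: 32 }
}
layer {
  name: "eyes_grid"
  type: "Data"
  top: "eyes_grid"     # eye position index on the 13 x 13 grid (0 to 168)
  top: "label2"
  data_param { source: "eyes_grid_lmdb" batch_size: 32 backend: LMDB }
}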
Since the network is trained end to end (there is no separate training per branch), all three labels are the same: "label", "label1", and "label2" are identical.
My final softmax (loss) layer takes only "label" as its label bottom. "label1" and "label2" are not used anywhere else, so I route them into a Silence layer.
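Concretely, the tail of the network looks roughly like this (the bottom name "fc_final" is just a stand-in for my last fully connected layer):

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc_final"     # stand-in name for my last fully connected layer
  bottom: "label"        # only this label is consumed by the loss
}
layer {
  name: "silence_labels"
  type: "Silence"        # absorbs the duplicate labels so they are not left dangling
  bottom: "label1"
  bottom: "label2"
}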
After training, the accuracy is really bad. I inspected the weights of some of the convolutional layers, and it looks like some branches are barely being trained. So I want to rule out the label wiring as the cause.
Is it OK to leave "label1" and "label2" feeding only into a Silence layer? If not, what should I do instead, given that an ImageData layer always produces two tops?
Also, I want to make sure the data across the three inputs always stays in sync from batch to batch, since I am not explicitly setting any shuffling options. The snippet below shows the parameters I believe are relevant.
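If I understand the layer correctly, with shuffle left at its default of false each list file is read top to bottom, so as long as the batch sizes match and the list files and lmdb were written in the same order, row i of every source should land in the same batch position. This is the kind of configuration I have in mind for each ImageData branch (the path is again a placeholder):

layer {
  name: "face"
  type: "ImageData"
  top: "face"
  top: "label1"
  image_data_param {
    source: "train_heads.txt"   # placeholder; rows in the same order as the other list and the lmdb keys
    batch_size: 32              # identical across all three input layers
    shuffle: false              # the default; lines are then read sequentially
  }
}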
Thank you. This is for a project with a tight deadline, so any reply would be much appreciated.
Best regards,
Sreejith