template <typename Dtype>
class XavierFiller : public Filler<Dtype> {
 public:
  explicit XavierFiller(const FillerParameter& param)
      : Filler<Dtype>(param) {}
  virtual void Fill(Blob<Dtype>* blob) {
    CHECK(blob->count());
    int fan_in = blob->count() / blob->num();
    Dtype scale = sqrt(Dtype(3) / fan_in);
    caffe_rng_uniform<Dtype>(blob->count(), -scale, scale,
        blob->mutable_cpu_data());
    CHECK_EQ(this->filler_param_.sparse(), -1)
        << "Sparsity not supported by this Filler.";
  }
};
All the hidden layers pass their weight blob as the parameter to XavierFiller::Fill(...), like this:
weight_filler->Fill(this->blobs_[0].get());
Are you sure this is right? I do not think this computes the fan-in of a unit, i.e. the number of connections feeding into it. For an inner product layer, the fan-in should be the width of the weight blob. For a convolutional layer, it should be the channel count of the bottom blob multiplied by the width and height of the weight blob. Do I have some misunderstanding? Could the Caffe authors answer my question? Thank you.