Hi Dan,
I'm confused by the behavior of learning-rate-factor. I'm doing something like transfer learning with a pre-trained model. I want to freeze the weights of the pre-trained part, so I do something like:
steps/nnet3/xconfig_to_configs.py --existing-model $pretrain_mdl \
--xconfig-file $dir/configs/network.xconfig \
--config-dir $dir/configs/
$train_cmd $dir/log/generate_input_mdl.log \
nnet3-copy --edits="set-learning-rate-factor name=* learning-rate-factor=0" $pretrain_mdl - \| \
nnet3-init --srand=1 - $dir/configs/final.config $dir/input.raw || exit 1;
to generate input.raw as the initial model, and then I use steps/nnet3/train_raw_dnn.py to train it.
However, I found that the weights of the pre-trained components in final.raw are very different from those in input.raw. I also checked input.raw and 0.raw: their weights are identical to each other (and to the pre-trained model). I'm really confused about why the weights still change significantly even though I have set the learning rate factor to 0. Is there anything I missed?
An example of the weights:
0.raw:
<ComponentName> tdnn1.affine <NaturalGradientAffineComponent> <LearningRateFactor> 0 <MaxChange> 0.75 <LearningRate> 0.0012 <LinearParams> [
2.568433 0.7693889 -0.2381797 0.1525945 -2.246516 0.4466618 1.102006 -0.1432067 -0.5903091 0.8782812 0.1022643 0.08371142 -0.02404941 0.1363059 -0.2316905 -0.2228654 -0.1153994 0.04606071 -0.6212884 0.0971213 -0.4538765 -0.2163398 -0.1846409 0.5193074 1.637326 -0.5269247 0.2706615 -0.4929326 -0.8492494 0.2991662 -0.06685083 -0.484638 -0.07633615 0.04551812 -0.07778752 0.06136333 -0.04509452 0.1947115 -0.1971418 0.2260058 0.2135421 0.1612404 0.01028283 -0.1597385 0.00885525 -0.1910186 -2.443084 1.568953 -2.43982 0.1278329 3.425874 -0.5619374 -2.047221 1.148477 0.7843929 -0.8205558 -0.1938612 0.09076887 0.4777972 -0.6429375 0.2837325 0.4889021 -0.4847438 -0.2476536 0.6346303 0.02845256 -0.6005114 -0.1370015 -0.2807251 -6.516876 1.262585 -1.125957 -2.160191 3.790924 1.487685 -2.502697 -1.167388 2.776567 -0.1657089 -1.498531 -0.3129631 1.025429 -0.1579675 -0.5479382 0.5111938 0.2005295 -0.0936794 0.02654545 0.2907091 0.2999536 -0.07730728 -0.6610407 -7.990298 -1.967343 2.860101 1.211731 -1.953299 0.6001937 1.771805 -1.756328 -0.6340543 1.667708 -0.01905632 -0.6712452 -0.4922445 1.067875 0.1867559 -0.4521364 0.1057445 0.3543701 -0.09882455 0.2526414 0.7321079 0.4256617 -0.6819053
...
final.raw:
<ComponentName> tdnn1.affine <NaturalGradientAffineComponent> <LearningRateFactor> 0 <MaxChange> 0.75 <LearningRate> 0 <LinearParams> [
0.02261571 0.006774665 -0.002097233 0.001343634 -0.01978116 0.003932972 0.009703459 -0.001260972 -0.005197822 0.007733486 0.0009004634 0.0007371005 -0.0002117614 0.001200207 -0.002040094 -0.001962385 -0.001016122 0.0004055761 -0.005470607 0.0008551774 -0.003996501 -0.001904927 -0.001625811 0.004572635 0.01441706 -0.004639709 0.002383247 -0.0043404 -0.007477856 0.002634235 -0.0005886379 -0.004267363 -0.0006721598 0.0004007989 -0.0006849389 0.0005403198 -0.0003970687 0.001714486 -0.001735885 0.001990038 0.001880291 0.001419765 9.054293e-05 -0.00140654 7.797277e-05 -0.001681966 -0.02151195 0.01381505 -0.02148324 0.001125602 0.03016569 -0.004948 -0.01802629 0.01011263 0.006906783 -0.007225201 -0.001706998 0.0007992426 0.004207127 -0.00566123 0.002498338 0.004304907 -0.00426829 -0.002180652 0.00558808 0.0002505321 -0.005287655 -0.001206334 -0.002471855 -0.05738275 0.01111738 -0.009914332 -0.01902101 0.03338008 0.01309947 -0.02203688 -0.01027915 0.02444839 -0.001459108 -0.01319495 -0.002755718 0.009029168 -0.001390943 -0.004824736 0.004501191 0.001765712 -0.0008248709 0.0002337395 0.002559767 0.002641168 -0.0006807104 -0.005820629 -0.07035661 -0.01732296 0.02518389 0.0106696 -0.01719929 0.005284857 0.01560119 -0.01546491 -0.005583003 0.0146846 -0.0001677957 -0.005910477 -0.004334336 0.009402923 0.001644435 -0.00398118 0.0009311081 0.003120321 -0.0008701761 0.002224572 0.006446391 0.003748061 -0.006004354
...
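Incidentally, comparing the two dumps, the final.raw values look like the 0.raw values multiplied by a single constant (roughly 0.0088, by my own calculation), rather than being independently re-trained. A quick sanity check on the first few values copied from the dumps above:

```python
# Values copied from the tdnn1.affine dumps above (first six only).
before = [2.568433, 0.7693889, -0.2381797, 0.1525945, -2.246516, 0.4466618]
after = [0.02261571, 0.006774665, -0.002097233, 0.001343634, -0.01978116, 0.003932972]

# Element-wise ratio final/initial for each weight.
ratios = [a / b for a, b in zip(after, before)]
print(ratios)

# The ratios all agree closely, i.e. the parameters look uniformly
# rescaled by one constant, not updated by gradient steps.
assert max(ratios) - min(ratios) < 1e-4
```

So it seems the whole parameter matrix was rescaled by some stage of the training script, which is what I don't understand.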
I found another thread discussing this problem, but it was about a chain model, so I don't think it applies to my case.
Thank you so much.