That would be great, I'd love to see it added! I was hoping to reproduce the bottleneck results first and get some input from other users as to whether or not everything looks correct.
Your Readme says "I am using a smaller batch size because of hardware constraints, 32 instead of 128." This could make quite a difference, as observed at https://github.com/KaimingHe/resnet-1k-layers/#notes. How long does training take? Maybe I can run it on one of our GPUs with batch size 64 or 128.
I saw that as well. I'm able to run batches of 64 on my GTX 980; training takes about 11 hours, with no real difference in performance between batch size 64 and 32.
Looking at the Facebook code, they have a max-pooling layer after the first convolution in the bottleneck model. There is no mention of that max-pooling layer in 'Identity Mappings ...', and Kaiming He's 1k-layers code defines it at the beginning but never uses it. I'm wondering whether it is actually a necessary component.
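For what it's worth, the max pool does matter a lot for the feature-map sizes if you're training on ImageNet-scale inputs: with the standard 7x7/stride-2 stem conv, the 3x3/stride-2 max pool is what brings a 224x224 input down to 56x56 before the first residual stage. A quick sketch of that arithmetic (just the usual conv output-size formula, not anyone's actual code; the 224/7/3 numbers assume the standard ImageNet stem):

```python
def conv_out_size(n, kernel, stride, pad):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * pad - kernel) // stride + 1

# 224x224 input -> 7x7 conv, stride 2, pad 3
after_conv = conv_out_size(224, kernel=7, stride=2, pad=3)   # 112

# -> 3x3 max pool, stride 2, pad 1
after_pool = conv_out_size(after_conv, kernel=3, stride=2, pad=1)  # 56

print(after_conv, after_pool)
```

So dropping it (or never wiring it in, as in the 1k-layers code) means the residual stages run at twice the spatial resolution unless something else downsamples, which is fine for CIFAR-sized inputs but changes the compute budget substantially on ImageNet.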