Hi Emmanuel,
Your solution would have been my first approach, too, and it is probably the only practicable one (without adjusting the code of the convolutional layer). As for the weight sharing: I am pretty sure that Caffe actually uses the same memory for shared blobs, so the shared weights do not take up additional space as duplicates.
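For reference, weight sharing in Caffe works by giving the `param` entries of two layers the same name; blobs with the same param name point to the same memory. A minimal sketch (layer and blob names here are made up for illustration):

```protobuf
layer {
  name: "conv1a"
  type: "Convolution"
  bottom: "data_a"
  top: "conv1a"
  param { name: "conv_shared_w" }  # same name => shared weight blob
  param { name: "conv_shared_b" }  # same name => shared bias blob
  convolution_param { num_output: 64 kernel_size: 3 }
}
layer {
  name: "conv1b"
  type: "Convolution"
  bottom: "data_b"
  top: "conv1b"
  param { name: "conv_shared_w" }
  param { name: "conv_shared_b" }
  convolution_param { num_output: 64 kernel_size: 3 }
}
```

Note that the shapes of the shared blobs must match, so both layers need identical `convolution_param` settings for the kernel.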
But if you are willing to get your hands dirty, you could look into the code and make the necessary adjustments to the convolutional layer itself. Maybe it is possible to support your scenario by adding a layer param option to the proto and making the corresponding changes in the code. Since all the bits and pieces are there, just not exactly in the form you want them, it should not be too hard to do. You could look at other places in the code to see how weight sharing and grouping are implemented.
Jan