I don't believe Caffe supports SLI mode. The two GPUs are treated as separate cards.
When you run Caffe from the command line with the '-gpu' flag, you can specify which GPU to use (-gpu 0 or -gpu 1, for example). You can also list multiple GPUs (-gpu 0,1,3) or use every available GPU (-gpu all).
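For example, assuming your solver definition is in a file named solver.prototxt (the file name here is just illustrative), the commands look something like this:

    caffe train -solver solver.prototxt -gpu 0      # train on the first GPU only
    caffe train -solver solver.prototxt -gpu 0,1    # train on GPUs 0 and 1
    caffe train -solver solver.prototxt -gpu all    # train on every GPU in the machine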
When you train with multiple GPUs, Caffe runs a copy of the model on each GPU, each processing its own batch, and then merges the gradient updates between the copies. This effectively doubles (or more, if you have more than 2 GPUs) the batch size for each iteration.
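As a concrete illustration (the batch size of 64 is just an example number, not something from your setup):

    effective examples per iteration = batch_size in the train prototxt x number of GPUs
                                     = 64 x 2
                                     = 128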
In my case, I started with an NVIDIA GTX 970 (4 GB) and then upgraded to an NVIDIA GTX Titan X (the Maxwell version, with 12 GB) because my models were too large to fit on the GTX 970. I can run some of the smaller models across both cards (even though they are not identical) as long as the model fully fits into the 4 GB of the smaller card. Using the standard ImageNet model, I could train across both cards and cut my training time in half.
If I recall correctly, other frameworks (TensorFlow, and maybe Microsoft's CNTK) support splitting a model across different devices or nodes to effectively increase the available GPU memory, which is what you are describing. Although I haven't personally tried either one, I understand you can specify on a per-layer basis which device each layer runs on (see the sketch below).
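For what it's worth, here is a rough, untested sketch of what I understand that looks like in TensorFlow; the layer sizes and variable names are made up purely for illustration. The idea is to split a two-layer network so that each GPU holds one layer's weights:

    import tensorflow as tf

    # Hypothetical model-parallel split: each GPU holds the weights
    # for one layer, so neither card has to store the whole model.
    inputs = tf.placeholder(tf.float32, shape=[None, 9216])

    with tf.device('/gpu:0'):
        w1 = tf.Variable(tf.random_normal([9216, 4096]))
        hidden = tf.nn.relu(tf.matmul(inputs, w1))

    with tf.device('/gpu:1'):
        w2 = tf.Variable(tf.random_normal([4096, 1000]))
        logits = tf.matmul(hidden, w2)

The activations still have to be copied between the cards on every iteration, so there is some transfer overhead, but the weights themselves are divided between the GPUs.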
Patrick