Since you are using the defaults, your batch_size is 100, the maximum number of iterations is 4,000, and the CIFAR-10 dataset contains 60,000 images.
Using a single GPU, you process 100 images per iteration * 4,000 iterations = 400,000 images in 35 seconds (~11,428 images/sec).
Using two GPUs, you process 200 images per iteration * 4,000 iterations = 800,000 images in 42 seconds (~19,047 images/sec).
So in the two-GPU case you process twice as many images for only a 20% increase in time, and you are processing about 67% more images per second (roughly 1.67x the single-GPU throughput).
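For reference, here is that arithmetic as a small Python sketch. The batch sizes, iteration counts, and timings are just the numbers quoted above; the helper function is only for illustration.

```python
def throughput(batch_per_gpu, num_gpus, iterations, seconds):
    """Images processed per second for one training run."""
    total_images = batch_per_gpu * num_gpus * iterations
    return total_images / seconds

# Numbers from the runs discussed above.
one_gpu = throughput(batch_per_gpu=100, num_gpus=1, iterations=4000, seconds=35)
two_gpus = throughput(batch_per_gpu=100, num_gpus=2, iterations=4000, seconds=42)

print("1 GPU : %.1f images/sec" % one_gpu)     # 11428.6
print("2 GPUs: %.1f images/sec" % two_gpus)    # 19047.6
print("Speedup: %.2fx" % (two_gpus / one_gpu)) # 1.67x, i.e. ~67% more throughput
```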
If you want an apples-to-apples comparison, change the batch_size parameter to 50 and then train on both GPUs, so the same total number of images is processed (see the quick check below). It will likely not be a 2x improvement in performance, but it should show a lower overall training time.
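A quick sanity check that this is the same amount of work, assuming (as in the numbers above) that batch_size is per GPU, so the effective batch is batch_size * number of GPUs:

```python
# batch_size is per GPU, so the effective batch is batch_size * num_gpus.
single_gpu_images = 100 * 1 * 4000  # batch 100, 1 GPU, 4,000 iterations
dual_gpu_images = 50 * 2 * 4000     # batch 50, 2 GPUs, 4,000 iterations

assert single_gpu_images == dual_gpu_images == 400000
```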
I think part of the reason the results look odd is that there is some additional overhead required to set up each GPU, and the total training time is so short that this overhead is a large fraction of it. If you train for more iterations or with a larger network (GoogLeNet or AlexNet), you should see the speedup more dramatically.
Patrick