The current behavior is implemented so that the dropout layer acts purely as a regularizer at training time and can simply be removed at test time, because undropped inputs are already scaled by 1/(1-p) during training.
If you instead left undropped inputs unchanged during training, you would have to scale the bottom by (1-p) at test time, and the dropout layer would have to be kept in the network, which is sort of inconvenient.
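For concreteness, here is a minimal NumPy sketch of the two conventions (function names are illustrative, not Caffe's API): inverted dropout, which matches the current behavior, and the plain variant that would force a test-time scaling.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(x, p):
    """Inverted dropout (current behavior): drop each input with
    probability p and scale the survivors by 1/(1-p), so the
    expected activation already matches a plain forward pass."""
    mask = rng.random(x.shape) >= p  # keep with probability 1-p
    return x * mask / (1.0 - p)

# At test time nothing is needed: the layer reduces to the
# identity and can simply be dropped from the net.

def dropout_train_plain(x, p):
    """Alternative: leave undropped inputs unchanged during training..."""
    mask = rng.random(x.shape) >= p
    return x * mask

def dropout_test_plain(x, p):
    """...which forces a test-time layer that scales the input
    by (1-p) to match the training-time expectation."""
    return x * (1.0 - p)
```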