Indeed, it's possible to mix different backends (even CUDA + CPU, or CUDA + OpenCL, for example).
How to configure it:
backend: either roundrobin or multiplexing; it's hard to guess which one will give higher nps. There is also demux, which you can try, but it's probably not a good idea in asymmetric configurations, where it will be slower. Still, it doesn't hurt to try. Also, for some configurations (mainly demux, although it may help with others too), it may make sense to increase minibatch-size from 256 to, say, 512.
Also, the optimal number of threads may be 2, 3 or 4, depending on the hardware configuration and backend (roundrobin, multiplexing or demux).
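Since the best parent backend and thread count are hard to predict, the practical answer is to measure nps for each combination. A minimal sketch, assuming you run lc0 from a shell and that your build supports the standard --backend, --backend-opts and --threads flags (check lc0 --help):

```shell
# Sweep the three parent backends and a few thread counts,
# keeping the same per-GPU child backends each time;
# "lc0 benchmark" prints the nps reached in each run.
for b in roundrobin multiplexing demux; do
  for t in 2 3 4; do
    echo "=== backend=$b threads=$t ==="
    ./lc0 benchmark \
      --backend="$b" \
      --backend-opts="(gpu=0,backend=cudnn),(gpu=1,backend=cudnn-fp16)" \
      --threads="$t"
  done
done
```

Pick whichever combination reports the highest nps on your machine.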
As for backend-opts, the idea is the same in all cases: the 1080ti has no fast fp16 support, so it should run the plain cudnn backend, while the 2080ti is faster with cudnn-fp16. If GPU0 is the 1080ti and GPU1 is the 2080ti:
(gpu=0,backend=cudnn),(gpu=1,backend=cudnn-fp16)
or, if your GPU0 is the 2080ti and GPU1 is the 1080ti:
(gpu=1,backend=cudnn),(gpu=0,backend=cudnn-fp16)
For two 2080ti's it will be:
(gpu=0,backend=cudnn-fp16),(gpu=1,backend=cudnn-fp16)
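Put together, a full invocation for the mixed 1080ti + 2080ti case could look like this (a sketch; flag names are lc0's standard ones, and the quotes matter because the parentheses are special to the shell):

```shell
./lc0 \
  --backend=multiplexing \
  --backend-opts="(gpu=0,backend=cudnn),(gpu=1,backend=cudnn-fp16)" \
  --threads=2 \
  --minibatch-size=512
```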
P.S. There are many ways to write the same configuration, which is why many variants are mentioned on the forum. In the end they are all identical and work the same way.
E.g. instead of
(gpu=0,backend=cudnn),(gpu=1,backend=cudnn-fp16)
it's possible to write e.g.
cudnn(gpu=0),cudnn-fp16(gpu=1)
and instead of
(gpu=0,backend=cudnn-fp16),(gpu=1,backend=cudnn-fp16)
it's possible to write
backend=cudnn-fp16,(gpu=0),(gpu=1)
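If you launch lc0 from a GUI rather than the command line, the same strings can be set through the UCI options instead. Assuming the standard lc0 option names (Backend, BackendOptions, MinibatchSize — verify against your build's "uci" output), the two-2080ti variant above would be:

```
setoption name Backend value roundrobin
setoption name BackendOptions value backend=cudnn-fp16,(gpu=0),(gpu=1)
setoption name MinibatchSize value 512
```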