In the Linux kernel, multicast loopback is done in the TX part of the stack. As you wrote, since we bypass the kernel and each process has its own separate stack, we can't implement loopback properly in the software layer.
Recent MLNX_OFED versions include a change that lets the NIC perform the loopback. If you upgrade to the latest MLNX_OFED/VMA, MC loopback should work.
(For IB it was already working; it now works for ETH as well.)
Regarding VMA_TX_MC_LOOPBACK, you are right.
In Linux, the IP_MULTICAST_LOOP socket option controls the MC loopback behavior.
The kernel applies this control in the TX flow.
VMA_TX_MC_LOOPBACK controls the default value of this socket option.
However, since we currently can't control loopback in the TX flow, we filter the looped-back messages in the RX flow instead.
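
For reference, here is a minimal sketch of how an application sets IP_MULTICAST_LOOP explicitly on its own socket, rather than relying on the default. It uses only the standard sockets API; the helper names set_mc_loopback and get_mc_loopback are made up for this illustration and are not part of VMA.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: enable (nonzero) or disable (0) multicast
 * loopback on a UDP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int set_mc_loopback(int fd, int enable)
{
    int val = enable ? 1 : 0;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &val, sizeof(val));
}

/* Hypothetical helper: read back the current setting.
 * Returns 1 (enabled), 0 (disabled), or -1 on error. */
int get_mc_loopback(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &val, &len) < 0)
        return -1;
    return val ? 1 : 0;
}
```

Calling set_mc_loopback(fd, 0) on the sending socket asks that the sender's own host not receive copies of the multicast packets it sends. As described above, VMA_TX_MC_LOOPBACK only supplies the default for sockets that never set this option.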
In the future, I hope we will be able to control the loopback from the TX flow, by instructing the NIC to do loopback per packet.
For now, the proper name is probably VMA_RX_MC_LOOPBACK as you mentioned.
I will consider renaming it, or leaving it as is for the sake of consistency for current users.