Kristian,
I think this is a really interesting point about Loxone. Firstly, there is no doubt that for maximum and easy integration the CasaTunes-based Loxone Music Server is the best option. I don't have one, but I can see how it works and I'm sure it integrates well. However, I find it expensive for what it is, and it's primarily a music jukebox for people who have a significant local music collection. I don't have many MP3s any longer and I stream all of my music over Spotify (which you can do with Loxone, of course), so I don't need a whacking great hard drive or similar. Also remember that the Loxone Music Server outputs a line-level signal, so you need a separate amplifier before connecting your speakers. Note too that it does not solve video distribution at all.
I find this all a bit unnecessary. What I have put together instead is a stack of Denon AVR-X2000 amps, each of which gives me 2 zones controllable over IP (and therefore from Loxone). Using a bit of logic I can easily control each zone's power, volume and source, and if I'm really clever I can set up more control logic to make multiple zones work together or provide similar multiroom functionality. This means I have the source switching and amplification under control, which is nearly everything that the Loxone Music Server does, except for serving up content. This costs about £100 per zone (plus whatever speakers you want/need).
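To give a feel for what "controllable over IP" means here, below is a rough Python sketch of the kind of commands involved. It is not my Loxone config; it assumes the amp accepts short ASCII commands terminated by a carriage return on TCP port 23, and the command strings (ZMON, Z2ON, MV, SI and so on) are my understanding of Denon's published control protocol, so check them against the protocol document for your model. The function names and the IP address are made up for illustration.

```python
import socket

DENON_PORT = 23  # assumption: the amp's telnet-style control port


def build_command(zone, action, value=None):
    """Build one ASCII command for zone 1 (main) or zone 2.

    Command strings are assumptions based on Denon's AVR control protocol.
    """
    if zone not in (1, 2):
        raise ValueError("zone must be 1 or 2")
    if action == "power":
        if zone == 1:
            body = "ZMON" if value else "ZMOFF"
        else:
            body = "Z2ON" if value else "Z2OFF"
    elif action == "volume":
        level = max(0, min(98, int(value)))  # clamp to a two-digit level
        body = f"MV{level:02d}" if zone == 1 else f"Z2{level:02d}"
    elif action == "source":
        body = f"SI{value}" if zone == 1 else f"Z2{value}"
    else:
        raise ValueError(f"unknown action: {action}")
    return (body + "\r").encode("ascii")


def send_command(host, command, timeout=2.0):
    """Send one command and return whatever the amp replies (may be empty)."""
    with socket.create_connection((host, DENON_PORT), timeout=timeout) as sock:
        sock.sendall(command)
        try:
            return sock.recv(256).decode("ascii", errors="replace").strip()
        except socket.timeout:
            return ""


if __name__ == "__main__":
    # Hypothetical address; replace with your amp's IP.
    print(send_command("192.168.1.50", build_command(2, "power", True)))
```

In Loxone the same strings would simply be sent from a virtual output, so the Python is only there to show how little is involved per zone.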
So, for serving up my content I use a combination of a Sky box (HD TV + radio) and Amazon Fire TV (Spotify, AirPlay, YouTube etc). In my case, I have 1 'whole house' Sky box, each TV zone has its own Amazon Fire TV, and we have 1 more Amazon Fire TV used as a 'whole house' device. Each of these 'whole house' devices is connected to an HDMI splitter, with each amp taking a feed from each device. This creates a robust, low-cost and very flexible HD AV matrix, which is the most valuable part of any multi-zone AV system. I can easily change source devices as trends change too.
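If it helps to picture the splitter arrangement, here is a small Python sketch of the wiring as data. Everything in it is hypothetical: the amp names, the source names and the input labels are placeholders, not my actual setup. The point is only that because every amp has a feed from every 'whole house' device, playing one source everywhere is just a matter of telling each amp which of its own inputs to select.

```python
from dataclasses import dataclass, field


@dataclass
class Amp:
    name: str
    # Maps a shared source to the input on THIS amp that its splitter feed uses.
    inputs: dict = field(default_factory=dict)


def route(amps, source):
    """Return {amp name: input label} needed to play `source` on every amp."""
    missing = [amp.name for amp in amps if source not in amp.inputs]
    if missing:
        raise KeyError(f"{source} is not wired to: {', '.join(missing)}")
    return {amp.name: amp.inputs[source] for amp in amps}


# Hypothetical wiring: two amps, each fed by both whole-house splitters.
amps = [
    Amp("lounge", {"sky": "SAT/CBL", "fire_tv_shared": "MPLAY"}),
    Amp("kitchen", {"sky": "SAT/CBL", "fire_tv_shared": "GAME"}),
]

if __name__ == "__main__":
    print(route(amps, "fire_tv_shared"))
```

Swapping a source device then only means changing what is plugged into the splitter; the mapping on the amp side stays the same.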
Whilst this appears a bit homemade, I would suggest there is no real config overhead to it: my comms with the amps are 2-way and the control is highly robust. I have found that the genuine Denon app sometimes doesn't see the amps correctly, yet Loxone is still able to read from and write to them.
Having lived with a Systemline multi-room AV system and investigated many others, I can confirm that this approach provides many more features and more flexibility than a proprietary system (I could even mix and match amps and splitters without any major issue).
With regard to the control interface, I would not recommend going beyond power, source, volume and maybe channel change without using a different 3rd-party interface to control it. That said, I haven't investigated the media controller much recently, so maybe that's something useful.