If it were me (ah... wait... I HAVE been here) so, WHEN it was me, I got a Debian-based Raspberry SERVER distribution and started there.
Then I used deb files to put on qjackctl and the things IT brings in.
THEN I set up to build jacktrip itself, since the deb files are for an older version than I wanted.
THEN I got jack_delay on and running.
jack_delay was my biggest reason for the home-built.
For MY money, if I'm going to use jack-agent, I'm going to have the read-only image. Too many ways for me to break things if I defeat the read-only, and I would not feel right asking technical questions of the VSD team if I defeated one of their BIGGEST tech support tools.
That said, I DID figure out how to persist enabling electret microphone bias current. It's pretty geeky.
Since you are specifically interested in adding other stuff, I would go with what you ask as being a "better" approach. Maybe NOT the desktop, just a server.
The Raspberry is plenty gutsy enough to run jack. If you run a desktop and, say, a recording tool like Ardour... well, that bogs down my laptop, which is a few times more hefty than the Pi.
The killer is presentation space. Audio processing is heavy too, but presentation takes a mighty bite.
Presentation is via X-Windows. X-Windows works backwards: the X CLIENT goes on the headless Pi SERVER; the X SERVER goes on your separate GUI DESKTOP machine. Doing it this way works fine via SSH with maybe an ssh option or two (depending on your SSH and X-Windows tools). It takes a big load off the Pi. That leaves more CPU and I/O to handle jacktrip AND the other stuff you mention.
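For what it's worth, here is a rough sketch of the "ssh option or two" part, written as a little Python helper that just assembles the ssh command line. The host name pi@raspberrypi.local and the helper name build_x_forward_command are made up for illustration; -X (X11 forwarding), -Y (trusted X11 forwarding) and -C (compression) are standard OpenSSH client options, but whether you need -Y instead of -X depends on your own SSH and X setup. You run this on the DESKTOP machine (the one with the X server), and the program named at the end runs on the Pi.

```python
import shlex
import subprocess


def build_x_forward_command(host, remote_program, trusted=False, compress=True):
    """Build the ssh argument list for running a GUI program on the Pi
    while displaying it on the local desktop's X server."""
    cmd = ["ssh"]
    # -X asks for X11 forwarding; -Y asks for "trusted" forwarding, which
    # some X clients need. Which one works is setup-dependent (assumption).
    cmd.append("-Y" if trusted else "-X")
    if compress:
        # -C compresses the stream, which can help on a slow link.
        cmd.append("-C")
    cmd.append(host)            # hypothetical host, e.g. pi@raspberrypi.local
    cmd.append(remote_program)  # X client that runs on the Pi, e.g. qjackctl
    return cmd


if __name__ == "__main__":
    cmd = build_x_forward_command("pi@raspberrypi.local", "qjackctl")
    # Show the command so it can be copied into a terminal.
    print(shlex.join(cmd))
    # Uncomment to actually launch it (needs a reachable Pi and a local X server):
    # subprocess.run(cmd, check=True)
```

The Pi side also has to allow X11 forwarding in its sshd configuration (the X11Forwarding setting), so check that if the window never shows up.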
Looking forward to hearing how this goes for you,
Dave