There's always the POSIX-standard msync() call, e.g.:
msync(address, 4096, MS_SYNC | MS_INVALIDATE);
This is a bit brutal, though: it'll commit/clean, flush and invalidate the instruction and data caches (all levels), TLBs and so on, for the specified range. It isn't slow as such, but if you're doing tons of self-modifying code (e.g. a JIT), it'll cost you a small fraction of your runtime speed.
I'm also not entirely sure modern ARM Linux kernels do the right thing with msync(). The kernel implementation looks entirely different from the last time I looked at it a couple of years ago :)
Failing that, the kernel 'swi' interface is pretty stable. It's been around in that form for about, hmm, 6-7 years I think? You can always use 'swi' when building for ARM targets, and fall back to msync() otherwise.
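Here's a rough sketch of that "swi on ARM, msync() elsewhere" fallback. It assumes the 'swi' interface in question is the ARM-private cacheflush syscall (__ARM_NR_cacheflush from <asm/unistd.h>), reached here through glibc's syscall() wrapper rather than hand-written assembly. sync_code_range() is a made-up helper name, not an existing API, and whether msync() really takes care of the instruction cache on your platform is exactly the uncertainty mentioned above.

```c
#define _DEFAULT_SOURCE   /* for syscall() and MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#if defined(__arm__) && defined(__linux__)
#include <sys/syscall.h>
#include <asm/unistd.h>   /* __ARM_NR_cacheflush (ARM-private syscall number) */
#endif

/*
 * Hypothetical helper: call after writing code into [start, start + len)
 * and before jumping to it. Returns 0 on success, -1 on failure.
 */
static int sync_code_range(void *start, size_t len)
{
#if defined(__arm__) && defined(__linux__) && defined(__ARM_NR_cacheflush)
    /* ARM Linux: ask the kernel to flush the range via the cacheflush
     * syscall (issued with swi/svc underneath). The flags argument must be 0. */
    char *end = (char *)start + len;
    return (int)syscall(__ARM_NR_cacheflush, (long)start, (long)end, 0L);
#else
    /* Everything else: fall back to msync(). It needs a page-aligned
     * address, so round the start down to the enclosing page boundary. */
    uintptr_t page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t begin = (uintptr_t)start & ~(page - 1);
    size_t    span  = (size_t)(((uintptr_t)start + len) - begin);
    return msync((void *)begin, span, MS_SYNC | MS_INVALIDATE);
#endif
}
```

In a JIT you'd call sync_code_range(buf, nbytes) once per emitted block rather than once per patched instruction, so the cost mentioned earlier is paid as rarely as possible.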