Just a bit of math to back up the measurements.
The GUI data thread actually works with "ticks", processing at once all the data gathered between the last tick and the current one. The ticks are generated by the audio hardware with a time between them equal to audio_buffer_size/audio_sample_rate. In the default case that is 1024/44800, which is 22.9 ms. The maximum latency is thus the time between ticks plus the added latency of the processing and output system. This is the worst case, when the event happens just after a tick has been issued and has to wait a whole tick period to be processed. In the best case, when the event happens just before a tick, the latency can be just the processing and output time. That's why reducing the buffer size not only reduces the maximum latency but also the variance, as there is less room for variation between ticks. Note that the audio buffer size is the maximum size, not the actual number of samples per tick (which will always be lower than that).
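The arithmetic above can be sketched in a few lines of Python; the numbers are the ones from this post, and `processing_ms` is just a placeholder for the processing and output delay, which isn't specified here:

```python
# Tick-latency math: ticks arrive every audio_buffer_size / audio_sample_rate.
audio_buffer_size = 1024   # samples (the maximum per tick)
audio_sample_rate = 44800  # Hz

tick_ms = audio_buffer_size / audio_sample_rate * 1000  # ~22.9 ms between ticks

processing_ms = 2.0  # assumed value, not measured in this post

# Best case: the event lands just before a tick, so it only pays processing cost.
best_case_ms = processing_ms
# Worst case: the event lands just after a tick and waits a whole tick period.
worst_case_ms = tick_ms + processing_ms
```

The spread between best and worst case is exactly one tick period, which is why shrinking the buffer reduces both the maximum latency and the variance.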
In theory there is also a lower limit you can reach, due not just to the USB transfer size but to the data burst size from the FPGA, but I wouldn't change those numbers. It should be around 10ms@30KS/s under the best transfer conditions but in the worst-case scenario (event just after the tick). Your 8 ms is lower but close, and it's normal for most events to happen around the middle of the buffer rather than right at the beginning, so the measurements make sense.
We'll probably change the dialog box so that instead of specifying a buffer size, users can enter a max latency target and we calculate the needed buffer size (but probably leaving the audio sample rate unchanged, as setting it lower than the source sample rate would make everything stop working).
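That calculation could look something like the sketch below. This is a hypothetical helper, not the actual dialog code: it picks the largest power-of-two buffer whose tick period stays under the target, assuming buffer sizes are restricted to powers of two as is common for audio backends:

```python
import math

def buffer_size_for_latency(max_latency_ms, sample_rate_hz):
    """Largest power-of-two buffer whose tick period fits in the latency target.

    Hypothetical helper: the real dialog may round or clamp differently,
    and would also enforce a minimum stable buffer size.
    """
    max_samples = max_latency_ms / 1000.0 * sample_rate_hz
    if max_samples < 1:
        raise ValueError("latency target too small for this sample rate")
    return 2 ** int(math.floor(math.log2(max_samples)))
```

For example, a 23 ms target at 44800 Hz allows up to ~1030 samples, so the helper returns the current default of 1024; a 10 ms target would drop the buffer to 256 samples.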
I have to run some checks, though, because I'm not sure that a 64- or 128-sample buffer is fully stable. I'll do that when I'm back from SfN to come up with a minimum buffer size that is sure to work.