Hi Joel. Great question. There is a signal-processing answer and there is a neuroscience answer. I'll start with the signal-processing answer.
The key insight is that neural signals are highly non-stationary (with some constraints). That's why we talk about frequency bands instead of precise frequencies. It makes sense to talk about "alpha" as energy between ~8 and ~12 Hz, but it doesn't really make sense to talk about energy at 11.23442 Hz. So, if you extract a single Fourier coefficient from two channels over the whole recording, then the phase difference is trivially constant over time, as you noted. But a single Fourier coefficient is not really interpretable in brain data. Therefore, you would band-pass filter from 8 to 12 Hz, and then the frequency in each of the two channels (or, more generally, two oscillators) will fluctuate over time. Those two oscillators can be synchronized, desynchronized, or transiently synchronized. Even if you are considering only a single Fourier coefficient, because of temporal non-stationarities, it doesn't make sense to compute one Fourier coefficient over an entire recording of, say, 10 minutes. Instead, you would compute the FFT over 2-second intervals, and compare the phase differences between the two channels at that Fourier coefficient across the distribution of 2-second intervals. In this case, synchronization is definitely not guaranteed, because the signal is changing over time.
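To make the 2-second-interval idea concrete, here is a minimal Python sketch. It assumes NumPy, and everything named in it (the functions `drifting_phase` and `epoch_phase_sync`, the simulated signals, the 250 Hz sampling rate, the noise levels) is made up for illustration and not taken from any toolbox. It measures synchronization as the length of the average phase-difference vector across epochs, which is one common choice, not the only one.

```python
import numpy as np

def drifting_phase(rng, n_samples, srate, freq, sigma):
    """Phase of a hypothetical oscillator near `freq` Hz with random phase drift.

    `sigma` is the std (radians) of the per-sample random phase step; it is an
    illustrative way to make the signal non-stationary.
    """
    t = np.arange(n_samples) / srate
    return 2 * np.pi * freq * t + np.cumsum(rng.normal(0.0, sigma, n_samples))

def epoch_phase_sync(x, y, srate, freq, epoch_sec=2.0):
    """Phase synchronization between x and y at one Fourier coefficient.

    Cuts both signals into non-overlapping epochs, takes the FFT of each
    epoch, and returns the length of the mean phase-difference vector
    across epochs (0 = no consistent phase lag, 1 = perfectly consistent).
    """
    n = int(round(epoch_sec * srate))
    n_epochs = min(len(x), len(y)) // n
    if n_epochs < 1:
        raise ValueError("signals are shorter than one epoch")
    xe = np.reshape(x[:n_epochs * n], (n_epochs, n))
    ye = np.reshape(y[:n_epochs * n], (n_epochs, n))
    taper = np.hanning(n)  # Hann taper to reduce edge artifacts
    fx = np.fft.rfft(xe * taper, axis=1)
    fy = np.fft.rfft(ye * taper, axis=1)
    freqs = np.fft.rfftfreq(n, d=1.0 / srate)
    k = np.argmin(np.abs(freqs - freq))  # closest frequency bin
    phase_diff = np.angle(fx[:, k]) - np.angle(fy[:, k])
    return np.abs(np.mean(np.exp(1j * phase_diff)))

# Simulated data: all values below are arbitrary illustration choices.
rng = np.random.default_rng(0)
srate, duration = 250, 120          # Hz, seconds
n_samples = srate * duration

phase_a = drifting_phase(rng, n_samples, srate, 10.0, 0.1)
phase_b = drifting_phase(rng, n_samples, srate, 10.0, 0.1)  # independent drift

x = np.sin(phase_a) + 0.5 * rng.normal(size=n_samples)
y = np.sin(phase_a - np.pi / 4) + 0.5 * rng.normal(size=n_samples)  # locked to x
z = np.sin(phase_b) + 0.5 * rng.normal(size=n_samples)              # not locked to x

sync_coupled = epoch_phase_sync(x, y, srate, 10.0)
sync_uncoupled = epoch_phase_sync(x, z, srate, 10.0)
# One FFT over the whole recording gives a single phase difference,
# so the vector length is 1 no matter what the signals are.
sync_single = epoch_phase_sync(x, z, srate, 10.0, epoch_sec=duration)

print("coupled pair, 2-s epochs:  ", sync_coupled)
print("uncoupled pair, 2-s epochs:", sync_uncoupled)
print("uncoupled pair, one window:", sync_single)
```

With these simulated signals, the phase-locked pair should come out close to 1 and the independent pair much lower, whereas the single-window estimate is 1 by construction. That last value is the "trivial" case from your question; the epoch-based values are the ones that can actually differ between synchronized and unsynchronized oscillators.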
I hope that explanation makes sense. The neuroscience explanation is more theoretical, but is grounded in >150 years of thinking and empirical data. The idea is that oscillations are used to group assemblies of neurons at multiple spatial scales, from local (dozens or hundreds of neurons within, say, 1 mm) to long-range (many mm to cm). This is because oscillations reflect mass subthreshold potentials, and thus neurons that are "tuned in" to an oscillation will all be relatively hyperpolarized or relatively depolarized at the same time, which means that the probability of action potentials is modulated by a circular clock. At a larger spatial scale, neural networks that are phase-synchronized should be able to transfer information better. You can find more detailed explanations of these ideas in review papers, e.g., by Pascal Fries, Ole Jensen, Gyorgy Buzsaki, and many others.
For this reason, phase synchronization is generally assumed to be a measure of narrowband functional connectivity. It's also important to keep in mind that the signal-processing and statistical aspects of coherence (as with most other analyses applied in neuroscience) are fairly well mapped out, while the exact neurophysiological interpretation remains open, debated, and likely to change over time with new data, experiments, and theories.
Mike