Hi,
If I understand correctly, you have trained two GMMs, one on real speech (gmm_nat in your example) and one on synthetic speech (gmm_synt). Then, for a test speech sample, you compute the difference between the two log likelihoods, as in your line:
llr_gmm_score[k] = gmm_nat(mfcc)-gmm_synt(mfcc)
Generally speaking, the result of this subtraction is positive when the test sample is more likely to be real, because its log likelihood under gmm_nat is then larger than its log likelihood under gmm_synt. If the result is negative, the sample is probably synthetic. A side note: you mentioned that you are using CQCC features, so I'm a bit confused why the variable is named mfcc where you compute the log likelihoods.
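To make the scoring step concrete, here is a minimal sketch of how such a score can be formed. I am assuming diagonal-covariance GMMs and a per-utterance score averaged over frames, which may not match your setup. The helper names (gmm_avg_loglik, llr_score), the (weight, mean, variance) layout, and the toy one-component models are made up for illustration and are not taken from your code or from any toolkit.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_avg_loglik(frames, gmm):
    """Average per-frame log likelihood of `frames` under a GMM.

    `gmm` is a list of (weight, mean, var) tuples (hypothetical layout).
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(comp)  # log-sum-exp over components, for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comp))
    return total / len(frames)

def llr_score(frames, gmm_nat, gmm_synt):
    """Log-likelihood ratio: positive favours real, negative favours synthetic."""
    return gmm_avg_loglik(frames, gmm_nat) - gmm_avg_loglik(frames, gmm_synt)

# Toy models and features, invented purely for this example.
gmm_nat = [(1.0, [0.0, 0.0], [1.0, 1.0])]
gmm_synt = [(1.0, [3.0, 3.0], [1.0, 1.0])]
frames_near_nat = [[0.1, -0.2], [0.3, 0.0]]
frames_near_synt = [[2.9, 3.1], [3.2, 2.8]]

print(llr_score(frames_near_nat, gmm_nat, gmm_synt))   # positive
print(llr_score(frames_near_synt, gmm_nat, gmm_synt))  # negative
```

Frames that sit close to the gmm_nat mean give a positive score, and frames close to the gmm_synt mean give a negative one, which is the sign convention described above.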
In a nutshell, a positive score means the test sample is likely real, and a negative score means it is likely synthetic. For a database evaluation, however, it is better to choose the decision threshold from the scores on the development set. That threshold may differ slightly from 0, and it should separate the two classes better on your specific database.
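One common way to pick such a threshold is to take the point where the two error rates on the development set are as close as possible (the equal error rate point). The sketch below is only one option and rests on my assumptions: the function name eer_threshold and the toy development scores are invented, and the decision rule is "score >= threshold means real".

```python
def eer_threshold(real_scores, synt_scores):
    """Return (threshold, far, frr) at the point where FAR and FRR are closest.

    FAR: fraction of synthetic samples accepted as real (score >= threshold).
    FRR: fraction of real samples rejected as synthetic (score < threshold).
    """
    candidates = sorted(set(real_scores) | set(synt_scores))
    best = None
    for t in candidates:
        far = sum(s >= t for s in synt_scores) / len(synt_scores)
        frr = sum(s < t for s in real_scores) / len(real_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, far, frr)
    return best[1], best[2], best[3]

# Toy development-set LLR scores, invented for illustration only.
dev_real = [2.1, 1.4, 0.9, 0.3, 1.8]
dev_synt = [-1.7, -0.4, 0.5, -2.2, -0.9]

threshold, far, frr = eer_threshold(dev_real, dev_synt)
print(threshold, far, frr)
```

With these toy numbers the selected threshold is 0.5 rather than 0, with both error rates at 0.2. You would then apply that same threshold, unchanged, to the evaluation set scores.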
I hope this sheds some light on your question.
Best,
Pavel