Hi Sudha,
For the sake of explicitness (even though I understand this is not your case): we avoid analyzing the effect sizes obtained through S-PrediXcan and focus on the z-scores.
Z-score direction can change because of methodological differences between the GWAS, but in general this could mean that the models are unreliable in those cohorts.
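For context, the S-PrediXcan gene-level z-score is approximately a weighted sum of the GWAS z-scores of the variants in the gene's model. That is why its sign can move between cohorts. Here $w_{lg}$ is the model weight of variant $l$ for gene $g$, $\hat\sigma_l$ and $\hat\sigma_g$ are the standard deviations of the variant's dosage and of the predicted expression (estimated in a reference panel), and $Z_l$ is the GWAS z-score of variant $l$:

```latex
Z_g \approx \sum_{l \in \mathrm{Model}_g} w_{lg} \, \frac{\hat\sigma_l}{\hat\sigma_g} \, Z_l,
\qquad Z_l = \frac{\hat\beta_l}{\mathrm{se}(\hat\beta_l)}
```

If a few heavily weighted variants have $Z_l$ of opposite sign in the two GWAS (for instance because of allele coding, imputation or LD differences), $Z_g$ can flip sign as well.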
I would investigate which variants are in play (the models are SQLite databases, so you can query them programmatically for your gene(s) of interest).
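A minimal sketch of such a query using Python's built-in `sqlite3` module. It assumes the model file follows the PredictDB layout, with a `weights` table holding `rsid`, `gene`, `weight`, `ref_allele` and `eff_allele`; please check the actual schema of your model file first (e.g. `SELECT sql FROM sqlite_master;`). The helper names and the toy gene IDs are hypothetical, and the in-memory database only stands in for a real model file.

```python
import sqlite3

# Assumption: PredictDB-style layout, with a `weights` table holding
# (rsid, gene, weight, ref_allele, eff_allele). Verify the real schema
# first with: SELECT sql FROM sqlite_master;
def model_variants(conn, gene_id):
    """Return (rsid, weight, ref_allele, eff_allele) for one gene's model."""
    cur = conn.execute(
        "SELECT rsid, weight, ref_allele, eff_allele FROM weights "
        "WHERE gene = ? ORDER BY rsid",
        (gene_id,),
    )
    return cur.fetchall()


def build_toy_db():
    """Hypothetical in-memory stand-in for sqlite3.connect('<model>.db')."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weights (rsid TEXT, gene TEXT, weight REAL, "
        "ref_allele TEXT, eff_allele TEXT)"
    )
    conn.executemany(
        "INSERT INTO weights VALUES (?, ?, ?, ?, ?)",
        [
            ("rs1", "GENE_A", 0.25, "A", "G"),
            ("rs2", "GENE_A", -0.10, "C", "T"),
            ("rs3", "GENE_B", 0.40, "G", "A"),
        ],
    )
    return conn


conn = build_toy_db()
for row in model_variants(conn, "GENE_A"):
    print(row)
```

With a real model you would replace `build_toy_db()` with `sqlite3.connect("path/to/your_model.db")` and pass the gene ID exactly as it is stored in the `gene` column.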
The model variants are at least moderately independent in the GTEx cohort, but that doesn't mean they couldn't be in LD in your cohort. I would therefore check these variants' summary statistics in the GWAS: whether they were imputed or directly genotyped, etc.
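A minimal sketch of that check, assuming you have already pulled the model variants as `(rsid, weight, ref_allele, eff_allele)` rows and loaded the GWAS summary statistics into a dict keyed by rsid. The `GwasRow` record, its field names and the 0.8 imputation-quality cutoff are hypothetical; adapt them to your GWAS format. It also flags allele flips, since a variant whose effect allele is swapped relative to the model contributes to the gene z-score with the wrong sign.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GwasRow:
    """Hypothetical GWAS summary-statistics record (real column names vary)."""
    effect_allele: str
    non_effect_allele: str
    zscore: float
    imputed: bool
    info: Optional[float] = None  # imputation quality; None if genotyped


def audit_model_variants(model_rows, gwas, min_info=0.8):
    """Describe how each model variant is represented in the GWAS.

    model_rows: iterable of (rsid, weight, ref_allele, eff_allele)
    gwas: dict mapping rsid -> GwasRow
    min_info: illustrative imputation-quality cutoff, not a recommendation
    """
    report = {}
    for rsid, _weight, ref, eff in model_rows:
        row = gwas.get(rsid)
        if row is None:
            report[rsid] = "missing from GWAS"
            continue
        alleles = (row.effect_allele, row.non_effect_allele)
        if alleles == (eff, ref):
            status = "aligned"
        elif alleles == (ref, eff):
            status = "flipped (GWAS z-score sign must be reversed)"
        else:
            status = "allele mismatch"
        # Imputed variants with missing or low quality scores get flagged.
        if row.imputed and (row.info is None or row.info < min_info):
            status += "; low imputation quality"
        report[rsid] = status
    return report


model = [("rs1", 0.25, "A", "G"), ("rs2", -0.10, "C", "T"), ("rs3", 0.40, "G", "A")]
gwas = {
    "rs1": GwasRow("G", "A", 2.1, imputed=False),
    "rs2": GwasRow("C", "T", -1.3, imputed=True, info=0.6),
}
for rsid, status in audit_model_variants(model, gwas).items():
    print(rsid, status)
```

Running this on both cohorts' GWAS side by side makes it easier to spot variants that are aligned in one cohort but missing, flipped or poorly imputed in the other.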
Do many other genes have different directions in the two cohorts?
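One quick way to answer this is to compare the signs of the gene-level z-scores across the two cohorts. A minimal sketch, assuming each cohort's S-PrediXcan results are loaded as a dict of gene ID to z-score; the function name, the toy gene IDs and the `min_abs_z` filter are hypothetical.

```python
import math


def direction_concordance(z_a, z_b, min_abs_z=0.0):
    """Compare S-PrediXcan z-score signs for genes present in both cohorts.

    z_a, z_b: dicts mapping gene ID -> z-score (cohort A and cohort B).
    min_abs_z: only consider genes with |z| >= min_abs_z in cohort A, since
    the sign of a near-zero z-score is essentially noise (illustrative).
    Returns (fraction of shared genes with the same sign, sorted discordant genes).
    """
    shared = [
        g for g, z in z_a.items()
        if g in z_b and z != 0 and z_b[g] != 0 and abs(z) >= min_abs_z
    ]
    if not shared:
        return math.nan, []
    discordant = sorted(g for g in shared if (z_a[g] > 0) != (z_b[g] > 0))
    return 1 - len(discordant) / len(shared), discordant


z_a = {"GENE_A": 3.0, "GENE_B": -2.5, "GENE_C": 1.0, "GENE_D": 4.0}
z_b = {"GENE_A": 2.0, "GENE_B": 1.5, "GENE_C": -0.5}
print(direction_concordance(z_a, z_b))
print(direction_concordance(z_a, z_b, min_abs_z=2.0))
```

Restricting to genes that are at least nominally associated in the discovery cohort keeps noise around zero from dominating the discordance count.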
It makes sense to treat the second cohort as "replication", but I would investigate it further to check for any other potential source of error.
S-MultiXcan has more power to detect associations, but since it uses many models it can potentially amplify LD mismatches between GTEx and the target cohort. We typically refer to a separate colocalization measure like ENLOC for additional evidence on whether a causal mechanism underlies the gene-trait association.
Best,
Alvaro