TSEM output: the same gene is reported several times for the same tissue


Giuseppe Fanelli

May 6, 2024, 1:07:03 PM
to Genomic SEM Users
Hello everyone,
While inspecting the output of the multivariate TWAS (userGWAS step) for one of the factors of interest specified in the model, I found several rows containing the same gene for the same tissue.
I also inspected the input produced by the FUSION sumstats preparation step (read_fusion) and noticed that the repeated rows for the same gene and tissue differ only in the β and SE of one of the eight phenotypes considered.

Is this expected, is it a bug, or could it be due to a mistake on my part?

Please let me know; I'm happy to provide further details if needed.
Many thanks, Giuseppe

Giuseppe Fanelli

May 10, 2024, 11:33:32 AM
to Genomic SEM Users
One addition: neither the input sumstats nor the LD reference files contain duplicate rsIDs (I checked this after reading a similar earlier thread in this group).
Thanks again, Giuseppe

agro...@gmail.com

May 10, 2024, 12:27:03 PM
to Genomic SEM Users
Hi Giuseppe, 

Do you also see the duplicate gene for that one trait in the univariate TWAS output, or does it only appear as a duplicate after running read_fusion?

Best, 
 Andrew

Giuseppe Fanelli

May 10, 2024, 1:00:49 PM
to Genomic SEM Users
Hi Andrew,
thanks for the reply.
I also see the duplicates for that one trait (but not for the others) in the univariate TWAS output.

Giuseppe

agro...@gmail.com

May 10, 2024, 1:11:49 PM
to Genomic SEM Users
Hi Giuseppe, 

In that case it's not necessarily an error on your part, but the duplicate is arising outside of GenomicSEM (i.e., at the univariate TWAS stage), and read_fusion is just carrying that duplicate forward.

Since the issue comes from the TWAS step, it's harder for me to diagnose why you are getting that duplicate. I imagine you are using the same set of gene expression reference weights for all your traits, so it's strange that the duplicate appears for only one of them. Possible steps forward would be:

1. If you merged the univariate TWAS output in any way, make sure there are no errors there (e.g., did you split the TWAS across separate runs and possibly run one gene in the same tissue more than once?).

2. Did you create your own gene expression reference weights? If so, double-check those to see why there might be a duplicate there.

3. Try rerunning the univariate TWAS for that chromosome and that trait and see if you get the duplicate again.

4. If none of these help diagnose the issue, the simplest way forward would be to delete the duplicate row for that trait before running read_fusion. That's not ideal, as it still doesn't tell us why the duplicate is appearing in the first place, but it's ultimately one gene out of thousands and would let you move forward with the analysis.
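
As a rough illustration of step 4, here is a minimal Python sketch that lists repeated gene-tissue pairs in one trait's univariate TWAS file and writes a copy that keeps only the first row of each pair. It rests on several assumptions that are not from this thread: the file is tab-delimited with a header, tissue and gene are identified by columns named `PANEL` and `ID`, and the function name `drop_duplicate_genes` and the file names are hypothetical. Keeping the first row is an arbitrary choice; since the repeated rows differ in β and SE, inspect them before deciding which one to keep.

```python
import csv
from collections import Counter


def drop_duplicate_genes(in_path, out_path, key_cols=("PANEL", "ID")):
    """Copy in_path to out_path, keeping the first row per key.

    key_cols are assumed column names for tissue panel and gene;
    adjust them to match the header of your own TWAS output.
    Returns a dict mapping each repeated key to its row count.
    """
    with open(in_path, newline="") as src:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Count how often each (tissue, gene) key occurs in the file.
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    duplicated = {key: n for key, n in counts.items() if n > 1}

    seen = set()
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(
            dst, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        for row in rows:
            key = tuple(row[c] for c in key_cols)
            if key in seen:
                continue  # skip later repeats of an already-written key
            seen.add(key)
            writer.writerow(row)
    return duplicated
```

Calling, for example, `drop_duplicate_genes("trait.dat", "trait.dedup.dat")` would return the repeated pairs so they can be checked by eye, and the deduplicated file could then be passed to read_fusion in place of the original.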

Best, 
  Andrew

giupepp...@gmail.com

Jun 24, 2024, 11:24:46 AM
to Genomic SEM Users
Many thanks, Andrew!
Best, Giuseppe