linking genetics IDs (IID) to roster ID


Jeff Phillips

Jun 20, 2015, 3:39:57 PM
to adni...@googlegroups.com
Hi folks,

We are doing some genetics analysis but are having difficulties linking the IID variable in the GWAS data with roster IDs. We initially assumed the IID represented [SITEID]_S_[RID] or perhaps [SITEID]_S_[ID], but based on these assumptions, we have found none of the cases present in the genetics data in DXSUM or other tables.

Our best guess is that the genetics data uses IDs distinct from those in the rest of the study, and that some key file associates the two sets of IDs but we have been unable to find any reference to it. Can anyone verify this or otherwise tell us how to interpret the genetics IDs? See below for a detailed description of the steps our RA took in investigating this issue.

Thanks,

Jeff Phillips

Postdoctoral Fellow
Penn Frontotemporal Degeneration Center
University of Pennsylvania
Philadelphia, PA

*************

I am using the ADNI data from ADNI’s servers to link genomic data for specific SNPs with diagnostic data. I used plink to pull out the genomic data for each SNP. However, when I try to link the diagnostic data with the genomic data, the IDs don’t match up. I’ll go over every file and process I used to obtain this data.

  1. I downloaded the PLINK-formatted ADNI Omni2.5M microarray SNP data.

  2. I then ran the file through plink and looked at specific SNPs in .raw format.

  3. All this information was then organized into a .csv file, then sorted based on the IID column.

  4. The ADNI_Gene_Expression_Profile_DICT from the ADNI_Gene_Expression_Profile.zip file on the ADNI website provided the following format for subject identification:

    1. Subject ID including site; first three numbers indicate site (ZZZ), last four numbers indicate unique subject ID (XXXX, the RID); ZZZ_S_XXXX

  5. This format for ID is consistent with how the ADNI diagnosis data presents itself and is identified in the data dictionary. Upon comparison of the IDs from the two groups:

    1. None of the IDs in one group matched any ID in the other group

    2. Also the RID was a non-specific identifier in the ADNI diagnosis data

  6. I checked an ADNI WGS Data – CASAVA SNV call Set 01 of 29.vcf file to make certain that the IDs in the plink data match the IDs in that file. They do, which means the genetic data IDs are consistent with each other but don’t match the diagnostic data.
**************
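
For anyone repeating the comparison in steps 3 to 5, here is a minimal Python sketch of that check, not the RA's actual procedure. It assumes the plink .raw output is whitespace-delimited with an IID column in its header; the sample rows, the SNP column name, and the candidate IDs are all made up for illustration.

```python
import io


def read_raw_iids(handle):
    """Return the sorted IID values from a whitespace-delimited plink .raw file."""
    header = handle.readline().split()
    iid_col = header.index("IID")
    return sorted(line.split()[iid_col] for line in handle if line.strip())


def overlap(iids, candidate_ids):
    """Return the IIDs that also appear among the candidate subject IDs."""
    return sorted(set(iids) & set(candidate_ids))


# Made-up stand-in for a .raw file; a real one would be opened with open(path).
raw_text = (
    "FID IID PAT MAT SEX PHENOTYPE rs0000001_A\n"
    "1 011_S_0002 0 0 1 -9 0\n"
    "2 022_S_0004 0 0 2 -9 1\n"
)
iids = read_raw_iids(io.StringIO(raw_text))

# Hypothetical IDs built from a diagnosis table under the ZZZ_S_XXXX assumption.
candidates = ["011_S_0002", "033_S_0007"]
shared = overlap(iids, candidates)
print(len(shared), "of", len(iids), "IIDs matched")
```

An empty `shared` list would reproduce the situation described above, where no constructed ID appears in the genetics data.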

Jeff Phillips

Jun 20, 2015, 3:52:21 PM
to adni...@googlegroups.com
I've answered my own question by referring to this post:

http://adni.loni.usc.edu/support/experts-knowledge-base/question/?QID=335

For anyone else who may encounter the issue in the future, the key field and table are PTID in ROSTER.csv.
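
A rough sketch of how that lookup might be done in Python, using only the standard library. The only detail taken from this thread is that PTID in ROSTER.csv is the key; the presence of an RID column in the same file, the other column names, and every value below are assumptions for illustration.

```python
import csv
import io


def ptid_to_rid(roster_handle):
    """Build a PTID -> RID lookup from a ROSTER-style CSV (column names assumed)."""
    reader = csv.DictReader(roster_handle)
    return {row["PTID"]: row["RID"] for row in reader}


def attach_rid(iids, mapping):
    """Pair each genetics IID with its RID, or None when no PTID matches."""
    return [(iid, mapping.get(iid)) for iid in iids]


# Made-up stand-in for ROSTER.csv; a real file would be opened with
# open("ROSTER.csv", newline="").
roster_text = "RID,SITEID,PTID\n2,107,011_S_0002\n4,10,022_S_0004\n"
mapping = ptid_to_rid(io.StringIO(roster_text))

# The second IID is deliberately absent from the roster to show the None case.
linked = attach_rid(["011_S_0002", "099_S_9999"], mapping)
for iid, rid in linked:
    print(iid, rid)
```

Once each IID has an RID attached, the genetics rows can be joined to DXSUM or other tables on RID; any IID that comes back as None is worth checking by hand.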

Jeff