The epsilon 4 allele of APOE is the strongest known genetic risk factor for AD with a two- to three-fold increased risk for AD in people with one epsilon 4 allele rising to about 12-fold in those with two alleles. APOE genotyping was performed at the time of participant enrollment and included in the ADNI database. The two SNPs (rs429358, rs7412) that define the epsilon 2, 3, and 4 alleles, are not on the Human610-Quad BeadChip, and therefore were genotyped using DNA extracted by Cogenics from a 3 mL aliquot of EDTA blood.
If you are looking at the BeadChip data, then you won't find the two SNPs (rs429358, rs7412) because they are not on the BeadChip.
The problem with the error you shared is that rs28615667 is on both chrX and chrY and plink --update-name wants everything in the file to be unique. Our engineer shares some bash commands to get only single-nucleotide variants (not insertions/deletions) and to remove duplicates:
You can check for the locations of rs429358 and rs7412 in that file:
Hello, Stephen.
Thank you for your interest in the UCSC Genome Browser.
Regarding the updated plink files results not changing, what changes were you expecting? The rs# IDs are meant to be stable across human genome assembly versions, although in practice there can be exceptions, especially in indel variants. For extremely well-studied common SNPs, it is not surprising for the rsID in hg18 to match the rsID in hg38. Could you provide an example of an expected change in rs# so we could look into it more closely?
I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.
Gerardo Perez
UCSC Genomics Institute
--
---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/CAF%3DP_nJKywNweaMOYGy477ymrb82UWKC7P0MerqdtNpOVTX3cw%40mail.gmail.com.
Hello,
Thank you for sending your follow-up inquiry.
The genomic sequence changes from one assembly version to the next, so we expect the same genomic position, like "chr1:1234567", to represent a different sequence in hg18 vs. hg38. However, rs# IDs are supposed to be stable across genome assemblies despite the changing genomic positions. Regardless of the genome assembly version, "rs429358" refers to the same polymorphic variant in the 4th exon of the ApoE gene with T as the major allele and C as the minor allele. Its genomic position in hg18 is chr19:50103781, and its genomic position in hg38 is chr19:44908684.
If you start with the hg18 position chr19:50103781 and successfully map it to the corresponding position in hg38, then the result should be chr19:44908684. You should find the same rs# ID (rs429358) in both assemblies at the corresponding positions. For more context, dbSNP started assigning rs#s in the 1990s, before the human genome was assembled. At that point, SNPs were defined by flanking sequences with the varying base(s) in the middle -- local sequence context, with no genomic location because there was no genome.
The mapping from one genome assembly to the next is automatically generated from sequence alignments and is not complete nor perfect, just a best attempt. The stability of rs# IDs is why we recommend mapping across assemblies using rs# IDs first and then falling back to using cross-assembly mappings to look up the positions of items that didn't map using that method. When an rs# ID in hg18 is not found in the hg38 dbSNP data, that rs# ID has been withdrawn or replaced by a different rs# ID. In that case, mapping the hg18 position of the disappeared rs# to hg38 and looking for a new rs# at the corresponding position may help to find the new rs# (if there is one).
To help summarize the previous steps in this email thread:
1. Translate the plink file that has hg18 genomic coordinates and rs IDs to UCSC BED4 (https://genome.ucsc.edu/FAQ/FAQformat.html#format1)
2. Make a file of all rs IDs
3. Use the Table Browser to map that file of rs IDs to hg38 coordinates and rs IDs (BED4 output fields)
4. Make a file containing rsIDs that were not mapped by the Table Browser
5. Use the BED file from step 4 of rs IDs to extract (from the BED4 file created in the first step) a UCSC BED4 file of hg18 positions of rs IDs that were not mapped by the Table Browser
6. Run LiftOver on that file to get hg38 positions (BED4) of rs IDs that were not mapped by the Table Browser
7. Use the Data Integrator to map those hg38 positions to new rs IDs where possible (BED4 but discard the original name/rs ID and use the name/rs ID from hg38 dbSNP)
8. Combine the earlier Table Browser rs ID-mapped hg38 BED4 with the liftOver/Data Integrator-mapped BED4
9. Beware duplicates that will cause downstream problems with plink -- you need to decide whether to remove duplicates as unreliable or resolve duplicates.
10. After resolving/removing duplicates and carefully checking, translate the final BED4 back to the plink format with hg38 genomic coordinates
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
Jairo Navarro
UCSC Genome Browser
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/CAJQGhA1OUnvzs3w_wxh25wEvyYYX1QDPrZzegfXJ-fEFASb%3D4A%40mail.gmail.com.