Department of Biosciences and Centre for Stem Cell Research – University of Milano
Via Viotti 3/5
20133 Milano (Italy)
email: raffaele...@unimi.it
Hello Raffaele,
Thank you for your question about polyQ regions in the exons of different species. Unfortunately, you are right: the alignments we provide are not a reliable way to obtain sequence from repetitive parts of the genome, such as polyQ regions.
We suggest instead that you make a list of the coordinates of the exons containing your polyQ regions in the human genome. You can then use our liftOver tool (http://genome.ucsc.edu/cgi-bin/hgLiftOver) to find the coordinates of similar regions in other species. Using those coordinates, you can then get sequence from the other species and look for CAG repeats.
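Once you have the sequence for a region, the search for CAG repeats can be automated. The sketch below is only an illustration and is not a UCSC tool. It scans a DNA string for runs of consecutive glutamine codons. The function name `find_cag_runs`, the `min_repeats` threshold of 6, and the `allow_caa` option are hypothetical choices. Glutamine is encoded by both CAG and CAA, so `allow_caa=True` also accepts CAA-interrupted runs. The regex does not track reading frame.

```python
import re

def find_cag_runs(seq, min_repeats=6, allow_caa=False):
    """Return (start, end, codon_count) for each glutamine-codon run.

    Coordinates are 0-based, half-open offsets into `seq`.
    min_repeats is a hypothetical illustrative threshold, not a UCSC default.
    If allow_caa is True, CAA codons (also glutamine) may interrupt the run.
    Note: the regex scans every offset, so it ignores reading frame.
    """
    codon = "CA[AG]" if allow_caa else "CAG"
    pattern = re.compile(r"(?:%s){%d,}" % (codon, min_repeats))
    return [(m.start(), m.end(), (m.end() - m.start()) // 3)
            for m in pattern.finditer(seq.upper())]

if __name__ == "__main__":
    example = "ttt" + "cag" * 8 + "ggg"
    print(find_cag_runs(example))  # one run of 8 CAG codons
```

For example, `find_cag_runs("ttt" + "cag" * 8 + "ggg")` reports a single run starting at offset 3 and spanning 8 codons.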
For example, the first exon of the HTT gene in the human hg19 assembly has the coordinates chr4:3,076,407-3,076,815. You can use the following steps to obtain sequence from the region most closely aligned to that exon in the rat Mar. 2012 (RGSC 5.0/rn5) assembly.
1. Open the UCSC LiftOver tool by visiting http://genome.ucsc.edu/cgi-bin/hgLiftOver (also available by going to the Tools menu of the UCSC Genome Browser and choosing LiftOver).
2. Select Human Feb. 2009 (GRCh37/hg19) as the original genome/assembly, and select Rat Mar. 2012 (RGSC 5.0/rn5) as the new genome/assembly.
3. Enter the coordinates chr4:3,076,407-3,076,815 into the data box.
4. Click Submit.
5. After some processing, the results will be displayed on the page as a link labeled "View Conversions". The link goes to a BED file. Download and open the BED file.
6. The BED file contains the following coordinates in the rn5 genome: chr14:81941727-81942079.
7. Open the UCSC Genome Browser Gateway page at http://genome.ucsc.edu/cgi-bin/hgGateway, select the Rat Mar. 2012 (RGSC 5.0/rn5) assembly, and enter those coordinates. Click Submit.
8. The browser will now display that region of the rat rn5 genome. To obtain the sequence for that region, go to the View menu and choose DNA.
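If you are handling many exons, it can help to convert coordinate strings like the ones above into numeric intervals. The following is a minimal sketch, not a UCSC utility. It assumes that strings such as "chr4:3,076,407-3,076,815" use the Genome Browser's 1-based, fully closed position convention, and it converts them to BED-style 0-based, half-open intervals. The names `position_to_bed` and `region_length` are hypothetical.

```python
import re

# Assumption: "chrom:start-end" strings are 1-based and inclusive (browser
# position style); BED intervals are 0-based with an exclusive end.
POSITION_RE = re.compile(r"^(\w+):([\d,]+)-([\d,]+)$")

def position_to_bed(position):
    """Convert 'chrom:start-end' to (chrom, bed_start, bed_end)."""
    m = POSITION_RE.match(position.strip())
    if m is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom = m.group(1)
    start = int(m.group(2).replace(",", ""))  # thousands separators allowed
    end = int(m.group(3).replace(",", ""))
    if end < start:
        raise ValueError(f"end precedes start in {position!r}")
    return chrom, start - 1, end

def region_length(position):
    """Number of bases covered by a 1-based inclusive position string."""
    _, bed_start, bed_end = position_to_bed(position)
    return bed_end - bed_start

if __name__ == "__main__":
    for pos in ("chr4:3,076,407-3,076,815", "chr14:81941727-81942079"):
        print(pos, position_to_bed(pos), region_length(pos))
```

Under that assumption, the human exon covers 409 bases while the lifted rat region covers 353, so a lifted region need not be the same length as the original.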
Note that the HTT gene lies on the - strand of the rat genome, so on the + strand its CAG codons will appear as CTG repeats. Be sure to check both strands for polyQ-encoding repeats.
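Checking both strands amounts to searching the + strand sequence as displayed and also its reverse complement. The sketch below is illustrative only. It assumes that you have pasted the + strand DNA into a string. The helper names `reverse_complement` and `longest_cag_run_both_strands` are hypothetical. The function reports the longest run of consecutive CAG codons found on each strand.

```python
import re

# Complement table for A/C/G/T plus N, preserving case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def longest_cag_run_both_strands(plus_seq):
    """Longest run of consecutive CAG codons on each strand, keyed '+' and '-'."""
    def longest(seq):
        runs = re.findall(r"(?:CAG)+", seq.upper())
        return max((len(run) // 3 for run in runs), default=0)
    return {"+": longest(plus_seq), "-": longest(reverse_complement(plus_seq))}

if __name__ == "__main__":
    # A gene on the - strand: its CAG repeat reads as CTG on the + strand.
    plus = "AAA" + "CTG" * 10 + "TTT"
    print(longest_cag_run_both_strands(plus))
```

In this toy example, the repeat is invisible as CAG on the + strand and appears as 10 CAG codons on the - strand.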
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. Questions sent to that address will be archived in a publicly accessible forum for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--