I want an alignment only for the specific amino acids I am interested in. The full gene alignment has too many amino acids to look through. I want something like the following, but with amino acids instead of nucleotides.
You can use NCBI's multiple protein sequence aligner, COBALT, to find common protein sequences:
http://www.ncbi.nlm.nih.gov/tools/cobalt/cobalt.cgi
Explanation of CDS FASTA header format
— Whole gene format: geneName_assemblyName peptideLength location
— Exon format: geneName_assemblyName_exonNum_totalExons exonLength inFrame outFrame location
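As a rough illustration of the two header layouts above, here is a minimal Python sketch that splits a header line into its fields. The names `CdsHeader` and `parse_cds_header` are hypothetical, and the sketch assumes the assembly name (for example hg19) contains no underscore, which the format description does not guarantee.

```python
from typing import NamedTuple, Optional


class CdsHeader(NamedTuple):
    gene: str
    assembly: str
    exon_num: Optional[int]     # None for whole-gene headers
    total_exons: Optional[int]  # None for whole-gene headers
    length: int                 # peptideLength or exonLength
    in_frame: Optional[int]     # None for whole-gene headers
    out_frame: Optional[int]    # None for whole-gene headers
    location: str


def parse_cds_header(line: str) -> CdsHeader:
    """Parse a whole-gene or exon CDS FASTA header (hypothetical helper)."""
    fields = line.strip().lstrip(">").split()
    name = fields[0]
    if len(fields) == 5:
        # Exon format: geneName_assemblyName_exonNum_totalExons
        # Split from the right because geneName may itself contain
        # underscores (e.g. NM_000348).
        gene_assembly, exon_num, total_exons = name.rsplit("_", 2)
        # Assumption: assemblyName has no underscore.
        gene, assembly = gene_assembly.rsplit("_", 1)
        return CdsHeader(gene, assembly, int(exon_num), int(total_exons),
                         int(fields[1]), int(fields[2]), int(fields[3]),
                         fields[4])
    if len(fields) == 3:
        # Whole gene format: geneName_assemblyName peptideLength location
        gene, assembly = name.rsplit("_", 1)
        return CdsHeader(gene, assembly, None, None,
                         int(fields[1]), None, None, fields[2])
    raise ValueError(f"unrecognized CDS FASTA header: {line!r}")


exon = parse_cds_header(">NM_000348_hg19_1_6 30 0 2 chr2:31805881-31805969-")
print(exon.gene, exon.assembly, exon.exon_num, exon.total_exons, exon.length)
```

Splitting from the right keeps RefSeq-style names such as NM_000348 intact, at the cost of the no-underscore assumption on the assembly name.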
Dear Shane,
Thank you for using the UCSC Genome Browser and for sharing the details around the NM_000348 (SRD5A2) entry in the refGene.exonAA.fa.gz file.
We plan to update this file and will include a way to also provide a versioned RefSeq ID, such as NM_000348.3. Our engineers explain that a one-base deletion in the reference relative to the mRNA caused this issue when the exon lines for this gene were generated for the earlier file.
The rebuild process would correct the mapped AA sequences by creating two separate exons. An example of what it should look like can be seen by clicking the "CDS FASTA alignment" from multiple alignment link for the RefSeq gene SRD5A2, then checking the "Separate into exons" box and narrowing the species selection:
>NM_000348_hg19_1_6 30 0 2 chr2:31805881-31805969-
MQVQCQQSPVLAGSATLVALGALALYVAKP
>NM_000348_hg19_2_6 64 0 2 chr2:31805690-31805880-
SGYGKHTESLKPAATRLPARAAWFLQELPSFAVPAGILARQPLSLFGPPGTVLLGLFCLHYFHR
Versus the issue reported in refGene.exonAA.fa.gz:
>NM_000348_hg19_1_5 93 0 1 chr2:31805690-31805969-
MQVQCQQSPVLAGSATLVALGALALYVAKPPATGSTRRAZSRRLPACQPAPPGSCRSCLPSRCPRGSSPGSPSPSSGHLGRYFWASSAYITST
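To see where the two versions part ways, one option is to compare the merged exon from refGene.exonAA.fa.gz against the two corrected exons joined end to end. The sketch below is only an illustration of that comparison, not the method used to rebuild the file; `read_fasta` and `common_prefix_len` are hypothetical helper names, and the records are copied from this message.

```python
def read_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def common_prefix_len(a, b):
    """Number of leading positions at which a and b agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


corrected = read_fasta("""
>NM_000348_hg19_1_6 30 0 2 chr2:31805881-31805969-
MQVQCQQSPVLAGSATLVALGALALYVAKP
>NM_000348_hg19_2_6 64 0 2 chr2:31805690-31805880-
SGYGKHTESLKPAATRLPARAAWFLQELPSFAVPAGILARQPLSLFGPPGTVLLGLFCLHYFHR
""")
reported = read_fasta("""
>NM_000348_hg19_1_5 93 0 1 chr2:31805690-31805969-
MQVQCQQSPVLAGSATLVALGALALYVAKPPATGSTRRAZSRRLPACQPAPPGSCRSCLPSRCPRGSSPGSPSPSSGHLGRYFWASSAYITST
""")

# Check that each declared length (second header field) matches its sequence.
for header, seq in corrected + reported:
    declared = int(header.split()[1])
    print(header.split()[0], declared, len(seq), declared == len(seq))

# Join the corrected exons and find where the reported record diverges.
joined = "".join(seq for _, seq in corrected)
diverge_at = common_prefix_len(joined, reported[0][1])
print("sequences agree for the first", diverge_at, "residues")
```

With the records above, the two versions agree for the first 30 residues, which is exactly the corrected exon 1, and differ from the start of the corrected exon 2 onward.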
Thank you again for your message and for helping improve the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.
All the best,
Brian Lee
UCSC Genomics Institute