human chimp alignments hg19


Alan Achenbach

Jan 30, 2018, 10:43:55 AM
to gen...@soe.ucsc.edu
Hello,

Why are the length of an alignment and the length implied by its given positions in the axt alignment files frequently not equal? This should not be possible. For example, the first line of the human chimp alignment has an alignment of length 5296, while the positions give a length of 5290.


Thank you for your time,

Alan Achenbach

Brian Lee

Feb 2, 2018, 5:17:38 PM
to Alan Achenbach, gen...@soe.ucsc.edu
Dear Alan,

Thank you for using the UCSC Genome Browser and for your question about why the alignment lengths and the lengths implied by the given positions in the axt alignment files are not equal.

It isn't clear where the 5296 and 5290 values you mention come from, but I have included an example below from panTro4.hg19.net.axt that might help illustrate the differences.

If you look at the axt file definition here, http://genome.ucsc.edu/goldenPath/help/axt.html, you will find the following values for the columns and accompanying sequence lines: 
  • Alignment number -- The alignment numbering starts with 0 and increments by 1, i.e. the first alignment in a file is numbered 0, the next 1, etc.
  • Chromosome (primary organism)
  • Alignment start (primary organism) -- The first base is numbered 1
  • Alignment end (primary organism) -- The end base is included.
  • Chromosome (aligning organism)
  • Alignment start (aligning organism)
  • Alignment end (aligning organism)
  • Strand (aligning organism) -- If the strand value is "-", the values of the aligning organism's start and end fields are relative to the reverse-complemented coordinates of its chromosome.
  • Blastz score -- Different blastz scoring matrices are used for different organisms. See the README.txt file in the alignments directory for scoring information specific to a pair of alignments.
  • Sequence lines primary assembly (line 2) and aligning assembly (line 3).
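As a rough illustration of the column layout listed above, here is a minimal Python sketch that splits an axt summary line into named fields. The names AxtHeader and parse_axt_header are made up for this example and are not part of any UCSC tool; the sketch assumes the nine whitespace-separated columns described in the axt definition.

```python
from typing import NamedTuple


class AxtHeader(NamedTuple):
    """Hypothetical container for the nine columns of an axt summary line."""
    number: int          # alignment number, starting at 0
    primary_chrom: str
    primary_start: int   # 1-based
    primary_end: int     # inclusive
    aligning_chrom: str
    aligning_start: int
    aligning_end: int
    strand: str          # "+" or "-" (applies to the aligning organism)
    score: int           # blastz score


def parse_axt_header(line: str) -> AxtHeader:
    """Split one axt summary line into typed fields."""
    f = line.split()
    if len(f) != 9:
        raise ValueError(f"expected 9 columns, got {len(f)}")
    return AxtHeader(int(f[0]), f[1], int(f[2]), int(f[3]),
                     f[4], int(f[5]), int(f[6]), f[7], int(f[8]))


header = parse_axt_header("0 chr1 77637 78679 chr1 87772 88869 + 91752")

# Start is 1-based and the end base is included, so a span is end - start + 1.
primary_span = header.primary_end - header.primary_start + 1
aligning_span = header.aligning_end - header.aligning_start + 1
print(primary_span)   # 1043
print(aligning_span)  # 1098
```

Because the end base is included, the number of bases covered is end - start + 1 rather than a plain difference.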
Here is an abbreviated version of the first alignment from the panTro4.hg19.net.axt file; you can find the full version further below.

0 chr1 77637 78679 chr1 87772 88869 + 91752
CCAGCCCTCTTGGCC-TGTGGC-AATTTTTTCTTTT-T ....t---------------------------------------------------ct....CCTCTGA
CCAGCCCTCTTG... ... ... CCTCTGA

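You can see that the span of the primary organism coordinates (panTro4: 78679-77637 = 1,042, or 1,043 bases counting both the start and end base) is indeed different from the span of the aligning organism coordinates (hg19: 88869-87772 = 1,097, or 1,098 bases). The reason is visible in the primary assembly sequence line, which contains gap characters ("-") in more than one place (most notably gct---------------------------------------------------ctt) where hg19 has bases with no counterpart in panTro4. Gap characters take up columns in the sequence lines but do not count toward the coordinates, so the length of the alignment text can be larger than the span given by the positions. You can view the region in this alignment of hg19 to panTro4: http://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=brianlee&hgS_otherUserSessionName=hg19.panTro4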
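One way to check this on your own data is sketched below in Python. It compares three numbers for an alignment block: the number of alignment columns (the length of the sequence lines, gaps included), the inclusive coordinate span, and the count of non-gap bases in each sequence line. The function name check_axt_block and the toy block are invented for illustration only; the toy sequences are not taken from any real axt file.

```python
def check_axt_block(header_line: str, primary_seq: str, aligning_seq: str) -> dict:
    """Compare alignment columns, coordinate spans, and ungapped base counts."""
    fields = header_line.split()
    p_start, p_end = int(fields[2]), int(fields[3])
    a_start, a_end = int(fields[5]), int(fields[6])

    # Both sequence lines of one block should have the same number of columns.
    if len(primary_seq) != len(aligning_seq):
        raise ValueError("sequence lines differ in length")

    return {
        "alignment_columns": len(primary_seq),
        # Start is 1-based and the end base is included.
        "primary_span": p_end - p_start + 1,
        "primary_bases": len(primary_seq) - primary_seq.count("-"),
        "aligning_span": a_end - a_start + 1,
        "aligning_bases": len(aligning_seq) - aligning_seq.count("-"),
    }


# Toy block (hypothetical): the primary line has a 2-column gap.
toy = check_axt_block(
    "0 chrA 11 18 chrB 101 110 + 500",
    "ACGT--ACGT",
    "ACGTTTACGT",
)
print(toy)
# alignment_columns is 10, primary_span and primary_bases are both 8,
# aligning_span and aligning_bases are both 10.
```

In a well-formed block the span of each organism should match the non-gap base count of its own sequence line, while the number of alignment columns can be larger than either span.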

Here is the full top line of panTro4.hg19.net.axt:
0 chr1 77637 78679 chr1 87772 88869 + 91752
CCAGCCCTCTTGGCC-TGTGGC-AATTTTTTCTTTT-TAGCCTCATAAAATCACATTATTTGAGTGCCCATGGCTCCAAAACAAGCAGGGATGCCCATGGACCCTGATTATCCATTGTCACCCTTCCCTCCAAACAGCCACCTCTCCCCTGGAGACAGCCCCATACTCCACTCAGACCTGTGTACTTCCTGGTATCCTTGTCACCTGCTTTTTATGCCTCATTTTACAAACACCAAATTGGAAGACAGCAGGAGCTGCCCCATAATACCAGTAAAGTGAGAAGCAGAGATAAATTAGTCCTAGACAACCGACTCATGTTGGGGGCAGCCCACTCACAGTGGCCCTGACCCAACTCTGACTAGAGGCCACTTGctctcaacaccagggtgctcaatggcccgtcctggtactctgctcttctctctccaccttcgctttcctgcaatctatgcagcctgtgactccatccatgggctagagacccccagaccttttcctgggaccacaggcctgtgtctctatctgctgctcaacacctccccttgaacatccatggctaaaactgagctcctgacactctctccatacccgtttctctgtggattccccacctccacgaaggacagcttcatcttttcagctactcaggccagaagactgaagtcatctccttctccaggaaatcgtattgggggagctacaaatatccaaaatccgatcgcttctcctccactacacccgaggcccgccacccatttttgcctgaattgctgcagcagcctcctaaccaatctctgctttcacgtgggcacctcagttttttccagaacaacaaccagagagatctactcacacccaagtcagaccaggttactcctctgctctcatagcatttggaggaaaacccagagtgct---------------------------------------------------cttgccctcagcacccagagtgctcgtgacggccagcaaagcccggccccatctcctctgaacttccacctctcgccctctgcacc-agAGTGCTCGTGACGGCCAGCAGAGCCGGCCCCCATCTCCTCTGA
CCAGCCCTCTTGGCCCTGTGGCCAATTTTTTCTTCAATAGCCTCATAAAATCACATTATTTGAGTGCCCATGGCTCCAAAACAAGCAGGGATGCCCATGGACCCTGATTATCCATTGTCACCCTTCCCTCCAAACAGCCACCTCTCCCCTGGAGACAGCCCCATACTCCACTCAGACCTGTGCACTTTCTGGTATCCTTGTCACCTGCTTTTTATGTCTCATTTTACAAACACCAAATTGGAAGACAGCAGGAGCTGCCCCATAATACCAGTAAAGTGAGAAGCAGAGATAAACTAGTCCTAGACAGCCGACTCATGTTGGGGGCAGCCCACTCACAGTGGCCCTGACCCAACTCTGACTAGAGGCCACTTGctctcaacaccagggtgctcaatggcccgtcctggtactctgctcttctctctccaccttcgctttcctgcaatctatgcagcctgtgactccatccatgggctagtgacccccagaccttctcctgggaccacaggcctgtgtctctatctgctgctcaatacctcccctcgaacatccatggctaaaactgagctcctgatactctctccctacccgcttctctgtggattccccacctccgcgaaggacagcttcatcctttcagctactcaggccagaagattgaagtcatctccttctccaggaaatcgtattgagggagctacaaatatccaaaatccgatcgcttctcctccactacacccgaggcccgccacccatttttgcctgaattgctgcagcagcctcctaaccgatctctgctttcacgtgggcacctcagttttttccagaacaacaaccagagagatctgctcacacccaagtcagaccaggttactcctctgctctcatagcatttggaggaaaacccagagtgctcgtgttggccggcagagccggcccccatctcctctgacctcctccccacctcttgccctcagcacccagagtgctcgtgacggccagcagagccagcctccatctcctctgacctcccacctctcgccctcagcaccCAGAGTGCTCGTGTTGGCCAGCAAAGCCGGCCCCCATCTCCTCTGA

Thank you again for your inquiry and for using the UCSC Genome Browser. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UC Santa Cruz Genomics Institute

Training videos & resources: http://genome.ucsc.edu/training/index.html
Want to share the Browser with colleagues?
Host a workshop: http://bit.ly/ucscTraining
