Inconsistent query coordinates between chain and net alignments (Mm10 VS proCap1)

Guillaume L

Mar 30, 2017, 12:07:36 PM
to gen...@soe.ucsc.edu
Hi all,

In short:
I wonder why the query (proCap1) coordinates are not the same in net alignments and in chain alignments (looking at the same chainId).

In more detail:

I am trying to find the homologous genomic sequence of one gene (SMCHD1) in the rock hyrax (Procavia capensis). Unfortunately, several hyrax contigs are aligned to the mouse region (chr17:71344493-71475343), and I would like to retrieve a (hypothetical) continuous sequence.

While querying the MySQL database, I noticed that, for the same "chainId", the coordinates of the proCap1 fragments differ between the chain and net tables.

Here are my queries:

    use mm10;

    # Chain
    SELECT tName,tStart,tEnd,qStrand,qName,qStart,qEnd,score,id FROM chainProCap1
    WHERE (tName="chr17"
        AND (71344493 <= tStart AND tStart <= 71475343
            OR 71344493 <= tEnd AND tEnd <= 71475343))
    ORDER BY tStart;
    
    # Net
    SELECT level, tName,tStart,tEnd,strand,qName,qStart,qEnd,score,chainId FROM netProCap1
    WHERE (tName="chr17" AND type="top"
        AND (71344493 <= tStart AND tStart <= 71475343
            OR 71344493 <= tEnd AND tEnd <= 71475343))
    ORDER BY tStart;


I am not going to show the entire output here (18 and 19 rows), just the first row of each:

# Chain:
tName | tStart   | tEnd     | qStrand | qName           | qStart | qEnd | score | id
chr17 | 71344307 | 71345537 | -       | scaffold_115863 |   4129 | 5390 | 33044 | 98261
# Net:
level | tName | tStart   | tEnd     | strand | qName           | qStart | qEnd | score | chainId
    1 | chr17 | 71344307 | 71345537 | -      | scaffold_115863 |    270 | 1531 | 33044 |   98261

This is the same piece of alignment in both cases, so why do *qStart* and *qEnd* not match?


Thanks in advance for your help. I realize I should probably use the .axt alignment, but now these different coordinates are puzzling me.

PS: If you want (a bit) more details, I asked the same question here, with the complete output : https://www.biostars.org/p/244058/

Best regards,

Guillaume

Cath Tyner

Apr 4, 2017, 3:42:10 PM
to Guillaume L, UCSC Genome Browser Public Help Forum
Hello Guillaume,

Thank you for contacting the UCSC Genome Browser support team.

In short, the chain and net tables use different conventions for query coordinates in opposite-orientation ("-" strand) alignments: net uses coordinates on the forward strand of the query, while chain uses coordinates on the reverse strand of the query. Each chain includes a sequence of aligned blocks. To list the target and query blocks in the same order and still have ascending start coordinates in each list, we have to use query reverse-strand coordinates. This isn't necessary for the net format; we can use "+" strand query coordinates because the net format has nested "fill" and "gap" elements that each have a single start and end in both target and query.

Below, we are looking at the scaffold_115863 in hyrax. The entire scaffold is 5,660 bp.
scaffold_115863 
pos1 (5')..........................................................pos5660 (3') 

NET " + " coords
(start)|............[270 (A)-----1531 (B)]..............................| (end) 

CHAIN " - " coords
(end)|..............[5390 (C)----4129 (D)]..............................| (start) 

scaffold_115863 reverse orientation
pos5660 (3')............................................................pos1 (5')

If you start counting from the left of the scaffold, you count 270 bp to reach A (the same point as C). If you start counting from the right, you count 5390 bp to reach that point.
If you start counting from the left of the scaffold, you count 1531 bp to reach B (the same point as D). If you start counting from the right, you count 4129 bp to reach that point.
The aligned region has the same length in both coordinate systems (1531 - 270 = 5390 - 4129 = 1261 bp); the scaffold is 5660 bp in total.

Note that these coords can be transformed back and forth by subtracting from the query sequence length:

5660 - 270 = 5390
5660 - 1531 = 4129
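
The same subtraction can be wrapped in a small helper if you need to convert many rows. This is only a minimal sketch in Python: the function names `flip_strand` and `chain_query_to_forward` are made up for illustration and are not part of any UCSC tool, and it assumes you already have the query sequence length for each row (5660 for this scaffold).

```python
from typing import Tuple


def flip_strand(start: int, end: int, seq_size: int) -> Tuple[int, int]:
    """Convert an interval between forward- and reverse-strand coordinates.

    The operation is its own inverse: applying it twice returns the input.
    Start and end swap roles so that start <= end still holds afterwards.
    """
    if not 0 <= start <= end <= seq_size:
        raise ValueError("expected 0 <= start <= end <= seq_size")
    return seq_size - end, seq_size - start


def chain_query_to_forward(q_start: int, q_end: int, q_strand: str,
                           q_size: int) -> Tuple[int, int]:
    """Return chain query coordinates on the forward strand (hypothetical helper).

    Only "-" strand rows need converting; "+" strand rows are returned as-is.
    """
    if q_strand == "-":
        return flip_strand(q_start, q_end, q_size)
    return q_start, q_end


# First chain row from the question: scaffold_115863, "-" strand, 4129-5390.
SCAFFOLD_SIZE = 5660
print(chain_query_to_forward(4129, 5390, "-", SCAFFOLD_SIZE))  # (270, 1531)
```

The result matches the qStart and qEnd reported in the net table for the same chainId.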

Please respond to this list if you have further questions!

Thank you again for your inquiry and for using the UCSC Genome Browser.
Please send new and follow-up questions to one of our UCSC Genome Browser mailing lists below:

  * Post to the Public Help Forum: Email gen...@soe.ucsc.edu or search the Public Archives
  * Post to the Mirror Help Forum: Email genome...@soe.ucsc.edu or search the Mirror Archives
  * Confidential/private help: Email genom...@soe.ucsc.edu

UCSC Genome Browser Announcements List (email alerts for new data & software):
  * Subscribe: Email genome-announce+subscribe@soe.ucsc.edu 
  * Unsubscribe: Email genome-announce+unsubscribe@soe.ucsc.edu

Join us on Social Media! Facebook, Twitter, WordPress Blog, YouTube

Enjoy,
Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute

