axtChain oddness

8 views
Skip to first unread message

Adrian Platts

unread,
May 22, 2023, 12:32:56 PM5/22/23
to genome...@soe.ucsc.edu
Hi All,

I'm not certain that this is the appropriate list for this question, apologies if I'm posting to the wrong place.

Essentially the issue I'm seeing is that when I run axtChain with fasta files for reference sequences, I seem to get different results to those
I get when I use 2bit files (or nib files) for reference sequences. 

In the example below I've aligned Capsella rubella and Arabidopsis thaliana using lastz with options --gapped --nochain --strand=both --step=10 --ambiguous=iupac  --format=axt
The output axt file looks ok in terms of the raw alignments I'm seeing.  

I then use axtChain to chain the axt alignments.  And this is where it gets odd.  If I use fasta files for the input, some of the chromosomes seem to crash 
axtChain even though I don't seem to see an error in the output (no error is reported even if I set verbose to 2):

$ ./axtChain -verbose=0 -faQ -faT minScore=5000 -linearGap=loose Caps_v_Thal.axt ../genomes/Caps_genome.fa.masked ../genomes/Thal_genome.fa Caps_v_Thal.axt.chain
$ grep 'Caps_scaffold5 ' Caps_v_Thal.axt.chain | sort -k2,2nr | head -n 10
chain 6317207 Caps_scaffold5 13734597 + 47153 1954805 Thal_scaffold1 30427671 - 25233431 26058793 37
chain 898461 Caps_scaffold5 13734597 + 1260188 1579179 Thal_scaffold1 30427671 + 4406001 4618309 85
chain 701478 Caps_scaffold5 13734597 + 7118885 7263942 Thal_scaffold1 30427671 + 24892242 25018990 103
chain 405333 Caps_scaffold5 13734597 + 670290 1530004 Thal_scaffold1 30427671 - 21066141 21482756 233
chain 403063 Caps_scaffold5 13734597 + 6028077 7464245 Thal_scaffold1 30427671 - 27807337 28189258 239
chain 311083 Caps_scaffold5 13734597 + 7025110 7041335 Thal_scaffold1 30427671 - 11171036 11206485 427
chain 310573 Caps_scaffold5 13734597 + 12135608 12168579 Thal_scaffold1 30427671 + 1978824 2029202 432
chain 309916 Caps_scaffold5 13734597 + 9895529 10046615 Thal_scaffold1 30427671 + 4459144 4576611 435
chain 289676 Caps_scaffold5 13734597 + 1635356 1654549 Thal_scaffold1 30427671 - 25958830 25972219 502
chain 285436 Caps_scaffold5 13734597 + 7025062 7041336 Thal_scaffold1 30427671 - 11160793 11190266 520

(so the top chains here are all to Thaliana chromosome 1, indeed all the chains output are to chromosome 1 suggesting it failed after this?)

But if I compress the genomes to 2bit format then I get:

$ ./axtChain -verbose=0 minScore=5000 -linearGap=loose Caps_v_Thal.axt ../genomes/Caps_genome.fa.masked.2bit ../genomes/Thal_genome.fa.2bit Caps_v_Thal.axt.chain
$ grep 'Caps_scaffold5 ' Caps_v_Thal.axt.chain | sort -k2,2nr | head -n 10
chain 361880734 Caps_scaffold5 13734597 + 1282432 13717728 Thal_scaffold3 23459830 + 9273726 23459789 6
chain 58194096 Caps_scaffold5 13734597 + 19684 13549179 Thal_scaffold2 19698289 + 75416 19664934 12
chain 6317207 Caps_scaffold5 13734597 + 47153 1954805 Thal_scaffold1 30427671 - 25233431 26058793 48
chain 3717716 Caps_scaffold5 13734597 + 1998514 3998274 Thal_scaffold5 26975502 - 10512321 11380596 63
chain 3689786 Caps_scaffold5 13734597 + 5011257 8562359 Thal_scaffold5 26975502 + 6899020 26365664 64
chain 3250174 Caps_scaffold5 13734597 + 6459196 7493524 Thal_scaffold5 26975502 - 2522356 3292397 69
chain 3189762 Caps_scaffold5 13734597 + 8569974 9203094 Thal_scaffold5 26975502 - 6196 574310 70
chain 1904830 Caps_scaffold5 13734597 + 8337712 8391841 Thal_scaffold3 23459830 - 5120917 5167145 87
chain 1078994 Caps_scaffold5 13734597 + 11680404 11793342 Thal_scaffold2 19698289 - 2067619 2190964 104
chain 1076961 Caps_scaffold5 13734597 + 7622979 13531096 Thal_scaffold5 26975502 + 7842906 10643369 105

And here we're getting chains with much better scores that connect alignments found not just on Thaliana scaffold1.

I've checked the thaliana and capsella multifasta reference sequences and neither seem to contain IUPAC characters or blank lines.  Is there some
obvious error in my command line here?  

Best regards

Adrian Platts
MSU

Gerardo Perez

unread,
May 25, 2023, 8:40:20 PM5/25/23
to Adrian Platts, genome...@soe.ucsc.edu

Hello, Adrian.

Thank you for your interest in the Genome Browser and for your question about axtChain.

Could you share with us the files that you are running with axtChain? Or share a smaller test case with the same symptoms? We would understand the issue better if we can replicate it ourselves. If you don't want to share this information with our publicly-archived mailing list (genome...@soe.ucsc.edu), you can send it to our confidential support list at genom...@soe.ucsc.edu.

Please include genome...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Gerardo Perez
UCSC Genomics Institute


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Mirror-Specific Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome-mirro...@soe.ucsc.edu.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome-mirror/B04943E0-65D8-4192-923E-B473205042E0%40gmail.com.

Adrian Platts

unread,
May 26, 2023, 12:10:03 PM5/26/23
to Gerardo Perez, genome...@soe.ucsc.edu
Many thanks for your reply - I have sent the raw data to the genom...@soe.ucsc.edu list.
Reply all
Reply to author
Forward
0 new messages