Good Afternoon Diana:
We use the 'autoMZ' command which uses the 'roast' command
in the multiz/tba system. For example:
autoMZ + T=$tmp E=hg38 "`cat tree.nh`" hg38.*.sing.maf $chromPart
This is run in a directory where all the *.sing.maf files have
been collected together, and the tree.nh specifies the required
order of alignments. In this case, the output file named by '$chromPart'
would be the result for one piece of the large pair-wise maf files, which have
been split on large gap boundaries to keep the maf files a manageable size.
The tree.nh in this example is for the 30-way alignment on hg38:
((((((((((((hg38 panTro5) panPan2) gorGor5) ponAbe2) nomLeu3) (((((rheMac8
(macFas5 macNem1)) cerAty1) papAnu3) (chlSab2 manLeu1)) ((nasLar1 colAng1)
(rhiRox1 rhiBie1)))) (((calJac3 saiBol1) cebCap1) aotNan1)) tarSyr2)
(((micMur3 proCoq1) (eulMac1 eulFla1)) otoGar3)) mm10) canFam3) dasNov3)
Each of the *.sing.maf files would have a name such as:
hg38.panTro5.sing.maf
hg38.gorGor5.sing.maf
... etc ... for each species in tree.nh
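As a rough sketch of that bookkeeping, the shell functions below list the species in tree.nh and report any pairwise file that is not in the current directory. The function names tree_species and missing_sing_mafs are made up for this illustration and are not part of multiz/tba. The sketch assumes the tree has no branch lengths and that species names contain only letters, digits and underscores, as in the tree above.

```shell
# Hypothetical helpers, not part of multiz/tba.

# Print one species name per line from a Newick-style tree file.
# Assumes no branch lengths and names made of letters, digits, underscores.
tree_species() {
    tr -cs 'A-Za-z0-9_' '\n' < "$1" | sed '/^$/d'
}

# Report each expected <ref>.<species>.sing.maf file that does not exist
# in the current directory. The reference species itself is skipped.
# Usage: missing_sing_mafs tree.nh hg38
missing_sing_mafs() {
    tree=$1
    ref=$2
    tree_species "$tree" | while read -r sp; do
        [ "$sp" = "$ref" ] && continue
        f="$ref.$sp.sing.maf"
        [ -e "$f" ] || echo "missing: $f"
    done
}
```

For the 30-way example this would be called as 'missing_sing_mafs tree.nh hg38' in the directory holding the *.sing.maf files. No output means every expected file is present.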
And the maf file 's' lines would specify the species name and the chromosome,
for example:
$ zgrep "^s " hg38_chr1.00.maf.gz | awk '{print $2}' | sort -u
hg38.chr1
panTro5.chr1
panTro5.chr11
panTro5.chr12
panTro5.chr15
panTro5.chr17
panTro5.chr19
panTro5.chr7
panTro5.chrUn_NW_015974624v1
(NOTE: We are depending on the dots in those names to separate the species name
from the chromosome name. You will need to avoid species names and chromosome
names with dots in them.)
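One possible way to check this before running autoMZ is sketched below. It prints every distinct name from the 's' lines that does not contain exactly one dot. The function name odd_src_names is hypothetical and only for illustration; the check is my reading of the naming rule above, not something the multiz/tba tools provide.

```shell
# Hypothetical helper, not part of multiz/tba.
# Print each distinct source name (field 2 of the 's' lines) that does not
# split into exactly two parts on a dot, i.e. is not of the form
# species.chromosome. Reads the named maf file(s), or stdin if none given.
odd_src_names() {
    awk '$1 == "s" {
             n = split($2, parts, "[.]")   # "[.]" matches a literal dot
             if (n != 2) print $2
         }' "$@" | sort -u
}
```

For a compressed file this could be run as 'gzip -dc hg38_chr1.00.maf.gz | odd_src_names'. A name such as panTro5.chrUn_NW_015974624v1 passes the check because it contains only the one separating dot.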
The usage message from autoMZ indicates:
> roast.v3: roast -- reference guided multiple alignment.
> args: [+-] [R=?] [M=?] [P=?] [T=?] [X=?] [C=?] E=reference-species species-guid-tree maf-source destination
> R(30) dynamic programming radius.
> M(1) minimum block length of output.
> P(multiz) multiz: single coverage for reference row multic: no requirement on single coverage.
> T(/tmp) specify alternate temp directory
> X(0) utilize maf files with different suffix from differnt post processing.
> 0: .sing.maf from single coverage pairwise alignment
> 1: .toast.maf from full size toast
> 2: .toast2.maf from reduced size toast
You can follow an example of these processing steps in:
https://genome-source.gi.ucsc.edu/gitlist/kent.git/raw/master/src/hg/makeDb/doc/hg38/multiz30way.txt
It is mostly a big game of list making and bookkeeping to manage all the files and
get everything into place to run the autoMZ command.
--Hiram
On 6/15/20 2:30 PM, Diana Moreno Santillán wrote:
> To whom it may concern
>
> Hello, I am trying to run a Multiz analysis in 37 pairwise alignments in
> maf format.
>
> I am ready to run Multiz but I have not been successful, and I was
> wondering if you have run TBA-MULTIZ with your previous alignments.
>
> My command is:
>
> tba "(tree)" *.*.maf > bats_aln.out
>
> The error is:
>
> tba.v12: no alignment found for *ajamaicensis *and shondurensis.
>
> According to the TBA manual, the name on the tree must match with the name
> on the sequences, so I edited the files so I have the same name in the tree,
> inside the maf file and finally the maf filename (I highlighted in red),
> but the error is the same.
>
> I renamed my maf files in two different ways and both failed (raegyp is the
> reference genome):
> raegyp.*ajamaicensis*.maf
> *ajamaicensis*.raegyp.maf
>
> My tree is:
> ((((((((((((*ajamaicensis *shondurensis) cperspicillata) (pdiscolor
> tsaurophila)) (acaudifer gsoricina)) mhirsuta) drotundus) mcalifornicus)
> mblainvillei) pparnellii) nleporinus) ((((((mbrandtii mlucifugus) (mdavidii
> mmyotis)) mfeae) ((pkuhlii efuscus) lborealis)) (mschreibersii
> mnatalensis)) (tbrasiliensis mmolossus))) ((((rsinicus rferrumequinum)
> (harmiger hgaleritus)) (cthonglongyai mlyra)) (((palecto pvampyrus)
> ehelvum) msobrinus)))
>
> My maf file for artibeus looks like this:
>
> ##maf version=1 scoring=blastz
> ##matrix=axtChain 16
> 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91
> ##gapPenalties=axtChain O=400 E=30
> a score=13977.000000
> s raegyp:scaffold_m13_p_1 18989 863 + 185846147
> ATGAGATGACAGCAGAAATTA--TTTTA----GACATGTAATATGGTGTTATTCCCCAAATTAACACAAAATGGCGGCAGA-AATGAATATTCATGACCAGCTACATTTGGATGTTATTCCTCAAATGCCCACAAGATGGCCGCAGAAAAAAATTTTA--TGATGACATTAACCTGAGTGTTATTATCAAAATGTCAACGACATAGCAGCAGATAACAATTTTTTATAACTTGTACTCCAATTTTATTCTA-----AAGATTCACTTGAAATGGCAGCAGAGAACCACTTTATAACGACATATGCTGGAGTGTTATTTTTTAAATGTGCATGACATAGTAGTTAACA-----TTTTTCTT---GACAGGCACTGGGGTGTCATTCCCAAAATGCCCAGCAGATGGCAGTACAGAATGATTTTTCATGATGACA-GCATTTGGCTGTTACTCCTAAAATGCTCACCAGATGGCAGCAGAGGATCATTTATTTT---------------TACTT------------------ATGCTCTTG--TGTAAATTACAAGACGCACAAAATATTGGGGCATAGGATGAATTTTCATGATGCCCTACAGTAAGGTTTTATTCTCAAAGTTTCCACCAGATGCCAATGGAAAATGTTTGTTAATGTATACCCC----TGGAAGTAATTTTCGAAATGCCCACGAGATGGCAGCAGACAATGA--TTTTTCATGAAGTACACTCCTATGTTCATCCCAAGATTCACCAGAAATGAGAGCAGAGAAAAATTTTGTGCCATCATGCCC--TTGGGTGTTAGTTTTAAAATACGCATGACGTAGCAGCAGAAATA---ACTTTTCAT-GACATGTACTGTGGTGTTATTTGCTAAATGTTCACTACATAGCAGCAGAAAA------------------------------------TGACATT------------TAATCACTTACCCTCTAA
> s* ajamaicensis*:fragScaff_scaffold_502 50728 973 + 473472