Gaps in maf files

Marie-Laurence Cossette

Jan 11, 2023, 12:33:04 PM
to gen...@soe.ucsc.edu
Hi, 

I have been following the whole genome alignment howto page http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto to run some alignments on species that haven’t been included, and I noticed that my MAF files do not contain any gaps, unlike the ones I have downloaded from UCSC. Will this be a problem once I try to merge everything with multiz? If so, where might I have gone wrong?

My lastz parameters are: M=0 K=3000 L=3000 Y=9400 E=30 H=2000 O=400 T=1

E.g., my human-shrew alignment:

s hg38.chr1              722392 60 + 248956422 CCACACACTTTGGGGGTGGTGGAACCTGGTAAAAGCTCACCTCCCACCATGGAGGAGGAG
s mSunEtr1.NC_064861.1 86644041 60 +  97742847 CCCCACACTTTGGGTGGGGTGGACCCTGGCAAAGGTCCCCCTCCCACCACAGAGGAAGAG


Your human-shrew alignment:

s hg38.chr1          64679 96 + 248956422 TCTGTATTATGCAAAATTTGTCTATGTTACACTTTTTTAACAACACAATCCTATTGCCCTTGAAATCTTCTTCAAAGCATTTCTCGAGTCACTCTT
s sorAra2.JH798183 5262313 82 +  20915803 TTTGCTTCATGCAAAACT-ATCTATGTTCCATTTCTTCAGTAGCAAATTC-------------ACTCTACCTCAAAGCATTTTTTAATTTACTTTT


Thanks,

Marie


Jairo Navarro Gonzalez

Jan 25, 2023, 7:17:56 PM
to Marie-Laurence Cossette, gen...@soe.ucsc.edu

Hello,

Thank you for using the UCSC Genome Browser and sending your inquiry.

The regions you shared are from two different locations in the hg38 genome. There isn't a requirement that there are gaps in a given alignment. If you are interested in how the Conservation track was generated on hg38, you can learn more from the following makedoc:

https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/makeDb/doc/hg38/multiz100way.txt
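
As a rough illustration of the point that a block need not contain gaps, the following Python sketch parses the "s" lines of the two human-shrew blocks quoted in the first message and checks the two MAF invariants that matter here: every row in a block spans the same number of columns, and the size field equals the number of non-gap characters. The helper names (`parse_s_line`, `check_block`) are hypothetical, and the gap run in the sorAra2 row is assumed to be 13 hyphens, which is the count needed for that row (82 bases) to span the same 96 columns as the hg38 row.

```python
from typing import List, NamedTuple


class MafRow(NamedTuple):
    src: str        # e.g. "hg38.chr1"
    start: int      # 0-based start in the source sequence
    size: int       # number of aligned bases, gaps excluded
    strand: str
    src_size: int   # full length of the source sequence
    text: str       # alignment text, "-" marks a gap column


def parse_s_line(line: str) -> MafRow:
    """Split one MAF 's' line into its fields."""
    fields = line.split()
    if len(fields) != 7 or fields[0] != "s":
        raise ValueError(f"not a MAF 's' line: {line!r}")
    _, src, start, size, strand, src_size, text = fields
    return MafRow(src, int(start), int(size), strand, int(src_size), text)


def check_block(rows: List[MafRow]) -> List[str]:
    """Return a list of problems found in one block (empty if consistent)."""
    problems = []
    if len({len(r.text) for r in rows}) > 1:
        problems.append("rows have different column counts")
    for r in rows:
        bases = len(r.text) - r.text.count("-")
        if bases != r.size:
            problems.append(f"{r.src}: size field {r.size} but {bases} bases")
    return problems


MY_BLOCK = [
    "s hg38.chr1 722392 60 + 248956422 "
    "CCACACACTTTGGGGGTGGTGGAACCTGGTAAAAGCTCACCTCCCACCATGGAGGAGGAG",
    "s mSunEtr1.NC_064861.1 86644041 60 + 97742847 "
    "CCCCACACTTTGGGTGGGGTGGACCCTGGCAAAGGTCCCCCTCCCACCACAGAGGAAGAG",
]
UCSC_BLOCK = [
    "s hg38.chr1 64679 96 + 248956422 "
    "TCTGTATTATGCAAAATTTGTCTATGTTACACTTTTTTAACAACACAATCCTATTGCCCTTGAAATC"
    "TTCTTCAAAGCATTTCTCGAGTCACTCTT",
    # Gap run of 13 hyphens is an assumption (see the text above).
    "s sorAra2.JH798183 5262313 82 + 20915803 "
    "TTTGCTTCATGCAAAACT-ATCTATGTTCCATTTCTTCAGTAGCAAATTC-------------"
    "ACTCTACCTCAAAGCATTTTTTAATTTACTTTT",
]

for name, block in (("mine", MY_BLOCK), ("UCSC", UCSC_BLOCK)):
    rows = [parse_s_line(line) for line in block]
    gaps = {r.src: r.text.count("-") for r in rows}
    print(name, "gaps per row:", gaps, "problems:", check_block(rows))
```

Both blocks pass the same checks; the only difference is that one happens to contain gap columns and the other does not.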

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu.
All messages sent to that address are archived on a publicly accessible Google Groups forum.
If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Jairo Navarro

UCSC Genome Browser


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/885AC320-14E3-410D-AD43-49F9B3A12A72%40trentu.ca.

Marie-Laurence Cossette

Jan 26, 2023, 3:06:10 PM
to gen...@soe.ucsc.edu


Begin forwarded message:

From: Marie-Laurence Cossette <mcos...@trentu.ca>
Subject: Re: [genome] Gaps in maf files
Date: January 25, 2023 at 9:30:38 PM EST
To: Jairo Navarro Gonzalez <jnav...@ucsc.edu>

Hi,
Thanks for the answer. Another concern I have is that I get a very high number of alignments that are all very short compared to any other alignment MAFs from UCSC. I have high-quality genomes with fewer than 400 scaffolds, and they have been soft-masked. Any idea why?

Thanks,
Marie
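
One way to put numbers on "many short alignments" is to tabulate the reference-row size of every block in each MAF and compare the summaries side by side. The sketch below is only an illustration: `block_lengths` and `summarize` are hypothetical helper names, the `hg38.` prefix is assumed to identify the reference row, and the inline sample simply reuses the two block sizes (60 and 96) quoted earlier in this thread.

```python
import io
from statistics import median


def block_lengths(handle, ref_prefix="hg38."):
    """Yield the aligned size of the reference row of each MAF block."""
    for line in handle:
        fields = line.split()
        # An 's' line has exactly 7 whitespace-separated fields.
        if len(fields) == 7 and fields[0] == "s" and fields[1].startswith(ref_prefix):
            yield int(fields[3])


def summarize(lengths):
    """Return block count and basic length statistics."""
    lengths = sorted(lengths)
    if not lengths:
        return {"blocks": 0}
    return {
        "blocks": len(lengths),
        "min": lengths[0],
        "median": median(lengths),
        "max": lengths[-1],
        "total_bases": sum(lengths),
    }


SAMPLE_MAF = """##maf version=1
a score=0
s hg38.chr1 722392 60 + 248956422 NNN
s mSunEtr1.NC_064861.1 86644041 60 + 97742847 NNN

a score=0
s hg38.chr1 64679 96 + 248956422 NNN
s sorAra2.JH798183 5262313 82 + 20915803 NNN
"""  # sequence text shortened to a placeholder; only the size field is read

print(summarize(block_lengths(io.StringIO(SAMPLE_MAF))))
```

In practice the same two functions would be run over an open file handle for each MAF. A summary like this shows how large the difference is, but not which step (lastz, chaining, or netting) produced it.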

Hiram Clawson

Jan 26, 2023, 3:19:39 PM
to gen...@soe.ucsc.edu
Good Afternoon Marie:

The quality of your alignments will depend on the lastz parameters
and the chain minScore.

These, in turn, depend on how phylogenetically distant your query genome
is from the hg38 (primate) target genome.

For example, primate-primate hg38 vs. gorGor6 used:

##aligner=lastz.v1.04.00 Y=15000 T=2 M=254 O=600 H=2000 Q=/hive/data/staging/data/blastz/human_chimp.v2.q K=4500 E=150
##matrix=lastz.v1.04.00 16 90,-330,-236,-356,-330,100,-318,-236,-236,-318,100,-330,-356,-236,-330,90
##gapPenalties=lastz.v1.04.00 O=600 E=150
##blastzParms=O=600,E=150,K=4500,L=4500,M=254

From the DEF file:

# human vs gorilla
BLASTZ=/cluster/bin/penn/lastz-distrib-1.04.00/bin/lastz
BLASTZ_T=2
BLASTZ_O=600
BLASTZ_E=150
BLASTZ_M=254
BLASTZ_K=4500
BLASTZ_Y=15000
BLASTZ_Q=/hive/data/staging/data/blastz/human_chimp.v2.q
# A C G T
# A 90 -330 -236 -356
# C -330 100 -318 -236
# G -236 -318 100 -330
# T -356 -236 -330 90

# TARGET: human hg38
SEQ1_DIR=/hive/data/genomes/hg38/hg38.2bit
SEQ1_LEN=/hive/data/genomes/hg38/chrom.sizes
SEQ1_CHUNK=20000000
SEQ1_LAP=10000
SEQ1_IN_CONTIGS=0

# QUERY: gorilla gorGor6
SEQ2_DIR=/hive/data/genomes/gorGor6/gorGor6.2bit
SEQ2_LEN=/hive/data/genomes/gorGor6/chrom.sizes
SEQ2_CHUNK=20000000
SEQ2_LAP=0
SEQ2_LIMIT=50

BASE=/hive/data/genomes/hg38/bed/lastzGorGor6.2019-11-20
TMPDIR=/dev/shm

with chain parameters:

zcat ../../pslParts/$1*.psl.gz \
| axtChain -psl -verbose=0 -scoreScheme=/hive/data/staging/data/blastz/human_chimp.v2.q \
-minScore=3000 -linearGap=medium stdin \
/hive/data/genomes/hg38/hg38.2bit \
/hive/data/genomes/gorGor6/gorGor6.2bit \
stdout \
| chainAntiRepeat /hive/data/genomes/hg38/hg38.2bit \
/hive/data/genomes/gorGor6/gorGor6.2bit \
stdin $2


Whereas primate-mammal hg38 vs. bosTau9 used:

##aligner=lastz.v1.04.00 T=2 M=254 O=400 H=2000 E=30
##matrix=lastz.v1.04.00 16 91,-114,-31,-123,-114,100,-125,-31,-31,-125,100,-114,-123,-31,-114,91
##gapPenalties=lastz.v1.04.00 O=400 E=30
##blastzParms=O=400,E=30,K=3000,L=3000,M=254
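
To make the ##matrix line above concrete: its 16 values are the substitution scores listed row by row in A, C, G, T order (the same layout appears as comments in the DEF file). The Python sketch below rebuilds that table and sums the per-column scores of an ungapped alignment; as far as I understand lastz, totals of this kind are what the K and L thresholds are compared against. The helper names are hypothetical, gap penalties (O, E) are deliberately left out, and characters outside A/C/G/T (such as N) are not handled.

```python
BASES = "ACGT"

# Values copied from the ##matrix line for hg38 vs bosTau9.
MATRIX_VALUES = [
    91, -114, -31, -123,
    -114, 100, -125, -31,
    -31, -125, 100, -114,
    -123, -31, -114, 91,
]


def build_matrix(values, bases=BASES):
    """Map (row base, column base) pairs to substitution scores."""
    n = len(bases)
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} values, got {len(values)}")
    return {
        (a, b): values[i * n + j]
        for i, a in enumerate(bases)
        for j, b in enumerate(bases)
    }


def ungapped_score(seq1, seq2, matrix):
    """Sum substitution scores over the columns of an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    total = 0
    # upper() treats soft-masked (lowercase) bases like unmasked ones.
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            raise ValueError("gap column found; gap penalties are not modelled here")
        total += matrix[(a, b)]
    return total


matrix = build_matrix(MATRIX_VALUES)
print(ungapped_score("ACGT", "ACGT", matrix))  # four matches: 91 + 100 + 100 + 91
print(ungapped_score("AC", "GT", matrix))      # two transitions: -31 + -31
```

Transitions (A/G, C/T) cost far less than transversions in this matrix, which is one reason it tolerates more divergence than the human_chimp.v2.q matrix shown for gorGor6.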

From the DEF file:

# human vs Cow
BLASTZ=/cluster/bin/penn/lastz-distrib-1.04.00/bin/lastz
BLASTZ_T=2
BLASTZ_O=400
BLASTZ_E=30
BLASTZ_M=254
# default BLASTZ_Q score matrix:
# A C G T
# A 91 -114 -31 -123
# C -114 100 -125 -31
# G -31 -125 100 -114
# T -123 -31 -114 91

# TARGET: human hg38
SEQ1_DIR=/hive/data/genomes/hg38/hg38.2bit
SEQ1_LEN=/hive/data/genomes/hg38/chrom.sizes
SEQ1_CHUNK=20000000
SEQ1_LAP=10000
SEQ1_IN_CONTIGS=0

# QUERY: Cow bosTau9
SEQ2_DIR=/hive/data/genomes/bosTau9/bosTau9.2bit
SEQ2_LEN=/hive/data/genomes/bosTau9/chrom.sizes
SEQ2_CHUNK=20000000
SEQ2_LIMIT=10
SEQ2_LAP=0

BASE=/hive/data/genomes/hg38/bed/lastzBosTau9.2018-11-08
TMPDIR=/dev/shm

with chain parameters:

zcat ../../pslParts/$1*.psl.gz \
| axtChain -psl -verbose=0 -minScore=3000 -linearGap=medium stdin \
/hive/data/genomes/hg38/hg38.contigs.2bit \
/hive/data/genomes/bosTau9/bosTau9.2bit \
stdout \
| chainAntiRepeat /hive/data/genomes/hg38/hg38.contigs.2bit \
/hive/data/genomes/bosTau9/bosTau9.2bit \
stdin $2

See if your parameters are similar to these.
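
A minimal Python sketch of that comparison, assuming the parameters are available as the KEY=VALUE strings quoted in this thread. The reference string merges the ##aligner and ##blastzParms header lines for hg38 vs bosTau9; `parse_params` and `diff_params` are hypothetical helper names.

```python
def parse_params(text):
    """Parse 'K=3000 L=3000' or 'K=3000,L=3000' style strings into a dict."""
    params = {}
    for token in text.replace(",", " ").split():
        key, sep, value = token.partition("=")
        if sep and key:
            params[key] = value
    return params


def diff_params(mine, reference):
    """List (key, my value, reference value) for every parameter that differs."""
    rows = []
    for key in sorted(set(mine) | set(reference)):
        a, b = mine.get(key), reference.get(key)
        if a != b:
            rows.append((key, a, b))
    return rows


# Parameters from the first message in this thread.
MINE = "M=0 K=3000 L=3000 Y=9400 E=30 H=2000 O=400 T=1"
# hg38 vs bosTau9, merged from the ##aligner and ##blastzParms lines above.
UCSC_COW = "T=2 M=254 O=400 H=2000 E=30 K=3000,L=3000"

for key, mine, ref in diff_params(parse_params(MINE), parse_params(UCSC_COW)):
    print(f"{key}: mine={mine} reference={ref}")
```

A value of None on the reference side only means the parameter is not listed in the quoted header lines, not that its value is known.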

--Hiram


Hiram Clawson

Jan 26, 2023, 3:31:18 PM
to gen...@soe.ucsc.edu, Marie-Laurence Cossette