Question regarding different species liftover


TALENTI Andrea

May 13, 2020, 12:38:26 PM
to gen...@soe.ucsc.edu

Good morning,

I’m writing to you about the genome liftover pipeline used between different species.

Looking at the doBlastzChainNet.pl, it looks like you proceed in the following way:

  1. Align with blastz with proper parameters and save as lav
  2. Convert to psl with lavToAxt -dropSelf | axtToPsl
  3. Chain everything with axtChain | chainAntiRepeat
  4. Merge and sort with chainMergeSort
  5. Make a net file with chainPreNet | chainNet | netSyntenic
  6. Make the liftover chain file with netChainSubset | chainStitchId

Is this correct?

Also, since I’m trying to use lastz instead of blastz, I’d like to know if it is possible to obtain similar results with the following workflow:

  1. Align pairs of sequences (source to target) using lastz and save as axt
  2. Concatenate the axt files for every pairwise alignments of source/target sequences
  3. Use the new concatenated axt as input for axtChain
  4. Proceed as above
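
As a rough illustration of the workflow outlined above, the sketch below only assembles the command lines as strings; it does not run anything. All file names (the .2bit files, the .sizes files, and every output) are hypothetical, the -linearGap=medium setting is an assumption that should be chosen for the species distance, and the argument order follows the usage messages of the kent tools as I understand them, so check each tool's help before relying on it. The chaining program in the kent tools is axtChain. This sketch chains each axt file separately and merges afterwards instead of concatenating first; that is a choice made here, not a statement about what UCSC does.

```python
def liftover_chain_commands(axt_files, target_2bit, query_2bit,
                            target_sizes, query_sizes, linear_gap="medium"):
    """Return shell command strings for an axt -> liftOver chain workflow.

    All paths are placeholders; nothing is executed here.
    """
    if not axt_files:
        raise ValueError("need at least one axt file")
    cmds = []
    chains = []
    for axt in axt_files:
        chain = axt.rsplit(".", 1)[0] + ".chain"
        # Chain one alignment file, then filter out chains that are
        # mostly the result of repeats.
        cmds.append(
            f"axtChain -linearGap={linear_gap} {axt} {target_2bit} {query_2bit} stdout"
            f" | chainAntiRepeat {target_2bit} {query_2bit} stdin {chain}"
        )
        chains.append(chain)
    # Merge and sort all per-file chains into one file.
    cmds.append("chainMergeSort " + " ".join(chains) + " > all.chain")
    # Build the net on the target side (the query-side net is discarded).
    cmds.append(
        f"chainPreNet all.chain {target_sizes} {query_sizes} stdout"
        f" | chainNet stdin {target_sizes} {query_sizes} stdout /dev/null"
        " | netSyntenic stdin target.net"
    )
    # Keep only the chain pieces used in the net: the liftOver chain file.
    cmds.append(
        "netChainSubset target.net all.chain stdout"
        " | chainStitchId stdin liftover.chain"
    )
    return cmds


# Hypothetical inputs, printed for inspection only.
for cmd in liftover_chain_commands(["chr1.axt", "chr2.axt"],
                                   "target.2bit", "query.2bit",
                                   "target.sizes", "query.sizes"):
    print(cmd)
```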

 

Thank you in advance,

All the best

Andrea Talenti
The Roslin Institute, University of Edinburgh,
Easter Bush Campus,
Midlothian, EH25 9RG

 


Hiram Clawson

May 13, 2020, 12:49:20 PM
to TALENTI Andrea, gen...@soe.ucsc.edu
Good Morning Andrea:

This process at UCSC has been using 'lastz' since about 2008.
The legacy name 'blastz' is sometimes used in procedure names and
scripts, but that is just symbolic; it is not the program we are using.
The reason we are using lav output and the conversions to psl is
due to the types of inputs we give to lastz and the resulting
output types from lastz given those .2bit input formats. Our input
and output formats have been selected to optimize the compute times
for the procedure.

We cannot say if a different procedure would produce the same results.

--Hiram


Hiram Clawson

May 13, 2020, 1:10:03 PM
to TALENTI Andrea, gen...@soe.ucsc.edu
The lifting is done on both target and query sequences because both
can be broken up into chunks. Typical sizes of inputs are
20,000,000 bases for target sequence, 10,000,000 for query sequence
with a 10,000 base overlap in the query chunks. When individual
sequences are not that large then multiple sequences can be chunked
together into those two inputs to prevent a proliferation of
individual compute jobs. I try to set sizes of everything to
keep the compute job count under 100,000.
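
To make the arithmetic concrete, here is a minimal sketch of how chunk windows and the resulting job count could be computed from those sizes. The function names and the example sequence lengths are made up for illustration, and it assumes one compute job per pairing of a target chunk with a query chunk; the actual UCSC partitioning scripts may handle overlaps and small sequences differently.

```python
def chunk_windows(seq_len, chunk_size, overlap=0):
    """Split [0, seq_len) into windows of chunk_size that overlap by `overlap` bases."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    windows = []
    start = 0
    while True:
        end = min(start + chunk_size, seq_len)
        windows.append((start, end))
        if end >= seq_len:
            break
        start = end - overlap  # next window re-reads the last `overlap` bases
    return windows


def job_count(target_lens, query_lens,
              target_chunk=20_000_000, query_chunk=10_000_000,
              query_overlap=10_000):
    """Number of target-chunk x query-chunk pairs (assumed: one job per pair)."""
    n_target = sum(len(chunk_windows(n, target_chunk)) for n in target_lens)
    n_query = sum(len(chunk_windows(n, query_chunk, query_overlap))
                  for n in query_lens)
    return n_target * n_query


# Hypothetical lengths: one 45 Mb target sequence and one 25 Mb query sequence.
print(chunk_windows(25_000_000, 10_000_000, 10_000))
# [(0, 10000000), (9990000, 19990000), (19980000, 25000000)]
print(job_count([45_000_000], [25_000_000]))
# 9
```

This sketch does not bundle several small sequences into one chunk, which is what the message above describes as the way to avoid a proliferation of jobs.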

On 5/13/20 9:59 AM, TALENTI Andrea wrote:
> Hi Hiram,
> Thank you very much for your answer. I have two more questions. First, is the liftUp step after the conversion to psl needed only if you chunk the genome by size, and not if you create one FASTA file per sequence? Second, do you chunk only the sequence to lift (source), or also the one that is lifted (target)?
>
> Thanks again
> Andrea

TALENTI Andrea

May 13, 2020, 2:08:04 PM
to gen...@soe.ucsc.edu, Hiram Clawson
Hi Hiram,
Thank you very much for your answer. I have two more questions. First, is the liftUp step after the conversion to psl needed only if you chunk the genome by size, and not if you create one FASTA file per sequence? Second, do you chunk only the sequence to lift (source), or also the one that is lifted (target)?

Thanks again
Andrea
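
For context, "lifting" in this sense means translating coordinates reported relative to a chunk back to coordinates on the full sequence. Below is a minimal sketch of that idea, assuming plus-strand, zero-based, half-open coordinates; the Chunk record and lift_interval function are hypothetical and do not reproduce the liftUp program or its lift file format. In this sketch, a chunk that covers a whole sequence has offset zero, so its coordinates come back unchanged.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    seq_name: str  # name of the full sequence, e.g. "chr1"
    offset: int    # start of the chunk within the full sequence
    size: int      # chunk length in bases


def lift_interval(chunk, start, end):
    """Translate a chunk-relative interval to full-sequence coordinates."""
    if not (0 <= start <= end <= chunk.size):
        raise ValueError("interval falls outside the chunk")
    return chunk.seq_name, chunk.offset + start, chunk.offset + end


# A chunk starting 9,990,000 bases into chr1: coordinates are shifted.
piece = Chunk("chr1", 9_990_000, 10_000_000)
print(lift_interval(piece, 500, 1500))

# A "chunk" that is a whole sequence: coordinates are unchanged.
whole = Chunk("chr2", 0, 5_000_000)
print(lift_interval(whole, 500, 1500))
```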
