converting AGP to .chain, help required


Stephane Plaisance | VIB |

Jun 6, 2017, 10:34:29 AM
to gen...@soe.ucsc.edu
Dear,

I am still trying to produce liftover chains to transfer annotations from an original NGS assembly to one scaffolded using BioNano Genomics hybridscaffold (related to: https://groups.google.com/a/soe.ucsc.edu/forum/#!topic/genome/HQOMmNzdDPc)

I have an AGP file resulting from the scaffolding of my original NGS assembly and heard that an AGP could be converted to a chain using liftUp.
However, the inline help of liftUp is not clear to me (to say the least).

Could someone provide an example command for converting my 'scaffolded_asembly.agp' to an 'asm_to_scaffolded.chain', including any other required inputs?

liftUp [-type=.xxx] destFile liftSpec how sourceFile(s)

Where, for instance, do I get 'liftSpec'?
What should 'how' be?

Finally, can this command also be used to produce the reverse chain (by adding -chainQ to the command)?

Thanks in advance
Stephane

stephane....@vib.be

Cath Tyner

Jun 7, 2017, 1:45:57 PM
to Stephane Plaisance | VIB |, UCSC Genome Browser Public Help Forum
Hi Stephane,

Thanks for contacting the UCSC Genome Browser support team. Below are steps that you can try to accomplish your goal:

wget -O agpToLift \
"http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob_plain;f=src/utils/agpToLift"


chmod +x agpToLift

This constructs a lift file:

./agpToLift file.agp > file.liftUp

Then, you can use liftUp. For example, on a .bed file:

liftUp newCoordinates.bed file.liftUp warn oldCoordinates.bed

You can download the liftUp command from the userApps directory on hgdownload:

rsync -a -P rsync://hgdownload.soe.ucsc.edu/genome/admin/exe/linux.x86_64/liftUp ./

Please respond to this mailing list if you have further questions as you move forward, so that our support team can assist if needed.

Thank you for contacting the UCSC Genome Browser support team. Please send new and follow-up questions to one of our UCSC Genome Browser mailing lists below:

  * Post to the Public Help Forum: Email gen...@soe.ucsc.edu or search the Public Archives
  * Post to the Mirror Help Forum: Email genome...@soe.ucsc.edu or search the Mirror Archives
  * Confidential/private help: Email genom...@soe.ucsc.edu

UCSC Genome Browser Announcements List (email alerts for new data & software):
  * Subscribe: Email genome-announce+subscribe...@soe.ucsc.edu
  * Unsubscribe: Email genome-announce+unsubscri...@soe.ucsc.edu

Join us on Social Media! Facebook, Twitter, Wordpress Blog, YouTube

Enjoy,
Cath
. . .
Cath Tyner
UCSC Genome Browser, Software QA & User Support
UC Santa Cruz Genomics Institute


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To post to this group, send email to gen...@soe.ucsc.edu.
Visit this group at https://groups.google.com/a/soe.ucsc.edu/group/genome/.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/CE0DC8FC-75D3-4398-855C-C90281765C98%40vib.be.
For more options, visit https://groups.google.com/a/soe.ucsc.edu/d/optout.

Stephane Plaisance | VIB |

Jun 8, 2017, 10:21:19 AM
to Cath Tyner, UCSC Genome Browser Public Help Forum
Thanks a lot for this code Cath,

I have two short questions.

* Can I produce a bona fide liftOver '.chain' file using my FASTA and AGP inputs and the created file.liftUp, without having to align sequences? Which extra steps do I need (any command primer would be welcome)?

I created file.liftUp as described, after removing my '#' header lines from the AGP (I had to use -revStrand because of reverse features).

# next liftUp command does not seem to work
liftUp my_agg.chain my_agg.liftUp warn my_agg_noheader.agp 

Got 7664 lifts in my_agg.liftUp
Lifting my_agg_noheader.agp
Expecting at least 12 words line 1 of my_agg_noheader.agp

# AGP has only 9 columns!

$ head -3 my_agg_noheader.agp
Super-Scaffold_2 1 634629 1 W 000000F_018_pilon 1 634629 +
Super-Scaffold_4 1 1183857 1 W 000060F_pilon 1 1183857 +
Super-Scaffold_4 1183858 1184356 2 N 499 scaffold yes map
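
For readers following along, the nine columns in the rows above can be read as in this minimal Python sketch. It assumes the AGP 2.0 column layout and whitespace-separated fields; the names `AgpComponent`, `AgpGap` and `parse_agp_line` are made up for illustration and are not part of any UCSC tool.

```python
from typing import NamedTuple, Union


class AgpComponent(NamedTuple):
    """A sequence row (e.g. type W): a piece of a contig placed in an object."""
    obj: str          # scaffold name
    obj_beg: int      # 1-based, inclusive
    obj_end: int
    part: int
    ctype: str
    comp_id: str      # contig name
    comp_beg: int     # 1-based, inclusive
    comp_end: int
    orientation: str  # "+", "-", or another AGP orientation code


class AgpGap(NamedTuple):
    """A gap row (type N or U): a run of Ns between components."""
    obj: str
    obj_beg: int
    obj_end: int
    part: int
    ctype: str
    gap_length: int
    gap_type: str
    linkage: str
    evidence: str


def parse_agp_line(line: str) -> Union[AgpComponent, AgpGap]:
    """Split one AGP data row and convert the numeric columns to int."""
    f = line.split()
    if len(f) != 9:
        raise ValueError(f"expected 9 AGP columns, got {len(f)}")
    head = (f[0], int(f[1]), int(f[2]), int(f[3]), f[4])
    if f[4] in ("N", "U"):
        return AgpGap(*head, int(f[5]), f[6], f[7], f[8])
    return AgpComponent(*head, f[5], int(f[6]), int(f[7]), f[8])


# The three rows shown in the head output above.
rows = [parse_agp_line(line) for line in (
    "Super-Scaffold_2 1 634629 1 W 000000F_018_pilon 1 634629 +",
    "Super-Scaffold_4 1 1183857 1 W 000060F_pilon 1 1183857 +",
    "Super-Scaffold_4 1183858 1184356 2 N 499 scaffold yes map",
)]
```

Columns 6 to 9 change meaning depending on column 5, which is why the third row carries a gap length and linkage information instead of a contig name and coordinates.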


* Can I produce a reverse-mapping file from my AGP, for instance by swapping the target and query columns in my AGP and using the swapped AGP as input?

This puts me on track to finally succeed with my annotation transfer. I am very grateful to you for sharing your code and advice.

Best Regards,
Stephane


Stéphane Plaisance – Staff Scientist | Bioinformatician 

VIB Nucleomics Core 
Campus Gasthuisberg
Herestraat 49 – Post Box 816 – 3000 Leuven – Belgium 
O&N4 Building – 8th Floor – Room 08.440 

Tel. +32 16 37 31 26  
Lync. +32 16 32 00 60  
www.nucleomics.be 


Hiram Clawson

Jun 8, 2017, 11:21:13 AM
to Stephane Plaisance | VIB |, Cath Tyner, UCSC Genome Browser Public Help Forum
Good Morning Stéphane:

The AGP lift procedure will only 'lift' contig coordinates to their
assembled scaffold/chromosome coordinates. If you actually have
different sequence in two different assemblies, the AGP lift
procedure will not be useful.

Can you clarify the relationship between the two different assemblies
you are working with? Are these actually two different assemblies
of the same organism, or is one assembly merely contigs, and the second
those contigs assembled into larger units?

--Hiram


Stephane Plaisance | VIB |

Jun 8, 2017, 11:40:10 AM
to Hiram Clawson, Cath Tyner, UCSC Genome Browser Public Help Forum
Dear Hiram,

Sorry for the length below this line :-)

The first assembly is the raw output of Falcon Unzip (N50 in the hundreds of kb).
The second assembly is the hybrid scaffold of the first, built using optical mapping data from Bionano (N50 of a few Mb).
The total ATGC sequence content is exactly the same in both, but in the second, pieces of contigs/scaffolds from assembly #1 have been clipped, inverted, moved, merged with others, and joined with stretches of Ns where the optical data gave distance information (now super-scaffolds plus all leftover pieces).

In parallel, I am trying to learn with help from the UCSC support team (cjvi...@ucsc.edu : Re: [genome] issues running RunLastzChain.sh on server) but am not progressing very fast.
I have tried to apply the full liftOver-building technique from http://genomewiki.ucsc.edu/index.php/LiftOver_Howto, but with targets and queries numbering in the thousands (genome size is > 1 Gb), the number of LastZ alignment combinations runs into the tens of millions, and my single 2U R730 server cannot handle that many inodes (84 threads + 512 GB RAM though, not a typical netbook).

I was hoping that the Bionano AGP companion file to the scaffolded assembly could be used to create the chain more easily. After all, it describes everything that was done to Asm1 in order to get Asm2!

Ideally, once I can create a true .chain file from this AGP data, I could use CrossMap (or UCSC liftOver) to migrate the tons of annotation generated in Falcon Unzip coordinates (BED, GFF, bigWig and more) to the final, more complete Bionano super-scaffold assembly.

It would also be nice to be able to map future annotations from Asm2 back to Asm1 (reverse lift), but this is less urgent.

The problem of annotation liftover for large non-reference genomes seems quite widespread, judging from several posts I found on BioStar, and there is clearly a need for working workflows to achieve this by any means. I keep looking on the Net but so far have not found a working protocol; this seems to be an art reserved for a few experts, which I am not (yet).

Thanks for your help and suggestions,

Best Regards,
Stephane



Chris Villarreal

Jun 9, 2017, 5:16:31 PM
to Stephane Plaisance | VIB |, UCSC Genome Browser Public Help Forum

Dear Stephane,

Thank you for your question about the UCSC Genome Browser. The AGP file itself does not need to be lifted; it is the set of instructions for lifting. liftOver uses chains as the instructions for lifting; liftUp uses a liftSpec. Convert the AGP to a liftSpec, then use liftUp with the AGP-derived liftSpec to lift annotations from Asm1 coordinates to Asm2 coordinates.
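
To make the AGP-to-liftSpec step concrete, here is a minimal Python sketch of the idea. It is not the agpToLift script itself: the function names and the toy `demo_agp` rows are hypothetical, and it assumes the simple case where every component is a whole contig on the + strand, so that the component end can stand in for the contig size. Each lift row follows the five-column layout 'offset oldName oldSize newName newSize'.

```python
def agp_to_lift_rows(agp_lines):
    """Turn AGP component rows into (offset, oldName, oldSize, newName, newSize).

    Assumes whole contigs placed on the + strand; anything else is rejected
    because a 5-column lift row cannot describe it.
    """
    comps = []
    new_size = {}
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank and header lines
        f = line.split()
        obj, obj_beg, obj_end, ctype = f[0], int(f[1]), int(f[2]), f[4]
        # the scaffold size is the largest end coordinate seen for it
        new_size[obj] = max(new_size.get(obj, 0), obj_end)
        if ctype in ("N", "U"):
            continue                      # gap rows carry no contig to lift
        comp, comp_beg, comp_end, strand = f[5], int(f[6]), int(f[7]), f[8]
        if comp_beg != 1 or strand != "+":
            raise ValueError(f"{comp}: partial or reversed component")
        comps.append((obj_beg - 1, comp, comp_end, obj))  # 0-based offset
    return [(off, old, old_size, new, new_size[new])
            for off, old, old_size, new in comps]


def lift_interval(rows, old_name, start, end):
    """Shift a 0-based interval on a contig by that contig's offset."""
    for off, old, _old_size, new, _new_size in rows:
        if old == old_name:
            return new, start + off, end + off
    raise KeyError(old_name)


# Hypothetical toy AGP: two contigs joined by a 50 bp gap.
demo_agp = [
    "scafA 1 100 1 W ctg1 1 100 +",
    "scafA 101 150 2 N 50 scaffold yes map",
    "scafA 151 350 3 W ctg2 1 200 +",
]
lift_rows = agp_to_lift_rows(demo_agp)
```

With these rows, an annotation at ctg2:10-20 is simply shifted by the offset of ctg2 within scafA, which is all that this kind of lift does.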

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

-Chris V
UCSC Genome Browser

Stephane Plaisance | VIB |

Jun 12, 2017, 10:14:19 AM
to Chris Villarreal, UCSC Genome Browser Public Help Forum
Dear Chris and Support,

I think I misunderstood the aim of liftUp.

After looking more closely at liftUp, I fear I cannot use it, because my components are also split during scaffolding and the lift format does not allow storing coordinates for subsequences.

One example:
a contig [123456789] that becomes part of a scaffold [abcdefghijklmnopq] starting at 'e' on the '+' strand ([abcd123456789nopq]) would work in lift format: 'offset oldName oldSize newName newSize'

but when only [45678] is scaffolded and both ends are trimmed and/or sent elsewhere, I am stuck: I cannot tell the lift file to start at 4 for a length of 5!

The only fix I can see is to create three new sub-contig features [123] [45678] [9] and place the second one, but doing so changes the query naming and breaks the link with the annotation files!
This is actually what BioNano Genomics did to create their AGP: they renamed sub-contig parts like contigX_subsequence123456:7891011, which makes them different from the annotation-bearing contigX in the annotation data (GFF for instance).
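
One way to restore the link to the original contig names is to parse the coordinates back out of the renamed parts. The Python sketch below is only an illustration: the `_subsequence<start>:<end>` suffix is copied from the example name above and is an assumption (the real AGP may spell it differently), and both function names are hypothetical.

```python
import re

# Suffix pattern assumed from the example name in the text, not from a spec.
_SUBSEQ = re.compile(r"^(?P<contig>.+)_subsequence(?P<start>\d+):(?P<end>\d+)$")


def split_subsequence_name(name):
    """Return (original_contig, start, end); start/end are None for whole contigs."""
    m = _SUBSEQ.match(name)
    if m is None:
        return name, None, None
    return m["contig"], int(m["start"]), int(m["end"])


def part_to_contig(local_pos, part_start):
    """Convert a 1-based position inside a part to the original contig (1-based)."""
    return part_start + local_pos - 1


# The [45678] piece of contig [123456789] from the example above.
contig, start, end = split_subsequence_name("contigX_subsequence4:8")
```

Here the first base of the part corresponds to position 4 of contigX and the fifth base to position 8, so annotations stored under contigX can be matched to the part that carries them.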

Please correct me if I am wrong?

With which tool can I lift original contig annotations onto scaffolded data of the same genome (no changes in sequences, only scaffolding by split, invert, gap, join operations)?
I have all scaffolding info in the AGP and do not wish to realign sequences to do the lift.
I do not have a chain file for the assembly scaffolding operation.

Best Regards,

Stephane



Matthew Speir

Jun 22, 2017, 10:25:19 AM
to Stephane Plaisance | VIB |, Chris Villarreal, UCSC Genome Browser Public Help Forum
Hi Stephane,

I apologize for the late reply. Yes, I believe you are correct that using liftUp won't work for you in this case. However, we believe it's possible to create a straightforward script that would take your AGP file plus two files, one containing the names and sizes of all scaffolds and the other a similar file for contigs, and convert it directly into a .chain file that could be used by liftOver. Let us know if you're interested in this solution.
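
A rough Python sketch of what such a script could look like follows. It is an illustration under stated assumptions, not a UCSC tool: the function name `agp_to_chain` and the toy data are hypothetical, the two size files are assumed to be already loaded into dictionaries, and component names and coordinates in the AGP are assumed to refer to the original contigs. Each AGP component row becomes one single-block chain with the old contig as target and the new scaffold as query, using the component length as a placeholder score.

```python
def agp_to_chain(agp_lines, contig_sizes, scaffold_sizes):
    """Yield one single-block chain record per AGP component row.

    Target = old contig, query = new scaffold, the direction needed to
    move contig annotations onto scaffolds.
    """
    chain_id = 0
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        if f[4] in ("N", "U"):          # gap rows carry no sequence to map
            continue
        scaf, s_beg, s_end = f[0], int(f[1]), int(f[2])
        ctg, c_beg, c_end, orient = f[5], int(f[6]), int(f[7]), f[8]
        size = c_end - c_beg + 1
        if size != s_end - s_beg + 1:
            raise ValueError(f"length mismatch on {scaf}:{s_beg}-{s_end}")
        t_size, q_size = contig_sizes[ctg], scaffold_sizes[scaf]
        t_start, t_end = c_beg - 1, c_end            # 0-based, half-open
        if orient == "-":
            # minus-strand query coordinates count from the scaffold's end
            q_strand = "-"
            q_start, q_end = q_size - s_end, q_size - (s_beg - 1)
        else:
            # unknown orientations are treated as + here
            q_strand = "+"
            q_start, q_end = s_beg - 1, s_end
        chain_id += 1
        # header line, one ungapped block, then the blank line ending the chain
        yield (f"chain {size} {ctg} {t_size} + {t_start} {t_end} "
               f"{scaf} {q_size} {q_strand} {q_start} {q_end} {chain_id}\n"
               f"{size}\n\n")


# Hypothetical toy input: ctg2 is placed reversed after a 50 bp gap.
demo_agp = [
    "scafA 1 100 1 W ctg1 1 100 +",
    "scafA 101 150 2 N 50 scaffold yes map",
    "scafA 151 350 3 W ctg2 1 200 -",
]
records = list(agp_to_chain(demo_agp,
                            {"ctg1": 100, "ctg2": 200},
                            {"scafA": 350}))
```

Writing `"".join(records)` to a file gives a candidate chain file. Because a partial component simply yields a chain that starts inside the contig, split contigs need no renaming in this representation. The output should still be checked with liftOver on a few known features before being trusted.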


I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

Stephane Plaisance | VIB |

Jun 22, 2017, 11:04:37 AM
to Matthew Speir, Chris Villarreal, UCSC Genome Browser Public Help Forum
Dear Matthew,

Given the wide adoption of 10x, Dovetail, and Bionano, to cite only three, it is very likely that people will need to lift annotations between assembly stages.

I have now used flo to do it (and it worked), but it requires hours of aligning with BLAT even though the scaffolding information was already recorded in the AGP file provided by the technology.

The challenge I see is that scaffolding breaks original contigs into several parts and reports them with modified names and orientations (like Contig01_subsequence-123-456), so the same contig can be present in several new scaffolds as different parts of the original. This will require some additional processing to resolve the one-to-many relationship (but all edit coordinates were stored).
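
As a sketch of that additional processing, the one-to-many relationship can be handled by indexing all parts of each original contig and looking up the part that contains a given feature. This is a minimal Python illustration with hypothetical names and toy data; it assumes the parts have already been reduced to 0-based, half-open coordinates on the original contig, and it gives up on features that span a cut point.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(segments):
    """segments: iterable of (contig, c_start, c_end, scaffold, s_start, strand),
    0-based half-open, one entry per placed part of an original contig."""
    index = defaultdict(list)
    for contig, c_start, c_end, scaf, s_start, strand in segments:
        index[contig].append((c_start, c_end, scaf, s_start, strand))
    for pieces in index.values():
        pieces.sort()                    # order parts along the contig
    return index


def lift(index, contig, start, end):
    """Lift [start, end) if it lies inside a single part, else return None.

    The returned strand is the part's orientation; a stranded feature on a
    '-' part would additionally need its own strand flipped.
    """
    pieces = index.get(contig, [])
    # last part whose start is <= the feature start
    i = bisect_right(pieces, (start, float("inf"))) - 1
    if i < 0:
        return None
    c_start, c_end, scaf, s_start, strand = pieces[i]
    if end > c_end:
        return None                      # spans a cut point or a trimmed region
    if strand == "-":
        s_end = s_start + (c_end - c_start)
        return scaf, s_end - (end - c_start), s_end - (start - c_start), "-"
    return scaf, s_start + (start - c_start), s_start + (end - c_start), "+"


# Hypothetical split of a 9-base contig into three parts on three scaffolds.
index = build_index([
    ("ctg1", 3, 8, "scafY", 0, "-"),
    ("ctg1", 0, 3, "scafX", 10, "+"),
    ("ctg1", 8, 9, "scafZ", 5, "+"),
])
```

Features that return None are the ones that would need manual review or splitting, which is the same class of feature an alignment-based lift would also struggle with.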

If your expert(s) could invest some time in this, I would be happy to collaborate to produce the pipeline you describe. I can also provide test data if this helps. Doing it from scratch seems too big for me though and I will need some help.

Best Regards,
Stephane


