RE: merging assemblies

199 views
Skip to first unread message

Tony Raymond

unread,
Nov 14, 2011, 8:26:33 PM11/14/11
to Vera Kaiser, trans...@googlegroups.com
Forwarding this onto trans-ABySS mailing list too... It probably belongs here more :)

-----Original Message-----
From: abyss...@googlegroups.com [mailto:abyss...@googlegroups.com] On Behalf Of Tony Raymond
Sent: Monday, November 14, 2011 5:18 PM
To: Shaun Jackman; Vera Kaiser
Cc: ABySS
Subject: RE: merging assemblies

Hi Vera,

There is no clean way to do this using the current release of trans-abyss. However, I added an option to abyss-map in ABySS 1.3.1 to identify duplicate sequences in a given fasta for the purpose of merging assemblies, and should do what you want. Attached is a wrapper to make the merged contigs fit a little easier into the trans-abyss pipeline. Mainly the purpose of the script is to fix the contig id's so that they are unique and so you can identify where they came from. This would be the usage:

mergeLibs.sh TOPDIR "LIB1 LIB2 ..." OUT_DIR

Where TOPDIR is your trans-abyss topdir. The script will cat all the fasta files like TOPDIR/LIB/Assembly/abyss-1.2.5/merge/LIB-contigs.fa into the file OUT_DIR/contigs.fa (changes the contig id's from 'id' to 'LIB:id'), runs abyss-map to identify the duplicated sequences, and then removes the duplicated sequences with MergeContigs. The final merged contigs file will be OUT_DIR/contigs-2.fa.

Let me know if you have any questions or problems running the script.

Cheers,
Tony

-----Original Message-----
From: Shaun Jackman
Sent: Monday, November 14, 2011 12:04 PM
To: Vera Kaiser
Cc: Tony Raymond; ABySS
Subject: Re: merging assemblies

Hi Vera, (cc'ed to abyss-users)

I've forwarded your email to Tony Raymond, who will help you out.

Cheers,
Shaun

On Fri, 2011-11-11 at 12:35 -0800, Vera Kaiser wrote:
> Hi Shaun,
>
>
> I want to merge different trans-Abyss assemblies (based on different
> tissues), to get all transcripts expressed in all tissues in a single
> assembly.
>
>
> I have the trans-Abyss files
>
>
> ....../Assembly/abyss-1.2.5/merge/XXX-contigs.fa
>
>
> and want to merge these contigs again.
>
>
> Can I do this using one of the existing scripts in Abyss or
> trans-Abyss?
>
>
> If so, can you send me the exact commands because my scripting
> abilities are a bit limited....
>
>
> Thanks,
> Vera

mergeLibs.sh

Tony Raymond

unread,
Jan 20, 2012, 1:25:12 PM1/20/12
to Braj, trans...@googlegroups.com
Hi Braj, (CC'd to the trans-abyss mailing list)

Sorry for the delay in response. I'm not sure if it is mandatory to filter these short contigs out, but it would make the analysis a lot faster. Using the wrapper the command 'trans-abyss -1 ...' will do the filtering and merging for you. Running filtering and merging separately you would run utilities/assembly.py and then wrappers/merge.pl (or the merge script I've posted previously that uses abyss-map for merging).

Much of this information I've taken from the trans-abyss users manual, section 4.4. There is a link to download the manual on the trans-abyss download page:
http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss/releases/1.2.0

Cheers,
Tony
________________________________________
From: Braj [brajb...@gmail.com]
Sent: Wednesday, January 18, 2012 11:10 PM
To: Tony Raymond
Subject: Re: merging assemblies

Hi,

In my <lib>-contigs.fa generated for multiple kmers using ABySS
1.2.5 ,I have many
contigs of length smaller than 150 bp.

Now,Before going on the next step (i.e assembly and merging of <lib>-
contigs.fa from diffrent kmers,

Is is it mandatory to remove contigs lesser than 150 bp for each kmer
generated), If yes ..

Is there any parameter in trans-abyss/abyss command to do this or Do I
have to write my own script to filter such contigs ?

Can filtering of such contigs (lesser than 150bp in each kmers) can be
done after assembly of each kmers and before merging?

Thanks in advance,

Braj

> mergeLibs.sh
> < 1KViewDownload

Braj

unread,
Jan 21, 2012, 6:09:17 PM1/21/12
to Tony Raymond, Readman Chiu, trans...@googlegroups.com
Hi Tony,

Thanks for your suggestion.

I got the assembly of the kmers in my library using trans-abyss
wrapper script in stage -1 , but then there was no file generate in ../
merge/merging folder.

I thought to run merge.pl in wrappers separately using command:

/root/software/transabyss/trans-ABySS/wrappers/merge.pl \
/data/wm_win/PROJ/LIB1/Assembly/abyss-1.2.5 \
LIB1 contigs \
/data/wm_win/PROJ/LIB1/Assembly/abyss-1.2.5/merge/merging \

But now I am getting the error in round1 itself which I guess is
related to the fastmap.

Can I request you to suggest me, how I can tackle this error (plz see
below) ?

Error:

Maximum single piece size (5000) exceeded by query 1292594 of size
(5191). Larger pieces will have to be split up
until no larger than this limit when the -fastMap option is used.

/usr/local/bin/blat\
/data/wm_win/PROJ/LIB1/Assembly/abyss-1.2.5/k29/LIB1-contigs.fa \
/data/wm_win/PROJ/LIB1/Assembly/abyss-1.2.5/k25/LIB1-contigs.fa \
/data/wm_win/PROJ/LIB1/Assembly/abyss-1.2.5/merge/merging/contigs/
round1/k25-k29.psl \
-minIdentity=100 -maxGap=0 -fastMap failed: 65280 at /root/software/
transabyss/trans-ABySS/wrappers/merge.pl line 213, <FILE> line 17.

Thanks in advance,

Braj


On Jan 20, 11:25 pm, Tony Raymond <traym...@bcgsc.ca> wrote:
> Hi Braj, (CC'd to the trans-abyss mailing list)
>
> Sorry for the delay in response. I'm not sure if it is mandatory to filter these short contigs out, but it would make the analysis a lot faster. Using the wrapper the command 'trans-abyss -1 ...' will do the filtering and merging for you. Running filtering and merging separately you would run utilities/assembly.py and then wrappers/merge.pl (or the merge script I've posted previously that uses abyss-map for merging).
>
> Much of this information I've taken from the trans-abyss users manual, section 4.4. There is a link to download the manual on the trans-abyss download page:http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss/releases/1.2.0
>
> Cheers,
> Tony
> ________________________________________

> From: Braj [brajbio...@gmail.com]

Ka Ming Nip

unread,
Jan 23, 2012, 1:39:29 PM1/23/12
to Trans-ABySS
Hi Braj,

I remember a fellow Trans-ABySS user had reported this error message
in the past.
It is very likely that the default "Query type" and "Database type" is
'prot' instead of 'dna' for the copy of BLAT you were running.
Therefore, the upper limit for query size is 5000 instead of 25000 for
you.

http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html
BLAT limitations
DNA input sequences are limited to a maximum length of 25,000 bases.
Protein or translated input sequences must not exceed 5000 letters. As
many as 25 multiple sequences may be submitted at the same time. The
maximum combined length of DNA input for multiple sequence submissions
is 50,000 bases (with a 25,000 base limit per individual sequence).
For protein or translated input, the maximum combined input length is
12,500 letters (with a 5000 letter limit per individual sequence).

So... either configure BLAT to use 'dna' as default type or specify
the correct query/target types in projects.cfg:
blat_1_2: FA2 FA1 OUT_DIR/KOUT1-KOUT2.psl -minIdentity=100 -maxGap=0 -
fastMap -t=dna -q=dna
blat_2_1: FA1 FA2 OUT_DIR/KOUT2-KOUT1.psl -minIdentity=100 -maxGap=0 -
fastMap -t=dna -q=dna

Ka Ming

Braj

unread,
Feb 2, 2012, 11:41:16 PM2/2/12
to Trans-ABySS
Thanks alot Ka Ming

Ka Ming Nip

unread,
Feb 3, 2012, 6:26:50 PM2/3/12
to Trans-ABySS
Hi Braj,

Your error message was:
"Maximum single piece size (5000) exceeded by query 1292594 of size
(5191). Larger pieces will have to be split up
until no larger than this limit when the -fastMap option is used."

It looks like I was wrong about the query type for this error. Sorry
about that.
Basically, merge.pl from Trans-ABySS v1.2.0 isn't compatible with the
latest version of BLAT...
You will either have to find an older version of BLAT that didn't have
a 5000 query size limit for BLAT's fastmap option OR use the scripts
that I posted here:
http://groups.google.com/group/trans-abyss/browse_thread/thread/188ddb4da32268ab

Ka Ming

Ka Ming Nip

unread,
Feb 10, 2012, 1:07:27 PM2/10/12
to Braj, trans...@googlegroups.com
Hi Braj,

Please keep your questions on the mailing list.

Sorry, the query type was a different issue. So, you can ignore my earlier comment about that for now.
A fellow Trans-ABySS user reported that BLAT has a query size limit of 5000 when the fastmap option is used. As far as I know, there wasn't such limit when Trans-ABySS 1.2.0 was released. So, you have a few options:
a) find an older version of BLAT that didn't have the query size limit of 5000
b) compile your own BLAT executable with the query size limit increased
c) use the scripts that I posted in this thread: http://groups.google.com/group/trans-abyss/browse_thread/thread/188ddb4da32268ab

> 1.What statistics from assembly merging, I should check to rely on
merging ?

The total number of contigs in the merged contig set should be less than the total number of contigs in your input assemblies for merging. Your merged contig set should have the longer contigs while the smaller contigs are removed.

> 2.what the term 'buried' and 'canonical' mean here.

A contig is classified as "buried" if it is part of a longer contig. For example, "ATCGATCG" is buried within "CCCCCCATCGATCGCCCC". During merging, the buried contigs are removed while the larger contig containing the buried contig is kept. A set of contigs are classified as "canonical" when its buried contigs have been removed.

> 3.What count is 603314 and 1206628?

I guess you were trying to understand this line in the log: "603314(query) + 603314(target) = 1206628" ?
The merging algorithm uses BLAT to align contigs from one assembly to another assembly. So, the number for query refers to the number of contigs in the query assembly while the number for target refers to the number of contigs in the target assembly. "1206628" is the sum of these two numbers.


Hope that helps!

Ka Ming
________________________________________
From: Braj [brajb...@gmail.com]
Sent: February 9, 2012 8:33 PM
To: Ka Ming Nip
Cc: brajb...@gmail.com
Subject: Re: merging assemblies

Thanks Ka Ming,

I had tried merging my assemblies (containing contigs as well as
scaffold in <lib>-contigs.fa) of kmers in range k25-k49.
I used -fastmap and -q=dna option for blat but this does not work and
throws the same error ('length limit of 5000') .
In my subsequent attempt, I run the merge.pl with blat without the -
fastmap -q=dna options, and it works fine this time,
but took more time then expected as fastmap was off.

In the log file for merge.pl I got the last few lines as :

.
.
.
k30:687932 buried in k30:411532
k30:691572 buried in k30:506461 (identical)
k30:693116 buried in k30:459038 (identical)
k30:693774 buried in k30:498079 (identical)
k31:173596 buried in k31:610477 (identical)
k31:299367 buried in k31:619930 (identical)
k31:610477 buried in k31:173596 (identical)
k31:619930 buried in k31:299367 (identical)
k32:235719 buried in k32:551318 (identical)
k32:551318 buried in k32:235719 (identical)
k33:22593 buried in k33:499736
k34:342080 buried in k34:440312
k35:217177 buried in k35:393891
k35:69842 buried in k35:386549
k35:87580u buried in k35:395708
k38:62749 buried in k38:287750
k40:303417u buried in k40:518439u
num contigs buried: 112
112(buried) + 603202(canonical) = 603314
603314(query) + 603314(target) = 1206628

Could you please help me to know

1.What statistics from assembly merging, I should check to rely on
merging ?
2.what the term 'buried' and 'canonical' mean here.
3.What count is 603314 and 1206628?

thanks

Braj


On Feb 4, 4:26 am, Ka Ming Nip <km...@bcgsc.ca> wrote:
> Hi Braj,
>

> Your error message was:
> "Maximum single piece size (5000) exceeded by query 1292594 of size
> (5191). Larger pieces will have to be split up
> until no larger than this limit when the -fastMap option is used."
>
> It looks like I was wrong about the query type for this error. Sorry
> about that.
> Basically, merge.pl from Trans-ABySS v1.2.0 isn't compatible with the
> latest version of BLAT...
> You will either have to find an older version of BLAT that didn't have
> a 5000 query size limit for BLAT's fastmap option OR use the scripts

> that I posted here:http://groups.google.com/group/trans-abyss/browse_thread/thread/188dd...

Ka Ming Nip

unread,
Feb 10, 2012, 1:17:50 PM2/10/12
to Braj, trans...@googlegroups.com
Hi again,

I forgot to point this out in my last message.

> I had tried merging my assemblies (containing contigs as well as
> scaffold in <lib>-contigs.fa) of kmers in range k25-k49.

If you want to use the analysis portions of Trans-ABySS (ie. detecting fusions, snv, indels), you need to break up all scaffolds into contigs before running the alignment steps (reads-to-contig, contigs-to-genome).

Scaffold:
>123
ATCTNNNATCGAAA

Contigs:
>123_1
ATCT
>123_2
ATCGAAA


Ka Ming
________________________________________
From: Ka Ming Nip
Sent: February 10, 2012 10:07 AM
To: Braj
Cc: trans...@googlegroups.com

Reply all
Reply to author
Forward
0 new messages