-----Original Message-----
From: abyss...@googlegroups.com [mailto:abyss...@googlegroups.com] On Behalf Of Tony Raymond
Sent: Monday, November 14, 2011 5:18 PM
To: Shaun Jackman; Vera Kaiser
Cc: ABySS
Subject: RE: merging assemblies
Hi Vera,
There is no clean way to do this using the current release of trans-abyss. However, I added an option to abyss-map in ABySS 1.3.1 to identify duplicate sequences in a given fasta for the purpose of merging assemblies, and should do what you want. Attached is a wrapper to make the merged contigs fit a little easier into the trans-abyss pipeline. Mainly the purpose of the script is to fix the contig id's so that they are unique and so you can identify where they came from. This would be the usage:
mergeLibs.sh TOPDIR "LIB1 LIB2 ..." OUT_DIR
Where TOPDIR is your trans-abyss topdir. The script will cat all the fasta files like TOPDIR/LIB/Assembly/abyss-1.2.5/merge/LIB-contigs.fa into the file OUT_DIR/contigs.fa (changes the contig id's from 'id' to 'LIB:id'), runs abyss-map to identify the duplicated sequences, and then removes the duplicated sequences with MergeContigs. The final merged contigs file will be OUT_DIR/contigs-2.fa.
Let me know if you have any questions or problems running the script.
Cheers,
Tony
-----Original Message-----
From: Shaun Jackman
Sent: Monday, November 14, 2011 12:04 PM
To: Vera Kaiser
Cc: Tony Raymond; ABySS
Subject: Re: merging assemblies
Hi Vera, (cc'ed to abyss-users)
I've forwarded your email to Tony Raymond, who will help you out.
Cheers,
Shaun
On Fri, 2011-11-11 at 12:35 -0800, Vera Kaiser wrote:
> Hi Shaun,
>
>
> I want to merge different trans-Abyss assemblies (based on different
> tissues), to get all transcripts expressed in all tissues in a single
> assembly.
>
>
> I have the trans-Abyss files
>
>
> ....../Assembly/abyss-1.2.5/merge/XXX-contigs.fa
>
>
> and want to merge these contigs again.
>
>
> Can I do this using one of the existing scripts in Abyss or
> trans-Abyss?
>
>
> If so, can you send me the exact commands because my scripting
> abilities are a bit limited....
>
>
> Thanks,
> Vera
Sorry for the delay in response. I'm not sure if it is mandatory to filter these short contigs out, but it would make the analysis a lot faster. Using the wrapper the command 'trans-abyss -1 ...' will do the filtering and merging for you. Running filtering and merging separately you would run utilities/assembly.py and then wrappers/merge.pl (or the merge script I've posted previously that uses abyss-map for merging).
Much of this information I've taken from the trans-abyss users manual, section 4.4. There is a link to download the manual on the trans-abyss download page:
http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss/releases/1.2.0
Cheers,
Tony
________________________________________
From: Braj [brajb...@gmail.com]
Sent: Wednesday, January 18, 2012 11:10 PM
To: Tony Raymond
Subject: Re: merging assemblies
Hi,
In my <lib>-contigs.fa generated for multiple kmers using ABySS
1.2.5 ,I have many
contigs of length smaller than 150 bp.
Now,Before going on the next step (i.e assembly and merging of <lib>-
contigs.fa from diffrent kmers,
Is is it mandatory to remove contigs lesser than 150 bp for each kmer
generated), If yes ..
Is there any parameter in trans-abyss/abyss command to do this or Do I
have to write my own script to filter such contigs ?
Can filtering of such contigs (lesser than 150bp in each kmers) can be
done after assembly of each kmers and before merging?
Thanks in advance,
Braj
> mergeLibs.sh
> < 1KViewDownload
Thanks for your suggestion.
I got the assembly of the kmers in my library using trans-abyss
wrapper script in stage -1 , but then there was no file generate in ../
merge/merging folder.
I thought to run merge.pl in wrappers separately using command:
/root/software/transabyss/trans-ABySS/wrappers/merge.pl \
/data/wm_win/PROJ/LIB1/Assembly/abyss-1.2.5 \
LIB1 contigs \
/data/wm_win/PROJ/LIB1/Assembly/abyss-1.2.5/merge/merging \
But now I am getting the error in round1 itself which I guess is
related to the fastmap.
Can I request you to suggest me, how I can tackle this error (plz see
below) ?
Error:
Maximum single piece size (5000) exceeded by query 1292594 of size
(5191). Larger pieces will have to be split up
until no larger than this limit when the -fastMap option is used.
/usr/local/bin/blat\
/data/wm_win/PROJ/LIB1/Assembly/abyss-1.2.5/k29/LIB1-contigs.fa \
/data/wm_win/PROJ/LIB1/Assembly/abyss-1.2.5/k25/LIB1-contigs.fa \
/data/wm_win/PROJ/LIB1/Assembly/abyss-1.2.5/merge/merging/contigs/
round1/k25-k29.psl \
-minIdentity=100 -maxGap=0 -fastMap failed: 65280 at /root/software/
transabyss/trans-ABySS/wrappers/merge.pl line 213, <FILE> line 17.
Thanks in advance,
Braj
On Jan 20, 11:25 pm, Tony Raymond <traym...@bcgsc.ca> wrote:
> Hi Braj, (CC'd to the trans-abyss mailing list)
>
> Sorry for the delay in response. I'm not sure if it is mandatory to filter these short contigs out, but it would make the analysis a lot faster. Using the wrapper the command 'trans-abyss -1 ...' will do the filtering and merging for you. Running filtering and merging separately you would run utilities/assembly.py and then wrappers/merge.pl (or the merge script I've posted previously that uses abyss-map for merging).
>
> Much of this information I've taken from the trans-abyss users manual, section 4.4. There is a link to download the manual on the trans-abyss download page:http://www.bcgsc.ca/platform/bioinfo/software/trans-abyss/releases/1.2.0
>
> Cheers,
> Tony
> ________________________________________
> From: Braj [brajbio...@gmail.com]
Please keep your questions on the mailing list.
Sorry, the query type was a different issue. So, you can ignore my earlier comment about that for now.
A fellow Trans-ABySS user reported that BLAT has a query size limit of 5000 when the fastmap option is used. As far as I know, there wasn't such limit when Trans-ABySS 1.2.0 was released. So, you have a few options:
a) find an older version of BLAT that didn't have the query size limit of 5000
b) compile your own BLAT executable with the query size limit increased
c) use the scripts that I posted in this thread: http://groups.google.com/group/trans-abyss/browse_thread/thread/188ddb4da32268ab
> 1.What statistics from assembly merging, I should check to rely on
merging ?
The total number of contigs in the merged contig set should be less than the total number of contigs in your input assemblies for merging. Your merged contig set should have the longer contigs while the smaller contigs are removed.
> 2.what the term 'buried' and 'canonical' mean here.
A contig is classified as "buried" if it is part of a longer contig. For example, "ATCGATCG" is buried within "CCCCCCATCGATCGCCCC". During merging, the buried contigs are removed while the larger contig containing the buried contig is kept. A set of contigs are classified as "canonical" when its buried contigs have been removed.
> 3.What count is 603314 and 1206628?
I guess you were trying to understand this line in the log: "603314(query) + 603314(target) = 1206628" ?
The merging algorithm uses BLAT to align contigs from one assembly to another assembly. So, the number for query refers to the number of contigs in the query assembly while the number for target refers to the number of contigs in the target assembly. "1206628" is the sum of these two numbers.
Hope that helps!
Ka Ming
________________________________________
From: Braj [brajb...@gmail.com]
Sent: February 9, 2012 8:33 PM
To: Ka Ming Nip
Cc: brajb...@gmail.com
Subject: Re: merging assemblies
Thanks Ka Ming,
I had tried merging my assemblies (containing contigs as well as
scaffold in <lib>-contigs.fa) of kmers in range k25-k49.
I used -fastmap and -q=dna option for blat but this does not work and
throws the same error ('length limit of 5000') .
In my subsequent attempt, I run the merge.pl with blat without the -
fastmap -q=dna options, and it works fine this time,
but took more time then expected as fastmap was off.
In the log file for merge.pl I got the last few lines as :
.
.
.
k30:687932 buried in k30:411532
k30:691572 buried in k30:506461 (identical)
k30:693116 buried in k30:459038 (identical)
k30:693774 buried in k30:498079 (identical)
k31:173596 buried in k31:610477 (identical)
k31:299367 buried in k31:619930 (identical)
k31:610477 buried in k31:173596 (identical)
k31:619930 buried in k31:299367 (identical)
k32:235719 buried in k32:551318 (identical)
k32:551318 buried in k32:235719 (identical)
k33:22593 buried in k33:499736
k34:342080 buried in k34:440312
k35:217177 buried in k35:393891
k35:69842 buried in k35:386549
k35:87580u buried in k35:395708
k38:62749 buried in k38:287750
k40:303417u buried in k40:518439u
num contigs buried: 112
112(buried) + 603202(canonical) = 603314
603314(query) + 603314(target) = 1206628
Could you please help me to know
1.What statistics from assembly merging, I should check to rely on
merging ?
2.what the term 'buried' and 'canonical' mean here.
3.What count is 603314 and 1206628?
thanks
Braj
On Feb 4, 4:26 am, Ka Ming Nip <km...@bcgsc.ca> wrote:
> Hi Braj,
>
> Your error message was:
> "Maximum single piece size (5000) exceeded by query 1292594 of size
> (5191). Larger pieces will have to be split up
> until no larger than this limit when the -fastMap option is used."
>
> It looks like I was wrong about the query type for this error. Sorry
> about that.
> Basically, merge.pl from Trans-ABySS v1.2.0 isn't compatible with the
> latest version of BLAT...
> You will either have to find an older version of BLAT that didn't have
> a 5000 query size limit for BLAT's fastmap option OR use the scripts
> that I posted here:http://groups.google.com/group/trans-abyss/browse_thread/thread/188dd...
I forgot to point this out in my last message.
> I had tried merging my assemblies (containing contigs as well as
> scaffold in <lib>-contigs.fa) of kmers in range k25-k49.
If you want to use the analysis portions of Trans-ABySS (ie. detecting fusions, snv, indels), you need to break up all scaffolds into contigs before running the alignment steps (reads-to-contig, contigs-to-genome).
Scaffold:
>123
ATCTNNNATCGAAA
Contigs:
>123_1
ATCT
>123_2
ATCGAAA
Ka Ming
________________________________________
From: Ka Ming Nip
Sent: February 10, 2012 10:07 AM
To: Braj
Cc: trans...@googlegroups.com