First attempt with Zorro: empty assembly. Because I used scaffolds instead of contigs?

Bastien Boussau

Apr 21, 2011, 2:02:30 PM
to Zorro - The masked assembler
Dear Zorros,

I have just tried Zorro on two assemblies I obtained from SOAPdenovo
from 2 distinct Illumina paired-end libraries.
I obtained an empty assembly file as can be seen from the listing
below. My question is: is it because one of the two assemblies
contained scaffolds and not contigs? In this case I will run
split_at_Ns.pl on this second assembly and run Zorro again. In
addition, if I may, from a user point of view, it might be nice to
have an error message issued in cases where there are Ns in one of the
two assemblies.

Thanks very much!

Bastien.

Listing obtained after Zorro ran:

-rw-r--r-- 1 boussau boussau 63M 2011-04-20 20:44 out.BRK.delta
-rw-r--r-- 1 boussau boussau 12M 2011-04-20 20:44 out.BRK.filtered.delta
-rw-r--r-- 1 boussau boussau 37M 2011-04-20 20:44 out.BRK.coords
-rw-r--r-- 1 boussau boussau 134M 2011-04-20 20:46 out.assembly1.BRK
-rw-r--r-- 1 boussau boussau 120M 2011-04-20 20:48 out.assembly2.BRK
-rw-r--r-- 1 boussau boussau 7.8G 2011-04-20 22:15 out.reads.22mers
-rw-r--r-- 1 boussau boussau 6.9G 2011-04-20 22:36 out.assembly1.22mers_covplot
-rw-r--r-- 1 boussau boussau 272K 2011-04-20 22:45 out.histocovplot
-rw-r--r-- 1 boussau boussau 0 2011-04-20 22:45 out.reads.22mers.cutoff5932784
-rw-r--r-- 1 boussau boussau 0 2011-04-20 23:00 out.reads.22mers.cutoff8899176
-rw-r--r-- 1 boussau boussau 25M 2011-04-20 23:13 out.assembly1.BRK.index.4.ebwt
-rw-r--r-- 1 boussau boussau 4.0M 2011-04-20 23:13 out.assembly1.BRK.index.3.ebwt
-rw-r--r-- 1 boussau boussau 13M 2011-04-20 23:15 out.assembly1.BRK.index.2.ebwt
-rw-r--r-- 1 boussau boussau 48M 2011-04-20 23:15 out.assembly1.BRK.index.1.ebwt
-rw-r--r-- 1 boussau boussau 13M 2011-04-20 23:18 out.assembly1.BRK.index.rev.2.ebwt
-rw-r--r-- 1 boussau boussau 48M 2011-04-20 23:18 out.assembly1.BRK.index.rev.1.ebwt
-rw-r--r-- 1 boussau boussau 0 2011-04-20 23:18 out.assembly1.BRK.screencoords
-rw-r--r-- 1 boussau boussau 134M 2011-04-20 23:21 out.assembly1.BRK.screen
-rw-r--r-- 1 boussau boussau 134M 2011-04-20 23:24 out.assembly1.masked
-rw-r--r-- 1 boussau boussau 21M 2011-04-20 23:24 out.assembly2.BRK.index.4.ebwt
-rw-r--r-- 1 boussau boussau 3.2M 2011-04-20 23:24 out.assembly2.BRK.index.3.ebwt
-rw-r--r-- 1 boussau boussau 11M 2011-04-20 23:26 out.assembly2.BRK.index.2.ebwt
-rw-r--r-- 1 boussau boussau 38M 2011-04-20 23:26 out.assembly2.BRK.index.1.ebwt
-rw-r--r-- 1 boussau boussau 11M 2011-04-20 23:28 out.assembly2.BRK.index.rev.2.ebwt
-rw-r--r-- 1 boussau boussau 38M 2011-04-20 23:28 out.assembly2.BRK.index.rev.1.ebwt
-rw-r--r-- 1 boussau boussau 0 2011-04-20 23:28 out.assembly2.BRK.screencoords
-rw-r--r-- 1 boussau boussau 120M 2011-04-20 23:31 out.assembly2.BRK.screen
-rw-r--r-- 1 boussau boussau 120M 2011-04-20 23:34 out.assembly2.masked
-rw-r--r-- 1 boussau boussau 254M 2011-04-20 23:34 out.ALL
-rw-r--r-- 1 boussau boussau 367M 2011-04-20 23:34 out.assembly1.qual
-rw-r--r-- 1 boussau boussau 221M 2011-04-20 23:35 out.assembly2.qual
-rw-r--r-- 1 boussau boussau 587M 2011-04-20 23:35 out.ALL.qual
-rw-r--r-- 1 boussau boussau 482M 2011-04-20 23:42 out.ALL.afg
-rw-r--r-- 1 boussau boussau 16M 2011-04-20 23:43 out.newids
-rw-r--r-- 1 boussau boussau 55M 2011-04-21 01:25 out.masked.delta
-rw-r--r-- 1 boussau boussau 882 2011-04-21 01:25 out.masked.log
-rw-r--r-- 1 boussau boussau 88M 2011-04-21 01:26 out.masked.coords
-rw-r--r-- 1 boussau boussau 860 2011-04-21 02:39 out.unmasked.log
-rw-r--r-- 1 boussau boussau 55M 2011-04-21 03:02 out.unmasked.delta
-rw-r--r-- 1 boussau boussau 0 2011-04-21 03:02 out.unmasked.coords.log
-rw-r--r-- 1 boussau boussau 0 2011-04-21 03:02 out.unmasked.coords.discarded
-rw-r--r-- 1 boussau boussau 36M 2011-04-21 03:02 out.unmasked.coords
-rw-r--r-- 1 boussau boussau 31M 2011-04-21 03:02 out.unmasked.coords.filtered
-rw-r--r-- 1 boussau boussau 13M 2011-04-21 03:02 out.unmasked.coords.filtered.ovl
-rw-r--r-- 1 boussau boussau 19M 2011-04-21 03:02 out.unmasked.coords.filtered.OVL
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.hybrid.contig
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.hybrid.fasta
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.hybrid.qual
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.singlets
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.singlets.fasta
drwxr-xr-x 2 boussau boussau 4.0K 2011-04-21 05:34 out.ALL.bnk
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.hybrid.fasta.index.4.ebwt
-rw-r--r-- 1 boussau boussau 8 2011-04-21 05:34 out.hybrid.fasta.index.3.ebwt
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.hybrid.fasta.screencoords
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.singlets.fasta.unique
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.singlets.fasta.rep
-rw-r--r-- 1 boussau boussau 5.8M 2011-04-21 05:34 out.log
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.hybrid.fasta.trimmed.ends
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.hybrid.fasta.trimmed.core
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.hybrid.fasta.screen
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.hybridassembly.masked
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.REPASSEMBLY
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.REPASSEMBLY.afg
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.REPASSEMBLY.newids
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.REPASSEMBLY.hybridcontigs.qual
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.REPASSEMBLY.hybridcontigs.fasta
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.REPASSEMBLY.singlets
drwxr-xr-x 2 boussau boussau 4.0K 2011-04-21 05:34 out.REPASSEMBLY.bnk
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.REPASSEMBLY.singlets.fasta
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.ZORRO.fasta
-rw-r--r-- 1 boussau boussau 0 2011-04-21 05:34 out.REPASSEMBLY.allsinglets.fasta
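
In a listing this long, the zero-byte outputs are easy to miss. A minimal Python sketch for picking them out of a run directory might look like the following. The function name `empty_outputs` is hypothetical and is not part of Zorro; the `out.` prefix is taken from the file names in the listing.

```python
from pathlib import Path


def empty_outputs(run_dir, prefix="out."):
    """Return the sorted names of zero-byte regular files in run_dir.

    Only files whose names start with `prefix` are considered;
    directories (such as the AMOS *.bnk banks) are skipped.
    """
    return sorted(
        p.name
        for p in Path(run_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.stat().st_size == 0
    )
```

Calling `empty_outputs(".")` in the directory where Zorro ran would list files such as out.hybrid.fasta or out.ZORRO.fasta if they are empty, which shows at a glance which stage first produced no output.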

gustavoglcosta

Apr 21, 2011, 2:25:56 PM
to zorro-a...@googlegroups.com

Hi Bastien. This problem was probably caused by the use of scaffolds instead of contigs. It occurs because we use nucmer to detect the overlaps, and nucmer alignments do not span runs of Ns. Thank you for the suggestion. I will certainly embed split_at_Ns in the main Zorro pipeline in the next version.


Regards,
Gustavo
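
For readers without the script at hand, the idea of splitting scaffolds at runs of Ns can be sketched in Python roughly as follows. This is not the split_at_Ns.pl script itself, only an illustration of the same idea; the function names and the `_partN` naming of the pieces are assumptions made for this example.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_at_n_runs(header, sequence, min_len=1):
    """Split one scaffold at every run of N/n and return (name, piece) pairs.

    Pieces shorter than min_len are dropped. The "_partN" suffix is a
    hypothetical naming scheme, not the one used by split_at_Ns.pl.
    """
    fields = header.split()
    name = fields[0] if fields else header
    pieces = [p for p in re.split(r"[Nn]+", sequence) if len(p) >= min_len]
    return [(f"{name}_part{i}", piece) for i, piece in enumerate(pieces, start=1)]
```

With these helpers, a scaffold `ACGTNNNNNNGGCC` named `scaf1` becomes two contigs, `scaf1_part1` (`ACGT`) and `scaf1_part2` (`GGCC`), so that an aligner never has to bridge a gap.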

Bastien Boussau

Apr 24, 2011, 6:17:51 PM
to Zorro - The masked assembler
Hi Gustavo,

I tried again after running split_at_Ns.pl on each of the two
assemblies I got. Unfortunately, I still got an empty assembly. I paste
below the stderr output from Zorro. There seems to be a problem with
database permissions. Do you have any idea what I did wrong?
Sorry to bother you again and thank you for your help!

Bastien

Loading /home/boussau/DataCleaned21032011/graph_prefix_41.scafSeq.parts.fasta...DONE
Loading /home/boussau/FirstAssemblyFemaleDNA/Xves_1.0_scaffolds.parts.fasta...DONE

==> ENFORCING CONSISTENCY BETWEEN ASSEMBLY1 AND ASSEMBLY2
Running nucmer (alignAssembliesBRK)...DONE
Running delta-filter (alignAssembliesBRK)...DONE
Running show-coords (alignAssembliesBRK)...DONE
Generating new fasta files for assembly1...DONE
Generating new fasta file for assembly2...DONE
FIXED ASSEMBLY1: out.assembly1.BRK
FIXED ASSEMBLY2: out.assembly2.BRK

==> REPEAT MASKING
Counting 22-mers in /home/boussau/RawSequenceData_23032010/AllCleanSequencesConcatenated.fq.fasta...DONE
Generating kmer-cov-plot for out.reads.22mers...DONE
Generating kmer frequency histogram for out.assembly1.22mers_covplot...DONE
Generating masking file out.reads.22mers.cutoff52...DONE
Generating masking file out.reads.22mers.cutoff78...DONE
Running bowtie-build (out.assembly1.BRK,out.reads.22mers.cutoff52)...Running bowtie (out.assembly1.BRK,out.reads.22mers.cutoff52)...DONE
Merging neighbouring masked regions for out.assembly1.BRK...DONE
Running bowtie-build (out.assembly2.BRK,out.reads.22mers.cutoff52)...Running bowtie (out.assembly2.BRK,out.reads.22mers.cutoff52)...DONE
Merging neighbouring masked regions for out.assembly2.BRK...DONE

===> CORE ASSEMBLY
Creating AMOS BANK...DONE
DONE
Running nucmer with masked sequences...DONE
Parsing pairs of aligned sequences...DONE
Running nucmer with unmasked sequences...DONE
Filtering confident overlaps...DONE
Converting confident overlaps to AMOS format...DONE
Loading overlaps to AMOS BANK...DONE
Making hybrid contigs layout...DONE
Calling hybrid contigs consensus...Retrieving hybrid contigs from AMOS BANK...Failed to open contig account in bank out.ALL.bnk:
WHAT: Could not open bank for reading, locked by 'w 1010 boussau'
LINE: 955
FILE: Bank_AMOS.cc

DONE
Retrieving singlets from AMOS BANK...DONE

===> TRIM REPETITIVE CONTIG ENDS (up to 100 bp) COVERED BY ONLY 1 ASSEMBLY
Running bowtie-build (out.hybrid.fasta,out.reads.22mers.cutoff78)...Running bowtie (out.hybrid.fasta,out.reads.22mers.cutoff78)...DONE
Merging neighbouring masked regions for out.hybrid.fasta...DONE
Retrieving consensus positions covered by a single source assembly...DONE
Retrieving repetitive consensus positions and marking positions to be trimmed...DONE
Trimming repetitive contig ends covered by only one source assembly...DONE

===> FORCING SINGLETS ASSEMBLY
Loading sequences into a new AMOS BANK...DONE
Mapping old contig ids to new AMOS BANK ids...DONE
Finding rep overlaps...Starting on Sat Apr 23 04:11:11 2011

Read bank is out.REPASSEMBLY.bnk
Alignment error rate is 0.06
Minimum overlap bases is 40
** AMOS Exception **
WHAT: Could not open bank file, out.REPASSEMBLY.bnk/RED.ifo, No such file or directory
LINE: 1037
FILE: Bank_AMOS.cc

DONE
Layout... AMOS Read Bank out.REPASSEMBLY.bnk does not exist
AMOS Overlap Bank out.REPASSEMBLY.bnk does not exist
DONE
Calling rep contigs consensus...Starting on Sat Apr 23 04:11:11 2011

Read bank is out.REPASSEMBLY.bnk
Alignment error rate is 0.06
Minimum overlap bases is 5
Output will be written to the bank
Input is being read from the bank
** AMOS Exception **
WHAT: Could not open bank file, out.REPASSEMBLY.bnk/RED.ifo, No such file or directory
LINE: 1037
FILE: Bank_AMOS.cc

Retrieving hybrid contigs from AMOS BANK...Failed to open contig account in bank out.REPASSEMBLY.bnk:
WHAT: Could not open bank for reading, locked by 'w 1010 boussau'
LINE: 955
FILE: Bank_AMOS.cc

DONE
Retrieving singlets from AMOS BANK...DONE

===> FINISHING ZORRO ASSEMBLY
Renaming final contigs...DONE
DONE

Gustavo Lacerda

Apr 27, 2011, 8:11:15 AM
to zorro-a...@googlegroups.com
Hi Bastien,

Thank you for your feedback. Which version of AMOS are you using? Do you have a small subsample of your data that you could provide so we can try to reproduce the error? You could also run the minimus2 pipeline to verify that your AMOS installation is OK. Try giving all users read/write permissions on the directory in which you are running Zorro. Which OS are you using? Linux? Which filesystem?

Gustavo

--
Gustavo Gilson Lacerda Costa
Bioinformatician at State University of Campinas (UNICAMP)
Work:(19)3521-6651 Cell:(19)9243-1559 Skype:gustavo.unicamp
www.researcherid.com/rid/B-6312-2009


Bastien Boussau

Apr 27, 2011, 8:11:09 PM
to Zorro - The masked assembler
Hi Gustavo,

I am using AMOS 3.0.0, on a Linux version 2.6.35-22-server
(buildd@allspice) (gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu4) )
#33-Ubuntu SMP Sun Sep 19 20:48:58 UTC 2010.
Minimus2 used to run well in this directory, although on different
files.

I did a test where I only used the first 5000 lines of the three
required files, and still got the same error, so I can send you
these small files if you want to test them.

Thank you for your help!

Bastien.
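
A subsample like the one described above (the first 5000 lines of each input file) can be produced with a few lines of Python. This is only a sketch: the function name `head_file` is made up for the example, and cutting at a fixed line count may truncate the last FASTA record in the middle of its sequence.

```python
from itertools import islice


def head_file(src, dst, n_lines=5000):
    """Copy the first n_lines lines of src to dst.

    Returns the number of lines actually written, which is smaller
    than n_lines when src is shorter than that.
    """
    written = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in islice(fin, n_lines):
            fout.write(line)
            written += 1
    return written
```

Running it once per input (the two assemblies and the read file) yields a small test set that is quick to share and to rerun.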





Steve

May 26, 2011, 8:46:37 AM
to Zorro - The masked assembler
I am getting the same error as above; the problem appears to
begin at this point:

Read bank is out.REPASSEMBLY.bnk
Alignment error rate is 0.06
Minimum overlap bases is 40
** AMOS Exception **
WHAT: Could not open bank file, out.REPASSEMBLY.bnk/RED.ifo, No such file or directory
LINE: 1037
FILE: Bank_AMOS.cc

Did you have any luck solving this problem after?

Cheers,

Stephen

Steve

May 26, 2011, 8:54:10 AM
to Zorro - The masked assembler
Was looking at the error again and the RED.ifo file it is looking for
is not in the folder out.REPASSEMBLY.bnk. Can't see where this file is
generated though.

Bastien Boussau

May 26, 2011, 7:29:51 PM
to Zorro - The masked assembler
Hi Stephen,

I did not have any luck solving this problem, which is a pity since
Zorro looked promising. I use Minimus2 instead.

If you find a way to make Zorro work, please post it here!

Best,

Bastien

Gustavo Lacerda

May 26, 2011, 9:41:59 PM
to zorro-a...@googlegroups.com
Hi Bastien and Steve,

I could reproduce this error with the small dataset that Bastien sent me.
Zorro assembles contigs in two stages. First, all confident overlaps (those not caused by repeats) are found. This stage is called CORE ASSEMBLY and yields the Zorro contigs named HybridContigNNN in the Zorro output.

Often, whole input contigs or contig regions cannot be merged in the CORE ASSEMBLY. For these sequences, we run a second, less stringent assembly (SINGLETS ASSEMBLY), in which we try to merge the repeat regions and generate the RepContigsNNN that appear in the Zorro output.

What is probably happening with your assemblies is that no sequences were left for the singlets assembly, so RED.ifo ends up empty and AMOS reports an error. In the next release, I will check whether any sequences are left for the second assembly stage to avoid this error (thanks for your feedback). The good news is that this AMOS error message does not cause problems for the rest of the Zorro pipeline, and the file out.ZORRO.fasta should contain the final Zorro assembly. You can safely disregard this AMOS error message.

Personally, I had never triggered this error before because there were always sequences left for the second assembly stage (SINGLETS ASSEMBLY).
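
The kind of check described here, testing whether any sequences are left before starting the second stage, could be sketched in Python as below. This is not Zorro's actual code; both function names are hypothetical, and the only assumption about the data is that the leftover sequences sit in a FASTA file such as out.singlets.fasta.

```python
def count_fasta_records(path):
    """Count the header lines (those starting with '>') in a FASTA file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def should_run_singlets_assembly(singlets_fasta):
    """Return True only if at least one sequence is left for the second stage."""
    return count_fasta_records(singlets_fasta) > 0
```

A pipeline could call `should_run_singlets_assembly("out.singlets.fasta")` and skip the AMOS calls for the second stage when it returns False, instead of letting them fail on an empty bank.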

I would like to stress that the two assemblies given to Zorro should each represent the whole genome. The read dataset should also be representative of the whole genome. Finally, a Zorro assembly is only as good as the assemblies you give it. If you have reason to believe that one of the assemblies contains many errors, it is useful to experiment with the --noassemblyNbreak parameters (if assembly 1 is bad, use --noassembly2break).

For the next release, 2.3, I promise to document Zorro better, to try to multithread some sections of the code (especially those that use nucmer), and to hide the temporary files somewhere. Some minor bugs are also being fixed.

Regards,

Gustavo



Steve

May 27, 2011, 10:25:50 AM
to Zorro - The masked assembler
Hi Bastien,

What size files do you use with minimus2 and what kind of run times?

Cheers,

Stephen