Error in stage1, perhaps merge.pl

kadota

Feb 16, 2011, 1:23:24 AM
to Trans-ABySS, kad...@iu.a.u-tokyo.ac.jp
Hi,

I would like to assemble Illumina 76 bp paired-end data without a
reference genome.
For simplicity, I explain the problem using assemblies for two k values
(k=59 and k=61).
I used ABySS ver.1.2.5 and trans-ABySS ver.1.2.0.
Could you advise me on how to troubleshoot this problem?
The error is as follows:
----------------------------------------------------------------
[kadota@hpcs02 ~]$ trans-abyss -i input -1
input: input
/home/kadota
topdir: /home/kadota/L1; reference: none
...
/home/kadota/trans-ABySS/utilities/facN /home/kadota/L1/Assembly/abyss-1.2.5/k59/L1-contigs.fa /home/kadota/L1/Assembly/abyss-1.2.5/k61/L1-contigs.fa >/home/kadota/L1/Assembly/abyss-1.2.5/stats
time /home/kadota/trans-ABySS/wrappers/merge.pl /home/kadota/L1/Assembly/abyss-1.2.5 L1 contigs /home/kadota/L1/Assembly/abyss-1.2.5/merge/merging local
Maximum single piece size (5000) exceeded by query 1106370 of size (6471). Larger pieces will have to be split up until no larger than this limit when the -fastMap option is used.
/home/kadota/bin/x86_64/blat /home/kadota/L1/Assembly/abyss-1.2.5/k61/L1-contigs.fa /home/kadota/L1/Assembly/abyss-1.2.5/k59/L1-contigs.fa /home/kadota/L1/Assembly/abyss-1.2.5/merge/merging/contigs/round1/k59-k61.psl -minIdentity=100 -maxGap=0 -fastMap failed: 65280 at /home/kadota/trans-ABySS/wrappers/merge.pl line 209, <FILE> line 18.
Command exited with non-zero status 9
52.62user 0.66system 0:53.29elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+31134minor)pagefaults 0swaps
local? local
parse: /usr/local/python/bin/python /home/kadota/trans-ABySS/utilities/contig_merger.py
k59 k61
/home/kadota/L1/Assembly/abyss-1.2.5/k59/L1-contigs.fa /home/kadota/L1/Assembly/abyss-1.2.5/k61/L1-contigs.fa /home/kadota/L1/Assembly/abyss-1.2.5/merge/merging/contigs/round1
.::/home/kadota/trans-ABySS
.::/home/kadota/trans-ABySS
mkdir /home/kadota/L1/Assembly/abyss-1.2.5/merge/merging/contigs/round1
/home/kadota/bin/x86_64/blat /home/kadota/L1/Assembly/abyss-1.2.5/k61/L1-contigs.fa /home/kadota/L1/Assembly/abyss-1.2.5/k59/L1-contigs.fa /home/kadota/L1/Assembly/abyss-1.2.5/merge/merging/contigs/round1/k59-k61.psl -minIdentity=100 -maxGap=0 -fastMap
Loaded 29360156 letters in 148938 sequences
/home/kadota/trans-ABySS/utilities/facN /home/kadota/L1/Assembly/abyss-1.2.5/merge/L1-contigs.fa >/home/kadota/L1/Assembly/abyss-1.2.5/merge/stats_merged_assembly
/home/kadota/L1/Assembly/abyss-1.2.5/merge/L1-contigs.fa: No such file or directory
[kadota@hpcs02 ~]$
----------------------------------------------------------------

My detailed settings are as follows:
[kadota@hpcs02 ~]$ tail -n 3 ~/trans-ABySS/configs/projects.cfg
[test1]
topdir: /home/kadota
reference: none

[kadota@hpcs02 ~]$ more input
L1 1.2.5 /home/kadota/LIB0001 test1 75

[kadota@hpcs02 k59]$ ls -la /home/kadota/LIB0001/k59
total 1479588
drwxr-xr-x 2 kadota users 4096 Feb 16 12:03 .
drwxr-xr-x 4 kadota users 4096 Feb 16 15:21 ..
-rw-r--r-- 1 kadota users 103421 Feb 16 12:03 coverage.hist
-rw-r--r-- 1 kadota users 49644807 Feb 16 12:03 L1-1.adj
-rw-r--r-- 1 kadota users 111420924 Feb 16 12:03 L1-1.fa
-rw-r--r-- 1 kadota users 31156 Feb 16 12:03 L1-1.path
-rw-r--r-- 1 kadota users 49548883 Feb 16 12:03 L1-3.adj
-rw-r--r-- 1 kadota users 4801217 Feb 16 12:03 L1-3.dist
-rw-r--r-- 1 kadota users 111214101 Feb 16 12:03 L1-3.fa
-rw-r--r-- 1 kadota users 3747 Feb 16 12:03 L1-3.hist
-rw-r--r-- 1 kadota users 840436736 Feb 16 12:03 L1-3.sam.gz
-rw-r--r-- 1 kadota users 49742622 Feb 16 12:03 L1-4.adj
-rw-r--r-- 1 kadota users 390488 Feb 16 12:03 L1-4.fa
-rw-r--r-- 1 kadota users 1625338 Feb 16 12:03 L1-4.path1
-rw-r--r-- 1 kadota users 1347312 Feb 16 12:03 L1-4.path2
-rw-r--r-- 1 kadota users 49742622 Feb 16 12:03 L1-5.adj
-rw-r--r-- 1 kadota users 0 Feb 16 12:03 L1-5.fa
-rw-r--r-- 1 kadota users 1347312 Feb 16 12:03 L1-5.path
-rw-r--r-- 1 kadota users 253527 Feb 16 12:03 L1-bubbles.fa
-rw-r--r-- 1 kadota users 131465354 Feb 16 12:03 L1-contigs.dot
-rw-r--r-- 1 kadota users 110271393 Feb 16 12:03 L1-contigs.fa
-rw-r--r-- 1 kadota users 105246 Feb 16 12:03 L1-indel.fa

[kadota@hpcs02 k59]$ ls -la /home/kadota/LIB0001/k61
total 1273848
drwxr-xr-x 2 kadota users 4096 Feb 15 03:18 .
drwxr-xr-x 4 kadota users 4096 Feb 16 15:21 ..
-rw-r--r-- 1 kadota users 97811 Feb 15 00:26 coverage.hist
-rw-r--r-- 1 kadota users 41292962 Feb 15 00:58 L1-1.adj
-rw-r--r-- 1 kadota users 97024066 Feb 15 00:58 L1-1.fa
-rw-r--r-- 1 kadota users 21956 Feb 15 00:58 L1-1.path
-rw-r--r-- 1 kadota users 41222156 Feb 15 00:58 L1-3.adj
-rw-r--r-- 1 kadota users 4715240 Feb 15 03:15 L1-3.dist
-rw-r--r-- 1 kadota users 96867810 Feb 15 00:58 L1-3.fa
-rw-r--r-- 1 kadota users 3666 Feb 15 02:40 L1-3.hist
-rw-r--r-- 1 kadota users 728191472 Feb 15 02:54 L1-3.sam.gz
-rw-r--r-- 1 kadota users 41470218 Feb 15 03:15 L1-4.adj
-rw-r--r-- 1 kadota users 535017 Feb 15 03:15 L1-4.fa
-rw-r--r-- 1 kadota users 1635096 Feb 15 03:17 L1-4.path1
-rw-r--r-- 1 kadota users 1346165 Feb 15 03:17 L1-4.path2
-rw-r--r-- 1 kadota users 41470218 Feb 15 03:17 L1-5.adj
-rw-r--r-- 1 kadota users 0 Feb 15 03:17 L1-5.fa
-rw-r--r-- 1 kadota users 1346165 Feb 15 03:17 L1-5.path
-rw-r--r-- 1 kadota users 198717 Feb 15 00:57 L1-bubbles.fa
-rw-r--r-- 1 kadota users 109427728 Feb 15 03:18 L1-contigs.dot
-rw-r--r-- 1 kadota users 96068984 Feb 15 03:18 L1-contigs.fa
-rw-r--r-- 1 kadota users 79230 Feb 15 00:59 L1-indel.fa

[kadota@hpcs02 k59]$ ls -la /home/kadota/L1/Assembly/abyss-1.2.5/k59
total 39088
drwxr-xr-x 2 kadota users 4096 Feb 16 15:11 .
drwxr-xr-x 10 kadota users 4096 Feb 16 13:59 ..
-rw-r--r-- 1 kadota users 39970038 Feb 16 15:11 L1-contigs.fa

[kadota@hpcs02 k59]$ ls -la /home/kadota/L1/Assembly/abyss-1.2.5/k61
total 34704
drwxr-xr-x 2 kadota users 4096 Feb 16 15:13 .
drwxr-xr-x 10 kadota users 4096 Feb 16 13:59 ..
-rw-r--r-- 1 kadota users 35486814 Feb 16 15:13 L1-contigs.fa

R R

Feb 16, 2011, 4:05:30 AM
to trans...@googlegroups.com
trans-ABySS uses Blat with the -fastMap option as the alignment tool for merging assemblies from different k values. There seems to be an error when running blat:

"Maximum single piece size (5000) exceeded by query 1106370 of size (6471). Larger pieces will have to be split up until no larger than this limit when the -fastMap option is used."

It seems to say that the query size cannot exceed 5000. Unfortunately, I have no experience with this behavior of blat. Can you check whether blat is installed properly, is the correct version, etc.?
Rong

kadota

Feb 16, 2011, 4:47:19 AM
to Trans-ABySS
Thanks for your quick reply.
I will check it.
Kadota

kadota

Feb 20, 2011, 11:17:49 PM
to Trans-ABySS
Dear trans-ABySS developer,

I checked our data and Blat.

In short, I have three requests:
1) Could you tell me how to remove the "-fastMap" option from the Blat
call in the trans-ABySS pipeline?
and
2) Related to the above, could you add an option to run without
"-fastMap" in the next version of trans-ABySS?
or
3) If you want to keep the "-fastMap" option, could you remove contigs
longer than 5000 bp after Stage 1.A (i.e., after assembly.py)?

ABySS assembly of our 76 bp paired-end data produced many long (> 5000
bp) contigs. These long contigs remain after processing by
"assembly.py" (i.e., Stage 1.A), which is probably the cause of the
error.

I could not find a description of the Blat options in merge.pl.
Your help is greatly appreciated.
Kadota

The following shows the longest contigs after processing the ABySS
contigs with assembly.py:
-----------------------------------------------------
[kadota@hpcs02 ~]$ grep ">" /home/kadota/L1/Assembly/abyss-1.2.5/k59/L1-contigs.fa | sort -n -k2 | tail
>1118118 9002 422099 1079570-,1101672+,717988+,204552+,813601+,1047879-,517278+,88398+,349232-,1101692-,341620+,766281+,972607-
>1110173 9059 393400 341620-,1101692+,349232+,88398-,517278-,1047879+,813601-,204552-,717988-,1101672-,1079570+,572670+,1065211-
>1110172 9185 397278 341620-,1101692+,349232+,88398-,517278-,1047879+,813601-,204552-,717988-,1101672-,1079570+,323310+,1065211-
>1110839 9256 331838 401319+,172601+,25815+,900613+,891206+,752999-,985964+,524974-,986827-,1104424-,362272+,744924+,727097+,1000761-,82727-
>1109742 9278 337308 304144-,214530+,629729-,321166-,59513-,551003-,561074+,776624-
>1116088 9422 206202 879817-,1104269-,326041-,724853+,492878+,694197-,391221-,590454-,1103993-,265374-,497452-,171551-,1103453+,1102331+,1106157-,1101954-,1103946-,255609-
>1110957 10028 378531 412214-,350023-,749782+,766060-,127822-,942308+,648045+,944229-
>1111459 10369 511882 453453+,368780-,1076525-,485521-,732482+,323143-,113035+,939093-,1101921-
>1109997 10813 266698 326895-,190950-,787384-,827443+,1037209-,300745-,749265+,1016255+,639782+,955580-,391003+,1041713+
>1112517 25739 945677 545173+,968076+,658236-,146130-,358061-,692828-,802698-,316238-,940492-,939499+,89676+,481787-,627120-,191567-,499079+,698413+,1101711-,185358+,706199-,1092629+,173399+,130201-
[kadota@hpcs02 ~]$
[kadota@hpcs02 ~]$
[kadota@hpcs02 ~]$ grep ">" /home/kadota/L1/Assembly/abyss-1.2.5/k61/L1-contigs.fa | sort -n -k2 | tail
>946611 9518 292340 228768-,918926+,398726+,122660-,298486-,759703-,43585+,494239+,850027-
>955831 9645 390766 937662-,839989-,800698+,448358-,394161+,846657+,136349+,2922+
>949069 9695 398500 400882+,539675+,665695+,937906-,497672+,890856-,589260-,460321+,385347+
>949068 9821 401949 400882+,240187+,665695+,937906-,497672+,890856-,589260-,460321+,385347+
>955121 10386 345468 880957-,17479+,842474-,591295-,857098+,924336+,773423+,165212+,236873+,807507-
>952429 10488 492724 665695+,937906-,497672+,890856-,589260-,460321+,385347+,392193+,881465+,754964+,139884-,499730-,908399+
>946612 10657 308931 228768-,918926+,398726+,122660-,298486-,759703-,43585+,850026-,226026+,53908+,808971+
>951012 13097 327739 548691-,942469+,905996+,939665-,150953+,939664+,518554+,528608+,53908-,226026-,850026+,43585-,759703+,298486+,122660+,398726-,918926-,228768+
>951915 24516 710757 621774+,818181+,313267+,412510-,654255-,883577+,758739+,349949+,761405-,771263+,392065+,918077+
>951914 24519 711128 621774+,818181+,313266+,412510-,654255-,883577+,758739+,349949+,761405-,771263+,392065+,918077+
[kadota@hpcs02 ~]$
-----------------------------------------------------
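
To see how many contigs exceed the 5000 bp limit, rather than only the ten longest, the header lengths can be tallied directly. Below is a minimal Python sketch, assuming every header has the form ">ID LENGTH COVERAGE ..." as in the listings above. The function name long_contigs and the short sample record are hypothetical, introduced only for illustration.

```python
# Sketch: report contigs whose header length exceeds a size limit.
# Assumes headers look like ">ID LENGTH COVERAGE ..." as in the listing above.

def long_contigs(lines, limit=5000):
    """Return (contig_id, length) pairs for headers whose length exceeds limit."""
    hits = []
    for line in lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        # Skip headers that lack a numeric length in the second field.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        length = int(fields[1])
        if length > limit:
            hits.append((fields[0], length))
    # Sort by length, shortest first, like "sort -n -k2".
    return sorted(hits, key=lambda pair: pair[1])


# Inline sample: two headers are taken from the k59 listing (edge lists
# shortened); the sequence lines are placeholders.
sample = [
    ">1118118 9002 422099 1079570-,1101672+",
    "ACGT",
    ">42 300 1200",  # hypothetical short contig
    "ACGT",
    ">1112517 25739 945677 545173+,968076+",
    "ACGT",
]
for contig_id, length in long_contigs(sample):
    print(contig_id, length)
```

For a real file, the same function can be given an open file handle, e.g. long_contigs(open(path)), since it only iterates over lines.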

R R

Feb 21, 2011, 12:26:20 AM
to trans...@googlegroups.com
To remove the "-fastMap" option, open the "configs/projects.cfg" file and remove "-fastMap" from both the "blat_1_2" and "blat_2_1" lines. However, the run will then take much longer.

Alternatively, to remove contigs >5000bp, simply use awk:
awk '/^>/ { if ($2<=5000) print; getline; print }' contigs.fa
Rong

R R

Feb 21, 2011, 12:29:07 AM
to trans...@googlegroups.com
Sorry, the awk command should be:

awk '/^>/ { if ($2<=5000) { print; getline; print } }' contigs.fa
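
The awk one-liner relies on each sequence occupying exactly one line after its header. If a FASTA file wraps sequences over several lines, a small script may be safer. Here is a minimal Python sketch of the same filter, assuming headers of the form ">ID LENGTH ..." and trusting the length field in the header, as the awk command does. The function name filter_by_length and the sample records are hypothetical.

```python
# Sketch: keep only FASTA records whose header length is within a limit.
# Assumes the second header field is the contig length.

def filter_by_length(lines, max_len=5000):
    """Yield the lines of every record whose header length is <= max_len."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # Records without a numeric length field are dropped (an assumption).
            keep = (
                len(fields) >= 2
                and fields[1].isdigit()
                and int(fields[1]) <= max_len
            )
        if keep:
            yield line


# Hypothetical records; sequences are placeholders and the first one is
# wrapped over two lines to show that multi-line records stay intact.
sample = [
    ">1 12 30",
    "ACGTACGT",
    "ACGT",
    ">2 6471 900",
    "ACGT",
    ">3 5000 10",
    "ACGT",
]
for line in filter_by_length(sample):
    print(line)
```

Records 1 and 3 are kept and record 2 (length 6471) is dropped. For a real file, pass an open file handle and write the yielded lines to a new file.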

kadota

Feb 21, 2011, 1:01:52 AM
to Trans-ABySS
Dear RR
Many thanks for your quick reply.
I will try your advice.
kadota

kadota

Feb 28, 2011, 2:57:15 AM
to Trans-ABySS
Dear Developer,

Thanks to your advice (removing the "-fastMap" option), I was able to
finish stage 1 correctly, as follows:
-----------------------------------------------
[kadota@hpcs02 merge]$ ls -la /home/kadota/L1/Assembly/abyss-1.2.5/merge
total 53332
drwxr-xr-x 4 kadota users 4096 Feb 28 15:52 .
drwxr-xr-x 10 kadota users 4096 Feb 28 11:55 ..
drwxr-xr-x 3 kadota users 4096 Feb 28 11:50 aligns
drwxrwxrwx 3 kadota users 4096 Feb 28 11:55 cluster
-rw-r--r-- 1 kadota users 49860537 Feb 28 15:46 L1-contigs.fa
-rw-r--r-- 1 kadota users 4661685 Feb 28 15:46 log.txt
-rw-r--r-- 1 kadota users 200 Feb 28 15:46 stats_merged_assembly
-----------------------------------------------

However, I encountered another problem in stage 2. It may be related to
a missing file,
"/archive/solexa1_4/analysis/data_path/illumina_data_paths.txt".
According to a previous discussion
(http://groups.google.com/group/trans-abyss/browse_thread/thread/488989bcc16fa8dc),
that bug should have been fixed in trans-ABySS ver. 1.2.0, so my problem
might not be caused by the missing file.

The error message is as follows:
-----------------------------------------------
[kadota@hpcs02 ~]$ pwd
/home/kadota
[kadota@hpcs02 ~]$ trans-abyss -i input -2
input: input
package_setup: export TRANSABYSS_PATH=/home/kadota/trans-ABySS;export PYTHONPATH=.:$PYTHONPATH:$TRANSABYSS_PATH;export PERL5LIB=.:$PERL5LIB:$TRANSABYSS_PATH/wrappers
input:input
L1
/home/kadota/L1/Assembly/abyss-1.2.5/reads_to_contigs/L1-contigs.bam
/home/kadota/L1/Assembly/abyss-1.2.5/reads_to_contigs/cluster
/usr/local/python/bin/python /home/kadota/trans-ABySS/utilities/reads_to_contigs.py L1 /home/kadota/L1/Assembly/abyss-1.2.5/merge/L1-contigs.fa /home/kadota/L1/Assembly/abyss-1.2.5/reads_to_contigs -r
Traceback (most recent call last):
  File "/home/kadota/trans-ABySS/utilities/reads_to_contigs.py", line 544, in <module>
    libs = parse_datapath(options.info)
  File "/home/kadota/trans-ABySS/utilities/reads_to_contigs.py", line 26, in parse_datapath
    for line in open(datapath_file, 'r'):
IOError: [Errno 2] No such file or directory: '/archive/solexa1_4/analysis/data_path/illumina_data_paths.txt'
/usr/local/python/bin/python /home/kadota/trans-ABySS/utilities/reads_to_contigs.py L1 /home/kadota/L1/Assembly/abyss-1.2.5/merge/L1-contigs.fa /home/kadota/L1/Assembly/abyss-1.2.5/reads_to_contigs -c -a bowtie

/usr/local/python/bin/python /home/kadota/trans-ABySS/utilities/reads_to_contigs.py L1 /home/kadota/L1/Assembly/abyss-1.2.5/merge/L1-contigs.fa /home/kadota/L1/Assembly/abyss-1.2.5/reads_to_contigs -a bowtie
[kadota@hpcs02 ~]$
-----------------------------------------------

Your help is greatly appreciated.
kadota

R R

Feb 28, 2011, 12:15:01 PM
to trans...@googlegroups.com
You need an "in" file that specifies the read files. Please refer to the user manual for the files expected in the ABySS directory.
Rong

kadota

Feb 28, 2011, 8:34:17 PM
to Trans-ABySS
Thanks for your advice.
I did not use an "in" file in the ABySS step because the shell script
"run-abyss" did not work well with an "in" file. Anyway, I will
create one.
Thank you very much.
kadota