Error with PathConsensus ...


marcovth

Dec 10, 2014, 9:56:35 AM
to abyss...@googlegroups.com
Hello ...

I have been running this script ...

#!/bin/bash 

set -eu 

contigs=$1 
reads1=$2
reads2=$3 
k=$4 
output=$5 

bwa index $contigs 
bwa mem -a -t16 -S -P -k$k $contigs $reads1 $reads2 | gzip > $contigs.sam.gz 
abyss-longseqdist -k$k  $contigs.sam.gz | grep -v "l=" > $contigs.dist.dot 
abyss-scaffold -v -k$k -s200- -n1 -g $contigs.path.dot $contigs $contigs.dist.dot > $contigs.path 
PathConsensus -v -k$k -p1 -s ${contigs}2 -g ${contigs}2.adj -o ${contigs}2.path $contigs $contigs $contigs.path 
cat $contigs ${contigs}2.fa | MergeContigs -v -k$k -o $output - ${contigs}2.adj ${contigs}2.path 


However, with PathConsensus I am getting the following error ...

PathConsensus -v -k 25 -p1 -s Cgriseus17AGY.seq2 -g Cgriseus17AGY.seq2.adj -o Cgriseus17AGY.seq2.path Cgriseus17AGY.seq Cgriseus17AGY.seq Cgriseus17AGY.seq.path 

Reading `Cgriseus17AGY.seq'...

error: Expected `=' and saw `r'



What could be wrong with my Cgriseus17AGY.seq file?

This file consists of these FASTA files concatenated into a single file ...

Thanks for any suggestions.

- Marco.

Ben Vandervalk

Dec 10, 2014, 12:09:14 PM
to marcovth, abyss...@googlegroups.com
Hi Marco,

On the line:

$ PathConsensus -v -k$k -p1 -s ${contigs}2 -g ${contigs}2.adj -o ${contigs}2.path $contigs $contigs $contigs.path 

Replace the second occurrence of $contigs with '$contigs.path.dot'.

- Ben


marcovth

Dec 10, 2014, 1:23:10 PM
to abyss...@googlegroups.com, marc...@gmail.com

Thanks a lot for the reply.

Do you mean this? ...

PathConsensus -v -k 25 -p1 -s Cgriseus17AGY.seq2 -g Cgriseus17AGY.seq2.adj -o Cgriseus17AGY.seq2.path Cgriseus17AGY.seq Cgriseus17AGY.seq.path.dot Cgriseus17AGY.seq.path 

Reading `Cgriseus17AGY.seq.path.dot'...

Reading `Cgriseus17AGY.seq'...

error: unexpected ID: `gi|530169401|gb|APMK01000002.1|'


I also tried the following, without success ...

PathConsensus -v -k 25 -p1 -s Cgriseus17AGY.seq2 -g Cgriseus17AGY.seq2.adj -o Cgriseus17AGY.seq2.path Cgriseus17AGY.seq.path.dot Cgriseus17AGY.seq Cgriseus17AGY.seq.path 



Ben Vandervalk

Dec 10, 2014, 4:24:34 PM
to marcovth, abyss...@googlegroups.com
Hi Marco,

The first command line is the right one, I think (based on PathConsensus --help).

Sorry to hear you are getting another error, though. 

I notice that 'gi|530169401|gb|APMK01000002.1|' is only the second sequence in your list of input FASTA sequences.   That leads me to suspect that the problem is due to using multiline FASTA as input.
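One way to test that suspicion before rerunning the pipeline is to check whether any record in the file actually wraps its sequence over several lines. This is a minimal sketch assuming plain, uncompressed FASTA; `is_multiline_fasta` is a hypothetical helper, not an ABySS tool:

```shell
# Hypothetical helper (not an ABySS tool). Exit status 0 if any FASTA record
# spreads its sequence over more than one line, 1 otherwise.
# Reads the named files, or stdin if none are given.
is_multiline_fasta() {
  awk '
    /^>/    { lines = 0; next }                # header: start a new record
    NF == 0 { next }                           # ignore blank lines
    { if (++lines > 1) { found = 1; exit } }   # second sequence line seen
    END     { exit (found ? 0 : 1) }
  ' "$@"
}
```

For example, `is_multiline_fasta Cgriseus17AGY.seq && echo multi-line` would print `multi-line` only if some record is wrapped. Blank lines are skipped so that an empty line between records is not mistaken for wrapping.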

Could you try converting to single-line fasta and running your script again?  I think you can do the conversion with:

$ abyss-tofastq Cgriseus17AGY.seq > single-line.fasta
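As an alternative to `abyss-tofastq`, a single awk pass can also linearize multi-line FASTA. This is a minimal sketch (the function name `flatten_fasta` is hypothetical); it streams the input and keeps only one record's sequence in memory at a time:

```shell
# Hypothetical helper: rewrite FASTA so each record is exactly one header line
# followed by one sequence line. Reads the named files, or stdin if none.
flatten_fasta() {
  awk '
    /^>/ {
      if (seq != "") print seq   # emit the sequence of the previous record
      print                      # emit this header unchanged
      seq = ""
      next
    }
    NF > 0 { seq = seq $0 }      # append sequence lines, skipping blank ones
    END { if (seq != "") print seq }
  ' "$@"
}
```

Usage would be along the lines of `flatten_fasta Cgriseus17AGY.seq > single-line.fasta`.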

Another thing to try would be replacing the FASTA IDs with integers (to see if the problem is related to parsing the FASTA headers.)
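Replacing the IDs can be done in one streaming pass as well. A minimal sketch, assuming you want to be able to map the new IDs back to the original headers afterwards; `renumber_fasta` and the mapping-file layout are hypothetical, and numbering from 0 is an arbitrary choice:

```shell
# Hypothetical helper: replace every FASTA ID with a sequential integer and
# write "new_id<TAB>original header text" lines to MAPFILE so the renaming can
# be undone later. Sequence lines pass through unchanged.
# Usage: renumber_fasta MAPFILE [FASTA...]   (reads stdin if no FASTA given)
renumber_fasta() {
  local map=$1
  shift
  awk -v map="$map" '
    BEGIN { n = 0 }                          # first new ID is 0
    /^>/ {
      print n "\t" substr($0, 2) > map       # record new ID and old header
      print ">" n                            # emit the integer-only header
      n++
      next
    }
    { print }
  ' "$@"
}
```

For example, `renumber_fasta ids.tsv Cgriseus17AGY.seq > renumbered.fa` would write the renamed sequences plus an `ids.tsv` lookup table.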

- Ben

marcovth

Dec 11, 2014, 10:52:35 AM
to abyss...@googlegroups.com
Unfortunately, abyss-tofastq didn't make a difference.

Do you know of a tool that could delete the "Cricetulus griseus strain 17A/GY chromosome 1 chr1_contig_64483, whole genome shotgun sequence" part of each FASTA ID?
The fasta file is too big for a text editor.

e.g. 

>gi|529499428|gb|APMK01319215.1| Cricetulus griseus strain 17A/GY chromosome 1 chr1_contig_64483, whole genome shotgun sequence

Ka Ming Nip

Dec 11, 2014, 3:15:09 PM
to abyss...@googlegroups.com
Hi Marco,

You can use the Unix `cut` command, for example:

$ cut -f1 -d ' ' input_reads.fa | abyss-tofastq > output_reads.fa

Note:
-d ' ' to set space character as delimiter
-f1 to extract the first column
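`cut` is applied to every line, which works here because FASTA sequence lines contain no spaces. A variant that only touches header lines is sketched below with `sed` (the function name is hypothetical); like `cut`, it streams the file, so file size is not a problem:

```shell
# Hypothetical helper: on FASTA header lines only, delete everything from the
# first space or tab onward, keeping just the ID. Sequence lines are passed
# through unchanged. Reads the named files, or stdin if none are given.
strip_fasta_descriptions() {
  sed '/^>/s/[[:space:]].*//' "$@"
}
```

For example, `strip_fasta_descriptions input_reads.fa > output_reads.fa` turns `>gi|529499428|gb|APMK01319215.1| Cricetulus griseus ...` into `>gi|529499428|gb|APMK01319215.1|`.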

Hope that helps!

Ka Ming