Error with PathConsensus ...

marcovth

Dec 10, 2014, 9:56:35 AM

to abyss...@googlegroups.com

Hello ...

I have been running this script ...

#!/bin/bash 

set -eu 

contigs=$1 
reads1=$2
reads2=$3 
k=$4 
output=$5 

bwa index $contigs 
bwa mem -a -t16 -S -P -k$k $contigs $reads1 $reads2 | gzip > $contigs.sam.gz 
abyss-longseqdist -k$k  $contigs.sam.gz | grep -v "l=" > $contigs.dist.dot 
abyss-scaffold -v -k$k -s200- -n1 -g $contigs.path.dot $contigs $contigs.dist.dot > $contigs.path 
PathConsensus -v -k$k -p1 -s ${contigs}2 -g ${contigs}2.adj -o ${contigs}2.path $contigs $contigs $contigs.path 
cat $contigs ${contigs}2.fa | MergeContigs -v -k$k -o $output - ${contigs}2.adj ${contigs}2.path

However, with PathConsensus I am getting the following error ...

PathConsensus -v -k 25 -p1 -s Cgriseus17AGY.seq2 -g Cgriseus17AGY.seq2.adj -o Cgriseus17AGY.seq2.path Cgriseus17AGY.seq Cgriseus17AGY.seq Cgriseus17AGY.seq.path 
Reading `Cgriseus17AGY.seq'...
error: Expected `=' and saw `r'

What could be wrong with my Cgriseus17AGY.seq file?

This file consists of these FASTA files concatenated into a single file ...

http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=APMK01#contigs

Thanks for any suggestion.

- Marco.

Ben Vandervalk

Dec 10, 2014, 12:09:14 PM

to marcovth, abyss...@googlegroups.com

Hi Marco,

On the line:

$ PathConsensus -v -k$k -p1 -s ${contigs}2 -g ${contigs}2.adj -o ${contigs}2.path $contigs $contigs $contigs.path

Replace the second occurrence of $contigs with '$contigs.path.dot'.

- Ben

marcovth

Dec 10, 2014, 1:23:10 PM

to abyss...@googlegroups.com, marc...@gmail.com

Thanks a lot for the reply.

Do you mean? ...

PathConsensus -v -k 25 -p1 -s Cgriseus17AGY.seq2 -g Cgriseus17AGY.seq2.adj -o Cgriseus17AGY.seq2.path Cgriseus17AGY.seq Cgriseus17AGY.seq.path.dot Cgriseus17AGY.seq.path

Reading `Cgriseus17AGY.seq.path.dot'...

Reading `Cgriseus17AGY.seq'...

error: unexpected ID: `gi|530169401|gb|APMK01000002.1|'

I also tried without success ...

PathConsensus -v -k 25 -p1 -s Cgriseus17AGY.seq2 -g Cgriseus17AGY.seq2.adj -o Cgriseus17AGY.seq2.path Cgriseus17AGY.seq.path.dot Cgriseus17AGY.seq Cgriseus17AGY.seq.path

Ben Vandervalk

Dec 10, 2014, 4:24:34 PM

to marcovth, abyss...@googlegroups.com

Hi Marco,

The first command line is the right one, I think (based on PathConsensus --help).

Sorry to hear you are getting another error, though.

I notice that 'gi|530169401|gb|APMK01000002.1|' is only the second sequence in your list of input FASTA sequences. That leads me to suspect that the problem is due to using multiline FASTA as input.

Could you try converting to single-line fasta and running your script again? I think you can do the conversion with:

$ abyss-tofastq Cgriseus17AGY.seq > single-line.fasta
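If you want to see what the conversion amounts to, here is a minimal awk sketch that does the same kind of unwrapping. The function name `unwrap_fasta` is just something I made up for illustration, and I am assuming a plain-text FASTA with Unix line endings:

```shell
# Join wrapped sequence lines so that each record becomes
# one header line followed by exactly one sequence line.
# (unwrap_fasta is a hypothetical helper name, not an ABySS tool.)
unwrap_fasta() {
    awk '
        /^>/ { if (seq != "") print seq; print; seq = ""; next }  # header: flush previous sequence
        { seq = seq $0 }                                          # sequence line: append
        END { if (seq != "") print seq }                          # flush the last record
    '
}

# Small demonstration on an inline two-record example.
printf '>a desc\nACGT\nAC\n>b\nGG\nTT\n' | unwrap_fasta
```

It reads from standard input, so on a real file it would be used as `unwrap_fasta < input.fa > single-line.fasta`.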

Another thing to try would be replacing the FASTA IDs with integers (to see if the problem is related to parsing the FASTA headers.)
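A rough sketch of how the renumbering could be done with awk, in case it helps. The function name `renumber_fasta` and the idea of writing an ID map to a side file are my own assumptions, not part of ABySS; the description text in the demo header is made up:

```shell
# Replace each FASTA header with a running integer (0, 1, 2, ...)
# and record "new_id <TAB> original header" in a map file.
# $1: path of the map file to write (hypothetical, chosen by the caller).
renumber_fasta() {
    awk -v map="$1" '
        BEGIN { n = 0 }
        /^>/ {
            printf "%d\t%s\n", n, substr($0, 2) > map   # remember the original header
            print ">" n                                 # emit the integer ID
            n++
            next
        }
        { print }                                       # sequence lines pass through unchanged
    '
}

# Small demonstration; the map goes to a temporary file.
map_file=$(mktemp)
printf '>gi|530169401|gb|APMK01000002.1| example description\nACGT\n' | renumber_fasta "$map_file"
cat "$map_file"
rm -f "$map_file"
```

The renumbered file would have to be the one passed as `$contigs` from the `bwa index` step onward, so that the .sam.gz, .dot and .path files all refer to the same IDs.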

- Ben

marcovth

Dec 11, 2014, 10:52:35 AM

to abyss...@googlegroups.com

Unfortunately, abyss-tofastq didn't make a difference.

Do you know of a tool that could delete the description (e.g. "Cricetulus griseus strain 17A/GY chromosome 1 chr1_contig_64483, whole genome shotgun sequence") from each of the FASTA IDs?

The fasta file is too big for a text editor.

e.g.

>gi|529499428|gb|APMK01319215.1| Cricetulus griseus strain 17A/GY chromosome 1 chr1_contig_64483, whole genome shotgun sequence

Ka Ming Nip

Dec 11, 2014, 3:15:09 PM

to abyss...@googlegroups.com

Hi Marco,

You can use the Unix `cut` command, e.g.

$ cut -f1 -d ' ' input_reads.fa | abyss-tofastq > output_reads.fa

Note:
-d ' ' to set space character as delimiter
-f1 to extract the first column
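To see the effect on a single record, here is a small sketch using the header from your message. The wrapper name `strip_fasta_descriptions` and the short sequence line are made up for illustration:

```shell
# Keep only the text before the first space on every line.
# Lines without a space (the sequence lines) pass through unchanged.
# (strip_fasta_descriptions is a hypothetical wrapper around cut.)
strip_fasta_descriptions() {
    cut -f1 -d ' '
}

# Demonstration on the example header, followed by a made-up sequence line.
printf '%s\n' \
    '>gi|529499428|gb|APMK01319215.1| Cricetulus griseus strain 17A/GY chromosome 1 chr1_contig_64483, whole genome shotgun sequence' \
    'ACGTACGT' | strip_fasta_descriptions
```

Since `cut` streams the file line by line, the size of the FASTA file should not be an issue the way it is for a text editor.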

Hope that helps!

Ka Ming
