Benchmarking reassembly/Loading fasta files

N.E.Whiteford

Jul 13, 2006, 1:04:36 PM
to sta...@magpie.bio.indiana.edu
Hi All,

As part of my PhD project I'm working on a tool to benchmark reassembly
algorithms. To do this I'm planning on doing the following:

1. Taking a sequence file and breaking it into reads of a specified
length, adding errors during this process.

2. Reassembling these simulated reads with the reassembly programs
available in GAP4.

3. Aligning contigs of a useful size to the original sequence, noting
those that align within a given edit distance.

4. Calculating the percentage of the sequence that is covered by contigs.
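
Steps 1 and 4 above could be sketched roughly as follows. This is only an illustration, not the actual tool: the function names `simulate_reads` and `coverage_percent` are hypothetical, the error model is assumed to be substitutions only at a uniform per-base rate, and contig placements are assumed to be available as half-open `(start, end)` intervals on the original sequence.

```python
import random


def simulate_reads(sequence, read_length, step=1, error_rate=0.0, seed=None):
    """Cut `sequence` into overlapping reads and add substitution errors.

    Returns a list of (start, read) pairs. Assumes a DNA alphabet and
    substitution errors only (no insertions or deletions).
    """
    rng = random.Random(seed)
    alphabet = "ACGT"
    reads = []
    for start in range(0, len(sequence) - read_length + 1, step):
        bases = list(sequence[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Replace the base with a different one chosen at random.
                bases[i] = rng.choice([b for b in alphabet if b != base])
        reads.append((start, "".join(bases)))
    return reads


def coverage_percent(intervals, sequence_length):
    """Percentage of positions covered by half-open [start, end) intervals."""
    covered = 0
    current_end = 0  # rightmost position counted so far
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip the part already counted
        end = min(end, sequence_length)
        if end > start:
            covered += end - start
            current_end = end
    return 100.0 * covered / sequence_length
```

With `error_rate=0.0` and `step=1`, a 23-base sequence and a read length of 20 gives four error-free reads, each shifted by one base from the previous one.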

I have just completed the edit-distance alignment tool and am now
beginning the process of benchmarking reassembly algorithms. Does
anybody have any thoughts or suggestions? I should say that my main
interest is short read reassembly.
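
For readers unfamiliar with aligning a contig to a longer sequence within an edit distance, here is a minimal sketch. It is not the tool described above: it assumes unit costs for substitutions, insertions and deletions, and uses the standard approximate-matching variant of the edit-distance recurrence in which the match may start anywhere in the reference. The name `best_match` is hypothetical.

```python
def best_match(contig, reference):
    """Find the best approximate occurrence of `contig` inside `reference`.

    Returns (edit_distance, end), where `end` is the exclusive end position
    in `reference` of the first best-scoring match.
    """
    # Row 0 is all zeros: starting the match anywhere in the reference is free.
    previous = [0] * (len(reference) + 1)
    for i, c in enumerate(contig, start=1):
        current = [i]  # matching i contig bases against nothing costs i
        for j, r in enumerate(reference, start=1):
            current.append(min(
                previous[j] + 1,             # contig base with no partner
                current[j - 1] + 1,          # reference base with no partner
                previous[j - 1] + (c != r),  # match or substitution
            ))
        previous = current
    distance = min(previous)
    return distance, previous.index(distance)


# A contig "aligns within a given edit distance" if the best score is small enough.
distance, end = best_match("AATCAG", "CCAATTAGTCC")
aligned = distance <= 2
```

This runs in time proportional to the product of the two lengths, which is fine for short contigs but would need banding or an index for genome-scale references.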

Secondly, I'm having a problem with GAP4. It seems to load at most
19 sequences from my fasta file. My fasta file looks like this:

>R0
CCAATTAGTCCTATTAAGAC

>R1
CAATTAGTCCTATTAAGACT

>R2
AATTAGTCCTATTAAGACTG

>R3
ATTAGTCCTATTAAGACTGT

However, if I include more than 19 sequences in my fasta file, I
get the following error:

Failed files:
/home/new/A1.fasta (UNK) 'init: Unknown file type'

Is this a bug? Or am I doing something wrong?
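
One way to rule out the file itself, independently of GAP4, is to parse it with a few lines of code and count the records. The sketch below is a generic FASTA reader, not anything GAP4 does internally; `read_fasta` is a hypothetical helper, and it assumes blank lines between records can be ignored, as in the layout shown above.

```python
import io


def read_fasta(lines):
    """Parse FASTA text from an iterable of lines into (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before the first '>' header")
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


# The four records shown above, in the same layout.
example = io.StringIO(
    ">R0\nCCAATTAGTCCTATTAAGAC\n\n"
    ">R1\nCAATTAGTCCTATTAAGACT\n\n"
    ">R2\nAATTAGTCCTATTAAGACTG\n\n"
    ">R3\nATTAGTCCTATTAAGACTGT\n"
)
records = read_fasta(example)
print(len(records), "records;", {len(seq) for _, seq in records}, "read lengths")
```

For a real file, an open file handle can be passed in place of the `io.StringIO` object. If the record count and read lengths come out as expected, the file is at least well-formed FASTA.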

Many Thanks for Reading,

Nava Whiteford