Thanks Ben,
I’ve been involved in some genome projects, but I’m stumbling through do-it-yourself assembly :)
Our N50 is just low, hence my concerns. With ALLPATHS, using both mate-pair (MP) and paired-end (PE) data, this assembly came out at a contig N50 of ~12 kb. In another concurrent assembly with similar data (different facility, different species) we had an N50 of 100 kb. We are working with birds, which are usually not too hard to assemble. The strategy was one lane of MP and one lane of PE per species.
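For reference, here is a minimal sketch of how the N50 figures above are defined, assuming all you have is a list of contig lengths. The function name `n50` and the toy lengths are made up for illustration; they are not output from ALLPATHS or any other assembler.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input: no contigs, so no N50


# Toy lengths (hypothetical): total is 54, half is 27, and 10 + 9 + 8
# reaches 27, so the N50 is 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 depends only on the length distribution, a handful of long contigs can raise it a lot, which is one reason two assemblies from similar data can differ so much.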
I was reading the ALLPATHS documentation, and our data looked pretty similar to what they call “poor data quality”, where you get a major correction (loss of rare k-mers) during the ALLPATHS error-correction step. Anyway, I couldn’t think of anything besides contamination that would lead to such a poor assembly. I haven’t ruled out contamination yet.
Now that the k-mer profile seems OK, maybe I am overlooking something obvious. Maybe the sequencing depth for the worse assembly is just a bit lower, and falls below some threshold above which contiguity stats jump sharply.
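A rough way to compare depth between the two assemblies is the usual estimate of reads times read length divided by genome size. The sketch below assumes that formula; every number in it (read count, 100 bp reads, a 1.2 Gb genome) is a placeholder, not a figure from our runs.

```python
def mean_depth(n_reads, read_length, genome_size):
    """Estimate mean sequencing depth as total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


# Placeholder values: 200 million read pairs (400 million reads),
# 100 bp reads, and an assumed 1.2 Gb genome.
depth = mean_depth(n_reads=400_000_000, read_length=100, genome_size=1_200_000_000)
print(f"estimated depth: {depth:.1f}x")
```

Running the same calculation on the trimmed reads of each species would show whether the worse assembly really has lower depth.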
ABySS command (MP data not added yet):
abyss-pe k=32 np=8 name=AGPHv2 lib='pe4 pe9' pe4='MH-S1_S4_L005_R1_001_val_1.fq.gz MH-S1_S4_L005_R2_001_val_2.fq.gz' pe9='MH-S1_S9_L006_R1_001_val_1.fq.gz MH-S1_S9_L006_R2_001_val_2.fq.gz' &