The final output file is ${name}-contigs.fa. The files
${name}-bubbles.fa and ${name}-indel.fa contain variant sequences. The
other files are intermediate steps and can be ignored.
If the ${name}-contigs.fa file is not present, either the assembly is
still in progress, or the assembly failed. Check the log file. Assemble
using the v=-v option to get more verbose output.
abyss-pe v=-v
Cheers,
Shaun
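For anyone who wants to script the check described above, here is a minimal Python sketch. The function names and the `directory`/`name` arguments are hypothetical, and treating an empty file as "not finished" is my own assumption, not something ABySS documents.

```python
from pathlib import Path


def contigs_path(directory, name):
    """Build the path of the final output file, ${name}-contigs.fa."""
    return Path(directory) / f"{name}-contigs.fa"


def assembly_finished(directory, name):
    """Return True if ${name}-contigs.fa exists and is not empty.

    Assumption: an empty file is treated the same as a missing one.
    """
    path = contigs_path(directory, name)
    return path.is_file() and path.stat().st_size > 0
```

If this returns False, the assembly is either still running or has failed, so the log file is the next place to look.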
Adding to this query, what exactly do you mean by variant sequences?
Variation in the genome such as indels and inversions?
Aby
--
Research Scientist,
UMCG, The Netherlands
-1 is the output after the first stage of the assembly and -3 after the
third stage of the assembly. The second stage no longer exists. These
are intermediate outputs of the assembly and may be ignored.
The file coverage.hist is a k-mer coverage histogram. The first column
is the coverage, and the second column is the number of k-mers with that
coverage. For a more detailed explanation of a k-mer coverage
histogram, read the Jellyfish paper:
http://www.cbcb.umd.edu/software/jellyfish/
Cheers,
Shaun
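As an illustration of the two-column layout described above, here is a minimal Python sketch that reads a coverage.hist-style file. It assumes the columns are separated by whitespace; the function names and the sample values are made up for the example.

```python
import io


def read_coverage_hist(handle):
    """Return a list of (coverage, count) pairs, one per histogram line."""
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), int(fields[1])))
    return rows


def total_kmers(rows):
    """Sum the second column: the number of k-mers over all coverage values."""
    return sum(count for _, count in rows)


# Made-up sample data, not taken from a real assembly.
sample = io.StringIO("1\t500\n2\t120\n3\t40\n")
rows = read_coverage_hist(sample)
print(rows)
print(total_kmers(rows))
```

To read a real file, pass `open("coverage.hist")` instead of the `io.StringIO` sample.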
You'll find both SNVs and indels in the -bubbles.fa and -indel.fa files,
but not structural rearrangements such as inversions.
Cheers,
Shaun
Thank you for letting me know the details.
By SNV, do you mean single nucleotide variation (also popularly known
as SNP, single nucleotide polymorphism)?
When I look at the first few lines of -indel.fa, I see:
>189 381 2001
TGTCAGAAGATACTAGACCAGCAAGTAATAATAATAATAAATATCAAAAATTACTGAAATGTATAATGGAAAAACAATAGAGATAATCAAGTAAACTAAAGCTGCCTCTTAGAAAAAGACAAATACAATGAATAAAGGTCTAATTAGGCTGGTTATAAAAAATTAGAAGAGACAGAAATTACCACTAACAAGAATGAATTAGCAGCATCACTACAGAACCTACAGATATCAAAACGAGAAAGAGTAGACATTATGAATTATTGTATGCCAGTAATTTGACAATTTAAACAAAATGAACTACTTCTTTGAATACCACAAACTACTAATGCTCATTCAAGAAGAAATAGATTGGTTGGGTAACATTAAATTTATTAAACCAAT
>339 141 596
ATGATGTGATTATTACACACCACATACCTGTATCAGAACATCTCATGTACCCATAAATATATACATTTATTATATACCCACAAAAAGTAAAAAAAAAAATTTTAAGAGCTGCTCTCTTAACAAATTTCAAGTAAGCAATAT
>348 105 143
CACCACAGTAGATGCTGGGGAGGATTCAGAGGTATTGGTGGAGGAGGTCAGACAGCCCTGGTCTTGTCCCAGCTCTGCCTTTGCCAGCTGTTAGAGCCTCAGTTT
>391 103 253
AACTTTATTTGCAAATCAGCTAACAACCAGATTTGGCTGTGGGCCATAGTTTTTGCCAGCTTGTGCTCTAGAATGAGTAATAGTTACCCAAACTTGGGAAGAT
>432 105 140
ATCTCCCTTATCTATAATTTTATTTATGTGAGTCTTTTCTCTGTTTTTATTACTCTCAGTTTTGTTGATATTTTCTTTTGTTCTTCTAGTATCTATTTCATTTAT
What does each number after the '>' sign mean?
When I look at the first few lines of -bubbles.fa I see:
>1A 105 262
TATCTTATAATCTGCCCACTTGGGACCAGTAACACCTACACCAGAGACTATGGATGACTGGTGATGTAATTTAATAAAGGACCCAAGAATATTTGGAAACAGCCC
>1B 105 148
TATCTTATAATCTGCCCACTTGGGACCAGTAACACCTACACCAGAGACTATGAATGACTGGTGATGTAATTTAATAAAGGACCCAAGAATATTTGGAAACAGCCC
>2A 105 345
GTGCTATAGTTACAGCTGTTAGTCTGTACGGGCTCAATAATTCAACACTCATTCAACAAATATTTACTAACTGCCCACTGTGTGCCAGATGCTGTTCTTAGTGCT
>2B 105 340
GTGCTATAGTTACAGCTGTTAGTCTGTACGGGCTCAATAATTCAACACTCATGCAACAAATATTTACTAACTGCCCACTGTGTGCCAGATGCTGTTCTTAGTGCT
>3A 105 142
CTGCAAATGACAAGATTTCCTTCTTTTAAATGGCCAAATAGTATTCAATTGTACATTTTTTATCCATTTACTCTTTCATCTATTTATTCATAGATGGACACTTAT
What do the fields after the '>' sign mean?
Also, if these are indels (insertions and deletions), shouldn't they be
defined with respect to a reference genome? But I don't think we specify
a reference genome when running the assembly. So what exactly do you
mean by indels and SNVs here?
Aby
--
Research Scientist,
UMC Groningen, The Netherlands
Yes, coverage.hist does depend on the value of k. I'd recommend reading
the Jellyfish paper for more information on k-mer coverage histograms.
http://www.cbcb.umd.edu/software/jellyfish/
Cheers,
Shaun
The variants are compared to similar sequences in the assembly, not to
a reference. You can align the variant sequences to the assembly. The
first field after the > symbol is an arbitrary ID, followed by the
sequence length and the k-mer coverage.
Cheers,
Shaun
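To make the header layout concrete, here is a minimal Python sketch that splits a header such as `>1A 105 262` into ID, length and k-mer coverage, and lists the positions where two equal-length sequences differ. The function names are hypothetical. Treating records such as 1A and 1B as the two branches of one bubble is an assumption based on the sequences quoted earlier in the thread, and the short sequences in the demo are made up.

```python
import io


def parse_header(line):
    """Split '>ID length coverage' into (id, length, kmer_coverage)."""
    seq_id, length, coverage = line.lstrip(">").split()[:3]
    return seq_id, int(length), int(coverage)


def read_fasta(handle):
    """Return a list of ((id, length, kmer_coverage), sequence) records."""
    records = []
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = parse_header(line)
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def mismatches(seq_a, seq_b):
    """Return 0-based positions where two equal-length sequences differ."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Made-up miniature records in the same header layout.
demo = io.StringIO(">1A 8 20\nACGTACGT\n>1B 8 12\nACGTTCGT\n")
(head_a, seq_a), (head_b, seq_b) = read_fasta(demo)
print(head_a, head_b, mismatches(seq_a, seq_b))
```

Note that `mismatches` only makes sense for a pair of the same length (an SNV-like difference); an indel between two branches would need a real alignment instead.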