output file format

Won Cheol Yim

Aug 29, 2011, 9:37:06 PM
to ABySS
Hi,

Can anyone explain the output file formats of ABySS?

I have

-1.adj
-3.adj
-1.fa
-3.fa
-3.dist
coverage.hist
lib1-3.dist
lib1-3.hist
-4.path1
-4.path2
-4.path3
-bubbles.fa
-indel.fa
-contigs.dot


What is the difference between -1, -3, -4 and -5? Why is there no -2?

For some values of k there is no contigs.fa file. Is that possible?

And I'm still looking for manuals...

Thank you.

Won

Shaun Jackman

Aug 30, 2011, 1:41:45 PM
to Won Cheol Yim, ABySS
Hi Won,

The final output file is ${name}-contigs.fa. The files
${name}-bubbles.fa and ${name}-indel.fa contain variant sequences. The
other files are intermediate steps and can be ignored.

If the ${name}-contigs.fa file is not present, either the assembly is
still in progress, or the assembly failed. Check the log file. Assemble
using the v=-v option to get more verbose output.
abyss-pe v=-v

Cheers,
Shaun

Won Cheol Yim

Aug 31, 2011, 1:04:23 AM
to ABySS

Could you explain the other formats?

What do -1 and -3 mean?

In the file named coverage.hist, what do the columns represent?

Won

Abhishek Narain

Aug 31, 2011, 5:59:47 AM
to Won Cheol Yim, ABySS
Dear Shaun

Adding to this query, what exactly do you mean by variant sequences?
Variation in the genome such as indels and inversions?

Aby


--
Research Scientist,
UMCG, The Netherlands

Shaun Jackman

Aug 31, 2011, 12:51:51 PM
to Won Cheol Yim, ABySS
Hi Won,

-1 is the output after the first stage of the assembly and -3 after the
third stage of the assembly. The second stage no longer exists. These
are intermediate outputs of the assembly and may be ignored.

The file coverage.hist is a k-mer coverage histogram. The first column
is the coverage, and the second column is the number of k-mers with that
coverage. For a more detailed explanation of a k-mer coverage
histogram, read the Jellyfish paper:
http://www.cbcb.umd.edu/software/jellyfish/

Cheers,
Shaun
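
For anyone who wants to inspect coverage.hist programmatically, here is a minimal Python sketch of the two-column layout described above. It assumes the columns are separated by whitespace. The function names and the sample rows are made up for illustration and are not part of ABySS.

```python
def read_coverage_hist(lines):
    """Parse rows of 'coverage count' into a list of (coverage, count) tuples."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows.append((int(fields[0]), int(fields[1])))
    return rows


def total_kmers(rows):
    """Sum of the second column: how many k-mers the histogram covers."""
    return sum(count for _, count in rows)


def mean_coverage(rows):
    """Mean coverage over all k-mers, weighting each coverage by its count."""
    total = total_kmers(rows)
    if total == 0:
        return 0.0
    return sum(cov * count for cov, count in rows) / total


# Hypothetical sample rows, not real ABySS output.
sample = ["1\t1000", "2\t200", "10\t50", "11\t60"]
rows = read_coverage_hist(sample)
print(total_kmers(rows), mean_coverage(rows))
```

To read a real file, something like `read_coverage_hist(open("coverage.hist"))` should work under the same whitespace assumption.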

Shaun Jackman

Aug 31, 2011, 12:53:16 PM
to abhishe...@cantab.net, Won Cheol Yim, ABySS
Hi Aby,

You'll find both SNVs and indels in the -bubbles.fa and -indel.fa files,
but not structural rearrangements such as inversions.

Cheers,
Shaun
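
As a rough illustration of what a single-nucleotide variant looks like in these files, the Python sketch below compares the 1A and 1B records from the -bubbles.fa excerpt posted further down in this thread. Treating 1A and 1B as the two branches of one bubble is an assumption based on their IDs, and the helper name `mismatches` is hypothetical.

```python
def mismatches(seq_a, seq_b):
    """Return (position, base_a, base_b) for each differing 0-based position."""
    if len(seq_a) != len(seq_b):
        # Indels change the length, so they need a real alignment instead.
        raise ValueError("sequences differ in length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Records 1A and 1B as posted in this thread (assumed to be a bubble pair).
seq_1a = (
    "TATCTTATAATCTGCCCACTTGGGACCAGTAACACCTACACCAGAGACTATG"
    "G"
    "ATGACTGGTGATGTAATTTAATAAAGGACCCAAGAATATTTGGAAACAGCCC"
)
seq_1b = (
    "TATCTTATAATCTGCCCACTTGGGACCAGTAACACCTACACCAGAGACTATG"
    "A"
    "ATGACTGGTGATGTAATTTAATAAAGGACCCAAGAATATTTGGAAACAGCCC"
)

diffs = mismatches(seq_1a, seq_1b)
print(diffs)
```

For this pair the sketch reports a single G/A difference at 0-based position 52, the middle of the 105-base sequences. A position-by-position comparison like this only suits substitutions; entries in -indel.fa would need a proper aligner.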

Won Cheol Yim

Aug 31, 2011, 10:09:47 PM
to ABySS
Thank you for your kind reply.

Does coverage.hist depend on the k-mer size?

If it does, is there a way to tell which k-mer size is more efficient?

Thank you

Won

Abhishek Narain

Sep 1, 2011, 9:05:06 AM
to Shaun Jackman, Won Cheol Yim, ABySS
Dear Shaun

Thank you for letting me know the details.

By SNV, do you mean Single Nucleotide Variation (also popularly known
as SNP, Single Nucleotide Polymorphism)?

When I look at the first few lines of -indel.fa, I see:

>189 381 2001
TGTCAGAAGATACTAGACCAGCAAGTAATAATAATAATAAATATCAAAAATTACTGAAATGTATAATGGAAAAACAATAGAGATAATCAAGTAAACTAAAGCTGCCTCTTAGAAAAAGACAAATACAATGAATAAAGGTCTAATTAGGCTGGTTATAAAAAATTAGAAGAGACAGAAATTACCACTAACAAGAATGAATTAGCAGCATCACTACAGAACCTACAGATATCAAAACGAGAAAGAGTAGACATTATGAATTATTGTATGCCAGTAATTTGACAATTTAAACAAAATGAACTACTTCTTTGAATACCACAAACTACTAATGCTCATTCAAGAAGAAATAGATTGGTTGGGTAACATTAAATTTATTAAACCAAT
>339 141 596
ATGATGTGATTATTACACACCACATACCTGTATCAGAACATCTCATGTACCCATAAATATATACATTTATTATATACCCACAAAAAGTAAAAAAAAAAATTTTAAGAGCTGCTCTCTTAACAAATTTCAAGTAAGCAATAT
>348 105 143
CACCACAGTAGATGCTGGGGAGGATTCAGAGGTATTGGTGGAGGAGGTCAGACAGCCCTGGTCTTGTCCCAGCTCTGCCTTTGCCAGCTGTTAGAGCCTCAGTTT
>391 103 253
AACTTTATTTGCAAATCAGCTAACAACCAGATTTGGCTGTGGGCCATAGTTTTTGCCAGCTTGTGCTCTAGAATGAGTAATAGTTACCCAAACTTGGGAAGAT
>432 105 140
ATCTCCCTTATCTATAATTTTATTTATGTGAGTCTTTTCTCTGTTTTTATTACTCTCAGTTTTGTTGATATTTTCTTTTGTTCTTCTAGTATCTATTTCATTTAT

What does each number after the '>' sign mean?

When I look at the first few lines of -bubbles.fa I see:

>1A 105 262
TATCTTATAATCTGCCCACTTGGGACCAGTAACACCTACACCAGAGACTATGGATGACTGGTGATGTAATTTAATAAAGGACCCAAGAATATTTGGAAACAGCCC
>1B 105 148
TATCTTATAATCTGCCCACTTGGGACCAGTAACACCTACACCAGAGACTATGAATGACTGGTGATGTAATTTAATAAAGGACCCAAGAATATTTGGAAACAGCCC
>2A 105 345
GTGCTATAGTTACAGCTGTTAGTCTGTACGGGCTCAATAATTCAACACTCATTCAACAAATATTTACTAACTGCCCACTGTGTGCCAGATGCTGTTCTTAGTGCT
>2B 105 340
GTGCTATAGTTACAGCTGTTAGTCTGTACGGGCTCAATAATTCAACACTCATGCAACAAATATTTACTAACTGCCCACTGTGTGCCAGATGCTGTTCTTAGTGCT
>3A 105 142
CTGCAAATGACAAGATTTCCTTCTTTTAAATGGCCAAATAGTATTCAATTGTACATTTTTTATCCATTTACTCTTTCATCTATTTATTCATAGATGGACACTTAT

What do the fields after the '>' sign mean?

Also, if these are indels (insertions and deletions), shouldn't they be
relative to a reference genome? But I don't think we specify a
reference genome when running the assembly. So what exactly do you mean
by indels and SNVs here?

Aby


--
Research Scientist,
UMC Groningen, The Netherlands

Shaun Jackman

Sep 6, 2011, 3:40:47 PM
to Won Cheol Yim, ABySS
Hi Won,

Yes, coverage.hist does depend on the value of k. I'd recommend reading
the Jellyfish paper for more information on k-mer coverage histograms.
http://www.cbcb.umd.edu/software/jellyfish/

Cheers,
Shaun

Shaun Jackman

Sep 6, 2011, 3:42:53 PM
to abhishe...@cantab.net, ABySS
Hi Aby,

The variants are compared to similar sequences in the assembly, not to
the reference. You can align the variant sequences to the assembly. The
ID after the > symbol is arbitrary, followed by length and k-mer
coverage.

Cheers,
Shaun
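
The header layout described above (ID, length, k-mer coverage) can be read with a few lines of Python. This is only a sketch: it assumes the three fields are separated by whitespace, and the names `ContigHeader` and `parse_header` are invented for illustration. The example headers are taken from the excerpts posted earlier in this thread.

```python
from typing import NamedTuple


class ContigHeader(NamedTuple):
    contig_id: str      # arbitrary identifier, e.g. "189" or "1A"
    length: int         # sequence length
    kmer_coverage: int  # k-mer coverage, as described in the reply above


def parse_header(line):
    """Split a FASTA header such as '>189 381 2001' into its three fields."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header line: %r" % line)
    fields = line[1:].split()
    if len(fields) < 3:
        raise ValueError("expected ID, length and coverage: %r" % line)
    return ContigHeader(fields[0], int(fields[1]), int(fields[2]))


indel_header = parse_header(">189 381 2001")
bubble_header = parse_header(">1A 105 262")
print(indel_header)
print(bubble_header)
```

The ID is kept as a string because bubble IDs such as 1A are not purely numeric.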

Safina A.R

Jun 18, 2015, 11:06:18 AM
to abyss...@googlegroups.com, asce...@gmail.com
Hi Shaun,

I ran ABySS and got all the output files. The folder contains these files:
coverage.hist           mangrove-6.hist
mangrove-1.adj          mangrove-6.path
mangrove-1.fa           mangrove-6.path.dot
mangrove_1.log          mangrove-7.adj
mangrove-1.path         mangrove-7.fa
mangrove-2.adj          mangrove-7.path
mangrove-2.path         mangrove-8.dot
mangrove-3.adj          mangrove-8.fa
mangrove-3.dist         mangrove-bubbles.fa
mangrove-3.fa           mangrove-contigs.dot
mangrove-3.hist         mangrove-contigs.fa
mangrove-4.adj          mangrove-indel.fa
mangrove-4.fa           mangrove-scaffolds.dot
mangrove-4.path1        mangrove-scaffolds.fa
mangrove-4.path2        mangrove-stats
mangrove-4.path3        mangrove-stats.csv
mangrove-5.adj          mangrove-stats.md
mangrove-5.fa           mangrove-stats.tab
mangrove-5.path         mangrove-unitigs.fa
mangrove-6.dist.dot     nohup.out
mangrove-6.dot          reads_1.fq
mangrove-6.fa           reads_2.fq

But I am confused: my $name-contigs.fa file is shown in blue and its file size is 0.

-rw-rw-r-- 1 lifescope lifescope      37814 Apr 29 23:57 coverage.hist
-rw-rw-r-- 1 lifescope lifescope  298946851 Apr 30 04:57 mangrove-1.adj
-rw-rw-r-- 1 lifescope lifescope  664785876 Apr 30 04:51 mangrove-1.fa
-rw-rw-r-- 1 lifescope lifescope       7104 Apr 30 09:19 mangrove_1.log
-rw-rw-r-- 1 lifescope lifescope   11351815 Apr 30 04:59 mangrove-1.path
-rw-rw-r-- 1 lifescope lifescope  288251686 Apr 30 04:59 mangrove-2.adj
-rw-rw-r-- 1 lifescope lifescope    1442310 Apr 30 05:02 mangrove-2.path
-rw-rw-r-- 1 lifescope lifescope  283732904 Apr 30 05:02 mangrove-3.adj
-rw-rw-r-- 1 lifescope lifescope    7004929 Apr 30 07:14 mangrove-3.dist
-rw-rw-r-- 1 lifescope lifescope  597369219 Apr 30 05:04 mangrove-3.fa
-rw-rw-r-- 1 lifescope lifescope       5739 Apr 30 07:07 mangrove-3.hist
-rw-rw-r-- 1 lifescope lifescope  284268167 Apr 30 07:16 mangrove-4.adj
-rw-rw-r-- 1 lifescope lifescope     982084 Apr 30 07:16 mangrove-4.fa
-rw-rw-r-- 1 lifescope lifescope    2615211 Apr 30 07:19 mangrove-4.path1
-rw-rw-r-- 1 lifescope lifescope    1347206 Apr 30 07:19 mangrove-4.path2
-rw-rw-r-- 1 lifescope lifescope    1346865 Apr 30 07:20 mangrove-4.path3
-rw-rw-r-- 1 lifescope lifescope  284333120 Apr 30 07:22 mangrove-5.adj
-rw-rw-r-- 1 lifescope lifescope     118478 Apr 30 07:22 mangrove-5.fa
-rw-rw-r-- 1 lifescope lifescope    1381154 Apr 30 07:22 mangrove-5.path
-rw-rw-r-- 1 lifescope lifescope     763150 Apr 30 09:08 mangrove-6.dist.dot
-rw-rw-r-- 1 lifescope lifescope  683012185 Apr 30 07:26 mangrove-6.dot
-rw-rw-r-- 1 lifescope lifescope  595545828 Apr 30 07:25 mangrove-6.fa
-rw-rw-r-- 1 lifescope lifescope       7766 Apr 30 09:03 mangrove-6.hist
-rw-rw-r-- 1 lifescope lifescope     238511 Apr 30 09:10 mangrove-6.path
-rw-rw-r-- 1 lifescope lifescope   11172072 Apr 30 09:10 mangrove-6.path.dot
-rw-rw-r-- 1 lifescope lifescope  278771588 Apr 30 09:14 mangrove-7.adj
-rw-rw-r-- 1 lifescope lifescope       8094 Apr 30 09:13 mangrove-7.fa
-rw-rw-r-- 1 lifescope lifescope     240712 Apr 30 09:13 mangrove-7.path
-rw-rw-r-- 1 lifescope lifescope  681424399 Apr 30 09:17 mangrove-8.dot
-rw-rw-r-- 1 lifescope lifescope  595962654 Apr 30 09:16 mangrove-8.fa
-rw-rw-r-- 1 lifescope lifescope   28755376 Apr 30 04:24 mangrove-bubbles.fa
lrwxrwxrwx 1 lifescope lifescope         14 May  7 15:39 mangrove-contigs.dot -> mangrove-6.dot
lrwxrwxrwx 1 lifescope lifescope         13 May  7 15:39 mangrove-contigs.fa -> mangrove-6.fa
-rw-rw-r-- 1 lifescope lifescope    3080628 Apr 30 05:04 mangrove-indel.fa
lrwxrwxrwx 1 lifescope lifescope         14 May  7 15:39 mangrove-scaffolds.dot -> mangrove-8.dot
lrwxrwxrwx 1 lifescope lifescope         13 May  7 15:39 mangrove-scaffolds.fa -> mangrove-8.fa
lrwxrwxrwx 1 lifescope lifescope         18 May  7 15:39 mangrove-stats -> mangrove-stats.tab
-rw-rw-r-- 1 lifescope lifescope        285 Apr 30 09:19 mangrove-stats.csv
-rw-rw-r-- 1 lifescope lifescope        471 Apr 30 09:19 mangrove-stats.md
-rw-rw-r-- 1 lifescope lifescope        285 Apr 30 09:19 mangrove-stats.tab
lrwxrwxrwx 1 lifescope lifescope         13 May  7 15:39 mangrove-unitigs.fa -> mangrove-3.fa
-rw------- 1 lifescope lifescope        276 Jun 17 11:26 nohup.out

However, when I opened the file it did not seem to contain any data or contigs, yet when I counted the number of contigs in this file there were 5120939. Why does it show a file size of 0? Also, when I list all the items in the folder to get their sizes, it shows $name-contigs.fa -> mangrove-6.fa.

Ben Vandervalk

Jun 18, 2015, 11:48:44 AM
to Safina A.R, abyss...@googlegroups.com, asce...@gmail.com
Hi Safina,

What you see is normal. $name-contigs.fa is a "symbolic link" to $name-6.fa: https://en.wikipedia.org/wiki/Symbolic_link.

The main output file for the assembly is $name-scaffolds.fa, which is a symbolic link to $name-8.fa. The numbers in the output filenames indicate the stage in the assembly pipeline that generated a particular file.

- Ben
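
To see the symbolic-link behaviour for yourself, the Python sketch below builds a throwaway link in a temporary directory and inspects it. The demo file names and the helper names `describe` and `count_fasta_records` are hypothetical, and the sketch assumes a Linux-like system where creating symlinks is permitted. The size reported for the link itself is the length of the target path, which is why the listing above shows 13 bytes for mangrove-contigs.fa (the length of "mangrove-6.fa").

```python
import os
import tempfile


def describe(path):
    """Return (is_symlink, link_target, link_size, target_size) for a path."""
    is_link = os.path.islink(path)
    target = os.readlink(path) if is_link else None
    link_size = os.lstat(path).st_size   # size of the link itself
    target_size = os.path.getsize(path)  # follows the link to the real file
    return is_link, target, link_size, target_size


def count_fasta_records(path):
    """Count header lines (those starting with '>') in a FASTA file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


# Demo with made-up files that mimic the contigs.fa -> 6.fa arrangement.
with tempfile.TemporaryDirectory() as tmp:
    real_file = os.path.join(tmp, "demo-6.fa")
    link_file = os.path.join(tmp, "demo-contigs.fa")
    with open(real_file, "w") as handle:
        handle.write(">0 5 10\nACGTA\n>1 4 8\nACGT\n")
    os.symlink("demo-6.fa", link_file)  # relative link, as in the listing
    demo_info = describe(link_file)
    demo_count = count_fasta_records(link_file)

print(demo_info, demo_count)
```

Opening or counting records through the link reads the real file, so a tiny link size next to a large contig count is expected.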
