Custom trait data file


Max Sikolenko

Feb 8, 2023, 3:13:12 AM
to CopyRighter
Hello!

I understand that it has been a long time since CopyRighter was developed, but it would be great if you could explain a few things to me.

I am trying to create a custom data file for CopyRighter and I’m a bit confused by the required file format.

In the README on GitHub, it’s written that for taxonomy lookup (-l desc), the data file should be quite simple: a “taxonomic string (col 1), and trait estimate (col 2), as illustrated in this example”:
    # taxstring   16S rRNA count
    k__Archaea; p__; c__; o__; f__; g__; s__      1.57262


But the file “ssu_img40_gg201210.txt”, which is bundled with CopyRighter, contains two “tables” (two parts of the file) for taxonomy lookup. Here are their headers and some simple example rows:

The first “table” contains 2 columns, exactly as README describes (lines 1075178–1078113):
# tax_string    16S rRNA count
k__Archaea; p__; c__; o__; f__; g__; s__    1.55415924023992
...
k__Bacteria; p__; c__; o__; f__; g__; s__   2.45611068136988
...


The second “table” looks similar. It contains 3 columns (lines 1078115–1079495):
# taxonomy  16S rRNA count  num_genomes
k__Archaea  1.46009 149
k__Bacteria 2.40183 2783
...


The two tables use different taxonomy formats, and the 16S rRNA counts for the corresponding taxa differ as well.

If I manually delete the second “table” from the default data file, CopyRighter produces different results than it does with the intact data file. This means that CopyRighter uses the second “table” somehow. But the two “tables” appear to mirror each other: both contain just 1) taxonomy and 2) copy numbers.
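To make the comparison concrete, here is a minimal sketch of how the two “tables” could be pulled apart and their kingdom-level rows set side by side. It is not part of CopyRighter: the function names `split_tables` and `kingdom_counts` are my own, and I am assuming that columns are tab-separated and that every table starts with a header line beginning with “#”. The sample rows are the ones quoted above.

```python
# Hypothetical helpers, not CopyRighter code.
# Assumptions: tab-separated columns; each table begins with a '#' header line.

SAMPLE = (
    "# tax_string\t16S rRNA count\n"
    "k__Archaea; p__; c__; o__; f__; g__; s__\t1.55415924023992\n"
    "k__Bacteria; p__; c__; o__; f__; g__; s__\t2.45611068136988\n"
    "\n"
    "# taxonomy\t16S rRNA count\tnum_genomes\n"
    "k__Archaea\t1.46009\t149\n"
    "k__Bacteria\t2.40183\t2783\n"
)


def split_tables(text):
    """Group data rows under the header (a tuple of column names) preceding them."""
    tables = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if line.startswith("#"):
            current = tuple(c.strip() for c in line.lstrip("#").split("\t"))
            tables[current] = []
        elif current is not None:
            tables[current].append(line.split("\t"))
    return tables


def kingdom_counts(rows):
    """Map kingdom name to count for rows with no rank filled in below kingdom."""
    counts = {}
    for row in rows:
        ranks = [r.strip() for r in row[0].split(";")]
        # Ranks such as 'p__' that end in '__' carry no name, so they count as empty.
        if all(r.endswith("__") for r in ranks[1:]):
            counts[ranks[0]] = float(row[1])
    return counts


tables = split_tables(SAMPLE)
for header, rows in tables.items():
    print(header, kingdom_counts(rows))
```

Run on the full data file, this kind of comparison would show which taxa appear in both “tables” and by how much their counts differ, but it does not tell me which of the two values CopyRighter actually uses.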

Could you please clarify:
1. What is the difference between “k__Archaea; p__; c__; o__; f__; g__; s__” and just “k__Archaea” and why 16S rRNA counts for them differ (1.55415924023992 and 1.46009, respectively)?
2. Should the second “table” be removed from the data file?
3. If it shouldn’t, then I guess the 16S rRNA counts should be calculated in two different ways for “k__Archaea; p__; c__; o__; f__; g__; s__” and “k__Archaea”, shouldn’t they? And if so, what should the difference be?

It would be great if you could help me with this conundrum.

Thanks in advance,
Max Sikolenko