Custom trait data file

Max Sikolenko

Feb 8, 2023, 3:13:12 AM

to CopyRighter

Hello!

I understand that it has been a long time since CopyRighter was developed, but it would be great if you could explain a few things to me.

I am trying to create a custom data file for CopyRighter and I’m a bit confused about the required file format.

In the README on GitHub, it’s written that for taxonomy lookup (-l desc), the data file should be quite simple: a “taxonomic string (col 1), and trait estimate (col 2), as illustrated in this example”:
# taxstring 16S rRNA count
k__Archaea; p__; c__; o__; f__; g__; s__ 1.57262

But the file “ssu_img40_gg201210.txt”, which is bundled with CopyRighter, contains two “tables” (two parts of the file) for taxonomy lookup. Here are their headers and a few example rows:

The first “table” contains 2 columns, exactly as README describes (lines 1075178–1078113):
# tax_string 16S rRNA count
k__Archaea; p__; c__; o__; f__; g__; s__ 1.55415924023992
...
k__Bacteria; p__; c__; o__; f__; g__; s__ 2.45611068136988
...

The second “table” looks similar. It contains 3 columns (lines 1078115–1079495):
# taxonomy 16S rRNA count num_genomes
k__Archaea 1.46009 149
k__Bacteria 2.40183 2783
...

The two tables use different taxonomy formats, and the 16S rRNA counts for the corresponding taxa differ as well.
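
The comparison between the two “tables” can be sketched in Python as follows. This is only an illustration, not part of CopyRighter: it assumes that the columns are tab-separated and that each section starts with a “#” header line, and the helper names (split_sections, is_kingdom_only, kingdom_counts) are hypothetical. The sample rows are the ones quoted above.

```python
import io


def split_sections(lines):
    """Group data rows under the most recent '#'-prefixed header line.

    Assumption: columns are tab-separated and every section begins
    with a header line starting with '#'.
    """
    sections = {}
    header = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines between sections
        if line.startswith("#"):
            header = line.lstrip("# ").strip()
            sections[header] = []
        elif header is not None:
            sections[header].append(line.split("\t"))
    return sections


def is_kingdom_only(taxstring):
    """True if every rank below kingdom is empty (e.g. 'p__') or absent."""
    parts = [p.strip() for p in taxstring.split(";")]
    return all(p.endswith("__") for p in parts[1:])


def kingdom_counts(rows):
    """Map kingdom name to the 16S rRNA count for kingdom-level rows only."""
    return {
        row[0].split(";")[0].strip(): float(row[1])
        for row in rows
        if is_kingdom_only(row[0])
    }


# Sample rows copied from the two sections quoted in this post.
sample = (
    "# tax_string\t16S rRNA count\n"
    "k__Archaea; p__; c__; o__; f__; g__; s__\t1.55415924023992\n"
    "k__Bacteria; p__; c__; o__; f__; g__; s__\t2.45611068136988\n"
    "\n"
    "# taxonomy\t16S rRNA count\tnum_genomes\n"
    "k__Archaea\t1.46009\t149\n"
    "k__Bacteria\t2.40183\t2783\n"
)

sections = split_sections(io.StringIO(sample))
first, second = sections.values()  # dicts keep insertion order
counts_first = kingdom_counts(first)
counts_second = kingdom_counts(second)

# Print the two estimates side by side for each kingdom.
for name, value in counts_first.items():
    print(name, value, counts_second.get(name))
```

On the real data file the same functions would need to be pointed at the relevant line ranges only, since the file contains other content before these two sections.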

If I manually delete the second “table” from the default data file, CopyRighter produces different results than it does with the intact data file. This means that CopyRighter does use the second “table” somehow. Yet the two “tables” appear to mirror each other: each is just 1) taxonomy and 2) copy numbers.

Could you please clarify:
1. What is the difference between “k__Archaea; p__; c__; o__; f__; g__; s__” and just “k__Archaea”, and why do their 16S rRNA counts differ (1.55415924023992 and 1.46009, respectively)?
2. Should the second “table” be removed from the data file?
3. If it shouldn’t, then I guess the 16S rRNA counts should be calculated in two different ways for “k__Archaea; p__; c__; o__; f__; g__; s__” and “k__Archaea”, shouldn’t they? And if so, what should the difference be?

It would be great if you could help me with this conundrum.

Thanks in advance,
Max Sikolenko
