Max Sikolenko
Feb 8, 2023, 3:13:12 AM
to CopyRighter
Hello!
I understand that it has been a long time since CopyRighter was developed, but it would be great if you could explain a few things to me.
I am trying to create a custom data file for CopyRighter, and I’m a bit confused by the required file format.
The README on GitHub says that for taxonomy lookup (-l desc), the data file should be quite simple: a “taxonomic string (col 1), and trait estimate (col 2), as illustrated in this example”:
# taxstring 16S rRNA count
k__Archaea; p__; c__; o__; f__; g__; s__ 1.57262
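To make the two-column layout concrete, here is a minimal Python sketch of reading one such line. It is only an illustration, not CopyRighter's own parser: the function name parse_trait_line is hypothetical, and it assumes the trait estimate is the last whitespace-separated field, since the taxonomic string itself contains spaces.

```python
def parse_trait_line(line):
    """Split a data line into (taxonomic string, trait estimate).

    Hypothetical helper: assumes the estimate is the last
    whitespace-separated field and that '#' starts a header line.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # skip blank lines and header lines
    taxstring, count = line.rsplit(None, 1)
    return taxstring, float(count)


example = "k__Archaea; p__; c__; o__; f__; g__; s__ 1.57262"
print(parse_trait_line(example))
```

Splitting from the right keeps the sketch independent of whether the columns are separated by a tab or by spaces.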
But the file “ssu_img40_gg201210.txt”, which is bundled with CopyRighter, contains two “tables” (two parts of the file) for taxonomy lookup. Here are their headers and some simple example rows:
The first “table” contains 2 columns, exactly as README describes (lines 1075178–1078113):
# tax_string 16S rRNA count
k__Archaea; p__; c__; o__; f__; g__; s__ 1.55415924023992
...
k__Bacteria; p__; c__; o__; f__; g__; s__ 2.45611068136988
...
The second “table” looks similar. It contains 3 columns (lines 1078115–1079495):
# taxonomy 16S rRNA count num_genomes
k__Archaea 1.46009 149
k__Bacteria 2.40183 2783
...
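The layout described above can be sketched in Python as follows. This is a rough illustration under two assumptions that the file's documentation does not confirm: that every “table” begins with a header line starting with “#”, and that blank lines only separate tables. The function name split_tables and the sample lines are hypothetical; the sample reuses the rows quoted above.

```python
def split_tables(lines):
    """Group lines into (header, rows) pairs, one pair per '#' header.

    Assumption: a line starting with '#' opens a new table, and
    blank lines carry no data.
    """
    tables = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            tables.append((line, []))
        elif line.strip() and tables:
            tables[-1][1].append(line)
    return tables


sample = [
    "# tax_string 16S rRNA count",
    "k__Archaea; p__; c__; o__; f__; g__; s__ 1.55415924023992",
    "k__Bacteria; p__; c__; o__; f__; g__; s__ 2.45611068136988",
    "",
    "# taxonomy 16S rRNA count num_genomes",
    "k__Archaea 1.46009 149",
    "k__Bacteria 2.40183 2783",
]

for header, rows in split_tables(sample):
    print(header, "->", len(rows), "rows")
```

Applied to the real file, the same loop would make it easy to write out a copy with or without a given table, instead of deleting lines by hand.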
The two tables use different taxonomy formats. The 16S rRNA counts for the corresponding taxa differ, too.
If I manually delete the second “table” from the default data file, CopyRighter produces different results than it does with the intact data file. This means that CopyRighter uses the second “table” somehow. But the two “tables” appear to mirror each other: each holds just 1) taxonomy and 2) copy numbers.
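The mismatch can be made explicit with a small Python sketch that lines up the rows quoted above. It rests on an unconfirmed assumption: that the short form in the second “table” is the seven-rank string with its empty ranks (such as “p__”) dropped. The helper trim_empty_ranks is hypothetical and is not part of CopyRighter.

```python
def trim_empty_ranks(taxstring):
    """Drop ranks with no name, e.g. 'p__', from a seven-rank string.

    Assumption: this is how the short taxonomy form relates to the
    long one; the data file itself does not document the mapping.
    """
    ranks = [rank.strip() for rank in taxstring.split(";")]
    named = [rank for rank in ranks if rank and not rank.endswith("__")]
    return "; ".join(named)


# Values copied from the example rows in this message.
first_table = {
    "k__Archaea; p__; c__; o__; f__; g__; s__": 1.55415924023992,
    "k__Bacteria; p__; c__; o__; f__; g__; s__": 2.45611068136988,
}
second_table = {
    "k__Archaea": (1.46009, 149),
    "k__Bacteria": (2.40183, 2783),
}

for taxstring, count in first_table.items():
    short = trim_empty_ranks(taxstring)
    other_count, num_genomes = second_table[short]
    print(f"{short}: {count:.5f} vs {other_count:.5f} "
          f"(difference {count - other_count:+.5f}, {num_genomes} genomes)")
```

For both kingdoms the first “table” gives the higher value, which suggests the two sets of counts were not derived in the same way.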
Could you please clarify:
1. What is the difference between “k__Archaea; p__; c__; o__; f__; g__; s__” and just “k__Archaea”, and why do the 16S rRNA counts for them differ (1.55415924023992 and 1.46009, respectively)?
2. Should the second “table” be removed from the data file?
3. If it shouldn’t, then I guess the 16S rRNA counts should be calculated in two different ways for “k__Archaea; p__; c__; o__; f__; g__; s__” and “k__Archaea”, shouldn’t they? And if so, what should the difference be?
It would be great if you could help me with this conundrum.
Thanks in advance,
Max Sikolenko