(Moved here as a new topic from another thread, where it was off-topic.)
Hi Gavin,
I ran into another problem after getting OUTPUT.tre for the 100 sample subset. For this command:
metagenome_pipeline.py -i feature-table.relFreq.no_first_line.tsv -m 16S_predicted.tsv -f EC_predicted.tsv -p 4 -o OUT_PREFIX
I got this error:
Traceback (most recent call last):
File "/home/ubuntu/miniconda2/envs/picrust2-dev/bin/metagenome_pipeline.py", line 6, in <module>
exec(compile(open(__file__).read(), __file__, 'exec'))
(...truncated...)
File "/home/ubuntu/downloads/picrust2/picrust2/util.py", line 246, in three_df_index_overlap_sort
"input files.")
ValueError: No sequence ids overlap between all three of the input files.
But I am sure the IDs overlap:
cut -f1 feature-table.relFreq.no_first_line.tsv | sort > feature-table.relFreq.no_first_line.tsv.f1.sort
cut -f1 16S_predicted.tsv | sort > 16S_predicted.tsv.f1.sort
cut -f1 EC_predicted.tsv | sort > EC_predicted.tsv.f1.sort
(Then I manually deleted the header row from each sorted file and ran:)
$ diff 16S_predicted.tsv.f1.sort EC_predicted.tsv.f1.sort # no output, so these two files are identical
$ diff EC_predicted.tsv.f1.sort feature-table.relFreq.no_first_line.tsv.f1.sort # no output, so these two are identical as well; all three ID lists therefore match
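The cut/sort/diff check above can also be done in one step, without intermediate files or manual header deletion. Below is a minimal sketch in Python, assuming each file is tab-separated with exactly one header line and the IDs in the first column. `first_column_ids` is a hypothetical helper written for this post, not part of PICRUSt2, and the in-memory tables and their header names are made-up stand-ins for the real files.

```python
import csv
import io


def first_column_ids(handle, has_header=True):
    """Return the set of first-column values (as strings) from a tab-separated file."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row
    return {row[0] for row in reader if row}


# Tiny in-memory stand-ins for the three real files (contents are invented).
table = io.StringIO("sequence\t12345.789\n1\t0.5\n2\t0.5\n")
marker = io.StringIO("sequence\tmarker_count\n1\t2\n2\t1\n")
ec = io.StringIO("sequence\tsome_function\n1\t0\n2\t3\n")

ids = [first_column_ids(handle) for handle in (table, marker, ec)]
shared = set.intersection(*ids)  # IDs present in all three inputs
print(sorted(shared))
```

To run it on the real data, each `io.StringIO(...)` would be replaced with `open(path, newline="")`. Note that this compares the IDs purely as text, just like `diff` does, so it cannot reveal a mismatch that only appears after the IDs are parsed into numbers.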
What should I fix? One thing to note: all of my sequence IDs are integers, and most of my sample IDs look like decimal numbers (e.g. 12345.789). Could these have confused metagenome_pipeline.py?
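To illustrate why I suspect the ID types, here is a small Python sketch of how numeric-looking IDs can stop matching once some of them are parsed as numbers. Whether metagenome_pipeline.py actually parses any of the inputs this way is only my assumption; the IDs below are invented.

```python
# The same three sequence IDs, once kept as text and once parsed as integers.
ids_as_text = {"1", "2", "3"}
ids_as_int = {int(i) for i in ids_as_text}

# Text and integer IDs never compare equal, so the overlap is empty
# even though the files look identical to diff.
mixed = ids_as_text & ids_as_int

# Converting everything back to text restores the overlap.
normalized = ids_as_text & {str(i) for i in ids_as_int}

# A decimal-looking sample ID can also change when it is parsed as a float
# and written back out: the trailing zero is lost.
round_tripped = str(float("12345.780"))

print(len(mixed), len(normalized), round_tripped)
```

If this is what is happening, forcing every ID to be read as a string (or adding a non-numeric prefix to the IDs) would be the kind of fix I would try first.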
Thank you so much.