ValueError: No sequence ids overlap between all three of the input files.

Jamie Kwok

Jun 12, 2018, 11:40:36 PM

to picrust-users

(Moved here as a new topic because it was off-topic in the original thread.)

Hi Gavin,

I ran into another problem after getting OUTPUT.tre for the 100-sample subset. When I ran this command:

metagenome_pipeline.py -i feature-table.relFreq.no_first_line.tsv -m 16S_predicted.tsv -f EC_predicted.tsv -p 4 -o OUT_PREFIX

I got this error:

Traceback (most recent call last):

File "/home/ubuntu/miniconda2/envs/picrust2-dev/bin/metagenome_pipeline.py", line 6, in <module>

exec(compile(open(__file__).read(), __file__, 'exec'))

(...truncated...)

File "/home/ubuntu/downloads/picrust2/picrust2/util.py", line 246, in three_df_index_overlap_sort

"input files.")

ValueError: No sequence ids overlap between all three of the input files.

But I am sure the IDs overlap:

cut -f1 feature-table.relFreq.no_first_line.tsv | sort > feature-table.relFreq.no_first_line.tsv.f1.sort

cut -f1 16S_predicted.tsv | sort > 16S_predicted.tsv.f1.sort

cut -f1 EC_predicted.tsv | sort > EC_predicted.tsv.f1.sort

(Then I manually deleted the header row and ran:)

$ diff 16S_predicted.tsv.f1.sort EC_predicted.tsv.f1.sort #no output, meaning they are identical

$ diff EC_predicted.tsv.f1.sort feature-table.relFreq.no_first_line.tsv.f1.sort #no output, meaning they are identical, therefore 16S_predicted.tsv.f1.sort and feature-table.relFreq.no_first_line.tsv.f1.sort are also identical
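The same check can be sketched in Python. This is only a rough equivalent of the cut/sort/diff steps above, assuming tab-separated files with one header row; the helper names (`first_column_ids`, `shared_ids`) and the miniature inputs are made up for illustration and are not part of PICRUSt2.

```python
import csv
import io


def first_column_ids(handle, has_header=True):
    """Return the first-column values of a tab-separated file as raw strings."""
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row
    return {row[0] for row in reader if row}


def shared_ids(*handles):
    """Return the ids that appear in the first column of every file."""
    id_sets = [first_column_ids(handle) for handle in handles]
    return set.intersection(*id_sets)


# Hypothetical miniature inputs standing in for the three real files;
# the column names and values are invented.
table = io.StringIO("seqid\t12345.789\n101\t0.4\n102\t0.6\n")
marker = io.StringIO("sequence\t16S_rRNA_Count\n101\t2\n102\t1\n")
func = io.StringIO("sequence\tEC:1.1.1.1\n101\t0\n102\t3\n")

overlap = shared_ids(table, marker, func)
print(sorted(overlap))
```

With real files, the `io.StringIO` objects would be replaced by handles from `open(path, newline="")`. Like diff, this compares the ids purely as text, so it cannot show how a program later types those ids when it parses the files.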

What should I be fixing? One thing to note: all my sequence IDs are integers, and most of my sample IDs look like decimal numbers (e.g. 12345.789). Could these have confused metagenome_pipeline.py?

Thank you so much.

Best regards,

Jamie

Gavin Douglas

Jun 13, 2018, 8:30:26 AM

to picrus...@googlegroups.com

Hey Jamie,

Would you mind sending me your input files privately? Or at the very least the heads of these files? I bet the problem is that the floats aren’t being read in consistently, but I’m not sure. In my experience it’s better to have “tidy” sample names that begin with letters to avoid these kinds of errors.

Best,

Gavin


Gavin Douglas

Jun 20, 2018, 2:29:09 PM

to picrus...@googlegroups.com

Hey Jamie,

Thanks for pointing this out and troubleshooting the error! The issue was indeed that the rownames were being interpreted as integers and not strings by the metagenome_pipeline.py script. Adding a string to the start of the rownames does get around this problem, but I made a small update to avoid this issue in the future.

The commit is here: https://github.com/picrust/picrust2/commit/d13e15002ca37d2e3b3c15b43a2fc461f8ccac0b
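For anyone hitting the same error, the failure mode can be reproduced with a small sketch. This is not the code from the commit; it assumes the tables are loaded with pandas, and it assumes one plausible scenario in which one table keeps its row ids as text while another has them inferred as integers. The table contents and the `read_table` helper are made up for illustration.

```python
import io

import pandas as pd

# Made-up miniature tables; the real input files are not shown in this thread.
ABUNDANCE_TSV = "seqid\tsampleA\tsampleB\n101\t0.4\t0.1\n102\t0.6\t0.9\n"
FUNCTION_TSV = "sequence\tEC:1.1.1.1\n101\t0\n102\t3\n"


def read_table(text, id_col, ids_as_str):
    """Read a TSV and use id_col as the row index (hypothetical helper).

    When ids_as_str is True the id column is forced to text; otherwise
    pandas infers its type, which gives integers for ids like 101.
    """
    dtype = {id_col: str} if ids_as_str else None
    df = pd.read_csv(io.StringIO(text), sep="\t", dtype=dtype)
    return df.set_index(id_col)


# Assumed scenario: ids kept as text in one table, inferred as integers in the other.
abun_str = read_table(ABUNDANCE_TSV, "seqid", ids_as_str=True)
func_int = read_table(FUNCTION_TSV, "sequence", ids_as_str=False)
mismatched = abun_str.index.intersection(func_int.index)

# Forcing the ids to text in both tables makes them comparable again.
func_str = read_table(FUNCTION_TSV, "sequence", ids_as_str=True)
matched = abun_str.index.intersection(func_str.index)

# No ids are shared when the types differ; both ids are shared when they agree.
print(len(mismatched), len(matched))
```

The string "101" and the integer 101 are different index labels, so the files can look identical under diff while sharing no ids once loaded. Prefixing the ids with letters works around this because it leaves pandas no option but to read them as text.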