beta_significance.py: ValueError: No valid samples/environments found

Sebastian Lau

Dec 17, 2015, 4:05:46 AM
to Qiime 1 Forum

Hey, I wanted to perform UniFrac on my fungal LSU data. I used make_phylogeny.py to generate a compatible tree for my biom file. However, I got this error message:


beta_significance.py -i fung.biom -t temp_aligned.tre -s unweighted_unifrac -o uw_sig.txt

Traceback (most recent call last):
  File "/macqiime/bin/beta_significance.py", line 4, in <module>
    __import__('pkg_resources').run_script('qiime==1.9.0', 'beta_significance.py')
  File "/macqiime/lib/python2.7/site-packages/setuptools-12.2-py2.7.egg/pkg_resources/__init__.py", line 698, in run_script
  File "/macqiime/lib/python2.7/site-packages/setuptools-12.2-py2.7.egg/pkg_resources/__init__.py", line 1616, in run_script
  File "/macqiime/lib/python2.7/site-packages/qiime-1.9.0-py2.7.egg/EGG-INFO/scripts/beta_significance.py", line 210, in <module>
    main()
  File "/macqiime/lib/python2.7/site-packages/qiime-1.9.0-py2.7.egg/EGG-INFO/scripts/beta_significance.py", line 122, in main
    raise ValueError(e.message + " and that the otu abundance is"
ValueError: No valid samples/environments found. Check whether tree tips match otus/taxa present in samples/environments and that the otu abundance is not relative.


I checked the tree tips and otu names and didn't see any anomalies. I found this command worked fine on my 16S data. 

Thanks.

fung.biom
temp_aligned.tre

Colin Brislawn

Dec 17, 2015, 11:46:33 AM
to Qiime 1 Forum
Hello there,

Thanks for getting in touch with us.
> make_phylogeny.py to generate a compatible tree for my biom file
I'm guessing something went wrong at this step. Qiime sometimes gets nervous when it sees non-16S data. Can you post the full series of scripts you ran? I usually align my rep_set.fna and filter it before tree building.


Colin

Sebastian Lau

Dec 18, 2015, 9:23:50 AM
to Qiime 1 Forum
Hello Colin, here are the scripts I used:

align_seqs.py -i temp.fasta -m muscle -o alitemp/

make_phylogeny.py -i temp_aligned.fasta

beta_significance.py -i fung.biom -t temp_aligned.tre -s unweighted_unifrac -o uw_sig.txt

Thanks

Colin Brislawn

Dec 18, 2015, 11:23:06 AM
to Qiime 1 Forum
After running align_seqs.py, you should run filter_alignment.py.

Then feed the filtered alignment into make_phylogeny.py.

That should fix it!
Colin

Colin Brislawn

Dec 18, 2015, 6:55:07 PM
to Qiime 1 Forum
Hey Sebastian,

One of the qiime devs just mentioned to me that there could be a problem with temp.fasta, and I wanted to check back.

If I may ask, how did you make temp.fasta? This could have a pretty big effect on the quality of alignment and of the resulting tree.

Thanks!
Colin

Sebastian Lau

Dec 18, 2015, 10:43:55 PM
to Qiime 1 Forum
Thanks for the reply Colin.

temp.fasta consists of the representative sequences of each OTU. The OTUs were picked with the UPARSE pipeline.
The sequencing method was 454 because my amplicons were 600 bp long, but only the first 300 bp were kept because of sequencing quality. Then I used a hypervariable region extractor (VXtractor) to extract the D1 region that I was interested in.

Sebastian Lau

Dec 18, 2015, 10:55:20 PM
to Qiime 1 Forum
Hey, I checked the documentation of filter_alignment.py, but it seems to work with pyNAST alignments only. My sequences are fungal 28S rRNA. I tried the pyNAST algorithm with a customized database, but all I got was frustration. filter_alignment.py also needs a lanemask, and I don't have one for 28S rRNA. I am not sure how to get started with filter_alignment.py.
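
For an alignment that has no lanemask, such as a muscle alignment of 28S sequences, the general idea of gap-based column filtering can be sketched in plain Python. This only illustrates the idea and is not QIIME's implementation; the function name filter_gap_columns and the threshold values are assumptions made for the example.

```python
def filter_gap_columns(aligned, max_gap_frac=0.8):
    """Drop alignment columns whose gap fraction exceeds max_gap_frac.

    aligned: dict mapping sequence label -> aligned sequence (equal lengths).
    Hypothetical helper; the default threshold is an assumed value.
    """
    seqs = list(aligned.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    keep = []
    for i in range(length):
        # Count '-' and '.' as gap characters in this column.
        gaps = sum(1 for s in seqs if s[i] in "-.")
        if gaps / float(len(seqs)) <= max_gap_frac:
            keep.append(i)

    # Rebuild every sequence from the retained columns only.
    return {label: "".join(seq[i] for i in keep)
            for label, seq in aligned.items()}


# Toy alignment: column 3 is all gaps, column 6 is gaps in 2 of 3 sequences.
example = {
    "OTU1": "AC-GT-",
    "OTU2": "AC-GTA",
    "OTU3": "A--GT-",
}
filtered = filter_gap_columns(example, max_gap_frac=0.5)
```

The labels are passed through untouched, so a filter of this kind should not by itself change how tree tips are named.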

Colin Brislawn

Dec 19, 2015, 7:25:14 PM
to Qiime 1 Forum, William Walters, Jose Navas
Ah ok. Thank you for the information.

I think your processing steps seem very defensible. UPARSE is good for 454 data and a muscle MSA is a good fit for the 28S region (better than pyNAST + filter_alignment.py, which does need a database and lanemask).

Maybe a qiime dev can take a look at this?

Colin

Colin Brislawn

Dec 20, 2015, 4:50:50 PM
to Qiime 1 Forum, william....@gmail.com, josenav...@gmail.com
Hello Sebastian,

We are still looking for the problem. To help us out, can you validate all your files for us?

We will let you know when we make progress.

Thanks!
Colin

Sebastian Lau

Dec 21, 2015, 10:40:31 AM
to Qiime 1 Forum, william....@gmail.com, josenav...@gmail.com
Hey Colin, I tried the script, but it always reported the mapping file as wrong because I left the Barcode and Linker empty, as the script documentation suggested...

Colin Brislawn

Dec 21, 2015, 12:41:24 PM
to Qiime 1 Forum
Try running the script again while passing -b and -p to disable the barcode and primer checks.

I'm most interested in what this script has to say about the temp.fasta file and tree file.

Thanks for working through this with us.
Colin

Sebastian Lau

Dec 21, 2015, 10:05:17 PM
to Qiime 1 Forum
Hey Colin, the report is quite long because it says that all 72 OTUs in my fasta file are missing from the tree, and all 72 OTUs in the tree file are missing from the fasta file. I compared the name lists in Excel and couldn't see any difference.

 validate_demultiplexed_fasta.py -i temp.fasta -m mapping_corrected.txt -b -p -t /Users/FLFLFLLF/Desktop/geochip2isme/alitemp/temp_aligned.tre -e 


# fasta file temp.fasta validation report

Percent duplicate labels: 0.000
Percent QIIME-incompatible fasta labels: 1.000
Percent of labels that fail to map to SampleIDs: 1.000
Percent of sequences with invalid characters: 0.000
Percent of sequences with barcodes detected: 0.000
Percent of sequences with barcodes detected at the beginning of the sequence: 0.000
Percent of sequences with primers detected: 0.000

Fasta label/tree tip exact match report

The following labels were not in tree tips:
fung.OTU.83
fung.OTU.68
fung.OTU.81
fung.OTU.80
fung.OTU.101
fung.OTU.86
fung.OTU.85
fung.OTU.82
fung.OTU.61
fung.OTU.60
fung.OTU.89
fung.OTU.62
fung.OTU.65
fung.OTU.118
fung.OTU.67
fung.OTU.66
fung.OTU.46
fung.OTU.93
fung.OTU.41
fung.OTU.40
fung.OTU.24
fung.OTU.111
fung.OTU.26
fung.OTU.38
fung.OTU.49
fung.OTU.48
fung.OTU.96
fung.OTU.119
fung.OTU.84
fung.OTU.104
fung.OTU.112
fung.OTU.102
fung.OTU.99
fung.OTU.116
fung.OTU.22
fung.OTU.106
fung.OTU.91
fung.OTU.92
fung.OTU.88
fung.OTU.94
fung.OTU.95
fung.OTU.78
fung.OTU.79
fung.OTU.98
fung.OTU.77
fung.OTU.110
fung.OTU.75
fung.OTU.72
fung.OTU.70
fung.OTU.71
fung.OTU.33
fung.OTU.56
fung.OTU.57
fung.OTU.36
fung.OTU.51
fung.OTU.34
fung.OTU.53
fung.OTU.10
fung.OTU.12
fung.OTU.120
fung.OTU.59
fung.OTU.113
fung.OTU.17
fung.OTU.8
fung.OTU.105
fung.OTU.2
fung.OTU.1
fung.OTU.6
fung.OTU.7
fung.OTU.4
fung.OTU.5
fung.OTU.103

The following tips were not in fasta labels:
'fung.OTU.106'
'fung.OTU.86'
'fung.OTU.80'
'fung.OTU.96'
'fung.OTU.104'
'fung.OTU.111'
'fung.OTU.4'
'fung.OTU.2'
'fung.OTU.53'
'fung.OTU.98'
'fung.OTU.113'
'fung.OTU.83'
'fung.OTU.56'
'fung.OTU.68'
'fung.OTU.72'
'fung.OTU.89'
'fung.OTU.40'
'fung.OTU.94'
'fung.OTU.41'
'fung.OTU.102'
'fung.OTU.82'
'fung.OTU.65'
'fung.OTU.48'
'fung.OTU.12'
'fung.OTU.10'
'fung.OTU.51'
'fung.OTU.36'
'fung.OTU.99'
'fung.OTU.75'
'fung.OTU.67'
'fung.OTU.46'
'fung.OTU.6'
'fung.OTU.70'
'fung.OTU.84'
'fung.OTU.59'
'fung.OTU.110'
'fung.OTU.38'
'fung.OTU.5'
'fung.OTU.112'
'fung.OTU.1'
'fung.OTU.93'
'fung.OTU.91'
'fung.OTU.62'
'fung.OTU.81'
'fung.OTU.79'
'fung.OTU.85'
'fung.OTU.119'
'fung.OTU.57'
'fung.OTU.118'
'fung.OTU.24'
'fung.OTU.101'
'fung.OTU.22'
'fung.OTU.92'
'fung.OTU.66'
'fung.OTU.105'
'fung.OTU.26'
'fung.OTU.34'
'fung.OTU.116'
'fung.OTU.103'
'fung.OTU.95'
'fung.OTU.49'
'fung.OTU.71'
'fung.OTU.77'
'fung.OTU.60'
'fung.OTU.120'
'fung.OTU.88'
'fung.OTU.33'
'fung.OTU.7'
'fung.OTU.8'
'fung.OTU.61'
'fung.OTU.17'
'fung.OTU.78'
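
One visible difference in the report is that the tree tips are printed inside single quotes while the fasta labels are not, which is easy to miss in a spreadsheet. Whether those quotes are really stored in the tip names, or are only how the script prints them, is an assumption that a short Python check could test. The sketch below is not part of QIIME: the helper names are hypothetical, and the regular expression is a simplified Newick reader that ignores escaped quotes and comments.

```python
import re


def newick_tips(newick):
    """Return tip labels from a Newick string exactly as written (quotes kept)."""
    # A tip label follows "(" or "," and is either a quoted string or a run
    # of characters that ends at ":", ",", ")", ";" or whitespace.
    pattern = r"[(,]\s*('[^']*'|[^(),:;\s]+)"
    return [m.group(1) for m in re.finditer(pattern, newick)]


def fasta_labels(lines):
    """Return the first whitespace-delimited token of each FASTA header."""
    return [line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].strip()]


def count_matches(labels, tips):
    """Count labels matching tips exactly, and after stripping quotes."""
    exact = set(labels) & set(tips)
    unquoted = set(labels) & {t.strip("'\"") for t in tips}
    return len(exact), len(unquoted)


# Toy data that mimics the pattern in the report (not the real files).
fasta = [">fung.OTU.1", "ACGT", ">fung.OTU.2", "ACGA"]
tree = "('fung.OTU.1':0.1,'fung.OTU.2':0.2);"
exact, unquoted = count_matches(fasta_labels(fasta), newick_tips(tree))
```

If the second count is higher than the first on the real files, the mismatch would come from quoting rather than from the names themselves.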

Colin Brislawn

Dec 22, 2015, 1:35:38 AM
to Qiime 1 Forum
Hey Sebastian,

> Percent QIIME-incompatible fasta labels: 1.000
Qiime is really picky about which characters you can use in your fasta labels and SampleIDs, because the various programs that qiime calls have problems with different characters. In this case, qiime is worried about the - and _ characters in your OTU IDs.

Try this: go back to your original temp.fasta file and regenerate it so that your OTU IDs (and also SampleIDs) contain only letters, numbers, and periods ('.'). Then repeat OTU picking with UPARSE, read mapping with the usearch program, and tree building with qiime. Repeating all these steps makes sure that the IDs match between files and that every file contains only the safe characters you used in your new temp.fasta file.
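
A renaming step like this could be sketched in Python as follows. The helper names sanitize_id and sanitize_all are hypothetical, and the rule used (replace every character other than a letter, digit, or period with a period) is just one reading of the advice above.

```python
import re


def sanitize_id(label):
    """Replace every character other than letters, digits and '.' with '.'."""
    return re.sub(r"[^A-Za-z0-9.]", ".", label)


def sanitize_all(labels):
    """Map old IDs to sanitized IDs, refusing to continue if two IDs collide."""
    mapping = {label: sanitize_id(label) for label in labels}
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("sanitizing produced duplicate IDs")
    return mapping


# Example IDs in the style UPARSE produces (OTU_1, OTU_2, ...).
renamed = sanitize_all(["OTU_1", "OTU_2", "fung-OTU_10"])
```

Keeping the old-to-new mapping makes it possible to apply the same renaming to every file that carries the IDs, and the collision check guards against two different IDs becoming identical after cleaning.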

(A few qiime devs and I have been talking about it. Basically, it's not your fault. The various programs qiime uses interpret SampleIDs and OTU IDs in inconsistent and poorly documented ways, leading to strangeness. You did everything right and that seqs.fna file should work, but changing these characters to periods will avoid the weaknesses of the other programs.)

Sorry for these strange, lingering issues. Thanks for working through this with me!
Colin



Colin Brislawn

Dec 22, 2015, 1:38:37 AM
to qiime...@googlegroups.com
Hey, I should have asked this sooner, but how did you go about building your fung.biom file from temp.fasta and your original demultiplexed seqs.fna file? Mapping reads to OTUs is an important step of the UPARSE pipeline.

Colin

Sebastian Lau

Dec 22, 2015, 6:11:21 AM
to qiime...@googlegroups.com
Hello Colin, I used uparse7 instead of uparse8, so the OTU table generation step was different.
Basically, uparse7 does not output a biom file, only a classic OTU table and OTU.fasta (it likes to name the OTUs OTU_1..OTU_100).
The biom files I used were subsequently generated from the classic OTU table with biom convert.
Then everything else was done in qiime.

I regenerated the fasta names by removing all '.'. Then I tried a new tree building method, "raxml_V730". Surprisingly, the script generated a tree with tips named seq1, seq2, seq3, ..., which did not match my OTU IDs at all.

Colin Brislawn

Dec 22, 2015, 12:44:49 PM
to Qiime 1 Forum
Oh yeah, his documentation has changed recently. I have also used usearch 7, with a command like this:
usearch -usearch_global all_reads.fa -db otus.fa -strand plus -id 0.97 -uc map.uc
Then I parse that map.uc file into a text file, and then convert that into a .biom file.

Did you do a step like that? 

I've never used raxml. Do your new OTU IDs work fine with FastTree (the qiime default)?

Colin

Sebastian Lau

Dec 22, 2015, 9:16:37 PM
to Qiime 1 Forum
Yes, I did that step, which was critical for generating the classic OTU table by counting the abundance of each OTU in each sample.
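
For readers who have not done this step, the counting can be sketched in Python roughly as follows. It is a minimal illustration and not the script either of us used: it assumes the tab-separated .uc layout (record type in column 1, query label in column 9, target label in column 10) and assumes read labels of the form SampleID_number; the function names are hypothetical.

```python
from collections import defaultdict


def sample_from_label(label):
    """Assumed convention: 'S1_42' belongs to sample 'S1'."""
    return label.split()[0].rsplit("_", 1)[0]


def otu_counts_from_uc(lines, sample_of=sample_from_label):
    """Count hits per OTU and sample from the lines of a .uc file."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "H":  # keep hit records only, skip "N" (no hit)
            continue
        query, target = fields[8], fields[9]
        counts[target][sample_of(query)] += 1
    return counts


def to_classic_table(counts):
    """Format the counts as a tab-separated classic OTU table."""
    samples = sorted({s for per_otu in counts.values() for s in per_otu})
    rows = ["#OTU ID\t" + "\t".join(samples)]
    for otu in sorted(counts):
        values = [str(counts[otu].get(s, 0)) for s in samples]
        rows.append(otu + "\t" + "\t".join(values))
    return "\n".join(rows)


# Toy .uc records: three hits and one read with no hit.
uc_lines = [
    "H\t0\t250\t98.0\t+\t0\t0\t250M\tS1_1\tOTU_1",
    "H\t0\t250\t97.5\t+\t0\t0\t250M\tS1_2\tOTU_1",
    "H\t1\t250\t99.0\t+\t0\t0\t250M\tS2_1\tOTU_2",
    "N\t*\t250\t*\t*\t*\t*\t*\tS2_2\t*",
]
table = to_classic_table(otu_counts_from_uc(uc_lines))
```

The OTU IDs in the resulting table are copied straight from the target labels, so they need to be identical to the labels in the fasta file used to build the tree.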

I tested the other tree building methods; only FastTree did not mess up the sequence names.