Silva Version

Pinkie Mclucas

Aug 3, 2024, 4:27:37 PM

to pancrougiftest

I am having an issue using Silva to classify my reads. I used the following command:
qiime feature-classifier classify-sklearn --i-classifier silva-132-99-515-806-nb-classifier.qza --i-reads allreps_rep-seq.qza --o-classification allrepstaxonomy.qza

My guess is this is just a minor file bookkeeping issue on your end - I triple-checked the linked data resources and all are fine - every one is the latest, trained with 0.22.1. Maybe try downloading again and using all new file names to help you keep your files clearly separated. Keep us posted!

I have been trying to find the best way to perform taxonomic assignment on my 16S rRNA V4V5 sequences. I have found 2 options so far, given that Naive Bayes classifiers trained on the region of the target sequences improve taxonomic classification accuracy, as said here:

Use the full-length, pre-formatted SILVA reference sequence and taxonomy files already processed with RESCRIPt (found in Data Resources). Then use the qiime feature-classifier extract-reads command to constrain them to V4V5 and re-train the classifier with the qiime feature-classifier fit-classifier-naive-bayes command. At this step I got the same warning as many other users:

packages/q2_feature_classifier/classifier.py:101: UserWarning: The TaxonomicClassifier artifact that results from this method was trained using scikit-learn version 0.23.1. It cannot be used with other versions of scikit-learn. (While the classifier may complete successfully, the results will be unreliable.)
warnings.warn(warning, UserWarning)
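The extract-and-retrain route described above can be sketched roughly as follows. This is only a sketch: the function names are made up for illustration, and the file names and primers in the example calls are placeholders that do not come from this thread, so swap in the primers actually used for your V4V5 amplicons.

```shell
# Trim full-length reference reads to the amplified region.
# $1 full-length reference sequences (.qza), $2 forward primer,
# $3 reverse primer, $4 output path for the trimmed reads (.qza)
extract_region() {
    qiime feature-classifier extract-reads \
        --i-sequences "$1" \
        --p-f-primer "$2" \
        --p-r-primer "$3" \
        --o-reads "$4"
}

# Train a Naive Bayes classifier on the trimmed reads.
# $1 trimmed reads (.qza), $2 reference taxonomy (.qza),
# $3 output classifier (.qza)
train_classifier() {
    qiime feature-classifier fit-classifier-naive-bayes \
        --i-reference-reads "$1" \
        --i-reference-taxonomy "$2" \
        --o-classifier "$3"
}

# Example calls (placeholder file names and primers, not from the thread):
# extract_region silva-138-99-seqs.qza GTGYCAGCMGCCGCGGTAA CCGYCAATTYMTTTRAGTTT silva-v4v5-seqs.qza
# train_classifier silva-v4v5-seqs.qza silva-138-99-tax.qza silva-v4v5-classifier.qza
```

The warning quoted above is expected at the training step. It only means the resulting classifier should be used with the same scikit-learn version it was trained with.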

Essentially, you should download the files from the Data Resources page that match the version of QIIME 2 you are using. Note the drop-down menu in the upper left of the page. If you are indeed using the correct version, then something altered your environment and changed the version of scikit-learn.

Option 1 gives you more flexibility if you want it (e.g., if you disagree with the default options used for processing these databases you can start from scratch and choose your own filtering options).

I'm working on downloading the SILVA database to process my 18S data with QIIME. I understand that there are a few options here, but I'm having trouble understanding which one to choose. Is it best to download version 138 formatted with RESCRIPt or version 132 from the SILVA website?

I forgot to mention that the choice of files depends on which tool you'd like to use to identify your sequences. The 'classifier' files are for use with feature-classifier classify-sklearn, while the sequence and taxonomy files are provided if you'd like to use either feature-classifier classify-consensus-vsearch or feature-classifier classify-consensus-blast to identify your reads.
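To make the distinction concrete, here is a rough sketch of the two routes. The function names are hypothetical and all paths are supplied by the caller. The vsearch call uses --output-dir instead of naming each output, on the assumption that the set of outputs differs between QIIME 2 releases; the directory must not exist beforehand.

```shell
# Route 1: pre-trained (or self-trained) classifier artifact.
# $1 classifier (.qza), $2 representative sequences (.qza),
# $3 output taxonomy (.qza)
classify_with_sklearn() {
    qiime feature-classifier classify-sklearn \
        --i-classifier "$1" \
        --i-reads "$2" \
        --o-classification "$3"
}

# Route 2: reference sequences plus taxonomy, no trained classifier.
# $1 representative sequences (.qza), $2 reference sequences (.qza),
# $3 reference taxonomy (.qza), $4 new output directory
classify_with_vsearch() {
    qiime feature-classifier classify-consensus-vsearch \
        --i-query "$1" \
        --i-reference-reads "$2" \
        --i-reference-taxonomy "$3" \
        --output-dir "$4"
}
```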

The scikit-learn version (0.24.1) used to generate this artifact does not match the current version of scikit-learn installed (0.24.2). Please retrain your classifier for your current deployment to prevent data-corruption errors.

It looks like somehow you have gotten scikit-learn 0.24.2 installed in the environment that you are running QIIME 2 in. If we can get your environment sorted out, we should be able to get this working for you. With that as the aim, I have a few questions about your installation.

How did you update to QIIME 2 2021.4? If it was with the recommended installation in a new conda environment, do you have the correct environment activated? If you followed the instructions here it should be qiime2-2021.4. Did you manually install anything inside of this environment? Could you run conda list and post the results here?
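Alongside conda list, a quick check of the scikit-learn version inside the activated environment might look like the sketch below. Both function names are hypothetical. The comparison is an exact string match, based on the error above treating 0.24.1 and 0.24.2 as incompatible.

```shell
# Print the scikit-learn version seen by the Python in the active environment.
installed_sklearn_version() {
    python -c 'import sklearn; print(sklearn.__version__)'
}

# Compare the version a classifier was trained with ($1) against the
# installed version ($2). The two must be identical: 0.24.1 and 0.24.2
# count as a mismatch.
check_sklearn_match() {
    if [ "$1" = "$2" ]; then
        echo "OK: scikit-learn $2 matches the classifier"
    else
        echo "MISMATCH: classifier trained with $1, environment has $2"
        return 1
    fi
}

# Example, run inside the activated QIIME 2 environment:
# check_sklearn_match 0.24.1 "$(installed_sklearn_version)"
```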

Regarding how I updated to QIIME 2 2021.4, I followed the recommended installation in the link, and I know I have the correct environment activated. I can't remember having installed anything manually. I have attached the list. Conda_Environment.txt (32.4 KB)

Looking at your Conda environment, it looks like you do have scikit-learn 0.24.2 installed. Based on what you have said and looking at your environment, it is not possible to tell why this got installed. At this point you should try reinstalling QIIME 2 from scratch in a new environment. Start by uninstalling your old one completely. Deactivate the environment: conda deactivate. Then completely remove your old environment via: conda remove -n qiime2-2021.4 --all. After that run conda update conda and finally copy-paste the Conda install instructions again from here, being sure to select the appropriate platform.
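The removal steps above, collected into one helper as a sketch. The function name is hypothetical. The reinstall itself is deliberately left out, since it should be copied from the official install instructions for your platform.

```shell
# Remove an existing QIIME 2 conda environment and update conda.
# $1 name of the environment to remove, e.g. qiime2-2021.4
reset_qiime_env() {
    conda deactivate &&
    conda remove -n "$1" --all &&
    conda update conda
}

# Example:
# reset_qiime_env qiime2-2021.4
# ...then follow the official install instructions to recreate the environment.
```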

I ran an install this morning on macOS that got the correct version installed, so it looks like something got mixed up on your machine during your previous install. A fresh install usually fixes it. Once you get QIIME 2 installed, go ahead and run your analysis up to this point before doing anything else, so that if this issue comes up again it is not from the installation of some other tool.

I feel the need to re-open the topic because I was not able to find a satisfactory solution. The discussion was between @DannyBoi97 and @SoilRotifer, who I hope will be able to help me once more.

I am trying to train a classifier myself, as I am not able to find classifiers for Silva 132. Or rather, I found one, but QIIME 2 (I am using version 2023.2 in a Singularity container) returns this error message:

Yeah, 132 is rather old at this point. The pre-trained classifiers on the QIIME 2 website are always for the latest available version of SILVA, so we have not released pre-trained classifiers for version 132 for a few years now I think.

Thanks for your feedback!
I am now facing some trouble. I tried to build my own classifier with the above-mentioned commands, but it runs for 19 hours and then gives me an error like "no space left on device", which sounds odd to me as it is running in a 1 TB environment.

As an alternative, I then tried to follow the instructions given in the past to @DannyBoi97 about RESCRIPt, but when I try the first script mentioned (for which you were also specifying the features), it gives me an error. Here it is:

That's bizarre. The SILVA database should not take up all that much space. BUT it sounds like you are maybe running this on a cluster and it is probably configured to use a temp directory that might have much more limited space. You can look into changing the temp directory, see other topics on the forum, e.g., this one for instructions:
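The linked topic is not reproduced here. As a rough sketch of what changing the temp directory can look like, assuming QIIME 2 honors the standard TMPDIR environment variable and that you have a roomier location available (the function name and the example path are hypothetical):

```shell
# Point temporary files at a roomier location and show its free space.
# $1 directory to use for temporary files (created if missing)
use_tmpdir() {
    mkdir -p "$1" || return 1
    export TMPDIR="$1"
    # Report how much space is available on the chosen file system.
    df -h "$TMPDIR"
}

# Example (placeholder path), run before the qiime command in the same shell
# or job script:
# use_tmpdir /scratch/$USER/qiime2-tmp
```

Checking df on the current temp location before switching (for example df -h /tmp) can confirm whether a small temp partition is really the cause.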

The pre-trained SILVA 132 classifiers were trained using an older version of QIIME 2 (or more importantly an older version of scikit-learn), so would not be compatible with QIIME 2023.2. You would need to use an older version of QIIME 2 to run the SILVA 132 pre-trained classifiers.

The reason I am trying to do this is that I want to compare some results between Silva 132 and Silva 138 and see what is different. Reading your last lines, I unfortunately reach the conclusion that it would be better to downgrade to an older version of QIIME 2 to use Silva 132 at this point. Am I correct?

RESCRIPt/get-silva-data should not be too space or RAM hungry, so it should be possible to get this running on your cluster once you sort out the tempdir issue OR run this locally to build the database and then upload to the cluster for the other steps (though the tempdir size issue will most likely constrain you even more with downstream steps). So I would recommend getting this sorted out to create a 132 database instead of giving up or downgrading to an older version of QIIME 2 (which would be a workable plan B).
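A minimal sketch of building a SILVA 132 reference with RESCRIPt, assuming the plugin is installed and that the NR99 SSU target is the one you want. The function name and output file names are made up for illustration. get-silva-data returns RNA sequences, so the sketch also converts them to DNA.

```shell
# Download a SILVA release with RESCRIPt and convert the RNA reads to DNA.
# $1 SILVA version string, e.g. 132
fetch_silva() {
    qiime rescript get-silva-data \
        --p-version "$1" \
        --p-target 'SSURef_NR99' \
        --o-silva-sequences "silva-$1-rna-seqs.qza" \
        --o-silva-taxonomy "silva-$1-tax.qza" &&
    qiime rescript reverse-transcribe \
        --i-rna-sequences "silva-$1-rna-seqs.qza" \
        --o-dna-sequences "silva-$1-seqs.qza"
}

# Example:
# fetch_silva 132
```

The resulting sequences and taxonomy would still need the usual filtering, dereplication, and classifier training steps afterwards.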

Hi @Nicholas_Bokulich, in the end I succeeded (with help from my server's technical support) in making my own classifier without using RESCRIPt.
I was wondering whether I could share it somewhere, so that other researchers could find it if they need it. Of course it is "made by me", but it could be better than nothing. Let me know; I would be glad to cooperate with the community.

I want to train my classifier using the new SILVA 138 release. I checked the SILVA database, but the available formats are different from those of the previous releases. In other words, for 128 and 132 I downloaded a package of files, from which I converted silva_132_97_16S.fna to qza and then used consensus_taxonomy_7_levels.txt to generate a reference taxonomy.
I have attached the link to the new SILVA release and need to know which files I should download to train the classifier.
arb-silva.de: Archive. SILVA provides comprehensive, quality-checked and regularly updated databases of aligned small (16S / 18S, SSU) and large subunit (23S / 28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya).
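For reference, the conversion described for the 128 and 132 packages can be sketched as below. This assumes the taxonomy file is a headerless two-column TSV. The function name and output names are hypothetical.

```shell
# Import a FASTA reference and its taxonomy table as QIIME 2 artifacts.
# $1 reference FASTA, $2 output sequences (.qza),
# $3 taxonomy TSV without a header line, $4 output taxonomy (.qza)
import_reference() {
    qiime tools import \
        --type 'FeatureData[Sequence]' \
        --input-path "$1" \
        --output-path "$2" &&
    qiime tools import \
        --type 'FeatureData[Taxonomy]' \
        --input-format HeaderlessTSVTaxonomyFormat \
        --input-path "$3" \
        --output-path "$4"
}

# Example with the files named above (output names are placeholders):
# import_reference silva_132_97_16S.fna silva-132-97-seqs.qza \
#     consensus_taxonomy_7_levels.txt silva-132-97-tax.qza
```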

RESCRIPt is an external plugin that you need to install. The very first link in the tutorial takes you to the GitHub page with the instructions. All you need to do is activate your QIIME 2 environment, skip right to the pip install command step, and go from there.

I am wondering if there is something wrong with the install? One last test you can try is to not set the parameters, and use the defaults (with and without the --p-include-species-labels option). That is, if you run the following:

One more thing that I forgot to mention: I created an environment within the QIIME 2 environment to install RESCRIPt and other plugins (as on the noted website). That means every time I use these plugins, I have to activate the QIIME 2 environment first, then the RESCRIPt environment. Is that correct?

First create a conda environment and install relevant dependencies (you can skip this step and install RESCRIPt in a conda environment containing QIIME 2 2020.2+, which should contain the necessary dependencies/versions)
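Installing straight into an already activated QIIME 2 environment, as the parenthetical above allows, avoids the need for a second, nested environment. A sketch of that route follows. The function name is hypothetical, and the repository URL is my assumption of where the GitHub instructions point, so check it against the instructions themselves.

```shell
# Install RESCRIPt into the currently active QIIME 2 environment and
# confirm that the plugin is visible to the qiime command.
install_rescript() {
    pip install git+https://github.com/bokulich-lab/RESCRIPt.git &&
    qiime dev refresh-cache &&
    qiime rescript --help >/dev/null &&
    echo "RESCRIPt is available"
}

# Example, after activating the QIIME 2 environment:
# install_rescript
```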

(2) Hence, I re-analysed the tutorial files using Silva 138.1 for both align.seqs and classify.seqs. It worked perfectly, but gave me different results. I expected some differences, but these were too large. Did this come from using a different database for classify.seqs than the tutorial does? If I analyze my own data, should I use Silva 138.1 for align.seqs and RDP 18 for classify.seqs, or is it OK to use Silva 138.1 for both align.seqs and classify.seqs?