problems downloading sequences of ITS, small subunit RNA gene (mtSSU), and Tsr1


Oscar Pérez

Oct 17, 2017, 7:04:07 AM
to phyloGenerator Users
Dear William,


I am trying to compile a phylogeny of the Lecanorales lichens using your fantastic program pG2 (v. 2.0.2). I have successfully compiled sequences for two of my five target loci (mcm7 and rpb1). Mining for the remaining three markers (ITS, small subunit RNA gene, and Tsr1) is not working: pG2 retrieves no sequences at all for these loci.

My search terms for these three markers are "Tsr1 (aliases: TSR1, ribosome_biogenesis_protein_like)", "28S_ribosomal_RNA (aliases: 5.8S_ribosomal_RNA_gene_partial_sequence_internal_transcribed_spacer_2_complete_sequence_and_28S_ribosomal_RNA_gene, internal_transcribed_spacer_1)", and  "small_subunit_ribosomal_RNA (aliases: small_subunit_ribosomal, mtSSU)".

I know that sequences for these markers are available in GenBank. Do you know why pG2 might not be retrieving any of them? I am attaching the params.yml file, the species list, and the refseq files. I will be extremely grateful for any help or information you could provide.

Thank you and best wishes,


Oscar 
28S_ribosomal_refseq.fasta
params.yml
small_subunit_refseq.fasta
tsr1_refseq.fasta
A_E_mutualist_spp.txt

Will

Oct 17, 2017, 4:08:42 PM
to Oscar Pérez, phyloGenerator Users
Hello Oscar,

Thanks for getting in touch, and thanks for using pG2!

It looks like you're trying to use all the sequence checking options (ref_min, ref_max, etc.), along with HawkEye, before you've even downloaded a few sequences and taken a look at them. Most times when users email me with problems like this, it's because (1) their criteria are too strict for the sequences on GenBank (there are sequences there, but none of them are 'good enough' according to those criteria) or (2) the reference alignments they're giving don't actually pass the criteria themselves.

When I modified your file to drop those sequence checking constraints (attached) I started downloading sequences immediately. Make sure you check both of the options above (almost everyone gets those wrong the first time, so don't worry about it) and then tell me how you get on.
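As a rough illustration of a gene block with the sequence checks left out, here is a minimal sketch. It assumes the same key layout as the gene blocks quoted elsewhere in this thread; the attached params.yml is not reproduced here, so the max_dwn value and the exact set of options dropped are assumptions, and the commented-out values are left as placeholders.

```yaml
# Hypothetical sketch, not the attached file: layout assumed from other gene blocks in this thread.
Tsr1:
    max_dwn: 10          # illustrative value only
    aliases: [TSR1, ribosome_biogenesis_protein_like]
    # Sequence checks left out for a first exploratory download.
    # Re-enable them one at a time once you have looked at what comes back.
    # ref_file: tsr1_refseq.fasta
    # ref_min: <your value>
    # ref_max: <your value>
```

Starting without the checks makes it easier to tell whether an empty result comes from the search terms or from the filtering criteria.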

Cheers,

Will

---

Need a phylogeny? Try phyloGenerator: original or new version
Measuring phylogenetic structure? Try install.packages('pez')

Will Pearse
Assistant Professor of Biology, Utah State University
Office: +1-435-797-0831; Room BR-139
Skype: will.pearse
params.yml

Oscar Pérez

Oct 18, 2017, 1:29:12 AM
to phyloGenerator Users
Hi Will,


Thanks so much for replying so soon, and for the insights! I was not aware I could turn ref_min, ref_max, and other parameters of the pipeline on and off, and I am sorry for having bothered you with this basic query!

I did as you suggested, and now I have retrieved a lot more sequences for tsr1, mcm7, and rpb1. While this is really great, I am still having trouble with "Small subunit ribosomal RNA gene" and "ITS" (for the latter I have already tried several variants of the marker name, including "internal_transcribed_spacer_1", "internal transcribed spacer 1", "internal-transcribed-spacer-1", and "28S_ribosomal_RNA"). For these two markers, pG2 is not retrieving any sequences at all (see below), although I know there are hundreds of sequences available (at least for ITS).

phyloGenerator 2.0-2 DOI: 10.1111/2041-210X.12055

Please *use with caution*; first version!

Will Pearse - will....@gmail.com

 - Setup complete...

 - - small_subunit_ribosomal_RNA_gene: 0/2207 sequences found



Do you know what the reason for this might be? My guess is that it is related to how sequences are annotated in NCBI: many are labelled as miscellaneous features without a proper marker abbreviation, but I am not sure.

Again, I will be very grateful for any help with these markers, as they are the most abundant for my group of interest.

Thanks a lot and sorry for bothering you again!

Best,


Oscar

Will

Oct 18, 2017, 10:51:14 AM
to Oscar Pérez, phyloGenerator Users
Hello Oscar,

Thanks for this. Could you send me a link to one example each of a "small subunit ribosomal RNA gene" and "ITS" sequence for a species in your dataset? It sounds like, as you say, you're not putting in the annotations correctly, and I can help you if I have an example of what you would expect to find.

When running all these, remember there's a cache feature, so any sequences you find in searches now will be useful to you later.


Cheers,

Will


Oscar Pérez

Oct 18, 2017, 12:20:46 PM
to phyloGenerator Users
Hi Will,


Thanks a lot again for replying so soon. Here are the links to examples of sequences of small subunit ribosomal RNA and ITS:

1. Small subunit ribosomal DNA: https://www.ncbi.nlm.nih.gov/nuccore/AF140233
2. ITS: https://www.ncbi.nlm.nih.gov/nuccore/KM250204 or https://www.ncbi.nlm.nih.gov/nuccore/KT970617

I am also providing the params file I am using this time.

Thank you so much again for your help with this.

Best,


Oscar
params_v2.yml

Will

Oct 18, 2017, 2:00:47 PM
to Oscar Pérez, phyloGenerator Users
Hello Oscar,

Thanks for this. You're right that this is an annotation problem on GenBank; these regions aren't quite so standardised in how they're described, which can make things difficult for programs like pG.

The solution is to turn off pG2's annotation checking by setting fussy: false within the block for the gene you're working on. So something like:

ITS:
    max_dwn: 10
    aliases: [internal transcribed spacer 1, internal transcribed spacer 2]
    fussy: false
small_subunit_ribosomal_RNA:
    max_dwn: 10
    aliases: [small subunit ribosomal RNA, mtSSU]
    fussy: false

...should work for you. Let me know if that fixes your problem. I should warn, however, that doing this means there's a greater chance you'll get lower-quality sequences or a poorer alignment; there's not much I can do about that (beyond suggesting you use HawkEye and some kind of ref_file) at present.
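To make the ref_file suggestion concrete, here is a minimal sketch of one gene block that pairs fussy: false with a reference file. The file name is taken from the attachments on the first post; how pG2 combines the two options, and how HawkEye itself is switched on, are not stated in this thread, so treat the combination as an assumption to check against the pG2 documentation.

```yaml
# Hypothetical sketch: only max_dwn, aliases, fussy and ref_file are named in this thread.
small_subunit_ribosomal_RNA:
    max_dwn: 10
    aliases: [small subunit ribosomal RNA, mtSSU]
    fussy: false                           # turn off annotation checking for this gene
    ref_file: small_subunit_refseq.fasta   # file name from the first post's attachments
```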

Does that help?

Cheers,

Will


Oscar Pérez

Oct 20, 2017, 6:53:31 AM
to phyloGenerator Users
Hi Will,


It is working now!! Thanks a lot! I added the "fussy: false" argument to the params file, and pG2 started to download tons of seqs for ITS and SSU (I got a few bad seqs, but that is not a big issue). This is perfect, and I am very grateful for your help!

Best wishes,


Oscar

Will

Oct 20, 2017, 11:51:31 AM
to Oscar Pérez, phyloGenerator Users
No worries; glad I could help.


Cheers,

Will
