Optimal reads to get for ShortBRED

Joan Slonczewski


Nov 22, 2019, 4:49:57 PM

to shortbred-users

We are sequencing water-filtered metagenomes to count and compare ARGs from diverse environments.

What DNA sequencing would be optimal?

Would 2 million reads length 100bp suffice?

Would 20 million detect significantly higher counts?

Would 150-bp length detect more ARGs at higher resolution?

Any opinions would be appreciated.

Joan L. Slonczewski

Professor of Biology

Higley Hall, 202 N. College Road

Kenyon College

Gambier, OH 43022

http://biology.kenyon.edu/slonc/slonc.htm

slonc...@kenyon.edu

Phone: 740-427-5397

Text: 740-504-2215

Eric Franzosa


Nov 26, 2019, 12:52:24 PM

to Joan Slonczewski, shortbred-users

Hi Joan,

In my opinion your read depth will have a much bigger impact on your results than increasing the read length from 100 to 150 nts. What read depth you target depends on how rare the features are that you're trying to find. I use the following figures as a guide for species detection:

https://docs.google.com/spreadsheets/d/1kgZWNnh7Kujrc0Yv-TRlhItO68woSfjIEnBMRNah7SQ/edit?usp=sharing

So, for example, if you wanted to detect a gene present in ~1 in 100 cells in your community (i.e. a species with a total relative abundance of 1%), you'd want at least 5M reads to expect 1x coverage of that gene. If the gene is 10x rarer (found in ~1 in 1,000 cells in your community), you'd want 10x the sequencing depth to have a good shot at seeing it. Thus, of the options you gave, 20M reads seems a much safer bet to me than 2M reads.
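The rule of thumb above can be reproduced with simple coverage arithmetic. The Python sketch below is only an illustration under stated assumptions that are not in the thread or the linked spreadsheet: an average genome size of ~5 Mb (the hypothetical `GENOME_SIZE_BP`), one gene copy per cell, reads sampled uniformly from community DNA, and Poisson (Lander-Waterman) coverage. The helper names `expected_coverage`, `reads_needed` and `p_seen` are hypothetical.

```python
import math

# Assumption (hypothetical, not from the thread): an average genome of ~5 Mb,
# one copy of the gene per carrying cell, and reads sampled uniformly.
GENOME_SIZE_BP = 5_000_000


def expected_coverage(n_reads: int, read_len: int, abundance: float,
                      genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean fold-coverage of a gene carried by a fraction `abundance` of cells."""
    return n_reads * read_len * abundance / genome_size


def reads_needed(target_coverage: float, read_len: int, abundance: float,
                 genome_size: int = GENOME_SIZE_BP) -> float:
    """Total reads required for the gene to reach `target_coverage` on average."""
    return target_coverage * genome_size / (read_len * abundance)


def p_seen(coverage: float) -> float:
    """Expected fraction of the gene's bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Compare the options raised in the question: 2M vs 20M reads, 100 vs 150 bp.
    for n_reads in (2_000_000, 20_000_000):
        for read_len in (100, 150):
            for abundance in (0.01, 0.001):
                cov = expected_coverage(n_reads, read_len, abundance)
                print(f"{n_reads:>11,} reads x {read_len} bp, abundance {abundance:.1%}: "
                      f"{cov:.2f}x coverage, fraction of gene covered ~ {p_seen(cov):.2f}")
```

Under these assumptions, 2M 100-bp reads give about 0.4x coverage of a gene found in 1% of cells and 20M reads about 4x, whereas switching from 100- to 150-bp reads raises coverage only 1.5-fold (0.4x to 0.6x at 2M reads). This is consistent with the point that depth matters more than read length.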

Thanks,

Eric

--
You received this message because you are subscribed to the Google Groups "shortbred-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to shortbred-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/shortbred-users/4918a4b2-eb3b-44c0-b086-b3d2bfc9916d%40googlegroups.com.

Joan Slonczewski


Nov 26, 2019, 9:17:34 PM

to shortbred-users

Thanks so much, Eric. I can see the literature agrees with you that 20M-40M would be better.


Joan Slonczewski


Dec 13, 2019, 4:34:11 PM

to shortbred-users

Eric, is there any chance you could help Daniel Barich with his question about the ShortBRED run with our new marker set?

He has been working hard through the software output, but it's hard for us to tell which part of the pipeline is causing the trouble.

Thanks,

Joan

Daniel Barich

Dec 3

Other recipients: slonc...@kenyon.edu

Hello,

I'm trying to run shortbred_identify. It runs for a while and then I get this error:

(python2) barichd@JOAN9000:~/mouse/shortbred$ shortbred_identify.py --goi protein_fasta_protein_homolog_model.adjusted.fasta --ref ../uniref-filtered-identity_0.9.fasta.gz

...

00:00:00 21 MB(100%) Iter 8 100.00% Refine biparts

00:00:00 21 MB(100%) Iter 9 100.00% Refine biparts
Making BLAST database for the family consensus sequences...
Making BLAST database for the reference protein sequences...
Traceback (most recent call last):
  File "/home/barichd/miniconda3/envs/python2/bin/shortbred_identify.py", line 298, in <module>
    "-dbtype", "prot", "-logfile", dirTmp + os.sep + "refdb.log"])
  File "/home/barichd/miniconda3/envs/python2/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['makeblastdb', '-in', '../uniref-filtered-identity_0.9.fasta.gz', '-out', 'tmp93601575385787556/refdb/refdb', '-dbtype', 'prot', '-logfile', 'tmp93601575385787556/refdb.log']' returned non-zero exit status 1
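The traceback only reports that makeblastdb exited with status 1; makeblastdb's own message is written to the log file named in the command (tmp93601575385787556/refdb.log). One possibility worth checking, which is an assumption and not confirmed in this thread, is that makeblastdb does not accept the gzip-compressed file passed as --ref. The Python 3 sketch below (run it outside the python2 environment) prints that log if the temporary directory still exists and writes an uncompressed copy of the reference. The paths come from the traceback; the output filename and the helper names `show_log` and `gunzip` are hypothetical.

```python
import gzip
import shutil
import sys
from pathlib import Path

# Paths taken from the traceback; the tmp directory name changes on every run.
LOG = Path("tmp93601575385787556/refdb.log")
REF_GZ = Path("../uniref-filtered-identity_0.9.fasta.gz")
# Hypothetical name for an uncompressed copy of the reference.
REF_FASTA = Path("../uniref-filtered-identity_0.9.fasta")


def show_log(log_path: Path) -> None:
    """Print makeblastdb's log, which should say why it exited with status 1."""
    if log_path.exists():
        print(log_path.read_text(errors="replace"))
    else:
        print(f"{log_path} not found (the tmp directory may have been removed)",
              file=sys.stderr)


def gunzip(src: Path, dst: Path) -> Path:
    """Stream-decompress a .gz file to `dst` without loading it all into memory."""
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


if __name__ == "__main__":
    show_log(LOG)
    if REF_GZ.exists() and not REF_FASTA.exists():
        gunzip(REF_GZ, REF_FASTA)
        print("Re-run with: shortbred_identify.py "
              "--goi protein_fasta_protein_homolog_model.adjusted.fasta "
              f"--ref {REF_FASTA}")
```

If the log points to something other than the input format (for example, disk space or a malformed FASTA header), decompressing the reference will not help; the log is the first thing to check either way.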

--

Daniel Barich

Barich Assistive Technology

Gambier, OH 43022

On Tuesday, November 26, 2019 at 12:52:24 PM UTC-5, Eric Franzosa wrote:
