Optimal number of reads for ShortBRED


Joan Slonczewski

Nov 22, 2019, 4:49:57 PM
to shortbred-users
We are sequencing water-filtered metagenomes to count and compare antibiotic resistance genes (ARGs) from diverse environments.
What sequencing depth and read length would be optimal?
Would 2 million 100-bp reads suffice?
Would 20 million reads detect significantly higher counts?
Would 150-bp reads detect more ARGs at higher resolution?
Any opinions would be appreciated.

Joan L. Slonczewski
Professor of Biology
Higley Hall, 202 N. College Road
Kenyon College
Gambier, OH 43022

Eric Franzosa

Nov 26, 2019, 12:52:24 PM
to Joan Slonczewski, shortbred-users
Hi Joan,

In my opinion, read depth will have a much bigger impact on your results than increasing the read length from 100 to 150 nt. The depth you should target depends on how rare the features you're trying to find are. I use the following figures as a guide for species detection:
[Figure: sequencing-depth guide for species detection (image not preserved)]
So, for example, if you wanted to detect a gene present in ~1 in 100 cells from your community (i.e., carried by species with a total relative abundance of 1%), you'd want at least 5M reads to expect 1x coverage of that gene. If the gene is 10x rarer (found in ~1 in 1,000 cells from your community), you'd want 10x the sequencing depth (~50M reads) to have a good shot at seeing it. Thus, of the options you gave, 20M reads seems a much safer bet to me than 2M reads.
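The arithmetic behind these figures can be sketched as a back-of-the-envelope coverage estimate. This is a minimal sketch, not ShortBRED's own method, and it rests on hypothetical assumptions: the gene is single-copy, genomes average about 5 Mb, and reads are sampled in proportion to DNA content. Under those assumptions it reproduces the "5M reads for 1x at 1% abundance" figure. Because expected coverage scales with total bases sequenced (reads x read length), going from 100 to 150 bp buys 1.5x, while going from 2M to 20M reads buys 10x.

```python
# Back-of-the-envelope coverage estimate for a gene in a metagenome.
# Assumptions (hypothetical, not from ShortBRED): the gene is single-copy,
# genomes average ~5 Mb, and reads are sampled in proportion to DNA content.

AVG_GENOME_SIZE_BP = 5_000_000  # assumed average genome size


def expected_gene_coverage(n_reads, read_len_bp, rel_abundance,
                           genome_size_bp=AVG_GENOME_SIZE_BP):
    """Expected fold-coverage of a gene carried by `rel_abundance` of cells.

    The gene's share of sequenced bases is rel_abundance * gene_len / genome_size,
    so coverage = n_reads * read_len * rel_abundance / genome_size
    (the gene length cancels out).
    """
    return n_reads * read_len_bp * rel_abundance / genome_size_bp


def reads_for_coverage(target_coverage, read_len_bp, rel_abundance,
                       genome_size_bp=AVG_GENOME_SIZE_BP):
    """Number of reads needed to reach `target_coverage` on such a gene."""
    return target_coverage * genome_size_bp / (read_len_bp * rel_abundance)


if __name__ == "__main__":
    # Compare the two depths asked about, for genes in 1% and 0.1% of cells.
    for n_reads in (2_000_000, 20_000_000):
        for abundance in (0.01, 0.001):
            cov = expected_gene_coverage(n_reads, 100, abundance)
            print(f"{n_reads:>10,} reads, abundance {abundance:.1%}: {cov:.2f}x")
```

With 100-bp reads this works out to about 0.4x and 0.04x at 2M reads versus 4x and 0.4x at 20M reads, and reads_for_coverage(1, 100, 0.001) comes out at about 50M reads, matching the "10x the depth" rule of thumb. Real genome sizes and gene copy numbers vary, so these are rough guides only.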

Thanks,
Eric



--
You received this message because you are subscribed to the Google Groups "shortbred-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to shortbred-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/shortbred-users/4918a4b2-eb3b-44c0-b086-b3d2bfc9916d%40googlegroups.com.

Joan Slonczewski

Nov 26, 2019, 9:17:34 PM
to shortbred-users
Thanks so much, Eric. I see the literature agrees with you that 20M to 40M reads would be better.

Joan Slonczewski

Dec 13, 2019, 4:34:11 PM
to shortbred-users
Eric, is there any chance you could help Daniel Barich with his question about the ShortBRED run with our new marker set?
He has been working hard through the software output, but it's hard for us to tell which part of the pipeline is causing the trouble.
Thanks,
Joan


Daniel Barich
Dec 3
Other recipients: slonc...@kenyon.edu
Hello,

I'm trying to run shortbred_identify.py. It runs for a while and then fails with this error:

(python2) barichd@JOAN9000:~/mouse/shortbred$ shortbred_identify.py  --goi protein_fasta_protein_homolog_model.adjusted.fasta --ref ../uniref-filtered-identity_0.9.fasta.gz

...
00:00:00   21 MB(100%)  Iter   8  100.00%  Refine biparts
00:00:00   21 MB(100%)  Iter   9  100.00%  Refine biparts
Making BLAST database for the family consensus sequences...
Making BLAST database for the reference protein sequences...
Traceback (most recent call last):
  File "/home/barichd/miniconda3/envs/python2/bin/shortbred_identify.py", line 298, in <module>
    "-dbtype", "prot", "-logfile", dirTmp + os.sep +  "refdb.log"])
  File "/home/barichd/miniconda3/envs/python2/lib/python2.7/subprocess.py", line 190, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['makeblastdb', '-in', '../uniref-filtered-identity_0.9.fasta.gz', '-out', 'tmp93601575385787556/refdb/refdb', '-dbtype', 'prot', '-logfile', 'tmp93601575385787556/refdb.log']' returned non-zero exit status 1
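The traceback shows makeblastdb exiting with status 1 while building the reference database, and the specific reason should be recorded in the refdb.log file inside the tmp directory named in the command. One possibility worth ruling out (an assumption, not something diagnosed in this thread) is that the installed makeblastdb does not read the gzip-compressed --ref file directly. The minimal Python 3 sketch below, with hypothetical helper names is_gzipped and ensure_plain_fasta, decompresses the reference to a plain FASTA that could then be passed to --ref instead.

```python
# Hypothetical helper: make sure the ShortBRED reference is an uncompressed
# FASTA before handing it to shortbred_identify.py --ref. Assumes the input
# may be gzip-compressed (as with uniref-filtered-identity_0.9.fasta.gz).
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def ensure_plain_fasta(path):
    """Return a path to an uncompressed copy of `path`.

    If `path` is not gzip-compressed it is returned unchanged; otherwise it is
    decompressed next to the original (dropping a trailing ".gz"), reusing an
    existing decompressed copy if one is already there.
    """
    src = Path(path)
    if not is_gzipped(src):
        return src
    if src.suffix == ".gz":
        dest = src.with_suffix("")  # ref.fasta.gz -> ref.fasta
    else:
        dest = src.with_name(src.name + ".decompressed")
    if not dest.exists():
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)  # stream, so large files fit in memory
    return dest


if __name__ == "__main__":
    import sys
    # Example: python3 ensure_plain_fasta.py ../uniref-filtered-identity_0.9.fasta.gz
    print(ensure_plain_fasta(sys.argv[1]))
```

The printed path would then replace the .gz file in the --ref argument. If refdb.log reports a different error, this step is not the fix.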

--
Daniel Barich
Barich Assistive Technology
Gambier, OH 43022

On Tuesday, November 26, 2019 at 12:52:24 PM UTC-5, Eric Franzosa wrote: