Fasta file for Step 1


John G

Oct 17, 2016, 11:51:57 AM
to CLARK Users
Hi,

I successfully ran the non-spaced version of CLARK with your help, and I am now planning to run it in --spaced mode. I am attempting to run the second step (Step 1), in which you run classify on the fasta file. Herein lies my question. For my original CLARK runs with the bacteria and virus databases at species level, I used your set_targets script, which downloaded all the .fna files for each organism, each into its own folder. Can I somehow use the fasta files that set_targets downloaded as input to classify at Step 1 (see below for the step I am attempting to run)? Or am I misinterpreting the instructions completely and heading down the wrong path?

Step 1: Create the discriminative 31-mers of the database you have defined in step 0 (if they do not exist already). This can be done by running the default variant CLARK on the sample of your choice, for example: 
$ ./classify_metagenome.sh -O sample.fa -R result

Thanks,
John

Rachid OUNIT

Oct 19, 2016, 11:03:06 PM
to John G, CLARK Users
Hello John,

Yes, you are on the right track: you can use any fastq/fasta file "of your choice".
You can create an empty/dummy file if you want.
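
For anyone following along, here is a minimal sketch of that suggestion. The command and its `-O`/`-R` options are the ones quoted from the CLARK-S instructions above; the one-record placeholder (`>dummy`, `ACGT`) is only an assumption for illustration, and it has not been checked here whether CLARK treats an empty file and a one-record file any differently.

```shell
# Placeholder input for Step 1. The record name and sequence are arbitrary
# (an assumption for illustration); the reply above says any file will do.
printf '>dummy\nACGT\n' > sample.fa

# Command as quoted from the CLARK-S instructions; run it from the CLARK directory.
if [ -x ./classify_metagenome.sh ]; then
    ./classify_metagenome.sh -O sample.fa -R result
else
    echo "classify_metagenome.sh not found; run this from the CLARK directory" >&2
fi
```

The point of the step is the side effect (building the discriminative 31-mers for the database defined in step 0), so the classification output for the placeholder file itself can be ignored.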

Best,
Rachid 


John G

Oct 20, 2016, 3:15:32 PM
to CLARK Users, jgil...@tgen.org
What do you mean by "create an empty/dummy file"? I know how to create one, but are you saying I can use an empty fasta file? So is this step not necessary?

Rachid OUNIT

Oct 20, 2016, 5:44:34 PM
to John G, CLARK Users
As indicated, this step is needed only if you have not built the database of specific 31-mers.

Best,
Rachid 

John G

Oct 20, 2016, 5:53:07 PM
to CLARK Users, jgil...@tgen.org
That makes sense, but let me ask my original question in a different way. When I ran set_targets and it downloaded the data from NCBI for bacteria/viruses, did your script also build the 31-mers? If not, where is the fasta file that I need for generating the 31-mers?

Rachid OUNIT

Oct 20, 2016, 6:04:40 PM
to John G, CLARK Users
First, the reference sequences for the database are downloaded. Second, the database of specific k-mers is built. That's why there are two different scripts.

John Gillece

Oct 20, 2016, 6:12:49 PM
to Rachid OUNIT, CLARK Users
I understand that. So when I run the second script to generate the specific k-mers as defined in step 1 of the CLARK-S section (run $ ./classify_metagenome.sh -O sample.fa -R result), where do I point the -O given that I've downloaded the bacteria/viruses with your set_targets.sh?

John

Rachid

Nov 2, 2016, 2:36:50 PM
to CLARK Users, rachid...@gmail.com
Hi John,

You point -O to the address of your file (any file you want for that step). The database sequences you downloaded are stored in the folder you indicated with "set_targets.sh" (you can find the addresses of the database sequences in the file "targets.txt" in that directory).
Let me know if there is anything else; otherwise, is it okay to mark this topic as complete?
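
To see where the downloaded sequences live, something like the sketch below could be used. It relies only on what is said above: that "targets.txt" sits in the directory given to set_targets.sh. The helper name `show_targets` and the directory argument are hypothetical, and nothing is assumed about the layout of the lines inside the file.

```shell
# show_targets is a hypothetical helper; pass it the directory you gave to
# set_targets.sh. It makes no assumption about the format of each line.
show_targets() {
    db_dir="$1"
    targets="$db_dir/targets.txt"
    if [ -f "$targets" ]; then
        # Count the entries, then preview the first few.
        echo "entries: $(grep -c '' "$targets")"
        head -n 5 "$targets"
    else
        echo "no targets.txt found in $db_dir" >&2
        return 1
    fi
}
```

For example, `show_targets /path/to/your/database_dir` (a placeholder path) prints the entry count and the first five lines of the file.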

Cheers,
Rachid

John Gillece

Nov 4, 2016, 12:19:05 PM
to Rachid, CLARK Users
Hi Rachid,

Thanks! You can mark it as complete. I may have follow-up questions, but I can start a new thread for those.
