Pre-built CLARK DB (bacteria + viruses) + split results

Леонид Солнцев

Oct 10, 2016, 1:06:20 PM
to CLARK Users

Is there any way to download a pre-built DB?
I have only 128 GB on my HV server, so I can't build the databases myself ;(

Also, is there a ready-to-use script for splitting an input paired-end FASTQ library according to CLARK's classification results?

Thanks in advance.

Rachid

Oct 10, 2016, 11:21:10 PM
to CLARK Users
Hello!

Thank you for your interest. 
1) About the pre-built database: this is an excellent question. Please let me get back to you about this in a separate email soon.
For the moment, we recommend that users build their own databases, but we may provide CLARK users with access to pre-built database files soon.

However, please note that if you work with public data/samples, you can run the CLARK tools with the insideDNA platform (you can create an account and run your analysis with CLARK through their cloud-based solutions).

2) What exactly do you mean by "splitting input" paired-end reads "according to classification results of CLARK"?
Could you please give an example of what you want to obtain?
For example, say you have N paired-end reads as input, and so CLARK gives you N results, then what is the outcome you are looking for?

Thank you again for your interest!

Best,
Rachid

Леонид Солнцев

Oct 10, 2016, 11:58:21 PM
to CLARK Users
1) About the pre-built database

I mean the reference DB of bacteria and viruses from NCBI.

For example, say you have N paired-end reads as input, and so CLARK gives you N results, then what is the outcome you are looking for?

After CLARK classification I get the information that reads 1, 2, 3, 4 belong to species A, reads 5, 6, 7, 8 to species B, and so on.
Then I split the source FASTQ files into parts accordingly. As a result, I have FASTQ files containing the reads classified as "reads from species A".
With this subset I can do de novo assembly and everything else.

Is this possible and correct? I'm a newbie to NGS and metagenomics, so some of my questions may sound rather silly ^)
 

On Tuesday, October 11, 2016 at 6:21:10 AM UTC+3, Rachid wrote:

Rachid

Oct 14, 2016, 3:43:15 PM
to CLARK Users
Hello,

I believe a solution for your analysis is to:
1) filter the CLARK results (using the confidence score and/or gamma). This first step is optional but recommended, so that you work with high-confidence assignments.
2) sort the filtered results from 1) by the "Assignment" column (column #3 in the default format; otherwise it is column #4, called "1st_assignment", in the other format without extended results; see the README file).
3) extract the reads using the value in the column used in 2), because reads belonging to the same species or taxon will be grouped together in one block.
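
The three steps above could be sketched in Python roughly as follows. This is only an illustration, not a script shipped with CLARK: it assumes the results file is a CSV with one header line, that the read ID is in the first column, and that the read IDs match the FASTQ headers once a trailing /1 or /2 is removed. The function names, the optional confidence column index (conf_col), and the "NA" marker for unassigned reads are assumptions of this sketch; check them against your own results file and the README. Instead of sorting, it looks up each read's assignment in a dictionary, which gives the same grouping.

```python
import csv
from collections import defaultdict


def load_assignments(csv_file, id_col=0, assign_col=2,
                     conf_col=None, min_conf=0.0, unassigned=("NA", "")):
    """Map read ID -> taxon from a CLARK-style CSV (column layout is assumed)."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header line
    needed = max(id_col, assign_col, conf_col or 0)
    assignments = {}
    for row in reader:
        if len(row) <= needed:
            continue  # short or malformed line
        taxon = row[assign_col].strip()
        if taxon in unassigned:
            continue
        if conf_col is not None:
            # Step 1 (optional): keep only high-confidence assignments.
            try:
                if float(row[conf_col]) < min_conf:
                    continue
            except ValueError:
                continue
        assignments[row[id_col].strip()] = taxon
    return assignments


def read_id(header):
    """Drop '@', any description after whitespace, and a /1 or /2 suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_fastq(fastq_file, assignments):
    """Steps 2-3: group 4-line FASTQ records by their assigned taxon."""
    groups = defaultdict(list)
    while True:
        record = [fastq_file.readline() for _ in range(4)]
        if not record[0].strip():
            break  # end of file
        taxon = assignments.get(read_id(record[0]))
        if taxon is not None:
            groups[taxon].append("".join(record))
    return groups


def write_groups(groups, prefix):
    """Write one FASTQ file per taxon, e.g. <prefix>.<taxon>.fastq."""
    for taxon, records in groups.items():
        with open(f"{prefix}.{taxon}.fastq", "w") as out:
            out.writelines(records)
```

For a paired-end library you would call split_fastq once on the R1 file and once on the R2 file with the same assignments, then write_groups with a different prefix for each, so mates stay in the same order in both outputs. The sketch keeps all records in memory, so for a large library it would be better to write each record to its output file as it is read.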

Best,
Rachid

Леонид Солнцев

Nov 7, 2016, 12:07:46 PM
to CLARK Users

One more question.
I'm building a DB from bacteria and viruses for CLARK using 20-mers.
According to the manual, it needs 107 GB of RAM.
I have an Ubuntu 16.04 VM with 24 vCPUs (2x Xeon on the host machine) and 115 GB of RAM.

But I see htop output like this. Is it normal or not?



On Monday, October 10, 2016 at 8:06:20 PM UTC+3, Леонид Солнцев wrote:

Rachid OUNIT

Nov 8, 2016, 10:35:00 PM
to Леонид Солнцев, CLARK Users
Hello,

I am afraid that for the installation and for building the database, you will need a server with more than 115 GB of RAM.

Yes, the README file says that it should require about 107 GB for 20-mers, but that figure is for bacterial genomes only and for version v1.1.1, which was posted about a year ago. We are now at version v1.2.3. If you want to work with bacteria and viruses using the latest version, then please consider using a more powerful server (i.e. a server with 160 GB or more of RAM).

However, your alternatives are:
- use a smaller k (please try 19 or 18).
- work with only bacteria, instead of bacteria+viruses.
- use the insideDNA platform (https://insidedna.me/tools/page/CLARK) if you work with public data.
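
Before starting a long build, it can help to compare the machine's total RAM with the figures mentioned above. The following is a minimal sketch under stated assumptions, not part of CLARK: it parses Linux /proc/meminfo-style text, treats the reported kB as KiB, and the function names and the default 160 threshold (taken from the recommendation above) are hypothetical.

```python
def mem_total_gib(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, converted to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # value is reported in kB (KiB)
            return kib / 1024 ** 2
    raise ValueError("MemTotal not found")


def enough_ram(meminfo_text, required_gib=160):
    """True if total RAM meets the given requirement (in GiB)."""
    return mem_total_gib(meminfo_text) >= required_gib
```

On a Linux host you could call enough_ram(open("/proc/meminfo").read()) to check the current machine.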

Cheers,
Rachid
