optimizing parameters

Lindsay Kalan

Mar 8, 2016, 6:23:00 PM
to kneaddata users
Hi,

I am using kneadData to remove human contamination from samples that are >98% human. I have tried different human databases and bowtie2 parameters (using the very-sensitive-local alignment), but I'm still only getting about 50% decontamination. Can you suggest parameters to tune within kneadData to improve the output?

Thanks,

Lindsay

Andy Shi

Mar 9, 2016, 1:27:08 PM
to kneaddata users
Hi Lindsay,

We've encountered this problem before. I think one of the main issues is that there are a lot of human genetic sequences not in the human reference. 

One thing you can try is to run kneadData with TRF (Tandem Repeats Finder). TRF will remove tandem repeats, which are among the human genetic sequences that may not be represented in the human reference.

You can add TRF by including the --run-trf flag in kneadData. 

You'll need to download TRF and ensure that it's in your $PATH, or use the --trf flag to specify the location of the TRF executable. As a first try, you can run TRF with the default parameters, but if that doesn't work well you can consider tuning some of the TRF arguments (run kneaddata -h for more info on those arguments).
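
As a rough illustration, the call described above could be assembled like this. This is only a sketch: --run-trf and --trf come from this thread, while the --input, --reference-db and --output flag names, the repeated --input for paired-end reads, and the executable name "trf" are assumptions that may differ between kneaddata versions (check kneaddata -h). The function and its arguments are hypothetical.

```python
import shutil


def kneaddata_trf_command(inputs, reference_db, output_dir, trf_location=None):
    """Build (but do not run) a kneaddata command line with TRF enabled."""
    cmd = ["kneaddata"]
    for fastq in inputs:
        # Assumption: paired-end reads are passed by repeating --input.
        cmd += ["--input", fastq]
    cmd += ["--reference-db", reference_db, "--output", output_dir]
    cmd.append("--run-trf")  # flag named in this thread
    if trf_location is not None:
        # --trf points kneaddata at TRF when it is not on $PATH.
        cmd += ["--trf", trf_location]
    elif shutil.which("trf") is None:
        # Assumption: the TRF executable is called "trf".
        raise FileNotFoundError("trf not found on $PATH; pass trf_location")
    return cmd
```

The returned list can be handed to subprocess.run once the flags have been checked against the installed version.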

Thank you for using our software!

Best wishes,
Andy

Lindsay Kalan

Mar 11, 2016, 10:54:16 AM
to kneaddata users
Hi Andy,

Thanks. I will try this option but now I am running into further problems.

I am running kneadData in paired-end mode and giving it a human reference database and a bacterial reference database, plus the TRF path. I used bowtie2-build to create a bacterial bowtie2 database. The program automatically created large indexes, and now when I run kneadData it tells me that it "could not find reference database file bacterial_db.1bt2". I think this is because the files have the extension .bt2l for a large index.

Can kneadData only use small indices from bowtie2? I had previously created another bowtie2 database with the small index extension .bt2, but when I give the program the path to that database I get the same error message.

Thanks,

Lindsay

Andy Shi

Mar 23, 2016, 12:50:26 PM
to kneaddata users
Hi Lindsay,

Sorry for the late response; I was out of the office this past week.

KneadData does indeed support large indices. I think the problem might be that you misspecified the directory for the bowtie2 indices. Can you post the whole command that you used to run KneadData, as well as the contents of any directories referenced? That would help a lot in speeding up the debugging process.

Thanks,
Andy

Fan Li

Apr 13, 2016, 10:21:11 AM
to kneaddata users
@Lindsay, you probably meant to specify "bacterial_db", i.e. without the .bt2 extension.
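
A small sketch of that idea: bowtie2-build writes files such as bacterial_db.1.bt2 or bacterial_db.rev.1.bt2l, and the basename is what is left after stripping those suffixes. The helper name below is hypothetical, and whether kneaddata wants this basename or the containing directory should be checked against your version.

```python
import re

# Suffixes written by bowtie2-build for small (.bt2) and large (.bt2l) indexes:
# .1 to .4 for the forward index, .rev.1 and .rev.2 for the reverse index.
_INDEX_SUFFIX = re.compile(r"\.(?:[1-4]|rev\.[12])\.bt2l?$")


def index_basename(path):
    """Strip a bowtie2 index-file suffix, leaving the index basename."""
    return _INDEX_SUFFIX.sub("", path)
```

A path that already is a basename passes through unchanged, so the helper is safe to apply to whatever the user typed.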

Relatedly, we've noticed issues removing human contamination as well. Specifically, it seems like bowtie2 does a mediocre job of mapping repeat-derived (e.g. ERVs) read pairs in a concordant fashion. What ends up happening is that one read of the pair is mapped to a copy of the repeat element on chromosome 7, and the other member of the pair is mapped to the same element on a different chromosome. By definition, this is a discordant alignment, so the pair is not excluded from the --un-conc output and is kept as if it were non-human.

IMO, I'd like to remove any read pair for which either member aligns to the human reference, so perhaps manual filtering from the SAM output of bowtie2 may be necessary. Andy, if you think this would be a useful feature, I can put in a pull request (otherwise I'll just hack it in elsewhere).
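
A minimal sketch of that kind of manual filter, assuming the bowtie2 SAM output is available as text lines and that a mate counts as aligned when FLAG bit 0x4 (segment unmapped) is clear. The function name is hypothetical and this is not the code referred to in this post.

```python
def reads_to_keep(sam_lines):
    """Return names of read pairs in which no mate aligned to the reference.

    sam_lines: iterable of SAM text lines; header lines start with '@'.
    """
    order = []       # read names in first-seen order
    known = set()    # names already recorded in `order`
    aligned = set()  # names with at least one aligned mate
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if name not in known:
            known.add(name)
            order.append(name)
        if not flag & 0x4:  # bit 0x4 clear: this mate aligned
            aligned.add(name)
    return [name for name in order if name not in aligned]
```

A pair is dropped as soon as either mate aligns, whether or not the pair is concordant; the surviving names can then be used to subset the original FASTQ files.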

Best,
Fan

Andy Shi

Apr 19, 2016, 5:33:47 PM
to kneaddata users
Hi Fan,

Thanks so much for your suggestion! We're thinking about including it in KneadData.

For now, maybe you can try running KneadData with the --trf option? We specifically included this to attempt to remove reads with lots of repeats.

Thanks again for your suggestion!

Fan Li

May 11, 2016, 11:19:20 AM
to kneaddata users
Hi Andy,

In that case, I'll try to find some time to formalize the code a bit. Right now it's a mess of hacked together shell and perl. I also did try the --trf option but it didn't catch much more.

Best,
Fan

Drishti Kaul

Sep 5, 2018, 4:47:14 PM
to kneaddata users
Hi! 
I've experienced the same issues and I was wondering if mapping with bmtagger instead of bowtie would be better for host contamination identification?

Thanks,
Drishti