Details on KneadData's human genome reference

rachael...@gmail.com

Jul 25, 2017, 10:09:37 PM
to kneaddata users
Hello,

I am using KneadData to trim reads and remove human genomic sequences. The large server I am working on already had bowtie2 index files for the human genome (GRCh38), so I was using these as my reference. However, bowtie2 failed to remove a large number of human sequences when mapping to this reference, particularly reads from human mitochondrial DNA.

Uncertain whether something was wrong with these files, I tried the KneadData version of the genome:

kneaddata_database --download human_genome bowtie2 $DIR


This was far more successful at removing human reads, but I have not been able to find any information on how this reference was generated for KneadData.

Which release of the human genome is this?
What version of bowtie2 was used to index the genome?
Was the file reduced in some way to prevent reads aligning to several regions and subsequently failing to map and be removed?
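One way to check whether a pre-built bowtie2 index covers the mitochondrial genome at all is to list the reference sequence names it contains with `bowtie2-inspect -n <index_basename>`. The helper below, `has_mito_contig`, is hypothetical: it reads those names on stdin and assumes the mitochondrial contig is named `chrM` (UCSC style), `MT` (Ensembl/GRC style) or `NC_012920` (RefSeq accession), so the pattern may need adjusting for other references.

```shell
# Hypothetical helper: reads reference sequence names (one per line) on stdin
# and reports whether any of them looks like a human mitochondrial contig.
# Assumed naming conventions: "chrM" (UCSC), "MT" (Ensembl/GRC) or the
# RefSeq accession NC_012920, optionally with a version suffix such as ".1".
has_mito_contig() {
  awk '
    # Compare only the first whitespace-delimited token of each name line.
    $1 ~ /^(chrM|MT|NC_012920)(\.[0-9]+)?$/ { found = 1 }
    END {
      if (found) { print "mitochondrial contig found"; exit 0 }
      print "no mitochondrial contig found"
      exit 1
    }'
}
```

For example, `bowtie2-inspect -n /path/to/GRCh38 | has_mito_contig`, where `/path/to/GRCh38` is a placeholder for the index basename (the file name without the `.1.bt2` suffix). A missing mitochondrial contig would be one possible explanation for unremoved mitochondrial reads, though not the only one.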

Regards,
Rachael

Brian Glenn St Hilaire

Apr 24, 2018, 2:10:26 PM
to kneaddata users
Hello, 

I also have this issue and would like it to be addressed, even if only with a response.

-Brian 

Rachael Lappan

Apr 25, 2018, 5:09:54 AM
to kneadda...@googlegroups.com

Hi Brian,

I later emailed Lauren McIver directly, and she told me:

Hi Rachael, the database is built from the human genome reference GRCh37/hg19. I am not sure of the exact bowtie2 version used to index the database (likely v2.2). However, since it is not a large database, you should be able to run it with most recent bowtie2 versions (bowtie2 >=2.2 is not necessarily required; any version 2+ should work). It does not look like the reference was modified in any way prior to building the database. The sequences are exactly the same as those in the reference.

Apologies for not relaying this to the user group; I hope it helps. I ended up building a custom database myself from the latest human genome file, which I did not reduce or modify before indexing.
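A minimal sketch of building such a custom database, assuming `bowtie2-build` from a bowtie2 installation is on the PATH. `bowtie2-build` takes a reference FASTA and an output basename and writes the index files; the helper name `build_human_db`, its arguments and the default basename `human_genome` are placeholders for illustration, not part of KneadData.

```shell
# Hypothetical helper: index a human reference FASTA with bowtie2-build so the
# result can be used as a custom KneadData reference database.
#   $1  reference FASTA (placeholder, e.g. a GRCh38 genome FASTA)
#   $2  output directory for the index files
#   $3  index basename (optional, defaults to "human_genome")
build_human_db() {
  local fasta="$1" outdir="$2" name="${3:-human_genome}"

  # Fail early if the FASTA is unreadable or bowtie2-build is not installed.
  if [ ! -r "$fasta" ]; then
    echo "cannot read FASTA: $fasta" >&2
    return 1
  fi
  if ! command -v bowtie2-build >/dev/null 2>&1; then
    echo "bowtie2-build not found on PATH" >&2
    return 1
  fi

  mkdir -p "$outdir" || return 1
  # bowtie2-build <reference_in> <bt2_base>: index files are written
  # next to the basename "$outdir/$name".
  bowtie2-build "$fasta" "$outdir/$name"
}
```

For example, `build_human_db GRCh38.fa kneaddata_db` (with `GRCh38.fa` standing in for whichever genome FASTA was downloaded) writes the `kneaddata_db/human_genome.*` index files, and that index location is then given to KneadData through its `--reference-db` option.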

Cheers,
Rachael