HDT generation update

Bolton, Evan (NIH/NLM/NCBI) [E]

Dec 17, 2015, 8:40:13 AM

to bio...@googlegroups.com

Hi,

As part of the analysis of HDT, I am still generating HDT files (only 10% complete after several days). The processing is too slow for a single-processor approach, especially for the PubChem neighboring files. Each of these files takes several minutes to process: rapper TDT.gz->NT conversion [60-120 seconds], rdf2hdt NT->HDT conversion [60-120 seconds], and hdtSearch index generation [~1-10 seconds]. Most of the files are neighboring files.

I will parallelize this (across many processors) for expediency.
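Because each file goes through the same three steps independently of every other file, the work can be spread across processors one file at a time. A minimal sketch of that idea, assuming a caller-supplied `convert_one` function (hypothetical, not part of any HDT tool) that runs the rapper, rdf2hdt, and index steps for a single file:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def convert_all(files, convert_one, workers=8):
    """Run convert_one(path) for every path, at most `workers` at a time.

    convert_one is a hypothetical per-file function supplied by the caller;
    in practice it would launch the external conversion tools as
    subprocesses, so threads are enough to keep several processors busy.

    Returns (results, failures): results maps path -> return value,
    failures maps path -> the exception raised for that file.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(convert_one, path): path for path in files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going if one file fails
                failures[path] = exc
    return results, failures
```

Collecting failures separately means one bad file does not stop the remaining conversions, and the failed files can be rerun afterwards.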

Again, if the intermediate TDT.gz->NT conversion step could be skipped, 1-2 minutes per file could be saved. With 21,360 files, that would be a considerable time savings.
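To put a rough number on that, here is a back-of-the-envelope calculation. It assumes every one of the 21,360 files saves the full 1-2 minutes and that files are processed one after another, so it is an upper bound on single-processor savings rather than a measurement:

```python
FILES = 21_360  # file count quoted above


def saved_hours(seconds_per_file, files=FILES):
    """Total time saved, in hours, if each file saves seconds_per_file."""
    return files * seconds_per_file / 3600


low, high = saved_hours(60), saved_hours(120)
print(f"{low:.0f}-{high:.0f} hours ({low / 24:.1f}-{high / 24:.1f} days)")
# prints "356-712 hours (14.8-29.7 days)"
```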

Best,

Evan

Evan Bolton, Ph.D.

National Center for Biotechnology Information

Bldg. 38A, Room 8S810

National Library of Medicine

National Institutes of Health

8600 Rockville Pike, Bethesda, MD 20894

Phone: 301-451-1811

Fax: 301-480-4559

Email: bol...@ncbi.nlm.nih.gov

Skype: evan_bolton
