HDT generation update

Bolton, Evan (NIH/NLM/NCBI) [E]

Dec 17, 2015, 8:40:13 AM
to bio...@googlegroups.com

Hi,

 

As part of the HDT analysis, I am still generating HDT files; the run is only 10% complete after several days.  Processing is too slow for a single-processor approach, especially for the PubChem neighboring files, which make up most of the set.  Each neighboring file takes several minutes to process: rapper TDT.gz->NT conversion [60-120 seconds], rdf2hdt NT->HDT conversion [60-120 seconds], and hdtSearch index generation [~1-10 seconds].

 

I will parallelize this (across many processors) for expediency.
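
For anyone who wants to try the same thing, here is a minimal sketch of how the per-file pipeline could be fanned out across processors. It is not the script actually used for this run. The command lines are assumptions to adjust for your installation: the input syntax (`turtle`), the base URI, and the output naming are placeholders, and the index generation step is left out. The function names (`build_commands`, `convert_all`) are hypothetical.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Assumptions, not taken from the post: input syntax and base URI for rapper.
INPUT_SYNTAX = "turtle"
BASE_URI = "http://example.org/base/"


def build_commands(src, out_dir):
    """Return the two shell commands (gz -> NT, NT -> HDT) for one input file."""
    name = src.name[:-3] if src.name.endswith(".gz") else src.name
    nt = out_dir / (name + ".nt")
    hdt = out_dir / (name + ".hdt")
    q = shlex.quote
    to_nt = (
        f"gzip -dc {q(str(src))} | "
        f"rapper -q -i {INPUT_SYNTAX} -o ntriples - {q(BASE_URI)} > {q(str(nt))}"
    )
    to_hdt = f"rdf2hdt -f ntriples {q(str(nt))} {q(str(hdt))}"
    return [to_nt, to_hdt]


def run_shell(cmd):
    """Run one shell command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, shell=True, check=True)


def convert_one(src, out_dir, run=run_shell):
    """Run the conversion steps for a single file, in order."""
    for cmd in build_commands(src, out_dir):
        run(cmd)
    return src


def convert_all(files, out_dir, workers=8, run=run_shell):
    """Convert many files concurrently; results come back in input order."""
    # Threads are enough here: the heavy work happens in child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: convert_one(f, out_dir, run), files))
```

A call such as `convert_all(sorted(Path("in").glob("*.gz")), Path("out"), workers=16)` would process 16 files at a time. The `run` parameter exists only so the commands can be inspected or dry-run without invoking the tools.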

 

Again, if the intermediate TDT.gz->NT conversion step could be skipped, 1-2 minutes per file would be saved.  With 21,360 files, that would be a considerable time savings.
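
To put a rough number on that, the arithmetic can be sketched as follows. It assumes the 60-120 second saving applies to every one of the 21,360 files, which is an upper bound since only most of the files are neighboring files. The function name `saved_hours` is just for illustration.

```python
FILES = 21_360                 # file count from the post
SAVED_PER_FILE_S = (60, 120)   # seconds saved per file, low and high estimate


def saved_hours(files, seconds_per_file):
    """Total processing time saved, in hours, on a single processor."""
    return files * seconds_per_file / 3600


low, high = (saved_hours(FILES, s) for s in SAVED_PER_FILE_S)
print(f"{low:.0f}-{high:.0f} CPU-hours "
      f"({low / 24:.1f}-{high / 24:.1f} days on one processor)")
# prints: 356-712 CPU-hours (14.8-29.7 days on one processor)
```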

 

 

Best,

Evan

 

--

 

Evan Bolton, Ph.D.

National Center for Biotechnology Information

Bldg. 38A, Room 8S810

National Library of Medicine

National Institutes of Health

8600 Rockville Pike, Bethesda, MD  20894

 

Phone:  301-451-1811

Fax:    301-480-4559

Email:  bol...@ncbi.nlm.nih.gov

Skype:  evan_bolton

 
