Hi -
Before I ask my questions below, I would like to give you a brief overview of what I am trying to achieve with popHealth 5.0.
I have a very large set of clinical documents from individual physicians and small clinics. The data set is a mix of physician notes, lab reports, progress notes, eye exams, etc. All of these documents are in image (TIFF / JPEG) format: a payer went to the physician's office, photographed the documents, and saved them as TIFF / JPEG images. Some are faxed paper documents that were then scanned as JPEG images. This means that there are no QRDA documents. We converted all of these image documents to text and saved the results as PDFs. There is a backlog of 20 million documents to be processed and analysed for eCQM measure compliance.
I am looking for help from this forum to understand the features and capabilities of popHealth 5.0, and the feasibility of using it in the scenario described above.
Questions:
- Can popHealth 5.0 apply Natural Language Processing (NLP) techniques to unstructured documents such as those described above?
- Can popHealth 5.0 distribute processing across multiple nodes? For example, I have 10 VMs dedicated to processing the 20 million documents, and each VM has an 8-core processor. Can popHealth distribute the measure-processing job across all 10 nodes to speed up processing?
- If the answer to the first question is 'No', can you suggest how to use popHealth in such a scenario? Is there a roadmap to include NLP features in future versions?
Looking forward to hearing from you.
Many thanks,
Sekhar H