Re: CouchDB/CouchApps Database Group

Gregory Kelleher

Nov 2, 2011, 12:16:33 AM

to DB-architecture

Roger,

Your reply is impressive enough for a morning's work ... but done without coffee! It will take me the rest of today to compose a suitable response, so I'll do two things to save labor and improve clarity: I'll intersperse my comments into your text, and I'll forward to the group topic some email from a few weeks ago, written at the start of the architecture discussion.

The point of including the earlier messages is to put the current discussion into suitable context because I agree with you that it isn't the whole picture. In my comments I'll ignore your criticism of my narrow view of the subject (and look forward to suggestions for improvement).

Finally, in the interest of time and coherence, I'll defer some answers until I can present them in the broader context --tonight if I have the stamina. I think some of the requirements we agree upon are not effectively satisfied by the EMR per se or the RDBMS representation, and hope we can briefly address those concerns.

-- Greg

On Nov 1, 2011, at 11:16 AM, "Friedman, Roger (CDC/CGH/DGHA) (CTR)" <rd...@cdc.gov> wrote:

This is sort of a heady topic to take up before my first cup of coffee, but here goes.

I think Greg is totally off base with his view of medical records.

>>> the context of my remarks was a suggestion of taking the existing paper
>>> record as the starting point for an EMR

He acts as if the doctor is the only client of the system and the only use of the medical record is to remind the doctor of what has come before. But that is far from true. The EMR is also a tracking system for service delivery, counting all the activities taking place in the facility; it is a quality control system, allowing the measurement of outcomes of patients with similar diagnoses and comparison with targets; it is an epidemiological system, tracking the outbreak and spread of disease; it is a patient protection system, flagging contraindications or allergies or medication conflicts in prescribing; it is a physician management system, identifying cases where the doctors did and did not adhere to treatment standards, and hopefully reminding them what the right thing to do is. Physicians have a lot of things to remember -- in one study, one group of physicians used an EMR system with alerts and the other used the same system but without alerts; after three months, there were large, significant differences in the administration of tests and prescribing of medications in the alerts group; then the alerts were taken away and the differences disappeared. So it is not lack of knowledge or lack of desire to implement best practices that prevents their implementation, it is the fact that there is so much medical knowledge that even the best physicians can't know it all, and the EMR should give them an assist.

>>> I agree with the points in the previous paragraph.

In the 40 year history of medical informatics, a lot of work has been done to achieve a somewhat uniform representation of health care activities. A lot of this work has focused on standardizing vocabulary, identifying process steps and disease stages, providing a structure for medical records both paper and electronic. This has allowed the comparison of medical records from different facilities and promoted interoperability between systems. Not that there hasn't been work done in natural language processing, just that standardization had to come first before the problem became at all tractable. At present, the interoperability of systems is limited -- it is possible to exchange data elements. A current goal of medical informatics is to allow machine understanding of the data elements at a higher level -- to correlate observations, to diagnose, to propose treatments, to identify malpractice automatically. But all of these efforts are limited to very small areas of practice, and attempts to extend them to practical utility have gone very slowly. Why is this? It is because of two characteristics of medical information. First, the corpus of medical information is growing and changing. The US National Library of Medicine has 11 people whose only job is to keep up with new medical terminology. Second, the level of detail needed by medical practitioners varies immensely. A general practitioner can identify a skin cancer; an oncologist can identify what type of tissue has become cancerous, the stages in its lifecycle that have been affected by the mutations, and its treatability by different modalities; today's cancer researchers and tomorrow's oncologists will know the particular mutations which have taken place and the vulnerabilities of each along with the consequences of targeting each on the rest of the body.

>>> So, given the meager results so far and the slow pace of development,
>>> wouldn't this be a good time to search for ways to serve the clinician while
>>> integrating the results (implementing decision support) without requiring
>>> the same system be suitable for development and evaluation of rules?

Greg seems to view the medical record as a collection of forms.

>>> more accurate to say that I see current data collection
>>> as being accomplished through predefined forms

But actually, the forms are only an organization tool for the underlying observations. This becomes particularly clear when you look at more specialized visits. Consider, for example, an OPD visit by an adult whose main complaint is diarrhea; an antenatal care visit; an HIV care visit; and an antenatal care visit by an HIV-positive mother. All four will have their vital signs taken; any new patient will have a history taken; all four will undergo a physical exam but to a different extent; each will be asked screening questions relevant to their condition(s) and answers which could indicate abnormal conditions are followed up with testing; the last three will undergo progress checks to assure that their conditions are following the normal course. If the four visits were reduced to forms, there would be a lot of overlapping questions; the forms are only there to assure that the right questions are asked. And the questions may have multiple parallel answers (WARM, RED) or structured answers (SEVERE allergy to PENICILLIN) or semi-structured answers (MODERATE allergy to OTHER: Peanuts) or answers that contain various dimensions of detail (WOUND [LACERATION], [EXTREMITY], [HAND, LEFT, PALMAR]). In this regard, it is important to note that medicine makes a lot of use of negative inference -- not just the consequences of a negative test, but the fact that a particular test or procedure was not done is a sign that it was contraindicated or that the patient's presentation did not support it.
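
To make the answer shapes above concrete, here is a minimal sketch in Python. It is only an illustration: the class and field names (Answer, Observation, qualifier, details) are hypothetical and are not taken from OpenMRS or from any existing form definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Answer:
    """One coded answer to a question. All field names are hypothetical."""
    code: str                           # coded value, e.g. "WARM"
    qualifier: Optional[str] = None     # e.g. a severity such as "SEVERE"
    free_text: Optional[str] = None     # filled in when the code is "OTHER"
    details: List[str] = field(default_factory=list)  # ordered dimensions of detail


@dataclass
class Observation:
    question: str
    answers: List[Answer]               # several parallel answers are allowed


observations = [
    # multiple parallel answers
    Observation("SKIN", [Answer("WARM"), Answer("RED")]),
    # structured answer
    Observation("ALLERGY", [Answer("PENICILLIN", qualifier="SEVERE")]),
    # semi-structured answer
    Observation("ALLERGY", [Answer("OTHER", qualifier="MODERATE", free_text="Peanuts")]),
    # answer with several dimensions of detail
    Observation("WOUND", [Answer("LACERATION", details=["EXTREMITY", "HAND", "LEFT", "PALMAR"])]),
]


def was_observed(obs_list, question):
    """Negative inference needs to know whether a question was recorded at all."""
    return any(o.question == question for o in obs_list)
```

The point of was_observed is the negative-inference case: the absence of an observation carries information of its own, so a representation has to let you ask "was this recorded at all?" separately from "what was the answer?".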

>>> not sure there's an issue here

Manor asks what business logic OpenMRS implements. This is not quite what I meant by my remarks. OpenMRS parses the subject matter in standard ways. For example, the patient, patient address, patient relations, provider identification, encounter, observation and order objects correspond to message segments one finds in HL7. Many of the objects (including problem lists, allergy lists and programs) come from previous work on structuring medical records. This provides a fairly high level of confidence that there are not areas of medical practice which cannot be represented in the data model. The only "business logic" that is implemented directly by OpenMRS is the cardinality relationships between the objects. The more businessy logic (e.g. a person is considered cured of tuberculosis if s/he has two negative tests at least a month apart without an intervening positive test) is found in the modules (e.g. tuberculosis module). To take advantage of this logic under the current design, these modules would require rework to expose web services beyond the basic CRUD model of OpenMRS. Logic rules and cohort definitions are another source of business rules which will probably also require the creation of additional web service functionality to access.
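
As an illustration of the kind of rule meant here, the tuberculosis example can be sketched in a few lines of Python. This is not the code of the OpenMRS tuberculosis module; the function name, the "NEG"/"POS" result codes and the reading of "a month" as 30 days are all assumptions made for the sketch.

```python
from datetime import date, timedelta

MIN_GAP = timedelta(days=30)  # assumption: "a month" is taken as 30 days


def is_cured(tests):
    """tests: iterable of (date, result) pairs, where result is "NEG" or "POS".

    Returns True if there are two negative tests at least MIN_GAP apart
    with no positive test dated between them.
    """
    ordered = sorted(tests, key=lambda t: t[0])
    first_neg = None  # earliest negative since the most recent positive
    for when, result in ordered:
        if result == "POS":
            first_neg = None            # a positive resets the run of negatives
        elif first_neg is None:
            first_neg = when            # start of a new run of negatives
        elif when - first_neg >= MIN_GAP:
            return True
    return False


history = [
    (date(2011, 1, 1), "NEG"),
    (date(2011, 1, 15), "POS"),
    (date(2011, 2, 5), "NEG"),
]
print(is_cured(history))
```

Even a rule this small depends on ordered, typed observations per patient, which is why it lives in a module rather than in the generic CRUD layer.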

Greg seems to think that the OpenMRS data model is too complex. But given the characteristics of medical data, the KAV model of OpenMRS is almost obligatory. Maintenance of indexes indeed requires overhead, but given the nature of a typical search (an observation of a given concept, perhaps connected to an encounter of a particular type, perhaps related to a particular patient or set of patients), the searches can be done on the indexes alone. Contrast that to Couch, with the document corresponding to an encounter. The JSON of the document has to be parsed and the nodes representing observations have to be scanned for occurrences of the concept in question. This just has to be substantially slower.

>>> I am sure that ad hoc queries requiring new indices would be slow. But
>>> that should not be a common need in the clinical system. The incrementally
>>> maintained indexes of CouchDB are designed for extremely efficient
>>> updates and retrievals but are not fitting for the analytical load at which
>>> OpenMRS excels. That's why the 2 systems were treated as filling
>>> different roles in the earlier discussions.
>>>
>>> It should be noted that there are other ways to search JSON stores.
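
The two positions in this exchange (scanning every encounter document versus answering from an index that is updated as documents are saved) can be contrasted with a small Python sketch. Plain dictionaries stand in for the document store here; this is not the CouchDB API, and the document field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the document store and one index.
encounters = {}
concept_index = defaultdict(set)   # concept -> ids of encounters that mention it


def save_encounter(doc):
    """Store a document and update the index for that one document only."""
    old = encounters.get(doc["_id"])
    if old is not None:
        # Drop the index entries contributed by the previous revision.
        for obs in old["observations"]:
            concept_index[obs["concept"]].discard(doc["_id"])
    encounters[doc["_id"]] = doc
    for obs in doc["observations"]:
        concept_index[obs["concept"]].add(doc["_id"])


def find_by_scan(concept):
    """Walk every document and every observation node looking for the concept."""
    return {
        doc_id
        for doc_id, doc in encounters.items()
        if any(obs["concept"] == concept for obs in doc["observations"])
    }


def find_by_index(concept):
    """Answer from the maintained index without reading any document."""
    return set(concept_index.get(concept, set()))


save_encounter({"_id": "enc-1", "observations": [{"concept": "WEIGHT", "value": 61}]})
save_encounter({"_id": "enc-2", "observations": [
    {"concept": "WEIGHT", "value": 70},
    {"concept": "TEMPERATURE", "value": 37.2},
]})
```

Both lookups return the same encounters; the difference is where the work happens. The scan pays at query time for every document, while the index pays a small cost at each save, which only helps for queries that were anticipated when the index was defined.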

I have to take issue with the "pros and cons."
· As for the "long development learning curve," as one who has been making the transition from .NET to OpenMRS's J2EE, I agree that there are complexities. However, it seems that most developers will be working at the UI level and accessing the OpenMRS DB via REST calls. This will require recreation of validation rules and input widgets but not contact with OpenMRS.
o Where there is a need to create new modules (e.g. lab) or add services to existing modules (e.g. TB), there will be a need for developers who understand the way modules interface, the use of concept maps and global properties to provide parameters to modules, etc.
o Depending on whether the maintenance of metadata is done inside or outside OpenMRS, there may be a need to develop additional OpenMRS administration screens.
o I am concerned that the HTML5/REST interaction could be slow because every object will have to be completely retrieved before it can be sent out over the REST interface. Consideration should be given to having a module exposing additional RESTful web services that can take advantage of Hibernate lazy loading by interacting at the API level.
· As for "questionable technology investments," I think you need to focus on what's there that you need rather than that which you don't. The entire scope of OpenMRS is available without knowing Groovy. It's just that some people like Groovy, and so we see some features designed to use it -- but not to the exclusion of other means. Similarly, OpenMRS supports Arden Syntax for writing logic rules. This query language is not widely used so far as I know, but someone in the funding stream was interested in it. I think in any open project you are going to find features that interest only a subset of users/developers but nonetheless get implemented.
· As for UI obsolescence, I think JSPs, ajax and jquery will be around for a long time; I fail to see how it's any barrier to development. But in any event it's irrelevant since you don't intend to use that part of OpenMRS anyway.
· As for backend scalability issues, I don't see any volume estimates for JSS. Nor do I think that adoption at other facilities need cause scalability issues; at a certain point, one sets up a new instance and communicates via HL7 where necessary; all this capability is in work. I think there is a lot of confusion of MySQL issues with OpenMRS issues. OpenMRS will run on Postgres and SQL Server and I believe Oracle, all of which allow clustering (and Postgres at least is faster at indexing).
· As for not being designed as a point of care application, that just applies to the default webapp. A number of places are using OpenMRS for direct entry, and a number of tools have been created to allow direct entry and bypass the default webapp interface.
· As for the document approach being better for concept/observation data, I think that's wrong as described in the previous full paragraph.
· As for CouchDB views, they are like rebuilding an index every time you want to do a query and show the weakness of the document model.

>>> I think the above is wrong as described in the previous full comment.

· I don't see how anyone can say that CouchDB is more scalable under the constraint of a need for a singular master view representing the clinic. The improvement in scalability comes from replication under the assumption that no node needs all the data at the same time.

>>> I think this is seriously misleading. Or perhaps you are suggesting
>>> we start a thread on the CAP Theorem?

I find it astonishing to hear people say they don't like OpenMRS because it uses an HL7 input queue for some types of form entry so that the DB is not instantly up to date, while saying that replication with daily synchronization would be just fine using Couch.

>>> The answer to this needs to be deferred for proper context. Here, I'll say:
>>> 1) I object to imposing additional processing and single points of failure
>>> without the promise of substantial (or any) benefit.
>>> 2) I don't think we've discussed the frequency of updates; might be continuous.
>>> 3) Not all data would be communicated via EMR updates (stat labs).

· Those who advocate Couch have an obligation to show a working application of similar complexity to that of the JSS EMR. In particular, it is important to see how fast Couch is in a metadata-driven application.

>>> Please explain this requirement.

Also, we need internationalization and a security model at least as good as that of OpenMRS (and there have been a lot of calls for even more granularity within the security model).

>>> I agree.

I agree that these approaches are so different that trying to combine them would not allow either to work to full advantage; they need to be considered as separate tracks. I think we can get a working, reliable EMR for JSS faster with OpenMRS than Couch.

Saludos, Roger

-- Greg

Bapa Rao

Nov 3, 2011, 2:11:31 PM

to Raxa JSS EMR Database

All, I signed on to this mailing list yesterday at Daniel's invitation
and tried to take in the issues being discussed. I just saw Paul's
note as well. Below are my thoughts of the moment:

Just by way of reference, here is a link to a SIGMOD Record paper I
dug up for my own edification, by Rick Cattell on the general topic of
NoSQL--Relational DBMS tradeoffs:

www.sigmod.org/publications/sigmod.../pdfs/04.surveys.cattell.pdf

The one-line summary of Cattell's quite helpful paper would be, "it
all depends." Surprise indeed. We know that there are always
tradeoffs. My philosophy for this project would be: 1. take care of
the essentials (show-stoppers), and compromise drastically on the
desirables, so as to get the product into the field as fast as
possible. 2. Be prepared to do a more comprehensive v2 fairly soon
after release of v1, as soon as field performance data starts to
arrive.

Both CouchDB and MySQL will do the job, in a technical sense. The main
tradeoff that I see is in the amount of work needed to implement the
data model in CouchDB versus costly or degraded data model
extensibility in MySQL. To me, other things, like potential loss
of data due to hanging transactions, or slow response time, are less
relevant to the MySQL vs CouchDB debate: most hanging transactions in
our environment are probably due to flaky network connections, to
which CouchDB is also not immune, and I would worry about slow
response times for v1 when we actually get complaints from the field
(here again, network connectivity probably dominates architectural
choices as a cause of slowness).

Coming to extensibility, I am not clear on how crucial or how
frequently occurring a requirement this is, from a business
perspective. I took a quick look at the OpenMRS data model and it
looks like it has the usual built-in hacks for schema extensibility;
do we need more than that? How often does the need arise in the field?
Can we live with a clunky relational DB schema extension process? I take
it for granted that we are professional enough and clever enough to
minimize or eliminate downtime and regression-test the code so that
nothing breaks due to schema changes. So then, should we put in
CouchDB, in effect tossing out schemas and the benefits of structure,
joins and so on? (Keep in mind that we are still going to have to code
in what amounts to schema and structure, and implement joins (using
mapreduce or whatever) through the backdoor anyway, with CouchDB.) The
answer here has to come from the business end: is clunky extensibility
a show stopper?
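
To show what "joins through the backdoor" can look like in a document store, here is a small Python sketch of one common approach: a map step emits composite keys so that, once the rows are sorted, each patient sits directly before its own encounters. Python stands in for the view engine here, and the document shapes ("type", "patient_id" and so on) are made up for the example.

```python
def map_doc(doc):
    """Emit (key, value) rows; composite keys keep a patient next to its encounters."""
    if doc["type"] == "patient":
        yield (doc["_id"], 0), doc["name"]
    elif doc["type"] == "encounter":
        yield (doc["patient_id"], 1), doc["date"]


docs = [
    {"_id": "e-1", "type": "encounter", "patient_id": "p-1", "date": "2011-10-01"},
    {"_id": "p-1", "type": "patient", "name": "Patient One"},
    {"_id": "p-2", "type": "patient", "name": "Patient Two"},
    {"_id": "e-2", "type": "encounter", "patient_id": "p-2", "date": "2011-10-03"},
    {"_id": "e-3", "type": "encounter", "patient_id": "p-1", "date": "2011-11-02"},
]

# Sorting the emitted rows by key plays the role of the sorted view index.
rows = sorted(row for doc in docs for row in map_doc(doc))

# Walking the sorted rows reassembles what a relational join would return.
joined = {}
for (patient_id, kind), value in rows:
    entry = joined.setdefault(patient_id, {"name": None, "encounters": []})
    if kind == 0:
        entry["name"] = value
    else:
        entry["encounters"].append(value)
```

The relationship between patient and encounter is still there; it has moved out of the schema and into the map function and the code that reads the rows, which is the hidden cost being pointed at.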

I am guessing that the biggest show stoppers from a data perspective
would be inconsistent or erroneous data, and loss of data, and
everything else, like speed, high availability and the like is
desirable to different degrees but not essential. What that means is
that we need to code to eliminate or maximally mitigate data errors,
inconsistency and loss of data.

These things considered, my vote is to keep to MySQL.

regards,

Bapa Rao

Saptarshi Purkayastha

Nov 3, 2011, 3:28:44 PM

to raxa-jss-e...@googlegroups.com

Hello all,

I have been sitting on the sidelines of this discussion because I don't really have any preference about what we use, so I wanted to hear from the people with strong preferences first. I feel both sides, "document-based" and "relational", have been heard out, and people might have read about and tried both by now... If you haven't, I think your views might be a little skewed towards emotions and feelings, or towards what you have read about others' feelings. Having tried both over the last month while doing EMR-tilted work, there are some things I'd like to note in this email.

If you look at the existing world of EMRs, they are both "document-based" and "relational". Please don't mistake "document-based" for NoSQL (or rather non-relational); I mean CCD, CCR or HL7 CDA, all of which have standard representations as XML documents. These are fairly well-formed standards, but not really implemented systems, and that I feel is the challenge. Roger has raised points about EMRs not being represented as forms, but these are standards that proclaim documents to be the basic unit of an EMR.

Some things I've come to love about CouchDB/CouchApps required for our context:

1.) Easy to learn and develop two-tier apps compared to OpenMRS, which uses Hibernate, Java, JSP and JavaScript

2.) Replication comes out-of-the-box and at a low-cost of development compared to OpenMRS sync module

3.) Hardware can be cheap or extremely light centrally and can be distributed across settings

4.) Uses JSON out-of-the-box and hence protocol is implementable across various platforms and lots of library choices

5.) A document can be passed onto a device and then worked on locally from the clinician's device as a webapp

Some things I've come to love about OpenMRS required for our context:

1.) Medical dictionaries are nice and mappings available by the community

2.) Modules allow extensibility to the core and new features can be easily shared

3.) Code is well written, code-conventions are nice and there is enough documentation about design patterns on these Java frameworks

4.) Useful logic module that allows writing rules and playing with the stored data

Some things I've not liked about CouchDB/CouchApps required for our context:

1.) No design patterns or examples of complex apps to be able to write maintainable code

2.) No clear i18n strategy

Some things I've not liked about OpenMRS in our context:

1.) Bigger learning curve to write modules and code

2.) The existing user interface (including available modules) doesn't provide a Point-of-Care system that doctors like to use (if that actually is possible??)

3.) Large Obs table is hard to maintain and needs lots of resources

4.) Not many logic rules are shared or distributed as part of the app, which makes the API basically a CRUD system over Hibernate. But there are business rules/validation that are covered in API methods

I still think the "middle path" is achievable, because we can create documents from the JSON representation of OpenMRS objects, as produced by the webservices.rest module, and persist them to CouchDB... These smaller JSON documents can then be assembled and stored as a CCR document on CouchDB, which will be the unit of exchange between instances. Our UI could then talk to the OpenMRS Webservices.REST API and, once we have design patterns ready, to their implementation on CouchDB... As time progresses, we can build better search/cohort/reporting tools inside CouchDB or go to the UnQL world, whenever that is ready. Till then, let's make our UI work on OpenMRS.
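
A rough sketch of the assembly step, in Python, might look like the following. The field names ("uuid", "patient", "date") are hypothetical and are not the actual webservices.rest representation, and the result is a plain JSON document rather than a valid CCR; the only CouchDB-specific detail is the "_id" key that identifies a document.

```python
import json


def assemble_record(patient, encounters):
    """Combine small per-object JSON documents into one exchange document.

    Field names are hypothetical; the output is not a valid CCR.
    """
    own = [e for e in encounters if e["patient"] == patient["uuid"]]
    return {
        "_id": "record-" + patient["uuid"],   # CouchDB documents are keyed by _id
        "type": "patient_record",
        "patient": patient,
        "encounters": sorted(own, key=lambda e: e["date"]),
    }


patient = {"uuid": "p-1", "name": "Test Patient"}
encounters = [
    {"uuid": "e-2", "patient": "p-1", "date": "2011-11-02", "observations": []},
    {"uuid": "e-1", "patient": "p-1", "date": "2011-10-01", "observations": []},
    {"uuid": "e-9", "patient": "p-2", "date": "2011-10-05", "observations": []},
]

record = assemble_record(patient, encounters)
print(json.dumps(record, indent=2))
```

The assembled document is self-contained, which is what makes it usable as a unit of exchange, but it also duplicates data held in OpenMRS, so the two copies would need a rule for which one wins on conflict.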

The "middle path" has something that works and I'd really like to see something work... It's already been 3 months of talk and I find it frustrating that we don't have any working system.

---
Regards,
Saptarshi PURKAYASTHA

My Tech Blog: http://sunnytalkstech.blogspot.com
You Live by CHOICE, Not by CHANCE

Saptarshi Purkayastha

Nov 3, 2011, 3:38:17 PM

to raxa-jss-e...@googlegroups.com

Ah, I've done the abbreviation crime again... Apologies for that. I meant the following in my previous email.

CCD - http://en.wikipedia.org/wiki/Continuity_of_Care_Document

CCR - http://en.wikipedia.org/wiki/Continuity_of_Care_Record

Some sample CCD and CCR files: http://www.macadamian.com/insight/healthcare_detail/sample_ccr_and_ccd_xml_files/

---
Regards,
Saptarshi PURKAYASTHA

