Fwd: An EMR system using a document oriented database


Manor Lev-tov

Nov 1, 2011, 9:41:15 PM
to raxa-jss-e...@googlegroups.com


---------- Forwarded message ----------
From: Paul Biondich <pbio...@regenstrief.org>
Date: Tue, Nov 1, 2011 at 7:02 PM
Subject: Re: An EMR system using a document oriented database
To: Manor Lev-tov <manor...@gmail.com>
Cc: Burke Mamlin <bma...@regenstrief.org>


Hi Manor,

Thanks for the note.

Some have tried to develop document-based data stores for patient
record systems.  The one I know of the most is called Tolven.

There are obvious downsides and potential benefits to this design
style.  The most obvious downside is that document-based data stores
make it significantly more difficult to use atomic data... that is,
unless you're developing against a "one-document-per-patient" model.
If you're doing the typical design, with a document-per-clinical
encounter, then to summarize the data for a given patient requires a
lot of parsing and computation vs a granular relational model like
OpenMRS.  It also makes decision-support applications (which also
work against discrete data) a lot more difficult to pull off in a
speedy way.

On the flip side, we're seeing a lot of good results from SOLR and
similar tools which work against a document-based paradigm.  It's not
clear to me how that kind of technology interfaces with a CouchDB-like
database, but there may be some untapped power in that blend
of tools.

Why are you making this up front technology choice?  Is there some
sort of specific problem you are looking to solve with a
document-based data store?

Best,
-Paul

On Tue, Nov 1, 2011 at 5:20 AM, Manor Lev-tov <manor...@gmail.com> wrote:
> Hello Paul,
> I found your email on the OpenMRS site.  I am looking into implementing an
> EMR system using a document-oriented database (CouchDB, MongoDB, etc.) as a
> backend.  Given your experience, I was wondering if you had any thoughts on
> that, any potential pitfalls, and how much business logic is built into an
> EMR system such as OpenMRS?
> Thanks in advance for your time,
> Manor Lev-Tov
>

Daniel Pepper

Nov 1, 2011, 11:03:51 PM
to raxa-jss-e...@googlegroups.com

Gregory Kelleher

Nov 2, 2011, 9:30:52 AM
to raxa-jss-e...@googlegroups.com
In reference to Paul's note:

Lucene/SOLR has been used for full-text indexing of CouchDB databases.  However, CouchDB is schema-optional rather than schema-less: you choose what data to atomize and whether to require it or validate it.  You also choose whether or not to include a field in one or more indexes and whether those indexes will maintain related summary data.

Of course it would be foolish to develop an EMR without any schema-controlled fields.  For example, medical record numbers, if entered by hand, should be required on some documents and should pass check-digit validation everywhere.  Every change to the database must pass all validation functions.  Following best practices for web applications, the same validations are applied on the client side.
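To make that concrete, here is a hypothetical sketch of a CouchDB validation function of the kind described. The document type, the "mrn" field name, and the Luhn check-digit scheme are all illustrative assumptions, not anything prescribed in this thread:

```javascript
// Hypothetical validate_doc_update sketch. CouchDB runs a function like this
// on every write; throwing {forbidden: ...} rejects the change. Document
// type, field name ("mrn"), and the check-digit scheme are assumptions.
function validate_doc_update(newDoc, oldDoc, userCtx) {
  if (newDoc.type === "registration") {
    if (!newDoc.mrn) {
      throw { forbidden: "registration documents require an mrn" };
    }
    if (!luhnValid(newDoc.mrn)) {
      throw { forbidden: "mrn fails check-digit validation" };
    }
  }
}

// Standard Luhn check: double every second digit from the right, subtract 9
// from doubles over 9, and require the sum to be divisible by 10.
function luhnValid(mrn) {
  var digits = String(mrn).split("").reverse().map(Number);
  var sum = 0;
  for (var i = 0; i < digits.length; i++) {
    var d = digits[i];
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return digits.length > 0 && sum % 10 === 0;
}
```

Being plain JavaScript, the same check-digit function could presumably be reused on the client side, which is one way the "same validations on the client" practice gets achieved.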

On the other hand, it can take months to create or change forms for OpenMRS because every field is a schema-controlled concept.  Using CouchDB, fields could be added to forms and reports by authorized users without changing the schema, and schema control could be added as needed.

-- Greg

--
You received this message because you are subscribed to the Google Groups "Raxa JSS EMR Database" group.
To post to this group, send email to raxa-jss-e...@googlegroups.com.
To unsubscribe from this group, send email to raxa-jss-emr-dat...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/raxa-jss-emr-database?hl=en.

Sreekanth Vadagiri

Nov 2, 2011, 11:06:12 AM
to Raxa JSS EMR Database
Hi,
Just adding a few more comments on Paul's note.

CouchDB-Lucene is quite popular in the CouchDB community, and its
author is a core committer to CouchDB. This is one of the powerful aspects
of storing data as text (JSON): indexing is almost trivial, and not the
expensive operation it can be when the data is stored in a relational database.

https://github.com/rnewson/couchdb-lucene

or http://wiki.apache.org/couchdb/Full_text_search

I did not quite understand the speedy-retrieval aspect of the
conversation. What CouchDB is really good at is reading data from the
database, especially because the views are computed incrementally: if
no new documents have been added, the view will already have been
precomputed. (Writes, on the other hand, are comparatively slower.)

We have found in practice that new fields in documents are needed for
new features and rarely affect other parts. So having a set of core
fields that are always there and then bolting on new fields is a good
way to go.
In my experience I have had to run migrations (or the equivalent)
only two or three times on the whole database.
But among the best practices for CouchDB views is to assume that
a field may not be present and to handle all such situations.
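That defensive-view pattern can be sketched in a few lines. The document shape and field names are invented, and the local emit() stub stands in for the one CouchDB supplies, so the map function can be exercised outside the database:

```javascript
// Sketch of a defensive map function: assume fields may be missing and
// guard every access. Document shape and the emit() stub are assumptions.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Index observations by concept, silently skipping documents that predate
// the field's introduction.
function mapObsByConcept(doc) {
  if (doc.type !== "observation") return;  // other document types
  if (!doc.concept) return;                // older docs may lack this field
  emit(doc.concept, doc.value !== undefined ? doc.value : null);
}

// Exercise it against a mix of old and new documents.
[
  { type: "observation", concept: "weight", value: 64 },
  { type: "observation" },                 // old doc, no concept field
  { type: "patient", name: "..." }
].forEach(mapObsByConcept);
```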

Cheers
gabbar.


Gregory Kelleher

Nov 2, 2011, 11:43:27 AM
to raxa-jss-e...@googlegroups.com
Progress report on a master's project in CS on CouchDB for EMR. The slide show is interesting.

http://portal.sliderocket.com/AUPYX/Luscinia-A-CouchDB-powered-EMR-for-Android

-- Greg

Surajit Nundy

Nov 2, 2011, 12:23:16 PM
to raxa-jss-e...@googlegroups.com
Dear all:

This is an enjoyable (for a geek like me!) and expert discussion; thank you all. A little history: this line of inquiry was initiated when Greg mentioned the following advantages of a CouchDB implementation for our Raxa JSS EMR project:

1. fault-tolerance
2. offline device data storage and syncing
3. easy rollout of new software updates
4. horizontal scalability
5. fitness to the task of the changing nature of patient records
6. HTML/JS/CSS in Couchapps are easy to learn because they are the language of the Web
...

Since there is a big cost for us in adapting OpenMRS to a NoSQL database, it would be great to know what those costs are and whether these characteristics are achievable in the current implementation of OpenMRS.

Again, thank you all for your participation.

Shuro

Roger

Nov 2, 2011, 4:37:45 PM
to Raxa JSS EMR Database
Not to keep beating on Greg, but it's simply not true that "it can
take months to create or change" a form. The EAV data model does not
require schema changes to add fields, and adding a concept to an
existing form requires only (1) updating the list of form fields to
include the concept (done by dragging and dropping the concept name onto a
tree control) and (2) making an appropriate change to the form
definition to add the appropriate widget and validation conditions
(which is done with a WYSIWYG editor in 2 of the 3 available form
generators).

As for the ability to add a concept, any user can be given that
privilege in OpenMRS. Initially, the approach was to add concepts as
necessary to represent existing forms. However, lately there has been
more emphasis on mapping form questions and answers to standard
vocabularies. The main reasons for this were (1) many local forms
were poorly designed, without consideration of data needs and use and
with too much emphasis on fitting the form on the page; and (2) by
using concepts which were identical (or at least mappable) to standard
vocabularies, it becomes possible to share among installations modules
that contain much more patient care knowledge, such as those for
tuberculosis, HIV, antenatal care, etc.

Gregory Kelleher

Nov 2, 2011, 5:00:58 PM
to raxa-jss-e...@googlegroups.com
At AMPATH it does take months, hence it can.

-- Greg

Roger

Nov 2, 2011, 6:34:16 PM
to Raxa JSS EMR Database
Gabbar --
The reason views are slower than indexes is that each document
will have to be unpacked from JSON into a DOM in order to do the emits
for each indexed field, while in a SQL database the data arrives in
rows which can be parsed using fixed offsets. By using metadata, most
indexes in OpenMRS involve longs, which are easy to handle; neither
the string representation of the long nor the underlying natural key
can be processed as quickly.
While the documents of interest might be encounters (because they
encapsulate by whom, when, where, for whom, in what process), the data
elements of interest are the observations; the number of observations
per encounter varies widely from implementation to implementation,
ranging from around 6 to around 100. As stated in another post of
mine, the typical query will be for observations of particular concept
on a single patient or set of patients. In most cases, the query will
also require data from patient attributes, which are in a second kind
of document. So suppose we have 100 patient encounters per hour
(probably quite low for JSS); that's 600-10,000 new observations per
hour that we have to index. According to a seemingly reliable source
[0], "a long-running view query can run concurrently (without
interference) with the ongoing update of the DB. However, the query
need to wait for the completion of the view index update before seeing
the consistent result." Because view updating takes place in a lazy
fashion, it means the entire document database must be read to find
changed documents before it can delete the no-longer-valid entries and
add the new ones. And this has to take place before the query can
complete. Not only that, we have to fetch and unpack each selected
encounter document and the corresponding patient document to compute
any cross-document WHERE conditions. While the SQL database might end
up with more correlated subqueries than CouchDB, many of them can be
executed using only indexes. The non-lazy updating of indexes can
reduce the length of time the index is locked.
Saludos, Roger

[0] http://horicky.blogspot.com/2008/10/couchdb-implementation.html

Gregory Kelleher

Nov 2, 2011, 8:53:53 PM
to raxa-jss-e...@googlegroups.com
Roger,

Your sketch of view generation and index maintenance reads like a verbatim translation of CouchDB features into RDBMS. CouchDB is fast to write and modify documents and very fast at retrieving documents and stats, but slow at re-indexing (aka compiling views). Some of its speed comes from the feature that protects its integrity: updates are append-only, so there's no locking.

Each document has a unique key and is stored under that key in a shallow B-tree structure. The index values of pre-defined views are computed on the contents of the document when it is stored and these views are also stored in a B-tree for fast access.

Now the real trick: index updates are incremental; only the new document needs to be parsed (plus the replaced doc, if any). Queries use the views, hence their speed. The slow process of regenerating views is required only when the view itself is redefined. It is because data analysis frequently requires new temporary views that CouchDB isn't being considered a replacement for OpenMRS as an analytic repository.
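To make the mechanics concrete, here is a hypothetical pre-defined view with the key chosen so that the common "observations of one concept for one patient" query becomes a contiguous range scan of the view's B-tree. Document shapes and the emit() stub are assumptions; in real CouchDB the emitted rows live in a B-tree and the server does the range scan via startkey/endkey:

```javascript
// Hypothetical view keyed by [patientId, concept]. The emit() stub and the
// document shapes are assumptions for illustration.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

function mapObs(doc) {
  if (doc.type === "observation") {
    emit([doc.patientId, doc.concept], doc.value);
  }
}

[
  { type: "observation", patientId: "p1", concept: "cd4", value: 350 },
  { type: "observation", patientId: "p1", concept: "weight", value: 61 },
  { type: "observation", patientId: "p2", concept: "cd4", value: 500 }
].forEach(mapObs);

// Simulate startkey=["p1","cd4"] / endkey=["p1","cd4",{}]: select only the
// rows whose key matches that prefix.
var cd4ForP1 = rows.filter(function (r) {
  return r.key[0] === "p1" && r.key[1] === "cd4";
});
```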

-- Greg

Sreekanth V

Nov 2, 2011, 11:47:25 PM
to raxa-jss-e...@googlegroups.com
Hi Roger,


You are right about queries being blocked by view generation, and one
solution we've used is to preemptively build views via a daily/hourly
scheduled task.
In some cases we also explicitly specify stale=ok to avoid getting
hurt by view-generation time. This means we do not get fresh data,
but in some cases that is OK.
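For reference, the two query styles differ only in a query-string parameter; a small sketch (the database, design-document, and view names are made up):

```javascript
// Build a CouchDB view query URL. stale=ok returns whatever index already
// exists without triggering a rebuild, trading freshness for latency.
// Names ("emr", "obs", "by_concept") are invented for the example.
function viewUrl(db, ddoc, view, params) {
  var qs = Object.keys(params).map(function (k) {
    return encodeURIComponent(k) + "=" + encodeURIComponent(params[k]);
  }).join("&");
  return "/" + db + "/_design/" + ddoc + "/_view/" + view + (qs ? "?" + qs : "");
}

var fresh = viewUrl("emr", "obs", "by_concept", {});             // waits for the index update
var stale = viewUrl("emr", "obs", "by_concept", { stale: "ok" }); // may serve old rows
```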
I am slightly worried about the number of writes; it seems like a
huge range. A quarter-million documents a day (at the high end) means that
we'll be stuck building views and compacting (which otherwise just takes a
lot of disk space). Is there a more precise range for the number of new
observations added?
Also, in my experience write times increase from a small number to a
higher steady-state number after a couple of million documents.
Some graphs from my benchmarking (obviously flawed and dumb, but
a ballpark):
http://twitpic.com/5apy2t

If we have a few OpenMRS observation reference documents in JSON, I
could run some scripts to give more representative
numbers on read/write/view-generation time.

Nitpicking, but Couch actually does not store the object as plain JSON
internally; it stores it in Erlang's serialized format, so loading the
object is not very slow.
Most of the time is really spent in the HTTP transport, not in
the deserialization. There have been reports in the wild of using
msgpack to reduce that time as well.

Modelling data in CouchDB is slightly harder because you have to know
up front what sorts of queries will be used. My general rule is to
err on the side of a flattened data model, where all the fields are in
one document. Only when the rates of reads and writes differ and the
usage scenarios are different do I start to break up the documents.
Normalizing documents does not buy us much in the non-relational world.
However, there are scenarios that need an association relationship (a la
joins), and CouchDB has a trick up its sleeve with linked
documents:
http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views#Linked_documents

I have never used it in practice, but I have conjured up solutions to
certain problems using this technique.
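The linked-documents trick from that wiki page can be simulated in a few lines. The document shapes are invented, and the little object below stands in for the database so the include_docs lookup can be shown:

```javascript
// Sketch of linked documents: when a view row's value is {_id: otherId},
// querying with include_docs=true returns the *linked* document rather than
// the one that emitted the row. The tiny "db" below simulates that lookup.
var db = {
  "patient-1": { _id: "patient-1", type: "patient", name: "A. Example" },
  "enc-1": { _id: "enc-1", type: "encounter", patientId: "patient-1" }
};

var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Map function: each encounter emits a link to its patient document.
function mapEncounterPatients(doc) {
  if (doc.type === "encounter" && doc.patientId) {
    emit(doc._id, { _id: doc.patientId });   // the link
  }
}

// Simulate the view build plus an include_docs=true query.
Object.keys(db).forEach(function (id) { mapEncounterPatients(db[id]); });
var withDocs = rows.map(function (row) {
  return { key: row.key, doc: db[row.value._id] };  // CouchDB resolves the link
});
```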

Hope I am making sense.
Cheers
sreekanth.

Gregory Kelleher

Nov 3, 2011, 12:09:27 AM
to raxa-jss-e...@googlegroups.com
The number of index changes and writes seems to be greatly inflated by assuming indexing and retrieval of individual observations, compounded by retrieval of specific obs from cohort groups. These are functions of the analytical repository or data warehouse, not the clinical EMR.

-- Greg

Manor Lev-tov

Nov 3, 2011, 4:22:21 AM
to raxa-jss-e...@googlegroups.com
The large number of observations per encounter is part of what I was alluding to when saying that document-oriented databases are better suited to the concept/observation pattern than relational databases.  In the document-oriented approach, adding an encounter could involve the addition of a single document.  In the relational approach, adding an encounter is an insert for the encounter plus an insert for each observation, for a total of approximately 7 to 100 inserts.  I believe that OpenMRS uses a separate transaction for each observation, and there is a small overhead associated with each transaction open and commit that cumulatively could have a noticeable impact on throughput in a large-scale deployment.
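A sketch of this: the whole encounter, observations included, travels as one document, while the relational model pays one insert per observation. Field names and values below are invented for the example:

```javascript
// Hypothetical single-document encounter: one write carries the encounter
// and all of its embedded observations. Field names/values are invented.
var encounter = {
  _id: "enc-2011-11-03-0001",
  type: "encounter",
  patientId: "patient-42",
  location: "outpatient",
  datetime: "2011-11-03T09:00:00Z",
  observations: [                       // embedded: no per-obs insert needed
    { concept: "weight", value: 64, units: "kg" },
    { concept: "systolic-bp", value: 118, units: "mmHg" },
    { concept: "cd4-count", value: 410, units: "cells/uL" }
  ]
};

// In the relational model each observation is its own row, so the same
// encounter costs 1 + observations.length inserts.
var relationalInserts = 1 + encounter.observations.length;
```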
-Manor

Friedman, Roger (CDC/CGH/DGHA) (CTR)

Nov 3, 2011, 9:21:43 AM
to raxa-jss-e...@googlegroups.com

@Gabbar --

                I think if you use the DB set up for the Stanford Hackathon you will be able to get a set of observations.  If not, you can download the standalone version, install it along with the sample data, install the REST WS module, and then you can access the data via REST.

                The reason for the large variability in the number of observations per encounter is variability in workflow.  Different installations give encounters different levels of granularity.  For example, an entire visit might be recorded on a single form entered as a single encounter.  Or entry might be at the point of service so a visit would comprise 6 smaller encounters.  Also, a form for an HIV visit might be 4 pages long while an observed TB drug session would only have one observation.

 

@Manor and Greg --

                Actually I think it is the indexing rather than the inserting that takes the time in both cases.  I think you might end up with fewer indexes overall in a CouchDB than in an SQL DB. 

                OpenMRS keeps some sub-objects unsaved in the data access object until their parent is saved.  Thus the person names, IDs and attributes are not saved until the person is saved; the transaction is around the savePatient method.  However, I believe observations are written immediately.  You can check the code through Fisheye at source.openmrs.org.

                There really is a need to retrieve observations to support the clinician.  That's the meaning of a *longitudinal* medical record.  The doc wants to see the BMI or CD4 count or weight or vaccinations over the last 2 years.  And this does apply to sets of patients and not just a single patient -- the HIV program needs to know who hasn't showed up for their appointment, the TB program needs to be sure that the local strain is responding to medication, the ANC program needs to assign follow-up tasks to community health workers, the hospital infection control officer needs to see all cases of hospital-acquired infection.

 

@David --

                Thanks for the kind remarks.  I don't think my coeur is crying; as my intern this summer said, you either know it or you don't.

 


Paul Biondich

Nov 3, 2011, 9:26:21 AM
to raxa-jss-e...@googlegroups.com
Hi everyone, Daniel invited me here to make a couple of comments on this thread.

First and foremost, my advice is to focus on what the clinical and administrative needs are for your work.  I've been privy to tons of conversations about what the "right" technology is for a particular problem... but a large number of them aren't buttressed by what the actual needs are.  I'm as much of a voyeur/enthusiastic tinkerer with new technologies as anyone, but I've learned that starting with the hammer instead of the nails leads to all sorts of religious debates that aren't as productive as they could be.

Given that you all are working against resource constraints, you should justify why you're making the work choices that you are.  I find that there are a lot of people in this space who approach things as a way of scratching an intellectual itch. If that's important to you all, then say so and identify it...

With all of that said, there are strengths and deficits of working against a document-based paradigm.  Whether those are tolerable, IMO, is a product of how you intend to gather, store, and use the data.  I trust that you all will make the decision that makes the most sense for you all, and I'll do what I can to support your community regardless of what technical decisions you make.

Best!
-Paul

Andrew Kanter

Nov 5, 2011, 10:48:19 AM
to Raxa JSS EMR Database
Hi, everyone, I was also asked to add my two cents to this discussion.
To be honest, the technical details are beyond my capacity to comment
on in any beneficial way. However, I would state that one of the most
significant benefits of OpenMRS is the community and the data model
(at least for transactional database design). There are clear
implications for analysis and for synching, which have led to
collaborations with Pentaho and other reporting technologies. MVP has
also been intrigued by the CouchDB approach for managing front-end
data collection and synching to our core OpenMRS database.

We are sharing a common data dictionary across all of our sites and
the need for interoperability (semantic and messaging) is high on our
list of requirements. We believe that we can leverage a lot of what
the OpenMRS community provides, including primarily using mobile
devices for point-of-care interactions for now. As a new UI is
available for interactions with the database, we will be even more
eager to take advantage of it.

For all of you, I think I would first spend a lot of time defining the
requirements for a successful application under your use case, and
then make a technical decision on architecture. Technology is always
changing and it will therefore never be a static decision. Choose a
maturity model which will allow you the most flexibility while
reducing the risk of making changes to the data.

I look forward to working with all of you regardless of what you
decide!

Andy Kanter
Millennium Villages Project
Columbia International eHealth Lab

Gregory Kelleher

Nov 5, 2011, 12:16:50 PM
to raxa-jss-e...@googlegroups.com
Andy,

It's great to have reports from the frontline!

Does MVP have a description of system requirements or a wish-list of functionality for the front end that you could share with the group? It could provide a real-world focus for the discussion.

A few comments and questions interspersed.

Thanks.

-- Greg


On Nov 5, 2011, at 10:48 AM, Andrew Kanter <aka...@ei.columbia.edu> wrote:

> Hi, everyone, I was also asked to add my two-cents to this discussion.
> To be honest, the technical details are beyond my capacity to comment
> in any beneficial way. However, I would state that one of the most
> significant benefits of OpenMRS is the community and the data model
> (at least for transactional database design). There are clear

What do you mean by 'transactional' in this context and how is the feature used?

> implications for analysis and for synching, which has led to
> collaborations with Pentaho and other reporting technologies. MVP has
> also been intrigued by the CouchDB approach for managing the front-end
> data collection and synching to our core OpenMRS database.

How do you deploy software and updates to the mobile devices? Also, do you use device-specific native apps or device-independent (HTML5) web apps? Would it be helpful if the software deployment / updates could be an automatic feature of data upload or syncing?

> We are sharing a common data dictionary across all of our sites and
> the need for interoperability (semantic and messaging) is high on our
> list of requirements. We believe that we can leverage a lot of what

How well does the data dictionary or concept dictionary match the MVP's needs for analysis, decision support and reporting? Do you collect data for which the concepts are not yet standardized in your OpenMRS?

> the OpenMRS community provides, including primarily using mobile
> devices for point-of-care interactions for now. As a new UI is
> available for interactions with the database, we will be even more
> eager to take advantage of it.
>
> For all of you, I think I would first spend a lot of time defining the
> requirements for a successful application under your use case, and
> then make a technical decision on architecture. Technology is always
> changing and it will therefore never be a static decision. Choose a
> maturity model which will allow you the most flexibility while
> reducing the risk of making changes to the data.

Good advice. But also beware of legacy attachments and arguments invoking sunk costs. Revising or rewriting a system to make better use of its innovations is neither "starting from scratch" nor "throwing out 40 years of research". Our task is to find the best path from where we are today to where we want to be tomorrow. Sunk costs might be defined as costs incurred in the past that we'd spend differently today to get where we want to be tomorrow.

Staying close to the mainstream (eg, HTML5) reduces the risk that we'll go down blind alleys.

> I look forward to working with all of you regardless of what you
> decide!
>
> Andy Kanter
> Millennium Villages Project
> Columbia International eHealth Lab
>

Andrew Kanter

Nov 6, 2011, 8:01:07 AM
to Raxa JSS EMR Database
Greg,
Thanks... MVP is definitely working in poor-connectivity environments.
We are looking to compare a cloud-based solution with one that
involves a national or regional server with asynchronous communication
to front-line devices. I am concerned about the form factor of the
Android devices for clinical data entry (OK for CHWs). I would like to
see a system which can provide OpenMRS functionality but do it
asynchronously. The problem now is that Sync for OpenMRS is too much
work. That is why we liked CouchDB for the devices but OpenMRS for the
backend. See my other answers, marked AK>, below.

On Nov 5, 11:16 am, Gregory Kelleher <g...@kelleher.cc> wrote:
> Andy,
>
> It's great to have reports from the frontline!
>
> Does MVP a description of system requirements or wish-list of functionality for the front end that you could share with the group?  It could provide a real-world focus for the discussion.
>
> A few comments and questions interspersed.
>
> Thanks.
>
> -- Greg
>
> On Nov 5, 2011, at 10:48 AM, Andrew Kanter <akan...@ei.columbia.edu> wrote:
>
> > Hi, everyone, I was also asked to add my two-cents to this discussion.
> > To be honest, the technical details are beyond my capacity to comment
> > in any beneficial way. However, I would state that one of the most
> > significant benefits of OpenMRS is the community and the data model
> > (at least for transactional database design). There are clear
>
> What do you mean by 'transactional' in this context and how is the feature used?

AK> There are different requirements for storing data, making it
accessible at Point of Care and for analyzing the data. The E-A-V data
model is awesome for designing data collection tools and reassembling
the data for display (encounters), but is really hard for doing
analysis and linking to traditional reporting tools. That is why a
data warehouse or star schema is preferred for those latter functions.
Anyone having to write reports from the EAV model will let you know in
an instant why this is difficult.

>
> > implications for analysis and for synching, which has led to
> > collaborations with Pentaho and other reporting technologies. MVP has
> > also been intrigued by the CouchDB approach for managing the front-end
> > data collection and synching to our core OpenMRS database.
>
> How do you deploy software and updates to the mobile devices?  Also, do you use device-specific native apps or device-independent (HTML5) web apps?  Would it be helpful if the software deployment / updates could be an automatic feature of data upload or syncing?

AK> Right now we are using ChildCount+ (rapidSMS) linking via xforms
to OpenMRS and Android ODK. Nothing custom, but HTML5 is very
interesting. Synching of updates to the apps is a no-brainer.

>
> > We are sharing a common data dictionary across all of our sites and
> > the need for interoperability (semantic and messaging) is high on our
> > list of requirements. We believe that we can leverage a lot of what
>
> How well does the data dictionary or concept dictionary match the MVP's needs for analysis, decision support and reporting?  Do you collect data for which the concepts are not yet standardized in your OpenMRS?

AK> Well, it matches MVP very well. The question is how well it
matches the other 25 organizations who have downloaded and are using
it. We want to provide a service by having people use the same
dictionary. This is fundamental to semantic interoperability. There is
no reason why we have to create different concepts to say the same thing.

>
> > the OpenMRS community provides, including primarily using mobile
> > devices for point-of-care interactions for now. As a new UI is
> > available for interactions with the database, we will be even more
> > eager to take advantage of it.
>
> > For all of you, I think I would first spend a lot of time defining the
> > requirements for a successful application under your use case, and
> > then make a technical decision on architecture. Technology is always
> > changing and it will therefore never be a static decision. Choose a
> > maturity model which will allow you the most flexibility while
> > reducing the risk of making changes to the data.
>
> Good advice. But also beware of legacy attachments and arguments invoking sunk costs.  Revising or rewriting a system to make better use of its innovations is neither "starting from scratch" nor "throwing out 40 years of research".  Our task is to find the best path from where we are today to where we want to be tomorrow.  Sunk costs might be defined as costs incurred in the past that we'd spend differently today to get where we want to be tomorrow.

AK> Good point, but my main point was that the design of the system
should assume that technology will need to change and might be
completely different (SMS->Android/Data...) but that we don't want to
have to rearchitect the data, significantly migrate the database or
cause major retraining of the people who have to use it. Think
workflows, standard data elements, outcomes.... not specific
technologies...