--
You received this message because you are subscribed to the Google Groups "Raxa JSS EMR Database" group.
To post to this group, send email to raxa-jss-e...@googlegroups.com.
To unsubscribe from this group, send email to raxa-jss-emr-dat...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/raxa-jss-emr-database?hl=en.
http://portal.sliderocket.com/AUPYX/Luscinia-A-CouchDB-powered-EMR-for-Android
-- Greg
This is an enjoyable (for a geek like me!) and expert discussion; thank you all. A little history: this line of inquiry began when Greg listed the following advantages of a CouchDB implementation for our Raxa JSS EMR project:
1. fault-tolerance
2. offline device data storage and syncing
3. easy rollout of new software updates
4. horizontal scalability
5. fitness to the task of the changing nature of patient records
6. CouchApps are built with HTML/JS/CSS, which are easy to learn because they are the languages of the Web
...
Since adapting OpenMRS to a NoSQL database would be costly for us, it would be great to know what those costs are and whether these characteristics are achievable in the current implementation of OpenMRS.
Again, thank you all for your participation.
Shuro
-- Greg
Your sketch of view generation and index maintenance reads like a verbatim translation of CouchDB features into RDBMS terms. CouchDB is fast at writing and modifying documents and very fast at retrieving documents and stats, but slow at re-indexing (aka compiling views). Some of its speed comes from the same feature that protects its integrity: updates are append-only, so there's no locking.
Each document has a unique key and is stored under that key in a shallow B-tree structure. The index values of predefined views are computed from the contents of each document (CouchDB does this lazily, the next time the view is queried after the document is stored), and these views are also stored in a B-tree for fast access.
Now the real trick: index updates are incremental; only the new document needs to be parsed (plus the replaced doc, if any). Queries use the views, hence their speed. The slow process of regenerating views is required only when the view itself is redefined. It is because data analysis frequently requires new temporary views that CouchDB isn't being considered a replacement for OpenMRS as an analytic repository.
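To make the incremental-update idea concrete, here is a minimal in-memory sketch in Python. It is not CouchDB's implementation (CouchDB is written in Erlang and its map functions are usually JavaScript); `Database`, `View` and `map_fn` are hypothetical names. Every write appends to a change log, and `refresh` re-maps only the documents changed since the view last looked, dropping a document's old rows before adding its new ones.

```python
import bisect
from typing import Any, Callable, Dict, List, Tuple

Row = Tuple[Any, str, Any]  # (emitted key, doc id, emitted value)


class Database:
    """Toy store: each write appends (seq, doc_id) to an append-only change log."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}  # latest version only, kept in a dict for brevity
        self.changes: List[Tuple[int, str]] = []

    def put(self, doc_id: str, doc: dict) -> None:
        self.docs[doc_id] = doc
        self.changes.append((len(self.changes) + 1, doc_id))


class View:
    """Sorted view index that is brought up to date incrementally."""

    def __init__(self, map_fn: Callable[[dict], List[Tuple[Any, Any]]]) -> None:
        self.map_fn = map_fn
        self.rows: List[Row] = []  # kept sorted by key, standing in for the view B-tree
        self.rows_by_doc: Dict[str, List[Row]] = {}
        self.last_seq = 0
        self.docs_mapped = 0  # how many times map_fn has run

    def refresh(self, db: Database) -> None:
        # seq numbers start at 1 and are contiguous, so the unseen changes are a slice.
        for seq, doc_id in db.changes[self.last_seq:]:
            # Drop rows emitted by the previous version of this document, if any.
            for row in self.rows_by_doc.pop(doc_id, []):
                self.rows.remove(row)
            new_rows = [(k, doc_id, v) for k, v in self.map_fn(db.docs[doc_id])]
            for row in new_rows:
                bisect.insort(self.rows, row)
            self.rows_by_doc[doc_id] = new_rows
            self.docs_mapped += 1
            self.last_seq = seq
```

Re-saving one document and calling `refresh` again runs `map_fn` on that document only. A changed `map_fn`, by contrast, needs a fresh `View` that re-maps every document, which is the slow case described above.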
-- Greg
You are right about queries being blocked by view generation, and one
solution we've used is to preemptively build views via a daily/hourly
scheduled task.
In some cases we also explicitly specify stale=ok to avoid getting
hurt by view generation time. This means we do not get fresh data,
but in some cases that is OK.
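A minimal sketch of both tricks using only the Python standard library. The server URL, the database `emr`, the design document `obs` and the view `by_patient` are hypothetical. `stale=ok` is the CouchDB 1.x query parameter mentioned above (newer CouchDB releases replace it with `stable`/`update`). The warm-up function just queries the view with `limit=0`, so CouchDB brings the index up to date without returning any rows; it is meant to be run from a cron-style scheduler.

```python
import json
import urllib.parse
import urllib.request

COUCH = "http://localhost:5984"  # hypothetical server


def view_url(db: str, ddoc: str, view: str, **params) -> str:
    """Build a CouchDB view URL; JSON-typed params such as key must be JSON-encoded."""
    query = urllib.parse.urlencode(params)
    url = f"{COUCH}/{db}/_design/{ddoc}/_view/{view}"
    return f"{url}?{query}" if query else url


def query_view_stale(db: str, ddoc: str, view: str, key) -> list:
    """Read possibly out-of-date rows without waiting for the index to update."""
    url = view_url(db, ddoc, view, key=json.dumps(key), stale="ok")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["rows"]


def warm_view(db: str, ddoc: str, view: str) -> None:
    """Trigger an index update (e.g. from an hourly scheduled job) without fetching rows."""
    with urllib.request.urlopen(view_url(db, ddoc, view, limit=0)) as resp:
        resp.read()
```

Running `warm_view` on a schedule keeps the indexing cost off the request path, and `stale=ok` covers the reads that can tolerate slightly old data.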
I am slightly worried about the number of writes; it seems like a
huge range. A quarter of a million documents (at the high end) a day means that
we'll be stuck building views and compacting (otherwise it just takes a
lot of disk space). Is there a more precise range for the number of new
observations added?
Also, in my experience write times increase from a small number to a
high steady-state number after a couple of million documents.
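A back-of-the-envelope check of these figures. It assumes the high-end estimate of 250,000 new documents per day and reads "a couple of million" as 2,000,000. Both numbers are simply the ones quoted in this message, not measurements.

```python
DOCS_PER_DAY = 250_000          # high-end estimate quoted above
STEADY_STATE_AFTER = 2_000_000  # "a couple of million documents"

per_second = DOCS_PER_DAY / (24 * 60 * 60)
days_to_steady_state = STEADY_STATE_AFTER / DOCS_PER_DAY
per_year = DOCS_PER_DAY * 365

print(f"{per_second:.1f} docs/s on average")
print(f"{days_to_steady_state:.0f} days to reach {STEADY_STATE_AFTER:,} docs")
print(f"{per_year:,} docs per year")
```

The average rate is modest (about 2.9 documents per second), but real writes would cluster in clinic hours. At the high end, the slower steady-state regime would be reached in about 8 days.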
Some graphs from my benchmarking (obviously flawed and dumb, but just
a ballpark):
http://twitpic.com/5apy2t
If we had a few MRS observation reference documents in JSON, I
could run some scripts to get more representative
numbers on read/write/view generation time.
Nitpicking, but CouchDB actually does not store documents as plain JSON
internally; it stores them in Erlang's serialized term format, so loading
a document is not very slow.
Most of the time is actually spent in the HTTP transport, not in
deserialization. There have been reports in the wild of using
msgpack to reduce that time as well.
Modelling data in CouchDB is slightly harder because you have to know
up front what sort of queries will be used. My general rule is to
err on the side of a flattened data model, where all the fields are in
one document. Only when the read/write rates and the usage scenarios
differ do I start to split the documents.
Normalizing documents does not buy us much in the non-relational world.
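To illustrate the flattened model, here is a hypothetical encounter document with its observations embedded. All field names and values are invented for illustration; this is not an OpenMRS or CouchDB schema. It comes with a Python stand-in for a map function that emits one view row per observation. In CouchDB the map function would normally be JavaScript. The point is that one read returns the whole encounter, and the view is shaped around a query known up front.

```python
from typing import Iterator, Tuple

# Hypothetical flattened encounter: all observations live inside one document.
encounter = {
    "_id": "enc-0001",
    "type": "encounter",
    "patient_id": "pat-42",
    "encounter_type": "ADULT_RETURN",
    "date": "2011-11-03",
    "observations": [
        {"concept": "WEIGHT_KG", "value": 61.5},
        {"concept": "CD4_COUNT", "value": 350},
        {"concept": "BMI", "value": 21.3},
    ],
}


def map_obs_by_patient(doc: dict) -> Iterator[Tuple[list, object]]:
    """Emit ([patient, concept, date], value) for each embedded observation."""
    if doc.get("type") != "encounter":
        return
    for obs in doc["observations"]:
        yield [doc["patient_id"], obs["concept"], doc["date"]], obs["value"]


for key, value in map_obs_by_patient(encounter):
    print(key, value)
```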
However, there are scenarios that need an association relationship (à la
joins), and CouchDB has a trick up its sleeve: linked
documents.
http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views#Linked_documents
I have never used it in practice, but have conjured solutions to
certain problems using such a technique.
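The linked-documents trick works like this. A map function emits a value of the form `{"_id": other_id}`, and when the view is queried with `include_docs=true`, CouchDB includes the referenced document in that row instead of the emitting one. The sketch below emulates that lookup in plain Python, using hypothetical encounter and patient documents, to show how an encounter can be "joined" to its patient in a single query.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical documents, keyed by _id.
docs: Dict[str, dict] = {
    "pat-42": {"_id": "pat-42", "type": "patient", "name": "A. Example"},
    "enc-1": {"_id": "enc-1", "type": "encounter", "patient_id": "pat-42", "date": "2011-11-01"},
    "enc-2": {"_id": "enc-2", "type": "encounter", "patient_id": "pat-42", "date": "2011-11-03"},
}


def map_encounter_with_patient(doc: dict) -> Iterator[Tuple[list, dict]]:
    """For each encounter, emit a row for itself and a row linking to its patient."""
    if doc.get("type") == "encounter":
        yield [doc["_id"], 0], {}                          # the encounter itself
        yield [doc["_id"], 1], {"_id": doc["patient_id"]}  # link to the patient doc


def query(include_docs: bool = True) -> List[dict]:
    """Emulate a view query; rows come back sorted by key, as in CouchDB."""
    rows = []
    for doc_id, doc in docs.items():
        for key, value in map_encounter_with_patient(doc):
            row = {"id": doc_id, "key": key, "value": value}
            if include_docs:
                # Linked-document rule: an "_id" in the value picks the included doc.
                row["doc"] = docs[value.get("_id", doc_id)]
            rows.append(row)
    return sorted(rows, key=lambda r: r["key"])
```

Because the keys sort as `[encounter_id, 0]` then `[encounter_id, 1]`, each encounter is immediately followed by its patient document in the result.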
Hope I am making sense.
Cheers
sreekanth.
-- Greg
@Gabbar --
I think if you use the DB set up for the Stanford Hackathon you will be able to get a set of observations. If not, you can download the standalone version, install it along with the sample data, install the REST WS module, and then you can access the data via REST.
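Once the REST WS module is installed, observations can be pulled over HTTP. The sketch below assumes the module exposes a `ws/rest/v1/obs` resource with a `patient` filter and accepts HTTP Basic authentication. The base URL, patient UUID and credentials are placeholders. Check the exact resource paths and parameters against the documentation for the module version you install.

```python
import base64
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080/openmrs"  # placeholder; the standalone may use another port/path


def obs_url(patient_uuid: str) -> str:
    """URL for a patient's observations (assumed REST WS v1 resource layout)."""
    return f"{BASE_URL}/ws/rest/v1/obs?" + urllib.parse.urlencode({"patient": patient_uuid})


def basic_auth_header(user: str, password: str) -> str:
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return f"Basic {token}"


def fetch_obs(patient_uuid: str, user: str, password: str) -> dict:
    """GET the observations as parsed JSON (requires a running OpenMRS server)."""
    req = urllib.request.Request(
        obs_url(patient_uuid),
        headers={"Authorization": basic_auth_header(user, password), "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```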
The reason for the large variability in the number of observations per encounter is variability in workflow. Different installations give encounters different levels of granularity. For example, an entire visit might be recorded on a single form entered as a single encounter. Or entry might be at the point of service so a visit would comprise 6 smaller encounters. Also, a form for an HIV visit might be 4 pages long while an observed TB drug session would only have one observation.
@Manor and Greg --
Actually I think it is the indexing rather than the inserting that takes the time in both cases. I think you might end up with fewer indexes overall in a CouchDB than in an SQL DB.
OpenMRS keeps some sub-objects unsaved in the data access object until their parent is saved. Thus the person names, IDs and attributes are not saved until the person is saved; the transaction wraps the savePatient method. However, I believe observations are written immediately. You can check the code through Fisheye at source.openmrs.org.
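The pattern described here can be sketched with Python's built-in `sqlite3` module: child rows are written together with their parent inside one transaction. The tables and the `save_patient` function are hypothetical simplifications. OpenMRS itself does this through Hibernate and Spring transactions in Java, not with code like this.

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, gender TEXT NOT NULL);
    CREATE TABLE person_name (person_id INTEGER NOT NULL, given_name TEXT NOT NULL);
    """
)


def save_patient(gender: str, names: List[Optional[str]]) -> int:
    """Insert a person and all their names atomically (hypothetical schema)."""
    with conn:  # commits on success, rolls back everything on any exception
        cur = conn.execute("INSERT INTO person (gender) VALUES (?)", (gender,))
        person_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO person_name (person_id, given_name) VALUES (?, ?)",
            [(person_id, n) for n in names],
        )
        return person_id
```

If any name violates a constraint (for example a `None` given name), the person row is rolled back too, so no half-saved patient is left behind.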
There really is a need to retrieve observations to support the clinician. That's the meaning of a *longitudinal* medical record. The doc wants to see the BMI or CD4 count or weight or vaccinations over the last 2 years. And this does apply to sets of patients and not just a single patient: the HIV program needs to know who hasn't shown up for their appointment, the TB program needs to be sure that the local strain is responding to medication, the ANC program needs to assign follow-up tasks to community health workers, and the hospital infection control officer needs to see all cases of hospital-acquired infection.
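In a CouchDB view, this kind of longitudinal lookup becomes a key-range query. With keys of the form `[patient, concept, date]`, "CD4 counts for one patient over the last two years" is a contiguous slice between a `startkey` and an `endkey`. The Python sketch below emulates that on a sorted list of view rows using `bisect`. The patient IDs, concept names, values and dates are invented for illustration.

```python
import bisect
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical view rows, sorted by key = (patient, concept, ISO date), like a CouchDB view.
rows: List[Tuple[Tuple[str, str, str], float]] = sorted([
    (("pat-42", "CD4_COUNT", "2009-03-01"), 210),
    (("pat-42", "CD4_COUNT", "2010-06-15"), 290),
    (("pat-42", "CD4_COUNT", "2011-10-20"), 350),
    (("pat-42", "WEIGHT_KG", "2011-10-20"), 61.5),
    (("pat-7", "CD4_COUNT", "2011-09-01"), 500),
])
keys = [key for key, _ in rows]


def history(patient: str, concept: str, since: date, until: date) -> List[Tuple[str, float]]:
    """Return (date, value) pairs with since <= date <= until, like startkey/endkey."""
    lo = bisect.bisect_left(keys, (patient, concept, since.isoformat()))
    hi = bisect.bisect_right(keys, (patient, concept, until.isoformat()))
    return [(key[2], value) for key, value in rows[lo:hi]]


today = date(2011, 11, 5)
print(history("pat-42", "CD4_COUNT", today - timedelta(days=2 * 365), today))
```

ISO dates sort correctly as strings, which is why they work as the last key component.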
@David --
Thanks for the kind remarks. I don't think my coeur is criing, as my intern this summer said, you either know it or you don't.
From: raxa-jss-e...@googlegroups.com [mailto:raxa-jss-e...@googlegroups.com]
On Behalf Of Manor Lev-tov
Sent: Thursday, November 03, 2011 4:22 AM
To: raxa-jss-e...@googlegroups.com
Subject: Re: An EMR system using a document oriented database
The large number of observations per encounter is part of what I was alluding to when I said that document-oriented databases are better suited than relational databases to the concept/observation pattern. In the document-oriented approach, adding an encounter could involve adding a single document. In the relational approach, adding an encounter is an insert for the encounter plus an insert for each observation, so approximately 7 to 100 inserts in total.
I believe that OpenMRS uses a separate transaction for each observation, and there is a small overhead associated with opening and committing each transaction that, cumulatively, could have a noticeable impact on throughput in a large-scale deployment.
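This point about per-observation transactions can be illustrated with `sqlite3`. Saving an encounter with N observations costs 1 + N inserts in a relational schema either way, but committing after every insert pays the commit overhead N + 1 times instead of once. The schema and function names are hypothetical. Whether OpenMRS actually commits per observation is Manor's belief above, not something this sketch verifies. Timings from an in-memory database would say little about a production server, so the sketch counts the COMMIT statements SQLite executes rather than timing them.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE encounter (encounter_id INTEGER PRIMARY KEY, patient_id TEXT);
    CREATE TABLE obs (encounter_id INTEGER, concept TEXT, value REAL);
    """
)

statements: List[str] = []
conn.set_trace_callback(statements.append)  # records every SQL statement SQLite runs


def commits() -> int:
    return sum(1 for s in statements if s.strip().upper().startswith("COMMIT"))


def save_per_obs(patient_id: str, obs: List[Tuple[str, float]]) -> None:
    """Encounter and each observation in its own transaction: 1 + len(obs) commits."""
    with conn:
        enc_id = conn.execute("INSERT INTO encounter (patient_id) VALUES (?)", (patient_id,)).lastrowid
    for concept, value in obs:
        with conn:
            conn.execute("INSERT INTO obs VALUES (?, ?, ?)", (enc_id, concept, value))


def save_single_tx(patient_id: str, obs: List[Tuple[str, float]]) -> None:
    """The same 1 + len(obs) inserts, all inside one transaction: 1 commit."""
    with conn:
        enc_id = conn.execute("INSERT INTO encounter (patient_id) VALUES (?)", (patient_id,)).lastrowid
        conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", [(enc_id, c, v) for c, v in obs])
```

With 6 observations, `save_per_obs` commits 7 times and `save_single_tx` once. This matches the low end of the "7 to 100 inserts" range above.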
-Manor
It's great to have reports from the frontline!
Does MVP have a description of system requirements, or a wish-list of front-end functionality, that you could share with the group? It could provide a real-world focus for the discussion.
A few comments and questions interspersed.
Thanks.
-- Greg
On Nov 5, 2011, at 10:48 AM, Andrew Kanter <aka...@ei.columbia.edu> wrote:
> Hi, everyone, I was also asked to add my two-cents to this discussion.
> To be honest, the technical details are beyond my capacity to comment
> in any beneficial way. However, I would state that one of the most
> significant benefits of OpenMRS is the community and the data model
> (at least for transactional database design). There are clear
What do you mean by 'transactional' in this context and how is the feature used?
> implications for analysis and for synching, which has led to
> collaborations with Pentaho and other reporting technologies. MVP has
> also been intrigued by the CouchDB approach for managing the front-end
> data collection and synching to our core OpenMRS database.
How do you deploy software and updates to the mobile devices? Also, do you use device-specific native apps or device-independent (HTML5) web apps? Would it be helpful if software deployment and updates were an automatic part of data upload or syncing?
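For reference, CouchApps answer this through replication. A CouchApp's HTML/JS/CSS lives as attachments on design documents (`_design/...`), and design documents replicate along with the data, so a device that syncs its database also receives new application versions. A replication is started by POSTing a JSON body to the server's `_replicate` endpoint. The sketch below builds such requests with the Python standard library; the host names and the database name `emr` are placeholders.

```python
import json
import urllib.request

LOCAL = "http://127.0.0.1:5984"              # placeholder: CouchDB on the device
CENTRAL = "https://central.example.org:6984"  # placeholder: central server


def replicate_request(source: str, target: str, continuous: bool = True) -> urllib.request.Request:
    """Build a POST /_replicate request; design docs replicate like any other doc."""
    body = {"source": source, "target": target, "continuous": continuous}
    return urllib.request.Request(
        f"{LOCAL}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Two one-way replications give two-way sync: pull app updates and data, push local records.
pull = replicate_request(f"{CENTRAL}/emr", "emr")
push = replicate_request("emr", f"{CENTRAL}/emr")
# urllib.request.urlopen(pull) would start the replication on a running CouchDB.
```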
> We are sharing a common data dictionary across all of our sites and
> the need for interoperability (semantic and messaging) is high on our
> list of requirements. We believe that we can leverage a lot of what
How well does the data dictionary or concept dictionary match the MVP's needs for analysis, decision support and reporting? Do you collect data for which the concepts are not yet standardized in your OpenMRS?
> the OpenMRS community provides, including primarily using mobile
> devices for point-of-care interactions for now. As a new UI is
> available for interactions with the database, we will be even more
> eager to take advantage of it.
>
> For all of you, I think I would first spend a lot of time defining the
> requirements for a successful application under your use case, and
> then make a technical decision on architecture. Technology is always
> changing and it will therefore never be a static decision. Choose a
> maturity model which will allow you the most flexibility while
> reducing the risk of making changes to the data.
Good advice. But also beware of legacy attachments and arguments invoking sunk costs. Revising or rewriting a system to make better use of its innovations is neither "starting from scratch" nor "throwing out 40 years of research". Our task is to find the best path from where we are today to where we want to be tomorrow. Sunk costs might be defined as the costs incurred in the past that we would spend differently today to get where we want to be tomorrow.
Staying close to the mainstream (e.g., HTML5) reduces the risk that we'll go down blind alleys.
> I look forward to working with all of you regardless of what you
> decide!
>
> Andy Kanter
> Millennium Villages Project
> Columbia International eHealth Lab
>