Best way to store medical records, patients and clinicians


cayasso

Jul 28, 2011, 9:15:40 AM
to mongodb-user
Hi all, I would like to get some advice on structuring my DB
collections. I am developing an application similar to
http://www.zocdoc.com/ where users will be able to find doctors based on
location and speciality. I am trying to find the best way to store all
this data.

My first question is: should I store patients and doctors in the same
collection? To give you an idea of the data structure:

Patients = {
_id,
name,
birthday,
gender,
isSpecialist,
location: { country, zipcode, city, province, coords: [lat, long]},
verified,
records: [issues, medications, allergies],
doctors: [ids...]
}

And

Doctors = {
_id,
name,
gender,
speciality,
tags,
location: { country, zipcode, city, province, coords: [lat, long]},
verified,
education,
certifications,
affiliations,
patients: [ids...],
workplaces: { zipcode, city, province, coords: [lat, long] }
}

As you can see, both objects store similar data, which makes me think
that I can handle both entities in a single collection and just add a
flag like isDoctor = true/false to distinguish them. As you can also
see, a patient can have n doctors and a doctor can have n patients; for
this I am using the doctors and patients arrays. Is this the correct way
to go? What do you guys think? Or maybe there is a better way I am not
seeing.

My other question is: what is the best way to store the doctor's calendar
data in the doctors collection? Any recommendations here? Should I
store each date the doctor is available in an array field called
schedule?

As you can see, my concern is that handling all this data in one
single collection gets very messy. I really need some advice here.

Thank you in advance!

Jonathan


Dan Cook

Jul 28, 2011, 1:46:41 PM
to mongod...@googlegroups.com
Of course the answer is "it depends."  :-)

I don't think Mongo cares about the document structure inside a collection.  You do, however, when querying, so the documents would have to be structured so that queries are efficient (for example, setting up the correct indexes so that all the doctors' documents are not "searched" when looking for patients).  There are tradeoffs with respect to collections/documents. 

I will defer to others more experienced here, but one thing I was told when doing schema design is not to have documents that can grow forever.  Documents are limited in size (16 MB) and the patients' records array looks like it could be quite large, depending on its contents.  It might be worth splitting the records into another document/collection. 
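A minimal sketch of what splitting the records out might look like, assuming one small document per record that points back to its patient. The field names (patientId, type, text, date) are hypothetical, and a plain array stands in for a records collection so the example runs without a server; against a real database the lookup would be roughly db.records.find({ patientId: "p1" }).sort({ date: -1 }).

```javascript
// Hypothetical "records" collection: one document per issue/medication/allergy,
// each pointing back to its patient instead of living in an ever-growing array.
const records = [
  { patientId: "p1", type: "allergy", text: "penicillin", date: new Date("2011-03-02") },
  { patientId: "p1", type: "medication", text: "ibuprofen", date: new Date("2011-07-15") },
  { patientId: "p2", type: "issue", text: "sprained ankle", date: new Date("2011-06-01") },
];

// In-memory stand-in for find({ patientId }).sort({ date: -1 }):
// newest records first, and the patient document itself never grows.
function recordsFor(patientId, collection) {
  return collection
    .filter((r) => r.patientId === patientId)
    .sort((a, b) => b.date - a.date);
}

console.log(recordsFor("p1", records).map((r) => r.type)); // [ 'medication', 'allergy' ]
```

The trade-off is a second query whenever the patient screen needs the records, in exchange for a patient document whose size stays roughly constant.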

Also keep in mind that Mongo does not enforce referential integrity.  So a doctor id stored in a patient document, which I assume is an ObjectId(), may or may not exist in the doctors collection.
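Because nothing stops a patient document from holding the id of a doctor that was never created (or was later deleted), any integrity check has to live in the application. A minimal sketch, with plain arrays standing in for the two collections and short string ids used in place of ObjectIds purely for readability; the function name is hypothetical:

```javascript
// Hypothetical in-memory stand-ins for the doctors and patients collections.
const doctors = [
  { _id: "d1", name: "Dr. Ada Lee", speciality: "cardiology" },
  { _id: "d2", name: "Dr. Sam Ortiz", speciality: "dermatology" },
];
const patient = { _id: "p1", name: "Jonathan", doctors: ["d1", "d3"] };

// MongoDB will store "d3" without complaint even though no such doctor exists,
// so the application has to look for references that point nowhere.
function danglingDoctorIds(patientDoc, doctorDocs) {
  const known = new Set(doctorDocs.map((d) => d._id));
  return patientDoc.doctors.filter((id) => !known.has(id));
}

console.log(danglingDoctorIds(patient, doctors)); // [ 'd3' ]
```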

What is the access pattern?  I assume the patients' information would be updated much more frequently than the doctors' information.  If you need the doctor's information for each patient, consider "de-normalizing" some of the information from the doctors document and putting it into the patients document.  For example, you may only need the doctor's name.  That could easily be put in the patient documents instead of the doctor id. 
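A sketch of that kind of de-normalization, assuming the patient view only needs each doctor's name. Keeping the doctor's _id next to the copied name is an addition here, not something stated above; it leaves a way back to the full doctor document and makes it possible to find the copies later if the name changes. Field and function names are hypothetical.

```javascript
// Full doctor document, as it would live in the doctors collection.
const doctor = {
  _id: "d1",
  name: "Dr. Ada Lee",
  speciality: "cardiology",
  certifications: ["Board certified, cardiology"],
  affiliations: ["General Hospital"],
};

// Copy only what the patient screen needs into the patient document.
function doctorSummary(doc) {
  return { _id: doc._id, name: doc.name };
}

const patient = {
  _id: "p1",
  name: "Jonathan",
  doctors: [doctorSummary(doctor)], // embedded copy, not just an id
};

console.log(patient.doctors[0].name); // Dr. Ada Lee
```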

Please reference
http://www.mongodb.org/display/DOCS/Schema+Design

Dan





gregorinator

Jul 28, 2011, 3:21:45 PM
to mongod...@googlegroups.com
An easy mistake to make, coming from the relational world, is to
follow the relational process when structuring NoSQL data. The
relational process is (roughly) to make a list of the data entities,
and associate the data attributes with the data entities where they
have a one-to-one relationship. You'll notice that this doesn't take
into account, at all, how the data is to be _used_. It only considers
the relationships between the entities and attributes themselves.

In a document model, the relationships between the data attributes and
entities are less important (not unimportant, just less), because
there's no schema. What's important in the document model is how you
are going to query the data. This is because, for maximum
performance, you want to retrieve everything you need in a single
Find. In a relational model, you may read a sales order master row,
then read the sales order line item rows, then read the product
information rows for the products on the sales order line items. In a
document model, you want to query a sales order and have everything
returned to you at once, so you make it all part of the same document.
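To make the contrast concrete, here is one way the sales order could look as a single document; the field names and values are invented for illustration. Everything the order screen needs comes back from one find, including the product descriptions copied into each line item, so totalling the order needs no further reads:

```javascript
// One sales order as a single document: header, line items, and the product
// details the order screen needs are all embedded, so one read returns it all.
const order = {
  _id: "so-1001",
  customer: { name: "Acme Clinic", city: "Toronto" },
  lines: [
    { sku: "GLOVE-M", description: "Exam gloves, medium", qty: 10, unitPrice: 4.5 },
    { sku: "MASK-3P", description: "3-ply masks", qty: 20, unitPrice: 0.25 },
  ],
};

// No second query for line items or products is needed to total the order.
function orderTotal(o) {
  return o.lines.reduce((sum, line) => sum + line.qty * line.unitPrice, 0);
}

console.log(orderTotal(order)); // 50
```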

An example that someone asked about on this list recently was a
document model for a professional sports league: He (or she) wanted
to know if there should be a document for each player, a document for
each team (containing the players in an array), or a document for the
entire league (containing the teams in an array). And the answer
was... it depends. How is the data going to be retrieved? Is it
going to be necessary to retrieve an entire league at once? If not,
then an entire team at once? Or is the application going to be
primarily centered around retrieving individual players?

So the answer is, you need to know how your final application is going
to need to retrieve the information, and then structure the document
around the application's needs. Unlike in the relational model, you
can't model document data based on the data alone. You have to model
based on how it's going to be used.

> My first question, is should I store patients and doctors in the same
> collection? To give you an idea of the data structure

{snip}


> As you can see both objects store similar data so it makes me think
> that I can handle both entities in a single collection and just add a
> flag

I think the question is irrelevant. Just because patients and doctors
share some attributes doesn't suggest that they should be stored
together. Or not. Since there's no schema, and every document has
its own name:value pairs, it makes no difference if there's a
"address":"x" pair in one collection and a similar "address":"x" pair
in another collection. The determining factor, again, should be: How
is the data going to be used? If you're going to have queries that
pull patient documents and doctor documents in the same query, then
putting them in the same collection makes sense. On the other hand,
(as seems more likely) if certain queries will pull patient
information, and separate queries will pull doctor information, then
it might be simpler for the human programmers to follow if they're in
different collections.
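The practical difference shows up in the queries. A rough sketch, with plain arrays standing in for collections and a hypothetical type field playing the role of the isDoctor flag from the original question: in a shared collection every doctor query has to carry the flag, because doctors and patients share fields like location, while a separate collection needs no such condition.

```javascript
// Option A: one shared "people" collection with a discriminator field.
const people = [
  { _id: "d1", type: "doctor", name: "Dr. Ada Lee", location: { city: "Toronto" } },
  { _id: "p1", type: "patient", name: "Jonathan", location: { city: "Toronto" } },
];

// Option B: separate collections; the collection itself says what a document is.
const doctors = [{ _id: "d1", name: "Dr. Ada Lee", location: { city: "Toronto" } }];

// A: forgetting the type condition here would return the patient as well.
const inTorontoA = people.filter(
  (p) => p.type === "doctor" && p.location.city === "Toronto"
);

// B: the same question without the extra condition.
const inTorontoB = doctors.filter((d) => d.location.city === "Toronto");

console.log(inTorontoA.map((d) => d._id), inTorontoB.map((d) => d._id)); // [ 'd1' ] [ 'd1' ]
```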

Here's an example of carrying these concepts to an extreme, using your
data model: Let's say the _only_ purpose of your system is to allow
patients to see their doctor information -- I know this isn't what you
really want, but just assume it is for now. This means the only query
that will ever be performed is that a patient will want to pull his or
her personal information, as well as a list of all of his or her
doctors, with the doctors' addresses, phone numbers, etc. In this
case, the optimal model would be one document per patient, and each
patient document would contain all of that patient's doctors _and_ all
the doctors' information. That way, you would pull everything you
need to show the patient in a single query. Of course, this would
hugely denormalize the doctor information -- the same doctor's address
and phone number, for example, would be repeated in the document of
every patient that had that doctor. But that would be the best
document design, for that use. In that case, you may ask, what would
you do if a doctor changed his phone number? It's in hundreds or
thousands of documents. The answer, though, is simple: you would
have a batch process that runs at a quiet time to update all the
phone numbers.
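A sketch of that batch job, assuming each patient document embeds its doctors as { _id, name, phone } objects (a hypothetical shape). In the mongo shell of this era the same thing could be one multi-document update using the positional operator, e.g. db.patients.update({ "doctors._id": "d1" }, { $set: { "doctors.$.phone": "555-0199" } }, false, true); the code below does the equivalent over an in-memory array so the logic is visible and runnable:

```javascript
// Hypothetical patient documents that embed copies of their doctors' details.
const patients = [
  { _id: "p1", doctors: [{ _id: "d1", name: "Dr. Ada Lee", phone: "555-0100" }] },
  { _id: "p2", doctors: [
      { _id: "d2", name: "Dr. Sam Ortiz", phone: "555-0111" },
      { _id: "d1", name: "Dr. Ada Lee", phone: "555-0100" },
  ] },
  { _id: "p3", doctors: [{ _id: "d2", name: "Dr. Sam Ortiz", phone: "555-0111" }] },
];

// Rewrite every embedded copy of one doctor's phone number and return how
// many patient documents were changed (what a quiet-time batch job would report).
function propagatePhoneChange(patientDocs, doctorId, newPhone) {
  let changed = 0;
  for (const p of patientDocs) {
    const copy = p.doctors.find((d) => d._id === doctorId);
    if (copy) {
      copy.phone = newPhone;
      changed += 1;
    }
  }
  return changed;
}

console.log(propagatePhoneChange(patients, "d1", "555-0199")); // 2
```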

I don't know about you, but when I first came to MongoDB from the
relational world, talk like this had my head spinning. But I got used
to it, because, as you can see, modeling in a document world is very
different from modeling in a relational world. And the first
difference is that you have to know the end-use of the data before you
model, and have it shape your model.

One caveat to denormalizing data, at least in MongoDB, is that there's
a severe performance penalty to growing an existing document: if it
outgrows the space allocated for it on disk, MongoDB has to move it.
So if a document is going to be added to frequently, normalizing the
data (by moving it to its own collection) is suggested. However, this
creates its own performance penalty on reads. You have to use your
knowledge of how the data is going to be used to decide which is
worse. For example, patients may not, on average, add new doctors
very often -- maybe once a month, maybe once every two months? If the
patient/doctor data is going to be read frequently, it might be worth
it to denormalize the doctor data by adding it to the patient
document, since each patient will only take the performance hit of
increasing the document size once every month or two. On the other
hand, doctors might add new patients very frequently, so it might
make sense to move the doctor/patient relationship out of the doctor
document and into another collection. Again, you have to understand
how your data is going to be used, and how it will change, in order
to make these kinds of decisions. Also, keep in mind that if data is
removed from a document, the space allocated for it doesn't shrink,
so that leaves room for something to be added later. So if doctors
are deleting patients at about the same rate as adding them, they
won't incur the performance hit of expanding the document unless
they exceed their previous high-water mark.
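One way to move the doctor/patient relationship into its own collection, as suggested above, is one small document per pairing. The collection and field names (doctorPatients, doctorId, patientId, since) are hypothetical, and an array stands in for the collection; in a real deployment you would want indexes on both doctorId and patientId so lookups in either direction stay cheap. Adding a patient then inserts a new small document instead of growing the doctor's document:

```javascript
// Hypothetical "doctorPatients" collection: one document per relationship.
const doctorPatients = [];

// Adding a relationship is an insert, so no existing document grows.
function addRelationship(doctorId, patientId, since) {
  doctorPatients.push({ doctorId, patientId, since });
}

// Lookups work in either direction (index both fields in a real database).
function patientsOf(doctorId) {
  return doctorPatients.filter((l) => l.doctorId === doctorId).map((l) => l.patientId);
}

function doctorsOf(patientId) {
  return doctorPatients.filter((l) => l.patientId === patientId).map((l) => l.doctorId);
}

addRelationship("d1", "p1", new Date("2011-07-01"));
addRelationship("d1", "p2", new Date("2011-07-20"));
addRelationship("d2", "p1", new Date("2011-07-25"));

console.log(patientsOf("d1"), doctorsOf("p1")); // [ 'p1', 'p2' ] [ 'd1', 'd2' ]
```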

Also, something you can do to mitigate the performance penalty of
adding to document size is to "inflate" new documents by prepopulating
them with bogus data when you add them. For example, when you add a
new patient document, and you think it's reasonable that a patient may
have up to ten doctors, you can initialize the patient document with:

"doctors" : ["bogus_id1", "bogus_id2", "bogus_id3", "bogus_id4",
"bogus_id5", "bogus_id6", "bogus_id7", "bogus_id8", "bogus_id9",
"bogus_id10"]

Then, after you add it, remove the bogus doctors:

db.patients.update( { "_id": "123" }, { "$unset" : { "doctors" : 1 } } )

Now the patient document will already have enough space to add the
first ten doctors, and the performance penalty will only kick in when
you get to the eleventh.
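A small helper for building the padded insert and the follow-up update, with the placeholder count as a parameter; the helper name and the expectedDoctors parameter are hypothetical. Note that this trick depends on the storage engine reserving space for a document in place, as the original MMAP storage engine did; the WiredTiger engine, the default since MongoDB 3.2, does not update documents in place, so padding buys nothing there.

```javascript
// Build a new patient document whose doctors array is pre-filled with
// placeholder ids, so the record is allocated with room to grow.
function paddedPatient(id, name, expectedDoctors) {
  const placeholders = Array.from(
    { length: expectedDoctors },
    (_, i) => "bogus_id" + (i + 1)
  );
  return { _id: id, name: name, doctors: placeholders };
}

// Update document that strips the placeholders again after the insert.
const removePadding = { $unset: { doctors: 1 } };

const doc = paddedPatient("123", "Jonathan", 10);
console.log(doc.doctors.length, doc.doctors[9]); // 10 bogus_id10
// In the mongo shell: db.patients.insert(doc); db.patients.update({ _id: "123" }, removePadding);
```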

HTH,
gs

cayasso

Jul 28, 2011, 10:46:54 PM
to mongodb-user
Hey, thank you both, you have given me light when I was in darkness.
All of this document-oriented, schema-less DB stuff is new to me. I can
see now that at first I was more focused on the schema than on data
access. I now have a new view and will have to reconsider some of my
initial thinking; it makes so much sense now.

@gregorinator, considering what Dan mentioned above about the 16 MB
document size limit: is that also an important factor, besides data
access, to consider when splitting a document? Should I always
consider it? I know it depends on how often the data is updated and
on the traffic projection, but in the worst-case scenario, where a
document will always grow (for example, a document that contains
history data), what would be a good strategy for this kind of
implementation?

Thanks for such an extended explanation, @gregorinator. I really
appreciate it, and I hope others can find this thread helpful.

JB

gregorinator

Jul 29, 2011, 1:09:47 PM
to mongod...@googlegroups.com
On Thu, Jul 28, 2011 at 10:46 PM, cayasso <cay...@gmail.com> wrote:
> @gregorinator considering what Dan mentioned above related to 16mb
> document size limit, is that also an important fact beside the data
> access to consider for splitting a document,

The short answer: No.

The long answer: Have you stopped to think how big a 16Mb document
really is? Consider that the entire King James Bible -- Old and New
Testaments -- in text form is only 4.8Mb. You could fit the entire
King James Bible -- Old and New Testaments -- into a 16Mb document
_three times_ and _still_ have space left over. Seriously, how much
history can a single patient (for example) accumulate? More than
three King James Bibles worth? I'd hate to be _that_ patient's
doctor. :)
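If you do want a rough feel for how close a document is to the limit, the UTF-8 length of its JSON form is a quick proxy. It is only an approximation: BSON adds type bytes and length prefixes and stores numbers and dates differently, and in the mongo shell Object.bsonsize(doc) reports the real encoded size. The helper names and the 500-character history note below are assumptions for illustration:

```javascript
// MongoDB's per-document limit: 16 MB.
const MAX_DOC_BYTES = 16 * 1024 * 1024;

// Rough size estimate: UTF-8 bytes of the JSON text (not the exact BSON size).
function approxSizeBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

function fitsInOneDocument(doc) {
  return approxSizeBytes(doc) <= MAX_DOC_BYTES;
}

// One fairly wordy history entry (500 characters of notes).
const entry = { date: "2011-07-29", note: "x".repeat(500) };
const perEntry = approxSizeBytes(entry);

// Bytes per entry, and roughly how many such entries 16 MB could hold.
console.log(perEntry, Math.floor(MAX_DOC_BYTES / perEntry)); // 531 31595
```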

I'll concede that if your history includes large binaries, such as
x-rays, MRIs, or long (very long) audio dictations, then a document
could get pretty big, so in that case it might make sense to make each
large binary a document in a separate collection, referenced in the
history. And if an individual binary exceeds 16Mb by itself, you
always have MongoDB's awesome GridFS to fall back on. Having the
binaries in separate docs might even make more sense from a "how you
use the data" perspective, since what you might do is display text
history, then have a button that says "View X-Ray", and the user
clicks on the button to see the X-Ray (or whatever). In that case, it
would make sense to use a separate query to retrieve the binary, and
so it would make sense to store the binary in a separate document.
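A sketch of that split, assuming the history entry carries only a reference (a hypothetical xrayId) and the image bytes live elsewhere: in a separate collection, or in GridFS if a single file exceeds 16 MB. A Map stands in for that store; the history is displayed from the small document, and the binary is fetched by a second lookup only when the user clicks "View X-Ray":

```javascript
// Small, text-only history document: cheap to load on every visit.
const history = {
  patientId: "p1",
  entries: [
    { date: "2011-05-04", note: "Fall on ice, wrist pain", xrayId: "x42" },
    { date: "2011-06-10", note: "Follow-up, healing well" },
  ],
};

// Hypothetical separate store for large binaries, keyed by id.
const xrays = new Map([
  ["x42", { _id: "x42", contentType: "image/png", bytes: Buffer.from([0x89, 0x50, 0x4e, 0x47]) }],
]);

// Which entries should show a "View X-Ray" button.
function entriesWithXray(h) {
  return h.entries.filter((e) => e.xrayId !== undefined);
}

// Second query, run only when the button is clicked.
function loadXray(id) {
  return xrays.get(id) || null;
}

const [first] = entriesWithXray(history);
console.log(first.date, loadXray(first.xrayId).contentType); // 2011-05-04 image/png
```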

HTH,
gs

Dan Cook

Jul 29, 2011, 1:22:20 PM
to mongod...@googlegroups.com
On Fri, Jul 29, 2011 at 10:09 AM, gregorinator <gregor...@gmail.com> wrote:
> On Thu, Jul 28, 2011 at 10:46 PM, cayasso <cay...@gmail.com> wrote:
>> @gregorinator considering what Dan mentioned above related to 16mb
>> document size limit, is that also an important fact beside the data
>> access to consider for splitting a document,
>
> The short answer:  No.
>
> The long answer:  Have you stopped to think how big a 16Mb document
> really is?
>
> {snip}

Yeah, I thought the same thing, and this was covered in our awesome Mongo Applications class, but since the contents of the records weren't specified, I thought it would be remiss of me not to at least mention it.  :-)

Cheers,
Dan

gregorinator

Jul 30, 2011, 8:21:03 AM
to mongod...@googlegroups.com
On Fri, Jul 29, 2011 at 1:22 PM, Dan Cook <one...@gmail.com> wrote:
> since the contents of the records wasn't specified I
> thought it would be remiss if I did not at least mention it.  :-)

Granted. Although the document size isn't a practical limit in most
situations, it's still something that every developer should be aware
of and have in the back of his or her mind at all times. Especially
if the data includes large binaries.

gs

cayasso

May 16, 2012, 4:55:12 PM
to mongod...@googlegroups.com

Hey guys, I never thanked you for the great post and explanation. I always point people who are starting with MongoDB to this post.

Thanks,
JB 