For reports, etc., a proposed Simplified Medical Document format based on static web pages and semantic annotation

David Clunie

Aug 23, 2012, 7:41:39 AM

to veterinary-heal...@googlegroups.com

Hi all

I have been evaluating alternative approaches to encoding (human) radiology reports and other documents that satisfy the need to convey formatted content, meta-data, and, if available, structured and coded content. I dare say these would be equally applicable to veterinary applications.

Simple static web pages + semantic markup seem to be the way to go, without relying on any medically-specific standards, just a medical vocabulary.

Details and some primitive examples are provided here:

http://dclunie.blogspot.com/2012/08/a-simplified-medical-document-format.html

David

Stuart Turner

Aug 23, 2012, 1:09:45 PM

to veterinary-heal...@googlegroups.com

Hi David

Thanks for this. You have thoughtfully highlighted the semantic web approaches and the compelling reasons for their adoption. While I've naturally seen more acceptance and activity in the life science realm, as you've mentioned (including your and Keith Boone's posts, and possibly with FHIR), more collaboration and research is now bearing some fruit in the clinical realm.

In a semantic clinical question/answer decision support tool, we prototyped a DITA specialization for a context-aware information retrieval platform, similar in scope to HL7's InfoButton, but without the HL7 part. DITA was used to specialize a topic schema from its base elements, and we reused several open resource-related vocabularies (Dublin Core, FOAF, and provenance metadata), made natural reuse of content via DITA's transclusion methods, and annotated topics using NLM's MeSH and content (to some extent) with SNOMED CT. This was found to be at least tractable within a single instance. I'm still uncertain about the efficacy of inter-enterprise exchange using DITA and whether it should be continued.

After some discussions within W3C's Healthcare and Life Science Group (W3C HCLS) [1], we are presently reviewing an approach that uses microdata, the W3C's WebSchemas Medical Health proposal [2], and Schema.org's Health and Medical vocabulary [3] to provide resources scoped to clinicians and clients from within an EMR. We especially wish to annotate resources with levels of evidence (LOE). Types such as <MedicalGuideline> and <MedicalEvidenceLevel> in the schema are exciting.

We have faced the challenge of trying to repurpose CDA for veterinary radiology reporting. There are obvious barriers for us (veterinary medicine), such as limited HL7 implementation (v2), limited expertise, and especially intellectual property/proprietary licensing issues, as you mentioned. Your comment about open alternatives is extremely welcome. That group is exploring alternatives, and I think reuse of existing and open semantic web technologies is the best path.

Starting with NCI in 1964, the Veterinary Medical Database (VMDB) [4] has been collecting summary discharge data (i.e. discharge diagnoses and limited patient demographics) for several species from veterinary teaching hospitals in the U.S. The data has been encoded more recently (since 2004?) with SNOMED CT and historically with SNVDO. With their support, I'm hoping to reify the existing model into RDF and a triple store to evaluate SemWeb approaches, and to expand the model in several representations (UML and RDFa). I'll be seeking expertise at the W3C HCLS Semantic Web Summer School and Hackathon at the MIT CSAIL Stata Center in Cambridge, MA next week, so I will hopefully have more to share.

[1] http://www.w3.org/blog/hcls/
[2] http://www.w3.org/wiki/WebSchemas/MedicalHealthProposal
[3] http://schemaorg-medicalext.appspot.com/

Thanks for your post

~ Stuart

---------------
Stuart Turner, DVM, MS
Biomedical Informaticist | Principal, Leafpath Informatics, LLC
stu...@leafpath.org | +1.916.596.0255 | @ Skype <turner.stuart>
http://leafpath.org | FOAF: http://stuartturner.org/foaf.rdf


David Clunie

Aug 23, 2012, 1:34:14 PM

to veterinary-heal...@googlegroups.com

Hi Stuart

Thanks for the additional background; it is good to know that there
has been so much work done in related areas already, both for non-
patient specific knowledge as well as for extraction of information
from patient-specific sources.

In the meantime, I thought I would take a crack at converting one of
Dennis's sample reports at

"https://sites.google.com/site/vetradiologyreportstandards/home/sample-xml"

by taking his dir_vrrs.xml sample, then applying his CDA-DIR.xsl stylesheet
to make HTML, then adding RDFa semantic annotations into the HTML to
recapture the structured content.

The result is attached as an HTML file; also attached is the Turtle
(Terse RDF Triple) extract from feeding the annotated HTML file into
"http://rdfa.info/play/", which shows the extracted semantic content.

The effort is pretty quick and dirty and needs tidying up, but it is
interesting how much can be achieved with nothing medically-specific
except the vocabulary, and even then a lot of schema.org that isn't
related to the Medical Health stuff can also be re-used (see the
contact address, for example). Extending the Medical Health vocabulary
to cover the needs of patient-specific documents (rather than just
knowledge sources like web pages) would be a good way to go if the
W3C group is interested.

Obviously the plain text parts could also be annotated with codes, to
allow for more knowledge extraction without resorting to NLP, but I
didn't do that yet, since that is beyond the lowest common denominator
in authoring at the moment. I did code what was coded with LOINC and
SNOMED in the original, of course, using the schema.org/MedicalCode
construct.
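
To make the idea concrete, here is a minimal sketch (not the attached
file) of what one coded entry might look like with RDFa Lite attributes,
and how the property values can be pulled back out with nothing but a
generic HTML parser. The fragment, the "code" property on the wrapper,
the EXAMPLE-CODE value, and the extract_properties helper are all
assumptions for illustration; a real RDFa processor such as the one
behind rdfa.info/play does considerably more.

```python
from html.parser import HTMLParser

# Hypothetical fragment: one coded entry annotated with RDFa Lite attributes.
# EXAMPLE-CODE is a placeholder, not a value from the original report.
SNIPPET = """
<div vocab="http://schema.org/">
  <span property="code" typeof="MedicalCode">
    <span property="codeValue">EXAMPLE-CODE</span>
    (<span property="codingSystem">LOINC</span>)
  </span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collects (enclosing type, property, text) for leaf property elements.

    Simplification: assumes every start tag has a matching end tag
    (no unclosed void elements such as a bare <br>).
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # one (typeof, property) pair per open element
        self.found = []  # (enclosing typeof, property, text)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        self.stack.append((a.get("typeof"), a.get("property")))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        typeof, prop = self.stack[-1]
        # Only record text that sits directly inside a plain property element.
        if prop and not typeof:
            enclosing = next(
                (t for t, _ in reversed(self.stack[:-1]) if t), None
            )
            self.found.append((enclosing, prop, text))


def extract_properties(html):
    collector = PropertyCollector()
    collector.feed(html)
    return collector.found


if __name__ == "__main__":
    for row in extract_properties(SNIPPET):
        print(row)
```

The point of the sketch is only that the human-readable page and the
coded content live in the same file, and that the latter is recoverable
without any medically-specific tooling.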

David

dir_vrrs_rdfa.html

dir_vrrs_rdfa.ttl

Stuart Turner

Aug 23, 2012, 7:04:23 PM

to veterinary-heal...@googlegroups.com

On Aug 23, 2012, at 10:34 AM, David Clunie wrote:

> The result is attached as an HTML file; also attached is the Turtle
> (Terse RDF Triple) extract from feeding the annotated HTML file into
> "http://rdfa.info/play/", which shows the extracted semantic content.

Very nice. Yet another cool tool I need to find time to explore.

> Extending the Medical Health vocabulary
> to cover the needs of patient-specific documents (rather than just
> knowledge sources like web pages) would be a good way to go if the
> W3C group is interested.

I think there is interest. Although I wasn't present during the inception of some of these efforts, it does seem that their intended scope has been to pragmatically represent resources with sufficient metadata for a basic representation, without modeling all of the static and dynamic semantics of health care. They are obviously intended for self-describing documents and namespaces that embrace the open graph, with perhaps less reliance on things like registries (e.g. for OIDs).

>
> I did code what was coded with LOINC and
> SNOMED in the original, of course, using the schema.org/MedicalCode
> construct.

So one of the challenges is how much work we do to extend these emerging schemas (and their environment) without getting in the weeds and defeating their intrinsically different representation and the relative elegance in their simplicity. Another is how we persist context as we do in more traditional document and information models in health care.

One example is the use of the concept descriptor (CD) in HL7/ISO 21090, which is more comprehensive and arguably more useful than MedicalCode in schema.org. Of the 18 attributes for CD in ISO 21090, there is a subset of about 8 that seems to represent the majority of use cases well. Schema.org has 2 (codeValue and codingSystem). One is used as a predicate/value in a triple, the other in an enumerated value domain within a complex datatype. I'm still trying to grok how we effectively use one vs. the other - or both - and make safe transformations between them. See my mapping below. Note that codingSystem (schema.org) is the same as codeSystemName, not codeSystem, in the CD datatype.

ConceptCode.png
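
As a rough illustration of why the transformation is only safe in one direction, here is a minimal Python sketch. It is not the mapping in the image: the ConceptDescriptor class carries only a handful of CD attributes, the helper names and the name-to-OID lookup table are hypothetical, and EXAMPLE-CODE is a placeholder. It assumes, as noted above, that codingSystem corresponds to codeSystemName.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConceptDescriptor:
    """A small subset of the ISO 21090 CD attributes, for illustration only."""
    code: str
    codeSystem: str                       # OID identifying the code system
    codeSystemName: Optional[str] = None  # human-readable name
    codeSystemVersion: Optional[str] = None
    displayName: Optional[str] = None


@dataclass
class MedicalCode:
    """The two schema.org MedicalCode properties discussed above."""
    codeValue: str
    codingSystem: str


def cd_to_medical_code(cd: ConceptDescriptor) -> MedicalCode:
    # codingSystem lines up with codeSystemName; fall back to the OID
    # only when no name was supplied.
    return MedicalCode(
        codeValue=cd.code,
        codingSystem=cd.codeSystemName or cd.codeSystem,
    )


def medical_code_to_cd(mc: MedicalCode, oid_by_name: Dict[str, str]) -> ConceptDescriptor:
    # Going back requires some registry-like lookup from name to OID
    # (hypothetical table supplied by the caller); version and display
    # name cannot be recovered at all.
    return ConceptDescriptor(
        code=mc.codeValue,
        codeSystem=oid_by_name[mc.codingSystem],
        codeSystemName=mc.codingSystem,
    )


if __name__ == "__main__":
    original = ConceptDescriptor(
        code="EXAMPLE-CODE",  # placeholder, not a real concept identifier
        codeSystem="2.16.840.1.113883.6.96",
        codeSystemName="SNOMED CT",
        displayName="example display text",
    )
    flattened = cd_to_medical_code(original)
    restored = medical_code_to_cd(
        flattened, {"SNOMED CT": "2.16.840.1.113883.6.96"}
    )
    print(flattened)
    print(restored == original)  # the round trip drops displayName
```

Flattening CD to MedicalCode is mechanical, but the return trip needs outside knowledge (the OID lookup) and still loses whatever the two-property form never carried.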

Dennis Ballance

Aug 30, 2012, 12:26:16 PM

to veterinary-heal...@googlegroups.com

One concern I have about this approach is that HTML is intended to be primarily a layout and display markup. Certainly semantic meaning can be attached to the various elements, but taking this approach completely removes any structural semantics. A particular piece of data could appear in the first <p>, or in a <td> located near the bottom of the page. A parser would have to scan the entire document for each semantic tag, and someone wanting to render the data in a different way would have to deconstruct and then reconstruct the document. By contrast, a structural XML document allows things like XPath to access data in a consistent tree location.
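
The difference can be sketched in a few lines of Python using the standard library's ElementTree, which supports a limited XPath subset. The two pages, the EXAMPLE-CODE value, and the helper names are made up for illustration, and the pages are written as well-formed XML so that the XML parser accepts them; real HTML would need an HTML parser.

```python
import xml.etree.ElementTree as ET

# Two hypothetical pages carrying the same annotated value in different places.
PAGE_A = (
    "<html><body><p>"
    "<span property='codeValue'>EXAMPLE-CODE</span>"
    "</p></body></html>"
)
PAGE_B = (
    "<html><body><table><tr><td>Code</td>"
    "<td property='codeValue'>EXAMPLE-CODE</td>"
    "</tr></table></body></html>"
)


def by_fixed_path(doc):
    """Structural lookup: only works if the data sits at one agreed location."""
    node = ET.fromstring(doc).find("./body/p/span")
    return node.text if node is not None else None


def by_annotation(doc, prop):
    """Semantic lookup: scans the whole tree for a matching property attribute."""
    root = ET.fromstring(doc)
    return [el.text for el in root.iterfind(f".//*[@property='{prop}']")]


if __name__ == "__main__":
    for name, page in (("A", PAGE_A), ("B", PAGE_B)):
        print(name, by_fixed_path(page), by_annotation(page, "codeValue"))
```

The fixed path finds the value only in the page whose layout it was written for, while the annotation search finds it in both at the cost of walking every element.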

The upside is that the semantic tagging eliminates the need to embed the domain model into the document – indeed, one could conceivably devise the document without having any domain model at all, and build the model later. From a pragmatic perspective, this would make “getting to market” much easier. Certainly in the veterinary domain the implementers are more concerned with getting the data from A to B, and as long as the structure of that data is consistent from one transaction to the next, that’s “good enough”. VRRS is very aware that the “window of opportunity” is closing quickly for radiology reporting, and if Sem-HTML allows us to sidestep the issue of having a domain model, that makes it pretty attractive to me.