Using EWD's DOM Parser: a primer


rtweed

Feb 22, 2009, 9:20:24 AM
to Enterprise Web Developer Community
At the core of EWD is a lightweight HTML/XML DOM parser that can be
used as a tool in its own right. XML DOM processing is a very
powerful technique that can be used for all sorts of tasks. Of course
it's the main task involved in EWD custom tag definition, but there
are many more uses for EWD's DOM parser. You could even use it to
create a Mumps-based Native XML Database.

The parser is very lenient - it was designed to cater for the "lazy"
format of HTML, so it won't refuse to parse a document that isn't
correctly structured XML. It will do its best to tidy it up.
However, once parsed, the document is handled as correctly structured
XML, and if you output it again, it will be properly structured XML.
The parser doesn't worry about namespace declarations - it just
handles them as attributes. Prefixed names are treated as just a
standard tag name. So don't expect EWD's parser to validate your XML
documents! It's up to you to get it right. But by not having to
worry about all that XML "bureaucracy", you'll find that EWD's parser
is a lot easier and quicker to deal with than most.

The DOM parser is all about its APIs. All the main W3C XML DOM APIs
are available in EWD and the entire suite of available APIs is
described in detail in the ewdMgr (EWD's portal) application. Click
on the Documentation tab and you'll find all the information you need,
complete with examples.

However I thought it would be helpful to provide the key starting
point, which is "how do you instantiate or create a DOM in the first
place?" and of course, "having created a DOM, how do I convert it
back into an XML file?"

There are several ways of instantiating/building a DOM:

1) Building a DOM from scratch.

This is pretty cool - you can actually generate a complete XML
document entirely programmatically. You kick it off with a single API
call:

s docOID=$$newXMLDocument^%zewdDOM(docName,outerTagName,addProcessingInstruction)

where:

docName = the document name (DOM name) you want to assign to the DOM
you're going to create
outerTagName = the tag name of the outermost tag in the XML document
addProcessingInstruction = 1 if you want to add an initial default
<?xml version='1.0' encoding='UTF-8'?> to the XML document. (If you
want a different encoding, specify 0 and add your own)

docOID = the document (DOM) OID that will be assigned by EWD to the
DOM that is created.


Before we go any further, a bit of explanation about DOM docNames and
docOIDs. Each DOM has 2 unique identifiers - the OID, which is an
opaque, automatically generated identifier, and a meaningful name that
you assign. Each must be unique - no other DOM may already exist
with the same value. Somewhat confusingly you'll find that some APIs
require the docOID, some require the docName. This really goes back
to the original W3C API definitions.


The $$newXMLDocument function will delete any existing DOM with the
docName you specify.

So if you call the following:

s docOID=$$newXMLDocument^%zewdDOM("demo","xxx",1)

you'll create a DOM named "demo" which should look like this:

<?xml version='1.0' encoding='UTF-8'?>
<xxx />


So having run this API, how can we see what the DOM looks like? The
answer is the $$outputDOM function. Try this:

s ok=$$outputDOM^%zewdDOM("demo",1,2)

and you should see:

<?xml version='1.0' encoding='UTF-8'?>
<xxx />


The outputDOM function is used for viewing the current state of your
DOM, and also for spitting out the DOM into a file. Just specify the
outputLocation as "file" and add the location path, eg:

s ok=$$outputDOM^%zewdDOM("demo",1,2,"file",,"/tmp/demo.xml")

The 2 is controlling the layout of the XML document: 2= "prettified"
indented output. Set it to 0 and it will spit out the DOM as a
stream:

s ok=$$outputDOM^%zewdDOM("demo",1,0)
<?xml version='1.0' encoding='UTF-8'?><xxx />


So we have a simple DOM instantiated. Now what?

Well you'll probably want to add new tags into the DOM. That's really
easy. Just use the "macro" API addElementToDOM. However, we need to
find out something first.

In a DOM, all the tags and attributes etc are represented as "nodes",
each with their own OID known as the nodeOID. In our document we've
just created, that outer tag (<xxx />) is known as the
"documentElement" and before we do anything we need to discover its
nodeOID:

s docName="demo"
s deOID=$$getDocumentElement^%zewdDOM(docName)

We can check that it's what we expect:

w $$getTagName^%zewdDOM(deOID)
xxx

Now we can add a new child tag into the DOM, using the documentElement
as the parentNode:

s attr("hello")="world"
s newOID=$$addElementToDOM^%zewdDOM("yyy",deOID,,.attr,"Bingo!")

Let's check what it's done:

s ok=$$outputDOM^%zewdDOM("demo",1,2)
<?xml version='1.0' encoding='UTF-8'?>
<xxx>
<yyy hello="world">
Bingo!
</yyy>
</xxx>
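
Pulling the calls above into one place, a minimal sketch of a routine
that builds and lists the "demo" DOM might look like the following.
The label name buildDemo is made up for illustration; every API call
and argument is copied from the steps above.

```mumps
buildDemo ; hypothetical label, not part of EWD
 n attr,deOID,docOID,newOID,ok
 ; create the DOM (any existing DOM named "demo" is deleted first)
 s docOID=$$newXMLDocument^%zewdDOM("demo","xxx",1)
 ; find the nodeOID of the outer <xxx> tag
 s deOID=$$getDocumentElement^%zewdDOM("demo")
 ; add <yyy hello="world">Bingo!</yyy> as a child of <xxx>
 s attr("hello")="world"
 s newOID=$$addElementToDOM^%zewdDOM("yyy",deOID,,.attr,"Bingo!")
 ; list the result, indented
 s ok=$$outputDOM^%zewdDOM("demo",1,2)
 q
```

Because $$newXMLDocument deletes any existing DOM with the same name,
a routine like this can be re-run without cleaning up first.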


OK so that's one way to get a DOM started. How about if we want to
process an XML (or HTML) file? Just use the API call
$$parseXMLFile^%zewdAPI (note that this is in ^%zewdAPI, not
^%zewdDOM), eg:


s ok=$$parseXMLFile^%zewdAPI("/tmp/demo.xml","secondDOM")

If it worked OK, ok="". If not, it will tell you what went wrong. For
example, had the path been mistyped as /tmp/demox.xml:

w ok
The file path /tmp/demox.xml does not exist

If it parsed OK, you can now list the document with:

s ok=$$outputDOM^%zewdDOM("secondDOM",1,2)
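
As a rough sketch, the parse-then-check pattern could be wrapped up
like this. The function name loadFile and its 1/0 return convention
are assumptions for illustration only; the two EWD calls are used
exactly as shown above.

```mumps
loadFile(path,docName) ; hypothetical wrapper, not part of EWD
 n ok
 s ok=$$parseXMLFile^%zewdAPI(path,docName)
 ; a non-empty return value is the error message
 i ok'="" w "Parse failed: ",ok,! q 0
 ; parsed OK, so list the new DOM, indented
 s ok=$$outputDOM^%zewdDOM(docName,1,2)
 q 1
```

You would call it as an extrinsic function, eg
i $$loadFile("/tmp/demo.xml","secondDOM") w "done",!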


And finally, what about if we want to grab some HTML from a web site
and turn it into a DOM so we can process it? Just use
$$parseURL^%zewdAPI, eg:

s ok=$$parseURL^%zewdAPI("www.mgateway.com","/","third",,1)

then list the DOM it created:

s ok=$$outputDOM^%zewdDOM("third",1,2)

Note the 1 at the end of the parameter list for parseURL. That tells
the parser that the content needs to be processed as XHTML, not XML.
When you set it to 1, all tag names and attribute names are converted
to lower case. If you want to retain the exact names in their
original case, specify 0.


You've probably realised by now that you can also use EWD's DOM in
conjunction with REST services that you set up using m_apache. Just
write out the HTTP header records including

Content-type:text/xml

then add a call to outputDOM and the contents of the DOM you've
created will be transmitted to the awaiting client system.


So that's it really! You now have a DOM and you can use any of those
API methods to manipulate it.


A few more tricks:

How do I find the nodeOID of a particular tag?

The easiest way is if the tag has an id attribute:

s nodeOID=$$getElementById^%zewdDOM("myId",docOID)

Otherwise you can return a local array of all tags matching the name:

s ok=$$getElementsArrayByTagName^%zewdDOM("yyy",docName,,.nodes)

zwr nodes
nodes("12-4")=""

If you know that there's just one tag with that name:

s nodeOID=$$getTagOID^%zewdDOM("yyy",docName)
zwr nodeOID
nodeOID="12-4"

There are many other APIs for navigating around in a DOM - consult the
ewdMgr documentation.
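
Since the nodes array comes back subscripted by nodeOID, a $ORDER loop
is the natural way to walk it. The sketch below assumes exactly that
array shape (as in the zwr output above). The label listTags is made
up, and the value returned by $$getElementsArrayByTagName is simply
discarded into ok because it isn't described here.

```mumps
listTags(tagName,docName) ; hypothetical helper, not part of EWD
 n nodeOID,nodes,ok
 s ok=$$getElementsArrayByTagName^%zewdDOM(tagName,docName,,.nodes)
 ; step through every matching nodeOID and show its tag name
 s nodeOID=""
 f  s nodeOID=$o(nodes(nodeOID)) q:nodeOID=""  w nodeOID," ",$$getTagName^%zewdDOM(nodeOID),!
 q
```

Run it with d listTags("yyy","demo") to list every yyy tag in the demo
DOM.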



How do you get rid of a DOM once you're done?

s ok=$$removeDocument^%zewdDOM(docName)



Can I clear down all my DOMs in one go?

d clearDOMs^%zewdDOM



Can I clear down DOMs that start with a particular prefix?

d clearDOMsByPrefix^%zewdDOM("myPrefix")




Can I get a list of my DOMs?


d listDOMs^%zewdDOM(.listOfDOMs)

GTM>zwr
listOfDOMs("demo")=""
listOfDOMs("third")=""
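
The same $ORDER pattern works on the list of DOMs. A small sketch,
assuming the array is subscripted by docName as in the listing above
(showDOMs is a made-up label):

```mumps
showDOMs ; hypothetical label, not part of EWD
 n docName,listOfDOMs,ok
 d listDOMs^%zewdDOM(.listOfDOMs)
 s docName=""
 f  s docName=$o(listOfDOMs(docName)) q:docName=""  d
 . w !,"--- ",docName," ---",!
 . ; list each DOM in turn, indented
 . s ok=$$outputDOM^%zewdDOM(docName,1,2)
 q
```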


Where are DOMs physically held?

In the global ^zewdDOM

Please DON'T manipulate this global yourself. ALWAYS use the API
methods.


That's enough to get you going! Have fun with EWD's DOM parser!

glilly

Feb 23, 2009, 11:29:37 AM
to Enterprise Web Developer Community
Rob:

Thanks for the Tutorial. It is very useful. I got everything to work
except reading in an XML file. I guess our CCR files are too big. I
tried reading in an interesting one and then a smaller, less
interesting one, and I got size errors as listed below.

It looks like the limitation is only in the File to Global processing
in gtmImportFile^%zewdHTMLParser. One approach to expand its
capabilities would be to check for the existence of ^%ZISH and then
use FileToGlobal (FTG^%ZISH) to read the entire file in, and then run
through it to put it into something that is suitable for the rest of
the parseDocument code to use... something like:

 i $G(^%ZISH)["" D  ;
 . S ZPATH="/home/wvehr1/EHR/CCR" ; the path needs to be separated from the file name
 . S ZFILE="PAT_26_CCR_V1_0_16.xml" ; the filename needs to be separated from the path
 . S ZTMP=$NA(^TMP("XML",$J,0)) ; tmp storage where %ZISH will put the XML
 . S ZOK=$$FTG^%ZISH(ZPATH,ZFILE,ZTMP,3) ; the 3 tells %ZISH to increment the third subscript
 . ; then move the XML to ^CacheTempEWD($J)

GTM>s ok=$$parseXMLFile^%zewdAPI("/home/wvehr1/EHR/CCR/PAT_2_CCR_V1_0_16a.xml","
%GTM-E-REC2BIG, Record size (15025) is greater than maximum (4080) for region: DEFAULT
%GTM-I-GVIS, Global variable: ^CacheTempEWD(2774,1)
%GTM-I-RTSLOC, At M source location gtmImportFile+14^%zewdHTMLParser
wvehr1@worldvista:~$ gtm

GTM>s ok=$$parseXMLFile^%zewdAPI("/home/wvehr1/EHR/CCR/PAT_26_CCR_V1_0_16.xml","secondDOM")
%GTM-E-REC2BIG, Record size (6703) is greater than maximum (4080) for region: DEFAULT,%GTM-I-GVIS, Global variable: ^CacheTempEWD(3464,1)
At M source location eof+1^%zewdHTMLParser
%GTM-W-NOTPRINCIO, Output currently directed to device /home/wvehr1/EHR/CCR/PAT_26_CCR_V1_0_16.xml

rtweed

Feb 23, 2009, 12:04:57 PM
to Enterprise Web Developer Community
Yes it's a record size issue. In previous postings I've noted that
the global ^zewdDOM needs to be configured to cope with up to 32k
strings. If you import big XML files, you'll also need to make the
global ^CacheTempEWD capable of handling long strings too.

If your XML documents have even longer string lengths (as is sometimes
the case if they hold very large quantities of text inside tags), then
I'd need to amend the way the parser handles text nodes. This may or
may not be an easy task! :-)

I'm not familiar with ^%ZISH. I'll take a look

Rob

LD 'Gus' Landis

Feb 23, 2009, 12:15:56 PM
to enterprise-web-de...@googlegroups.com
George,

The issue is that there are several globals that need
to be set to hold 32K strings. How to do this is in the
thread named something like "building ewd from scratch".

For sure, putting ewd into a "stock" VISTA environment
will not work.

The short of it is:
/etc/sysctl.conf add
# for ewd and GT.M 32k global nodes
kernel.shmmax = 134217728
and reboot your system (Sorry, AFAIK this is a boot param).

Regarding your mumps.dat, you need to create the
segments with parameters like:

$ cd /usr/local/gtm/ewd
$ source /usr/local/gtm/gtmprofile
$ $gde
GTM>d ^GDE
GDE>change -segment default -block=32256
GDE>change -region default -key=255 -record=32240
GDE>exit
$ mupip create

Note: One of the things I'll be doing at the RMU
technical conference is working this into what we need
for VISTA... since there are different requirements
for different things... e.g. journaling for replication,
not journaling temp globals, etc.

etc.

Cheers,
--ldl
--
---
NOTE: If it is important CALL ME - I may miss email
---
LD Landis - N0YRQ - de la tierra del encanto
3960 Schooner Loop, Las Cruces, NM 88012
651/340-4007 N32 21'48.28" W106 46'5.80"
"If a thing is worth doing,
it is worth doing badly." –GK Chesterton.

An interpretation: For things worth doing: Doing them, even if badly,
is better than doing nothing perfectly (on them).

"but I trust my family jewels only to Linux." -- DE Knuth
(http://www.informit.com/articles/article.aspx?p=1193856)

George Lilly

Feb 23, 2009, 12:28:23 PM
to enterprise-web-de...@googlegroups.com
Larry and Rob:

FTG^%ZISH (which is a VistA Kernel routine and open source) seems to handle this issue without any special GTM configuration. If this can be done without too much programming hassle, isn't it a better way to go? It can read in HUGE XML files very quickly. What ends up on each global node is usually quite small by comparison. Below is a snippet from the ^TMP global after reading in a 60K CCR file.

George 

^TMP("XML",2630,349)="</Telephone>"
^TMP("XML",2630,350)="<Source>"
^TMP("XML",2630,351)="<Actor>"
^TMP("XML",2630,352)="<ActorID>ACTORSYSTEM_1</ActorID>"
^TMP("XML",2630,353)="</Actor>"
^TMP("XML",2630,354)="</Source>"
^TMP("XML",2630,355)="</Actor>"
^TMP("XML",2630,356)="<Actor>"
^TMP("XML",2630,357)="<ActorObjectID>ACTORSYSTEM_1</ActorObjectID>"
^TMP("XML",2630,358)="<InformationSystem>"
^TMP("XML",2630,359)="<Name>WorldVistA EHR/VOE</Name>"
^TMP("XML",2630,360)="<Version>1.0</Version>"
^TMP("XML",2630,361)="</InformationSystem>"
^TMP("XML",2630,362)="<Source>"
^TMP("XML",2630,363)="<Actor>"
^TMP("XML",2630,364)="<ActorID>ACTORSYSTEM_1</ActorID>"
^TMP("XML",2630,365)="</Actor>"
^TMP("XML",2630,366)="</Source>"
^TMP("XML",2630,367)="</Actor>"
^TMP("XML",2630,368)="</Actors>"
^TMP("XML",2630,369)="</ContinuityOfCareRecord>"

rtweed

Feb 23, 2009, 12:37:37 PM
to Enterprise Web Developer Community
George

It isn't the size of the document and resulting global that's the
problem, it's the individual record size, and ^zewdDOM most definitely
needs 32k records to store large text nodes (it doesn't take much
Javascript to create one of these!)

However I see what you've done - you've split up the input document
into records that are only as long as each tag, and merging ^TMP into
^CacheTempEWD($j) should be all you need to do to get the documents to
parse successfully. That should at least avoid the need to
reconfigure ^CacheTempEWD. What goes in ^CacheTempEWD should look
like:

^CacheTempEWD($j,lineNo) = a chunk from the XML file

The parser doesn't mind how the content in this global is structured,
or if it breaks chunks in the middle of tags - it buffers it
internally to cater for any kind of content break-up. So a simple
MERGE of your ^TMP global into ^CacheTempEWD($j) should be all that's
needed!
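
In other words, something along these lines. This is only a sketch,
and it assumes FTG^%ZISH has already filled ^TMP("XML",$j,n) in the
same job, laid out as in George's listing:

```mumps
 ; clear out anything left over from an earlier parse in this job
 k ^CacheTempEWD($j)
 ; copy every ^TMP("XML",$j,n) node to ^CacheTempEWD($j,n)
 m ^CacheTempEWD($j)=^TMP("XML",$j)
```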

Rob

LD 'Gus' Landis

Feb 23, 2009, 12:38:13 PM
to enterprise-web-de...@googlegroups.com
Rob,

If it does turn out that there are documents that
exceed the 32K, and if you end up making a way
to allow larger chunks without increasing the
block size...

It would be nice to have a parameter that controls
the size of a block.

Not a high priority IMO.

Cheers,
--ldl

LD 'Gus' Landis

Feb 23, 2009, 12:39:19 PM
to enterprise-web-de...@googlegroups.com
This is going to be FUN!!!

On Sun, Feb 22, 2009 at 7:20 AM, rtweed <rob....@gmail.com> wrote:
>
> At the core of EWD is a lightweight HTML/XML DOM parser that can be

Thanks Rob!

rtweed

Feb 23, 2009, 12:53:09 PM
to Enterprise Web Developer Community
Wait till you start really playing around with the DOM API methods.
You'll be amazed what you can do in just a few API calls. Check out
methods such as

removeIntermediateNode

insertIntermediateNode

importNode (one of my favourites - allows you to merge one DOM into
another)

getChildrenInOrder


Also you need to learn the basic "primitive" DOM methods -
createElement, appendChild, setAttribute, getParent, getFirstChild,
getNextSibling etc

Bear in mind that the create* methods add a new node into the document
but not into the tree. appendChild and insertBefore are what are used
to connect the new node to the correct place in the DOM tree.

Also you need to know that removeChild will unlink a node and its
sub-tree from the DOM tree, but still leaves it in the document. So
that's really cool - you can unhook a sub-tree and re-connect it
somewhere else. If you look at the source of the macro APIs such as
removeIntermediateNode, you'll see how this trick can be used.


DOM processing really is totally awesome once you get into it, and the
Mumps engine is just a perfect vehicle for it.

You also have to learn recursive programming - you'll find you need to
do this quite a bit when handling DOM structures. A bit daunting at
first but you soon get the hang of it! :-)

Rob




On 23 Feb, 17:39, "LD 'Gus' Landis" <ldlan...@gmail.com> wrote:
> This is going to be FUN!!!
>

George Lilly

Feb 23, 2009, 1:19:18 PM
to enterprise-web-de...@googlegroups.com
Rob:

No need to Merge; FTG^%ZISH can read directly into ^CacheTempEWD($j):

 i $g(^%ZISH)["" d  ; if the VistA Kernel routine %ZISH exists
 . n zfile,zpath,ztmp s (zfile,zpath,ztmp)=""
 . s zfile=$re($p($re(filepath),"/",1)) ;file name
 . s zpath=$p(filepath,zfile,1) ; file path
 . s ztmp=$na(^CacheTempEWD($j,0))
 . s ok=$$FTG^%ZISH(zpath,zfile,ztmp,2) ; import the file incrementing subscr 2

here's the end of it... going on to get the rest of the parse...
gpl

^CacheTempEWD(3772,6161)="<Actor>"
^CacheTempEWD(3772,6162)="<ActorID>ACTORSYSTEM_1</ActorID>"
^CacheTempEWD(3772,6163)="</Actor>"
^CacheTempEWD(3772,6164)="</Source>"
^CacheTempEWD(3772,6165)="</Actor>"
^CacheTempEWD(3772,6166)="<Actor>"
^CacheTempEWD(3772,6167)="<ActorObjectID>ACTORSYSTEM_1</ActorObjectID>"
^CacheTempEWD(3772,6168)="<InformationSystem>"
^CacheTempEWD(3772,6169)="<Name>WorldVistA EHR/VOE</Name>"
^CacheTempEWD(3772,6170)="<Version>1.0</Version>"
^CacheTempEWD(3772,6171)="</InformationSystem>"
^CacheTempEWD(3772,6172)="<Source>"
^CacheTempEWD(3772,6173)="<Actor>"
^CacheTempEWD(3772,6174)="<ActorID>ACTORSYSTEM_1</ActorID>"
^CacheTempEWD(3772,6175)="</Actor>"
^CacheTempEWD(3772,6176)="</Source>"
^CacheTempEWD(3772,6177)="</Actor>"
^CacheTempEWD(3772,6178)="</Actors>"
^CacheTempEWD(3772,6179)="</ContinuityOfCareRecord>"

rtweed

Feb 23, 2009, 1:26:34 PM
to Enterprise Web Developer Community
Excellent

A word of warning - parsing large XML documents into a DOM may be
pretty slow and will probably increase the size by as much as 10X due
to all the pointers and indexes. You need to ask why you're turning
it into a DOM. If all you're doing is spitting it back out, then it's
not a very efficient process. If it's complex analysis and
transformation, then clearly the DOM is perfect, provided you're
willing to accept that the parsing process may take a while. Small
to medium sized documents are never a problem, and GT.M seems to be
very fast for this kind of thing from what I've seen so far. I'll be
interested to hear how you get on with these big documents

Rob

George Lilly

Feb 23, 2009, 1:40:55 PM
to enterprise-web-de...@googlegroups.com
An incoming CCR is an XML file containing clinical information from another provider. Our initial DOM processing will be to parse the file into our CCR ELEMENTS file. From there, it can be combined with the clinical Elements from this provider into a "cumulative CCR". We will also probably store the XML non-parsed as it was received in the CCR INCOMING XML file. 

I'm not ready to write the parsing/importing code yet, but I'll let you know about performance and XPath functionality into the CCR which I will experiment with in the next few days.

gpl

LD 'Gus' Landis

Feb 23, 2009, 1:41:26 PM
to enterprise-web-de...@googlegroups.com
Rob,

There is an XML package written for VISTA that
Wally Fort wrote. I believe George (CCR-CCD)
is using that. There may be some lessons on
handling large elements/attributes there.

I am very very excited to see how complete your
implementation of the DOM is. I am so looking
forward to "diving in"!

Most of my experience with the DOM and family
is from Python, where quite a few of the XML
notables hang out.

Cheers,
--ldl


LD 'Gus' Landis

Feb 23, 2009, 2:02:12 PM
to enterprise-web-de...@googlegroups.com
George,

When you ran into the node size limit with the
ewd DOM, was that something that you had
been able to load with the VISTA DOM?

Cheers,
--ldl


George Lilly

Feb 23, 2009, 2:10:01 PM
to enterprise-web-de...@googlegroups.com
Gus:

Wally's MXML parser is part of VistA and has the main entry point of EN^MXMLPRSE. It parses an XML file and has functions for DOM manipulation. As far as I know, there is no XPath functionality as part of the package. 

For the CCR-CCD project, we have been exporting, not importing CCRs using a template. We wrote our own XML processing package GPLXPATH which all of the extract processors use to map extracted Variables to their pieces of the template. We've tried out MXML but haven't used it yet, mostly because of the lack of XPath support.

It looks to me like I can rewrite the GPLXPATH package to have it do its manipulations using the EWD DOM package. We have functions like INSERT^GPLXPATH and CP^GPLXPATH and MAP^GPLXPATH which act on xml nodes and their children. I see a C0CXPATH package which uses EWD and gives a consistent interface to the dozen or so Extraction routines.

We are getting ready to try and support incoming CCRs, and that's where we will really need a DOM processor and hopefully one with XPath support. 

gpl

George Lilly

Feb 23, 2009, 2:13:01 PM
to enterprise-web-de...@googlegroups.com
EN^MXMLPRSE encodes a DOM from an XML file already in a global. I just ran it against the XML file I imported to ^CacheTempEWD($j) and it processed with no trouble... To get the XML File into the global, we use FTG^%ZISH.

gpl

rtweed

Feb 23, 2009, 2:16:46 PM
to Enterprise Web Developer Community
If EWD has problems with one of your big XML files, this is maybe
something we could look at in the technical meeting. I guess it's not
possible to send an example of one of the XML documents so I can see
how it deals with it myself?




George Lilly

Feb 23, 2009, 2:26:52 PM
to enterprise-web-de...@googlegroups.com
Rob:

Here's a sample and the stylesheet you use to view it in your browser. I don't think EWD is going to have any trouble with it now that we have it reading into your global with %ZISH. I'm looking forward to manipulating it with your XPath functions.

There is no Protected Health Information in this file. It's all test data. We have hundreds of them.

George
PAT_2_CCR_V1_0_16.xml
ccr.xsl

George Lilly

Feb 23, 2009, 6:01:20 PM
to enterprise-web-de...@googlegroups.com, CCD-CCR-project
Rob: I've incorporated the test for and call to FTG^%ZISH into _zewdHTMLParser.m for your consideration. I've attached the modified routine. Here is the changed part: gtmImportFile(filepath) n buf,buflen,i,len,lineNo,maxlen,x1,x2,xlen k ^CacheTempEWD($j) i $g(^%ZISH)["" d QUIT i ; if VistA Kernal routine %ZISH exists - gpl 2/23/09 . n zfile,zpath,ztmp,zok s (zfile,zpath,ztmp)="" . s zfile=$re($p($re(filepath),"/",1)) ;file name . s zpath=$p(filepath,zfile,1) ; file path . s ztmp=$na(^CacheTempEWD($j,0)) . s zok=$$FTG^%ZISH(zpath,zfile,ztmp,2) ; import the file increment subscr 2 . s i=$o(^CacheTempEWD($j,""),-1) ; highest line number o filepath:(readonly:stream:exception="g importNotExists") u filepath:exception="g eof" s lineNo=1,buf="",maxlen=15000
_zewdHTMLParser.m

George Lilly

Feb 23, 2009, 6:02:45 PM
to enterprise-web-de...@googlegroups.com, CCD-CCR-project
It now seems to create a DOM from large CCR xml files with no trouble. I'm going to try out the XPath features next. 

glilly

Feb 23, 2009, 6:11:57 PM
to CCD-CCR-project, enterprise-web-de...@googlegroups.com
The code snippet got really messed up in this message :(

gtmImportFile(filepath)
 n buf,buflen,i,len,lineNo,maxlen,x1,x2,xlen
 k ^CacheTempEWD($j)
 i $g(^%ZISH)["" d  QUIT i ; if VistA Kernel routine %ZISH exists - gpl 2/23/09
 . n zfile,zpath,ztmp,zok s (zfile,zpath,ztmp)=""
 . s zfile=$re($p($re(filepath),"/",1)) ;file name
 . s zpath=$p(filepath,zfile,1) ; file path
 . s ztmp=$na(^CacheTempEWD($j,0))
 . s zok=$$FTG^%ZISH(zpath,zfile,ztmp,2) ; import the file increment subscr 2
 . s i=$o(^CacheTempEWD($j,""),-1) ; highest line number
 o filepath:(readonly:stream:exception="g importNotExists")
 u filepath:exception="g eof"
 s lineNo=1,buf="",maxlen=15000


glilly

Feb 23, 2009, 10:31:20 PM
to Enterprise Web Developer Community, CCD-CCR-project
I don't think that what I did here to test for the existence of ^%ZISH
is right. Perhaps I should test $T(FTG^%ZISH) instead. What's the
right way to do it?
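
For comparison, a sketch of what the $T version might look like. The
trouble with $g(^%ZISH)["" is twofold: ^%ZISH there is a global
reference rather than the routine, and X[""  is true for every X, so
the test can never fail. $TEXT returns the source line at a label, or
the empty string if it can't find it, so testing for a non-empty
result is a commonly used existence check. The helper name hasFTG is
made up, and the check assumes the source for %ZISH is available to
$TEXT on the system in question.

```mumps
hasFTG() ; hypothetical helper: 1 if label FTG can be found in routine %ZISH, else 0
 ; $t(FTG^%ZISH) is the text of that source line, or "" when it cannot be found
 q $t(FTG^%ZISH)'=""
```

The guard in gtmImportFile would then read i $$hasFTG() d  QUIT i
instead of testing the global.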

rtweed

Feb 24, 2009, 3:34:09 AM
to Enterprise Web Developer Community
I'll take a look but my immediate thought is that I'd rather keep
outside dependencies out of the EWD code base. However I'd be quite
happy to add a "raw" API that starts at the point where ^CacheTempEWD
is already populated - that way it leaves the responsibility of
populating ^CacheTempEWD up to the external system in whatever way is
necessary, and leaves EWD's parser entirely open.

Rob



rtweed

Feb 24, 2009, 3:56:40 AM
to Enterprise Web Developer Community
In fact EWD already has such an API:

s error=$$parseDocument^%zewdHTMLParser(docName,isHTML)

This parses whatever is in ^CacheTempEWD($j) and creates a DOM named
docName. isHTML should be set to 0 for XML documents.

error="" if it parsed successfully.

So I'd recommend you plug into that from whatever code you need to
process your XML documents.
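
A rough sketch of how that might be wired up on the VistA side, kept
outside the EWD code base. The wrapper name parseBig is made up. It
assumes $$FTG^%ZISH returns 1 on success and 0 on failure, and that it
is happy with a directory path ending in "/" as in George's earlier
snippet.

```mumps
parseBig(filepath,docName) ; hypothetical wrapper, not part of EWD or VistA
 n error,last,zfile,zok,zpath,ztmp
 k ^CacheTempEWD($j)
 s last=$l(filepath,"/")
 s zfile=$p(filepath,"/",last) ; file name
 s zpath=$p(filepath,"/",1,last-1)_"/" ; directory, with trailing "/"
 s ztmp=$na(^CacheTempEWD($j,0))
 ; read the file into ^CacheTempEWD($j,n), incrementing subscript 2
 s zok=$$FTG^%ZISH(zpath,zfile,ztmp,2)
 i 'zok q "Unable to read "_filepath
 ; parse whatever is now in ^CacheTempEWD($j); 0 = treat as XML
 s error=$$parseDocument^%zewdHTMLParser(docName,0)
 q error
```

As with parseXMLFile, an empty return value means the DOM was created,
eg s error=$$parseBig("/home/wvehr1/EHR/CCR/PAT_26_CCR_V1_0_16.xml","secondDOM")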

Rob

rtweed

Feb 24, 2009, 4:34:50 AM
to Enterprise Web Developer Community
Your sample document parses into a DOM without any problems in the
Virtual Appliance - takes about 2 seconds to parse on my setup. I
didn't need to make any configuration changes beyond the standard way
the Virtual Appliance is already configured.

Rob



George Lilly

Feb 24, 2009, 10:32:32 AM
to enterprise-web-de...@googlegroups.com
Great Rob. That will work.

Thank you.

gpl