XSLT to compare DTDs (NLM version)

Demian Hess

Aug 21, 2012, 9:15:34 PM

to dtdan...@googlegroups.com

This is the XSLT that compares two versions of a DTD--but it only works with the output from the NLM version of the code. I could rewrite it to use the current output...

Looking again at the "datadictionary", it is quite different from the dtdanalyzer in terms of its output: it captures general entity and parameter entity declarations.

however, i dropped the ability to capture annotations, which is really a shame. i don't remember why.

i do like the fact that it captures the filename and line number where a declaration appears--but that info is highly dependent on the parser used. xerces did a good job.

the datadict app also will automatically apply a transform. i would get back the metadata untransformed by passing it thru a, um, transform that just said <xsl:copy-of select="."/>.

compare-dtds.xsl

Chris Maloney

Aug 21, 2012, 10:37:32 PM

to dtdan...@googlegroups.com

Hi, Demian,

I'm a little bit confused. I found a copy of dataDistribution that you sent me quite some time ago, I guess, but it is slightly different from this new version. In your new version, all of the jar files have the extension ".jar_hide". Is this some new convention I've never heard of? Your new zip file also has a lot of Mac artifacts. But other than that, the files seem the same. I'll attach my version of the zip file, for reference.

Does the datadictionary version use the same basic principle of capturing parse events from xerces? Why does it need saxon, is that for the transformation that it applies at the end?

Why is it that the main executable class is "gov.pubmedcentral.dtd.documentation.Application", when you say that you "dropped the ability to capture annotations"? I would have thought that the annotations are crucial to good documentation.

Glancing at the code, I see that these two are hugely different -- basically two completely independent applications. So I think "merging" is out. Did datadictionary evolve from (what we're now calling) dtdanalyzer? If so, then presumably it has a lot of improvements. Capturing entity declarations and filenames and line numbers are, I think, huge plusses. The task that got me started on this was to add a few features, and there might be some overlap there -- I'll have to check. But otherwise, if the output format is completely different, it might be tough to justify migrating everything to a completely new version.

These are just some thoughts and questions off the top of my head. I'll be able to look deeper at it on Thursday.

Cheers!

Chris


datadictionaryDistribution.zip

Demian Hess

Aug 21, 2012, 10:56:17 PM

to dtdan...@googlegroups.com, dtdan...@googlegroups.com

i added 'hide' to the jars because email was refusing to send--so just rename.

saxon is to apply a transform after the xml is created. i could then immediately produce the output i wanted from the app.

to get the dtd xml representation i'd simply specify an xsl that just copied out the input tree.
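for anyone following along, here is a rough sketch of that copy-through step in Java. it is not the datadict code: the class name, method name, and sample XML are made up for illustration, and it uses the JDK's built-in XSLT processor through the standard javax.xml.transform API rather than saxon. the stylesheet just copies the input tree to the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration only; not part of datadictionary or dtdanalyzer.
class CopyThroughSketch {
    // Stylesheet that copies the whole input tree unchanged.
    static final String COPY_XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/\"><xsl:copy-of select=\".\"/></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given XML and returns the result as text.
    static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up stand-in for the XML representation of a DTD.
        String xml = "<declarations><element name=\"article\"/></declarations>";
        System.out.println(transform(xml, COPY_XSL));
    }
}
```

swapping COPY_XSL for a real stylesheet is what would turn the same call into the "produce the output i wanted" case.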

i removed the annotation ability because nlm wasn't interested. i like annotations but they require adding special comments and no one at nlm wanted to do that. i figured it'd be easy to add back in.

the core classes work the same way as the dtd analyzer: they capture the decl and lexical events. i tweaked the content model to make it less element centric.
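as a rough sketch of what "capturing the decl and lexical events" can look like with plain SAX2 (not the actual datadictionary or dtdanalyzer classes; the class name, the event strings, and the sample DTD are assumptions for illustration), the handler below registers itself for DeclHandler and LexicalHandler callbacks and tags each event with whatever position the parser's Locator reports. how accurate that position is depends on the parser, and the system id is only meaningful when parsing from a file or URL.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical illustration only. DefaultHandler2 implements both
// DeclHandler and LexicalHandler, so one object can receive all events.
class DtdEventSketch extends DefaultHandler2 {
    final List<String> events = new ArrayList<>();
    private Locator locator;

    @Override public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    // Stores one event, tagged with the parser-reported system id and line.
    private void record(String kind, String detail) {
        String where = (locator == null)
            ? "?" : locator.getSystemId() + ":" + locator.getLineNumber();
        events.add(kind + " " + detail + " @ " + where);
    }

    // DeclHandler events
    @Override public void elementDecl(String name, String model) {
        record("element", name + " " + model);
    }
    @Override public void attributeDecl(String eName, String aName,
                                        String type, String mode, String value) {
        record("attribute", eName + "/@" + aName + " " + type);
    }
    @Override public void internalEntityDecl(String name, String value) {
        // Parameter entity names arrive with a leading '%'.
        record("entity", name + " = " + value);
    }

    // LexicalHandler event; comments are where annotations would live.
    @Override public void comment(char[] ch, int start, int length) {
        record("comment", new String(ch, start, length).trim());
    }

    static List<String> capture(String xml) throws Exception {
        DtdEventSketch handler = new DtdEventSketch();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler); // needed to receive the Locator
        reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE article [\n"
            + "  <!-- article: top-level element -->\n"
            + "  <!ENTITY % inline \"b | i\">\n"
            + "  <!ELEMENT article (title, p*)>\n"
            + "  <!ATTLIST article id ID #IMPLIED>\n"
            + "  <!ELEMENT title (#PCDATA)>\n"
            + "  <!ELEMENT p (#PCDATA)>\n"
            + "]>\n<article><title>t</title></article>";
        capture(xml).forEach(System.out::println);
    }
}
```

a real tool would build a model from these callbacks instead of strings, but the registration and the callbacks are the same idea.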