I'm a little bit confused. I found a copy of dataDistribution that you sent me quite some time ago, I guess, but it is slightly different from this new version. In your new version, all of the jar files have the extension ".jar_hide". Is this some new convention I've never heard of? Your new zip file also has a lot of Mac artifacts. But other than that, the files seem the same. I'll attach my version of the zip file, for reference.
Does the datadictionary version use the same basic principle of capturing parse events from Xerces? Why does it need Saxon? Is that for the transformation that it applies at the end?
Why is it that the main executable class is "gov.pubmedcentral.dtd.documentation.Application", when you say that you "dropped the ability to capture annotations"? I would have thought that the annotations are crucial to good documentation.
Glancing at the code, I see that these two are hugely different -- basically two completely independent applications. So I think "merging" is out. Did datadictionary evolve from (what we're now calling) dtdanalyzer? If so, then presumably it has a lot of improvements. Capturing entity declarations, filenames, and line numbers are, I think, huge pluses. The task that got me started on this was to add a few features, and there might be some overlap there -- I'll have to check. But otherwise, if the output format is completely different, it might be tough to justify migrating everything to a completely new version.
These are just some thoughts and questions off the top of my head. I'll be able to look deeper at it on Thursday.
Cheers!
Chris