Enable XML catalogs?

Steven D. Majewski

Mar 13, 2019, 8:56:45 PM
to xtf-user@googlegroups.com List
Does anyone happen to know how to enable XML catalogs for XTF?
Is this something that can be done with a config, or does it require some Java code?

— Steve Majewski


Martin Haye

Mar 14, 2019, 12:57:50 PM
to xtf-...@googlegroups.com
Not sure what you mean by "XML catalogs". Are you talking about an XML interface to the data like OAI-PMH (which is built in)?

--Martin

--
You received this message because you are subscribed to the Google Groups "XTF Users List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to xtf-user+u...@googlegroups.com.
To post to this group, send email to xtf-...@googlegroups.com.
Visit this group at https://groups.google.com/group/xtf-user.
For more options, visit https://groups.google.com/d/optout.

Steven D. Majewski

Mar 14, 2019, 1:30:40 PM
to xtf-user@googlegroups.com List


XML catalogs are typically used to resolve DTDs, character entity files (charents), and schemas to local copies instead of making a remote request.
Oxygen uses them; using them in other software usually requires configuring a resolver for the XML parser.
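Since XTF is a Java application, one generic way to plug a catalog into a parser is the JDK's built-in `javax.xml.catalog` API (Java 9+; this sketch uses `Files.writeString`, so Java 11+). This is a minimal sketch of the JDK side only, not an XTF configuration option, and whether XTF's own parser setup would pick up such a resolver without source changes is exactly the open question here. The catalog contents, the `http://example.org/dtd/ead.dtd` system ID, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CatalogDemo {
    // Hypothetical remote DTD location that the catalog redirects.
    static final String REMOTE_DTD = "http://example.org/dtd/ead.dtd";

    // Writes a placeholder local DTD plus an OASIS catalog that maps
    // REMOTE_DTD to it; returns the path of the catalog file.
    static Path writeDemoFiles() throws Exception {
        Path dir = Files.createTempDirectory("catalog-demo");
        Files.writeString(dir.resolve("ead.dtd"), "<!ELEMENT ead (#PCDATA)>");
        Path catalog = dir.resolve("catalog.xml");
        Files.writeString(catalog,
            "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
            + "<system systemId=\"" + REMOTE_DTD + "\" uri=\"ead.dtd\"/>"
            + "</catalog>");
        return catalog;
    }

    // RESOLVE=continue: unmatched IDs return null, so the parser falls back
    // to its normal behavior instead of throwing.
    static CatalogResolver buildResolver(Path catalogFile) {
        CatalogFeatures features = CatalogFeatures.builder()
            .with(CatalogFeatures.Feature.RESOLVE, "continue")
            .build();
        return CatalogManager.catalogResolver(features, catalogFile.toUri());
    }

    // Parses xml with the catalog resolver installed as the SAX EntityResolver
    // (CatalogResolver implements EntityResolver).
    static void parseWithCatalog(String xml, CatalogResolver resolver) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        CatalogResolver resolver = buildResolver(writeDemoFiles());
        System.out.println(REMOTE_DTD + " -> "
            + resolver.resolveEntity(null, REMOTE_DTD).getSystemId());
        parseWithCatalog("<!DOCTYPE ead SYSTEM \"" + REMOTE_DTD + "\"><ead>hi</ead>", resolver);
        System.out.println("parsed using the local DTD");
    }
}
```

With `RESOLVE` set to `continue`, anything the catalog does not list is still fetched normally; setting it to `strict` instead would make an unlisted reference fail loudly, which can be a handy way to find out what a corpus actually pulls from the network.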


However, what I wanted to do will probably not work out of the box. 

I have a collection of EAD XML files that use XIncludes for boilerplate published information, so when an organization changes its contact info, it only has to be changed in one place. The boilerplate file is referenced by a server URL that I would like to resolve to a local file instead of making a network request.

Supposedly this is doable using XML Catalogs:

However, I tested this with another package that does implement catalog support [http://basex.org], and with catalogs enabled it didn’t appear to resolve my XInclude files locally (i.e., when I turned Wi-Fi off on my laptop to verify that access was local only, the parser failed during ingest).

So it appears unlikely that I can get what I want without using a custom resolver, and I’m not certain that would work without source code changes. In other words, even if someone has configured XTF to use XML catalogs for DTD and entity resolution, there is no guarantee that it will work for resolving XIncludes.

So I think that’s a “Nevermind!”  

I could batch edit all of those files to use relative file: URIs to make them resolve locally, but then they wouldn’t work if taken out of context, which is the whole point of XML Catalogs.

BTW: If you ever try to batch validate thousands of XML files, you will find another need for catalog files: most sites that host official DTDs and schemas (W3C, loc.gov) will cut you off after a few dozen requests.
So you need to download the schemas and redirect references to the local copies, and XML catalogs are one way of doing that.
If Oxygen didn’t support XML Catalogs, validation would occasionally break after heavy use.
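For batch validation, a single `rewriteSystem` catalog entry can redirect every system ID under a hosting site's prefix to a local mirror directory, so each downloaded schema or DTD does not need its own entry. A minimal sketch with the JDK's `javax.xml.catalog` API (Java 11+), assuming the references start with `http://www.loc.gov/ead/` (an example prefix; use whatever your files actually reference) and that the copies live in a `schemas/` directory next to the catalog. The class and method names are hypothetical. Because `CatalogResolver` also implements `LSResourceResolver`, the same object should be usable with `SchemaFactory.setResourceResolver` for XSD imports and includes.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;

public class RewriteSystemDemo {
    // Example prefix only; adjust to the prefix your documents reference.
    static final String REMOTE_PREFIX = "http://www.loc.gov/ead/";

    // One rewriteSystem entry maps every system ID under REMOTE_PREFIX to the
    // "schemas/" directory next to the catalog (relative to the catalog's URI).
    static CatalogResolver buildResolver() throws Exception {
        Path dir = Files.createTempDirectory("schema-mirror");
        Files.createDirectories(dir.resolve("schemas")); // downloaded copies go here
        Path catalog = dir.resolve("catalog.xml");
        Files.writeString(catalog,
            "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
            + "<rewriteSystem systemIdStartString=\"" + REMOTE_PREFIX + "\""
            + " rewritePrefix=\"schemas/\"/>"
            + "</catalog>");
        CatalogFeatures features = CatalogFeatures.builder()
            .with(CatalogFeatures.Feature.RESOLVE, "continue")
            .build();
        return CatalogManager.catalogResolver(features, catalog.toUri());
    }

    // Returns the local location for systemId, or null if the catalog has no match.
    static String redirect(CatalogResolver resolver, String systemId) {
        var source = resolver.resolveEntity(null, systemId);
        return source == null ? null : source.getSystemId();
    }

    public static void main(String[] args) throws Exception {
        CatalogResolver resolver = buildResolver();
        for (String id : List.of(REMOTE_PREFIX + "ead.xsd", REMOTE_PREFIX + "ead.dtd",
                                 "http://www.w3.org/2001/xml.xsd")) {
            System.out.println(id + " -> " + redirect(resolver, id));
        }
    }
}
```

The third ID is deliberately not covered by the catalog, so it prints `null`; in a real run that reference would still go to the network, which is the kind of gap that eventually gets a validator cut off.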

— Steve M. 

Martin Haye

Mar 14, 2019, 2:44:08 PM
to xtf-...@googlegroups.com
Hmm, interesting. Thanks for the education! I did a fair amount of work in XTF to eliminate DTD loading, due to just the problem you mention with hosting sites shutting off access or being very slow.
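The thread does not say how XTF avoids DTD loading, so the following is not XTF's code. It is one common generic technique, sketched in Java: an `EntityResolver` that hands the parser an empty document for every external entity, including the external DTD subset, so nothing is fetched. The trade-off, assumed here rather than tested against real EADs, is that anything the DTD supplies is lost: documents that use entities declared in the DTD (for example character entities like `&eacute;` from a charent file) will fail with an undefined-entity error, and DTD default attribute values will not be filled in. That is where a catalog redirect to a real local copy fits better. The class name and DTD URL are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SkipDtdDemo {
    // Returns an empty entity for every external reference (including the
    // external DTD subset), so the parser never opens a network connection.
    static InputSource emptyEntity(String publicId, String systemId) {
        return new InputSource(new StringReader(""));
    }

    // Parses xml with the DTD-suppressing resolver and returns the root
    // element's local name.
    static String rootName(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(SkipDtdDemo::emptyEntity);
        String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = localName;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical DTD URL; it is never fetched.
        String doc = "<!DOCTYPE ead SYSTEM \"http://example.org/dtd/ead.dtd\"><ead/>";
        System.out.println(rootName(doc)); // prints "ead"
    }
}
```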

For your EADs, maybe you could make your index prefilter slurp in the boilerplate stuff?

--Martin

Steven D. Majewski

Mar 14, 2019, 3:11:34 PM
to xtf-user@googlegroups.com List

That’s what’s already happening: XIncludes get expanded in the index prefilter.

(Back in 2013 there was a thread about different ways of doing that: prefilter stylesheet vs. parser options.
I was just searching my mail archive, and I may revisit one of those alternatives if I can fix it there.)

However, the source of the include files is always the same, even when I’m trying to run something different on a test site. Editing the XInclude files on the test site has no effect on how the test copy is indexed, since it always pulls them from the canonical production site.
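One way to make a test site pull its boilerplate locally without batch-editing the source files is to rewrite the XInclude `href`s at ingest time, just before expansion. In XTF that step would presumably belong in the prefilter stylesheet or the parser setup; the Java sketch below only illustrates the two-pass idea: parse without XInclude, rewrite hrefs that start with a production prefix, then parse again with `setXIncludeAware(true)`. The production prefix `https://prod.example.org/boilerplate/`, the file names, and the class and method names are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XIncludeRedirect {
    static final String XI_NS = "http://www.w3.org/2001/XInclude";

    // Pass 1: parse WITHOUT XInclude processing and rewrite every xi:include
    // href that starts with fromPrefix so that it starts with toPrefix.
    static String rewriteHrefs(String xml, String fromPrefix, String toPrefix) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList includes = doc.getElementsByTagNameNS(XI_NS, "include");
        for (int i = 0; i < includes.getLength(); i++) {
            Element include = (Element) includes.item(i);
            String href = include.getAttribute("href");
            if (href.startsWith(fromPrefix)) {
                include.setAttribute("href", toPrefix + href.substring(fromPrefix.length()));
            }
        }
        // Serialize the modified tree back to text with an identity transform.
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Pass 2: parse the rewritten text with XInclude processing enabled.
    // baseUri is used to resolve any relative hrefs.
    static Document expand(String xml, String baseUri) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setXIncludeAware(true);
        InputSource source = new InputSource(new StringReader(xml));
        source.setSystemId(baseUri);
        return factory.newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical production prefix and a local stand-in for the boilerplate.
        String prodPrefix = "https://prod.example.org/boilerplate/";
        Path local = Files.createTempDirectory("boilerplate");
        Files.writeString(local.resolve("contact.xml"), "<address>Test-site copy</address>");

        String ead = "<ead xmlns:xi=\"" + XI_NS + "\">"
            + "<xi:include href=\"" + prodPrefix + "contact.xml\"/></ead>";
        String rewritten = rewriteHrefs(ead, prodPrefix, local.toUri().toString());
        Document expanded = expand(rewritten, local.resolve("finding-aid.xml").toUri().toString());
        System.out.println(expanded.getDocumentElement().getTextContent());
    }
}
```

Because the rewrite happens only in the ingest pipeline, the stored EAD files keep their canonical production URLs and still work when taken out of context; only the prefix passed in differs between the test and production sites.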

But this question really came up while I was looking at ways to optimize indexing.

When I was testing my corpus in BaseX, I saw how much the XInclude expansion added to parsing and indexing for that package, so I figured it must be a similar load for XTF. (It’s not as simple to see the contribution of that one step to the total indexing time in XTF.)

I don’t often have to do a complete reindex on production, but I’m testing a lot of different changes, so indexing speed is a limiting factor. Currently, I have produced a subset of the data for my tests, but that subset is getting out of sync with the production data.

— Steve. 