Metadata from database


Gregory Marler

Feb 20, 2018, 11:43:15 AM
to XTF Users List
Hello,

For the requirements of a project, I'm trying to make sure I'm making the best/cleanest use of XTF before I end up hacking it out of recognition.

I have metadata in a database that supports the documents (in TEI XML).
(A) One table has a record for each document; this is a 1:1 mapping.
(B) Several tables hold supporting data that gets referenced. E.g. a document may include

    As was covered in <citation id="321">the book</citation> it is...

and 321 matches a record in the citations DB table.


The most XTF-related documentation I have found is a thread from 2008 (and its later replies).
This seems to be a good option, at least for (A). However, it lacks context.
It is talking about changing the preFilter.xsl, correct?
Where do I get the saxon-sql and MySQL connector jar files from, in order to place them in the lib directory? Is that the xtf/WEB-INF/lib folder?


I'm not sure that solution is best for (B): wouldn't it end up copying in the same database content every time a reference to it is repeated?
The intention is that those supporting data tables will be used to display extra info like footnotes, but will also become facets that users can search on.


Many thanks for any help you can provide, and I hope that, with your info on how things can be done, I'll be able to help others too.
Gregory.


Conal Tuohy

Feb 20, 2018, 10:22:09 PM
to xtf-...@googlegroups.com
G'day Gregory!

A bit slow to respond I'm sorry; you've probably moved on with this already. I don't have personal experience using Saxon's XSLT extension module, so I won't comment on the earlier discussion thread directly, but I did have a couple of other thoughts about using pipelines in XTF, which has been exercising my brain recently (especially the bit about the awkwardness of adding new steps to an XTF pipeline).

You should probably also consider moving the DB integration out of XTF into a separate batch process, e.g. a bash script (or an XProc script, which would be my personal preference) that either uses the Saxon XSLT extensions you've referred to, or else uses the command-line apps mysql or mysqldump to produce XML files and then an XSLT to integrate the data from MySQL into each TEI file. Effectively, it's a script that takes your TEI corpus as input and outputs an enhanced copy. I think there's some advantage in not having your code entangled with any particular version of XTF; it may also make debugging easier and avoid dependency issues, e.g. being forced to use XTF's version of Saxon.
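As a rough illustration of that "TEI in, TEI out" batch step, here is a Python sketch, swapped in for the bash/XProc plus XSLT combination described above. It uses the standard-library sqlite3 module as a stand-in for MySQL. The `citations(id, title)` table, the `<citation id="...">` element (assumed to be in the TEI namespace), and the choice to append a `<listBibl>` to `<sourceDesc>` are all assumptions for illustration. Because the ids are collected first, a citation referenced several times in one document yields a single bibliography entry, which bears on question (B).

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI)  # serialize TEI elements without a prefix


def enhance(tei_text: str, conn: sqlite3.Connection) -> str:
    """Return a copy of a TEI document with a <listBibl> built from the DB.

    Assumes a custom <citation id="..."> element in the TEI namespace and a
    hypothetical citations(id, title) table.
    """
    root = ET.fromstring(tei_text)
    # Unique cited ids in document order: repeated references collapse to one.
    ids = list(dict.fromkeys(
        c.get("id") for c in root.iter(f"{{{TEI}}}citation")
        if c.get("id") is not None
    ))
    source_desc = root.find(f".//{{{TEI}}}sourceDesc")
    if source_desc is None:
        raise ValueError("document has no sourceDesc to enhance")
    list_bibl = ET.SubElement(source_desc, f"{{{TEI}}}listBibl")
    for cid in ids:
        row = conn.execute(
            "SELECT title FROM citations WHERE id = ?", (cid,)
        ).fetchone()
        if row is None:
            continue  # dangling reference; a real script might log it
        bibl = ET.SubElement(list_bibl, f"{{{TEI}}}bibl", n=cid)
        bibl.text = row[0]
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citations (id TEXT PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO citations VALUES ('321', 'An Example Book')")
doc = (
    f'<TEI xmlns="{TEI}"><teiHeader><fileDesc><sourceDesc>'
    "<bibl>Print edition</bibl></sourceDesc></fileDesc></teiHeader>"
    '<text><body><p>See <citation id="321">the book</citation> and again '
    '<citation id="321">the book</citation>.</p></body></text></TEI>'
)
out = enhance(doc, conn)
print(out.count('<bibl n="321">'))  # 1
```

Running a function like this over every file in the corpus, and writing the results to a separate directory, would give the enhanced copy that XTF then indexes. The original TEI files are left untouched.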

My other suggestion would be to treat the data integration process as a part of the TEI processing pipeline; i.e. database integration should be a process that takes TEI as input and produces TEI as output (which then gets fed into the start of XTF's pipelines). From a logical perspective, it makes sense to think of the DB-enhancement of the TEI documents as something independent of your actual publication platform (i.e. some specific version of XTF), and instead to see the database as a relational staging area for editing TEI metadata. The data in your bibliographic database "supporting" your TEI corpus should have obvious and direct mappings onto TEI metadata elements, so it should come fairly naturally to write your DB integration code so that it simply merges the relational data into the TEI documents, encoded with the appropriate TEI elements. Then you can hand off the resulting TEI corpus to the remainder of XTF's pipelines, and (to some extent, at least) rely on existing (or upcoming) features of XTF's indexing and display stylesheets to handle that markup.

A complication with this approach is that XTF's pipelines have a fixed set of steps, so it's not totally straightforward to add a "database enhancement" step to the start. I believe the simplest way to do it is to use the "pipelining" features that are built into XSLT itself, where you transform the input and capture the output of the transformation (a result tree) in a variable, and then apply-templates to that variable to transform it a second time (and you can obviously extend that pattern to an arbitrary number of pipeline steps). If you do this kind of pipeline in XSLT it's helpful to use the same @mode for all the templates in a particular stage of the pipeline, so that you avoid any confusion between templates belonging to different stages in the pipeline.
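Outside XSLT, the same "chain of tree-to-tree stages" idea can be sketched in a few lines of Python. This is only an analogy, not how XTF or Saxon work internally. Each stage is a separate function, playing the role of a separate @mode, and each stage's output tree becomes the next stage's input, as the variable-capture trick does in XSLT. The two stages are made up for the demonstration.

```python
import copy
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable, Sequence

Stage = Callable[[ET.Element], ET.Element]


def run_pipeline(doc: ET.Element, stages: Sequence[Stage]) -> ET.Element:
    """Feed each stage's output tree into the next stage, in order."""
    return reduce(lambda tree, stage: stage(tree), stages, doc)


# Two hypothetical stages. Each works on its own copy of the tree, much as an
# XSLT stage builds a new result tree in a variable rather than editing its input.
def add_note(tree: ET.Element) -> ET.Element:
    out = copy.deepcopy(tree)
    ET.SubElement(out, "note").text = "added by stage 1"
    return out


def count_children(tree: ET.Element) -> ET.Element:
    out = copy.deepcopy(tree)
    out.set("children", str(len(out)))
    return out


result = run_pipeline(ET.fromstring("<doc><p/></doc>"), [add_note, count_children])
print(ET.tostring(result, encoding="unicode"))
# <doc children="2"><p /><note>added by stage 1</note></doc>
```

Stage order matters: running `count_children` before `add_note` would record `children="1"`, because the note does not exist yet.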

So if it were me, and doing it in XTF (using the Saxon XSLT extension), I would write the enhancement as a single XSLT stylesheet, "db-enhance.xsl", containing a bunch of templates all with @mode="enhance". I'd integrate it with the preFilter and dynaXML pipelines by finding the "local override" stylesheet that is the first stage of each pipeline, and editing it to add:

<xsl:import href="../../db-enhance.xsl"/>

In that "db-enhance.xsl" you'd have a template which matched TEI docs only if they had not already been enhanced, e.g.

<!-- NB this is the template that effectively adds a stage to one of XTF's pipelines -->
<!-- Catch a TEI document which isn't enhanced. Enhance it first, and resubmit the enhanced version for further processing -->
<xsl:template match="tei:TEI[not(contains(@type, 'database-enhanced'))]">
   <xsl:variable name="enhanced-tei-document">
      <xsl:copy><!-- copy the TEI element and its attributes -->
         <xsl:copy-of select="@*"/>
         <!-- set a flag so that the apply-templates below would not be matched again by this rule -->
         <xsl:attribute name="type">database-enhanced</xsl:attribute>
         <!-- copy the TEI, inserting the DB data where appropriate -->
         <xsl:apply-templates mode="enhance"/> 
      </xsl:copy>
   </xsl:variable>
   <!-- pass the enhanced document on to XTF's normal processing -->
   <xsl:apply-templates select="$enhanced-tei-document"/>
</xsl:template>

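<!-- identity template to copy all nodes and attributes that aren't being enhanced -->
<xsl:template match="@* | node()" mode="enhance">
   <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="enhance"/>
   </xsl:copy>
</xsl:template>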

<!-- insert bibliography from DB -->
<xsl:template match="tei:sourceDesc" mode="enhance">
   <xsl:copy><!-- copy existing sourceDesc -->
      <xsl:copy-of select="@* | *"/>
      <!-- insert bibliography from database -->
      <listBibl>
         <!-- make db calls to import relevant bibliographic citations -->
         ... 
      </listBibl>
   </xsl:copy>
</xsl:template>
 
etc.
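The re-entry guard in the first template is what stops the enhanced document from being enhanced again when it is resubmitted for XTF's normal processing. A minimal Python analogue shows why applying the step twice is harmless. For brevity it assumes an un-namespaced TEI root, and the `listBibl` it adds is a placeholder for the DB-driven step. It also differs slightly from the XSLT: it checks @type as whitespace-separated tokens and appends the flag rather than overwriting any existing @type value.

```python
import copy
import xml.etree.ElementTree as ET

FLAG = "database-enhanced"


def enhance_once(root: ET.Element) -> ET.Element:
    """Enhance a document unless its @type already carries the flag."""
    types = (root.get("type") or "").split()
    if FLAG in types:
        return root  # already enhanced: hand it on unchanged
    out = copy.deepcopy(root)
    # Append the flag instead of replacing any existing @type value.
    out.set("type", " ".join(types + [FLAG]))
    # Placeholder for the DB-driven step; a real version would query the database.
    ET.SubElement(out, "listBibl")
    return out


doc = ET.fromstring('<TEI type="letter"><teiHeader/></TEI>')
twice = enhance_once(enhance_once(doc))
print(twice.get("type"), len(twice.findall("listBibl")))
# letter database-enhanced 1
```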

I hope you find that helpful in some way!

Conal

--
You received this message because you are subscribed to the Google Groups "XTF Users List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to xtf-user+unsubscribe@googlegroups.com.
To post to this group, send email to xtf-...@googlegroups.com.
Visit this group at https://groups.google.com/group/xtf-user.
For more options, visit https://groups.google.com/d/optout.




Seth.Public

Sep 11, 2018, 5:07:26 PM
to xtf-...@googlegroups.com
Sorry I didn't see this sooner; I had not been following XTF.

Yes, it goes into the preFilter stylesheet, whatever yours is named. I have renamed my files, but you basically need an xsl:choose or something similar to decide whether to use the SQL, the METS, the head, etc. As an example, right now I use:

<xsl:choose>
   <xsl:when test="$docmeta//text()">
      <xsl:apply-templates select="$docmeta" mode="sqlmeta"/>
   </xsl:when>
   <xsl:otherwise>
      <xsl:apply-templates select="document($docpath)" mode="integral"/>
   </xsl:otherwise>
</xsl:choose>
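The decision Seth describes can be sketched in Python: use the SQL result when it contains any text at all, and otherwise fall back to the document's own metadata. This is not XTF code. The function name, the `fake_parse` helper, and `doc.xml` are hypothetical, and only the "any text" test mirrors `$docmeta//text()`.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional, Tuple


def select_metadata(
    docmeta: Optional[ET.Element],
    docpath: str,
    parse: Callable[[str], ET.ElementTree] = ET.parse,
) -> Tuple[str, ET.Element]:
    """Return the mode that would be used ("sqlmeta" or "integral") and its tree."""
    # Same idea as test="$docmeta//text()": is there any text in the SQL result?
    if docmeta is not None and any(docmeta.itertext()):
        return "sqlmeta", docmeta
    # Otherwise fall back to the document itself, like document($docpath).
    return "integral", parse(docpath).getroot()


def fake_parse(path: str) -> ET.ElementTree:
    """Stand-in for ET.parse so the demo needs no file on disk."""
    return ET.ElementTree(ET.fromstring("<TEI><teiHeader/></TEI>"))


sql_row = ET.fromstring("<meta><title>From MySQL</title></meta>")
empty_row = ET.fromstring("<meta><title/></meta>")
print(select_metadata(sql_row, "doc.xml", fake_parse)[0])    # sqlmeta
print(select_metadata(empty_row, "doc.xml", fake_parse)[0])  # integral
```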


The preFilter is mainly for metadata, so if you want to use SQL to integrate notes, just make a richer XML file and then add a noindex tag to your notes. Or just pull in notes on demand using JavaScript. But if you study the Saxon SQL docs, it might be able to do what you want. It's not that complex, so YMMV. At any rate, yes: if you use a citation more than once, it will be there more than once.

I am trying to attach the required files. They were a real PITA to find at the time, and I have no idea whether you can even find them now. Delete the .foo extension and drop them into WEB-INF/lib; it should work as stated in those posts.

Seth



lib.zip

Steven D Majewski

Sep 11, 2018, 6:14:51 PM
to xtf-...@googlegroups.com

I think you have to go back to the Saxon-B 9.1.0.8j version of the software from: http://saxon.sourceforge.net

http://www.saxonica.com/download/information.xml#earlier only goes back to 9.2, which is not compatible with XTF. 

( 9.2 and later changed both the plumbing and licensing for external functions. ) 


I'm guessing you'll probably also want an older (5.x) version of the MySQL connector from: https://dev.mysql.com/downloads/connector/j/



Regarding the original question:

If the database is relatively static, you might also consider dumping the tables into an XML format and using the document() function and XPath to select the database row. (The downside is that it would get messy and inefficient if you have to do too many join equivalents in XPath. However, maybe you could define a view that does the joins in MySQL and then dump the view as XML.)
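Steve's "dump a view, then look rows up by key" idea can be sketched in Python as follows. The sketch uses sqlite3 as a stand-in for MySQL, and the table, view, and column names are hypothetical. The dump only loosely imitates the `<resultset>`/`<row>`/`<field name="...">` layout of `mysql --xml` output. Inside XTF, the lookup itself would be an XPath expression over document('...') rather than Python.

```python
import sqlite3
import xml.etree.ElementTree as ET


def dump_view_as_xml(conn: sqlite3.Connection, view: str) -> str:
    """Serialize every row of a view as <resultset><row><field name=...>."""
    cur = conn.execute(f"SELECT * FROM {view}")  # view name is trusted, not user input
    columns = [d[0] for d in cur.description]
    resultset = ET.Element("resultset")
    for values in cur:
        row = ET.SubElement(resultset, "row")
        for name, value in zip(columns, values):
            field = ET.SubElement(row, "field", name=name)
            field.text = None if value is None else str(value)
    return ET.tostring(resultset, encoding="unicode")


def lookup(xml_text: str, key_field: str, key: str):
    """Roughly //row[field[@name=$key_field] = $key]: first matching row as a dict."""
    for row in ET.fromstring(xml_text).iter("row"):
        fields = {f.get("name"): f.text for f in row.iter("field")}
        if fields.get(key_field) == key:
            return fields
    return None


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE citations (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
INSERT INTO authors VALUES (7, 'A. Writer');
INSERT INTO citations VALUES (321, 'An Example Book', 7);
-- the view does the join in the database, so the XML needs no joins in XPath
CREATE VIEW citation_view AS
  SELECT c.id AS id, c.title AS title, a.name AS author
  FROM citations c JOIN authors a ON a.id = c.author_id;
""")
xml_dump = dump_view_as_xml(conn, "citation_view")
print(lookup(xml_dump, "id", "321"))
# {'id': '321', 'title': 'An Example Book', 'author': 'A. Writer'}
```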


— Steve Majewski





On Thu, Feb 8, 2018 at 2:15 AM, 'Gregory Marler' via XTF Users List <xtf-...@googlegroups.com> wrote:

Seth.Public

Sep 12, 2018, 12:54:43 PM
to xtf-...@googlegroups.com
Sorry, it's been a while since I have looked into my xtf install. On the bright side, I noticed this since I am giving it a revamp! As far as I remember that saxon sql jar I sent out was built from source by a third party and works with Saxon 8.9. 9.x might not work, and also license compatibility might be an issue. I think Saxonica scrubbed the 8.x source from the internet specifically to change the license for 9.x, hence my hesitation on functionality etc. These are vague memories from ten years ago (!), so YMMV. The mysql connector was a 5.x connector, and again afaik that version was required as well.

Seth


Steven D Majewski

Sep 12, 2018, 1:29:42 PM
to xtf-user@googlegroups.com List
On Sep 12, 2018, at 12:54 PM, Seth.Public <seth....@gmail.com> wrote:

> Sorry, it's been a while since I have looked into my xtf install. On the bright side, I noticed this since I am giving it a revamp!

Same here!

> As far as I remember that saxon sql jar I sent out was built from source by a third party and works with Saxon 8.9. 9.x might not work, and also license compatibility might be an issue. I think Saxonica scrubbed the 8.x source from the internet specifically to change the license for 9.x, hence my hesitation on functionality etc. These are vague memories from ten years ago (!), so YMMV. The mysql connector was a 5.x connector, and again afaik that version was required as well.



9.2 was where both code organization and license changed to three tiers. 
9.1, including source code, is still available on sourceforge and XTF is using a modified version of that. 
Those modifications are why there is no separate saxon jar in the XTF distribution; they are in xtf.jar.
And in fact, doing an 'unzip -l xtf.jar | grep saxon', I see that there are a number of org/cdlib/xtf/saxonExt/sql/ classes in that jar, so I wonder whether saxon9-sql.jar is actually necessary. However, the classes in saxon9-sql.jar are under net/sf/saxon/sql/, so the package names are different. Maybe that was really the issue.
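The `unzip -l | grep` check can also be scripted, for example to compare which SQL extension packages each jar in WEB-INF/lib provides. The sketch below uses Python's zipfile module, which works here because jar files are zip archives. The demo builds a tiny in-memory jar, and "Example.class" is a made-up entry name. Against a real install, you would pass a path such as the xtf.jar in your own WEB-INF/lib.

```python
import io
import zipfile
from typing import BinaryIO, Dict, Iterable, Union


def count_classes(jar: Union[str, BinaryIO], prefixes: Iterable[str]) -> Dict[str, int]:
    """Count .class entries under each package prefix in a jar (a zip file)."""
    with zipfile.ZipFile(jar) as zf:
        names = zf.namelist()
    return {
        p: sum(1 for n in names if n.startswith(p) and n.endswith(".class"))
        for p in prefixes
    }


# Demo on an in-memory jar; the entry name is hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("org/cdlib/xtf/saxonExt/sql/Example.class", b"")
counts = count_classes(buf, ["org/cdlib/xtf/saxonExt/sql/", "net/sf/saxon/sql/"])
print(counts)
# {'org/cdlib/xtf/saxonExt/sql/': 1, 'net/sf/saxon/sql/': 0}
```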

— Steve Majewski

