Getting an attribute from an xml file indexed by solr


Shane McCarthy

Jun 1, 2016, 11:17:02 AM
to isla...@googlegroups.com
I am working on a project where the Solr index is queried and the results are used to perform some chemical analysis. I began work on this project after DGI had set up the repository, and until now I have been able to modify the repository to keep up with the changes in the project.

I cannot figure out how to get an attribute from the XML files generated with the slurp_all_chemicalML_to_solr.xslt stylesheet. The stylesheet from the DGI setup did not do this correctly, and the reading I have done about XSLT makes me think it will take a lot more reading before I get it, so I thought I would ask here first in case there is something I am overlooking.

Thank you,
Shane


bro...@barnard.edu

Jun 1, 2016, 11:27:39 AM
to islandora
If you have administrative access to /solr/admin (<tomcat host>:<port, typically 8080>/solr/admin/schema.jsp), you can look at the fields Solr is indexing and how it handles attributes. You may find your field already there :)

Shane McCarthy

Jun 1, 2016, 12:08:12 PM
to isla...@googlegroups.com
I have looked at solr/admin and queried it to see all the fields that are there. Unfortunately, when I try to index an attribute from the XML, it does not show up. There are many fields whose origin I am not sure about, so I could look through all the stylesheets to see whether any of them could serve as examples.

Thanks.

--
For more information about using this group, please read our Listserv Guidelines: http://islandora.ca/content/welcome-islandora-listserv
---
You received this message because you are subscribed to the Google Groups "islandora" group.
To unsubscribe from this group and stop receiving emails from it, send an email to islandora+...@googlegroups.com.
Visit this group at https://groups.google.com/group/islandora.
To view this discussion on the web visit https://groups.google.com/d/msgid/islandora/a0916101-1927-4ad3-95b9-871b09b1b506%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Benjamin Rosner

Jun 1, 2016, 12:10:17 PM
to isla...@googlegroups.com
They're coming from the primary Solr schema.xml, would be my guess. 




--
Benjamin Rosner

Instructional Applications Developer
Library and Academic Information Services
Sulzberger Hall Annex, Barnard College
p: 212-854-9005

Shane McCarthy

Jun 1, 2016, 2:47:53 PM
to isla...@googlegroups.com
Yes, but XSLT files generate those fields as XML that schema.xml then handles. Those XSLT files may contain an example that helps with my problem.

Jared Whiklo

Jun 3, 2016, 9:17:11 AM
to isla...@googlegroups.com
Hey Shane,

You want to have an attribute from some XML datastream indexed as a new
field in Solr?

I don't have any slurp_all_chemicalML_to_solr.xslt, so I can't speak
specifically to it.

But in general, if you can get a template to match the XML element,
then you can create a new field using the standard pattern:

<xsl:element name="field">
  <xsl:attribute name="name">
    <xsl:text>shanes_attribute_ms</xsl:text>
  </xsl:attribute>
  <xsl:value-of select="/xpath/to/element/@attribute" />
</xsl:element>
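A minimal runnable sketch of that pattern, assuming a hypothetical CML-like input in the CML namespace (http://www.xml-cml.org/schema), since the real datastream is not shown here. The class name, the `cml` prefix, the element path `/cml:cml/cml:matrix` and the `<doc>` wrapper are illustrative assumptions, and the stylesheet is run with the JDK's built-in `javax.xml.transform` API rather than inside GSearch. Note the slash before `@`: an attribute on an element is selected with `/path/to/element/@attribute`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AttributeFieldDemo {
    // Hypothetical CML-like input; the real datastream is not shown in the thread.
    static final String SAMPLE_XML =
        "<cml xmlns='http://www.xml-cml.org/schema'>"
        + "<matrix rows='8' columns='8' dictRef='cc:mulliken'>10.519302 0.185676</matrix>"
        + "</cml>";

    // The field template from the message above, using an absolute XPath that
    // selects the columns attribute (note the slash before '@').
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:cml='http://www.xml-cml.org/schema'"
        + " exclude-result-prefixes='cml'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<doc>"
        + "<xsl:element name='field'>"
        + "<xsl:attribute name='name'><xsl:text>shanes_attribute_ms</xsl:text></xsl:attribute>"
        + "<xsl:value-of select='/cml:cml/cml:matrix/@columns'/>"
        + "</xsl:element>"
        + "</doc>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet (given as a string) to an XML string and returns the result.
    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSLT, SAMPLE_XML));
    }
}
```

Running `main` should print a single `field` named `shanes_attribute_ms` whose text is the `columns` value of the matrix.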

Cheers,
jared

--
Jared Whiklo
jwh...@gmail.com
--------------------------------------------------
You know you're from Winnipeg when...You use a down comforter in the summer.


Shane McCarthy

Jun 3, 2016, 1:35:38 PM
to isla...@googlegroups.com
Yes Jared, that is exactly what I would like to do. I had hoped to do this by modifying only the existing stylesheet, but that is proving more difficult than I imagined. I can see how your suggestion should work, and I can get it to work on an example I created, but I cannot see how to put it into the stylesheet that sends the data to Solr.

I have more Islandora to learn. I cannot seem to use Xalan to generate the XML that the stylesheet would send to Solr, so it seems I have been fumbling around in the dark.
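One way to see exactly what a stylesheet produces, outside GSearch, is to run it from a few lines of Java. This is a minimal sketch: `RunStylesheet` is a hypothetical helper (not part of Islandora or GSearch), and it assumes sample.xml and sample.xslt are local files. `TransformerFactory.newInstance()` should pick up Xalan's factory when the Xalan jar is on the classpath and otherwise falls back to the JDK's built-in processor, which may handle extension functions differently from the Xalan build GSearch uses.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RunStylesheet {
    // Applies the stylesheet file to the XML file and returns the serialized
    // output, so the generated <field> elements can be inspected directly.
    static String run(File xml, File xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(xslt));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(xml), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // getMessageAndLocation() adds the stylesheet line/column when known.
            throw new IllegalStateException(e.getMessageAndLocation(), e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java RunStylesheet <input.xml> <stylesheet.xslt>");
            System.exit(2);
        }
        // Reports which XSLT processor is in use (expected to be Xalan's if its jar is on the classpath).
        System.err.println("Processor: " + TransformerFactory.newInstance().getClass().getName());
        System.out.println(run(new File(args[0]), new File(args[1])));
    }
}
```

For example, `java RunStylesheet sample.xml sample.xslt > solr-fields.xml`. If the stylesheet declares top-level `xsl:param`s that GSearch normally supplies, they can be set with `Transformer.setParameter(name, value)` before calling `transform`.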

Thank you,
Shane 


Jared Whiklo

Jun 5, 2016, 11:34:34 AM
to isla...@googlegroups.com

If you can provide a sample XML file and your current stylesheet, I can try to help more.

Cheers,
Jared

Shane McCarthy

Jun 5, 2016, 2:27:15 PM
to isla...@googlegroups.com
Thank you Jared.  

I have attached a stylesheet that gets the value of the element into Solr. I have been unable to get an attribute's value for that element. This stylesheet is what I have been working from, adding fields with lines like lines 25-29. Unfortunately, I have not managed to get an attribute's value.

For example, I tried simply changing the XPath of the match on line 25 to include /@columns, but then nothing gets to Solr.
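One plausible reason nothing reaches Solr (the attached stylesheet is not shown, so this is an assumption): XSLT's built-in template rules recurse into child elements and text nodes but never into attribute nodes, so a template whose `match` ends in `/@columns` only fires if some other template explicitly applies templates to that attribute. The sketch below contrasts the two cases on a hypothetical CML-like input; the field name and element path are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AttributeMatchDemo {
    // Hypothetical CML-like input.
    static final String SAMPLE_XML =
        "<cml xmlns='http://www.xml-cml.org/schema'>"
        + "<matrix rows='8' columns='8'>10.519302 0.185676</matrix></cml>";

    static final String HEADER =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:cml='http://www.xml-cml.org/schema'"
        + " exclude-result-prefixes='cml'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>";

    // A template that matches the attribute node itself.
    static final String ATTRIBUTE_TEMPLATE =
        "<xsl:template match='cml:matrix/@columns'>"
        + "<field name='attrib_columns_ms'><xsl:value-of select='.'/></field>"
        + "</xsl:template>";

    // Case 1: only the attribute template. The built-in rules copy the matrix
    // text out but never visit attributes, so no field is produced.
    static final String MATCH_ONLY = HEADER + ATTRIBUTE_TEMPLATE + "</xsl:stylesheet>";

    // Case 2: the matrix template applies templates to its columns attribute,
    // which is what makes the attribute template fire.
    static final String WITH_APPLY = HEADER
        + "<xsl:template match='cml:matrix'><xsl:apply-templates select='@columns'/></xsl:template>"
        + ATTRIBUTE_TEMPLATE
        + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet (given as a string) to an XML string and returns the result.
    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("match only: " + transform(MATCH_ONLY, SAMPLE_XML));
        System.out.println("with apply: " + transform(WITH_APPLY, SAMPLE_XML));
    }
}
```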

Any pointers would be greatly appreciated.

Cheers,
Shane



sample.xml
sample.xslt

Shane McCarthy

Jun 6, 2016, 12:42:14 PM
to isla...@googlegroups.com
I have managed to get the values I needed into Solr. I simply had to use select="@columns" instead of select=".".
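Inside a template that matches an element, `.` is that element, so `select="."` yields its text content, while `select="@columns"` yields the `columns` attribute of the same element. A minimal sketch of both side by side, assuming a hypothetical CML-like input; the field names are borrowed from the output later in this thread, and the element path is an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ContextSelectDemo {
    // Hypothetical CML-like input.
    static final String SAMPLE_XML =
        "<cml xmlns='http://www.xml-cml.org/schema'>"
        + "<matrix rows='8' columns='8'>10.519302 0.185676</matrix></cml>";

    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:cml='http://www.xml-cml.org/schema'"
        + " exclude-result-prefixes='cml'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<doc><xsl:apply-templates select='//cml:matrix'/></doc>"
        + "</xsl:template>"
        + "<xsl:template match='cml:matrix'>"
        // '.' is the matched matrix element, so this yields its text content.
        + "<field name='cml_atomic_overlap_population_ms'><xsl:value-of select='.'/></field>"
        // '@columns' is the columns attribute of that same element.
        + "<field name='attrib_columns_ms'><xsl:value-of select='@columns'/></field>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet (given as a string) to an XML string and returns the result.
    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSLT, SAMPLE_XML));
    }
}
```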

Thanks for your suggestions.

Cheers,

Shane

Jared Whiklo

Jun 6, 2016, 1:05:17 PM
to isla...@googlegroups.com
Hey Shane,

I'm not sure which attribute you are trying to get, but I was able to
move from this output:

<?xml version="1.0" encoding="UTF-8"?>
<field xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:java="http://xml.apache.org/xalan/java"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
name="cml_atomic_overlap_population_ms">10.519302 0.185676 0.310778
0.310776 -0.008815 -0.008815 0.310776 -0.008814 0.185676 5.962179
-0.008927 -0.008933 0.378882 0.378877 -0.008931 0.378875 0.310778
-0.008927 9.249839 -0.002375 0.001927 -7.0E-5 -0.002375 -7.0E-5 0.310776
-0.008933 -0.002375 9.249849 -7.0E-5 -7.0E-5 -0.002375 0.001929
-0.008815 0.378882 0.001927 -7.0E-5 0.47601 -0.009721 -7.0E-5 -0.009721
-0.008815 0.378877 -7.0E-5 -7.0E-5 -0.009721 0.476008 0.001928 -0.00972
0.310776 -0.008931 -0.002375 -0.002375 -7.0E-5 0.001928 9.249846 -7.0E-5
-0.008814 0.378875 -7.0E-5 0.001929 -0.009721 -0.00972 -7.0E-5
0.476007</field>

To this (I'm guessing overkill-induced) output:

<?xml version="1.0" encoding="UTF-8"?>
<field xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:java="http://xml.apache.org/xalan/java"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
name="cml_atomic_overlap_population_ms">10.519302 0.185676 0.310778
0.310776 -0.008815 -0.008815 0.310776 -0.008814 0.185676 5.962179
-0.008927 -0.008933 0.378882 0.378877 -0.008931 0.378875 0.310778
-0.008927 9.249839 -0.002375 0.001927 -7.0E-5 -0.002375 -7.0E-5 0.310776
-0.008933 -0.002375 9.249849 -7.0E-5 -7.0E-5 -0.002375 0.001929
-0.008815 0.378882 0.001927 -7.0E-5 0.47601 -0.009721 -7.0E-5 -0.009721
-0.008815 0.378877 -7.0E-5 -7.0E-5 -0.009721 0.476008 0.001928 -0.00972
0.310776 -0.008931 -0.002375 -0.002375 -7.0E-5 0.001928 9.249846 -7.0E-5
-0.008814 0.378875 -7.0E-5 0.001929 -0.009721 -0.00972 -7.0E-5
0.476007</field>
<field xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:java="http://xml.apache.org/xalan/java"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
name="attrib_rows_ms">8</field>
<field xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:java="http://xml.apache.org/xalan/java"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
name="attrib_columns_ms">8</field>
<field xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:java="http://xml.apache.org/xalan/java"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
name="attrib_datatype_ms">xsd:double</field>
<field xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:java="http://xml.apache.org/xalan/java"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
name="attrib_dictref_ms">cc:mulliken</field>
<field xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:java="http://xml.apache.org/xalan/java"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
name="attrib_matrixtype_ms">squareSymmetric</field>
<field xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:java="http://xml.apache.org/xalan/java"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
name="attrib_units_ms">nonSi:elementaryCharge</field>
<field xmlns:mods="http://www.loc.gov/mods/v3"
xmlns:java="http://xml.apache.org/xalan/java"
xmlns:foxml="info:fedora/fedora-system:def/foxml#"
name="attrib_templateref_ms">l601.condensed</field>

To do this I made a couple of assumptions, but hopefully you can adjust
as necessary.

I have attached my modified copy of sample.xslt.

I should mention that I tested this directly with the Xalan jar and NOT
through GSearch, but in the past that is how I have adjusted my stylesheets.
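Jared's modified stylesheet is attached rather than shown, so the following is a hypothetical reconstruction of the output shape above, not his code. It emits the element text plus one `attrib_<name>_ms` field per attribute, iterating with `@*` and lowercasing the name with `translate()`, since XSLT 1.0 has no `lower-case()` function. The input is a hypothetical CML-like matrix whose attribute values are copied from the output above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AllAttributesDemo {
    // Hypothetical matrix; attribute values taken from the output in the thread.
    static final String SAMPLE_XML =
        "<cml xmlns='http://www.xml-cml.org/schema'>"
        + "<matrix rows='8' columns='8' dataType='xsd:double' dictRef='cc:mulliken'"
        + " matrixType='squareSymmetric' units='nonSi:elementaryCharge'>"
        + "10.519302 0.185676</matrix></cml>";

    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:cml='http://www.xml-cml.org/schema'"
        + " exclude-result-prefixes='cml'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<doc><xsl:apply-templates select='//cml:matrix'/></doc>"
        + "</xsl:template>"
        + "<xsl:template match='cml:matrix'>"
        + "<field name='cml_atomic_overlap_population_ms'><xsl:value-of select='.'/></field>"
        // One field per attribute: 'attrib_' + lowercased local name + '_ms'.
        + "<xsl:for-each select='@*'>"
        + "<field name=\"attrib_{translate(local-name(),"
        + " 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')}_ms\">"
        + "<xsl:value-of select='.'/></field>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet (given as a string) to an XML string and returns the result.
    static String transform(String xslt, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSLT, SAMPLE_XML));
    }
}
```

A design caveat: `local-name()` drops any namespace prefix, so two attributes with the same local name in different namespaces would map to the same field name. In this sketch, `exclude-result-prefixes` keeps the `cml` namespace declaration off the literal `field` elements.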

Cheers,
jared

sample.xslt

Shane McCarthy

Jun 6, 2016, 2:46:09 PM
to isla...@googlegroups.com
Thanks again Jared. I am sure I will learn plenty by looking at the stylesheet you created.

Cheers,
Shane
