Solr fields - if indexing dynamicField, is copyField really needed?


bgil...@pitt.edu

Feb 19, 2018, 12:22:05 PM
to islandora
I am hoping to reduce the number of fields that we index in Solr and am wondering if the easiest thing to do might be to remove the <copyField> section from schema.xml.

Specifically, it seems likely that Solr is doing extra work, for example with the "*_ms" fields below. What is the functional difference between these two commands?
   <dynamicField name="*_ms"  type="string"  indexed="true"  stored="true" multiValued="true"/>
   <copyField source="*_s" dest="*_ms"/>

From what I am reading, these copyField commands really just duplicate fields that are already indexed, under a different suffix. The one copyField I believe we'd need to keep is the amalgamation copyField for "catch_all_fields_mt".

<copyField source="*_s" dest="*_mlt"/>
<copyField source="*_s" dest="*_t"/>
<copyField source="*_s" dest="*_ms"/>
<copyField source="*_s" dest="*_mt"/>
<copyField source="*_s" dest="*_ss"/>
<copyField source="*_ms" dest="*_mt"/>
<copyField source="*_dt" dest="*_mt"/>
<copyField source="*_dt" dest="*_mdt"/>
<copyField source="*_mdt" dest="*_mt"/>
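To get a feel for how much extra indexing these rules cause, here is a small Python sketch (a model, not Solr code) that applies the copyField list above to a hypothetical document. It assumes Solr's documented behaviour that copies don't chain: a copyField destination is never used as the source of another copyField, so `*_ms` → `*_mt` only fires for values that were sent directly in an `_ms` field. The document contents are made up, and only patterns with a single leading `*` are handled, since that's all these rules use.

```python
# Simplified model of the copyField rules quoted above, as (source, dest) pairs.
# Only patterns with a single leading "*" are handled.
COPY_FIELDS = [
    ("*_s", "*_mlt"),
    ("*_s", "*_t"),
    ("*_s", "*_ms"),
    ("*_s", "*_mt"),
    ("*_s", "*_ss"),
    ("*_ms", "*_mt"),
    ("*_dt", "*_mt"),
    ("*_dt", "*_mdt"),
    ("*_mdt", "*_mt"),
]


def apply_copy_fields(doc: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a new document with every copyField destination filled in.

    Rules are matched only against the fields that were sent in, never
    against fields produced by another copy (copies do not chain).
    """
    result = {name: list(values) for name, values in doc.items()}
    for name, values in doc.items():  # iterate the ORIGINAL fields only
        for source, dest in COPY_FIELDS:
            suffix = source[1:]  # "*_s" -> "_s"
            if name.endswith(suffix):
                target = name[: -len(suffix)] + dest[1:]  # title_s -> title_ms
                result.setdefault(target, []).extend(values)
    return result


if __name__ == "__main__":
    # Hypothetical document: one string field and one date field.
    sent = {"title_s": ["Annual report"], "created_dt": ["2018-02-19T00:00:00Z"]}
    indexed = apply_copy_fields(sent)
    for name in sorted(indexed):
        print(name, indexed[name])
    print(f"{len(sent)} fields sent, {len(indexed)} fields indexed")
```

With that input, two fields sent become nine fields in the index; `title_s` alone fans out into five copies.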


Could we safely drop the above section of copyField commands from our schema.xml?

Any help / understanding is greatly appreciated.

Thank you, 

Brian Gillingham

University of Pittsburgh | University Library System



Jared Whiklo

Feb 20, 2018, 9:32:47 AM
to isla...@googlegroups.com
Hey Brian,

It really is a question of what fields you use, what fields you search
and how your indexing XSLTs look.

Normally the default has copyFields from *_s (string) fields to *_t
(text) fields, because the text fields are tokenized and are therefore
better to search, but if you try to display them to the user (for
example as facet values) they are all cut up (or tokenized), so for
that you use the string fields.
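A rough Python sketch of why the string-to-text copy helps search. Real Solr text types run the analyzer chain (a tokenizer plus filters) defined for that fieldType in schema.xml; the regex below is only an assumed, simplified stand-in for such a chain, and the title is made up.

```python
import re


def string_terms(value: str) -> set[str]:
    """A "string" field indexes the whole value as a single exact term."""
    return {value}


def text_terms(value: str) -> set[str]:
    """Crude stand-in for a text analyzer: lowercase, then split into words."""
    return set(re.findall(r"\w+", value.lower()))


title = "The History of Pittsburgh"
print("pittsburgh" in string_terms(title))  # False: no exact match on one word
print("pittsburgh" in text_terms(title))    # True: each word is its own term
print(sorted(text_terms(title)))            # ['history', 'of', 'pittsburgh', 'the']
```

The string version is what you want for display and faceting (one clean value); the text version is what makes word-level search work.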

However things like copying the *_s (single string) to a *_ms
(multi-valued string) really doesn't make a whole lot of sense to me.

As for your question:
> What is the functional difference between these two commands?
>    <dynamicField name="*_ms" type="string" indexed="true" stored="true" multiValued="true"/>
>    <copyField source="*_s" dest="*_ms"/>

The first defines a field pattern: any field Solr is sent whose name
ends with _ms will be assigned the string fieldType (you can find the
definition of that type in schema.xml too; search for name="string").

Indexed means you can search on this field, stored means you can get
this field back in a result and multiValued means you can define more
than one value for the field.

The second (copyField) says every time we get a field that ends with _s
(e.g. title_s), also make another field ending with _ms (e.g. title_ms).
As I said above, this particular situation doesn't really make sense to me.

So you can certainly drop some of those copyFields especially if you
never access the fields.

The other way to save index space is to not "store" fields you aren't
going to display to a user. I don't "store" any *_t fields, so I have an
OCR_t field which I can search (using OCR_t:"the text to search"), but
that field is not returned as part of the record. This saves some space.
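To make the indexed/stored distinction concrete, here is a toy in-memory index in Python. It is not how Solr or Lucene actually store data; the schema, field values, and the `ToyIndex` class are all made up for illustration. It only shows that a field marked indexed but not stored can still match a query while being left out of the returned record, as in the OCR_t example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    indexed: bool    # can the field be searched?
    stored: bool     # is the original value returned with results?
    tokenized: bool  # text-style (split into words) vs exact string


# Hypothetical schema: OCR_t is searchable but not stored.
SCHEMA = {
    "title_s": FieldDef(indexed=True, stored=True, tokenized=False),
    "OCR_t": FieldDef(indexed=True, stored=False, tokenized=True),
}


def terms(value: str, fdef: FieldDef) -> set[str]:
    """Terms written to the index; tokenized fields are lowercased words."""
    if fdef.tokenized:
        return set(re.findall(r"\w+", value.lower()))
    return {value}


class ToyIndex:
    """A tiny in-memory index; illustrates the idea, not Solr's internals."""

    def __init__(self, schema: dict[str, FieldDef]):
        self.schema = schema
        self.postings: dict[tuple[str, str], set[str]] = {}  # (field, term) -> doc ids
        self.stored: dict[str, dict[str, str]] = {}          # doc id -> stored fields

    def add(self, doc_id: str, doc: dict[str, str]) -> None:
        kept = {}
        for field, value in doc.items():
            fdef = self.schema[field]
            if fdef.indexed:
                for term in terms(value, fdef):
                    self.postings.setdefault((field, term), set()).add(doc_id)
            if fdef.stored:
                kept[field] = value
        self.stored[doc_id] = kept

    def search(self, field: str, term: str) -> list[dict[str, str]]:
        """Stored fields of every document matching field:term (term used as-is)."""
        ids = self.postings.get((field, term), set())
        return [self.stored[doc_id] for doc_id in sorted(ids)]


index = ToyIndex(SCHEMA)
index.add("page-1", {"title_s": "Page 1", "OCR_t": "The quick brown fox"})
print(index.search("OCR_t", "fox"))  # [{'title_s': 'Page 1'}]
```

In schema.xml the equivalent is setting stored="false" on the relevant dynamicField while leaving indexed="true".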

cheers,
jared

--
Jared Whiklo
jwh...@gmail.com
--------------------------------------------------
At least I have a positive attitude about my destructive habits.


Rosie Le Faive

Feb 21, 2018, 5:43:39 PM
to islandora
> However things like copying the *_s (single string) to a *_ms
> (multi-valued string) really doesn't make a whole lot of sense to me.

I know why we do this!

The slurp_all_mods.xslt that a lot of us use (thanks DGI!) takes the first instance of any given field and puts it in a single-valued field, and puts the rest in the same-named multi-valued field. For example, if you have three authors, alice, bob, and carly, then the document that gsearch sends to Solr contains:

MODS_complicatedfieldname_s: alice
MODS_complicatedfieldname_ms: bob
MODS_complicatedfieldname_ms: carly

It expects that within solr, all _s fields get copied into their _ms fields, resulting in your solr containing:

MODS_complicatedfieldname_s: alice
MODS_complicatedfieldname_ms: alice
MODS_complicatedfieldname_ms: bob
MODS_complicatedfieldname_ms: carly

Now you can configure Islandora to display the MODS_complicatedfieldname_ms field and it'll show all your authors, not just bob and carly. But you can also... sort! on the first instance of any string field, which might be arbitrary, but sorting is a feature that clients like (and they're often OK with it being "sort on the first one").
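Here's a small Python sketch of the round trip described above. `split_like_slurp` is a made-up name that imitates what the XSLT is described as doing (it is not the XSLT itself), and `copy_s_to_ms` plays the role of the `*_s` → `*_ms` copyField. The sketch visits fields in document order, which reproduces the value ordering in the listing above; that ordering is a property of the sketch, not a documented Solr guarantee.

```python
def split_like_slurp(base: str, values: list[str]) -> dict[str, list[str]]:
    """Hypothetical mimic of the described XSLT output:
    first value -> <base>_s, remaining values -> <base>_ms."""
    doc: dict[str, list[str]] = {}
    if values:
        doc[base + "_s"] = [values[0]]
    if len(values) > 1:
        doc[base + "_ms"] = list(values[1:])
    return doc


def copy_s_to_ms(doc: dict[str, list[str]]) -> dict[str, list[str]]:
    """Apply <copyField source="*_s" dest="*_ms"/>, visiting fields in
    document order so the copied first value lands ahead of the rest."""
    out: dict[str, list[str]] = {}
    for name, values in doc.items():
        out.setdefault(name, []).extend(values)
        if name.endswith("_s"):
            out.setdefault(name[: -len("_s")] + "_ms", []).extend(values)
    return out


field = "MODS_complicatedfieldname"
indexed = copy_s_to_ms(split_like_slurp(field, ["alice", "bob", "carly"]))
print(indexed[field + "_ms"])  # ['alice', 'bob', 'carly']

# Sorting only needs the single-valued field: each record's first author.
records = [
    copy_s_to_ms(split_like_slurp(field, ["zoe", "amy"])),
    copy_s_to_ms(split_like_slurp(field, ["alice", "bob", "carly"])),
]
records.sort(key=lambda r: r[field + "_s"][0])
print([r[field + "_s"][0] for r in records])  # ['alice', 'zoe']
```

Note the "zoe, amy" record sorts under zoe even though amy would come first alphabetically; that's the "sort on the first one" trade-off.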
