Hi grantaka36,
I believe ignoring punctuation is the expected behavior. The primary purpose of the abstract search is to provide full-text query capabilities. Are you trying to look for some other kind of data in the abstracts? If you want to do more sophisticated analysis of the content, you might be interested in https://github.com/PLOS/allofplos.
The abstract is run through the following analyzer configuration in Solr:
<fieldType name="text" class="solr.TextField" autoGeneratePhraseQueries="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="3" max="100"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="0" generateNumberParts="1" stemEnglishPossessive="1" splitOnCaseChange="0" generateWordParts="1" splitOnNumerics="0" catenateAll="0" catenateWords="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="3" max="100"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterGraphFilterFactory" catenateNumbers="0" generateNumberParts="1" stemEnglishPossessive="1" splitOnCaseChange="0" generateWordParts="1" splitOnNumerics="0" catenateAll="0" catenateWords="0"/>
    <filter class="solr.FlattenGraphFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
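To see why punctuation drops out, here is a rough Python sketch of what that filter chain does to a string. It is only an approximation for illustration, not Solr's actual implementation: the function name and the stopword set are hypothetical (the real stopwords.txt is not shown here), the ASCII folding is simplified to stripping accents, and the trim and Porter stemming steps are left out entirely.

```python
import re
import unicodedata

# Hypothetical stand-in for stopwords.txt; the real file contents are not shown above.
STOPWORDS = {"the", "and", "for"}


def approximate_analyze(text, stopwords=STOPWORDS):
    """Rough approximation of the analyzer chain (no trimming, no stemming)."""
    tokens = []
    for raw in text.split():  # WhitespaceTokenizerFactory: split on whitespace only
        tok = raw.lower()  # LowerCaseFilterFactory
        # Crude stand-in for ASCIIFoldingFilterFactory: decompose, drop combining marks.
        tok = "".join(
            c for c in unicodedata.normalize("NFKD", tok)
            if not unicodedata.combining(c)
        )
        if not 3 <= len(tok) <= 100:  # LengthFilterFactory min=3 max=100
            continue
        if tok in stopwords:  # StopFilterFactory (runs before punctuation is stripped)
            continue
        # Approximates stemEnglishPossessive="1": drop a trailing 's on a word.
        tok = re.sub(r"'s(?=\W|$)", "", tok)
        # Approximates WordDelimiterGraphFilterFactory with generateWordParts and
        # generateNumberParts: split on non-alphanumerics and discard the delimiters.
        tokens.extend(part for part in re.split(r"[\W_]+", tok) if part)
    return tokens


print(approximate_analyze("Cancer, cells: (in-vitro) PLOS's data!"))
```

In this sketch the punctuation never survives as a token: it is either attached to a word and removed at the word-delimiter step, or it forms a token of its own that ends up empty. That is why a query containing punctuation matches the same documents as one without it.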
best, Erik
On Wed, 03 Jul 2019 00:14:32 -0700, grantaka36 <grant...@gmail.com> wrote:
🐝 PLOS | OPEN FOR DISCOVERY
🐘 Erik Hetzner | Software Developer
📮 1160 Battery Street, Suite 225, San Francisco, CA 94111
📞 Main +1 415 624 1200