stopwords in config file


Peter MacDonald

Nov 3, 2008, 5:10:11 PM
to xtf-...@googlegroups.com
I have just gotten started with XTF and want to index some TEI P5 files. I set up the following config file, but when I run the indexer with it I get an error about stop words. My "stop-words.txt" file is just a simple list of words.

Does anyone know why I am getting the Java exception listed below, which mentions a stop-words problem?

<?xml version="1.0" encoding="utf-8"?>
<textIndexer-config>
 
    <index name="cw">
        <src path="./data/cw"/>
        <db path="./index"/>
        <chunk size="200" overlap="20"/>
        <docselector path="./style/textIndexer/docSelector.xsl"/>
        <stopwords path="./data/cw/stop-words.txt"/>
        <pluralmap path="./conf/pluralFolding/pluralMap.txt.gz"/>
        <accentmap path="./conf/accentFolding/accentMap.txt"/>
        <spellcheck createDict="yes"/>
    </index>
 
</textIndexer-config>

[ERROR MESSAGE FOLLOWS]

*** Error: class java.lang.RuntimeException
java.lang.RuntimeException: Index stop words (a an and are as at be but by for if in into is it no not of on or s such t that the their then there these they this to was will with) doesn't match config (a an and are as at be but by for if in into is it no not of on or s such t that the their then there these they this to was will with
)
        at org.cdlib.xtf.textIndexer.XMLTextProcessor.open(XMLTextProcessor.java:559)
        at org.cdlib.xtf.textIndexer.SrcTreeProcessor.open(SrcTreeProcessor.java:150)
        at org.cdlib.xtf.textIndexer.TextIndexer.main(TextIndexer.java:328)

Thanks for any help,
Peter

Peter MacDonald
general: peterma...@pobox.com
fun: hebri...@gmail.com
work: pmac...@hamilton.edu

Martin Haye

Nov 3, 2008, 4:13:33 PM
to xtf-...@googlegroups.com
Hi Peter,

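This happens if you change the stop-word list after an index has already been built and then try to do an incremental index run: the list stored in the existing index no longer matches the one in your config. If you re-run the indexer with the “-clean” option, so the index is rebuilt from scratch, the error should go away.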
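
As a rough illustration of why the incremental run fails, here is a minimal Python sketch of the kind of check the error message suggests. It is not XTF's actual code: the names StopWordMismatch and open_index are hypothetical, and the exact string comparison is an assumption based only on the wording of the exception. In the error text quoted above, the two lists appear to differ only by a line break before the closing parenthesis, so the example uses that as the difference.

```python
# Illustrative sketch only; this is NOT XTF's actual implementation.
# StopWordMismatch and open_index are hypothetical names.


class StopWordMismatch(RuntimeError):
    """Raised when an existing index was built with a different list."""


def open_index(stored_stop_words, config_stop_words, clean=False):
    """Return the stop-word list the index will use for this run.

    stored_stop_words: list recorded in an existing index, or None if
    no index exists yet. config_stop_words: list read from the config.
    """
    if clean or stored_stop_words is None:
        # A clean run rebuilds everything, so the config list is adopted.
        return config_stop_words
    if stored_stop_words != config_stop_words:
        # Incremental run: existing chunks were indexed under another list.
        raise StopWordMismatch(
            f"Index stop words ({stored_stop_words}) doesn't match "
            f"config ({config_stop_words})"
        )
    return stored_stop_words


old = "a an and the"
new = "a an and the\n"  # same words plus a trailing line break

try:
    open_index(old, new)  # incremental run against the old index
    incremental_ok = True
except StopWordMismatch:
    incremental_ok = False

clean_result = open_index(old, new, clean=True)  # clean rebuild
```

In this sketch the incremental call raises because the stored and configured strings differ, while the clean call simply adopts the new list, which mirrors the behaviour described above.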

--Martin