Question about adding words to the user dictionary (using Apache Solr 7.x)


Shin Sungwoo

Jan 23, 2019, 12:52:29 AM
to 은전한닢 프로젝트
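Hello. Thank you for approving my membership.

Happy New Year!

While upgrading our Apache Solr version, I saw that seunjeon had been adopted as an official library, so I have been trying it out, and I have a question.

When I search for the text "안전화" (safety shoes), the character "화" gets dropped, and I am not sure how to handle this.

Following the mecab-ko-dic example you provided, I defined a user dictionary entry, but the result is the same. (https://docs.google.com/spreadsheets/d/1-9blXKjtjeKZqsf4NzHeYJCrr49-nXeRF6D80udfcwY/edit#gid=1718487366)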

안전화,0,0,0,NNG,,F,안전화,Compound,*,*,안전/NNG/*+화/XSN/*
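Solr's KoreanTokenizerFactory (Lucene's Nori analyzer) is understood to read its userDictionary in a simpler format than mecab-ko-dic's CSV: one entry per line, the word optionally followed by its space-separated segments (for example `세종시 세종 시`). If that holds, the CSV line above is loaded as a single entry whose "word" includes all the commas, which would fit the unchanged result. The sketch below is a hypothetical helper, not part of Solr: it splits the CSV line into the mecab-ko-dic columns (the field names are assumed from that dictionary's column order) and builds a Nori-style line from it.

```python
# Hypothetical field names, following the mecab-ko-dic user-dictionary column order.
MECAB_FIELDS = [
    "surface", "left_id", "right_id", "cost", "pos", "semantic_class",
    "has_final_consonant", "reading", "type", "first_pos", "last_pos", "expression",
]


def parse_mecab_entry(line: str) -> dict:
    """Split one mecab-ko-dic CSV line into named fields."""
    values = line.strip().split(",")
    if len(values) != len(MECAB_FIELDS):
        raise ValueError(f"expected {len(MECAB_FIELDS)} fields, got {len(values)}")
    return dict(zip(MECAB_FIELDS, values))


def to_nori_line(entry: dict, include_segments: bool = True) -> str:
    """Build a Nori-style user dictionary line: '<word> [<segment> ...]' (assumed format)."""
    word = entry["surface"]
    if not include_segments or entry["expression"] in ("", "*"):
        return word
    # expression looks like '안전/NNG/*+화/XSN/*'; keep only the surface of each part
    segments = [part.split("/")[0] for part in entry["expression"].split("+")]
    return " ".join([word] + segments)


line = "안전화,0,0,0,NNG,,F,안전화,Compound,*,*,안전/NNG/*+화/XSN/*"
entry = parse_mecab_entry(line)
print(entry["pos"], entry["type"])                  # NNG Compound
print(to_nori_line(entry))                          # 안전화 안전 화
print(to_nori_line(entry, include_segments=False))  # 안전화
```

Listing the segments keeps 안전화 splittable under decompoundMode; writing the bare word instead should make the tokenizer emit 안전화 as a single noun token. Which form fits depends on how the term should be matched.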

Here is the analysis output:

Analysis of "안전화". Stage abbreviations: KT = KoreanTokenizer, SGF = SynonymGraphFilter, CGF = CommonGramsFilter, SF = StopFilter, KPOSSF = KoreanPartOfSpeechStopFilter, KRFF = KoreanReadingFormFilter, LCF = LowerCaseFilter. For every token, positionLength = 1, type = word, termFrequency = 1, posType = MORPHEME, leftPOS = rightPOS, and morphemes and reading are empty.

Stage   text  raw_bytes            start  end  leftPOS/rightPOS     position
KT      안전  [ec 95 88 ec a0 84]  0      2    NNG(General Noun)    1
KT      화    [ed 99 94]           2      3    XSN(Noun Suffix)     2
SGF     안전  [ec 95 88 ec a0 84]  0      2    NNG(General Noun)    1
SGF     화    [ed 99 94]           2      3    XSN(Noun Suffix)     2
CGF     안전  [ec 95 88 ec a0 84]  0      2    NNG(General Noun)    1
CGF     화    [ed 99 94]           2      3    XSN(Noun Suffix)     2
SF      안전  [ec 95 88 ec a0 84]  0      2    NNG(General Noun)    1
SF      화    [ed 99 94]           2      3    XSN(Noun Suffix)     2
KPOSSF  안전  [ec 95 88 ec a0 84]  0      2    NNG(General Noun)    1
KRFF    안전  [ec 95 88 ec a0 84]  0      2    NNG(General Noun)    1
LCF     안전  [ec 95 88 ec a0 84]  0      2    NNG(General Noun)    1



Analysis of "무궁화" (same stages and constant columns as for 안전화):

Stage   text  raw_bytes            start  end  leftPOS/rightPOS     position
KT      무궁  [eb ac b4 ea b6 81]  0      2    NNG(General Noun)    1
KT      화    [ed 99 94]           2      3    NNG(General Noun)    2
SGF     무궁  [eb ac b4 ea b6 81]  0      2    NNG(General Noun)    1
SGF     화    [ed 99 94]           2      3    NNG(General Noun)    2
CGF     무궁  [eb ac b4 ea b6 81]  0      2    NNG(General Noun)    1
CGF     화    [ed 99 94]           2      3    NNG(General Noun)    2
SF      무궁  [eb ac b4 ea b6 81]  0      2    NNG(General Noun)    1
SF      화    [ed 99 94]           2      3    NNG(General Noun)    2
KPOSSF  무궁  [eb ac b4 ea b6 81]  0      2    NNG(General Noun)    1
KPOSSF  화    [ed 99 94]           2      3    NNG(General Noun)    2
KRFF    무궁  [eb ac b4 ea b6 81]  0      2    NNG(General Noun)    1
KRFF    화    [ed 99 94]           2      3    NNG(General Noun)    2
LCF     무궁  [eb ac b4 ea b6 81]  0      2    NNG(General Noun)    1
LCF     화    [ed 99 94]           2      3    NNG(General Noun)    2
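The raw_bytes column holds each token's text as UTF-8 bytes written in hex, so it can be decoded to double-check which characters a token covers at each stage. A minimal Python check using the byte sequences from the analysis output:

```python
def decode_raw_bytes(cell: str) -> str:
    """Decode a raw_bytes cell such as '[ed 99 94]' (UTF-8 hex bytes) back to text."""
    hex_digits = cell.strip().strip("[]").replace(" ", "")
    return bytes.fromhex(hex_digits).decode("utf-8")


for cell in ["[ec 95 88 ec a0 84]", "[eb ac b4 ea b6 81]", "[ed 99 94]"]:
    print(cell, "->", decode_raw_bytes(cell))
# [ec 95 88 ec a0 84] -> 안전
# [eb ac b4 ea b6 81] -> 무궁
# [ed 99 94] -> 화
```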


Shin Sungwoo

Jan 23, 2019, 2:25:37 AM
to 은전한닢 프로젝트
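Answering my own question.

It turns out solr.KoreanPartOfSpeechStopFilterFactory was cutting off the trailing part of the word: the tokenizer tags 화 in 안전화 as XSN (noun suffix), and the filter removes XSN tokens by default, whereas 화 in 무궁화 is tagged NNG and survives.

Here is the configuration I applied, shared as a sample: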
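To see why 화 disappears only in 안전화, here is a minimal Python simulation of a part-of-speech stop filter applied to the tokenizer output from the analysis tables. The stop-tag set is an assumed copy of Lucene's KoreanPartOfSpeechStopFilter defaults (it includes XSN) and should be checked against the Javadoc for your version; following the Lucene filter as understood here, each token is judged by its leftPOS tag.

```python
# Assumed copy of Lucene's KoreanPartOfSpeechStopFilter.DEFAULT_STOP_TAGS;
# verify against the Javadoc of your Lucene/Solr version.
DEFAULT_STOP_TAGS = frozenset({
    "E", "IC", "J", "MAG", "MAJ", "MM", "SP", "SSC", "SSO", "SC", "SE",
    "XPN", "XSA", "XSN", "XSV", "UNA", "NA", "VSV",
})


def pos_stop_filter(tokens, stop_tags=DEFAULT_STOP_TAGS):
    """Drop tokens whose leftPOS tag is a stop tag; tokens without a tag are kept."""
    return [(text, pos) for text, pos in tokens if pos is None or pos not in stop_tags]


# (text, leftPOS) pairs as emitted by KoreanTokenizer in the analysis output
safety_shoes = [("안전", "NNG"), ("화", "XSN")]  # 안전화
mugunghwa = [("무궁", "NNG"), ("화", "NNG")]     # 무궁화

print(pos_stop_filter(safety_shoes))  # [('안전', 'NNG')]
print(pos_stop_filter(mugunghwa))     # [('무궁', 'NNG'), ('화', 'NNG')]
```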


<fieldType name="text_ko" class="solr.TextField">
  <analyzer>
    <!-- decompoundMode: mixed (keeps the original term and adds all decompounded terms), discard (default; drops the compound form and keeps only the parts), none (no decompounding) -->
    <tokenizer class="solr.KoreanTokenizerFactory"
               decompoundMode="discard"
               userDictionary="../../userDictionary/userDictionary.txt"
               userDictionaryEncoding="UTF-8"/>
    <!-- Per the Solr Reference Guide, SynonymGraphFilter used at index time should be followed by FlattenGraphFilterFactory; this single analyzer serves both indexing and querying -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="../../userDictionary/synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.CommonGramsFilterFactory" words="../../userDictionary/stopwords_ko.txt" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="../../userDictionary/stopwords_ko.txt"/>
    <!-- Removes tokens by part-of-speech tag, e.g. verbal endings (EOMI, POS.Tag.E). Disabled here: its default stop tags also include XSN (noun suffix), which removed 화 from 안전화 -->
    <!--<filter class="solr.KoreanPartOfSpeechStopFilterFactory" />-->
    <!-- Replaces term text with the Hangul transcription of Hanja characters, if applicable: -->
    <filter class="solr.KoreanReadingFormFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
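Commenting the filter out also stops it from removing particles (J) and verbal endings (E). A possible alternative is to keep the filter and give it an explicit tag list without XSN. This assumes the factory accepts a comma-separated `tags` attribute, which Lucene's KoreanPartOfSpeechStopFilterFactory documents as overriding the default stop tags; the default list below is likewise an assumption to check against your version. The Python sketch only builds the `<filter>` line to paste into the schema.

```python
# Assumed default stop tags of KoreanPartOfSpeechStopFilter (see the Lucene Javadoc).
DEFAULT_STOP_TAGS = [
    "E", "IC", "J", "MAG", "MAJ", "MM", "SP", "SSC", "SSO", "SC", "SE",
    "XPN", "XSA", "XSN", "XSV", "UNA", "NA", "VSV",
]


def stop_tags_attribute(keep=("XSN",)):
    """Comma-separated default stop tags, minus the tags whose tokens should be kept."""
    keep_set = set(keep)
    return ",".join(tag for tag in DEFAULT_STOP_TAGS if tag not in keep_set)


tags = stop_tags_attribute()
print(f'<filter class="solr.KoreanPartOfSpeechStopFilterFactory" tags="{tags}"/>')
```

If the attribute is accepted, 화/XSN in 안전화 should pass through while particles and endings are still removed; the Solr analysis screen is the quickest way to confirm.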