Question about adding words when using a user dictionary (Apache Solr 7.x)

Shin Sungwoo

Jan 23, 2019, 12:52:29 AM

to 은전한닢 프로젝트

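Hello. Thank you for approving my membership.

Happy New Year!

While upgrading Apache Solr, I noticed that seunjeon had been adopted as an official library, so I have been trying it out and have a question.

When I search for the text "안전화", the character "화" is dropped, and I am not sure how this should be handled.

I looked at the mecab-ko-dic example you provided and defined a user dictionary entry, but the result is the same. (https://docs.google.com/spreadsheets/d/1-9blXKjtjeKZqsf4NzHeYJCrr49-nXeRF6D80udfcwY/edit#gid=1718487366)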

안전화,0,0,0,NNG,,F,안전화,Compound,*,*,안전/NNG/*+화/XSN/*

Below are sample outputs.

Analysis stages for "안전화" (the morphemes and reading columns were empty in the output):

| stage | text | raw_bytes | start | end | positionLength | type | termFrequency | posType | leftPOS | rightPOS | morphemes | reading | position |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (unlabeled) | 안전 | [ec 95 88 ec a0 84] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| (unlabeled) | 화 | [ed 99 94] | 2 | 3 | 1 | word | 1 | MORPHEME | XSN(Noun Suffix) | XSN(Noun Suffix) | | | 2 |
| SGF | 안전 | [ec 95 88 ec a0 84] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| SGF | 화 | [ed 99 94] | 2 | 3 | 1 | word | 1 | MORPHEME | XSN(Noun Suffix) | XSN(Noun Suffix) | | | 2 |
| CGF | 안전 | [ec 95 88 ec a0 84] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| CGF | 화 | [ed 99 94] | 2 | 3 | 1 | word | 1 | MORPHEME | XSN(Noun Suffix) | XSN(Noun Suffix) | | | 2 |
| (unlabeled) | 안전 | [ec 95 88 ec a0 84] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| (unlabeled) | 화 | [ed 99 94] | 2 | 3 | 1 | word | 1 | MORPHEME | XSN(Noun Suffix) | XSN(Noun Suffix) | | | 2 |
| KPOSSF | 안전 | [ec 95 88 ec a0 84] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| KRFF | 안전 | [ec 95 88 ec a0 84] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| LCF | 안전 | [ec 95 88 ec a0 84] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |

Analysis stages for the second example, tokens 무궁 and 화 (the morphemes and reading columns were empty in the output):

| stage | text | raw_bytes | start | end | positionLength | type | termFrequency | posType | leftPOS | rightPOS | morphemes | reading | position |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (unlabeled) | 무궁 | [eb ac b4 ea b6 81] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| (unlabeled) | 화 | [ed 99 94] | 2 | 3 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 2 |
| SGF | 무궁 | [eb ac b4 ea b6 81] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| SGF | 화 | [ed 99 94] | 2 | 3 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 2 |
| CGF | 무궁 | [eb ac b4 ea b6 81] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| CGF | 화 | [ed 99 94] | 2 | 3 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 2 |
| (unlabeled) | 무궁 | [eb ac b4 ea b6 81] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| (unlabeled) | 화 | [ed 99 94] | 2 | 3 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 2 |
| KPOSSF | 무궁 | [eb ac b4 ea b6 81] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| KPOSSF | 화 | [ed 99 94] | 2 | 3 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 2 |
| KRFF | 무궁 | [eb ac b4 ea b6 81] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| KRFF | 화 | [ed 99 94] | 2 | 3 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 2 |
| LCF | 무궁 | [eb ac b4 ea b6 81] | 0 | 2 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 1 |
| LCF | 화 | [ed 99 94] | 2 | 3 | 1 | word | 1 | MORPHEME | NNG(General Noun) | NNG(General Noun) | | | 2 |

Shin Sungwoo

Jan 23, 2019, 2:25:37 AM

to 은전한닢 프로젝트

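Answering my own question.

After checking, it turned out that solr.KoreanPartOfSpeechStopFilterFactory was cutting off the trailing token.

I am sharing the configuration I applied below as a sample.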

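<!-- The opening fieldType/analyzer tags and any filter elements were not included in the post; the name "text_ko" is a placeholder. -->
<fieldType name="text_ko" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KoreanTokenizerFactory"
               decompoundMode="discard"
               userDictionary="../../userDictionary/userDictionary.txt"
               userDictionaryEncoding="UTF-8"/>
  </analyzer>
</fieldType>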
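
For readers who want to keep the part-of-speech stop filter but stop it from removing noun suffixes such as 화, one possible approach is to pass an explicit `tags` list that leaves out XSN. The fragment below is only a sketch under assumptions: the field type name `text_ko`, the added lowercase filter, and the exact tag list (the filter's default stop tags as I understand them, minus XSN) are not from the original post and should be checked against the Lucene/Solr version in use.

```xml
<!-- Sketch only: "text_ko" and the tag list are assumptions, not the poster's actual configuration. -->
<fieldType name="text_ko" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KoreanTokenizerFactory"
               decompoundMode="discard"
               userDictionary="../../userDictionary/userDictionary.txt"
               userDictionaryEncoding="UTF-8"/>
    <!-- Explicit stop tags without XSN, so a token tagged XSN(Noun Suffix) such as 화 is kept. -->
    <filter class="solr.KoreanPartOfSpeechStopFilterFactory"
            tags="E,IC,J,MAG,MAJ,MM,SP,SSC,SSO,SC,SE,XPN,XSA,XSV,UNA,NA,VSV"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

After reloading the core, the Analysis screen should show whether the KPOSSF stage still drops the 화 token for "안전화".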
