I'm a student studying morphological analysis. May I ask a question?


yang...@gmail.com

Oct 1, 2018, 1:32:34 AM
to open-korean-text
Is there any documentation that explains the scoring process in /src/main/scala/org/openkoreantext/processor/tokenizer/ParsedChunk.scala?

I'd like to know what determines scores such as these:

트위터P/좋A/지E = 2.50
트위터P/좋A/지V = 2.80

For example, if I compute the score of "트위터/P", countTokens is 1, countUnknowns is 0, and so on, so the result is (1 * 0.18 + 0 * 0.3 + ...) = 1.17. I'd like to understand this calculation step by step.
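The score expression in ParsedChunk (quoted in full later in this thread) is a plain weighted sum: each feature value of a candidate parse is multiplied by a weight from TokenizerProfile, and the products are added. Below is a minimal sketch of that arithmetic using only three of the features. The weights tokenCount = 0.18 and unknown = 0.3 are the ones quoted in the question; wordCount = 0.25 and the names Weights, Features, terms and score are hypothetical placeholders, not the library's actual values or API.

```scala
// Sketch of the weighted-sum pattern behind `lazy val score` in ParsedChunk.scala.
// Only tokenCount = 0.18 and unknown = 0.3 come from the question; the third
// weight and every name here are hypothetical.

final case class Weights(tokenCount: Double, unknown: Double, wordCount: Double)
final case class Features(countTokens: Int, countUnknowns: Int, words: Int)

// One product (feature value * weight) per feature, kept separate so each
// contribution to the total can be inspected.
def terms(f: Features, w: Weights): Seq[(String, Double)] = Seq(
  "countTokens"   -> f.countTokens * w.tokenCount,
  "countUnknowns" -> f.countUnknowns * w.unknown,
  "words"         -> f.words * w.wordCount
)

// The score is simply the sum of those products.
def score(f: Features, w: Weights): Double = terms(f, w).map(_._2).sum

val weights = Weights(tokenCount = 0.18, unknown = 0.3, wordCount = 0.25) // wordCount: hypothetical
// Hypothetical feature values: one known token spanning one word.
val example = Features(countTokens = 1, countUnknowns = 0, words = 1)
```

With these placeholder weights, score(example, weights) is 1 * 0.18 + 0 * 0.3 + 1 * 0.25 = 0.43. In the real class the feature values are computed from the candidate's posNodes and the weights come from the profile, so printing each product next to its multiplier is a direct way to see how totals such as 2.50 and 2.80 are built up.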

I'm not familiar with Scala, so I'm asking for help here.

Hohyon Ryu

Oct 1, 2018, 4:44:34 PM
to open-kor...@googlegroups.com

--
You received this message because you are subscribed to the Google Groups "open-korean-text" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-korean-te...@googlegroups.com.
To post to this group, send email to open-kor...@googlegroups.com.
Visit this group at https://groups.google.com/group/open-korean-text.
To view this discussion on the web visit https://groups.google.com/d/msgid/open-korean-text/760d263f-0cdf-4396-b86b-b859d02a2499%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Will Hohyon Ryu
유호현
Software Engineer at Airbnb

실리콘밸리를 그리다 ("Drawing Silicon Valley"): https://brunch.co.kr/magazine/svillustrated

yang...@gmail.com

Oct 7, 2018, 9:19:49 PM
to open-korean-text
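To print out the scoring process, I modified ParsedChunk.scala slightly as shown below, but now I get NoSuchElementException: head of empty list.

It seems to happen while isInitialPostPosition is being executed. Can simply adding standard-output calls cause this kind of problem?

If so, I'd appreciate any advice on how to fix it.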


case class ParsedChunk(posNodes: Seq[KoreanToken], words: Int,
                       profile: TokenizerProfile = TokenizerProfile.defaultProfile) {

  // Using lazy val to cache the score
  lazy val score = countTokens * profile.tokenCount +
    countUnknowns * profile.unknown +
    words * profile.wordCount +
    getUnknownCoverage * profile.unknownCoverage +
    getFreqScore * profile.freq +
    countPos(Unknown) * profile.unknownPosCount +
    isExactMatch * profile.exactMatch +
    isAllNouns * profile.allNoun +
    isPreferredPattern * profile.preferredPattern +
    countPos(Determiner) * profile.determinerPosCount +
    countPos(Exclamation) * profile.exclamationPosCount +
    isInitialPostPosition * profile.initialPostPosition +
    isNounHa * profile.haVerb +
    hasSpaceOutOfGuide * profile.spaceGuidePenalty

  // Debug output added for this question (not in the upstream ParsedChunk):
  // prints each feature value next to its weight, then the total.
  println("countTokens: " + countTokens + ", multiplier: " + profile.tokenCount)
  println("countUnknowns: " + countUnknowns + ", multiplier: " + profile.unknown)
  println("words: " + words + ", multiplier: " + profile.wordCount)
  println("getUnknownCoverage: " + getUnknownCoverage + ", multiplier: " + profile.unknownCoverage)
  println("getFreqScore: " + getFreqScore + ", multiplier: " + profile.freq)
  println("countPos(Unknown): " + countPos(Unknown) + ", multiplier: " + profile.unknownPosCount)
  println("isExactMatch: " + isExactMatch + ", multiplier: " + profile.exactMatch)
  println("isAllNouns: " + isAllNouns + ", multiplier: " + profile.allNoun)
  println("isPreferredPattern: " + isPreferredPattern + ", multiplier: " + profile.preferredPattern)
  println("countPos(Determiner): " + countPos(Determiner) + ", multiplier: " + profile.determinerPosCount)
  println("countPos(Exclamation): " + countPos(Exclamation) + ", multiplier: " + profile.exclamationPosCount)
  println("isInitialPostPosition: " + isInitialPostPosition + ", multiplier: " + profile.initialPostPosition)
  println("isNounHa: " + isNounHa + ", multiplier: " + profile.haVerb)
  println("hasSpaceOutOfGuide: " + hasSpaceOutOfGuide + ", multiplier: " + profile.spaceGuidePenalty)
  println("final score: " + score)

  // ... other definitions used above (countTokens, isInitialPostPosition, etc.) omitted
}
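A likely explanation, sketched with two hypothetical classes rather than the real ParsedChunk: statements written directly in a Scala class body run in the constructor, for every instance, whereas a lazy val is only evaluated on first access. So the original lazy score never touches an empty chunk unless something reads it, but println calls in the body evaluate every feature (including one that calls head, as isInitialPostPosition is suspected to) as soon as any chunk is built. LazyOnlyChunk, EagerDebugChunk and firstToken are illustrative names only.

```scala
// Hypothetical minimal classes (not the real ParsedChunk) showing why a
// println in a case-class body can fail where the lazy score alone does not.

final case class LazyOnlyChunk(tokens: Seq[String]) {
  // Stand-in for a feature that reads the first token (assumed analogous to
  // isInitialPostPosition).
  def firstToken: String = tokens.head

  // lazy val: computed on first access only, so constructing an empty chunk is safe.
  lazy val score: Int = firstToken.length
}

final case class EagerDebugChunk(tokens: Seq[String]) {
  def firstToken: String = tokens.head
  lazy val score: Int = firstToken.length

  // A statement in the class body runs in the constructor, for every instance.
  // For an empty `tokens`, `tokens.head` throws
  // NoSuchElementException ("head of empty list") while the object is being built.
  println("firstToken: " + firstToken)
}
```

Reading score on an empty LazyOnlyChunk would fail the same way; the only difference is when head gets evaluated.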

Hohyon Ryu

Oct 7, 2018, 11:51:25 PM
to open-kor...@googlegroups.com
Please push it to a branch and tell me the branch name.

yang...@gmail.com

Oct 8, 2018, 2:01:37 AM
to open-korean-text
Thank you for replying; it must already be late where you are.

As you suggested, I pushed it to a branch:

https://github.com/yanguun94/open-korean-text/tree/question_scoring

Hohyon Ryu

Oct 8, 2018, 2:45:56 AM
to open-kor...@googlegroups.com
Haha, I'm actually in Korea right now. :)

yang...@gmail.com

Oct 9, 2018, 6:10:58 PM
to open-korean-text
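After trying a few things, I think I've solved it. The problem was that the code was being run with an empty array before the lazy val score had ever been called. Once I made the score printout run only when the array is nonEmpty, it works.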
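A sketch of that fix in isolation, assuming the goal is only debug output: keep the printing out of the class body, build the text on demand in a method, and skip empty chunks. SafeDebugChunk, firstToken and explainScore are hypothetical names, not part of open-korean-text.

```scala
// Hypothetical sketch of the fix: no debug statements in the class body, and
// the explanation is produced only for chunks that actually have tokens.
final case class SafeDebugChunk(tokens: Seq[String]) {
  // Stand-in for a feature that needs at least one token.
  def firstToken: String = tokens.head

  lazy val score: Int = firstToken.length

  // Called explicitly by the caller; returns None instead of calling
  // `head` when the chunk is empty.
  def explainScore: Option[String] =
    if (tokens.nonEmpty) Some("firstToken: " + firstToken + ", final score: " + score)
    else None
}
```

A caller can then write chunk.explainScore.foreach(println), so constructing a chunk never has side effects and empty chunks simply print nothing.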

Thank you for your kind answers and your interest. Have a nice day.

Hohyon Ryu

Oct 9, 2018, 6:41:13 PM
to open-kor...@googlegroups.com
Ah, glad to hear it. My schedule in Korea has been so busy that I haven't even had time to sit down at a computer. Sorry about that. Have a happy day!