Hi,
I'm an English speaker, so if possible please answer in English.
Could you please explain in more detail the tokenizer configuration in this example:
"tokenizer": {
  "seunjeon_default_tokenizer": {
    "type": "seunjeon_tokenizer",
    "index_eojeol": false,
    "user_words": ["낄끼+빠빠,-100", "c\\+\\+", "어그로", "버카충", "abc마트"]
  }
}
What are the "index_eojeol", "user_words", and "pos_tagging" parameters used for?
Can I omit these settings when setting up the tokenizer, and if I do, does it affect the accuracy of tokenization?
Thank you guys in advance!