Help needed with table and sentence segmentation in Okapi Tikal


Abdullah Khan

Oct 22, 2025, 4:46:46 AM
to okapi-users
Hello Okapi community,

I'm working on a personal college project and am running into segmentation issues with complex tables, sentences, and names. My SRX rules break names and other content incorrectly inside table cells.

**Examples of the problem:**
- "Mr. John D. Smith" breaks into 3 segments
- Table descriptions with periods segment incorrectly  
- Complex row/column structures get fragmented

I've tried custom SRX rules and modifying the XLF file, but tables remain problematic. Has anyone successfully handled complex table segmentation? Any guidance on pre-processing approaches or best practices would be greatly appreciated.

I'm using Okapi Tikal version 2.1.42.0 with custom SRX rules.

Thank you in advance for your help!

Best regards,
Abdullah Khan
defaultSegmentation.srx

Marc Mittag

Oct 22, 2025, 6:02:50 AM
to okapi...@googlegroups.com

Hi Abdullah,

I guess you are not using the segmentation rules for English, in which "Mr." should be defined as an abbreviation.

So you should either use the English rules, or copy the corresponding rule into the rule set you are using.
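
As a rough illustration of why such a rule matters, here is a minimal Python sketch of SRX-style matching. It is not Okapi's implementation: it only mimics the idea that, at each position, the first rule whose before and after patterns both match decides the outcome, so no-break exceptions have to be listed before the general break rule. The three patterns and the `segment` helper are simplified assumptions made up for this example, not copied from any shipped SRX file.

```python
import re

# (is_break, beforebreak pattern, afterbreak pattern), in priority order.
# Hypothetical, simplified rules for illustration only.
RULES = [
    (False, r"\b(?:Mr|Mrs|Ms|Dr|Prof)\.", r"\s"),  # titles: never break here
    (False, r"\b[A-Z]\.", r"\s"),                  # single-letter initials
    (True, r"[.?!]+", r"\s"),                      # general sentence break
]


def segment(text, rules=RULES):
    """Split text using first-match-wins break / no-break rules."""
    compiled = [
        (is_break, re.compile("(?:" + before + r")\Z"), re.compile(after))
        for is_break, before, after in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # beforebreak must match text ending at pos,
            # afterbreak must match text starting at pos.
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # the first matching rule decides this position
    segments.append(text[start:].strip())
    return [s for s in segments if s]


if __name__ == "__main__":
    print(segment("Mr. John D. Smith", rules=[RULES[-1]]))  # break rule only
    print(segment("Mr. John D. Smith"))                     # with exceptions
```

With only the general break rule, the name splits after "Mr." and after "D."; with the two exceptions listed first, it stays in one piece.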

Hope that helps.

best

Marc


Abdullah Khan

Oct 22, 2025, 6:08:42 AM
to okapi-users
Hi Marc,
I am using the rules for English. I use Tikal to split the text into segments, which I then pass to my LLM, but the segmentation does not work properly: it breaks at arbitrary points and does not work on table data. I have also attached my rules file. Please help me fix this if possible; I have been facing this issue for many days.

Thanks,
Abdullah Khan

Sergei Vasilyev

Oct 22, 2025, 6:29:14 AM
to Abdullah Khan, okapi-users
SA Abdullah,

Attached is a widely used SRX file (LGPL) from the LanguageTool project; it should help.
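
To compare a file like this with your own, it can help to list which rules actually apply to a given language code. Below is a minimal Python sketch under a few assumptions: the file follows the SRX 2.0 layout (namespace `http://www.lisa.org/srx20`), the `languagepattern` values are simple enough for Python's `re` module (SRX patterns are ICU-style, so more complex ones may not compile), and a pattern has to match the whole language code. `SAMPLE_SRX` and `rules_for_language` are made-up names for this example; the sample is a tiny stand-in, not the attached file.

```python
import re
import xml.etree.ElementTree as ET

SRX_NS = {"srx": "http://www.lisa.org/srx20"}

# Tiny hypothetical SRX document, for illustration only.
SAMPLE_SRX = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <header segmentsubflows="yes" cascade="yes"/>
  <body>
    <languagerules>
      <languagerule languagerulename="English">
        <rule break="no">
          <beforebreak>\b(Mr|Mrs|Dr)\.</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
      <languagerule languagerulename="Default">
        <rule break="yes">
          <beforebreak>[.?!]+</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
    <maprules>
      <languagemap languagepattern="[Ee][Nn].*" languagerulename="English"/>
      <languagemap languagepattern=".*" languagerulename="Default"/>
    </maprules>
  </body>
</srx>
"""


def rules_for_language(srx_text, lang_code):
    """Return (rule set name, break, beforebreak, afterbreak) tuples for lang_code."""
    root = ET.fromstring(srx_text)
    header = root.find("srx:header", SRX_NS)
    # cascade="yes": every matching language map applies, in document order.
    cascade = header is not None and header.get("cascade") == "yes"
    by_name = {
        lr.get("languagerulename"): lr
        for lr in root.iterfind(".//srx:languagerule", SRX_NS)
    }
    selected = []
    for lmap in root.iterfind(".//srx:languagemap", SRX_NS):
        if not re.fullmatch(lmap.get("languagepattern"), lang_code):
            continue
        name = lmap.get("languagerulename")
        rule_set = by_name.get(name)
        if rule_set is not None:
            for rule in rule_set.iterfind("srx:rule", SRX_NS):
                selected.append((
                    name,
                    rule.get("break", "yes"),  # break defaults to "yes"
                    rule.findtext("srx:beforebreak", "", SRX_NS),
                    rule.findtext("srx:afterbreak", "", SRX_NS),
                ))
        if not cascade:
            break  # without cascading, only the first matching map is used
    return selected


if __name__ == "__main__":
    for entry in rules_for_language(SAMPLE_SRX, "en-US"):
        print(entry)
```

If the output for your source language contains no `break="no"` entry covering "Mr.", the abbreviation exception is either missing or mapped to a language pattern that does not match your language code.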

Best,
Sergei

alternate-default.srx

Abdullah Khan

Oct 22, 2025, 6:34:39 AM
to okapi-users
WA Sergei,

Thank you so much for sharing. I am a beginner, though; could we connect so that we can fix this together? Please get in touch with me if possible.

Thanks
Abdullah Khan
https://www.linkedin.com/in/abdullahkhanspn
abdullah...@gmail.com