How to extract separate styling codes for a text run with multiple styles in docx file

Doris Wong

Jul 27, 2023, 1:14:55 PM
to okapi-users
Hi all, 

I am using the Okapi OpenXML filter to parse docx files for translation and to merge the translated results back into docx. 
I notice that when a text run has multiple styles applied, e.g. both bold and italic, the extracted text is surrounded by a single opening/closing code pair, e.g. 
   <code type="x:bold;italic">text text text</code type="x:bold;italic">

However, this makes the translation process difficult, as translators may want to handle each style differently depending on the target language; e.g. they may want to keep the bold style but remove the italic style. It would therefore be preferable to have a separate code for each style, e.g.
  <code type="bold"><code type="italic">text text text</code type="italic"></code type="bold">
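To make the requested nesting concrete, here is a small Java sketch (Java because Okapi itself is written in Java). It is a hypothetical post-processing helper, not part of the Okapi API: the class `SplitStyleCodes` and its methods `styles` and `split` are invented names, and it assumes the combined type string looks exactly like the `x:bold;italic` example above, with an `x:` prefix and `;`-separated style names. It only rewrites the marker notation; it does not address merging split codes back into a DOCX file, which is the hard part discussed in the reply below.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not part of the Okapi API): splits a combined code type
 * such as "x:bold;italic" into one nested open/close code pair per style.
 * Assumes the "x:" prefix and ';' separator seen in the example output.
 */
final class SplitStyleCodes {

    /** Strips an assumed "x:" prefix and splits the rest on ';', skipping empty entries. */
    static List<String> styles(String combinedType) {
        String body = combinedType.startsWith("x:") ? combinedType.substring(2) : combinedType;
        List<String> result = new ArrayList<>();
        for (String s : body.split(";")) {
            String trimmed = s.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /** Wraps text in nested codes: first style outermost, closing codes in reverse order. */
    static String split(String combinedType, String text) {
        List<String> styles = styles(combinedType);
        StringBuilder sb = new StringBuilder();
        for (String style : styles) {
            sb.append("<code type=\"").append(style).append("\">");
        }
        sb.append(text);
        for (int i = styles.size() - 1; i >= 0; i--) {
            sb.append("</code type=\"").append(styles.get(i)).append("\">");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the nested form requested in the post.
        System.out.println(split("x:bold;italic", "text text text"));
    }
}
```

Closing the codes in reverse order keeps the pairs properly nested, so a translator could in principle drop the inner `italic` pair without disturbing the outer `bold` pair.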

Is this possible to achieve? What is the appropriate way to do it? Thanks a lot!


Menghan

Chase Tingley

Jul 27, 2023, 5:01:17 PM
to Doris Wong, okapi-users
Hi,

Unfortunately, this isn't supported. Part of the problem is that removing codes is not guaranteed to work when merging the target file. Also, for DOCX in particular, the style model is quite complicated: "x:bold;italic" may indicate that bold and italic styles were applied separately in the source document, but it could also mean that a named style with both bold and italic properties was applied as a single piece of formatting. Handling all of those cases would require quite complex markup manipulation on export, which we have never implemented.

--
You received this message because you are subscribed to the Google Groups "okapi-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to okapi-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/okapi-users/5be9c4d3-1f27-4481-a29d-c1fc977bd44an%40googlegroups.com.

Fredrik Dahlén

Jul 27, 2023, 5:58:39 PM
to okapi-users
Hello.
We also need this functionality (or something closely related) and asked about it in January.
The translators want to be able to modify, add, and/or delete the common text decorations.

My colleague Oscar created an issue together with @Jim Hargrave:
It was added to the 1.46 milestone.
So fingers crossed.

If there is anything I can do to get the ball rolling, please do not hesitate to ask.
I'm willing to contribute (if I can) but would need some guidance.

Thanks!
Fredrik

Marc Mittag

Jul 28, 2023, 7:53:57 AM
to okapi...@googlegroups.com

Hi all,

We also needed the fonts used to be reflected, so those will also be in 1.46, developed by Denis :-). Thank you, Denis!

best

Marc
