OpenXML filter still extracts “Picture 1” labels with excludeGraphicMetadata=true (1.48.0-SNAPSHOT)


Allan Kazakov

Jul 15, 2025, 9:46:08 AM
to okapi-users
Hi Okapi team,

I’m testing 1.48.0-SNAPSHOT. My goal is to suppress all hidden picture
labels (“Picture 1”, “filename.jpg”).

My `openxml-slim.fprm` (ParameterString format):
```
  useCodeFinder=true
  joinSimilarRuns=true
  simplifyCodes=true
  excludeGraphicMetadata=true
  extractDocPr=false
  extractDrawingText=false
```
Command:
```
  tikal -x -sl en -tl ru ^
        -fc okf_o...@openxml-slim.fprm ^
        sample.docx
```
`sample.docx.xlf` still contains:

  <source>Picture 1</source> …
  <source>f1_car_placeholder.jpg</source>

Is there a different key I should use (e.g. translateGraphicMetadata=false)?
Bug? Regression?  Any hint appreciated.

Minimal DOCX + .fprm attached.

Thanks!
— Allan
Formula1_Stylish_Document_With_Image_And_Table (1).docx
Formula1_Stylish_Document_With_Image_And_Table.docx (1).xlf

Chase Tingley

Jul 20, 2025, 3:59:51 PM
to Allan Kazakov, okapi-users
Hi Allan,

Using the latest 1.48.0-SNAPSHOT (built locally from the main branch), I was able to get this to work by setting 
bPreferenceTranslateWordExcludeGraphicMetaData.b=true
in the config (or by checking "Exclude Graphical Metadata" on the Word tab in the config in Rainbow).

I've attached a sample config that works for me, along with its output.  Note that there is still one TU containing the words "Picture 1", but this is from the caption in the document (which is plain text), as opposed to the second instance of "Picture 1" in the drawing metadata.
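
For reference, the relevant line of the `.fprm` would look like the fragment below. This is a minimal sketch showing only the key named above; every other setting in the attached config is omitted.

```
bPreferenceTranslateWordExcludeGraphicMetaData.b=true
```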



okf_openxml@allan.fprm
allan.docx.xlf

Allan Kazakov

Jul 23, 2025, 1:57:08 AM
to okapi-users

Thanks a lot, Chase—I really appreciate your help, that solution worked perfectly.

Now I'm trying to figure out how to exclude other metadata from extraction, such as the document's author, which ends up in a <source> element and gets picked up by my parser. I tried using OpenAI's o3 with search to find the right setting, but it didn’t turn up a way to filter out this metadata.

If you happen to know of any resources where I can find more information on Okapi config options, I’d be very grateful! 🙏

Chase Tingley

Jul 23, 2025, 3:29:26 PM
to Allan Kazakov, okapi-users
Hi Allan,

Set bPreferenceTranslateDocProperties.b to false (this is equivalent to unchecking "Translate Document Properties" in the General Options).

Updated config and output attached.
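
With both changes from this thread applied, the relevant lines of the config would presumably read as follows. This is a sketch using only the two keys mentioned here; all other settings in the attached file are left out.

```
bPreferenceTranslateWordExcludeGraphicMetaData.b=true
bPreferenceTranslateDocProperties.b=false
```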



allan.docx.xlf
okf_openxml@allan.fprm