Denis helped us with one use case for charts, but fully localizing a chart in PPTX seems to be a deep rabbit hole. These files are from the financial domain, so they are pretty complicated. I found several other cases of cached values, such as numCache, which can hold currency values that need conversion, etc.
What do you think about recursive extraction of embedded spreadsheets? Also, the formatCode is more of an internationalized format string, like ICU. We can extract it, but translators may not understand how to localize it.
From our internal discussion:
These charts normally reference spreadsheets (sometimes external, sometimes embedded). openxml does something clever. It caches spreadsheet values in the PPTX. Denis and I decided it would be easier to extract the cache values vs trying to find the original spreadsheets. These spreadsheets have priority over the cached values. So there will be a mismatch. If I understand correctly, if the localized PPTX file with translated cache values finds the original spreadsheet, then the cached values will be overwritten. So we will need to localize the spreadsheet as well.
There is another, even more complicated case. Sometimes I see that PPTX files have embedded spreadsheets (not external).
We have also found that chart cached values use a formatting string (like ICU) which can contain language-specific content (ex: <c:formatCode>#,##0"億円"</c:formatCode>)!
On top of the difficulty of tracking down cached values vs spreadsheets, we have an internationalization problem.
Attached are some example files and screenshots showing what is missing.
Thoughts?
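To make the cache idea concrete, here is a minimal Python sketch (not Okapi code) of reading the cached values and the formatCode out of a chart part. The XML fragment is hand-written for illustration, and `read_caches` is a hypothetical helper name; only the element names (`c:numCache`, `c:strCache`, `c:formatCode`, `c:pt`, `c:v`) and the chart namespace come from the file format.

```python
import xml.etree.ElementTree as ET

# DrawingML chart namespace used by chart parts in a PPTX.
C = "http://schemas.openxmlformats.org/drawingml/2006/chart"
NS = {"c": C}

# Hypothetical, hand-written fragment of a chart part (illustration only).
SAMPLE = """<c:numRef xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart">
  <c:f>Sheet1!$B$2:$B$3</c:f>
  <c:numCache>
    <c:formatCode>#,##0"億円"</c:formatCode>
    <c:ptCount val="2"/>
    <c:pt idx="0"><c:v>1200</c:v></c:pt>
    <c:pt idx="1"><c:v>3400</c:v></c:pt>
  </c:numCache>
</c:numRef>"""


def read_caches(xml_text):
    """Collect every numCache/strCache with its format code and cached values."""
    root = ET.fromstring(xml_text)
    caches = []
    for tag in ("numCache", "strCache"):
        for cache in root.iter(f"{{{C}}}{tag}"):
            caches.append({
                "kind": tag,
                # strCache normally has no formatCode, so this may be None.
                "format_code": cache.findtext("c:formatCode", default=None, namespaces=NS),
                "values": [
                    pt.findtext("c:v", default="", namespaces=NS)
                    for pt in cache.findall("c:pt", NS)
                ],
            })
    return caches


caches = read_caches(SAMPLE)
```

Note that the cached numbers and the format code are separate pieces: the values here are plain numbers, while the language-specific part sits in the format code.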
--
You received this message because you are subscribed to the Google Groups "okapi-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to okapi-devel...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/okapi-devel/3ee48b02-7077-4ef2-80e4-649583074c15%40gmail.com.
Right, totally agree. But if I understand correctly, embedded spreadsheets will overwrite the cache values when the document is loaded. We may have no choice. Hmm, could we offer removing the embedded spreadsheets as an option?
For documents without an embedded spreadsheet, we have no choice but to extract the cache values and hope the users don't use the original, external spreadsheet.
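A rough Python sketch of how one might tell the two cases apart, by reading each chart part's relationships file. This rests on assumptions: that chart relationships live under `ppt/charts/_rels/`, that embedded workbooks resolve to `ppt/embeddings/`, and that any relationship with `TargetMode="External"` points at a linked workbook. `chart_data_sources` and `build_demo_pptx` are hypothetical names, and the demo package is not a valid PPTX.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# OPC package relationships namespace.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def chart_data_sources(pptx_file):
    """Map each chart part to a list of (kind, target) data-source guesses."""
    results = {}
    with zipfile.ZipFile(pptx_file) as z:
        for name in sorted(z.namelist()):
            if not (name.startswith("ppt/charts/_rels/") and name.endswith(".rels")):
                continue
            # "chart1.xml.rels" describes the part "ppt/charts/chart1.xml".
            chart = "ppt/charts/" + posixpath.basename(name)[: -len(".rels")]
            root = ET.fromstring(z.read(name))
            sources = []
            for rel in root.findall(f"{{{REL_NS}}}Relationship"):
                target = rel.get("Target", "")
                if rel.get("TargetMode") == "External":
                    # Assumption: an external target is a linked workbook.
                    sources.append(("external", target))
                else:
                    resolved = posixpath.normpath(
                        posixpath.join("ppt/charts", target)
                    ).lstrip("/")
                    # Assumption: embedded workbooks live under ppt/embeddings/.
                    if resolved.startswith("ppt/embeddings/"):
                        sources.append(("embedded", resolved))
            results[chart] = sources
    return results


def build_demo_pptx():
    """Build a tiny in-memory zip that only mimics the relevant parts."""
    rels = (
        f'<Relationships xmlns="{REL_NS}">'
        '<Relationship Id="rId1" Type="demo" '
        'Target="../embeddings/Microsoft_Excel_Sheet1.xlsx"/>'
        "</Relationships>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("ppt/charts/chart1.xml", "<chart/>")
        z.writestr("ppt/charts/_rels/chart1.xml.rels", rels)
        z.writestr("ppt/embeddings/Microsoft_Excel_Sheet1.xlsx", b"")
    buf.seek(0)
    return buf


sources = chart_data_sources(build_demo_pptx())
```

A chart with an empty list would be the "cache only" case, where extracting the cached values is the only option.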
Then there is the formatCode (basically a format string with placeholders, like ICU). We must extract those as well, but perhaps the filter can add a comment with specific instructions.
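One possible way to reduce the burden on translators, sketched in Python under simplifying assumptions: expose only the double-quoted literal text of a formatCode and keep the numeric pattern as a protected skeleton. `split_format_code` and `merge_format_code` are hypothetical helpers, and this ignores backslash-escaped characters and any other place where a format code can carry literal text.

```python
import re

# Matches double-quoted literal text inside a number format code.
QUOTED = re.compile(r'"([^"]*)"')


def split_format_code(code):
    """Return (skeleton, literals).

    The literals are the quoted texts, in order. The skeleton is the format
    code with each quoted text replaced by a numbered slot such as "{0}".
    """
    literals = []

    def stash(match):
        literals.append(match.group(1))
        return '"{%d}"' % (len(literals) - 1)

    skeleton = QUOTED.sub(stash, code)
    return skeleton, literals


def merge_format_code(skeleton, translated):
    """Put translated literals back into their numbered slots."""
    out = skeleton
    for i, text in enumerate(translated):
        out = out.replace('"{%d}"' % i, '"%s"' % text)
    return out


skeleton, literals = split_format_code('#,##0"億円"')
round_trip = merge_format_code(skeleton, literals)
```

With this split, the translator would only see `億円`, and the comment the filter adds could explain that the text is a unit shown after a number. Whether the number itself needs converting for the target locale is a separate question that this sketch does not address.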
J
Since openxml is growing in importance for us, I will try to tackle this PR myself (with guidance from Denis) to get some experience with openxml. Here's what I propose. I will create an issue as well.
Questions: