Code location for Generation of Tags for processing images

Devesh kumar

Nov 3, 2021, 4:47:10 AM
to okapi-users, okapi...@googlegroups.com
Hi all,

I was going through some debugging steps, but I cannot work out which Java class generates the <x>, <g>, <ex>, etc. tags and attaches them to the segments that we see in the XLIFF file.

The reason:
if a file (e.g. docx) has a small image (like a mathematical formula) inline with a sentence, then in place of the image I see tags like <bx>, <ex>, or <g>. I wanted to check whether I can somehow create a URL from those images so that they can be previewed in the UI.

Can anyone please help me out with this information?

Thanks

Regards,
Devesh

Yves Savourel

Nov 3, 2021, 5:13:40 AM
to okapi-users, okapi...@googlegroups.com

Hi Devesh,

 

The tags <x/>, <g></g>, etc. are the representation of the inline codes created by the filters.

 

For example, the filter (e.g. the OpenXmlFilter for docx files) takes a docx input and extracts all the translatable text from it.

This is done by generating filter events:

One type of event is the TEXTUNIT event. It carries a TextUnit, which holds the extracted data of a single paragraph-type chunk of text.

That data is in a TextFragment object made of a coded-text string and an array of Codes.

The coded-text string is the extracted text where each inline code is represented by a pair of special characters. Each pair of special characters refers to one of the Code objects in the list of Codes.
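To make the coded-text idea concrete, here is a minimal, self-contained sketch. It does not use the Okapi classes: the class name, the helper names, the marker values and the sample code data are all assumptions for illustration. It only models the principle described above, where a marker character is followed by a second character that encodes the index of a code in the list.

```java
import java.util.ArrayList;
import java.util.List;

public class CodedTextSketch {
    // Marker values are assumptions for this sketch (Unicode private-use area).
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';
    // The index of a code is stored as an offset from this base character.
    static final char CHARBASE = '\uE110';

    static boolean isMarker(char c) {
        return c == MARKER_OPENING || c == MARKER_CLOSING || c == MARKER_ISOLATED;
    }

    // Builds the two-character reference to the code at the given list index.
    static String ref(char marker, int index) {
        return "" + marker + (char) (CHARBASE + index);
    }

    // Returns the data of each code in the order it is referenced in the coded text.
    // Assumes well-formed input: every marker is followed by an index character.
    static List<String> referencedData(String codedText, List<String> codes) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < codedText.length(); i++) {
            if (isMarker(codedText.charAt(i))) {
                int index = codedText.charAt(i + 1) - CHARBASE;
                out.add(codes.get(index));
                i++; // skip the index character
            }
        }
        return out;
    }

    // Returns the text with every marker/index pair stripped out.
    static String plainText(String codedText) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codedText.length(); i++) {
            if (isMarker(codedText.charAt(i))) {
                i++; // skip the index character as well
            } else {
                sb.append(codedText.charAt(i));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical code data; a real filter stores whatever markup it found.
        List<String> codes = List.of("[image]", "<b>", "</b>");
        String coded = "See " + ref(MARKER_ISOLATED, 0) + " and "
                + ref(MARKER_OPENING, 1) + "bold" + ref(MARKER_CLOSING, 2) + ".";
        System.out.println(plainText(coded));
        System.out.println(referencedData(coded, codes));
    }
}
```

For the image question, the interesting part is the data held by the code that the isolated marker points to. Whether that data contains enough to locate the image depends on the filter.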

 

When represented in XLIFF format, the TextFragment inline codes are represented with the <x/>, <g></g> tags.
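As a rough illustration of that mapping, the sketch below renders a coded-text string with XLIFF 1.2-style placeholders. It is not the Okapi XLIFF writer: the class name, method name, marker values and ids are assumptions. It handles only the simple case where an opening code and its closing code sit in the same fragment, which is rendered as <g>…</g>. As far as I understand, <bx/> and <ex/> are what you get when the two halves of a pair end up in different segments.

```java
public class XliffTagSketch {
    // Marker values are assumptions for this sketch (Unicode private-use area).
    static final char MARKER_OPENING = '\uE101';
    static final char MARKER_CLOSING = '\uE102';
    static final char MARKER_ISOLATED = '\uE103';
    static final char CHARBASE = '\uE110';

    // Renders coded text with <g>/<x/> placeholders.
    // ids[i] is the id of the i-th code; a closing code shares the id of its opening code.
    static String toXliffInline(String codedText, int[] ids) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codedText.length(); i++) {
            char c = codedText.charAt(i);
            if (c == MARKER_OPENING || c == MARKER_CLOSING || c == MARKER_ISOLATED) {
                // The character after the marker encodes the index of the code.
                int id = ids[codedText.charAt(++i) - CHARBASE];
                if (c == MARKER_OPENING) {
                    sb.append("<g id=\"").append(id).append("\">");
                } else if (c == MARKER_CLOSING) {
                    sb.append("</g>");
                } else {
                    sb.append("<x id=\"").append(id).append("\"/>");
                }
            } else if (c == '&') {
                sb.append("&amp;"); // minimal XML escaping of the text itself
            } else if (c == '<') {
                sb.append("&lt;");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Hypothetical sample: an isolated code (e.g. an image) and one paired code.
    static String sample() {
        return "E = " + MARKER_ISOLATED + (char) (CHARBASE + 0)
                + " in " + MARKER_OPENING + (char) (CHARBASE + 1)
                + "bold" + MARKER_CLOSING + (char) (CHARBASE + 2);
    }

    public static void main(String[] args) {
        System.out.println(toXliffInline(sample(), new int[] {1, 2, 2}));
    }
}
```

Note that the placeholder carries only an id. The original markup stays in the code object, so a preview feature would have to look it up there rather than in the <x/> or <g> tag.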

You may want to read https://okapiframework.org/devguide/gettingstarted.html#textUnits for more details and examples.

 

The step that uses the filter to create the filter events is the RawDocument2FilterEvents step.

 

Now the data associated with the inline codes are in the Code object. What you have available there depends on what the filter for the specific format does.

You should be able to see what is available by selecting the <bpt>/<ept> output for XLIFF (the original code the filter sees is placed inside those elements).

 

I hope it helps.

-ys

Devesh kumar

Nov 3, 2021, 5:44:56 AM
to Yves Savourel, okapi-users, okapi...@googlegroups.com
Thanks Yves,

That was great information. I will have a look at the code as per the info you have provided.

Devesh kumar
