Matija Kovač:
Hi,
I’m building a Python app around the Okapi Framework so that I can integrate it into my project.
I first set up a script that uses Tikal to convert files into XLF, only to realize that the output escapes the “<” character (as `&lt;`) in some nested elements, but not in all of them.
I thought it might be an issue with Tikal alone, so I added Rainbow to my app too, but the output is the same (in hindsight, of course it is: they’re both using the same OpenXML filter utility).
My test .docx file contains some inline formatting to exercise the conversion process: some of the words are bold, some italic, etc.
As expected, this results in embedded subelements of the <source> element in the translation units. All good so far.
However, it looks like the <run> element is always returned as “`&lt;run1>`”, with the left “<” escaped, while the right “>” is not.
I can of course fix this in Python when storing the file, but then the file refuses to merge back.
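A minimal sketch of why that kind of fix-up can break the file, assuming the fix is a plain string replacement of `&lt;` with `<`. The helper names `naive_unescape` and `is_well_formed` are hypothetical, and the sample segment is hand-written for illustration rather than copied from actual Okapi output:

```python
import xml.etree.ElementTree as ET

# Hand-written sample, assumed to resemble one extracted segment.
ESCAPED = (
    '<source xml:lang="en">This text is '
    '<bpt id="1">&lt;b></bpt>bold<ept id="1">&lt;/b></ept>.</source>'
)


def naive_unescape(xml_text: str) -> str:
    """Hypothetical fix-up: turn every escaped '<' back into a literal one."""
    return xml_text.replace("&lt;", "<")


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(ESCAPED))
print(is_well_formed(naive_unescape(ESCAPED)))
```

The escaped segment parses, while the "fixed" one does not: the literal `<b>` becomes a real element that is opened inside `<bpt>` and never closed before `</bpt>`. If the merge step parses the XLF as XML, that alone could explain the failure, though I have not confirmed that this is what happens inside Okapi.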
I also tried with HTML files, where the escaped tags appear in the <sup> element, since there is no <run> element.
`<source xml:lang="en">This text is <bpt id="1">&lt;b></bpt>bold<ept id="1">&lt;/b></ept>.</source>`
This inconsistent handling of embedded subelements in the XML structure, or rather of their tag representation, leads to problems down the line, such as positioning tags correctly through the MT process.
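For reference, here is a minimal sketch of how such a segment looks when read with an XML parser instead of as raw text. It uses only Python's standard `xml.etree.ElementTree`. The function names `inline_codes` and `plain_text` are hypothetical, and the sample is hand-written without the XLIFF namespace, so a real file would likely need namespace-qualified tag names:

```python
import xml.etree.ElementTree as ET

# Hand-written sample segment (assumption: no default namespace declared).
SEGMENT = (
    '<source xml:lang="en">This text is '
    '<bpt id="1">&lt;b></bpt>bold<ept id="1">&lt;/b></ept>.</source>'
)


def inline_codes(source_xml: str) -> list[tuple[str, str, str]]:
    """Return (element name, id, code text) for each bpt/ept in the segment."""
    root = ET.fromstring(source_xml)
    return [
        (el.tag, el.get("id", ""), el.text or "")
        for el in root.iter()
        if el.tag in ("bpt", "ept")
    ]


def plain_text(source_xml: str) -> str:
    """Return the translatable text with the inline codes left out."""
    root = ET.fromstring(source_xml)
    parts = [root.text or ""]
    for child in root:
        # The text that follows each inline code is stored as its tail.
        parts.append(child.tail or "")
    return "".join(parts)


print(inline_codes(SEGMENT))
print(plain_text(SEGMENT))
```

After parsing, the code text comes back as `<b>` and `</b>`, and the surrounding text as "This text is bold.". In XML a literal `>` is allowed in character data, while a literal `<` is not, which may be why only the left bracket is escaped.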
Is this expected behavior? If so, why, and how am I expected to deal with it?
If not, how can it be fixed?
Here’s an example of my code, using Rainbow with TranslationKitCreation:
```python
import os
import subprocess
import uuid

from flask import current_app, jsonify, request


def rainbow_convert_to_xlf():
    # Validate the upload before touching the file system.
    if 'file' not in request.files:
        return jsonify({'error': 'No file part in the request'}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No file selected'}), 400
    # is_extension_supported is a helper defined elsewhere in my app.
    if not is_extension_supported(file.filename):
        return jsonify({'error': 'Unsupported file extension'}), 400

    source_lang = request.form.get('source_lang', 'en')  # Default to English if not provided
    target_lang = request.form.get('target_lang', 'en')  # Default to English if not provided

    # Store each upload in its own uniquely named folder.
    folder_name = str(uuid.uuid4())
    upload_folder_path = os.path.join(current_app.config['UPLOAD_FOLDER'], folder_name)
    os.makedirs(upload_folder_path, exist_ok=True)
    input_file_path = os.path.join(upload_folder_path, file.filename)
    file.save(input_file_path)

    xlf_filename = file.filename + '.xlf'
    output_file_path = os.path.join(upload_folder_path, xlf_filename)

    rainbow_command = [
        './rainbow/rainbow.sh',
        '-x', 'TranslationKitCreation',
        '-sl', source_lang,
        '-tl', target_lang,
        input_file_path,
        '-o', output_file_path
    ]

    try:
        result = subprocess.run(rainbow_command, check=True, capture_output=True, text=True)
        print("Rainbow Command Output:", result.stdout)  # Debug output stdout
        print("Rainbow Command Error:", result.stderr)  # Debug output stderr
        return jsonify({
            'message': 'File converted to XLF successfully',
            'xlf_file_url': f'http://{request.host}/get/{folder_name}/{xlf_filename}'
        }), 200
    except subprocess.CalledProcessError as e:
        # check=True raises when Rainbow exits with a non-zero status.
        current_app.logger.error(f"Rainbow CLI failed: {e.stderr}")
        return jsonify({
            'error': 'Failed to convert file',
            'message': e.stderr
        }), 500
```
Many thanks for building all these amazing tools and for helping me out with this issue.