Apologies for all the updates, but I have made some interesting discoveries over the past week that may help resolve this issue (I may have reached the limit of my abilities with this one):
mysql_charset in config.php
- mysql_charset needs to be enabled in config.php before the database is created for multibyte characters to be stored correctly as UTF-8.
- If I comment out mysql_charset before starting staticsync, umlauts are replaced by � in the web browser, but as soon as I re-enable mysql_charset in the config they display correctly! To summarise: with mysql_charset set to UTF-8, staticsync breaks and truncates when it encounters an umlaut. By commenting it out before the sync and uncommenting it afterwards, files are imported correctly (with ingest set to false), albeit without previews.
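One possible explanation for the truncation, offered as an unconfirmed sketch: if the filenames reach the database in a single-byte encoding such as Latin-1 while the connection is declared as UTF-8, the byte for ä is not valid UTF-8, and MySQL in non-strict mode typically cuts the string at the first invalid byte. The Python below only imitates that behaviour. The function name is hypothetical, the path fragment is the one from the debug output, and no RS or MySQL code is involved.

```python
def longest_valid_utf8_prefix(raw: bytes) -> str:
    """Return the text that decodes as UTF-8 before the first invalid byte.

    Hypothetical helper: it imitates a store that truncates at the first
    byte it cannot interpret as UTF-8.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that is not valid UTF-8
        return raw[:err.start].decode("utf-8")


name = "hpäihh/dradio.pdf"

# Assumption: the filename arrives as Latin-1 bytes (ä is the single byte 0xE4)
truncated = longest_valid_utf8_prefix(name.encode("latin-1"))

# The same filename as real UTF-8 bytes passes through unchanged
intact = longest_valid_utf8_prefix(name.encode("utf-8"))

print(repr(truncated))
print(repr(intact))
```

If this is what is happening, everything from the umlaut onwards would be lost from the stored value, which would fit the truncation seen with mysql_charset enabled.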
ghostscript and staticsync
- PDF previews are generated when files are uploaded via the Java uploader, because the file path is scrambled. However, when files are imported via staticsync, preview generation breaks when it encounters an umlaut. I am assuming the problem lies with RS (or perhaps my installation of RS), as Ghostscript is supposed to support UTF-8 (or at least that's what I have read; I have not tested this).
Comparing the debug output with mysql_charset="UTF8" on vs off, with it on it breaks at the following points:
- SQL: select file_path value from resource where ref='1' etc
- SQL: update resource set preview_tweaks (i.e. just before GS is supposed to start)
- it seems to think the file source is: filestore/1_[etc]/1_[etc].pdf (i.e. that the file has been ingested, when in fact there is nothing in that folder)
With mysql_charset commented out, GS breaks when trying to create previews, e.g.:
- 'PDF multi page preview [....] /var/www/Analysis/hpihh/dradio.pdf', even though it recognises that: 'file source is var/www/Analysis/hpäihh/dradio.pdf'
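The path handed to Ghostscript above looks like the original with the non-ASCII character silently dropped. Here is a minimal sketch of that effect, assuming (not confirmed) that something between RS and the shell discards bytes outside ASCII. PHP's escapeshellarg is known to do this when the process locale is not a UTF-8 one, though whether RS calls it at this point is a guess.

```python
# Path as reported in the "file source is" debug line
source_path = "var/www/Analysis/hpäihh/dradio.pdf"

# Imitate a step that throws away every character outside ASCII
ascii_only = source_path.encode("ascii", errors="ignore").decode("ascii")

print(ascii_only)
```

If this is the cause, Ghostscript is being asked to open a file that does not exist on disk, which would explain the missing previews.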
I was wondering if anyone could shed any light on what may be happening here?
Many thanks,
Maria