> I hope someone can shed some light on an issue I have.
> I got a new bundle for an ongoing project and, after updating from it,
> I found my previous localizable.strings file was not leveraged at all,
> so 100% of the strings now need to be translated again. No way (we
> are talking about over 1300 strings in that file alone).
>
> Thing is that I checked the new localizable.strings file against the
> old one, and I get really weird diffs. If I use some diff tool, it
> tells me that the files are totally different, but what I see with my
> bare eyes is that only some strings were changed/added. It's like the
> font or the characters themselves were different, but you know, this
> is plain text.
>
> Then, I tried to export the new file as XLIFF (also tried .strings) to
> see how much leverage I could get through a CAT tool and I found that
> I cannot even export the file, I get an "Invalid character 0x0
> detected in string "X"" error for each line in the file.
It is hard to make an educated guess without access to the files, but
might it be an improperly detected character encoding?
As a starting point, I'd check the original Localizable.strings files,
i.e. directly from the application bundle, without involving iLocalize
at all. Open them in TextWrangler or an equivalent editor, and check
which encoding they use.
--
Karl-Johan Norén karl-...@norensoversattningar.se
Noréns översättningar http://www.norensoversattningar.se
> That was my first guess, but TextWrangler claims it to be just a regular
> UTF-8 file (the old version was UTF-16), and TextEdit confirms this when I
> open it 'forcing' UTF-8, and it works perfectly. And if I change the
> encoding to something else (such as Mac OS Roman), I still get the same
> result.
Some things to check:
* Does a BOM make any difference?
* Show invisibles? (I once had a bug in AppleWorks that was due to
AppleGlot inserting a null byte in a string. I only found it in
ResEdit after I set it to show the length and hex values of every
string.)
* Ask the customer how the new file was generated.
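
The first two checks (BOM and invisible null bytes) can also be scripted on the raw bytes, independent of any editor's guess. This is only a minimal sketch: `sniff_encoding` is a hypothetical helper name, the 30% threshold is an arbitrary assumption, and UTF-32 is not considered.

```python
import codecs

def sniff_encoding(data: bytes) -> dict:
    """Report BOM and null-byte statistics for raw file contents."""
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    # First BOM that matches the start of the data, or None.
    bom = next((name for mark, name in boms if data.startswith(mark)), None)

    nulls = data.count(b"\x00")
    even_nulls = data[0::2].count(b"\x00")  # nulls at byte offsets 0, 2, 4, ...
    odd_nulls = data[1::2].count(b"\x00")   # nulls at byte offsets 1, 3, 5, ...

    guess = None
    # Assumption: mostly-ASCII text stored as UTF-16 is roughly half null bytes.
    if data and nulls / len(data) > 0.3:
        # ASCII in UTF-16LE puts the null second, UTF-16BE puts it first.
        guess = "utf-16-le" if odd_nulls > even_nulls else "utf-16-be"
    return {"bom": bom, "null_bytes": nulls, "guess": guess}
```

In practice this would be called as `sniff_encoding(open(path, "rb").read())` on the Localizable.strings file taken straight from the bundle.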
You could maybe work around the problem by exporting a TMX from the
old localised version and then using that to translate the new file
directly in a CAT tool, but if there is something structurally wrong
with the new file, it might cause trouble later on in the process in
iLocalize.
Looking closer at the diff, I note the following counts
(characters / words / lines):
  new file: 232,632 / 72,883 / 1,398
  old file: 112,100 / 7,499 / 1,346
I.e., the new file has more than twice as many characters as the old
one, and nearly ten times as many words. I think the UTF-16 to UTF-8
conversion went seriously awry somewhere, leaving plenty of invisible
characters in the file.
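
The doubling of the character count is what one would expect if UTF-16 bytes are read with a one-byte-per-character encoding. A small illustration, assuming a made-up sample entry (not taken from the actual file) and using Latin-1 as a stand-in for "any 8-bit encoding":

```python
def misread_as_8bit(text: str) -> str:
    """Encode text as UTF-16LE, then read the bytes one character per byte."""
    raw = text.encode("utf-16-le")
    return raw.decode("latin-1")  # Latin-1 maps every byte to one character

line = '"Cancel" = "Abbrechen";\n'   # hypothetical sample entry
misread = misread_as_8bit(line)

doubled = len(misread) == 2 * len(line)        # twice as many "characters"
null_count = misread.count("\x00")             # one invisible NUL per letter
```

For ASCII-only content, every second character of the misread text is an invisible null character, which matches the "Invalid character 0x0" error reported for each string.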
> That was my first guess, but TextWrangler claims it to be just a regular
> UTF-8 file (the old version was UTF-16), and TextEdit confirms this when I
> open it 'forcing' UTF-8, and it works perfectly. And if I change the
> encoding to something else (such as Mac OS Roman), I still get the same
> result.
Please NEVER EVER use TextEdit for editing plain text files. You are
always better off with TextWrangler. :-)
> I just uploaded a screenshot of a comparison between the old (right)
> and the new (left) files here: http://www.openmints.com/tempo/weird.png
>
> It's really strange that the comparison results (bottom of the image)
> show that all characters are different from the very beginning, but
> the files are 99% the same.
No, at the byte level they are 99% different, because the new file is
UTF-8 while the old file is UTF-16. That's all, but it is important.
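
To compare the actual strings rather than the bytes, both files need to be decoded with their own encoding before diffing. A minimal sketch using Python's standard `difflib`; `text_diff` is a hypothetical helper name and the sample entries are made up:

```python
import difflib

def text_diff(old_bytes: bytes, old_enc: str,
              new_bytes: bytes, new_enc: str) -> list:
    """Decode each file with its own encoding, then diff line by line."""
    old_lines = old_bytes.decode(old_enc).splitlines()
    new_lines = new_bytes.decode(new_enc).splitlines()
    return list(difflib.unified_diff(
        old_lines, new_lines, fromfile="old", tofile="new", lineterm=""))

# Hypothetical sample data: same content except for one changed string.
old = '"A" = "a";\n"B" = "b";\n'.encode("utf-16")
new = '"A" = "a";\n"B" = "bee";\n'.encode("utf-8")

diff = text_diff(old, "utf-16", new, "utf-8")
# Keep only real changes, not the "---"/"+++" header lines.
changed = [l for l in diff
           if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]
```

Here the raw bytes of `old` and `new` differ almost everywhere, while `changed` contains only the one removed and one added line.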
HTH,
---UlfDunkel
> I suspected that maybe the spaces had been corrupted somehow, but you
> got it right about the null byte thing. The difference of length in
> characters was caused exactly by that, by null bytes. When I turned on
> invisibles on the file (banging my head against the wall for not
> trying before), this is what I got:
>
> http://www.openmints.com/tempo/gremlins.png
>
> The file was really messed up. Fortunately, TextWrangler has this cool
> feature called 'Zap gremlins' and I could fix the file in a click.
The gremlins screenshot shows two things:
- It is a UTF-16 encoded file, opened in UTF-8 mode.
- You should read up on the Unicode encodings (UTF-8, UTF-16) to better
  understand the difference.
You'll manage that - you're not stupid! :-)
---UlfDunkel
Well, when you open a UTF-16 file in an editor which expects UTF-8 or
any ANSI encoding, it will show the high byte of each UTF-16 code unit
(a null byte for plain ASCII characters) as a ? character or garbage.
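
Since the high bytes are part of the encoding rather than stray garbage, a safer route than deleting them is to decode the file as UTF-16 and re-encode it as UTF-8. A minimal sketch; `reencode` is a hypothetical helper and the sample entry is made up. Simply removing null bytes only happens to work while the content is pure ASCII:

```python
def reencode(raw: bytes, source: str = "utf-16", target: str = "utf-8") -> bytes:
    """Decode bytes with their real encoding, then encode to the target."""
    return raw.decode(source).encode(target)

sample = '"Close" = "Schließen";\n'   # made-up entry with a non-ASCII letter
raw = sample.encode("utf-16")          # UTF-16 with a BOM

converted = reencode(raw)              # valid UTF-8, BOM consumed by the decoder

# Merely deleting null bytes leaves a lone 0xDF byte for the "ß",
# which is not valid UTF-8.
stripped = sample.encode("utf-16-le").replace(b"\x00", b"")
try:
    stripped.decode("utf-8")
    survived = True
except UnicodeDecodeError:
    survived = False
```

Note that the `"utf-16"` codec relies on the BOM to pick the byte order; for a file without one, passing `"utf-16-le"` or `"utf-16-be"` explicitly is the safer choice.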