Can't leverage or export Localizable.strings file


Jose Palomares

Feb 22, 2012, 2:58:54 PM
to iLocalize
Hi all,

I hope someone can shed some light on an issue I have. I got a new
bundle for an ongoing project and, after updating from it, I found
that my previous Localizable.strings file was not leveraged at all, so
100% of the strings now have to be translated again. No way (we are
talking about over 1,300 strings in that file alone).

The thing is, I checked the new Localizable.strings file against the
old one, and I get really weird diffs. A diff tool tells me that the
files are totally different, but what I see with the naked eye is that
only some strings were changed or added. It's as if the font or the
characters themselves were different, but, you know, this is plain
text.

Then I tried to export the new file as XLIFF (I also tried .strings)
to see how much leverage I could get through a CAT tool, and I found
that I cannot even export the file: I get an "Invalid character 0x0
detected in string "X"" error for each line in the file.

So my question is, do you have any idea what could be wrong with this
file? I've never seen this kind of issue before.

Thanks in advance,

Jose

Karl-Johan Norén

Feb 22, 2012, 3:06:24 PM
to iloc...@googlegroups.com
On 22 Feb 2012, at 20:58, Jose Palomares wrote:

> I hope someone can shed some light on an issue I have. I got a new
> bundle for an ongoing project and, after updating from it, I found
> that my previous Localizable.strings file was not leveraged at all, so
> 100% of the strings now have to be translated again. No way (we are
> talking about over 1,300 strings in that file alone).
>
> The thing is, I checked the new Localizable.strings file against the
> old one, and I get really weird diffs. A diff tool tells me that the
> files are totally different, but what I see with the naked eye is that
> only some strings were changed or added. It's as if the font or the
> characters themselves were different, but, you know, this is plain
> text.
>
> Then I tried to export the new file as XLIFF (I also tried .strings)
> to see how much leverage I could get through a CAT tool, and I found
> that I cannot even export the file: I get an "Invalid character 0x0
> detected in string "X"" error for each line in the file.


It's hard to make an educated guess without access to the files, but
might it be an improperly detected character encoding?

As a starting point, I'd check the original Localizable.strings
files, i.e. directly from the application bundle, without involving
iLocalize at all. Open them in TextWrangler or an equivalent editor
and check which encoding they use.
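
For readers without TextWrangler at hand, the encoding check suggested
above can be roughed out in a few lines of Python. This is only a
sketch: the function name `sniff_encoding` and the 0.3 threshold are
assumptions made for illustration, and a heuristic like this can be
fooled.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess a text encoding from raw bytes (hypothetical helper, heuristic only)."""
    # A byte order mark at the start of the file is the most reliable hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # Without a BOM, many NUL bytes suggest UTF-16: mostly-ASCII text stored
    # as UTF-16 has a 0x00 byte in every 2-byte code unit.
    if data and data.count(0) / len(data) > 0.3:  # 0.3 is an arbitrary cut-off
        # NULs at odd offsets point to little-endian, at even offsets to big-endian.
        odd_nuls = data[1::2].count(0)
        even_nuls = data[0::2].count(0)
        return "utf-16-le" if odd_nuls >= even_nuls else "utf-16-be"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

To try it, read the file in binary mode (`open(path, "rb").read()`) and
pass the bytes to the function; the result is a guess, not a verdict.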

--
Karl-Johan Norén karl-...@norensoversattningar.se
Noréns översättningar http://www.norensoversattningar.se
Box 334 070-324 92 05
SE-331 23 Värnamo +46(0)70-324 92 05
SWEDEN

Jose Palomares

Feb 22, 2012, 3:45:42 PM
to iLocalize
Thank you, Karl-Johan. I know it's difficult to say without the actual
file; I appreciate the attempt, though :)

That was my first guess, but TextWrangler claims it is just a regular
UTF-8 file (the old version was UTF-16), and TextEdit confirms this:
when I open it 'forcing' UTF-8, it works perfectly. And if I change
the encoding to something else (such as Mac OS Roman), I still get the
same result.

I just uploaded a screenshot of a comparison between the old (right)
and the new (left) files here: http://www.openmints.com/tempo/weird.png

It's really strange that the comparison results (bottom of the image)
show that all characters are different from the very beginning, but
the file is 99% the same.

Jose




Karl-Johan Norén

Feb 22, 2012, 4:50:03 PM
to iloc...@googlegroups.com
On 22 Feb 2012, at 21:45, Jose Palomares wrote:

> That was my first guess, but TextWrangler claims it is just a regular
> UTF-8 file (the old version was UTF-16), and TextEdit confirms this:
> when I open it 'forcing' UTF-8, it works perfectly. And if I change
> the encoding to something else (such as Mac OS Roman), I still get the
> same result.

Some things to check:

* Does a BOM make any difference?
* Show invisibles? (I once had a bug in AppleWorks that was due to
AppleGlot inserting a null byte in a string. I only found it in
ResEdit after I set it to show the length and hex values of every
string.)
* Ask the customer how the new file was generated.

You could perhaps work around the problem by exporting a TMX from the
old localised version and then using that to translate the new file
directly in a CAT tool, but if there is something structurally wrong
with the new file, it might cause trouble later on in the process in
iLocalize.

Looking closer at the diff, I note the following:

232,632 / 72,883 / 1,398
112,100 / 7,499 / 1,346

I.e., the new file has more than twice as many characters as the old
one, and almost ten times as many words. I think the UTF-16 to UTF-8
conversion went seriously awry somewhere, with plenty of invisible
characters included.
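
The roughly doubled character count is what one would expect if UTF-16
bytes were read one byte per character. A small Python sketch of that
effect, using a made-up sample line rather than anything from the
actual file:

```python
# A made-up .strings line; not taken from the file discussed in this thread.
sample = '"Cancel" = "Cancelar";'

# Store it as UTF-16 (little-endian, no BOM).
utf16_bytes = sample.encode("utf-16-le")

# Read those bytes back as if every byte were one character.
# Latin-1 is used because it maps every byte value to a character without errors.
misread = utf16_bytes.decode("latin-1")

print(len(sample))            # characters in the real text
print(len(misread))           # characters after the misreading: exactly double
print(misread.count("\x00"))  # the extra ones are all NUL characters
```

For plain ASCII text like this sample, every second character in the
misread version is an invisible NUL.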

Jose Palomares

Feb 22, 2012, 8:13:56 PM
to iLocalize
Hi again,

I'm glad to say that I sorted it out thanks to your suggestion. Thanks
so much for your help!

I had tried the BOM, which didn't make any difference, and using a TMX
wasn't a good option, as there were multiple translations for the same
source segments. And of course I asked the client, but you know, such
things take time :)

I suspected that maybe the spaces had been corrupted somehow, but you
got it right about the null byte thing. The difference in character
count was caused by exactly that: null bytes. When I turned on
invisibles in the file (banging my head against the wall for not
trying it before), this is what I got:

http://www.openmints.com/tempo/gremlins.png

The file was really messed up. Fortunately, TextWrangler has this cool
feature called 'Zap gremlins', and I could fix the file in one click.

Again, thank you very much for your help!!

Best,

Jose




Ulf Dunkel

Feb 23, 2012, 5:24:24 AM
to iloc...@googlegroups.com
Hi José.

> That was my first guess, but TextWrangler claims it is just a regular
> UTF-8 file (the old version was UTF-16), and TextEdit confirms this:
> when I open it 'forcing' UTF-8, it works perfectly. And if I change
> the encoding to something else (such as Mac OS Roman), I still get the
> same result.

Please NEVER EVER use TextEdit for editing plain text files. You'd
always be better off with TextWrangler. :-)


> I just uploaded a screenshot of a comparison between the old (right)
> and the new (left) files here: http://www.openmints.com/tempo/weird.png
>
> It's really strange that the comparison results (bottom of the image)
> show that all characters are different from the very beginning, but
> the file is 99% the same.

No, they are 99% different, because the new file is UTF-8, while the old
file is UTF-16. That's all, but that is important.
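
To illustrate the point: a byte-oriented comparison sees two different
byte sequences even when the text is identical. A minimal Python
sketch with a made-up line (the sample string is an assumption, not
taken from the real files):

```python
# Made-up sample text; the real files are not available in this thread.
text = '"Open" = "Abrir";\n'

as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")  # UTF-16, little-endian, no BOM

# Byte for byte, the two encodings have different lengths and contents,
# so a tool that compares raw bytes reports the files as different.
print(as_utf8 == as_utf16)  # False
print(len(as_utf8), len(as_utf16))

# Decoded with the right encoding each, the text is identical.
print(as_utf8.decode("utf-8") == as_utf16.decode("utf-16-le"))  # True
```

A diff tool that decodes both files correctly before comparing would
only show the strings that really changed.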

HTH,
---UlfDunkel

Ulf Dunkel

Feb 23, 2012, 5:25:45 AM
to iloc...@googlegroups.com
Hi José.

> I suspected that maybe the spaces had been corrupted somehow, but you
> got it right about the null byte thing. The difference in character
> count was caused by exactly that: null bytes. When I turned on
> invisibles in the file (banging my head against the wall for not
> trying it before), this is what I got:
>
> http://www.openmints.com/tempo/gremlins.png
>
> The file was really messed up. Fortunately, TextWrangler has this cool
> feature called 'Zap gremlins', and I could fix the file in one click.

The gremlins screenshot shows two things:
- It is a UTF-16 encoded file, opened in UTF-8 mode.
- You should start reading up on UTF to better understand the difference.

You'll manage - you're not stupid! :-)

---UlfDunkel

Jose Palomares

Feb 28, 2012, 5:24:01 AM
to iLocalize
Hi Ulf,

Thanks for the insightful additions. I meant that I used TextEdit to
see which encoding was detected automatically for the file. I would
never use it for anything other than RTFD documents; TextWrangler is my
lover :)

May I ask how you can tell the difference between UTF-8 and UTF-16 in
the picture? I understand the basic differences between the two quite
well, but I couldn't tell from the picture. Did you mean that each
question mark may indicate an unresolved double-byte character?

Again, thanks for your help. It's great to be back in the group and to
be able to learn from you guys.

Best,
Jose

Ulf Dunkel

Feb 29, 2012, 9:10:08 AM
to iloc...@googlegroups.com
Hi Jose.

Well, when you open a UTF-16 file in an editor which expects UTF-8 or
any ANSI encoding, it will show the high byte of each UTF-16 character
as a ? character or crap.
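
If the file really is UTF-16, as described above, a gentler repair
than deleting the stray bytes is to decode it as UTF-16 and write it
back as UTF-8. The sketch below assumes little-endian UTF-16 without a
BOM and uses a made-up sample; the helper name `utf16_to_utf8` is
hypothetical.

```python
def utf16_to_utf8(data: bytes, source_encoding: str = "utf-16-le") -> bytes:
    """Re-encode UTF-16 bytes as UTF-8 (hypothetical helper for illustration)."""
    return data.decode(source_encoding).encode("utf-8")


# Made-up sample with a non-ASCII character, stored as UTF-16.
original = '"Close" = "Schließen";'
raw = original.encode("utf-16-le")

# Proper conversion keeps every character intact.
converted = utf16_to_utf8(raw)
print(converted.decode("utf-8") == original)  # True

# Simply deleting the NUL bytes leaves a lone 0xDF byte for the "ß",
# which is not valid UTF-8 on its own.
stripped = raw.replace(b"\x00", b"")
try:
    stripped.decode("utf-8")
    print("stripped bytes decode as UTF-8")
except UnicodeDecodeError:
    print("stripped bytes are not valid UTF-8")
```

For text that is pure ASCII, deleting the NUL bytes gives the same
result as the conversion, which may be why a one-click clean-up can
appear to work.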

- - - - -

--
Bis bald / See you soon / A bientôt / Tot ziens / Ĝis revido

invers Software & DSD.net, Inh. Ulf Dunkel
dun...@icalamus.net - www.icalamus.net

iCalamus. Publish your ideas.

Mandatory information pursuant to § 37a HGB:
Registered office: DE-49624 Löningen
