KonOpas for Swedish translators' conference


Karl-Johan Norén

Feb 2, 2014, 4:13:45 PM
to konop...@googlegroups.com
I'm currently involved in a non-fannish con, more specifically the
yearly conference of the Swedish Association of Professional Translators.

My early efforts, built on KonOpas 0.5.0, can be seen here
(development/test location ONLY):

<http://www.norensoversattningar.se/konopas/>

I'm using Google Docs as the backend.

Yes, it has been localised to Swedish :-) Translating the HTML page
was no great hassle, but I found that I had to dive into the app.js
file as well to make a fully localised version, which is far from ideal.
The way the strings are built up also makes localisation very hard, with
lots of sentence fragments merged together. Given the current architecture,
I believe a translation to a non-Germanic language to be next to
impossible.

I also have trouble with the search function. I can search for exactly
two participants (Per Gustavsson and Patrik Hadenius), but the search only
finds some of their items. Does anyone have any clue about what could cause
this? (Given the size of the conference, though, search is hardly needed.)

I get an error parsing the name ranges using the KonOpas configurator
as well, with <http://www.norensoversattningar.se/konopas/data/people.js>
as the source. Could the use of mnemonic IDs (initials of the name,
instead of digits) be the source of this?

I've also looked into putting images of some of our participants into
the guide, but with no luck so far. All I get is a link when using an
absolute URL; a relative URL provides nothing at all. I have drawn the
conclusion (from looking at the Finncon programme) that getting this to
work will require editing app.js, which leaves me stumped on my own.

Overall, I managed to get things together reasonably quickly, and the
other committee members have been quite impressed.

Cheers,
Karl-Johan

--
Karl-Johan Norén karl-...@norensoversattningar.se
Noréns översättningar http://www.norensoversattningar.se
Sjöåkravägen 40C 036-377 201
SE-564 31 Bankeryd +46(0)36-377 201
SWEDEN

Eemeli Aro

Feb 3, 2014, 5:16:20 PM
to konop...@googlegroups.com
On 2 February 2014 23:13, Karl-Johan Norén
<karl-...@norensoversattningar.se> wrote:
> Yes, it has been localised to Swedish :-) Translating the HTML page
> was no great hassle, but I found that I had to dive into the app.js
> file as well to make a fully localised version, which is far from ideal.
> The way the strings are built up also makes localisation very hard, with
> lots of sentence fragments merged together. Given the current architecture,
> I believe a translation to a non-Germanic language to be next to
> impossible.

You're right, I haven't given all that much thought (yet) to
localization, and I really should. However, as you found, the
"natural language" output used in the filter summary especially is
likely to require an entirely separate generator for different
languages.

Would you be willing to share your modified app.js, so I could look
into doing this? It'd be much easier with two languages to work with,
rather than just one...

> I also have trouble with the search function. I can search for exactly
> two participants (Per Gustavsson and Patrik Hadenius), but the search only
> finds some of their items. Does anyone have any clue about what could cause
> this? (Given the size of the conference, though, search is hardly needed.)

That's a bug, actually. Thank you for pointing it out! I've attached a
one-line patch for 0.5.0 that should fix it, and I'll release a proper
0.5.1 or 0.6.0 soonish to point to the current master branch. There
was a pile of stuff I did in prep for Arisia that ought to be
released.

> I get an error parsing the name ranges using the KonOpas configurator
> as well, with <http://www.norensoversattningar.se/konopas/data/people.js>
> as the source. Could the use of mnemonic IDs (initials of the name,
> instead of digits) be the source of this?

Nope, that was me not accounting for the case of having fewer
participants in people.js than the "names per screenful" figure, which
got handled incorrectly. Fixed now:
http://konopas.org/util/config#http://www.norensoversattningar.se/konopas/data/people.js
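
(For the record, the class of bug was roughly the following; an
illustrative sketch with made-up data and names, not the actual
KonOpas code:)

    // made-up data: fewer participants than one "screenful"
    var people = [{ name: 'Gustavsson, Per' }, { name: 'Hadenius, Patrik' }];
    var namesPerScreen = 25;
    var start = 0;
    // without the clamp, people[start + namesPerScreen - 1] is
    // undefined, and reading .name from it throws
    var i = Math.min(start + namesPerScreen, people.length) - 1;
    var lastName = people[i].name;  // "Hadenius, Patrik"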

> I've also looked into putting images of some of our participants into
> the guide, but with no luck so far. All I get is a link when using an
> absolute URL; a relative URL provides nothing at all. I have drawn the
> conclusion (from looking at the Finncon programme) that getting this to
> work will require editing app.js, which leaves me stumped on my own.

The second attached patch should enable inline images for 0.5.0, and
also disable the image URL cleanup that was breaking relative paths
for them. I had disabled inline images for LoneStarCon, the last
dataset I've worked with that was supposed to have images; that data
never got cleaned up to a state where the images were useful, and I
never remembered to re-enable the feature...

> Overall, I managed to get things together reasonably quickly, and the
> other committee members have been quite impressed.

Excellent!

eemeli
konopas_0.5.0_people_query.patch
konopas_0.5.0_part_images.patch

Karl-Johan Norén

Feb 3, 2014, 5:56:28 PM
to konop...@googlegroups.com
On 3 Feb 2014, at 23:16, Eemeli Aro <eem...@gmail.com> wrote:

> You're right, I haven't given all that much thought (yet) to
> localization, and I really should. However, as you found, the
> "natural language" output used in the filter summary especially is
> likely to require an entirely separate generator for different
> languages.

The alternative would be to have separate language files, either served
separately, integrated through a preprocessor, or simply concatenated with
the other JavaScript files.

> Would you be willing to share your modified app.js, so I could look
> into doing this? It'd be much easier with two languages to work with,
> rather than just one...

Sure. Attached.

One example of where your chosen method of building strings shows its
limits is plural forms. In English, plurals are almost always formed by
adding "s". But in Swedish we have:

1 timme - 2 timmar

Some languages make things even more complex by having a different plural
form for 2 than for other numbers, but that's a bridge to cross later.

Generally speaking, once you start building sentences out of fragments
(variables are fine), localisation becomes much, much harder and at
times impossible.
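
To make that concrete, a made-up illustration (not actual KonOpas
code) of why fragment-building breaks down:

    // English-centric fragment style: only a suffix varies
    var n = 2;
    var en = n + ' hour' + (n === 1 ? '' : 's');        // "2 hours"

    // Swedish changes the word form itself ("timme" -> "timmar"), so
    // no appended suffix can be right; the whole form has to be
    // selected, and each language needs its own selection rules
    var sv = n + ' ' + (n === 1 ? 'timme' : 'timmar');  // "2 timmar"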

[ snip ]

Thanks for the fixes!
app-sv.js.zip

Eemeli Aro

Feb 9, 2014, 7:31:42 PM
to konop...@googlegroups.com

I think I've figured out a decent solution for internationalizing KonOpas, and e.g. http://dev.konopas.org now mostly speaks Swedish. The source file for that translation is http://dev.konopas.org/i18n/sv.json, which is preprocessed into http://dev.konopas.org/src/i18n.js. I have an English alternative as well, but figured the Swedish would be better for effect. :)
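
(As a guess at the shape, not the actual generated file: the preprocessing step compiles each JSON string into a plain JavaScript function, so nothing needs to be parsed at runtime. A hand-simplified sketch with a made-up key:)

    // hand-simplified; the real compiled output also implements the
    // locale's plural rules
    var i18n = {
      item_count: function (d) {
        return d.N === 1 ? '1 post' : d.N + ' poster';
      }
    };
    i18n.item_count({ N: 2 });  // "2 poster"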

The markup language used is ICU MessageFormat, for which I found a great preprocessor (https://github.com/SlexAxton/messageformat.js), which I've now hacked at quite a bit this weekend so that the output is as clean as you see above. The changes aren't merged into the official repo yet, but you can get my latest version here: https://github.com/eemeli/messageformat.js/tree/functions

I picked MessageFormat because it seems both simple and powerful. It handles three different variable substitution patterns, using VAR as an example variable:

{VAR} gets replaced with the string value of the variable.

{VAR, select, a{first} bee{some thing} other{silly {VAR}-ness}} is pretty much a switch statement. Note the unlimited recursion and repetition.

{VAR, plural, =0{nada} one{just one} other {plenty, # in fact}} assumes that it's getting a number, and maps the choices according to locale-specific rules about how plural forms are handled (the output construction can be different for different languages). # gets replaced by the number itself.
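
To make these concrete, here's a small sketch of how the compiled messages behave; it assumes the MessageFormat constructor and compile() from the messageformat.js README, and the message strings themselves are made up:

    var MessageFormat = require('messageformat');
    var mf = new MessageFormat('sv');

    // plural: '#' is replaced by the number, and one/other follow the
    // locale's plural rules (Swedish here)
    var hours = mf.compile('{N, plural, one{# timme} other{# timmar}}');
    hours({ N: 1 });    // "1 timme"
    hours({ N: 2 });    // "2 timmar"

    // select: a switch on the string value of the variable
    var who = mf.compile('{VAR, select, a{first} other{silly {VAR}-ness}}');
    who({ VAR: 'a' });  // "first"
    who({ VAR: 'b' });  // "silly b-ness"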

I'll see if I can put up an online instance of the preprocessor under konopas.org/util/.

Eemeli


Karl-Johan Norén

Feb 10, 2014, 6:19:58 AM
to konop...@googlegroups.com
On 10 Feb 2014, at 01:31, Eemeli Aro <eem...@gmail.com> wrote:

> I think I've figured out a decent solution for internationalizing KonOpas, and e.g. http://dev.konopas.org now mostly speaks Swedish. The source file for that translation is http://dev.konopas.org/i18n/sv.json, which is preprocessed into http://dev.konopas.org/src/i18n.js. I have an English alternative as well, but figured the Swedish would be better for effect. :)

Looks doable, and having all the strings available at once is great (in fact,
it points me at some strings I missed in my localisation). Main issue I see
right now is that the sv.json file contains some strings that are missing
from en.json.

Speaking as a translator, I think I'd also prefer to have the strings split
between the block tags (<p>, <div>, and so on). Especially if one uses a CAT
tool, having several levels of interpretation always causes trouble - it's
our equivalent to binary blobs.

I assume the i18n.js file is concatenated into konopas.min.js on deployment.

> I'll see if I can put up an online instance of the pre- processor under konopas.org/util/.

Would be great!

Eemeli Aro

Feb 10, 2014, 6:42:24 AM
to konop...@googlegroups.com
On 10 February 2014 13:19, Karl-Johan Norén
<karl-...@norensoversattningar.se> wrote:
> On 10 Feb 2014, at 01:31, Eemeli Aro <eem...@gmail.com> wrote:
>
>> I think I've figured out a decent solution for internationalizing KonOpas, and e.g. http://dev.konopas.org now mostly speaks Swedish. The source file for that translation is http://dev.konopas.org/i18n/sv.json, which is preprocessed into http://dev.konopas.org/src/i18n.js. I have an English alternative as well, but figured the Swedish would be better for effect. :)
>
> Looks doable, and having all the strings available at once is great (in fact,
> it points me at some strings I missed in my localisation). Main issue I see
> right now is that the sv.json file contains some strings that are missing
> from en.json.

All the missing Swedish translations should have a "---" prefix to make
them findable. The shorter strings are missing from the English
version because I'm using a wrapper that returns the key if no
translation is found, and it's shorter this way. Obviously I'll have
to document this, but do you see it as a problem for actual
implementations? I could also provide a version with the complete set
of strings in English, in parallel with this one.
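
(As a sketch of that wrapper idea, with made-up names rather than the
actual KonOpas source:)

    // i18n maps message keys to compiled message functions, e.g.
    var i18n = { some_key: function () { return 'någon sträng'; } };
    // unknown keys fall back to the key text itself
    function tr(key, params) {
      var fn = i18n[key];
      return fn ? fn(params || {}) : key;
    }
    tr('some_key');  // "någon sträng"
    tr('Starred');   // "Starred" (no translation, key shown as-is)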

As an alternative, I could keep the full set in the JSON files, and
strip them out during the pre-compilation. However, that'd be a
further breaking change from how messageformat.js currently works, so
I'd rather get it fixed at the source before doing so.

> Speaking as a translator, I think I'd also prefer to have the strings split
> between the block tags (<p>, <div>, and so on). Especially if one uses a CAT
> tool, having several levels of interpretation always causes trouble - it's
> our equivalent to binary blobs.

Do you mean the way star_export and star_import are currently defined,
or also the mess that filter_sum turned into? The first ones are easy
to fix; fixing the latter would mean splitting the sentence
construction between MessageFormat and JavaScript, which I'd prefer
to avoid.

Also, as I'm no translator, how bad is MessageFormat compared to, say,
gettext, or other alternatives? Do you know of any tools that would be
useful for working with the file format used here?

> I assume the i18n.js file is concatenated into konopas.min.js on deployment.

Yes. If you're doing this yourself, the build.sh script now includes
an option for compiling the JSON into JS (./build.sh -l sv), and
includes the i18n.js file when minifying (./build.sh -j).

eemeli

Gareth Kavanagh

Feb 10, 2014, 6:49:21 AM
to konop...@googlegroups.com
A quick question: how are you handling sorting for characters that do
not exist in the English set?

I noticed previously that if you sort or split on, say, Ó, it basically
vanishes from the list.

Gareth


Eemeli Aro

Feb 10, 2014, 7:17:22 AM
to konop...@googlegroups.com
On 10 February 2014 13:49, Gareth Kavanagh <omeg...@gmail.com> wrote:
> A quick question: how are you handling sorting for characters that do
> not exist in the English set?
>
> I noticed previously that if you sort or split on, say, Ó, it basically
> vanishes from the list.

Ah. You're right, it does. The sorting of names is currently by char
code value; A is 65, Z is 90, and Ó is 211. What's happening, therefore,
isn't about the characters being non-English; it's about their
alphabetic order not matching their char code order.

I'll look into this, and probably switch to using localeCompare()
instead, but I may need to consider older browsers carefully as well,
since passing it a locale argument is a rather recent feature.
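
For example (made-up names; the expected orderings assume typical
Swedish collation):

    var names = ['Órla', 'Anna', 'Zack'];

    // the default sort compares char codes, so 'Ó' (211) lands after
    // 'Z' (90)
    names.slice().sort();
    // -> ['Anna', 'Zack', 'Órla']

    // locale-aware comparison; the locales argument is the ECMA-402
    // part that older browsers lack
    names.slice().sort(function (a, b) { return a.localeCompare(b, 'sv'); });
    // -> ['Anna', 'Órla', 'Zack']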

eemeli

Gareth Kavanagh

Feb 10, 2014, 7:25:27 AM
to konop...@googlegroups.com
Thanks,

It was one of the few bugs I found, and then forgot all about until this language thing came up.




Karl-Johan Norén

Feb 10, 2014, 7:56:58 AM
to konop...@googlegroups.com
On 10 Feb 2014, at 12:42, Eemeli Aro <eem...@gmail.com> wrote:

> All the missing Swedish translations should have a "---" prefix to make
> them findable. The shorter strings are missing from the English
> version because I'm using a wrapper that returns the key if no
> translation is found, and it's shorter this way. Obviously I'll have
> to document this, but do you see it as a problem for actual
> implementations? I could also provide a version with the complete set
> of strings in English, in parallel with this one.

Well, any localiser will probably start from the English file, and if that
one is missing strings, that's cause for much swearing and complaining to
the developer (says I as an experienced software localiser).

Having a "default" smaller English file for local builds is OK, but there
needs to be a canonical full set of strings for the localiser to work
with.

It also makes it easier to put things into CAT tools, if the language
files are of a uniform type with the English text exposed.

I.e. the following are good:

"some string": "some string"
"key": "keyed string"

The following is not so good:

"some string": ""
"key": keyed string"

>> Speaking as a translator, I think I'd also prefer to have the strings split
>> between the block tags (<p>, <div>, and so on). Especially if one uses a CAT
>> tool, having several levels of interpretation always causes trouble - it's
>> our equivalent to binary blobs.
>
> Do you mean the way star_export and star_import are currently defined,
> or also the mess that filter_sum turned into? The first ones are easy
> to fix; fixing the latter would mean splitting the sentence
> construction between MessageFormat and JavaScript, which I'd prefer
> to avoid.

It was star_export and star_import I was thinking of.

filter_sum is hard to parse, but I think that one is best taken care of
with an example or two.

> Also, as I'm no translator, how bad is MessageFormat compared to, say,
> gettext, or other alternatives? Do you know of any tools that would be
> useful for working with the file format used here?

gettext might have an edge since there are more and more mature tools
for .po files, but most modern CAT tools can easily parse sane string
files, as long as they're uniform.

Eemeli Aro

Feb 10, 2014, 10:38:05 AM
to konop...@googlegroups.com
On 10 February 2014 14:56, Karl-Johan Norén
<karl-...@norensoversattningar.se> wrote:
> On 10 Feb 2014, at 12:42, Eemeli Aro <eem...@gmail.com> wrote:
>
>> All the missing Swedish translations should have a "---" prefix to make
>> them findable. The shorter strings are missing from the English
>> version because I'm using a wrapper that returns the key if no
>> translation is found, and it's shorter this way. Obviously I'll have
>> to document this, but do you see it as a problem for actual
>> implementations? I could also provide a version with the complete set
>> of strings in English, in parallel with this one.
>
> Well, any localiser will probably start from the English file, and if that
> one is missing strings, that's cause for much swearing and complaining to
> the developer (says I as an experienced software localiser).
>
> Having a "default" smaller English file for local builds is OK, but there
> needs to be a canonical full set of strings for the localiser to work
> with.
>
> It also makes it easier to put things into CAT tools, if the language
> files are of a uniform type with the English text exposed.
>
> I.e. the following are good:
>
> "some string": "some string"
> "key": "keyed string"
>
> The following is not so good:
>
> "some string": ""
> "key": keyed string"

I put together the tool I mentioned; it's now here:

http://konopas.org/util/i18n/

That page also links to a full/canonical English-language version of
en.json, which should have all the strings with the full English texts
included.

>>> Speaking as a translator, I think I'd also prefer to have the strings split
>>> between the block tags (<p>, <div>, and so on). Especially if one uses a CAT
>>> tool, having several levels of interpretation always causes trouble - it's
>>> our equivalent to binary blobs.
>>
>> Do you mean the way star_export and star_import are currently defined,
>> or also the mess that filter_sum turned into? The first ones are easy
>> to fix; fixing the latter would mean splitting the sentence
>> construction between MessageFormat and JavaScript, which I'd prefer
>> to avoid.
>
> It was star_export and star_import I was thinking of.

I cleaned up almost all tags from the JSON, and split a few of the
more complex rules into parts. I had to leave most of the <b> and <a>
tags in, but they should be pretty easy to parse. star_import and
star_export in particular are now five separate strings.

> filter_sum is hard to parse, but I think that one is best taken care of
> with an example or two.

Yeah, that single string is the main reason I went with this type of
i18n solution, given its complexity. :) If JSON supported line
breaks in strings, I could enter it like this:

"filter_sum":
"Listing { N, plural,
one { one {TAG} item }
other { {ALL} # {TAG} items }
} { GOT_DAY, select,
true { on {DAY} }
other {}
} { GOT_AREA, select,
true { in {AREA} }
other {}
} { GOT_Q, select,
true { matching the query {Q} }
other {}
}"

Or even more simply if messageformat.js supported some kind of
selector for empty strings, but it doesn't.
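
(To make the parameters concrete: a hypothetical call, assuming the
compiled messages end up on an i18n object, with made-up argument
values:)

    // a select arm matches on the string value, so anything other
    // than 'true' here falls through to the empty other{} arm
    i18n.filter_sum({
      N: 5, TAG: 'literature', ALL: 'all',
      GOT_DAY: 'true', DAY: 'Friday',
      GOT_AREA: 'false', AREA: '',
      GOT_Q: 'false', Q: ''
    });
    // -> roughly "Listing all 5 literature items on Friday"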

>> Also, as I'm no translator, how bad is MessageFormat compared to, say,
>> gettext, or other alternatives? Do you know of any tools that would be
>> useful for working with the file format used here?
>
> gettext might have an edge since there are more and more mature tools
> for .po files, but most modern CAT tools can easily parse sane string
> files, as long as they're uniform.

Interesting; this isn't a field I've ever done anything in before.
Could you point me to some representative tool that I could play
around with, that runs on Linux?

eemeli

Karl-Johan Norén

Feb 10, 2014, 11:43:02 AM
to konop...@googlegroups.com
On 10 Feb 2014, at 16:38, Eemeli Aro <eem...@gmail.com> wrote:

> I put together the tool I mentioned, it's now here:
> http://konopas.org/util/i18n/

Looks good!

> I cleaned up almost all tags from the JSON, and split a few of the
> more complex rules into parts. I had to leave most of the <b> and <a>
> tags in, but they should be pretty easy to parse. star_import and
> star_export in particular are now five separate strings.

Yeah, those should be included. The basic rule is to expose the inline
elements, while the block elements are either stripped out or
limited to a single block.

>> gettext might have an edge since there are more and more mature tools
>> for .po files, but most modern CAT tools can easily parse sane string
>> files, as long as they're uniform.
>
> Interesting; this isn't a field I've ever done anything in before.
> Could you point me to some representative tool that I could play
> around with, that runs on Linux?

Take a look at OmegaT. It's open source and written in Java. Wordfast
is the most lightweight of the commercial tools, and its demo mode
is quite full-featured (only limit is 500 translation segments in a
single TM). It's also available on Linux (and a web version).

Most translators and localisers I know don't work with OmegaT, but it
has the same basic working mechanism as commercial CAT tools - determine
translatable text, divide the text into segments, match untranslated
segments against a translation memory, and help the translator to
keep track of terminology and earlier translations.

Eemeli Aro

Feb 14, 2014, 3:28:42 AM
to konop...@googlegroups.com
On 10 February 2014 18:43, Karl-Johan Norén
<karl-...@norensoversattningar.se> wrote:
> On 10 Feb 2014, at 16:38, Eemeli Aro <eem...@gmail.com> wrote:
>>> gettext might have an edge since there are more and more mature tools
>>> for .po files, but most modern CAT tools can easily parse sane string
>>> files, as long as they're uniform.
>>
>> Interesting; this isn't a field I've ever done anything in before.
>> Could you point me to some representative tool that I could play
>> around with, that runs on Linux?
>
> Take a look at OmegaT. It's open source and written in Java. Wordfast
> is the most lightweight of the commercial tools, and its demo mode
> is quite full-featured (only limit is 500 translation segments in a
> single TM). It's also available on Linux (and a web version).
>
> Most translators and localisers I know don't work with OmegaT, but it
> has the same basic working mechanism as commercial CAT tools - determine
> translatable text, divide the text into segments, match untranslated
> segments against a translation memory, and help the translator to
> keep track of terminology and earlier translations.

I played around a bit with OmegaT, and at least out of the box it
wasn't very good at reading messageformat.js JSON files. On the other
hand, the interface seems a bit over-complex, so I may have just missed
something. In any case, I added a couple of scripts for converting
between the JSON and gettext .PO-like files here:
https://github.com/eemeli/konopas/tree/master/i18n

json2po and po2json are simple sed scripts, and can only understand
msgid/msgstr .PO pairs. Both take stdin or a filename as input, and
write to stdout. The MessageFormat {variables} are left as they are in
the msgstr values, but at least that file format should be easier for
any standard software to segment. Are they potentially useful, or just
silly?
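
To illustrate the mapping, with a made-up key: a JSON entry like

    "found_shown": "Showing {N, plural, one{# item} other{# items}}"

comes out of json2po as

    msgid "found_shown"
    msgstr "Showing {N, plural, one{# item} other{# items}}"

and po2json reverses it.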

Also, thank you, Karl-Johan, for the Swedish translation; I added the
full set of strings you sent.

eemeli