* Lionel English [2010-05-24 16:03]:
> There is a "rule" in the GCD that comics should be indexed in their
> original language, but it's always also been an option to index the
> comic in English, [...]
> Does this present a problem in terms of translation?
Well, of course: if you want to present the users a piece of text as
German when they ask for that language, you have to know it's actually
German ;-) So, we either need a way to explicitly tag text in the DB
with language information, or some set of conventions to reliably tell
the language in the absence of tagging.
> Are our (theoretical) tools smart enough to realize that the existing
> text is actually in English rather than French?
There is existing code that can identify the language from the
statistical properties of the text. I've bookmarked TextCat
(http://odur.let.rug.nl/~vannoord/TextCat/), which also has a Python
implementation. But we don't want to identify the language every time:
we should do it once, when we reach the stage where we offer the
option of having the same index in multiple languages, and thereafter
have the indexers specify what language they're working in.
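
To give an idea of what "statistical properties" means here, this is a toy sketch of the character n-gram ranking approach that tools of this kind are generally based on. It is not TextCat's actual code or API: the function names, the two sample sentences and the profile size are all made up for illustration, and a real profile would be trained on far more text.

```python
from collections import Counter


def ngram_profile(text, n=3, top=300):
    """Return the most frequent character n-grams of text, most common first."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    return [gram for gram, _ in counts.most_common(top)]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams unknown to the language get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, profiles):
    """Pick the language whose profile is closest to the profile of text."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Hypothetical, tiny training samples; real profiles need much larger corpora.
SAMPLES = {
    "en": "the hero returns to the city and fights the villain in the street",
    "fr": "le héros revient dans la ville et combat le méchant dans la rue",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}
```

For the one-time pass over existing data, something like guess_language(synopsis, PROFILES) would be run per text field and the result stored as a language tag, ideally with a human checking the doubtful cases.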
Probably the language an indexer is using should be specified implicitly
but with a way to override when needed - they shouldn't constantly be
asked for this when 99% of the time they'll have some personal
preferences. For example, everything in English, or everything in
English except for Greek books where I always use Greek, or maybe for
Greek books I want to enter synopses and notes both in Greek and
English (the last would be nice for me personally - I don't know how
many others would want to do that though). Always having the option of
English is good, by the way, since it's the current international
language...
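
A minimal sketch of such implicit preferences, just to make the idea concrete. The IndexerPrefs class, its field names and the entry_languages function are hypothetical and not part of any existing GCD code; languages are plain ISO 639-1 codes.

```python
from dataclasses import dataclass, field


@dataclass
class IndexerPrefs:
    # Language the indexer works in when nothing more specific applies.
    default_language: str = "en"
    # Book language -> languages the indexer wants to enter text in
    # for books in that language (hypothetical structure).
    per_book_language: dict = field(default_factory=dict)


def entry_languages(prefs, book_language, override=None):
    """Return the languages to offer for text entry on a given book."""
    if override:
        # An explicit choice for this one index always wins.
        return list(override)
    return list(prefs.per_book_language.get(book_language, [prefs.default_language]))


# "Everything in English, except Greek books, where I enter Greek and English."
prefs = IndexerPrefs(default_language="en", per_book_language={"el": ["el", "en"]})
```

With these preferences, entry_languages(prefs, "it") gives only English, entry_languages(prefs, "el") gives Greek and English, and passing override=["fr"] replaces either for a single index.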
> For Types and Genres, I know there are some genres (I'm thinking
> specifically of manga) where the original language genre name is used in
> other countries, rather than a translated name. For example, yaoi manga are
> more commonly known in the US as yaoi rather than as ... "gay love stories
Right - another example is the specifically Italian genre of "giallo" (a
mix of crime, mystery and horror), or the French "polar" (a mix of
police and noir). I don't think this will be much of a problem if we
implement multilingual Types and Genres. We can just display the
language requested by the user, with a fall-back to the book's language
or to English.
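
Roughly, that fall-back could look like the sketch below. The GENRES table and the display_genre function are invented for illustration, and the order "user's language, then book's language, then English" is only one of the possible choices.

```python
# Hypothetical table: genre id -> {language code: name in that language}.
GENRES = {
    "superhero": {"en": "Superhero", "it": "supereroi"},
    "giallo": {"it": "giallo"},
    "yaoi": {"ja": "やおい", "en": "Yaoi"},
}


def display_genre(genre_id, user_language, book_language, last_resort="en"):
    """Return the genre name in the first language of the chain that has one."""
    names = GENRES[genre_id]
    for lang in (user_language, book_language, last_resort):
        if lang in names:
            return names[lang]
    # Nothing in the chain matched: show whatever name exists.
    return next(iter(names.values()))
```

Putting last_resort before book_language in the tuple would prefer the English name over the original one whenever the user's own language is missing.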
For example, for a common genre like, say, "Superhero": Let's say I'm an
English speaker and I end up in an Italian book where the Italian
indexer specified "supereroi" - no problem, it's displayed in English
because at some point we specified these as the same genre in two
different languages.
Now let's say I'm still browsing in English, and I view an Italian book
where the indexer specified the genre "giallo", and there is no English
version of this genre. I'll get the original Italian word - it might be
a bit confusing, but there's no great harm done, and I might just Google
it and learn its meaning.
Now, let's say I'm browsing in Greek, and I end up in the index of a
Japanese manga where the Japanese indexer selected "やおい" (yaoi) as
the genre. Furthermore, let's say the database contains "Yaoi" as the
English translation of this genre, but no Greek translation.
We have two options: show the original Japanese genre, or fall back to
the English translation as more readable to an international audience.
I'm not sure what's the best course... Browsers can be configured to
send an "Accept-Language" header, by which they specify a list of
languages to the server in order of preference. So, if I've configured
my browser correctly to ask preferably for Greek, then English, then
French, the site can deduce what to do. I'm not sure we want to honour
this preference field by field in this case, though - it seems like it would
take an awful lot of coding work for not much gain.
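
For reference, reading that header is the easy part. Here is a simplified sketch of turning an Accept-Language value into an ordered list of language tags; parse_accept_language is a made-up helper, and it ignores finer points of the HTTP specification such as wildcard tags.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, most preferred first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]
```

A browser configured for Greek, then English, then French would typically send something like "el, en;q=0.8, fr;q=0.5", which this turns into the list el, en, fr. The resulting list could then serve as the fall-back chain for any translated field.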
An example with creator names: Vasilis Lolos has published books with
US publishers, where he was credited like this, but also with Greek
publishers, where he was credited with the native version of his name
("Βασίλης Λώλος"). Should we always show his name as it appeared on the
book, or always in the preferred language of the user? And what if
someone is using the site in German? Do we display the English or the
Greek version? Maybe we should always ask indexers to enter a Latin
transliteration of the name when entering a new creator whose name isn't
written in the Latin alphabet...
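
One possible policy, sketched only to show how a stored transliteration could be used. The CreatorName class, the READABLE_SCRIPTS table and the display rule itself are assumptions for the sake of the example, not a proposal for the actual schema.

```python
from dataclasses import dataclass


@dataclass
class CreatorName:
    as_credited: str  # the name exactly as printed on the book
    script: str       # ISO 15924 script code, e.g. "Latn" or "Grek"
    latin: str = ""   # transliteration entered by the indexer, if any


# Hypothetical table: scripts a user of each site language is assumed to read.
READABLE_SCRIPTS = {"en": {"Latn"}, "de": {"Latn"}, "el": {"Grek", "Latn"}}


def display_name(name, user_language):
    """Show the credited name, adding the transliteration when it helps."""
    readable = READABLE_SCRIPTS.get(user_language, {"Latn"})
    if name.script in readable or not name.latin:
        return name.as_credited
    return f"{name.latin} ({name.as_credited})"


lolos_greek = CreatorName("Βασίλης Λώλος", "Grek", "Vasilis Lolos")
lolos_us = CreatorName("Vasilis Lolos", "Latn")
```

Under this rule a German user sees "Vasilis Lolos (Βασίλης Λώλος)" on the Greek books and plain "Vasilis Lolos" on the US ones, while a Greek user sees each name as it was credited.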
I think we should revisit these questions when and if we implement
multilingual display for those fields, after seeing if it's a problem in
practice.
Alexandros