I just finished a five-month-long, 350,000-character project, during
which I used Felix from start to finish with the aim of maintaining
consistency of terminology. I think of Felix mainly as a tool for
maintaining quality rather than increasing throughput, a view that
stems mainly from the nonrepetitive nature of most of the work I do.
In this respect, Felix really works wonderfully for me, as it
enables me to go back and dredge up expressions that I translated
some time ago with practically no effort at all.
This means that what I really find most useful about Felix is its
ability to search both translation memories for past collocations and
glossaries for standard vocabulary. One issue I have found, however, is
that using the same search algorithm for both translation memories and
glossaries has its limitations.
It appears to a layman like myself that Felix calculates the "accuracy"
of Japanese-language search results simply by dividing the number of
characters that a given target string has in common with the search
string by the total number of characters in the search string. So, if
there are 10 characters in the search string, and three of those
characters are found in the target string, then the accuracy is 30%.
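To make my guess concrete, here is a rough Python sketch of the calculation as I understand it. It is only my reading of the behavior, not Felix's actual code, and the function name char_overlap_score is something I made up for illustration.

```python
from collections import Counter


def char_overlap_score(query: str, candidate: str) -> float:
    """Fraction of the query's characters that also occur in the candidate.

    Assumption: order and position are ignored entirely, which is how the
    current behavior appears to me from the outside.
    """
    if not query:
        return 0.0
    # Counter intersection keeps, for each character, the smaller of the
    # two counts, i.e. the number of characters the strings share.
    common = Counter(query) & Counter(candidate)
    return sum(common.values()) / len(query)
```

With a 10-character search string of which three characters occur in the target, this gives 0.3. Note that "ABCD" and "DCBA" score 1.0 against each other, because sequence plays no part.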
While I find this approach more or less acceptable for searching
translation memories, I find it wholly inadequate for searching
glossaries, for the simple reason that no weight is placed on the
sequence or relative position of the characters within the string.
A relatively high accuracy setting filters out far too many relevant
hits, while a relatively low accuracy setting allows far too many
spurious hits. The old "feast or famine" syndrome.
What I would like to suggest to improve this situation is that the
search paradigm for glossary items be enhanced so that the user can
select the number of characters to use as a base unit.
You probably already understand what I'm getting at, but just to explain
it more clearly:
At the moment, if the search string is ABCD, the search paradigm
apparently searches for all As, then all Bs, then all Cs, and finally
all Ds, which is to say, it uses a base unit of one character. What I'm
suggesting is that, if the user were to specify two characters as the
base unit, the paradigm should search for all ABs, all BCs, and all CDs.
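In case it helps, the base-unit idea can be sketched in Python roughly as follows. The names ngrams and ngram_score and the parameter n (the base unit) are mine, purely for illustration, and the scoring rule is an assumption that simply mirrors the character-counting approach described earlier.

```python
from collections import Counter


def ngrams(text: str, n: int) -> list[str]:
    """All overlapping substrings of length n: 'ABCD', 2 -> AB, BC, CD."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_score(query: str, candidate: str, n: int = 2) -> float:
    """Fraction of the query's n-character units found in the candidate."""
    query_units = ngrams(query, n)
    if not query_units:
        # The query is shorter than the base unit; a real implementation
        # would probably fall back to a smaller unit here.
        return 0.0
    common = Counter(query_units) & Counter(ngrams(candidate, n))
    return sum(common.values()) / len(query_units)
```

With a base unit of one, "ABCD" against "DCBA" scores 1.0; with a base unit of two it scores 0.0, because none of AB, BC, or CD appears in the reversed string.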
I don't know if this kind of approach would make any difference when
searching in alphabet-based languages, but I think you will recognize
immediately that it would probably improve accuracy for things like
yoji-jukugo, since four-character strings in CJK languages can quite
often be parsed as two two-character strings.
Obviously, this doesn't improve relevance for 100% matches, but I think
it will produce significantly more relevant results for matches that are
less than 100%, especially those in the 50% or higher range.
I'm sure it would take a little bit of experimentation before it could
be implemented successfully, but if something like this were available
for glossary searches, I think it would make the results much more relevant
for matches of less than 100%. I don't know if it would produce any
improvements for longer strings found in translation memories, although
I tend to think it would, especially if different settings could be
provided for kanji and kana. In other words, a base unit of two
characters for kanji words, but a base unit of three or more characters
for hiragana and katakana would probably increase the relevance of
search results in translation memories too.
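A rough sketch of what per-script base units might look like, again in Python. Everything here is hypothetical: the BASE_UNIT table, the function names, and the decision to split the text into runs of the same script before cutting it into units. The script test uses only the basic Unicode blocks (CJK Unified Ideographs, Hiragana, Katakana), so rarer kanji in the extension blocks would fall under "other".

```python
from itertools import groupby

# Hypothetical user settings: base unit (in characters) per script.
BASE_UNIT = {"kanji": 2, "hiragana": 3, "katakana": 3, "other": 1}


def script_of(ch: str) -> str:
    """Classify one character by its Unicode block (basic blocks only)."""
    code = ord(ch)
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    return "other"


def script_aware_units(text: str) -> list[str]:
    """Split text into same-script runs, then cut each run into units."""
    units = []
    for script, run in groupby(text, key=script_of):
        segment = "".join(run)
        # A run shorter than its base unit is kept whole as a single unit.
        n = min(BASE_UNIT[script], len(segment))
        units.extend(segment[i:i + n] for i in range(len(segment) - n + 1))
    return units
```

For example, 用語集の検索 would be cut into 用語, 語集, の, and 検索, and those units could then be compared in the same way as single characters are now.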
Also, one more suggestion that is ancillary to the first is to provide
the user with the capability to customize the way that Felix lists
(sorts) the results in the glossary window. Sometimes I find that there
are 100% matches way down at the bottom of a long list of spurious
matches. Allowing the user to sort results by prefix match (前方一致),
suffix match (後方一致), or some other condition (à la Jamming) would
also produce a considerable improvement in usability.
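As an illustration of the kind of ordering I have in mind, here is a small Python sketch. The ranking (exact match, then prefix match, then suffix match, then any other containing term) and the function names are just one possible choice on my part, not a description of how Felix or Jamming actually sort.

```python
def match_rank(query: str, term: str) -> int:
    """Lower rank sorts first: exact, prefix, suffix, contains, other."""
    if term == query:
        return 0
    if term.startswith(query):   # prefix match (前方一致)
        return 1
    if term.endswith(query):     # suffix match (後方一致)
        return 2
    if query in term:
        return 3
    return 4


def sort_glossary_hits(query: str, terms: list[str]) -> list[str]:
    """Sort glossary terms by match type, shorter terms first within a type."""
    return sorted(terms, key=lambda term: (match_rank(query, term), len(term)))
```

Sorting this way would always bring a 100% match to the top of the list, no matter how many partial matches were found.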
Anyway, it would be nice if we could have some discussion about
improving search results and how they are displayed.
-----------------------------------------------------------------
Steven P. Venti
Mail: spve...@bhk-limited.com
Songs to Aging Children
http://www.youtube.com/profile?user=spventi&view=playlists
-----------------------------------------------------------------