Handling foreign words etc

Craig Farrow

Oct 7, 2008, 1:02:11 AM
to FieldWorks Language Explorer Discussion
Hi all,

Is there a good way to handle 'words' in an interlinear text that I
don't want to end up in my Lexicon? I'm working on a text right now that
talks about the meaning and etymology of a word. It has various
references to other languages, and even parts of words. Should I just
leave them as unknown, or create a POS category to throw them all into?
These are things I wouldn't ever want to include in a dictionary, but I
want to deal with somehow in the interlinear. How do others handle this?

To give an English transliteration of some of it:

Some people say the word 'gang' comes from the LgX word 'gong'; others
say it comes from 'gandi' in the word Samergandi; others say it comes
from LgY as used in 'kar gang'.

I've run across this before when my name comes up in texts that I'm
interlinearising. Again, I feel funny entering that as a lexical entry!

Thanks for your help,

Craig.


Beth B

Oct 7, 2008, 5:08:04 AM
to flex...@googlegroups.com
A proposal has been put forth to have a way to mark words as "not to
be analyzed". Or an alternative would be that anything that is not
in the Writing System that the Baseline is in would also just not be
analyzed (since FLEx doesn't try to analyze more than one language at
a time anyway).

This proposal has been put forth, but it's not clear how many users
want it, or how important users consider it to be.

It would be great to hear (a) do users want such a feature, and how
important is it to your work, and (b) how do others work around it?

-Beth

Eric & Susanne Johnson

Oct 7, 2008, 5:17:38 AM
to flex...@googlegroups.com
Hi Craig,

I've added several other entry type categories besides "main entry" to
handle loanwords:

"Codeswitch to [national language]"
"Loanword from modern [national language]"
"Geographic Proper Noun"
"Proper Noun"

"Codeswitch" is for items that come up in texts for which I know that
indigenous equivalents exist, and the national language word used
may not be known to less bilingual speakers. For these I also check the
"exclude as headword" box. Should we produce a lexicon, I would
remove all of these.

The "loanword" category is for words that are not ancient loans, and
may be only partially incorporated into the phonology (we have those as
well), but are nonetheless broadly known and used because there is no indigenous
word. These are often technological words, words dealing with government
or other institutions, or logical conjunctions. These I do not exclude
as headwords and I would probably add these to a lexicon, as their tone
category or pragmatic usage in our language sometimes differs slightly
from that in the nat. lg.

Ancient loanwords I just add as main entries, though noting the probable
origins in the etymology field. (Most of the numbers, for example, are
probably ancient loans from an ancient form of the national language.)

Eric

Ronald Moe

Oct 7, 2008, 4:33:23 PM
to flex...@googlegroups.com
Craig Farrow wrote:
"Is there a good way to handle 'words' in an interlinear text that I
don't want to end up in my Lexicon?"

A couple of other people have responded to this giving some good
suggestions. If it is important to you to be able to gloss these words in
your texts, for instance so that they can be published or used in some other
way, then you will need to add them to your lexicon. But this is just a
temporary solution. Ultimately FLEx should provide a way to exclude them
from the vernacular lexical database. You could do this in Toolbox by
setting up a separate database for them and then telling the interlinearizer
to look in this second database as well as the primary vernacular database.
Unfortunately we don't have this option in FLEx yet, so you either have to
leave the word unanalyzed in your text or add it to your vernacular
database.

The trick is to somehow mark these bogus words so that you can later delete
them. Eric Johnson has suggested setting up extra options in the Entry Type
field. This might be an acceptable solution for partially borrowed words,
but isn't a good solution for truly foreign words that get included in a
text. For instance I might say, "The German word for dog is 'hund'." The
word 'hund' does not belong in an English dictionary. On the other hand,
some German words are working their way into English, but are only partly
assimilated. If you read grammars of Biblical Greek, you will frequently
encounter the German word 'aktionsart'. It is used in these grammars as a
technical term. It violates English spelling rules. So we would say it has
not been totally assimilated. If it were assimilated, it might be spelled
'actionsort'. So we might want to include 'aktionsart' in an English
dictionary, but mark it as a 'partial loan' or something like that. But we
wouldn't want 'hund' in the dictionary.

So we need a temporary solution for words like 'hund' until FLEx gives us a
better permanent solution. I would suggest that you use the DDP domain 9.8
'Unclassified and miscellaneous words' as a temporary home for such words.
Add them to your lexicon, classify them under domain 9.8, and then make a
note to yourself to delete them later. You should also mark them using the
"Exclude as Headword" field so they don't inadvertently get published. If
you are already using domain 9.8, you could add another domain 9.9 'Foreign
words in texts'. Later you can filter for all the words in this domain and
delete them.

Just an interesting side note: in Biblical Greek dictionaries you can find
words like 'marana' and 'tha' that are not Greek words, but are Aramaic
words. The New Testament contains a few Aramaic quotes and for some odd
reason the Aramaic words in these quotes are listed in the Greek
dictionaries right alongside the Greek words. The same is true of all the
Hebrew names that occur in the New Testament. This really makes it fun for
me when I'm analyzing Greek phonology or morphology and these words are
mixed in with Greek phonological patterns and inflectional endings. So I
would really like a way to exclude words in texts from being added to the
lexicon.

Ron Moe


pcun...@gmail.com

Oct 7, 2008, 10:08:54 PM
to FLEx list
The language I'm working with is smattered (is that a word? Google
spell check doesn't think so . . . ) with borrowings from the /lingua
franca/, Solomon Islands Pijin. I've left them unanalysed but have not
been really satisfied with that; they leave big gaps. I really don't
want them cluttering up my lexicon, though. Here's what I think would
be neat: analyse the words, add them to the lexicon as "loan
words" (and configure the lexicon *not* to print/export "loan words"
by default), and tag them in interlinear view/print/export in some
readily identifiable way. The tagging would be to help someone who
decided to look at my text collection in the future and might not know
SI Pijin well enough to identify them by sight. This isn't really
important to my work (i.e., I'm sure there are more important features/
fixes for the programmers to deal with); it would just be neat.

Paul

Eric & Susanne Johnson

Oct 8, 2008, 1:02:09 AM
to flex...@googlegroups.com
> A proposal has been put forth to have a way to mark words as "not to
> be analyzed". Or an alternative would be that anything that is not
> in the Writing System that the Baseline is in would also just not be
> analyzed (since FLEx doesn't try to analyze more than one language at
> a time anyway).
>
> This proposal has been put forth, but it's not clear how many users
> want it, or how important users consider it to be.
>
> It would be great to hear (a) do users want such a feature, and how
> important is it to your work, and (b) how do others work around it?
>
> -Beth
>

This could be a useful feature for those of us in cross-border language
situations. We got to talk with some speakers of "our" language in a
neighboring country and could communicate to some degree, but kept
getting hung up on different loanwords from that national language.
Fortunately we had some foreign friends who spoke that language who
could fill in the gaps. In the future we may want to share our
db with colleagues there who would only be interested in the
non-loanwords and analysis.

Eric

SarahW

Oct 9, 2008, 5:35:07 AM
to FLEx list
I agree with Eric's remark about cross-border languages. I'd also
include words that have been borrowed from a lingua franca and
assimilated to the extent of using the phonetic features of the
language in question (as with 'actionsort' for 'aktionsart'). Toolbox
could register one as the surface form and the other as the underlying
form, so it was clear in interlinearisation that this was a borrowed
word. As a beginner with FLEx, could someone let me know which fields
are best used for this? None of the ones available seem ideal...
Sarah

Craig Farrow

Oct 14, 2008, 11:19:59 PM
to flex...@googlegroups.com
Thanks Eric & Ron for your input.

I've gone with new categories of Foreign word, and (subcategory) Foreign
name, for words that are truly foreign (and that I wouldn't want in a dictionary).

I've also added Place name and Personal name as subcategories of Noun.
Place names may or may not be wanted in a dictionary (I think an
appendix of place-name spellings could be good; choosing which places to
include/exclude would come at a later date.) Personal names are for what
would be considered a usual name within the community (not a nat. lg name).

Craig.
