Foreign Language Vocabulary Acquisition

Yao Ziyuan

Apr 28, 2007, 2:51:52 PM
Foreign language vocabulary acquisition can be done by annotating some
or all words in native language documents with corresponding foreign
language equivalents via a Web browser plugin.

Yao Ziyuan

Apr 28, 2007, 3:09:24 PM
The annotation can also be restricted, at the user's request, to
specific categories of words.

Yao Ziyuan

Apr 28, 2007, 3:24:14 PM
It can also provide foreign language ontologies related to the native
language document being read.

Yao Ziyuan

Apr 28, 2007, 3:32:32 PM
It can also display English equivalents to the user as he types native
language words, and/or append such English equivalents to the
corresponding native language words entered. The computer can choose
only the words entered frequently in an input session as source
words for such appending.
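A minimal sketch of this frequency rule in Python, assuming the typed text has already been segmented into words (Chinese input would need a word segmenter first). `pick_source_words`, `append_equivalents`, the `min_count` threshold and the two-entry dictionary are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def pick_source_words(typed_words, min_count=3):
    """Return words typed at least `min_count` times, most frequent first."""
    counts = Counter(typed_words)
    return [word for word, n in counts.most_common() if n >= min_count]


def append_equivalents(typed_words, dictionary, min_count=3):
    """Append "(English)" after every occurrence of a frequently typed word.

    `dictionary` is a hypothetical native-word -> English lookup table.
    """
    chosen = set(pick_source_words(typed_words, min_count))
    output = []
    for word in typed_words:
        output.append(word)
        if word in chosen and word in dictionary:
            output.append(f"({dictionary[word]})")
    return output


session = ["猫", "喜欢", "鱼", "猫", "吃", "鱼", "猫"]
print(append_equivalents(session, {"猫": "cat", "鱼": "fish"}, min_count=2))
```

Only 猫 and 鱼 reach the threshold here, so only they get an English equivalent appended; 喜欢 and 吃 pass through unchanged.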

Yao Ziyuan

Apr 28, 2007, 3:57:22 PM
It could even append phonetic marks and/or other lexical information
about an appended English word the first time it is appended.

Yao Ziyuan

Apr 28, 2007, 4:03:13 PM
No doubt this rule can also apply to the as-you-read-a-document case.

Yao Ziyuan

Apr 28, 2007, 4:17:38 PM
Such lexical information can even include words in the same set as the
key word (see Google Sets), along with their English equivalents.

Yao Ziyuan

Apr 28, 2007, 4:20:34 PM
Two kinds of source language words are of interest for appending:

(1) A word that repeats several times in the input session;
(2) Several words that belong to the same topic (topic words).

Yao Ziyuan

Apr 28, 2007, 4:24:22 PM
Or there could even be a "lottery mode" where every word entered has an
equal chance of being selected as a source word for translation
appending. Commonly used words would therefore frequently get
translations appended in the input strings.
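A sketch of the "lottery mode", again assuming pre-segmented input. `lottery_select` and the 0.2 probability are hypothetical; the point it illustrates is that a uniform per-token chance automatically favors frequent words, because they offer more tokens to pick.

```python
import random
from collections import Counter


def lottery_select(typed_words, probability=0.1, rng=None):
    """Return indices of tokens picked for translation appending.

    Every token gets the same independent chance, so a word that is typed
    more often is picked more often in total.
    """
    rng = rng or random.Random()
    return [i for i in range(len(typed_words)) if rng.random() < probability]


words = ["的", "我们", "的", "是", "的", "经济"] * 200
picked = lottery_select(words, probability=0.2, rng=random.Random(7))
print(Counter(words[i] for i in picked).most_common(3))
```

Passing a seeded `random.Random` makes a run reproducible, which is handy when testing a selection policy.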

Yao Ziyuan

Apr 28, 2007, 4:32:19 PM
Which reading-level set of words should be favored in appending? The
most common / elementary / general ones? Advanced ones? Domain-specific
ones? This can be either specified by the inputter or automatically
adjusted / randomly determined by the computer.

Yao Ziyuan

Apr 28, 2007, 4:36:07 PM
The computer could automatically determine the level of professionalism
and the potential audience of a document being entered (also taking
previous input experience into consideration) or being read, and
select accordingly which words to append translations to.

Yao Ziyuan

Apr 28, 2007, 4:39:12 PM
The computer could even replace source language words with their English
equivalents after these words have had English appended enough times.
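A sketch of the replace-after-enough-appending policy. The class name, the per-word counter and the `replace_after` threshold are assumptions for illustration, not a prescribed rule.

```python
from collections import defaultdict


class ExposureTracker:
    """Annotate a word until it has been annotated `replace_after` times,
    then show only its English equivalent (a hypothetical policy)."""

    def __init__(self, dictionary, replace_after=5):
        self.dictionary = dictionary          # native word -> English
        self.replace_after = replace_after
        self.exposures = defaultdict(int)     # times each word was annotated

    def render(self, word):
        english = self.dictionary.get(word)
        if english is None:
            return word                       # no known equivalent: leave as-is
        if self.exposures[word] >= self.replace_after:
            return english                    # seen enough: replace outright
        self.exposures[word] += 1
        return f"{word} ({english})"


tracker = ExposureTracker({"经济": "economy"}, replace_after=2)
print([tracker.render("经济") for _ in range(3)])
```

The first two occurrences are annotated and the third is replaced by "economy" alone.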

Yao Ziyuan

Apr 28, 2007, 4:47:56 PM
The user can specify this per target window, based on keyword patterns
in the window title.

Yao Ziyuan

Apr 28, 2007, 5:14:07 PM
Needless to say, the append-as-you-type idea should be integrated with
an IME (input method editor).

Yao Ziyuan

Apr 29, 2007, 6:53:40 AM
It is a good opportunity to display systematic ontologies related to a
native language document being read in a user's browser or word
processor.

Yao Ziyuan

Apr 29, 2007, 8:23:24 AM
There are many kinds of lexical information that can be added after
the appended English word. I think the computer should distribute
these kinds of lexical information across the occurrences of the
English word, one kind per occurrence.
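One way to read "one kind per occurrence" is a simple rotation: each new occurrence of the English word gets the next kind of lexical information. The function name and the sample entries for "goat" are illustrative assumptions.

```python
from itertools import cycle


def distribute_lexical_info(occurrences, info):
    """Attach one kind of lexical information to each occurrence, in rotation.

    `info` maps a kind of information (pronunciation, plural, ...) to its
    text; dicts keep insertion order, so the rotation follows that order.
    """
    if not info:
        return list(occurrences)
    kinds = cycle(info.items())
    return [f"{word} [{kind}: {value}]" for word, (kind, value) in zip(occurrences, kinds)]


goat_info = {"IPA": "/ɡoʊt/", "plural": "goats", "young": "kid"}
for line in distribute_lexical_info(["goat"] * 4, goat_info):
    print(line)
```

With three kinds and four occurrences, the fourth occurrence wraps around to the pronunciation again.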

Yao Ziyuan

Apr 29, 2007, 10:06:32 AM
Lexical information also includes "tricks to remember this word", e.g.
the word roots in this word.

Yao Ziyuan

Apr 29, 2007, 10:06:42 AM

Yao Ziyuan

Apr 29, 2007, 10:17:46 AM

Yao Ziyuan

Apr 29, 2007, 5:57:43 PM
Needless to say, widely known techniques such as the Ebbinghaus
forgetting curve can be integrated into this main idea.
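A sketch of how a forgetting curve could drive re-annotation timing. It uses the exponential approximation R = exp(-t/S), a common simplification rather than Ebbinghaus's original fitted formula, and assumes (hypothetically) that each annotation doubles the memory stability S. The 24-hour starting stability and the 0.5 retention threshold are made-up parameters.

```python
import math


def retention(elapsed_hours, stability_hours):
    """Exponential forgetting curve R = exp(-t / S), S = memory stability."""
    return math.exp(-elapsed_hours / stability_hours)


def hours_until(threshold, stability_hours):
    """Solve exp(-t / S) = threshold for t."""
    return -stability_hours * math.log(threshold)


def annotation_schedule(reviews=4, stability_hours=24.0, growth=2.0, threshold=0.5):
    """Times (hours from first sight) at which to re-annotate a word: each time
    predicted retention falls to `threshold`. Assumes every annotation
    multiplies stability by `growth`."""
    t, times = 0.0, []
    for _ in range(reviews):
        t += hours_until(threshold, stability_hours)
        times.append(round(t, 1))
        stability_hours *= growth
    return times


print(annotation_schedule())
```

Because stability doubles after each annotation, the gaps between annotations double as well, which is the usual spaced-repetition shape.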

Yao Ziyuan

Apr 29, 2007, 6:18:51 PM
Phrases made entirely of known words are easier to learn than individual
new words. So during a page view session we can append more phrase
translations than individual word translations.

Yao Ziyuan

Apr 29, 2007, 7:25:57 PM
Of course, the user should be able to choose that such appending does
not affect copying text from the browser.

Yao Ziyuan

Apr 29, 2007, 8:08:33 PM
Also, of course, the main idea can be generalized to any occasion where
the user receives/reads a native language message, e.g. a mobile phone
text message.

Yao Ziyuan

Apr 29, 2007, 8:22:26 PM
Also, needless to say, the user can initialize and modify his vocabulary
profile by adding/removing topic categories (ontologies), so that words
are added/removed wholesale.

Yao Ziyuan

Apr 29, 2007, 11:38:54 PM
Also, needless to say, the elements that can be annotated are by no
means limited to individual words or phrases; they can be any kind of
linguistic expression. The types of annotation information are also
broad, including all kinds of lexical knowledge as well as the
know-what, know-how, know-why and know-who knowledge (the things
contained in a LingoX ontology) introduced in the LingoX research
literature.

Yao Ziyuan

Apr 30, 2007, 8:25:37 AM
One important type of lexical information is the domain identifier
and/or connotations that distinguish a word's usage from that of its
near-synonyms.

Yao Ziyuan

Apr 30, 2007, 8:40:41 AM
Maybe all supplementary lexical information can be displayed in a
marquee.

Yao Ziyuan

Apr 30, 2007, 8:43:58 AM
Or a timed splash box (each type of lexical information takes turns
being displayed for a certain time).

Yao Ziyuan

May 1, 2007, 7:01:42 AM
Could also insert, after a computer-selected native language word, a
famous quotation that involves this word's concept.

Yao Ziyuan

May 1, 2007, 9:26:15 AM
Anything associated with a word is a type of lexical information for
this word, e.g. a link to the Wikipedia article for this word.

Yao Ziyuan

May 1, 2007, 3:18:44 PM
An interesting point is that we can provide annotation information NOT
RIGHT ON THE SOURCE WORD BUT ON ANOTHER WORD (RANDOMLY) THAT HAS AN
ENCYCLOPEDIC CONNECTION WITH THE SOURCE WORD.

Yao Ziyuan

May 1, 2007, 3:19:46 PM
This doesn't require the source word's context to strongly suggest
the ontology to which this encyclopedic connection belongs.

Yao Ziyuan

May 1, 2007, 7:55:45 PM
It is also possible for a handheld device to video-capture and/or
audio-capture the objects surrounding it and tell its user the English
names of these objects.

Yao Ziyuan

May 1, 2007, 8:44:05 PM
On May 1, 7:01 pm, Yao Ziyuan <yaoziy...@gmail.com> wrote:
> Could also insert a famous quotation after a computer-selected native
> language word and this quotation involves this word's concept.

Or it could be a fact involving the computer-selected native language
word's concept.

Yao Ziyuan

May 3, 2007, 12:35:12 PM
Not only can we add educational information about a single
word/phrase/expression, but we can do the same for multiple words
(which do not necessarily form a phrase) at once. For example, we pick
two or three words from a recent sentence or the current paragraph
and then find an example sentence or even paragraph that includes all
of the picked words. More specifically, the example sentence/paragraph
can be a famous quotation, a historical fact, a news abstract / news
review abstract, or a knowledge fact extracted from an encyclopedia.
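A minimal sketch of the multi-word example search, assuming a small in-memory list of English sentences stands in for the quotation, news and encyclopedia sources. Matching is case-insensitive substring search, and preferring the shortest hit is an arbitrary choice to keep the insertion brief; `find_examples` is a hypothetical name.

```python
def find_examples(picked_words, corpus, limit=1):
    """Return up to `limit` corpus sentences that contain every picked word."""
    wanted = [w.lower() for w in picked_words]
    hits = [s for s in corpus if all(w in s.lower() for w in wanted)]
    # Prefer shorter examples so the inserted material interrupts less.
    return sorted(hits, key=len)[:limit]


corpus = [
    "A farmer must ferry a wolf, a goat and a cabbage across a river.",
    "The goat ate the cabbage.",
    "The wolf howled at night.",
]
print(find_examples(["wolf", "goat"], corpus))
```

Only the first sentence mentions both picked words, so it is the one returned.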

Yao Ziyuan

May 3, 2007, 12:39:17 PM
Not only can the "interest focus" be "multiple words" in a recent
sentence or the current paragraph; it can also be a recent "topic" or
"issue". For example, if the current paragraph being read is about
campus killings, the computer can find relevant English material
(recent news/news analysis of this kind, a general overview/history of
such events in America) and insert it.

The computer can display a small portion of the whole material and
leave a link to the whole material at the end.

Yao Ziyuan

May 3, 2007, 1:24:29 PM
I propose "Rule Of Least Context Switches" or "Rule Of Least
Interruption":

Rather than inserting educational information after each of two words
(suppose these two words are not adjacent) in a sentence being read,
we should probably insert these two pieces of educational
information TOGETHER AT THE END OF THE ORIGINAL SENTENCE.
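A sketch of the Rule Of Least Interruption: the notes for all annotated words in a sentence are collected and appended once, after the sentence. Plain substring matching and the `[word = English; ...]` note format are assumptions for illustration.

```python
def annotate_at_sentence_end(sentence, annotations):
    """Gather the notes for every annotated word in one sentence and emit
    them together after the sentence instead of after each word.

    Matching is plain substring search, a simplification; real text would
    need word segmentation.
    """
    found = sorted((sentence.find(word), word) for word in annotations if word in sentence)
    if not found:
        return sentence
    notes = "; ".join(f"{word} = {annotations[word]}" for _, word in found)
    return f"{sentence} [{notes}]"


print(annotate_at_sentence_end("狼和羊过河。", {"羊": "goat", "狼": "wolf"}))
```

Sorting by position keeps the notes in reading order even though the dictionary lists 羊 first.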

Yao Ziyuan

May 4, 2007, 1:48:47 PM
This approach (auto-annotating native language information with
foreign language learning information), combined with an existing idea
(auto-annotating foreign language information with native language
interpretations), can form a solution for a smooth transition from the
native language to the foreign language.

Yao Ziyuan

May 5, 2007, 4:05:16 PM
Words that occur frequently in an article or in a period of reading
history are of course ideal candidates for English annotations because
repetition facilitates memorization.

What's more, words that show "a rapidly surging trend" should also be
considered because their "rate of increase" is significant.

Yao Ziyuan

May 5, 2007, 4:11:39 PM
Words that show a "fading out" trend are also of interest to us
because they seem to follow the "forgetting curve".
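A sketch of detecting both "surging" and "fading" words by comparing word counts in two consecutive reading windows. The window split, the factor-of-two `ratio` and the treatment of zero counts are hypothetical choices.

```python
from collections import Counter


def frequency_trends(previous_window, current_window, ratio=2.0):
    """Compare word counts in two consecutive reading windows.

    A word is "surging" if its count grew by at least `ratio`, and
    "fading" if it shrank by at least that factor. Zero counts are treated
    as one, so a single new occurrence is not called a surge.
    """
    before, now = Counter(previous_window), Counter(current_window)
    surging, fading = set(), set()
    for word in before.keys() | now.keys():
        if now[word] >= ratio * max(before[word], 1):
            surging.add(word)
        elif before[word] >= ratio * max(now[word], 1):
            fading.add(word)
    return surging, fading


print(frequency_trends(["股票", "下雨", "下雨", "下雨"], ["股票", "选举", "选举", "股票", "股票"]))
```

Surging words are candidates because their rate of increase is high; fading words are candidates for a reminder before they are forgotten.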

Yao Ziyuan

May 6, 2007, 12:01:47 AM
We should also consider whether a document position is suitable for
annotation. For example, the first paragraph of a document is usually
the abstract, and the first sentence of each subsequent paragraph
usually states the core idea of that paragraph. Such areas may not be
suitable for annotation.
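A sketch of this position filter, assuming paragraphs are separated by blank lines and sentences end with `.`, `!`, `?` or their full-width Chinese counterparts (it relies on `re.split` accepting a possibly-empty match, which Python 3.7+ allows). Both splitting rules are simplifications, and `annotatable_sentences` is a hypothetical helper.

```python
import re


def annotatable_sentences(document):
    """Yield (paragraph_index, sentence) for sentences judged safe to annotate.

    Skips the whole first paragraph (often an abstract) and the first
    sentence of every later paragraph (often its core idea).
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    for index, paragraph in enumerate(paragraphs):
        if index == 0:
            continue
        # Split after sentence-ending punctuation, ASCII or full-width.
        sentences = [s for s in re.split(r"(?<=[.!?。！？])\s*", paragraph) if s]
        for sentence in sentences[1:]:
            yield index, sentence


doc = "Abstract sentence. Another.\n\nTopic one. Detail one. Detail two.\n\nTopic two! Detail three?"
print(list(annotatable_sentences(doc)))
```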

On Apr 29, 2:51 am, Yao Ziyuan <yaoziy...@gmail.com> wrote:
> can be done by annotating some or all words in native language
> documents with corresponding foreign language equivalents via a Web
> browser plugin.


Yao Ziyuan

May 6, 2007, 2:16:23 AM
Basic advertising knowledge tells us that we can place totally
irrelevant information at a text position. Obviously this is useful
when we want to teach the user a word but this word does not recur
enough times in the current document.

Yao Ziyuan

May 6, 2007, 2:19:58 AM
We're also interested in knowing how likely an occurring word is to
recur in an online text stream (where we don't get all the text at
once, e.g. an ongoing IRC or IM chat). Factors include:

- Main elements of a sentence, such as the subject, the verb, the
object, are more likely to recur.
- Elements introduced by 'a', 'an', 'some' or other indefinite
determiners are likely to recur.
- Elements heavily modified by adjectives/adverbs/clauses/non-finite
verbs are likely to recur.
- Elements with certain "hint words" in their contexts are likely to
recur.

Yao Ziyuan

May 6, 2007, 2:25:23 AM
Besides words/expressions that occur repeatedly or tend to recur, we're
also interested in words/expressions that may be of significant
interest to the author and/or the reader (thus making a deep
impression, which facilitates memorization of the corresponding
teaching information if present). What words/expressions may draw the
author's/reader's interest?

- Elements heavily modified by adjectives/adverbs/clauses/non-finite
verbs.
- Elements that repeat.
- Elements (keywords) occurring in the article's first paragraph (the
abstract) or each paragraph's first sentence.

Yao Ziyuan

May 6, 2007, 2:28:33 AM
- Elements in unusually punctuated/rhetorical sentences, such as
exclamations, rhetorical questions, juxtapositions.

Yao Ziyuan

May 6, 2007, 2:30:02 AM
But there is also a competing theory, which says that if something
draws the reader's interest, then it's a good opportunity to memorize
something.

Yao Ziyuan

May 6, 2007, 2:34:34 AM
- Elements whose meanings are inherently interesting, such as "money",
"make love", "fuck you".

Yao Ziyuan

May 6, 2007, 2:48:13 AM
On May 6, 2:25 pm, Yao Ziyuan <yaoziy...@gmail.com> wrote:
> Besides words/expressions that repeat to occur / tend to recur, we're
> also interested in words/expressions that may be of significant
> interest to the author and/or the reader (thus making a deep
> impression which facilitates memorization of corresponding teaching
> information if present). What words/expressions may draw the author's/
> reader's interest?
>
> - Elements heavily modified by adjectives/adverbs/clauses/non-finite
> verbs.

In this case, the modifiers are also interesting.

Yao Ziyuan

May 6, 2007, 2:49:29 AM
- Elements that are unique: elements that usually don't occur in their
current contexts.


Yao Ziyuan

May 6, 2007, 2:51:45 AM
- Elements in a sentence that is later commented on (in the same text
or by other correspondents) with words such as "wonderful",
"awesome", "interesting", "OMG".

Yao Ziyuan

May 6, 2007, 3:45:06 PM
- Elements that have special font styles, such as bold, italic or
underlined.
- Elements with "words of emphasis", such as "the point is", "it is
*** that/who ***", "in fact", "as a matter of fact".
- Elements considered significant in their ontologies (if several
components of such ontologies also occur in the text)


Yao Ziyuan

May 6, 2007, 4:24:14 PM
Following this idea, we can define a total order over all
words/expressions so that the computer can compare their relative
"interestingness".

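A sketch of such a total order, assuming each candidate has already been tagged with some of the cues listed in this thread. The feature names and weights are invented for illustration; ties are broken by first position and then by text, so the resulting order is deterministic.

```python
from dataclasses import dataclass

# Invented weights for some of the "interestingness" cues discussed above.
WEIGHTS = {
    "repeats": 3.0,
    "in_abstract_or_topic_sentence": 2.0,
    "special_font": 2.0,
    "quoted": 1.5,
    "heavily_modified": 1.5,
    "emphasis_phrase": 1.0,
}


@dataclass
class Candidate:
    text: str
    position: int        # index of first occurrence, used for tie-breaking
    features: frozenset  # names of the cues that fired for this element


def interestingness(candidate):
    return sum(WEIGHTS.get(name, 0.0) for name in candidate.features)


def rank(candidates):
    """Highest score first; earlier position, then text, break ties."""
    return sorted(candidates, key=lambda c: (-interestingness(c), c.position, c.text))


ranked = rank([
    Candidate("goat", 2, frozenset({"repeats"})),
    Candidate("money", 5, frozenset({"repeats", "quoted"})),
    Candidate("wolf", 1, frozenset({"heavily_modified", "quoted"})),
])
print([c.text for c in ranked])
```

"wolf" and "goat" score the same, so the earlier occurrence ("wolf") comes first.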
Yao Ziyuan

May 6, 2007, 5:04:55 PM
- Elements in text enclosed by quotation marks.

Yao Ziyuan

May 6, 2007, 5:15:15 PM
Also: "I think", "I believe", "I propose", "in conclusion", "in sum",
"arguably", "first and foremost"...

Yao Ziyuan

May 6, 2007, 5:56:17 PM
Such a prediction job can also be done with Bayesian methods. But I
don't like statistical methods here.

Yao Ziyuan

May 7, 2007, 2:00:35 AM
An effective incentive is to directly show the English equivalents
of some elements (e.g. words or expressions) in an incoming/outgoing
Chinese message without prior teaching. This forces the reader to look
up the meanings of these English occurrences with point-to-look-up
dictionary software. And to avoid future lookups of the same English
elements, the reader will remember them on his own.

Yao Ziyuan

May 8, 2007, 8:33:56 PM
Considering that a native language word is learned by first getting
familiar with its pronunciation and then learning its spelling (if
it's an alphabetic language), maybe pronunciation can be given a
higher priority here.

Yao Ziyuan

May 8, 2007, 10:30:52 PM
Repetition, interest (excitement) and association can all facilitate
memorization. That's why we are interested in words (including other
types of expressions hereinafter) that repeat often, and words that
are interesting and/or important in their contexts.

The principle that association promotes memorization can be applied by
favoring words that are literally, phonetically or semantically
similar/related to what the learner has already learned.

Yao Ziyuan

May 9, 2007, 12:48:14 AM
- Elements that are presented as unusual, such as those introduced by
"but".

Yao Ziyuan

May 9, 2007, 4:13:19 AM
On May 6, 2:19 pm, Yao Ziyuan <yaoziy...@gmail.com> wrote:
> We're also interested in knowing how likely an occurring word will
> recur in an online text stream (where we don't get all the text at
> once, e.g. an ongoing IRC chat or IM chat). Factors include:
>
> - Main elements of a sentence, such as the subject, the verb, the
> object, are more likely to recur.
> - Elements introduced by 'a' or 'an' or 'some' or other indefinite
> articles are likely to recur.

Secondarily, relation words (verbs and adjectives/adverbs) mentioned
for the first time are also likely to recur, though less so.

Yao Ziyuan

May 9, 2007, 8:31:02 PM
Etymology can provide some connections between a word's formation and
that of its semantically related words, as a mnemonic aid for the word.

Yao Ziyuan

May 9, 2007, 8:38:05 PM
There's another interesting method to determine the likelihood for a
word to recur:

See how many collocations and encyclopedic connections this word can
make with its context words.
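A sketch of the connection-counting idea, using the wolf/goat/cabbage example (the classic river-crossing puzzle). The relation table is a hypothetical stand-in for a real collocation dictionary or encyclopedic ontology, and the score is a raw connection count, not a calibrated probability.

```python
# Hypothetical relation table: pairs of words linked by a collocation or an
# encyclopedic fact (here, the wolf-goat-cabbage river-crossing puzzle).
RELATIONS = {
    frozenset({"wolf", "goat"}),     # the wolf would eat the goat
    frozenset({"goat", "cabbage"}),  # the goat would eat the cabbage
}


def context_connections(word, context_words, relations=RELATIONS):
    """Count how many other context words this word is related to."""
    return sum(
        1 for other in set(context_words)
        if other != word and frozenset({word, other}) in relations
    )


def recurrence_ranking(context_words, relations=RELATIONS):
    """Context words sorted by connection count (highest first), then by name."""
    scores = {w: context_connections(w, context_words, relations) for w in set(context_words)}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(recurrence_ranking(["wolf", "cabbage", "goat"]))
```

"goat" is connected to both other words, so it ranks first; "wolf" and "cabbage" each have one connection.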


Yao Ziyuan

May 9, 2007, 8:47:49 PM
Example:

<catbooted> i got another interesting idea to measure the likelihood for a word to recur.
* vishwin60|brb is now known as vishwin60
* Mr_Gustafson has joined #wikipedia
<catbooted> that is to see how many collocations (phrases) and encyclopedic connections this word can make with its context words.

* Gwern has quit IRC (Remote closed the connection)
* After|away is now known as After-Midnight
<catbooted> if existing text contains: a wolf, a bunch of cabbage and a goat...
<Nihiltres> sounds like an interesting statistical project.
<catbooted> guess what, the goat is more likely to recur
<catbooted> because the goat has two connections to context words "wolf" and "cabbage"
<Dquestions> Hi Billy Jean
<catbooted> the other two words each only have one connection to other words
* Gwern has joined #wikipedia
<Mr_Gustafson> My clone wears a brown shirt, and I seduce him when theres no-one around / mano-e-mano / on a bed of nails / bring it on like a storm, till I knock the wind out of his sails / And we don't make eye -- con -- tact, when we have run-in's in town / just a barely polite nod, and nervous stares towards the ground
<catbooted> so the goat is twice more likely to recur
* ProjektTHOR has quit IRC (Read error: 110 (Connection timed out))
* Majorly has quit IRC (Read error: 104 (Connection reset by peer))
<Crazytales2> Mr_Gustafson: go screw yourself?
<Mr_Gustafson> NP - TV On The Radio - "I Was A Lover"
* Majorly has joined #wikipedia
* zombieninja666 is stupid and can't figure out how what catbooted is saying works
<zombieninja666> a wolf, a bunch of cabbage and a goat were walking down the road. The wolf ate the cabbage.
<catbooted> ...
<zombieninja666> That made no sense.
<zombieninja666> (What I did)
* columbo has quit IRC (Connection timed out)
<zombieninja666> Also I am aware you didn't say it was definite.
<zombieninja666> Just, twice more likely.
<zombieninja666> But...
* zombieninja666 is stupid
<Nihiltres> oo it works...
<Nihiltres> http://en.wikipedia.org/wiki/User:Nihiltres/Userboxes/Super_contrib_meta
<Nihiltres> dare you to check out the source
* zombieninja666 screams

Yao Ziyuan

May 9, 2007, 8:48:26 PM
p.s. "catbooted" is me (as in "Booted Cat").

Yao Ziyuan

May 9, 2007, 8:58:19 PM
This only considers the number of possible connections a word can make
with other existing context words. Maybe we should also count the
number of possible connections a word can make with *all words* (but
with a smaller coefficient).
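A sketch of this refinement: the score is the number of connections to context words plus a smaller coefficient times the number of connections the word has to any word in the relation table. The relation table (including the two extra "wolf" facts) and the 0.1 coefficient are hypothetical.

```python
def recurrence_score(word, context_words, relations, all_words_weight=0.1):
    """Connections to context words, plus a down-weighted count of the
    connections the word has to any word in the relation table."""
    in_context = sum(
        1 for other in set(context_words)
        if other != word and frozenset({word, other}) in relations
    )
    overall = sum(1 for pair in relations if word in pair)
    return in_context + all_words_weight * overall


relations = {
    frozenset({"wolf", "goat"}),
    frozenset({"goat", "cabbage"}),
    frozenset({"wolf", "moon"}),      # hypothetical extra facts about wolves
    frozenset({"wolf", "forest"}),
}
context = ["wolf", "cabbage", "goat"]
ranked = sorted(context, key=lambda w: recurrence_score(w, context, relations), reverse=True)
print(ranked)
```

The extra general connections lift "wolf" above "cabbage", while the context connections still keep "goat" first.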

Yao Ziyuan

May 9, 2007, 9:01:44 PM
I think we should also limit the investigation scope to only domains
relevant to/involved in existing relations.

Yao Ziyuan

May 9, 2007, 9:06:01 PM
And of course, once some relations have been mentioned, the remaining
possible relations are not all equally likely to be mentioned.

Yao Ziyuan

May 10, 2007, 2:52:21 AM
Smartass idea:

If a pronoun occurs and the computer confidently knows what it refers
to, the position after the pronoun is another place where the English
equivalent of its referent can be inserted.

Yao Ziyuan

May 10, 2007, 3:27:55 AM
This is a common trick to increase a word's repetition.

Yao Ziyuan

May 10, 2007, 6:21:50 AM
But a non-obvious variant is:

Suppose some word A occurs and is annotated, and some text later you
want to re-annotate A, but A does not recur there. If there is a word B
nearby and you know a relation between A and B from the previous
context, you can annotate B with the relation between B and A,
followed by A and its English equivalent.

Yao Ziyuan

May 10, 2007, 6:27:55 AM
This relation must be confirmed by the context and is not just a
"possible" collocational or encyclopedic relation.

Yao Ziyuan

May 10, 2007, 8:38:29 PM
Also note that synonymous recurrences of the same concept can be
opportunities to annotate that concept.

Yao Ziyuan

May 10, 2007, 8:58:15 PM
If a word A already repeats in the known text, then words collocated
with it or having an encyclopedic connection with it are likely to
recur.

Yao Ziyuan

May 10, 2007, 10:18:30 PM
This can be called a theory of "Chat Dynamics" or "Interest Dynamics":

The best way to start a new topic is to relate the new topic to an
existing topic which already receives popular attention, rather than
putting it forward in isolation.

Yao Ziyuan

May 11, 2007, 6:09:54 AM
I failed to come up with a very good and pure-strategy method to
determine the likelihood for a word to recur in an ongoing ("online")
text stream. But a main factor seems to be that on-topic words are
likely to recur, especially those with close semantic/encyclopedic
connections with words that have already recurred.

Yao Ziyuan

May 11, 2007, 7:36:57 AM
But we can rely less on exact repetition of words, because
literally/phonetically similar words can be seen as "partially
recurring".

Yao Ziyuan

May 11, 2007, 9:16:42 AM
In fact we don't need to require repetition (although repetition is
good). We just bomb (annotate) massively and see which annotated words
eventually get adequate exposure.

Yao Ziyuan

May 11, 2007, 7:38:54 PM
It is possible in some cases, for example with all unread RSS news
items and email messages in a mail and RSS client program, for the
computer to automatically rearrange the reading order of these items
so as to maximize memorization efficiency (forming the most
"forgetting curve-compliant" series of word recurrences).

Yao Ziyuan

May 11, 2007, 9:14:06 PM
Sophisticated methods can also add "natural extensions" to an original
document which "naturally extend" the original document with relevant
background knowledge or inferred conclusions and at the same time
teach English equivalents to various words carried in by such
extensions.

Yao Ziyuan

May 12, 2007, 1:32:04 AM
It also reminds me of the teaching of pure grammar (general grammar,
not specific to particular words, such as verb tenses). Although pure
grammar is a very limited set of rules, few students actually read
through a grammar handbook. Thus pure grammar knowledge can also be
taught in this inline manner.

Yao Ziyuan

May 15, 2007, 2:41:14 AM
This is critical in teaching and retaining words that don't recur very
frequently.

Yao Ziyuan

May 19, 2007, 8:08:51 AM
Words associated with a "controversial topic" are likely to recur,
such as Western vs. Eastern culture, Christianity vs. Islam...

Yao Ziyuan

May 20, 2007, 10:36:18 AM
However, the dominant factor in an ongoing IRC/IM chat is not the
repetition of words but the current speed of the chat. If the chat
slows down, more words can be annotated; otherwise, repetitive ones
are favored.
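A sketch of a speed-dependent annotation budget. The one-minute window, the `1 / (1 + rate / 10)` decay and the per-minute limits are made-up parameters; `choose_words` simply spends the budget on the most repeated words first, in line with the message above.

```python
from collections import Counter


def annotation_budget(timestamps, now, window_s=60.0, max_per_min=6, min_per_min=1):
    """Annotations allowed per minute, shrinking as the chat speeds up.

    `timestamps` are message times in seconds; the decay curve is invented.
    """
    recent = [t for t in timestamps if now - window_s <= t <= now]
    rate = len(recent) * 60.0 / window_s            # messages per minute
    budget = max_per_min / (1.0 + rate / 10.0)
    return max(min_per_min, round(budget))


def choose_words(tokens, budget):
    """Spend the budget on the most repeated words first."""
    return [word for word, _ in Counter(tokens).most_common(budget)]


slow = annotation_budget([50.0], now=60.0)
fast = annotation_budget([60.0 - i for i in range(40)], now=60.0)
print(slow, fast, choose_words(["a", "b", "a", "c", "a", "b"], fast))
```

A quiet chat (one message in the last minute) allows several annotations, while a busy one is cut down to the single most repeated word.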
