Revision: 465a8b83ebaf
Branch: default
Author: Alex Rudnick <alex.r...@gmail.com>
Date: Thu May 15 06:40:33 2014 UTC
Log: small edits on the paper
http://code.google.com/p/hltdi-l3/source/detail?r=465a8b83ebaf
Modified:
/paperdrafts/lglp/lglp14.tex
=======================================
--- /paperdrafts/lglp/lglp14.tex Wed May 14 07:28:54 2014 UTC
+++ /paperdrafts/lglp/lglp14.tex Thu May 15 06:40:33 2014 UTC
@@ -71,6 +71,8 @@
of tools for the creation of simple bilingual dependency grammars for machine translation
and computer-assisted translation into and out of under-resourced languages.
The basic units in Hiiktuu, called \textbf{groups}, are headed multi-item sequences,
+%% AJR: Could this be reworded? Or maybe something like "which correspond to
+% \textit{catenae} from the dependency grammar literature" ...
technically corresponding to catenae in dependency trees.
In their simplest form, group positions consist of wordforms.
More abstract groups, generalizing across multiple sequences of specific word forms,
@@ -145,13 +147,14 @@
researchers and language technology users to ``get off the ground''
with these languages, that is, to create rudimentary grammars and lexica that
will permit some basic applications, and, in the case of endangered languages,
-will facilitate the documentation process.
+facilitate the documentation process.
We are particularly interested in MT and CAT and the grammars and lexica that they require.
We focus on MT and CAT because for most of the languages in question, the lack of
linguistic resources correlates with a lack of written material in the language, and
we would like to develop tools to aid human translators, including non-professional ones,
in translating documents into these languages.
+%% AJR: maybe "non-linguist" rather than naive? ...
Our long-term goal is a system that allows naive users to write bilingual lexicon-grammars
for low-resource languages that can also be updated on the basis of monolingual and bilingual corpora,
to the extent these are available, and that can be easily integrated into a CAT system.
@@ -205,7 +208,7 @@
group elements, again as in PBSMT.
Entry~\ref{entry:end} shows a simple group entry of this sort.
The English group \textit{the end of the world} with head \textit{end} has as its Spanish translation
-the group \textit{el fin del munro} (which must have an entry in the Spanish lexicon).
+the group \textit{el fin del mundo} (which must have an entry in the Spanish lexicon).
In the alignment, all but the fourth word (\textit{the}) in the English group is associated with a word in the Spanish group.
\begin{entry}
@@ -456,14 +459,14 @@
in the development of rule-based MT systems that translate low-resource source languages into Engish.
Although it is likely we will make use of some of the insights of Expedition,
our project differs first, in assuming bilingual informants and second, in aiming to
-develop systems that unrestricted with respect to target language.
+develop systems that are unrestricted with respect to target language.
In fact we are more interested in MT systems with low-resource languages as target languages
because of the lack of documents in such languages.
Although we would not want Hiiktuu to be taken seriously as a linguistic theory, it is worth
mentioning which theories it has the most in common with.
Like Construction Grammar \cite{steels} and Frame Semantics \cite{fillmoreFS},
-it treats linguistic knowledge as essential phrasal.
+it treats linguistic knowledge as essentially phrasal.
Hiiktuu belongs to the family of dependency grammar (DG) theories because the heads of its
phrasal units are words or lexemes rather than non-terminals.
It has the most in common with those computational DG theories that parse sentences using