On Tue, Jan 20, 2015 at 9:47 AM, And Rosta <and....@gmail.com> wrote:
On 20 Jan 2015 08:41, "guskant" <gusni...@gmail.com> wrote:
> I still don't understand how a definition of the term "language" could
> bring any damage to Lojban,

It's because it saddles Lojban with a formal grammar, which, since formal grammars aren't ingredients of human languages, serves as an impediment, a useless encumbrance, and lacks an explicit actual grammar, possession of which should be a sine qua non for a loglang. (To Usagists, this is not really relevant, because for them the True Grammar would be the implicit actual grammar that inheres in usage.) It's a remediable situation: BPFK could write an explicit actual grammar, and the formal grammars could be discarded as the worthless junk they are. (Not everything in the formal grammar is worthless junk, of course; some of it would be the basis for the actual grammar.) Maybe the formal grammar plus Martin's Tersmu might jointly be tantamount to an actual grammar, but the formal grammar bit deviates gratuitously from the syntax of human languages and could not ever plausibly be a model of an actual speaker's syntax. (I think Robin once said he believed he did use the formal grammar when spontaneously producing and comprehending utterances, but if that is true then I think he must have been using raw brute force brain power, rather than the human language faculty.)
Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
(1) convert the input into a string of phonemes
(2) convert the string of phonemes into a string of words
(3) determine a tree structure for the string of words
(4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
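The four steps can be pictured as a pipeline of functions. The sketch below is a toy illustration of that division of labour, not a real Lojban implementation: every function body and data shape here is a hypothetical stand-in.

```python
# Toy sketch of the four-stage pipeline; each stage is a stand-in
# for the real job described in (1)-(4) above.

def to_phonemes(text):
    # (1) written input -> phoneme string; Lojban orthography is close
    # enough to phonemic that characters can stand in for phonemes here
    return text.replace(" ", "")

def to_words(phonemes):
    # (2) phoneme string -> word string; real Lojban recovers word
    # boundaries from morphology alone, here we fake it with a tiny lexicon
    lexicon = ["mi", "klama"]
    words, i = [], 0
    while i < len(phonemes):
        for w in lexicon:
            if phonemes.startswith(w, i):
                words.append(w)
                i += len(w)
                break
        else:
            raise ValueError("no word found at position %d" % i)
    return words

def to_tree(words):
    # (3) word string -> tree; a flat bracketing stands in for a parse
    return ("sentence", words)

def to_logical_form(tree):
    # (4) tree -> predicate--argument structure
    _, words = tree
    pred, args = words[-1], words[:-1]
    return (pred, tuple(args))

lf = to_logical_form(to_tree(to_words(to_phonemes("mi klama"))))
# lf is now ("klama", ("mi",))
```

The point of the sketch is only that (3) is a separable stage: one can swap out how the tree is built without touching (1), (2) or (4), which is what the rest of the thread argues over.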
On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías <jjlla...@gmail.com> wrote:
> Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
> (1) convert the input into a string of phonemes
> (2) convert the string of phonemes into a string of words
> (3) determine a tree structure for the string of words
> (4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates

Rather:
(1') convert the input into a string [or perhaps tree] of phonemes
(2') convert the string [or perhaps tree] of phonemes into a string [or perhaps (prosodic) tree] of phonological words
(3') map the tree of phonological words to a structure of syntactic 'words'/'nodes', which structure will specify which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
> If that's more or less on track, then we can say that the YACC/EBNF formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu is trying to do (4). I would agree that the way our formal grammars do (3) is probably not much like the way our brains do (3), but I'm not sure I see what alternative we have.

Right. So I think (3) is not a valid step.
(3') should be doable, partly from Tersmu and partly by using some natural language formalism to analyse the syntax (e.g. at minimum make all phrases headed and forbid unary branching; binary branching would be a bonus if it could be managed).
Jorge Llambías, On 20/01/2015 19:38:
On Tue, Jan 20, 2015 at 3:28 PM, And Rosta <and....@gmail.com> wrote:
On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías <jjlla...@gmail.com> wrote:
Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
(1) convert the input into a string of phonemes
(2) convert the string of phonemes into a string of words
(3) determine a tree structure for the string of words
(4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
Rather:
(1') convert the input into a string [or perhaps tree] of phonemes
(2') convert the string [or perhaps tree] of phonemes into a string [or perhaps (prosodic) tree] of phonological words
(3') map the tree of phonological words to a structure of syntactic 'words'/'nodes', which structure will specify which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
You seem to have just merged (2) and (3) into (2'),
No, I meant (2') to be just a restatement of (2), with the added acknowledgement that in human languages there is tree-like phonological structure above the word level -- i.e. prosodic phonology, which yields intonation phrases and so forth. (Google "prosodic phonology", but don't get sidetracked, because it's orthogonal to my point.) I phrased it hedgily because of course the formal definition of Lojban deliberately eschews phonological structure beyond mere phoneme strings. But there is nothing of (3) in (2').
which may be more general, but in the particular case of Lojban we
know that (2') can be achieved in two independent steps, one step
that takes any string of phonemes and unambiguously dissects it into
a string of words (possibly including non-lojban words),
yes
and a second step that takes the resulting string of words as input
and unambiguously gives a unique tree structure for them (or else
rejects the string of words as ungrammatical).
No. The second step (my (3')) takes the string of phonological words but it doesn't give a *syntactic* tree structure whose terminal nodes are phonological words, which is what I take "gives a tree structure for them" to mean. Not every syntactic node need correspond to a phonological one (e.g. ellipsis, which Lojban uses) and a phonological word can correspond to more than one syntactic one (e.g. English _you're_ is one phonological word corresponding to a sequence of a pronoun and an auxiliary).
Rather, step (3') uses the rules that define correspondences between elements of the sentence's phonology and elements of the sentence's syntax, to find a sentence syntax that -- in Lojban's case, uniquely -- licitly corresponds to the sentence's phonology.
Step (3') yields something like Tersmu output, probably augmented by some purely syntactic (i.e. without logical import) structure. I think that can and should be done without reference to the formal grammars.
> If that's more or less on track, then we can say that the YACC/EBNF formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu is trying to do (4). I would agree that the way our formal grammars do (3) is probably not much like the way our brains do (3), but I'm not sure I see what alternative we have.
Right. So I think (3) is not a valid step.
But why is it invalid if it achieves the desired result?
It just doesn't yield a human language. And to the (considerable) extent to which Lojban counts as a human language, it is working despite (3) rather than because of it.
The current PEG doesn't produce binary branching exclusively,
although it can probably be tweaked to do that by adding many
intermediate rules. Why is unary branching bad?
Human languages seem not to avail themselves of it; unary branching constitutes a superfluous richness of structural possibilities.
The first rule means that a "statement" node can unary branch into a "statement-1" node, or binary branch into "prenex" and "statement" nodes. The PEG could instead just be:
statement <- statement-2 (I-clause joik-jek statement-2?)* / prenex statement
and completely bypass the statement-1 node, which is indeed superfluous. The PEG can probably be re-written so as to eliminate all unary branching, although there may be a price in clarity.
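The effect of splicing out superfluous unary nodes can be shown as a toy tree transformation. This is a sketch of the idea only: trees are (label, children) pairs with strings as leaves, and the node names are illustrative, not actual PEG output.

```python
# Collapse unary branches whose only daughter is a nonterminal, the
# case argued above to be superfluous; unary branches over terminal
# nodes are left alone, since many models of syntax permit those.

def collapse_unary(tree):
    if isinstance(tree, str):          # terminal node: leave as-is
        return tree
    label, children = tree
    children = [collapse_unary(c) for c in children]
    # unary branch over a single nonterminal daughter: splice it out
    if len(children) == 1 and not isinstance(children[0], str):
        return children[0]
    return (label, children)

raw = ("statement", [("statement-1", [("statement-2", ["mi", "klama"])])])
assert collapse_unary(raw) == ("statement-2", ["mi", "klama"])
```

The transformation is behaviour-preserving in the sense that the terminal string is untouched; only the clutter of intermediate single-daughter nodes disappears, which is the "price in clarity" trade-off mentioned above, made mechanical.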
There are many rules where one of the branches is optional, so that
would result either in an empty leaf or a unary branch.
Say you've got an optionally transitive/intransitive verb, such as English _swallow_. When it has an object, they jointly form a binary branching phrase. When it lacks an object, then there is no need for any branching; so for example _I swallow_ could be a binary phrase whose constituents do not themselves branch. (It's true that many models of syntax do allow unary branching precisely when the daughter node is terminal, so rather than argue over that, let me instead say that it's unary branching with a nonterminal node that is superfluous.)
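The _swallow_ contrast can be made concrete with two toy bracketings, again as (label, children) pairs with strings as leaves. The labels (S, NP, VP, V) are conventional illustrations, not a committed analysis of English.

```python
# "I swallow pills": the verb and its object jointly form a binary VP.
transitive = ("S", [("NP", ["I"]),
                    ("VP", [("V", ["swallow"]), ("NP", ["pills"])])])

# "I swallow": no object, so no VP node dominating only V -- the verb
# sits directly in the sentence, and the only unary branches remaining
# are those over terminal nodes, which many models of syntax allow.
intransitive = ("S", [("NP", ["I"]), ("V", ["swallow"])])

assert len(transitive[1][1][1]) == 2   # the VP has exactly two daughters
```

The contrast is the one in the text: branching appears only where there are two constituents to combine, and a nonterminal node never dominates exactly one nonterminal daughter.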
Would you want binary branching all the way down to phonemes, or just
to words?
Syntactic words and phonemes don't exist on the same plane; phonemes don't comprise syntactic words; syntactic words don't consist of phonemes.
I think binary branching in syntax has many virtues, and I believe natlang syntax is binary branching (-- English for sure; other languages - probably), but it's not the case that all right-minded linguisticians share that view. I myself don't think that phonological structure above or below the word level is binary branching, but others do; either way, the nature of phonological structure is not really germane.
Jorge Llambías, On 21/01/2015 12:33:
On Tue, Jan 20, 2015 at 8:35 PM, And Rosta <and....@gmail.com> wrote:
Step (3') yields something like Tersmu output, probably augmented by some purely syntactic (i.e. without logical import) structure. I think that can and should be done without reference to the formal grammars.
But Tersmu output is basically FOPL, which has its own formal grammar
(on which Lojban's formal grammar is based). I still don't see what
problems formal grammars create.
(3') must certainly involve a grammar, and I can't think of any sense in which a grammar could meaningfully be called 'informal', so I'm happy to call that grammar 'formal'. But it differs from the CS (or at least the Lojban) notion primarily in not having phonological objects as any of its nodes and secondarily in not necessarily being simply a labelled bracketing of a string.
To the extent that Lojban is a language, (3) doesn't really constitute any part of Lojban (despite the mistaken belief of many Lojbanists to the contrary). Also, to the extent that Lojban is a language, there exists an implicit version of (3'), albeit not necessarily one that is coherent or unambiguous. So I would recommend removing the current Formal Grammars from the definition of Lojban, and replacing them by one -- an explicit (3') -- that more credibly represents actual human language (but is unambiguous etc.).
Also questionable is the extent to which a nonterminal node can have properties/labels not simply derived from the label of the head daughter: the range of views among syntacticians is too hard to summarize in one sentence here, but certainly one does not come across syntactic trees for natlang sentences with a pattern of labellings resembling Lojban's, i.e. where the relationship between labels on the mother and the daughters is unconstrained.
Unary branches don't do
anything useful, but are they harmful other than in cluttering the
tree with superfluous nodes?
They're harmless clutter if there's no contrast with a version of the tree where mother and singleton daughter merge into the same node. You need to consider the branching issue together with the labelling issue. If mother and head-daughter have the same label, then the redundancy of unary branching is plain.
Ok, but in Lojban there's almost a one-to-one match between
phonological and syntactic words.
That remains to be seen, because there isn't yet an explicit real syntax for Lojban. However, it's perfectly possible that in Lojban, phonology--syntax mismatches are rare.
I'm not sure if choosing a simple Lojban example is going to reveal why you can't have beliefs about binary branching in natlangs.
Syntax is a set of rules for combining the combinatorial units of syntax in ways that are combinatorially licit and that combine the units' phonological forms and their meanings. I suspect (but excuse me if I'm mistaken) that for you every set of rules that defines the correct set of sentences is equally valid, so that so long as the rules match the right sentence sounds to the right sentence meanings, it doesn't matter what the intermediate structure is like; if the syntactician has a job, it is to work out *a* set of rules, but there is no reason to think there is only one correct set of rules. In contrast, pretty much all linguisticians think (but not always for the same reasons) that of the sets of rules that generate the same, correct, set of sentences, some of those sets are right and some are wrong, or at least some are righter and some are wronger. In my case I think the rules matter because (i) to understand the system you need to understand its internal mechanics, and (ii) a speaker knows a certain set of rules, and it's the known rules that are my object of study.
But starting to tackle (3') is not so daunting:
Step 1: What is the least clunky way of getting unambiguously from
phonological words to logical form -- from the phonological words of
Lojban sentences to the logical forms of Lojban sentences (with the
notion of Lojban sentence defined by usage or consensus)? Any
loglanger could have a stab at tackling this.
Step 2: Identify any devices that are absent from natlangs.
Step 3: Redo Step 1, without using devices identified in Step 2.
Reflecting on this further, during the couple of weeks it's taken for
me to find the time to finish this reply, I would suggest that
*official*, *definitional* specification of the grammar consist only
of a set of sentences defined as pairings of phonological and logical
forms (ideally, consistent with the 'monoparsing' precept that to
every phonological form there must correspond no more than one logical
form).
Then, any rule set that generates that set of pairings would be
deemed to count as a valid grammar of Lojban, and then from among the
valid grammars we could seek the one(s) that are closest to those
internalized by human speakers.
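The "set of pairings" proposal is easy to state operationally. The sketch below checks the monoparsing precept over a toy set of (phonological form, logical form) pairs; the data and the notation for logical forms are hypothetical, not real Lojban analyses.

```python
# A grammar-as-set-of-pairings: each sentence is a (phonological form,
# logical form) pair. The monoparsing precept: no phonological form
# may correspond to more than one logical form.

pairs = [
    ("mi klama", "klama(mi)"),
    ("do klama", "klama(do)"),
]

def monoparsing(pairs):
    seen = {}
    for phon, logic in pairs:
        # setdefault records the first logical form seen for this
        # phonological form; a later mismatch means ambiguity
        if seen.setdefault(phon, logic) != logic:
            return False
    return True

assert monoparsing(pairs)
assert not monoparsing(pairs + [("mi klama", "klama(mi, zo'e)")])
```

On this view any rule set that generates exactly this set of pairings counts as a valid grammar, which is precisely the definitional stance proposed above; the check is over the extension, not over any particular rule system.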
We currently don't have a clear idea of what syntactic words Lojban
has, where by "syntactic word" I mean ingredients of logicosyntactic
form, the form that encodes logical structure. Some phonological words
seem to correspond to chunks of logical structure rather than single
nodes, and there will be instances of nodes in logical structure that
don't correspond to anything in phonology (-- the most obvious example
is ellipsis, which Lojban sensibly makes heavy use of).
> What I meant to say is that I can't see a syntax as an intrinsic feature of a natlang, as opposed to being just a model, which can be a better or worse fit, but it can never be the language.
Are you holding for natlangs the view that I propose above for Lojban,
namely that a language is a set of sentences, i.e. form--meaning
correspondences, and although in practice there must be some system
for generating that set, it doesn't matter what the system is, so long
as it generates the right set, and therefore in that sense the system
is not intrinsic to language?
If Yes, I don't agree, but I think the position is coherent enough
that I won't try to dissuade you from it.
If not, do explain again what you mean.
> So I can accept that binary branching syntaxes are more elegant, more perspicuous, etc, I just can't believe they are a feature of the language, just like the description of a house is not a feature of the house. Maybe that's just me not being a linguist.
But could a description of an architectural plan of a house be an
architectural plan of a house? Could a comprehensive explicit
description of a code be a code? Surely yes, and the same for
language.
I don't know how suitable PEG/YACC/BNF are for natlangs. I must
ruefully confess I know nothing about PEG, despite all the work you've
done with it. AFAIK linguists in the last half century haven't found
BNF necessary or sufficient for their rules, but my meagre knowledge
doesn't extend to knowing the mathematical properties of BNF and other
actually used formalisms, and the relationships between them.
In denouncing the suitability of PEG/YACC/BNF, I was really meaning to
denounce treating phonological stuff (e.g. phonological words) as
constituents of terminal nodes in syntactic structures. You said that
terminal nodes are actually selmaho and (iirc?) that the 1--1
correspondence between phonological words and selmaho terminal nodes
is not essential.
So in that case my objection would not be to CS
grammars per se but only to the idea that a CS grammar can model a
whole grammar rather than just, say, the combinatorics of syntax. So I
reserve judgement on PEG et al: if they can represent logicosyntactic
structure in full, then they have my blessing.
As a minor practical suggestion, I note that starting from the logical form and working back to the phonological realization is, in one respect, easier than 'tother way 'round, since you have a certainty for a start.
Of course, you then have a variety of ways that things can go, since one form gives rise to a large number of different phonological strings.
But each difference is achieved (ideally) in a unique way and a suitable grammar will require marks for exactly those unique steps and no others, for a significant net savings compared to the present systems (it seems to me, who am often annoyed by what seem to be irrelevant requirements put in because they are useful somewhere sometime).
So if I, at the age of 500 after a lifetime spent diligently on this task, present you with a full explication of the rules of English, what is the difference between English and that explication?
The correspondence between
phonological words and selmaho is irrelevant from the point of view
of the "syntax" (in scare quotes), which doesn't care at all about
phonological form. The "syntax" only works with selmaho.
Right. So technically the "syntax" is separate from the phonology. But of course in fact the "syntax" isn't a syntax, for all it does is generate a labelled tree with selmaho as its leaves; it doesn't encode logical form.
Furthermore, since every selmaho leaf (with the possible exception of terminators) corresponds to a phonological word, the "syntax" looks like it's driven by the sentence phonology.
The most important thing to model is predicate--argument and binding relations; nothing else really matters, or at least, whatever else there is simply serves the purpose of facilitating the encoding of predicate--argument and binding relations.
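What "encoding predicate--argument and binding relations" amounts to can be sketched concretely. In the toy model below, binding is resolved exactly by repetition of phonological form (as with _da_ and _de_), which is the part the selmaho-level "syntax" cannot see, since it treats every KOhA alike. This is a hypothetical illustration, not Tersmu.

```python
# Binding by identity of phonological form: every occurrence of "da"
# is the same variable, "de" a different one. The selmaho grammar,
# which cannot tell "da" from "de", cannot compute this mapping.

def bind(terms):
    """Map each distinct term form to one variable, by repetition."""
    variables = {}
    return [variables.setdefault(t, "x%d" % len(variables)) for t in terms]

# the four term slots of "da prami de .ije de prami da":
assert bind(["da", "de", "de", "da"]) == ["x0", "x1", "x1", "x0"]
```

Everything else in the grammar, on this view, exists to make mappings like this one recoverable from the phonology.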
On 5 Feb 2015 21:14, "Jorge Llambías" <jjlla...@gmail.com> wrote:
>
>
>
> On Thu, Feb 5, 2015 at 4:35 AM, And Rosta <and....@gmail.com> wrote:
>>>
>>>
>> So if I, at the age of 500 after a lifetime spent diligently on this task, present you with a full explication of the rules of English, what is the difference between English and that explication?
>
>
> First I'd have to know what English is, in order to compare,
So although you're not sure what it is, you have an idea of what it is that is good enough for you to know what it isn't?
> but it seems unlikely that English is the same as a full explication of its rules.
How about if English is the same as a full explication of a family of sets of rules, one set per idiolect? Or you feel that even an idiolect is not the same as a full explication of its rules? Is the game of chess different from a full explication of its rules? If Yes, is that because there are many different possible explications, or because chess, like tigers, is very different from a set of rules?
> If a zoologist presented me with a full explication of a tiger I would be able to tell it apart from the tiger immediately. I wouldn't be as scared of the explication as of the tiger.
This is easy to explain, for tigers are material and bite, whereas, like language, explications are abstract, immaterial and don't bite.
>
> But more importantly, if someone else presented me with their own full explication of the rules of English, using different terminology and different analytic tools, I don't think it would be necessarily the case that one explication had to be better than the other, they could just be two different explications.
Would it necessarily be the case that neither is better than the other?
Are preferential criteria such as simplicity and knownness (by the speaker's mind) valid?
>> So technically the "syntax" is separate from the phonology. But of course in fact the "syntax" isn't a syntax, for all it does is generate a labelled tree with selmaho as its leaves; it doesn't encode logical form.
>
>
> Right. But the question is whether it's an aid, an impediment, or neutral in our quest to encode logical form. My impression is that it is an aid.
I don't disagree with that; I've noted already that although the "syntax" isn't a syntax, much of it could be recycled into an actual syntax.
>> Furthermore, since every selmaho leaf (with the possible exception of terminators) corresponds to a phonological words, the "syntax" looks like it's driven by the sentence phonology.
>
>
> Either it's driven by, or drives it, or both.
Yes. The important point is that it's not first and foremost driven by the requirement that it should encode logical form; though, it is of course the case that the "syntax" has also been shaped by the idea that in some cloudily understood way it should contribute to the encoding of logical form, hence the pretty good job it does with the (very easy) task of encoding predicate--argument relations.
> The "syntax" is pretty good with predicate-argument relations, but poor with binding relations. One important type of binding relation is achieved by repetition of phonological form, but the "syntax" is completely blind to phonological form (in the sense that it can't tell "da" and "de" apart). But it can tell that a given KOhA is an argument of a given BRIVLA for example.
That's where you'd start with the work of converting "syntax" into syntax.
--And.
On 6 Feb 2015 21:10, "Jorge Llambías" <jjlla...@gmail.com> wrote:
>
>
> On Fri, Feb 6, 2015 at 7:36 AM, And Rosta <and....@gmail.com> wrote:
>>
>> On 5 Feb 2015 21:14, "Jorge Llambías" <jjlla...@gmail.com> wrote:
>> >
>> > First I'd have to know what English is, in order to compare,
>>
>> So although you're not sure what it is, you have an idea of what it is that is good enough for you to know what it isn't?
>
> Yes, but that's not saying much. I know it's not a dishwasher or a lawn mower for example.
Still, you also think you know it's not what I think it is, which I think requires a fuller degree of knowledge than knowing it's not a dishwasher does.
>>
>> > but it seems unlikely that English is the same as a full explication of its rules.
>>
>> How about if English is the same as a full explication of a family of sets of rules, one set per idiolect? Or you feel that even an idiolect is not the same as a full explication of its rules? Is the game of chess different from a full explication of its rules? If Yes, is that because there are many different possible explications, or because chess, like tigers, is very different from a set of rules?
>
> I can accept that the game of chess is fully described by its rules. But even for an idiolect, it doesn't seem likely that a finite set of rules would describe it,
Do you have a sense of where the problem lies?
You accept that some explications are better than others, but think no single explication can be solely right. So could you accept that a family of similar explications could be right?
What about Lojban? What's the relationship between it and an explication of it? Is it more like English or more like chess?
--And.
> What about Lojban? What's the relationship between it and an explication of it? Is it more like English or more like chess?
Lojban design is more like chess, but if and when it becomes an actual human language, it will (I think inevitably) stop being like chess and be more like English.