On Tue, Jan 20, 2015 at 9:47 AM, And Rosta <and....@gmail.com> wrote:
On 20 Jan 2015 08:41, "guskant" <gusni...@gmail.com> wrote:
> I still don't understand how a definition of the term "language" could
> bring any damage to Lojban,

It's because it saddles Lojban with a formal grammar, which, since formal grammars aren't ingredients of human languages, serves as an impediment, a useless encumbrance, and lacks an explicit actual grammar, possession of which should be a sine qua non for a loglang. (To Usagists, this is not really relevant, because for them the True Grammar would be the implicit actual grammar that inheres in usage.) It's a remediable situation: BPFK could write an explicit actual grammar, and the formal grammars could be discarded as the worthless junk they are. (Not everything in the formal grammar is worthless junk, of course; some of it would be the basis for the actual grammar.) Maybe the formal grammar plus Martin's Tersmu might jointly be tantamount to an actual grammar, but the formal grammar bit deviates gratuitously from the syntax of human languages and could not ever plausibly be a model of an actual speaker's syntax. (I think Robin once said he believed he did use the formal grammar when spontaneously producing and comprehending utterances, but if that is true then I think he must have been using raw brute force brain power, rather than the human language faculty.)
Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
(1) convert the input into a string of phonemes
(2) convert the string of phonemes into a string of words
(3) determine a tree structure for the string of words
(4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
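The four steps can be pictured as a pipeline of functions. The sketch below is a toy illustration of that division of labour, not a real Lojban implementation: every function body and data shape here is a hypothetical stand-in.

```python
# Toy sketch of the four-stage pipeline; each stage is a stand-in
# for the real job described in (1)-(4) above.

def to_phonemes(text):
    # (1) written input -> phoneme string; Lojban orthography is close
    # enough to phonemic that characters can stand in for phonemes here
    return text.replace(" ", "")

def to_words(phonemes):
    # (2) phoneme string -> word string; real Lojban recovers word
    # boundaries from morphology alone, here we fake it with a tiny lexicon
    lexicon = ["mi", "klama"]
    words, i = [], 0
    while i < len(phonemes):
        for w in lexicon:
            if phonemes.startswith(w, i):
                words.append(w)
                i += len(w)
                break
        else:
            raise ValueError("no word found at position %d" % i)
    return words

def to_tree(words):
    # (3) word string -> tree; a flat bracketing stands in for a parse
    return ("sentence", words)

def to_logical_form(tree):
    # (4) tree -> predicate--argument structure
    _, words = tree
    pred, args = words[-1], words[:-1]
    return (pred, tuple(args))

lf = to_logical_form(to_tree(to_words(to_phonemes("mi klama"))))
# lf is now ("klama", ("mi",))
```

The point of the sketch is only that (3) is a separable stage: one can swap out how the tree is built without touching (1), (2) or (4), which is what the rest of the thread argues over.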
On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías <jjlla...@gmail.com> wrote:
> Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
> (1) convert the input into a string of phonemes
> (2) convert the string of phonemes into a string of words
> (3) determine a tree structure for the string of words
> (4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates

Rather:
(1') convert the input into a string [or perhaps tree] of phonemes
(2') convert the string [or perhaps tree] of phonemes into a string [or perhaps (prosodic) tree] of phonological words
(3') map the tree of phonological words to a structure of syntactic 'words'/'nodes', which structure will specify which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
> If that's more or less on track, then we can say that the YACC/EBNF formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu is trying to do (4). I would agree that the way our formal grammars do (3) is probably not much like the way our brains do (3), but I'm not sure I see what alternative we have.

Right. So I think (3) is not a valid step.
(3') should be doable, partly from Tersmu and partly by using some natural language formalism to analyse the syntax (e.g. at minimum make all phrases headed and forbid unary branching; binary branching would be a bonus if it could be managed).
Jorge Llambías, On 20/01/2015 19:38:
On Tue, Jan 20, 2015 at 3:28 PM, And Rosta <and....@gmail.com> wrote:
On Tue, Jan 20, 2015 at 2:59 PM, Jorge Llambías <jjlla...@gmail.com> wrote:
Would it be fair to say that what an actual grammar should do is, given some input of sound or written characters, tell us how to:
(1) convert the input into a string of phonemes
(2) convert the string of phonemes into a string of words
(3) determine a tree structure for the string of words
(4) determine which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
Rather:
(1') convert the input into a string [or perhaps tree] of phonemes
(2') convert the string [or perhaps tree] of phonemes into a string [or perhaps (prosodic) tree] of phonological words
(3') map the tree of phonological words to a structure of syntactic 'words'/'nodes', which structure will specify which nodes of the tree are terms, which nodes are predicates, which terms are co-referring, and which terms are arguments of which predicates
You seem to have just merged (2) and (3) into (2'),
No, I meant (2') to be just a restatement of (2), with the added acknowledgement that in human languages there is tree-like phonological structure above the word level -- i.e. prosodic phonology, which yields intonation phrases and so forth. (Google "prosodic phonology", but don't get sidetracked, because it's orthogonal to my point.) I phrased it hedgily because of course the formal definition of Lojban deliberately eschews phonological structure beyond mere phoneme strings. But there is nothing of (3) in (2').
which may be more general, but in the particular case of Lojban we
know that (2') can be achieved in two independent steps, one step
that takes any string of phonemes and unambiguously dissects it into
a string of words (possibly including non-lojban words),
yes
and a second step that takes the resulting string of words as input
and unambiguously gives a unique tree structure for them (or else
rejects the string of words as ungrammatical).
No. The second step (my (3')) takes the string of phonological words but it doesn't give a *syntactic* tree structure whose terminal nodes are phonological words, which is what I take "gives a tree structure for them" to mean. Not every syntactic node need correspond to a phonological one (e.g. ellipsis, which Lojban uses) and a phonological word can correspond to more than one syntactic one (e.g. English _you're_ is one phonological word corresponding to a sequence of a pronoun and an auxiliary).
Rather, step (3') uses the rules that define correspondences between elements of the sentence's phonology and elements of the sentence's syntax, to find a sentence syntax that -- in Lojban's case, uniquely -- licitly corresponds to the sentence's phonology.
Step (3') yields something like Tersmu output, probably augmented by some purely syntactic (i.e. without logical import) structure. I think that can and should be done without reference to the formal grammars.
> If that's more or less on track, then we can say that the YACC/EBNF formal grammars do (3). The PEG grammar does (2) and (3). Martin's tersmu is trying to do (4). I would agree that the way our formal grammars do (3) is probably not much like the way our brains do (3), but I'm not sure I see what alternative we have.
Right. So I think (3) is not a valid step.
But why is it invalid if it achieves the desired result?
It just doesn't yield a human language. And to the (considerable) extent to which Lojban counts as a human language, it is working despite (3) rather than because of it.
The current PEG doesn't produce binary branching exclusively,
although it can probably be tweaked to do that by adding many
intermediate rules. Why is unary branching bad?
Human languages seem not to avail themselves of it; unary branching constitutes a superfluous richness of structural possibilities.
The first rule means that a "statement" node can unary branch into a "statement-1" node, or binary branch into "prenex" and "statement" nodes. The PEG could instead just be:
statement <- statement-2 (I-clause joik-jek statement-2?)* / prenex statement
and completely bypass the statement-1 node, which is indeed superfluous. The PEG can probably be re-written so as to eliminate all unary branching, although there may be a price in clarity.
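The effect of splicing out superfluous unary nodes can be shown as a toy tree transformation. This is a sketch of the idea only: trees are (label, children) pairs with strings as leaves, and the node names are illustrative, not actual PEG output.

```python
# Collapse unary branches whose only daughter is a nonterminal, the
# case argued above to be superfluous; unary branches over terminal
# nodes are left alone, since many models of syntax permit those.

def collapse_unary(tree):
    if isinstance(tree, str):          # terminal node: leave as-is
        return tree
    label, children = tree
    children = [collapse_unary(c) for c in children]
    # unary branch over a single nonterminal daughter: splice it out
    if len(children) == 1 and not isinstance(children[0], str):
        return children[0]
    return (label, children)

raw = ("statement", [("statement-1", [("statement-2", ["mi", "klama"])])])
assert collapse_unary(raw) == ("statement-2", ["mi", "klama"])
```

The transformation is behaviour-preserving in the sense that the terminal string is untouched; only the clutter of intermediate single-daughter nodes disappears, which is the "price in clarity" trade-off mentioned above, made mechanical.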
There are many rules where one of the branches is optional, so that
would result either in an empty leaf or a unary branch.
Say you've got an optionally transitive/intransitive verb, such as English _swallow_. When it has an object, they jointly form a binary branching phrase. When it lacks an object, then there is no need for any branching; so for example _I swallow_ could be a binary phrase whose constituents do not themselves branch. (It's true that many models of syntax do allow unary branching precisely when the daughter node is terminal, so rather than argue over that, let me instead say that it's unary branching with a nonterminal node that is superfluous.)
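The _swallow_ contrast can be made concrete with two toy bracketings, again as (label, children) pairs with strings as leaves. The labels (S, NP, VP, V) are conventional illustrations, not a committed analysis of English.

```python
# "I swallow pills": the verb and its object jointly form a binary VP.
transitive = ("S", [("NP", ["I"]),
                    ("VP", [("V", ["swallow"]), ("NP", ["pills"])])])

# "I swallow": no object, so no VP node dominating only V -- the verb
# sits directly in the sentence, and the only unary branches remaining
# are those over terminal nodes, which many models of syntax allow.
intransitive = ("S", [("NP", ["I"]), ("V", ["swallow"])])

assert len(transitive[1][1][1]) == 2   # the VP has exactly two daughters
```

The contrast is the one in the text: branching appears only where there are two constituents to combine, and a nonterminal node never dominates exactly one nonterminal daughter.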
Would you want binary branching all the way down to phonemes, or just
to words?
Syntactic words and phonemes don't exist on the same plane; phonemes don't comprise syntactic words; syntactic words don't consist of phonemes.
I think binary branching in syntax has many virtues, and I believe natlang syntax is binary branching (-- English for sure; other languages - probably), but it's not the case that all right-minded linguisticians share that view. I myself don't think that phonological structure above or below the word level is binary branching, but others do; either way, the nature of phonological structure is not really germane.
Jorge Llambías, On 21/01/2015 12:33:
On Tue, Jan 20, 2015 at 8:35 PM, And Rosta <and....@gmail.com> wrote:
Step (3') yields something like Tersmu output, probably augmented by some purely syntactic (i.e. without logical import) structure. I think that can and should be done without reference to the formal grammars.
But Tersmu output is basically FOPL, which has its own formal grammar
(on which Lojban's formal grammar is based). I still don't see what
problems formal grammars create.
(3') must certainly involve a grammar, and I can't think of any sense in which a grammar could meaningfully be called 'informal', so I'm happy to call that grammar 'formal'. But it differs from the CS (or at least the Lojban) notion primarily in not having phonological objects as any of its nodes and secondarily in not necessarily being simply a labelled bracketing of a string.
To the extent that Lojban is a language, (3) doesn't really constitute any part of Lojban (despite the mistaken belief of many Lojbanists to the contrary). Also, to the extent that Lojban is a language, there exists an implicit version of (3'), albeit not necessarily one that is coherent or unambiguous. So I would recommend removing the current Formal Grammars from the definition of Lojban, and replacing them by one -- an explicit (3') -- that more credibly represents actual human language (but is unambiguous etc.).
Also questionable is the extent to which a nonterminal node can have properties/labels not simply derived from the label of the head daughter: the range of views among syntacticians is too hard to summarize in one sentence here, but certainly one does not come across syntactic trees for natlang sentences with a pattern of labellings resembling Lojban's, i.e. where the relationship between labels on the mother and the daughters is unconstrained.
Unary branches don't do
anything useful, but are they harmful other than in cluttering the
tree with superfluous nodes?
They're harmless clutter if there's no contrast with a version of the tree where mother and singleton daughter merge into the same node. You need to consider the branching issue together with the labelling issue. If mother and head-daughter have the same label, then the redundancy of unary branching is plain.
Ok, but in Lojban there's almost a one-to-one match between
phonological and syntactic words.
That remains to be seen, because there isn't yet an explicit real syntax for Lojban. However, it's perfectly possible that in Lojban, phonology--syntax mismatches are rare.
I'm not sure if choosing a simple Lojban example is going to reveal why you can't have beliefs about binary branching in natlangs.
Syntax is a set of rules for combining the combinatorial units of syntax in ways that are combinatorially licit and that combine the units' phonological forms and their meanings. I suspect (but excuse me if I'm mistaken) that for you every set of rules that defines the correct set of sentences is equally valid, so that so long as the rules match the right sentence sounds to the right sentence meanings, it doesn't matter what the intermediate structure is like; if the syntactician has a job, it is to work out *a* set of rules, but there is no reason to think there is only one correct set of rules. In contrast, pretty much all linguisticians think (but not always for the same reasons) that of the sets of rules that generate the same, correct, set of sentences, some of those sets are right and some are wrong, or at least some are righter and some are wronger. In my case I think the rules matter because (i) to understand the system you need to understand its internal mechanics, and (ii) a speaker knows a certain set of rules, and it's the known rules that are my object of study.
But starting to tackle (3') is not so daunting:
Step 1: What is the least clunky way of getting unambiguously from
phonological words to logical form -- from the phonological words of
Lojban sentences to the logical forms of Lojban sentences (with the
notion of Lojban sentence defined by usage or consensus)? Any
loglanger could have a stab at tackling this.
Step 2: Identify any devices that are absent from natlangs.
Step 3: Redo Step 1, without using devices identified in Step 2.
Reflecting on this further, during the couple of weeks it's taken for
me to find the time to finish this reply, I would suggest that
*official*, *definitional* specification of the grammar consist only
of a set of sentences defined as pairings of phonological and logical
forms (ideally, consistent with the 'monoparsing' precept that to
every phonological form there must correspond no more than one logical
form).
Then, any rule set that generates that set of pairings would be
deemed to count as a valid grammar of Lojban, and then from among the
valid grammars we could seek the one(s) that are closest to those
internalized by human speakers.
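The "set of pairings" proposal is easy to state operationally. The sketch below checks the monoparsing precept over a toy set of (phonological form, logical form) pairs; the data and the notation for logical forms are hypothetical, not real Lojban analyses.

```python
# A grammar-as-set-of-pairings: each sentence is a (phonological form,
# logical form) pair. The monoparsing precept: no phonological form
# may correspond to more than one logical form.

pairs = [
    ("mi klama", "klama(mi)"),
    ("do klama", "klama(do)"),
]

def monoparsing(pairs):
    seen = {}
    for phon, logic in pairs:
        # setdefault records the first logical form seen for this
        # phonological form; a later mismatch means ambiguity
        if seen.setdefault(phon, logic) != logic:
            return False
    return True

assert monoparsing(pairs)
assert not monoparsing(pairs + [("mi klama", "klama(mi, zo'e)")])
```

On this view any rule set that generates exactly this set of pairings counts as a valid grammar, which is precisely the definitional stance proposed above; the check is over the extension, not over any particular rule system.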
We currently don't have a clear idea of what syntactic words Lojban
has, where by "syntactic word" I mean ingredients of logicosyntactic
form, the form that encodes logical structure. Some phonological words
seem to correspond to chunks of logical structure rather than single
nodes, and there will be instances of nodes in logical structure that
don't correspond to anything in phonology (-- the most obvious example
is ellipsis, which Lojban sensibly makes heavy use of).
> What I meant to say is that I can't see a syntax as an intrinsic feature of a natlang, as opposed to being just a model, which can be a better or worse fit, but it can never be the language.
Are you holding for natlangs the view that I propose above for Lojban,
namely that a language is a set of sentences, i.e. form--meaning
correspondences, and although in practice there must be some system
for generating that set, it doesn't matter what the system is, so long
as it generates the right set, and therefore in that sense the system
is not intrinsic to language?
If Yes, I don't agree, but I think the position is coherent enough
that I won't try to dissuade you from it.
If not, do explain again what you mean.
> So I can accept that binary branching syntaxes are more elegant, more perspicuous, etc, I just can't believe they are a feature of the language, just like the description of a house is not a feature of the house. Maybe that's just me not being a linguist.
But could a description of an architectural plan of a house be an
architectural plan of a house? Could a comprehensive explicit
description of a code be a code? Surely yes, and the same for
language.
I don't know how suitable PEG/YACC/BNF are for natlangs. I must
ruefully confess I know nothing about PEG, despite all the work you've
done with it. AFAIK linguists in the last half century haven't found
BNF necessary or sufficient for their rules, but my meagre knowledge
doesn't extend to knowing the mathematical properties of BNF and other
actually used formalisms, and the relationships between them.
In denouncing the suitability of PEG/YACC/BNF, I was really meaning to
denounce treating phonological stuff (e.g. phonological words) as
constituents of terminal nodes in syntactic structures. You said that
terminal nodes are actually selmaho and (iirc?) that the 1--1
correspondence between phonological words and selmaho terminal nodes
is not essential.
So in that case my objection would not be to CS
grammars per se but only to the idea that a CS grammar can model a
whole grammar rather than just, say, the combinatorics of syntax. So I
reserve judgement on PEG et al: if they can represent logicosyntactic
structure in full, then they have my blessing.
As a minor practical suggestion, I note that starting from the logical form and working back to the phonological realization is, in one respect, easier than 'tother way 'round, since you have a certainty for a start.
Of course, you then have a variety of ways that things can go, since one form gives rise to a large number of different phonological strings.
But each difference is achieved (ideally) in a unique way and a suitable grammar will require marks for exactly those unique steps and no others, for a significant net savings compared to the present systems (it seems to me, who am often annoyed by what seem to be irrelevant requirements put in because they are useful somewhere sometime).
So if I, at the age of 500 after a lifetime spent diligently on this task, present you with a full explication of the rules of English, what is the difference between English and that explication?
The correspondence between
phonological words and selmaho is irrelevant from the point of view
of the "syntax" (in scare quotes), which doesn't care at all about
phonological form. The "syntax" only works with selmaho.
Right. So technically the "syntax" is separate from the phonology. But of course in fact the "syntax" isn't a syntax, for all it does is generate a labelled tree with selmaho as its leaves; it doesn't encode logical form.
Furthermore, since every selmaho leaf (with the possible exception of terminators) corresponds to a phonological word, the "syntax" looks like it's driven by the sentence phonology.
The most important thing to model is predicate--argument and binding relations; nothing else really matters, or at least, whatever else there is simply serves the purpose of facilitating the encoding of predicate--argument and binding relations.
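What "encoding predicate--argument and binding relations" amounts to can be sketched concretely. In the toy model below, binding is resolved exactly by repetition of phonological form (as with _da_ and _de_), which is the part the selmaho-level "syntax" cannot see, since it treats every KOhA alike. This is a hypothetical illustration, not Tersmu.

```python
# Binding by identity of phonological form: every occurrence of "da"
# is the same variable, "de" a different one. The selmaho grammar,
# which cannot tell "da" from "de", cannot compute this mapping.

def bind(terms):
    """Map each distinct term form to one variable, by repetition."""
    variables = {}
    return [variables.setdefault(t, "x%d" % len(variables)) for t in terms]

# the four term slots of "da prami de .ije de prami da":
assert bind(["da", "de", "de", "da"]) == ["x0", "x1", "x1", "x0"]
```

Everything else in the grammar, on this view, exists to make mappings like this one recoverable from the phonology.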
On 5 Feb 2015 21:14, "Jorge Llambías" <jjlla...@gmail.com> wrote:
>
>
>
> On Thu, Feb 5, 2015 at 4:35 AM, And Rosta <and....@gmail.com> wrote:
>>>
>>>
>> So if I, at the age of 500 after a lifetime spent diligently on this task, present you with a full explication of the rules of English, what is the difference between English and that explication?
>
>
> First I'd have to know what English is, in order to compare,
So although you're not sure what it is, you have an idea of what it is that is good enough for you to know what it isn't?
> but it seems unlikely that English is the same as a full explication of its rules.
How about if English is the same as a full explication of a family of sets of rules, one set per idiolect? Or you feel that even an idiolect is not the same as a full explication of its rules? Is the game of chess different from a full explication of its rules? If Yes, is that because there are many different possible explications, or because chess, like tigers, is very different from a set of rules?
> If a zoologist presented me with a full explication of a tiger I would be able to tell it apart from the tiger immediately. I wouldn't be as scared of the explication as of the tiger.
This is easy to explain, for tigers are material and bite, whereas, like language, explications are abstract, immaterial and don't bite.
>
> But more importantly, if someone else presented me with their own full explication of the rules of English, using different terminology and different analytic tools, I don't think it would be necessarily the case that one explication had to be better than the other, they could just be two different explications.
Would it necessarily be the case that neither is better than the other?
Are preferential criteria such as simplicity and knownness (by the speaker's mind) valid?
>> So technically the "syntax" is separate from the phonology. But of course in fact the "syntax" isn't a syntax, for all it does is generate a labelled tree with selmaho as its leaves; it doesn't encode logical form.
>
>
> Right. But the question is whether it's an aid, an impediment, or neutral in our quest to encode logical form. My impression is that it is an aid.
I don't disagree with that; I've noted already that although the "syntax" isn't a syntax, much of it could be recycled into an actual syntax.
>> Furthermore, since every selmaho leaf (with the possible exception of terminators) corresponds to a phonological words, the "syntax" looks like it's driven by the sentence phonology.
>
>
> Either it's driven by, or drives it, or both.
Yes. The important point is that it's not first and foremost driven by the requirement that it should encode logical form; though, it is of course the case that the "syntax" has also been shaped by the idea that in some cloudily understood way it should contribute to the encoding of logical form, hence the pretty good job it does with the (very easy) task of encoding predicate--argument relations.
> The "syntax" is pretty good with predicate-argument relations, but poor with binding relations. One important type of binding relation is achieved by repetition of phonological form, but the "syntax" is completely blind to phonological form (in the sense that it can't tell "da" and "de" apart). But it can tell that a given KOhA is an argument of a given BRIVLA for example.
That's where you'd start with the work of converting "syntax" into syntax.
--And.
On 6 Feb 2015 21:10, "Jorge Llambías" <jjlla...@gmail.com> wrote:
>
>
> On Fri, Feb 6, 2015 at 7:36 AM, And Rosta <and....@gmail.com> wrote:
>>
>> On 5 Feb 2015 21:14, "Jorge Llambías" <jjlla...@gmail.com> wrote:
>> >
>> > First I'd have to know what English is, in order to compare,
>>
>> So although you're not sure what it is, you have an idea of what it is that is good enough for you to know what it isn't?
>
> Yes, but that's not saying much. I know it's not a dishwasher or a lawn mower for example.
Still, you also think you know it's not what I think it is, which I think requires a fuller degree of knowledge than knowing it's not a dishwasher does.
>>
>> > but it seems unlikely that English is the same as a full explication of its rules.
>>
>> How about if English is the same as a full explication of a family of sets of rules, one set per idiolect? Or you feel that even an idiolect is not the same as a full explication of its rules? Is the game of chess different from a full explication of its rules? If Yes, is that because there are many different possible explications, or because chess, like tigers, is very different from a set of rules?
>
> I can accept that the game of chess is fully described by its rules. But even for an idiolect, it doesn't seem likely that a finite set of rules would describe it,
Do you have a sense of where the problem lies?
You accept that some explications are better than others, but think no single explication can be solely right. So could you accept that a family of similar explications could be right?
What about Lojban? What's the relationship between it and an explication of it? Is it more like English or more like chess?
--And.
> What about Lojban? What's the relationship between it and an explication of it? Is it more like English or more like chess?
Lojban design is more like chess, but if and when it becomes an actual human language, it will (I think inevitably) stop being like chess and be more like English.