My first thought, in reading this, is that what you really want is
to have the state of the program (parse tree and token stream) saved
at the place right before the place the person makes the edit.
Then, after the edit is made, you restore the program to the state
it was right before it parsed the text just edited, and continue
parsing, picking up the newly edited text. This is called a
continuation.[1]
Of course, you don't know where the user is going to be editing, so
you have to save multiple continuations during the parse, and restart
whichever one is closest and earlier than the edit, and throw away
the continuations that are now invalid because they occured later in
the text.
This doesn't have to be an excessive number of continuations. You only
have to save as many continuations as you need to make your program fast
enough. You should have some idea about how much text is "too much" for
performance--how much text makes your parser too slow. So you can
create continuations every N bytes knowing you'll only have to parse M
bytes (where M<=N) after any edit.
This saves you the work of having three different parsers in
exchange for whatever overhead the continuation has. That is going to
depend on other factors in your design, but you'd essentially be
trading memory for speed.
Edits early in the document are of course going to need to reparse
everything after the edit, but:
* you'll parse the most recently changed text first, and you can
provide feedback on this before the parse fully finishes.
* Once you get to the next N boundary in your text (accounting for
additions and deletions made, of course), you can compare the results
from the current parse to the parse tree you got last time, and decide
to stop once they start being identical.
I can't really imagine implementing this in any other way, but I'm
curious to read everyone else's ideas.
1: http://en.wikipedia.org/wiki/Continuation
-Alan
--
.i ko djuno fi le do sevzi
* Should the person make the third parser anyway while making LIhU,
TOI, and TUhE *required and non-elidable*?
* Is there another practical solution for the editor?
Remember, the problem is that the hypothetical text editor is getting
slow because otherwise it needs to parse the entire text for every
edit.
Those are elidable for exactly the reason that every other terminator
in the language is elidable, and in exactly the same way. The only
usual thing is that those can include {.i} inside them, while most
others cannot. (LEhU and ZOI are the other considerations; not also ZO
and the rest of the Magic Words).
.i mi lu mi prami do cusku
Is completely grammatical text, and parses exactly as though a {li'u}
had been included between {do} and {cusku}. You may not like the
style, but I assert that that is only because you have not
internalized the grammar.
Nonetheless, it's probably legitimate to assume that those cases are
rare. Particularly, it seems completely fair (hypothetically) to make
a parser that exhibits sub-optimal performance in those unusual cases
(reparsing all of the above bridi, instead of just the {mi prami do}
part, for instance).
The continuations approach feels more right in general, though.
>
> Based on that, and the fact that it's expected the user is going to be
> typing more, it's reasonable to assume for the sake of as-you-type parsing,
> they aren't elided if they aren't in the text, as it's assumed that the end
> of current input is not the end of text.
>
> In something like {lu ko'a broda to brodi ko'e li'u}, the {li'u} marks the
> end of the quoted text, so you'd have to allow for that....
>
>>
>> * Should the person make the third parser anyway while making LIhU,
>> TOI, and TUhE *required and non-elidable*?
>
> I say yes, but since that's not official, I should say no. Then again, if
> the third parser /assumes/ non-elidability, I doubt it will cause problems.
>
> Alternatively, you can cause the third parser to assume current-end-of-input
> is always equal to terminate-everything-unterminated, and that should work
> out fine.
>
>>
>> * Is there another practical solution for the editor?
>
> .alyn.'s idea sounds pretty good to me.
>
>>
>> Remember, the problem is that the hypothetical text editor is getting
>> slow because otherwise it needs to parse the entire text for every
>> edit.
>
> Something tells me this "hypothetical" parser isn't very hypothetical. :D
>
> --
> mu'o mi'e .aionys.
>
> .i.a'o.e'e ko cmima le bende pe lo pilno be denpa bu .i doi.luk. mi patfu do
> zo'o
> (Come to the Dot Side! Luke, I am your father. :D )
>
> --
> You received this message because you are subscribed to the Google Groups
> "lojban" group.
> To post to this group, send email to loj...@googlegroups.com.
> To unsubscribe from this group, send email to
> lojban+un...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/lojban?hl=en.
>
.i mi lu mi prami do cusku
Is completely grammatical text, and parses exactly as though a {li'u}
had been included between {do} and {cusku}.
The ONLY usage I can recall seeing of this type of sentence is when
experts are showing off their mastery of the obscurities! I can parse
them fine, ever since being put through that rite of passage myself,
but I doubt the language would be much poorer without them. OTOH
they'd be very useful on the forum I've been considering making
(perhaps called "lujyjbo") for intentionally convoluted and obscure
Lojban! I thought it could be fun sport if we gathered somewhere far
away from the confused newbies and made Lojban dance with strange
elisions and deep embedding and the layered SE shell game, etc. Even
for that purpose, though, I'm not sure there's anything especially
worthwhile about this quirk! It's not like Lojban has any shortage of
obscurations to play with.
mi'e .telselkik. mu'o
--
http://groups.google.com/group/lujyjbo <-- there it is
> And what is this "rite of passage" of which you
> speak, and how would one go about attempting
> to survive said rite?
I just mean when someone who knows that li'u & toi are elidable says
sentences that elide them, just to watch someone who doesn't know that
yet be confused! I think you've just survived it. Now you can go
subject new people to the knowledge. But, um, maybe you shouldn't!
They are not elided a lot, if only because they need very special
contexts in order to be elidable. I would make them non-elidable if it
was my choice.
> But, is it worth making a text editor *at all* if it
> can't parse *all* Lojban, including Lojban that validly elides LIhU,
> etc.?
I think so, yes. But, there may be another problem you may also have
to deal with. Things like "zo ni'o" and ".i bu" are not sentence
separators. So, will they be handled by the lexer? But not every "zo
ni'o" sequence is a sumti. In "zo zo ni'o", the "ni'o" does start a
new paragraph. You may have to parse the whole thing anyway before you
can tell whether a given "ni'o", ".i", "lu", etc. is "active" or not.
mu'o mi'e xorxes
I'm sorry to report, after reviewing Chapter 3 of the CLL a couple
days ago, I could read that with little trouble. :-(
> Still, as just a simple example of annoying practices (you call it amusing art, I call it annoying practice, agree to disagree, Stela), someone should post in .lokadin.'s evil no space writing style.
>
> .imiCUSkubaula.lojban. , to provide a rather simple example, although to truly do it justice, a more complex jufra should be used.
But but but - isn't it how you are suppoed to understand Lojban from hearing? A constant stream of phonemes, some stressed, some not? Does that mean that speaking Lojban is an annoying pratice? ;)
--
Ecce Jezuch
"In the darkest hole, you'd be well advised
Not to plan my funeral before the body dies" - J. Cantrell
The PEG parser I'm using requires you to write a generator for token
input, which is the first place I'd try putting continuations:
http://gazette.call-cc.org/issues/5.html
Meaning, I'd manage my continuations on the input side, rather than
the output side, because you're right--stuff pops out fully formed.
I suspect I'd have to hack at the parser some after this to make
continuations work as I expected, but I already expected I'd be
learning and fixing this particular parser anyway.
If you can save to and seek back to the input position for a
particular parse, save the state variables of the parser, and save
the syntax tree, it really should be possible.
While I have extensive experience with parsers, I have very little
experience with PEG parsers, so I accept that something about these
parsers may make that difficult. I wouldn't try to do something
like this with a recursive descent parser, because it saves state on
the stack. I might be motivated enough to convert a recursive descent
parser into a continuation passing style parser, which would allow
me to save the stack-based representation of the parse on the heap
if I needed to create a continuation.
> Comparing parse trees and stopping re-parsing when they're
> sufficiently similar is risky, if there is no way to guarantee that
> the syntax tree is exactly the same all the way to the end *without re-
> parsing the whole thing anyway*. As far as I can tell, just because a
> new parse tree starts to look similar to the original tree, the new
> parse tree is not necessarily identical till the end. (Or is that
> actually a property of the Lojban grammar? If it is, only then should
> early stopping by comparison be used.)
>
You're correct. Sufficiently similar is a heuristic that won't work
for all cases. I was suggesting that this trade-off was better than
the other suggested trade-offs for solving this problem. As you're
the one writing this thing, you get to decide which trade-off you
want to deal with. ;-)
What I was thinking is the the parser would run as a separate
thread, and the parse tree in the main thread would contain in it a
marshall object at the current parse location. Occassionally this
marshall object would receive more of the parse tree and a new
marshall object, replacing the old marshall object with the new bit
of parse tree and the place it was still working.
I wasn't thinking of the "all-or-nothing" property of PEG parsers,
as this idea does require the PEG parser to check back in with the
caller from time to time.
> However, continuations *are* indeed the only way that I can now think
> of to implement practically such a text editor *without* requiring
> LIhU, etc. to be at the end of multi-utterance texts.
>
I don't personally elide LIhU, LUhU, TOI, &c myself.
I know that Eclipse can parse python (and presumably other languages)
in its editor. I also know that this parser is not identical to the
way that python parsers itself, as my coworkers have complained to
me that Eclipse flagged a perfectly valid python construct as
invalid when in fact python did something quite useful with the same
construct.
This doesn't stop people from using Eclipse to write python, and
serves as prior art on how people cope (and programs don't cope) with
this sort of thing.
In our case we add little tokens in the code that Eclipse recognizes
but python doesn't, so Eclipse will shut up and python will work.
I am curious to hear how this project goes for you. What platform
are you targeting? What language are you using? What PEG parser
are you trying? No one has tried doing anything like this with
Lojban before, which makes me quite excited.
I believe RATS! is a particular implementation of PEG parsing. So
PEG parsing is a parsing technique, and Rats! is one of the
implementations of that technique.
Robin talks about this issue as it pertains to Lojban with some depth
here:
http://www.digitalkingdom.org/~rlpowell/hobbies/lojban/grammar/
Bryan Ford's Master's Thesis on Packrat Parsers is suprisingly
accessible, including what makes PG (Parsing Grammars) more
expressive than CFG's (Context-free Grammars):
http://pdos.csail.mit.edu/~baford/packrat/thesis/
He does make the assumption you're familiar with parsers in general.
The Abstract on that page is a great summary of what trade-offs you
make with packrat parsers vs Yacc-style parsers.
The book I learned parsing from is the "Dragon Book": _Compilers:
Principles, Techniques, and Tools_. There may be a better book these
days, but I won't be surprised if there isn't.