Hi Martin,
In step 1, I don't think it's good to make the assumption that the
user will know the prefix and suffix replacement tables. Typically,
those are among the last things a Plover user learns. The canonical
dictionary file should assume that users are not relying on the lookup
tables, but are simply sounding everything out by ear.
I think step 2 puts too strong an emphasis on syllabification. I know
linguistics is your field, Martin, but I think steno has some natural
characteristics that require letting go of certain basic linguistic
ideas, such as syllabification.
In steno, the 'most predictable' stroke is not the one that follows
syllabification; it's the one that jams as many sounds as possible
into a single stroke without dropping any vowels or inverting any
consonants. In other words, I think a stroke is 'most predictable' if
the user can place each finger on the keyboard from left to right,
asking themselves, "is there another sound I can fit before I run out
of fingers?" and putting a finger down whenever another sound can be
added. That's just the nature of steno: it isn't truly syllable-based.
In step 2.2 (and related steps), I don't think you should bother
trying to implement disambiguators algorithmically. Any disambiguator
is most properly thought of as a 'brief'. Disambiguation is what steno
people actually consider to be their steno 'theory'. I got this wrong
when I first started learning from Mirabai: I thought the keyboard
layout was the 'theory'. In fact, briefs are the theory. So the
difference between Plover and other systems such as Phoenix is simply
their approach to disambiguation. Because of that, I'd leave that whole
question out and simply come up with the canonical set of dictionary
entries, which would be essentially the same for all theories that use
the same keyboard layout.
So basically I'm suggesting that you simplify your algorithm and not
even try to resolve any ambiguities. Just construct the canonical
dictionary file, and let each user design their own theory of brief
forms. This would also be extremely useful for generating
dictionaries for other languages.
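A minimal sketch of "build the canonical file, resolve nothing", assuming the sound-out outline for each word has already been computed (the `outlines` data below is hypothetical). Plover's JSON dictionaries map each outline to a single translation, so collisions can't live in that file; here the unambiguous entries are written as JSON and the collisions are kept in a separate list for each user's own briefs. `build_canonical` is an illustrative name.

```python
import json
from collections import defaultdict


def build_canonical(outlines: dict[str, str]) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Split word -> outline pairs into unambiguous entries and raw conflicts.

    Nothing is resolved here: any outline shared by several words is
    reported as-is, for each user's own theory of briefs to handle.
    """
    words_by_outline: dict[str, list[str]] = defaultdict(list)
    for word, outline in outlines.items():
        words_by_outline[outline].append(word)
    entries = {o: ws[0] for o, ws in words_by_outline.items() if len(ws) == 1}
    conflicts = {o: sorted(ws) for o, ws in words_by_outline.items() if len(ws) > 1}
    return entries, conflicts


# Hypothetical sound-out outlines; a real run would generate these.
outlines = {"heart": "HART", "hart": "HART", "cat": "KAT", "start": "START"}
entries, conflicts = build_canonical(outlines)
print(json.dumps(entries, indent=2, sort_keys=True))
print(conflicts)  # {'HART': ['hart', 'heart']}
```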
If you want to go beyond that and also calculate disambiguation
entries, I'd suggest making your approach generic, i.e., just give
people the option to define their own set of disambiguation rules,
such as using spelling to disambiguate between 'heart' and 'hart', and
have your code apply their rules to produce a working dictionary
file. For English, this would have the tremendous benefit of making it
easy for people to design and test their own theories of brief forms.
But it would also make it easy for people using other languages to
construct their own complete steno theory.
The problem with designing a new steno theory by hand (i.e., a theory
of brief forms) is that you can think everything's going well until
you suddenly run up against some horrible contradiction that
invalidates some of your basic rules. That's one reason why Plover and
the other steno systems are so great: for the most part, they manage
to resolve conflicts and produce briefs without contradicting too many
of their own rules.
Be well,
Zack
--
Zack Brown