How to recognize command <dictation> command <dictation>...?


Jason Veldicott

Feb 2, 2012, 2:39:43 AM
to dragonf...@googlegroups.com
Hi,

I want to recognize:

command <dictation> command <dictation> ...

The code I've used to do this appears below.

The difficulty is that once the first dictation is recognized, everything after it is matched as dictation, including subsequent commands. That is, when a word that matches a command is encountered, recognition doesn't switch back into a command state; the dictation gobbles everything.

One solution might be to somehow specify a set of stop words (the commands) that end the dictation, but this doesn't seem possible without extending the Dictation class.
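As a rough illustration of the stop-word idea, here is a minimal pure-Python sketch that works on a flat list of already-recognized words instead of inside Dragonfly's decoder. The function name `segment_by_commands` and the placeholder words `c` (command) and `t` (text) are hypothetical; any word in `commands` closes the current dictation run and forms its own segment, so commands always win over dictation.

```python
from typing import Iterable, List, Set, Tuple


def segment_by_commands(words: Iterable[str],
                        commands: Set[str]) -> List[Tuple[str, List[str]]]:
    """Split words into ("command", [word]) and ("text", [words...]) segments.

    Every command word ends any open text run, i.e. command words act as
    stop words for dictation.
    """
    segments: List[Tuple[str, List[str]]] = []
    for word in words:
        if word in commands:
            segments.append(("command", [word]))
        elif segments and segments[-1][0] == "text":
            segments[-1][1].append(word)       # extend the current text run
        else:
            segments.append(("text", [word]))  # start a new text run
    return segments


if __name__ == "__main__":
    print(segment_by_commands("c t c t".split(), {"c"}))
    print(segment_by_commands("t c c c t".split(), {"c"}))
```

Because this runs after recognition, it sidesteps the grammar's precedence rules entirely; the trade-off is that the whole utterance has to arrive as one dictation first.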

Another possibility is to replace dictation with a set of choices, but this assumes the word choices are known in advance, which they are not in this case.

If someone could comment on this, it would be appreciated.

Thanks

Jason




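from dragonfly import (Grammar, MappingRule, CompoundRule, Rule,
                       Repetition, RuleRef, Dictation, Text)

# r1: recognize a single command (placeholder mapping: command -> action)
class CommandRule(MappingRule):
    exported = False
    mapping = {
        "command": Text("action"),
    }

# r2: recognize "command <dictation>"
class CommandDictationRule(CompoundRule):
    exported = False
    spec = "<command> <text>"
    extras = [RuleRef(rule=CommandRule(), name="command"),
              Dictation("text")]

# r3: recognize repeat(command <dictation>); Repetition's max is exclusive
element = Repetition(RuleRef(rule=CommandDictationRule()), min=1, max=8)
r3 = Rule(name="repeated_command_dictation", element=element, exported=True)

grammar = Grammar("command dictation")
grammar.add_rule(r3)    # the referenced rules are pulled in as dependencies
grammar.load()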

Jason Veldicott

Feb 2, 2012, 4:04:30 PM
to dragonf...@googlegroups.com
With the repetition placed within the compound spec, why does this match properly:

"<commands_rule><text><commands_rule><text>"

matching the words c t c t (where c is a command and t is text) as (action dictation action dictation), but not this:

"<commands_rule><text>[<commands_rule><text>]"

which recognizes only (action dictation), with the dictation gobbling the last 3 input words t c t.

It appears it will only use the optional element as a last resort, greedily consuming as much of the input as possible with the non-optional clause.
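The same pattern shows up in ordinary backtracking regular expressions, which may help build intuition. This is only an analogy using Python's `re` module, with `.+` standing in for `<text>` and the letter `c` for a command word; it says nothing about how DNS is implemented internally.

```python
import re

# Analogy only: ".+" plays the role of <text> (dictation), "c" a command word.
REQUIRED = re.compile(r"(c) (.+) (c) (.+)")              # <c><t><c><t>
OPTIONAL = re.compile(r"(c) (.+)(?: (c) (.+))?")         # <c><t>[<c><t>]
OPTIONAL_LAZY = re.compile(r"(c) (.+?)(?: (c) (.+))?")   # non-greedy first <t>

words = "c t c t"
print(REQUIRED.fullmatch(words).groups())       # ('c', 't', 'c', 't')
print(OPTIONAL.fullmatch(words).groups())       # ('c', 't c t', None, None)
print(OPTIONAL_LAZY.fullmatch(words).groups())  # ('c', 't', 'c', 't')
```

With the second command required, the greedy `.+` is forced to backtrack until the second `c` can match; with the optional group, the greedy `.+` consumes everything and the optional group is simply skipped, which is the behaviour observed above. The non-greedy variant only shows what the desired behaviour would look like in regex terms.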

Thanks

Jason

Jason Veldicott

Feb 3, 2012, 4:31:21 AM
to dragonf...@googlegroups.com
Another problem:

spec: "<text><commands_rule_REPEAT><text>"

will recognize the words t c c c t (where t is text and c is a command word) as (dictation command dictation), where the first dictation gobbles t c c, treating the commands as dictation words. It should instead be recognized as (dictation (command x 3) dictation).

Is it possible in recognition then to give the commands of a mapping rule priority over text, so that commands are recognized as such rather than as dictation?

Feel free to reply.
 
 

Charlie Tango

Feb 4, 2012, 6:56:56 PM
to dragonf...@googlegroups.com
Hello Jason,

You raise an interesting point here: what is the priority, or
precedence, of "fixed literal command words" and "dynamic dictation
elements" when their combination is ambiguously specified in a grammar?

To investigate further, I wrote a test/showcase file. It's attached to
this e-mail (although I'm not sure Google Groups can handle e-mails
with attachments). Please take a look at its contents; they are very
much self-documenting, because I made use of doctests.

My conclusion is that both Dragonfly and DNS behave correctly and as
expected for normal (non-ambiguous) grammars. For ambiguous grammars
(i.e. those where a recognized word could either be a literal command
word or part of a dynamic dictation) DNS appears to give the dynamic
dictation precedence over fixed command words.

DNS passes the recognized words in combination with the associated
"rule IDs" to Dragonfly, which then correctly processes them. The
recognition processing logic within Dragonfly is therefore good, and
does what it should do. I can however see no way to affect the
priority DNS gives to the recognized words. The (low-level binary)
grammar definition language used by DNS doesn't appear to contain any
way of changing priorities or precedences.

Best regards,
Charlie T

PS: Don't you like the way I'm testing Dragonfly functionality in the
attachment? No more need for painstakingly reloading a command module
and re-speaking the phrases you want to test over and over again...!

test_multiple_dictation.py

Jason Veldicott

Feb 8, 2012, 4:27:19 AM
to dragonf...@googlegroups.com
Hi Charlie,

Thanks for your clarifying response and source code snippet.  

I'm not sure exactly what role DNS plays in recognition, other than sending command words or dictation words, which are presumably labelled as such according to the defined grammar.

However, since the "decoding" of these words happens on the Dragonfly side according to the defined grammar, it has been possible to modify Dictation (in grammar.elements_basic) to give precedence to commands using the stop-word idea (it ignores the word codes). The source code is included below in case it's of use to anyone.

Below that, some code for quickly simulating a recognition.

Regards

Jason



from dragonfly import Dictation

class Dictation2(Dictation):
    # Dictation element that stops at the first stop word, leaving command
    # words for the elements that follow it.

    def __init__(self, name=None, format=True, stopwords=None):
        Dictation.__init__(self, name, format)
        # Default to an empty set so the membership tests below work.
        self.stopwords = set(stopwords or ())

    def decode(self, state):
        state.decode_attempt(self)

        # Check that at least one word has been dictated and that it is not
        # a stop word, otherwise fail.
        if (state.rule() != "dgndictation") or (state.word_rule(0)[0] in self.stopwords):
            state.decode_failure(self)
            return

        # Determine how many words have been dictated, stopping at the
        # first stop word.
        count = 1
        while (state.rule(count) == "dgndictation") and (state.word_rule(count)[0] not in self.stopwords):
            count += 1


etc...




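from dragonfly import get_engine
import dragonfly.grammar.state as state_   # module path assumed

def simRecog():
    # Fake recognition results as (word, rule ID) pairs; 1000000 is the ID
    # used here to mark each word as dictation ("dgndictation").
    # "grammar" is the module's loaded Grammar object.
    s = state_.State([("the", 1000000), ("wizard", 1000000), ("of", 1000000), ("oz", 1000000)],
                     grammar._rule_names, get_engine())

    # Try each active rule until one decodes the whole word sequence.
    for r in grammar._rules:
        if not r.active:
            continue
        s.initialize_decoding()
        for result in r.decode(s):
            if s.finished():
                # Build the parse tree and run the rule's processing.
                root = s.build_parse_tree()
                r.process_recognition(root)
                return root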
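To see why the stop-word check changes the outcome, here is a simplified pure-Python model of the counting step in `Dictation2.decode`. The part of Jason's code after the counting loop is elided ("etc..."), so this model assumes the counted lengths are then offered longest-first, which is consistent with the greedy behaviour reported earlier in the thread. The function name `candidate_lengths` is hypothetical, and the (word, rule name) pairs stand in for Dragonfly's decoding state.

```python
from typing import Iterator, List, Set, Tuple

DICTATION = "dgndictation"


def candidate_lengths(words: List[Tuple[str, str]], start: int,
                      stopwords: Set[str]) -> Iterator[int]:
    """Yield how many words a dictation element could consume at `start`.

    `words` holds (word, rule_name) pairs. Counting stops at the first word
    that is not dictation or that is a stop word. Lengths are yielded
    longest-first (assumed greedy order), so a longer match is tried first.
    """
    count = 0
    while (start + count < len(words)
           and words[start + count][1] == DICTATION
           and words[start + count][0] not in stopwords):
        count += 1
    # Nothing is yielded when count == 0, i.e. the element fails.
    for n in range(count, 0, -1):
        yield n


if __name__ == "__main__":
    utterance = [(w, DICTATION) for w in "the wizard of oz".split()]
    print(list(candidate_lengths(utterance, 0, set())))   # [4, 3, 2, 1]
    print(list(candidate_lengths(utterance, 0, {"of"})))  # [2, 1]
```

Without stop words every word is available to the dictation, so the longest candidate swallows the command words; with "of" as a stop word the dictation can never extend past it, leaving "of" for a following command element.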
 

Charles J. Daniels

Feb 13, 2015, 1:32:10 PM
to dragonf...@googlegroups.com
I have been working on this exact issue (among many). So far it's working out pretty nicely. 

- I can add an option to accept and process any further speech beyond a simple command in order to form a chain
- I have a single stop word that will end any free form dictation, allowing the rest to be processed
- I can nest forms two deep using two stop words, one toplevel, one nested

So for instance:

string camel many people bomb space three say people
= string(camel(many people))   people
= "manyPeople"   people

string camel many people then space three say people
= string(camel(many people)   people)
= "manyPeople   people"

I'm not entirely sure how well this nests beyond two levels.

One nice thing is that I'm able to add these features through decorators.

from dragonfly import Dictation, Mimic

# Appends an optional "bomb [<chain>]": "bomb" ends the rule's own
# dictation, and any words after it are replayed with Mimic.
def BombRule(CompoundRuleClass):
    CompoundRuleClass.spec += " [bomb [<chain>]]"
    CompoundRuleClass.extras += (Dictation("chain"),)
    _orig_process_recognition = CompoundRuleClass._process_recognition
    def _new_process_recognition(self, node, extras):
        _orig_process_recognition(self, node, extras)
        if "chain" in extras:
            # Replay the trailing words as if they had been spoken anew.
            Mimic(*extras["chain"].words).execute()
    CompoundRuleClass._process_recognition = _new_process_recognition
    return CompoundRuleClass

# Like BombRule, but the "bomb" stop word itself is optional.
def OptionalBombRule(CompoundRuleClass):
    orig_spec = CompoundRuleClass.spec
    CompoundRuleClass = BombRule(CompoundRuleClass)
    CompoundRuleClass.spec = orig_spec + " [[bomb] [<chain>]]"
    return CompoundRuleClass

# Appends an optional trailing dictation, with no stop word, that is
# replayed with Mimic.
def ChainedRule(CompoundRuleClass):
    CompoundRuleClass.spec += " [<chain>]"
    CompoundRuleClass.extras += (Dictation("chain"),)
    _orig_process_recognition = CompoundRuleClass._process_recognition
    def _new_process_recognition(self, node, extras):
        _orig_process_recognition(self, node, extras)
        if "chain" in extras:
            Mimic(*extras["chain"].words).execute()
    CompoundRuleClass._process_recognition = _new_process_recognition
    return CompoundRuleClass

# BombRule plus a nested stop word: "then" inside the rule's own
# "bombChain" dictation is rewritten to "bomb" before processing, so the
# replayed inner words can end their own dictation.
def BombChain(CompoundRuleClass):
    CompoundRuleClass = BombRule(CompoundRuleClass)
    _orig_process_recognition = CompoundRuleClass._process_recognition
    def _new_process_recognition(self, node, extras):
        for i, word in enumerate(extras["bombChain"].words):
            if word == "then":
                extras["bombChain"].words[i] = "bomb"
        _orig_process_recognition(self, node, extras)
    CompoundRuleClass._process_recognition = _new_process_recognition
    return CompoundRuleClass

In my example above, "string" is a BombChain. The interpretation is greedy, so the first "bomb" ends the string; but "string" needs to hold variable-length dictations that must themselves "bomb", so it gets its own nested stop word: inside a string I say "then", which the BombChain decorator turns into "bomb". Maybe it's a little rough for some people, because you have to keep mental track, but I can work with that kind of thing.

Here's my "string" rule:

@GrammarRule
@BombChain
class StringRule(CompoundRule):
    spec = "string <bombChain>"
    extras = (Dictation("bombChain"),)
    def _process_recognition(self, node, extras):
        # Type an opening quote, replay the dictated words, close the quote.
        (Text("\"") + Mimic(*extras["bombChain"].words) + Text("\"")).execute()

@GrammarRule just instantiates the rule class and adds the instance to the grammar.

I'm not sure if my approach will come across easily to others, but it may at least suggest some ideas.
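The two-level stop-word scheme behind the "string" examples can be modelled in plain Python, independent of Dragonfly and Mimic. This is a sketch under stated assumptions: it takes the words spoken after the literal "string", ends the rule's dictation at the first "bomb" (dropping the stop word and returning the rest for replay), and rewrites the nested stop word "then" to "bomb" inside the dictation, as the BombChain decorator does. The function name `split_bomb_chain` is hypothetical.

```python
from typing import List, Tuple


def split_bomb_chain(words: List[str], outer: str = "bomb",
                     inner: str = "then") -> Tuple[List[str], List[str]]:
    """Return (words handled by the string rule, words replayed afterwards)."""
    if outer in words:
        cut = words.index(outer)                  # greedy: first stop word wins
        head, rest = words[:cut], words[cut + 1:] # drop the stop word itself
    else:
        head, rest = list(words), []
    # Promote the nested stop word so the replayed inner dictation
    # can itself be ended by the outer stop word.
    head = [outer if w == inner else w for w in head]
    return head, rest


if __name__ == "__main__":
    print(split_bomb_chain("camel many people bomb space three say people".split()))
    print(split_bomb_chain("camel many people then space three say people".split()))
```

In the first example "space three say people" is replayed outside the quotes; in the second, the whole phrase stays inside the string, and the promoted "bomb" ends only the nested camel-case dictation.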

synkarius

Feb 15, 2015, 9:28:09 PM
to dragonf...@googlegroups.com
Charles, this BombChain technique is pretty interesting. Could you give a few more examples of its usage?

Charles J. Daniels

Feb 17, 2015, 1:44:08 AM
to dragonf...@googlegroups.com
I would like to respond, but I'm running into problems. Like this rule:

@GrammarRule
class MyTest(CompoundRule):
    spec = "my test"
    extras = (Dictation("dictation"), )
    def _process_recognition(self, node, extras):
        sent = "words I have a bomb to give bomb to give bomb space"
        words = sent.split()
        print(words)
        Mimic(*words).execute()

I made this test so that I could make sure the input was always the same -- I say "my test" and it will mimic that text exactly the same each time. And you know what? I get different results!!! It either takes the first bomb or the last bomb, but it switches when I have no clue why it would. There are fairly large pauses between each test run, and it tends to favor the first over the last. But I can get both. Here watch -- I'll do a test run right now, one on each line:

I have a to give bomb to give bomb space
I have a to give bomb to give bomb space
I have a to give bomb to give bomb space
I have a to give bomb to give bomb space
I have a to give bomb to give bomb space
I have a to give bomb to give bomb space
I have a to give bomb to give bomb space
aI have a bomb to give bomb to give 
I have a to give bomb to give bomb space
I have a to give bomb to give bomb space
 I have a to give bomb to give bomb space
aI have a bomb to give bomb to give 
a
a
I have a to give bomb to give bomb space
I have a to give bomb to give bomb spaceI have a to give bomb to give bomb space I have a to give bomb to give bomb space

And my live test just showed that I can get different results, but they depend on word boundaries!!! However, when I was getting both values, it was in an app that is not Dragon-aware. Okay, wait: I seem to be getting similarly consistent results in that other app (Eclipse IDE Luna).

Oh wait!!! Look at this:

aI have a bomb to give bomb to give 
bI have a to give bomb to give bomb space
aI have a bomb to give bomb to give 
bI have a to give bomb to give bomb space

It cares about a vs. b right here in this Dragon-aware text box!!! That makes it a bit hard to respond with clear examples: if it cares about a vs. b, the whole thing is a little rickety at the moment. Let's try the alphabet:

aI have a bomb to give bomb to give 
bI have a to give bomb to give bomb space
cI have a to give bomb to give bomb space
dI have a to give bomb to give bomb space
eI have a to give bomb to give bomb space
fI have a bomb to give bomb to give bomb space
gI have a bomb to give bomb to give 
hI have a to give bomb to give bomb space
iI have a bomb to give bomb to give bomb space
jI have a to give bomb to give bomb space
kI have a to give bomb to give bomb space
lI have a to give bomb to give bomb space
mI have a to give bomb to give bomb space
nI have a to give bomb to give bomb space
oI have a bomb to give bomb to give 
pI have a to give bomb to give bomb space
qI have a to give bomb to give bomb space
rI have a to give bomb to give bomb space
sI have a to give bomb to give bomb space
tI have a to give bomb to give bomb space
uI have a bomb to give bomb to give bomb space
vI have a to give bomb to give bomb space
wI have a to give bomb to give bomb space
xI have a to give bomb to give bomb space
yI have a to give bomb to give bomb space
zI have a to give bomb to give bomb space

So I realize it's actually giving THREE different results:
ago
fiu
aI have a to give bomb to give bomb space
gI have a to give bomb to give bomb space
oI have a bomb to give bomb to give 
bcdehjklmnpqrstvwxyz
fI have a bomb to give bomb to give bomb space

Never mind: I've literally just seen the a prefix give two different results. So I get three different results, with high but not perfect consistency.

fI have a bomb to give bomb to give bomb space
uI have a bomb to give bomb to give bomb space
aI have a to give bomb to give bomb space
aI have a to give bomb to give bomb space

aI have a to give bomb to give bomb space

(NOTICE: the last three a-prefix lines differ from my first one. I'm starting to think the formatting code has a bug in it.)

Context does seem to matter.

With non-deterministic or shadowy variables involved, it's hard to formalize these rules.

I'll keep looking into it.

Still, phrases that use a single bomb tend to work well. I've yet to see a bomb be completely skipped if one is present -- what I'm seeing change is the bomb it terminates on.

Charles J. Daniels

Mar 5, 2015, 3:38:53 AM
to dragonf...@googlegroups.com
Okay, so I looked into this more. As I showed above I don't trust rules to be parsed consistently, so I'm taking the parsing into my own hands.

Here is an updated version of my decorator BombRule:


from dragonfly import Mimic
# NatlinkDictationContainer comes from Dragonfly's Natlink backend; its
# module path differs between Dragonfly versions.

#decorator
def BombRule(CompoundRuleClass):

    _orig_process_recognition = CompoundRuleClass._process_recognition

    def _new_process_recognition(self, node, extras):
        words = []
        bombCount = 0

        #seek out bomb
        for name, value in extras.items():
            if isinstance(value, NatlinkDictationContainer):
                if value.words.count("bomb") == 0:
                    continue # try to find another dictation container with a bomb
                else:
                    words = value.words
                    bombCount = words.count("bomb")
                    break

        if bombCount:
            # Keep only the words before the first bomb in this dictation.
            bombIndex = words.index("bomb")
            extras[name] = NatlinkDictationContainer(words[0:bombIndex])

        _orig_process_recognition(self, node, extras)

        # Replay everything after the first bomb, if anything follows it.
        if bombCount and words[bombIndex + 1:]:
            Mimic(*words[bombIndex + 1:]).execute()

    CompoundRuleClass._process_recognition = _new_process_recognition

    return CompoundRuleClass


This decorator assumes a lot, namely that your command ends with a dictation whose end you can't predict. If a single rule decorated with it contains more than one dictation, the results are undefined and likely not what you want. But if the rule ends with a single dictation, then saying the word "bomb" during that dictation will end the dictation there, drop the bomb, and mimic the rest. This does not address any desire to nest such commands, because upon encountering the first bomb, everything that follows starts from scratch with no memory or context of what preceded it.

It does address the original desire of command <dictation> command <dictation>, in the form:

command "a dictation block that stops at" bomb command "another dictation block" bomb command "chained as long as you wish" bomb period 


My guess is that the original poster had a lot of top-level rules, but if you had something like this:

<OnlyTopRule> exported = (<RuleOne> | <RuleTwo> | <RuleThree>)+
<RuleOne> = ...
<RuleTwo> = ...
<RuleThree> = ...

Then the parsing might be much better at determining rule boundaries; I don't know. I also don't trust it enough to take the time to find out. My current goal is to just grab whatever long dictation string I manage to build up, and then parse the entire thing myself. That way I won't have to say "bomb" to end dictations: I will just end them at known command words, and use a "literal" command to escape command words.
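A minimal sketch of the plan in the last paragraph: take the whole dictated word list, end each dictation run at a known command word, and let a "literal" prefix force the next word to be treated as dictation. The command set, the default escape word, and the function name `parse_commands` are hypothetical; nothing here is Dragonfly API.

```python
from typing import List, Optional, Set, Tuple

Segment = Tuple[Optional[str], List[str]]


def parse_commands(words: List[str], commands: Set[str],
                   escape: str = "literal") -> List[Segment]:
    """Group words into (command, dictation_words) pairs.

    Words before the first command are attached to a None command.
    """
    result: List[Segment] = []
    escaped = False
    for word in words:
        if escaped:
            escaped = False            # previous word was the escape: keep as text
        elif word == escape:
            escaped = True             # drop the escape word itself
            continue
        elif word in commands:
            result.append((word, []))  # a command ends the current dictation
            continue
        if not result:
            result.append((None, []))
        result[-1][1].append(word)
    return result


if __name__ == "__main__":
    spoken = "say hello literal upper world upper done".split()
    print(parse_commands(spoken, {"say", "upper"}))
```

Here the first "upper" is escaped and kept as dictation for "say", while the second one starts a new command.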