Potential Ideas and Usability

Rav

Jun 7, 2024, 7:29:20 PM
to utterlyvoiceusers
Hi there,

Not sure if these are user errors on my part, but I'd like to share the following from my experience. I say this already liking this much more than Windows' own software! The response time to voice inputs is much quicker too.
  1. When a command isn't recognized, it seems to almost always just scroll down as a result (e.g. if I mispronounce "open new tab", it just scrolls down on the current page)
  2. I think the most trouble I have is trying to edit misunderstood text. I'm working on my pronunciation for better input, but is there a way to navigate around entered text?
    1. For example, if I wanted to say "Hi mate, the item only hit 1 player instead of two" but it instead entered "High mate, theitemonly hit 1 player instead of two", I'd love a way to jump to the start of a paragraph (with a keyboard I always used Ctrl + Shift + Up Arrow to select the prior paragraph and then arrowed to the side to quickly move the cursor). Or I'd prefer to quickly move X words to the left or right to more easily edit just the one word that has issues. Having to count the words and delete all of it works; I'm just trying to find a way to get closer to the efficiency of typing
  3. Is there a way to trigger Windows key commands? The Ctrl + Shift + Up Arrow above is a great example. I also have some applications (e.g. Greenshot for taking screenshots) bound to 2-3 keyboard inputs to open and start. I tried reading the help text, but didn't see how to add my own custom commands of "say XXX" to trigger "X Y Z"
  4. Is there a way to de-clutter some of the #'s when you say "Show"? E.g. if something is within 3-5 pixels of something else, could it show only once? I'm assuming the code is looking for things that are clickable, but pictures or even the icons on the Google home page are blasted with #'s
    1. An alternative that may help is a way to enlarge the numbers. I feel like I'm slouching into my screen each time I say "show" to see what the 4-digit # is
  5. Is there a specific command to use to ensure "Open tab 5" still opens the tab in a browser (I use Chrome) rather than tabbing 5 times? I verified that "open tab 5" was being recognized by the Recognizer, but occasionally it just starts tabbing instead. I haven't yet found a pattern for when it does or doesn't occur
As someone who hasn't really been able to use the computer without pain for over a year, I'm really excited that your tool may give me the ability to go back to work again! Happy to help share any usability things I find!

Utterly Voice

Jun 7, 2024, 8:55:36 PM
to Rav, utterlyvoiceusers
Hello, thank you for the feedback! It is great to hear that you are finding it useful. Making this application powerful enough to use at work is definitely something we're focused on. Responses to your feedback below.

Page down triggered for failed recognition

I'm assuming this is happening to you within the browser. Most browsers will trigger a page down when you press the space key. This means that when you say something like "open new tub", each of the two spaces in that text triggers a page down. We are considering a "commands-only" interpreter state that you could easily turn on and off while dictating. In the meantime, if there is something in particular that is misrecognized frequently for you, you can update your settings to include command alternates. See the "Command" section of this help file: https://utterlyvoice.com/help/settings-files.
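
As a rough sketch of what an alternate for a frequently misheard phrase might look like, here is a command entry in the same format as the default mode files. The keyPress function is mentioned later in this reply, but its argument layout (borrowed from the keyRepeat example further down), the Ctrl + T shortcut, and the "open new tub" alternate are assumptions for illustration; the shipped "open new tab" command may be defined differently:

```yaml
  # Hypothetical entry: the real chrome mode command may differ.
  - name: "open new tab"
    alternates:
      # Misrecognition added as an alternate so it still triggers the command.
      - "open new tub"
    functions:
      - name: "keyPress"
        # Assumed to take key names the same way keyRepeat does.
        fixedArguments:
          - "control"
          - "t"
```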

Editing text

The basic and windows modes provide basic text editing commands that work in most applications. For example, "go left word four", "clear right word two", "go left fifteen". However, if you use a particular text editor frequently, and the text editor supports many keyboard shortcuts for editing, you can get much more efficient with your voice commands. You can create a custom mode file with any voice commands that you like. Many good editors support commands for jumping up/down paragraphs, sentences, words, etc. To learn how to create a custom mode, see the "Customize" group of help documents starting with YAML: https://utterlyvoice.com/help/yaml

Triggering keyboard shortcuts

Yes, you can absolutely create any voice commands of your choosing to trigger any keyboard shortcuts. As mentioned above, check out the "Customize" group of help documents. These documents take you step by step towards learning how to customize everything. Here is a high level description:
  • You can define new mode files in your config/modes directory. You can activate and deactivate these modes while you are dictating. Each mode is a collection of commands.
  • Each command has a name that triggers its execution, and as mentioned above, you can also provide alternates that will trigger execution.
  • Each command lists a sequence of function calls. For a simple keyboard shortcut, this will just be one function call to the keyPress function. However, you can define complex commands that trigger a sequence of actions.
The documentation goes into complete detail for describing how to customize. However, if you take a look at the existing files in config\modes, you can see how all of the default commands are defined as an example to quickly grasp how it works. You will see many commands like the following:

  - name: "go left word"
    description: >-
      Moves the cursor one or more words to the left.
      The optional utterance argument is the number of times the key should be pressed.
      If the argument is not provided,
      the key is pressed once.
    biasFactor: 1.2
    alternates:
      - "go leftward"
    functions:
      - name: "keyRepeat"
        fixedArguments:
          - "control"
          - "left"
        utteranceArguments: 1


You will find details for each of the fields for this command in the documentation, but here is a summary:
  • Either "go left word" or "go leftward" will trigger the command.
  • It triggers a single function called keyRepeat. This function will execute the provided keyboard shortcut a certain number of times according to the single utterance argument provided to the command. See all function descriptions at https://utterlyvoice.com/help/functions.
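
For the Ctrl + Shift + Up Arrow shortcut asked about above, a custom command could follow the same pattern. This is only a sketch: the command name and alternate are made up, and the key name "up" is assumed by analogy with "left" and "right", so check the functions help page for the exact key names:

```yaml
  # Hypothetical command for a custom mode file.
  - name: "select up paragraph"
    description: >-
      Presses Ctrl + Shift + Up Arrow one or more times.
      The optional utterance argument is the number of times the keys should be pressed.
    alternates:
      - "select paragraph up"
    functions:
      - name: "keyRepeat"
        fixedArguments:
          - "control"
          - "shift"
          - "up"   # assumed key name
        utteranceArguments: 1
```

Saying "select up paragraph three" would then press the shortcut three times, if the assumptions above hold.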

Many number labels for "show"

Yes, you are noticing some of the tradeoffs for the algorithm we chose. The algorithm is looking for any isolated elements on your screen. When they are very close to each other, they are not recognized as unique. When images are on the screen, many isolated elements will be found. We do believe this is the best algorithm we have seen in any dictation program, but there is definitely room for improvement. We are always looking for ways to improve it. You should definitely use "show" along with "show links" when browsing the web. "Show links" does a better job at finding each clickable link on a webpage.

If you're having trouble reading the labels, it would probably be good for us to add some way to adjust label size in the settings. We added this to our task list for a future version.

Open tab five

I believe you're just using the wrong command. Try "go to tab five". Note that this command only works for values up to 8, because that is the limit of the keyboard shortcut. When you have a lot of tabs open, you might find the "go right/left tab" commands useful. For example, "go right tab twenty".


Rav

Jun 8, 2024, 1:21:22 PM
to utterlyvoiceusers
Thanks for the timely reply!

I was finally able to create my own custom command after about an hour with this info! I think it took me a while originally to understand the documentation (especially as I'm not a software engineer). Here are the things I struggled with so far:
  • Understanding which YAML file is active at which time during navigation. I originally built the rule out in the Basic file assuming that was always used, turns out I had to build it in chrome
    • Turns out this is visible in the upper left of my screen where it says "auto-spacing, chrome, etc". I was finally getting an error that said line 1 in the YAML was incorrect, but it was actually a new line I made that had 1 extra space between the "-" and the "name"
  • I'd love to have a thread for sharing commands to add! I think it's helpful to see other examples as well since I understood how to do most of this by reading + copy pasting your other commands. Plus there may be other shortcuts/commands that we don't know about until someone else posts theirs!
    • I added a few commands below that make editing text much easier for me, in addition to understanding your "go left/right word" commands. I wasn't sure if there was a default way to delete things that are highlighted, so I also added a "backspace" command. I used these to much more quickly select multiple mistyped words to delete. I think your option of "go left word" + "clear left/right word" still works well now that I understand it! I'm just biased as these are the keyboard shortcuts I'm familiar with from typing
      • For example, if it typed "Hi everyone, I'm go tunnel to the party tonight", I would say "select left words 6" to get my cursor quickly to the start of "go tunnel". I'd then say "go left" to unhighlight the other text and have the cursor right before the "g" in "go". Then I'd say "select right words 2", then say "backspace", and then I can add the correct text by saying it: "going to". It's one more command than yours for getting there, so I'll optimize to use the "go left word" command, but I do like it for highlighting what's about to be deleted
        • "select left words X" = Control + Shift + Left +#
          • Is it possible for me to set this up with the utterance argument before the "words"? I.e. a command like "select left # words" rather than "select left words #"?
        • "select right words X" = Control + Shift + Right +#
        • "undo" = control + z
        • "redo" = control + y
        • "backspace" = backspace key
  • You were right, I was indeed using the command incorrectly in numerous ways for opening the right tab. BUT now I understand how to add more alternatives for myself in the Chrome document :D
I'll keep working on creating some more commands for other shortcuts I'm used to and sharing them here, especially ones that help with navigating and creating text (e.g. Ctrl + B, finding out how to create bullet points, Ctrl + Shift + Up/Down arrows for selecting an entire line/paragraph)

My custom commands so far in "Basic"
  - name: "select left words"
    description: >-
      Presses the Ctrl + Shift + Left Arrow key one or more times.

      The optional utterance argument is the number of times the key should be pressed.
      If the argument is not provided,
      the key is pressed once.
    biasFactor: 1.2
    alternates:
      - "control shift left"
      - "control left"
      - "select leftwards"
      - "select the left words"
      - "select the leftwards"
      - "select words left"

    functions:
      - name: "keyRepeat"
        fixedArguments:
          - "control"
          - "shift"
          - "left"
        utteranceArguments: 1
  - name: "select right words"
    description: >-
      Presses the Ctrl + Shift + Right Arrow key one or more times.

      The optional utterance argument is the number of times the key should be pressed.
      If the argument is not provided,
      the key is pressed once.
    biasFactor: 1.2
    alternates:
      - "control shift right"
      - "control right"
      - "select rightwards"
      - "select the right words"
      - "select the rightwards"
      - "select words right"

    functions:
      - name: "keyRepeat"
        fixedArguments:
          - "control"
          - "shift"
          - "right"
        utteranceArguments: 1
  - name: "redo"
    description: >-
      Presses the Ctrl + y key one or more times.

      The optional utterance argument is the number of times the key should be pressed.
      If the argument is not provided,
      the key is pressed once.
    biasFactor: 1.2
    functions:
      - name: "keyRepeat"
        fixedArguments:
          - "control"
          - "y"
        utteranceArguments: 1
  - name: "undo"
    description: >-
      Presses the Ctrl + z keys one or more times.

      The optional utterance argument is the number of times the key should be pressed.
      If the argument is not provided,
      the key is pressed once.
    biasFactor: 1.2
    functions:
      - name: "keyRepeat"
        fixedArguments:
          - "control"
          - "z"
        utteranceArguments: 1
  - name: "backspace"
    description: >-
      Presses the backspace key one or more times, according to the optional utterance argument.
    biasFactor: 1.2

    functions:
      - name: "keyRepeat"
        fixedArguments:
          - "backspace"
        utteranceArguments: 1

Utterly Voice

Jun 8, 2024, 2:31:30 PM
to Rav, utterlyvoiceusers
Thank you for those details. Understanding things that users get confused about helps us improve the documentation.

Here are some points on your details:
  • The utterance arguments for commands do need to come after the command name in an utterance. This occasionally requires a slightly awkward utterance, but it keeps the behavior of commands, fixed arguments, and utterance arguments easy to understand.
  • Many of the commands you have added to the basic mode actually already exist in the windows mode. For example, say "open help windows", then scroll down to "redo that", "clear left", "select left word", etc. Alternatively, you can just open the config\modes\windows.yaml file. These commands are in the windows mode, because they are windows-specific. Everything in the basic mode will work on any operating system. Later we will provide support for mac/linux, and they will have their own mode files. 
  • Rather than adding many commands to the mode files that we provide, it will be easier for you in the long term if you create your own mode files. You can name these files anything you like; they just have to be in the config\modes directory and use the YAML format. As we release newer versions of the application, we also release newer versions of the mode files. This means that when you upgrade to the latest version, you may need to merge content from your updates to basic mode with the new version. If you keep most of your changes in your own mode files, you will just need to copy your custom mode files into the new version's directory.
  • We are hopeful that users over time will use this group to share custom mode files with each other as attachments. The ability to attach files was a primary motivator for choosing Google Groups for this forum.

Rav

Jun 14, 2024, 1:05:43 PM
to utterlyvoiceusers
Let me rephrase what I was thinking.

Is there a centralized location with all commands? E.g. basic + chrome + windows, etc.? I keep making commands that end up already existing under a different name

In addition, is there a way to rename a command? For example, I tried creating one named "snap left" with an alternate of "snap window left". I then realized that under windows mode there is one already called "snap window left", BUT it doesn't have an alias of "snap left" as well. I'd update the windows mode to include the 2nd alias, but as you stated before, that update would just get wiped during an update. Would the best fix just be to have a custom command (identical function-wise) in my own mode but only with the new name instead?

Utterly Voice

Jun 14, 2024, 2:57:00 PM
to Rav, utterlyvoiceusers
There is no way to list all commands in all modes in one list. However, you can list all modes with "open help", and you can list all commands for a given mode with "open help mode name". See this documentation: https://utterlyvoice.com/help/mode-command-help. If you are only using the modes that are active by default, you can see all commands with the following:
  • "open help global"
  • "open help basic"
  • "open help windows"
  • "open help chrome"
To rename a command, you can either edit the command directly in the existing mode file, or create a custom mode with the new name. It is really a matter of preference.

If you edit the existing mode, you will need to reapply this change for future versions. An easy way to do this is to add a comment (perhaps # MY_EDITS) to each command that you edit, so you can easily find each command you changed. When you upgrade to a newer version, nothing will be automatically wiped. You will have both the old and new versions of the application and its settings. You can just delete the old version once you are happy with the configurations of the new version.
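
As a sketch of the comment-marker approach, an edited command in windows.yaml might carry a marker like this. Only the comment and the added alternate are shown; the remaining fields (description, functions, and so on) would stay exactly as shipped and are omitted here rather than guessed:

```yaml
  # MY_EDITS: added "snap left" as an alternate
  - name: "snap window left"
    alternates:
      - "snap left"
    # ...original description and functions unchanged...
```

Searching the file for MY_EDITS then finds every command that needs to be carried over to the new version.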

If you create a custom mode with commands for different names, it does make upgrading a little easier. You will just need to copy your custom mode file from your old version to your new version.

We are planning on providing a settings merge tool at some point. This will make it easier to upgrade and keep your old command changes.

Utterly Voice

Aug 16, 2024, 9:38:06 PM
to Rav, utterlyvoiceusers
Update:
You can now adjust the size of labels in the latest version of Utterly Voice. See "labelFontHeight" in https://utterlyvoice.com/help/settings-files.

Utterly Voice

Aug 16, 2024, 9:56:28 PM
to Rav, utterlyvoiceusers
Another update here:
The latest version has a commands-only interpreter state. This works very well to prevent commands that are not recognized correctly from typing content.