Hi All—
I think I’ve nearly got the timing issues worked out with speech and 10 Hz logging. It was a bit harder than I expected, because the microSD card occasionally pauses for about 75 ms before allowing the next command to continue. There’s no way to avoid that pause, so our audio buffer has to last at least 75 ms. Tom’s suggestion to go to a lower sample rate gets us most of the way there, up to 64 ms. Other changes I’ve been experimenting with should, I think, let us switch to FatFS’s “tiny” mode, which frees up enough memory to double the size of the audio buffer, giving 128 ms. I think that will be sufficient.
So—if the timing issues are actually solved, then the next question in my mind is how to make the speech mechanism as versatile as possible. Right now, in response to messages from the GPS module, the firmware creates a string of characters, e.g., “123.4”. This string is then handled one character at a time in UBX_Task, where it is turned into a series of words, e.g., “one two three point four”. The asynchronous nature of this system—that the string is created in one place and then handled without blocking in another place—allows us to keep everything else running while the FlySight is speaking.
One thing I like about the way the string is being handled right now is that it’s very easy to use for some of the most common cases, i.e., if you want the FlySight to speak a number, then you need only write that number into the string in the obvious way—there is no need to add a routine for converting numbers into a series of filenames to be spoken.
However, what I’m wondering is how that will extend to navigational messages. If we want the FlySight, for example, to say, “Turn left 90 degrees,” then we need to turn those words into a series of symbols. One way to do it might be to use letters. For example, we could set the string to “tl90d” and add a few files to the AUDIO folder:
t.wav “turn”
l.wav “left”
d.wav “degrees”
Then, without modifying the way we parse the string, we would hear “turn left nine zero degrees”. Pretty close. My only concern is how this might be extended, e.g., if we wanted the FlySight to say “turn left ninety degrees” instead. We could come up with single-character symbols for each word we want it to say, but I wonder if that is ultimately going to be limiting.
That said, single-character symbols are a very simple way of handling things, and until we start to run out of symbols, maybe there’s no good reason to go to a more complex system. These limitations really only apply to “procedurally generated” speech. There will be a lot more flexibility, e.g., with spoken alarms—the user should be able to specify any filename and a condition under which it should be spoken, without having the same limitations we’re talking about above. If you wanted the FlySight to read a poem to you at pull time, you’d just have to record the poem as a single file and tell the FlySight to play it at a particular altitude.
Any thoughts?
Michael
--
You received this message because you are subscribed to the Google Groups "FlySight Developers" group.
I’ve just merged a pull request that includes a lot of changes related to audio/log timing. The full description is here:
https://github.com/flysight/flysight/pull/35
I set up a few pins on PORTF to indicate two major error conditions:
1. An audio buffer overrun. In the past, these have resulted in occasional “clipped” speech.
2. A log buffer overrun. Essentially, this is what was causing invalid lines in the CSV files previously.
With these pins set up, I was able to hook the board up to a logic analyzer and run a long “torture” test with a particularly long audio file being played repeatedly. In about 3 hours of logging, there were no errors at all with the logging rate set to 10 Hz, so I think we’ve finally nailed down the timing issues.
As part of these changes, I’ve reduced the audio sample rate from 31250 Hz down to 7812 Hz (one quarter). With interpolation, I don’t think there is a significant change in audio quality, but I would welcome any input you guys have. The updated audio files are included in the GitHub repository.
I’m going to keep speech very simple for now. I’ll produce a couple of extra files aimed at navigation, and then we’ll see if there is any need to make things more complex.
Michael