Notes on Volumio, Zigbee and speech recognition

John Kenyon

Sep 21, 2021, 10:06:15 PM

Here are a few notes from my recent experiments which I shared last Saturday. I posted these on Telegram, but James suggested I cross post here so that the information is searchable for the future.

Music player: Volumio
Running on a Raspberry Pi 2 with a decent USB DAC. The built-in Raspberry Pi sound card is very poor, but luckily even cheap USB headphone dongles provide quite reasonable quality nowadays. This is connected to my main hi-fi system. If you have a paid Spotify subscription, Volumio can work as a Spotify Connect server, which lets you play music from any Spotify client (phones, laptops) to Volumio. Volumio also has a simple API which can be used to change volume, skip tracks and return what is currently playing.
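
For anyone searching later, here is a rough sketch of what calling that API from Python can look like. It is not the script from these notes. It assumes Volumio answers on its default name volumio.local and uses the REST endpoints /api/v1/commands/ and /api/v1/getState; the helper names command_url, state_url and describe are made up for illustration, and the "title" and "artist" fields are assumed to be present in the state JSON.

```python
from urllib.parse import urlencode

# Assumption: Volumio is reachable under its default mDNS name.
VOLUMIO = "http://volumio.local"


def command_url(cmd, **params):
    """Build a URL for Volumio's REST command endpoint (hypothetical helper)."""
    query = urlencode({"cmd": cmd, **params})
    return f"{VOLUMIO}/api/v1/commands/?{query}"


def state_url():
    """URL that returns the current player state as JSON."""
    return f"{VOLUMIO}/api/v1/getState"


def describe(state):
    """Turn a getState-style dict into a short phrase for display or speech."""
    title = state.get("title") or "an unknown track"
    artist = state.get("artist")
    return f"{title} by {artist}" if artist else title
```

Sending a command is then a plain HTTP GET, for example urllib.request.urlopen(command_url("next")) to skip a track or command_url("volume", volume=40) to set the volume.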

Controller 1: IKEA Tradfri Symfonisk - Zigbee remote control
This is a rotary encoder with a push button which can detect single, double and triple clicks.
I installed zigbee2mqtt on a Raspberry Pi 4, using a CC2531 USB stick to interface with the Zigbee devices.
Flashing the CC2531 without a CC debugger was an interesting exercise, but the instructions I found were very helpful.
I then wrote a very simple Python script which listens for messages from the Symfonisk remote and sends HTTP requests to Volumio.
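
A minimal sketch of the message-handling half of such a script, not the actual one. zigbee2mqtt publishes JSON payloads over MQTT, and the sketch assumes the remote's events arrive in an "action" field. The action names and what each one is mapped to are assumptions (only the triple click for announcing the song comes from these notes), so check what your own remote publishes. ACTION_TO_INTENT and intent_from_message are hypothetical names.

```python
import json

# Assumed action names; check what zigbee2mqtt actually publishes for your remote.
# The intents on the right are illustrative labels, not Volumio commands.
ACTION_TO_INTENT = {
    "toggle": "play_pause",                 # single click (assumed)
    "brightness_step_up": "next",           # double click (assumed)
    "brightness_step_down": "announce",     # triple click (assumed)
    "brightness_move_up": "volume_up",      # rotate one way (assumed)
    "brightness_move_down": "volume_down",  # rotate the other way (assumed)
}


def intent_from_message(payload):
    """Return the intent for one MQTT payload (bytes or str), or None."""
    try:
        message = json.loads(payload)
    except (ValueError, TypeError):
        return None  # not JSON, so not a message we care about
    if not isinstance(message, dict):
        return None
    action = message.get("action")
    if not isinstance(action, str):
        return None  # e.g. a periodic status message without an action
    return ACTION_TO_INTENT.get(action)
```

The MQTT client's message callback would call intent_from_message on each payload from the remote's topic and then issue the matching HTTP request to Volumio.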

Controller 2: Voice controlled
The idea here was to create a simple voice controller so that I could control Volumio hands-free (mainly for cooking). We came up with the name Octavia as a trigger word, so phrases like "octavia volume up" or "octavia skip" could be used to control the music. This was built on the same Raspberry Pi 4 as the Zigbee controller, just using a cheap USB mic.
I initially tried PocketSphinx for the speech recognition, but it seemed quite slow and inaccurate. I then discovered Vosk, which is fast and offers remarkable accuracy even when running on the Pi 4.
I then modified the Python script I had written for the IKEA remote to use simple regular expressions to process the text from Vosk. The code doesn't really use a trigger word as such; it relies on a trigger phrase, checking that all the required words are present before it performs an action. This means "louder octavia" is just as effective as "octavia louder", or even "please make the volume a little louder my dear octavia".
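
To illustrate the "all the words must be present" idea, here is a small sketch under my own assumptions, not the real script. The phrase table, the intent labels and the names PHRASES, heard and match_phrase are invented for the example. It takes the recognised text as a plain string, which is what you would pull out of the recogniser's result.

```python
import re

# Hypothetical phrase table: every word in the tuple must be heard, in any order.
PHRASES = [
    (("octavia", "louder"), "volume_up"),
    (("octavia", "volume", "up"), "volume_up"),
    (("octavia", "skip"), "next"),
    (("octavia", "what", "song"), "announce"),
]


def heard(word, text):
    """True if word appears in text as a whole word."""
    return re.search(rf"\b{re.escape(word)}\b", text) is not None


def match_phrase(text):
    """Return the intent of the first phrase whose words all occur in text."""
    text = text.lower()
    for required, intent in PHRASES:
        if all(heard(word, text) for word in required):
            return intent
    return None
```

Because word order is ignored, match_phrase("louder octavia") and match_phrase("octavia louder") give the same intent, while ordinary conversation that never mentions the name matches nothing.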
The final addition was to get it to announce the name of the currently playing song ("octavia, what song is this?"). I tried both Flite and Mimic for text-to-speech, but they both sounded quite robotic.
Eventually I settled on Google Cloud Text-to-Speech, as the voices are fantastic. The free limit is 1 million characters per month, so I think I should stay within it; beyond that it gets expensive. The Google API returns a base64-encoded sample, which I then write to the file system and play with the default mp3 player.
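
The base64 step can be sketched like this. It assumes the JSON response from the REST synthesize call, which carries the audio in an "audioContent" field. The function names decode_audio and save_audio are made up, and the request itself (API key, voice selection) is left out.

```python
import base64
import json


def decode_audio(response_body):
    """Extract the raw audio bytes from a synthesize response (JSON text)."""
    return base64.b64decode(json.loads(response_body)["audioContent"])


def save_audio(response_body, path):
    """Write the decoded audio to path and return the number of bytes written."""
    audio = decode_audio(response_body)
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

The saved file can then be handed to whatever mp3 player is installed on the Pi.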
I also modified the IKEA remote script so a triple click tells you the name of the song.

Now that I have Zigbee working, I have noticed that there are a lot of other interesting Tradfri devices at IKEA, including lights and power points, which might be fun to play with next.

All the best, John