AIPI


camp .

Jun 2, 2026, 10:38:07 AM (2 days ago) Jun 2
to hbrob...@googlegroups.com
Has anyone tried this?

AIPI

I'm looking for a mind for Smarty (the Smart Table). I'd like it to interact verbally (conversation) and have it control motors so I could do verbal commands like forward, reverse, cw, cc, etc.

Eventually labeling rooms on my ROS2 map (living, bed, kitchen, bath, etc.) and telling the robot to go to one room or another.

I'm also rolling a homebrewed Claude board with a Raspi. I had it running with a terminal where it would run Claude upon booting, type in a question and it would respond with voice synthesis (Seeed Studio ReSpeaker)... until the micro-SD died.

Thanks,
Camp

Albert Margolis

Jun 2, 2026, 11:24:56 AM (2 days ago) Jun 2
to hbrob...@googlegroups.com
What you describe does not need anything like an LLM.

Search STT (Speech to Text) and find something that fits your preferred hardware. If you limit your vocabulary to a handful of words, there are options that don't take much hardware. If you want to make up puzzles and have the robot solve them, you need something like an LLM which takes a lot of hardware and can do all sorts of wonderful things. If you can live with recognizing just a half dozen distinct words, the hardware requirements drop down by a lot.
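
A minimal sketch of the small-vocabulary idea, assuming the speech-to-text stage hands back a plain text string. The words forward, reverse, cw and cc come from the original post; "stop", the `COMMANDS` table, the wheel-speed values and `parse_command` are hypothetical and only for illustration.

```python
# Hypothetical small-vocabulary table: word -> (left, right) wheel speeds.
# The speed values are made up for illustration.
COMMANDS = {
    "forward": (1.0, 1.0),
    "reverse": (-1.0, -1.0),
    "cw": (1.0, -1.0),    # spin clockwise
    "cc": (-1.0, 1.0),    # spin counterclockwise
    "stop": (0.0, 0.0),   # not in the original list; added as an assumption
}

def parse_command(transcript):
    """Return (word, speeds) for the first known word in the transcript, else None."""
    for raw in transcript.lower().split():
        word = raw.strip(".,!?")  # STT output often carries punctuation
        if word in COMMANDS:
            return word, COMMANDS[word]
    return None

print(parse_command("Forward!"))
print(parse_command("what time is it"))
```

Anything that is not in the table returns `None`, which is the point where a larger model could take over if one is available.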

Apple is quoting 3 to 6 MONTH lead times for Mac Mini and Mac Studio configurations with enough capacity to run the better open-weight models, not to mention fairly painful costs. It looks like the best availability is the MacBook Pro, where you only have to wait 3 or so weeks for Apple to take your $3K or $4K. Pretty painful for a toy, but it gives much better consistency than going over the internet. So far I am just living within my $20/mo claude.ai plan.

- Al Margolis


--
You received this message because you are subscribed to the Google Groups "HomeBrew Robotics Club" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hbrobotics+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/hbrobotics/1334812666.293228.1780411076169%40mail.yahoo.com.

Pito Salas

Jun 2, 2026, 11:37:10 AM (2 days ago) Jun 2
to hbrob...@googlegroups.com
I have done a good bit of experimenting with speech. I am using a SEEED studio 2 microphone board. With a bit of tuning I got reasonable performance. The mic picks up a lot of noise: the robot makes noise and the room has noise, so I don’t have anything like Alexa-level performance. More work to be done, but good enough for me for now. Here’s a writeup explaining the code (AI generated so … you know…) which you can browse through: https://github.com/Boston-Robot-Hackers/dome_voice/blob/main/01-literate/03-voice_input_node.md

(It’s an idea from a while back called “literate programming” where the source code is interspersed with narrative to make it come off like a chapter in a book. Still not super readable but helpful I think.)

Best,

Pito

Boston Robot Hackers &&
Comp. Sci Faculty, Brandeis University (Emeritus)

Thomas Messerschmidt

Jun 2, 2026, 5:45:26 PM (2 days ago) Jun 2
to hbrob...@googlegroups.com
I've been using faster-whisper on my laptop for STT (speech recognition). It is big enough to understand me most of the time and small enough to respond in less than two seconds on an older Windows laptop. It's free and easy to install. I use it on both ROS 2/Linux and Windows 11.

Additionally, I use Ollama with the llama3.2:3b LLM (downloaded with ollama pull llama3.2:3b) as a brain. It responds pretty rapidly (if I keep the parameters tight) on both Linux and Windows, usually within 2 or 3 seconds. Keep in mind that the two laptops I am using for my "brain" are run-of-the-mill, 5-year-old laptops (32 GB of RAM) without an NVIDIA GPU.


I am including some sample Python code below. 


Thomas Messerschmidt

--------------------------------------------------
--------------------------------------------------


# Speech Recognition Demo — using Faster Whisper
#
# Required Libraries:
#   pip install faster-whisper
#   pip install sounddevice
#   pip install numpy

print("Whisper is initializing...")

print("PLEASE WAIT...")

import sounddevice as sd        # captures audio from the microphone
import numpy as np              # handles the raw audio data as an array
from faster_whisper import WhisperModel  # the speech-to-text engine

# --- Settings ---
DURATION    = 2      # how many seconds to record
SAMPLE_RATE = 16000  # 16kHz — the rate Whisper expects


model = WhisperModel("base", device="cpu", compute_type="int8")
print("Model ready!\n")

# --- Beep function --- Generates a pure tone using numpy and plays it through sounddevice
def beep(frequency=880, duration=0.3, volume=0.5):
    t = np.linspace(0, duration, int(SAMPLE_RATE * duration))  # time axis
    tone = volume * np.sin(2 * np.pi * frequency * t)          # sine wave
    sd.play(tone.astype("float32"), samplerate=SAMPLE_RATE)
    sd.wait()  # wait for beep to finish before recording starts

# --- Beep, then record ---
beep()
print(f"Speak now! Recording for {DURATION} seconds...")
audio = sd.rec(
    int(DURATION * SAMPLE_RATE),
    samplerate=SAMPLE_RATE,
    channels=1,
    dtype="float32"
)
sd.wait()
print("Recording done!\n")

# --- Flatten and transcribe ---
audio_flat = np.squeeze(audio)

print("Transcribing...")
segments, info = model.transcribe(audio_flat, beam_size=5)

# --- Print the result ---
print("You said:")
for segment in segments:
    print(f"  {segment.text.strip()}")


--------------------------------------------------
--------------------------------------------------

--------------------------------------------------
--------------------------------------------------

# Local LLM Chat Demo — using Ollama
#
# Required:
#   1. Ollama installed from https://ollama.com/download
#   2. Run to download the LLM model:  ollama pull llama3.2:3b
#   3. pip install ollama

import ollama  # official Ollama Python library

# --- System prompt ---
SYSTEM_PROMPT = "Your name is Marvin. You are a helpful robot. Keep your answers rude and paranoid. Limit answers to one short sentence"

print("Type your message and press Enter. Type 'exit' to quit.\n")

# --- Chat loop ---
while True:
    user_input = input("You: ").strip()

    if not user_input:
        continue

    if user_input.lower() == "exit":
        print("Goodbye!")
        break

    # --- Send message to Ollama ---
    # messages is a list — this is where you'd add history for multi-turn chat
    response = ollama.chat(
        model="llama3.2:3b",       # the model we pulled
        messages=[
            {"role": "system",  "content": SYSTEM_PROMPT},  # personality
            {"role": "user",    "content": user_input}       # what we said
        ]
    )

    # --- Extract and print the reply ---
    reply = response["message"]["content"]
    print(f"AI: {reply}\n")






camp .

Jun 2, 2026, 7:52:38 PM (2 days ago) Jun 2
to hbrob...@googlegroups.com
> What you describe does not need anything like an LLM.

    I do want an LLM as well. Currently, I have a Google Nest Mini Velcroed to the top of Smarty. I want my Smart Speaker to follow me around, or I could direct it verbally to a specific location. Eventually, it monitors the news, and when something I'm interested in happens, it might track me down and tell me about it. I'm tired of having to walk into the kitchen and ask the time, the temperature... or any other question, for that matter.

    The AIPI is an interesting package. What caught my eye was someone glomming it onto an old Furby. I'm looking at the IO, and 4 pins from the ESP32-S3 are available, two are set up for TX/RX, and IO41 and IO40 (see attached photo). For the price, I might give it a try, but I will continue my homebrewed Claude board.

AIPI Lite — A Character. A Companion. A Voice.

Thanks,
Camp

AIPI.png

Chris Albertson

Jun 2, 2026, 8:16:47 PM (2 days ago) Jun 2
to hbrob...@googlegroups.com
The hard part is stringing all the parts together. You have to:

1. Sleep until a wake word is heard.

2. Grab the audio from a microphone.

3. Process the audio to see what it’s saying.

4. Figure out what the user wants.

5. Act on that, passing some kind of data structure to an “interpreter.”

6. Catch any text the task generates.

7. Send that text to a text-to-speech system.

8. Output the audio to a speaker.


Even the simplest parts can be hard. Maybe there are two or more microphones picking up the wake word. We want to make sure only the closest microphone starts the process.


Also, we want the system to be easy to use. For instance, I’d like to be able to switch out the speech-to-text app or maybe even use the cloud if needed, without having to rewrite anything.
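
One way to picture the eight steps with swappable parts is a pipeline whose stages are plain callables. This is only a rough sketch and is not how the Wyoming protocol itself works; `VoicePipeline`, its field names, and the stand-in lambdas are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class VoicePipeline:
    """Each stage is a plain callable, so any one can be swapped on its own."""
    wait_for_wake_word: Callable[[], None]      # step 1
    record_audio: Callable[[], bytes]           # step 2
    speech_to_text: Callable[[bytes], str]      # step 3
    find_intent: Callable[[str], dict]          # step 4
    run_intent: Callable[[dict], str]           # steps 5 and 6
    text_to_speech: Callable[[str], bytes]      # step 7
    play_audio: Callable[[bytes], None]         # step 8

    def run_once(self) -> str:
        self.wait_for_wake_word()
        audio = self.record_audio()
        text = self.speech_to_text(audio)
        intent = self.find_intent(text)
        reply = self.run_intent(intent)
        if reply:  # only speak when the task produced text
            self.play_audio(self.text_to_speech(reply))
        return reply

# Stand-in stages so the sketch runs without a microphone or speaker.
played = []
pipeline = VoicePipeline(
    wait_for_wake_word=lambda: None,
    record_audio=lambda: b"fake-audio",
    speech_to_text=lambda audio: "what time is it",
    find_intent=lambda text: {"name": "tell_time"} if "time" in text else {"name": "unknown"},
    run_intent=lambda intent: "It is noon." if intent["name"] == "tell_time" else "",
    text_to_speech=lambda text: text.encode("utf-8"),
    play_audio=played.append,
)
print(pipeline.run_once())
```

Swapping the speech-to-text engine then means passing a different callable, for example with `dataclasses.replace(pipeline, speech_to_text=other_engine)`, while the rest stays untouched.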


"Wyoming Protocol" handles this.   Someone has already put in the effort to do all of this and build the interface, so all we need to do is configure it.


Here is something that builds a full stack (voice in to voice out), and it can run even on a Raspberry Pi Zero. Minimum requirements are 512 MB of RAM and a 1 GHz CPU core.





Chris Albertson

Jun 3, 2026, 2:03:48 AM (yesterday) Jun 3
to hbrob...@googlegroups.com

    I do want an LLM as well. Currently, I have a Google Nest Mini Velcroed to the top of Smarty. I want my Smart Speaker to follow me around, or I could direct it verbally to a specific location. Eventually, it monitors the news, and when something I'm interested in happens, it might track me down and tell me about it. I'm tired of having to walk into the kitchen and ask the time, the temperature... or any other question, for that matter.

My Apple Watch does that.   It follows me around, and I can ask it anything.

Ok, aside from the smart ass answer, what I’m getting at is that it is obviously possible to do voice recognition and speech in a computer that fits on a wrist.     The simple stuff really is done inside the watch using battery power.

The way it works is there is a list of model sentence patterns in the watch and if what I say does not match one of those, the voice is sent to some data center for processing.   
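
A rough sketch of that "local patterns first, remote fallback second" idea, assuming the utterance has already been transcribed to text. The patterns, room names, action functions and `remote_stub` are hypothetical; this is not the watch's or Wyoming's actual implementation.

```python
import re

# Hypothetical local actions; a real robot would command motors here.
def go_to_room(room):
    return f"driving to the {room}"

def turn(direction):
    return f"turning {direction}"

# Each sentence pattern is paired with the action it triggers.
PATTERNS = [
    (re.compile(r"^go to (?:the )?(?P<room>kitchen|living room|bedroom|bath)$"), go_to_room),
    (re.compile(r"^turn (?P<direction>left|right)$"), turn),
]

def handle(utterance, fallback):
    """Try the local patterns first; hand anything unmatched to the fallback."""
    text = utterance.lower().strip(" .!?")
    for pattern, action in PATTERNS:
        match = pattern.match(text)
        if match:
            return action(**match.groupdict())
    return fallback(text)

def remote_stub(text):
    # Stand-in for sending the request to a larger model elsewhere.
    return f"(would send to a remote model: {text!r})"

print(handle("Go to the kitchen.", remote_stub))
print(handle("Why is the sky blue?", remote_stub))
```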


Wyoming seems to be “pretty good glue,” and I noticed it has a way to define sentence patterns and tie those to actions. Whisper seems to be multilingual.

The weak link is actions. If a sentence pattern is recognized, it can be given an action in Python. But if the LLM is needed, all you get back are words. Words don’t make motors move.

My use for this is my “House is a Robot” project. Think of HAL in 2001 as a robot. Or better, look at Gerty in the film “Moon.” I think Gerty is my favorite movie robot and could one day be real. Gerty is an AI, but he has many parts that work together and may not be physically connected.


In the film, an artificial intelligence (Gerty) is speaking with an artificial human (Sam). Both are manufactured “products.” It is worth watching the entire film to see how Gerty is designed. He is like HAL but has more than a voice; he has hands and arms. Here is a clip https://youtu.be/YQnqTjhv1h8. Here are some other shots where you can kind of see the robot. Notice the coffee cup. The robot doubles as a food cart. The arms don’t seem to be connected to the body but ride on some kind of track. https://www.youtube.com/shorts/l4i4PVNK120

I will never build anything like this, but I hope to have a few wall-mounted sensors and two mobile robots, all part of a single AI agent.


Scott Horton

Jun 3, 2026, 7:51:39 AM (yesterday) Jun 3
to HomeBrew Robotics Club
Just to add a couple more cents to the discussion...

> The weak link is actions. if a sentence pattern is recognized, it can be given an action in python. But if the LLM is needed all you get back are words. Words don’t make motors move.

LLMs can also respond to a prompt with tool calls (aka function calls).   See this page for a good explanation:  https://huggingface.co/docs/hugs/en/guides/function-calling.
In summary:
1. You define the tools based on your application.
2. You pass the tool definitions along with your prompt to the LLM as part of the request.
3. If needed, based on the prompt, the LLM responds with a request to execute one or more tools.
4. Your tool implementation executes the requested tool and sends the result(s) back to the model.
5. The model either responds with more tool calls, or completes the original prompt with the final response based on the tool results.
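
The five steps above can be sketched as a small loop. To keep it runnable without any model installed, `fake_model` is a scripted stand-in for a real LLM call; the tool names, the message layout and `run_agent` are hypothetical, and real APIs differ in detail (most also expect the assistant's tool-call message to be appended to the history).

```python
import json

# Step 1: define the tools for your application (both are hypothetical stubs).
def move_to(place):
    return f"arrived at {place}"

def say(text):
    return f"spoke: {text}"

TOOLS = {"move_to": move_to, "say": say}

# Tool descriptions handed to the model with every request.
TOOL_SPECS = [
    {"name": "move_to", "description": "Drive to a named place",
     "parameters": {"type": "object", "properties": {"place": {"type": "string"}}}},
    {"name": "say", "description": "Speak a sentence aloud",
     "parameters": {"type": "object", "properties": {"text": {"type": "string"}}}},
]

def fake_model(messages, tool_specs):
    """Scripted stand-in for an LLM: replies depend on how many tool results it has seen."""
    results = [m for m in messages if m["role"] == "tool"]
    if len(results) == 0:
        return {"tool_calls": [{"name": "move_to", "arguments": {"place": "kitchen"}}]}
    if len(results) == 1:
        return {"tool_calls": [{"name": "say", "arguments": {"text": "hello"}}]}
    return {"content": "Done: went to the kitchen and said hello."}

def run_agent(prompt, model, max_rounds=5):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_rounds):
        reply = model(messages, TOOL_SPECS)        # step 2: prompt plus tool definitions
        calls = reply.get("tool_calls")
        if not calls:                              # step 5: no more tools, final answer
            return reply["content"], messages
        for call in calls:                         # steps 3 and 4: run tools, return results
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "name": call["name"],
                             "content": json.dumps(result)})
    raise RuntimeError("model kept requesting tools")

answer, log = run_agent("Go to the kitchen and say hello.", fake_model)
print(answer)
```

The `max_rounds` cap is a design choice for a robot: it keeps a confused model from driving the hardware in an endless loop.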

I've been playing around with that stuff lately for my Elsabot project. While I've been using models run on device for speech and LLM just to see what is possible on device, what I am doing can be done with LAN-local or cloud models as well, since those models are accessed via RESTful APIs.

Here's a diagram of the system showing where the LLM fits in:  https://github.com/rshorton/elsabot/blob/main/elsabot_diagram_rev2.pdf. Since I already had a behavior-tree-based top-level control solution, I chose to integrate the LLM into that level (see https://github.com/rshorton/elsabot_bt).   Others are using layers like LangChain (Jim DiNunzio's Big Orange). 

Here's the video link I posted a while back.  https://youtube.com/shorts/LRyp4u0X1vA  The prompt I give it results in several tool-calls including one to move to another location, analyze what it sees using VLM, and then one to move back.  That's a very simple example.

Where it gets fun is when you leverage the LLM in a more creative manner. For fun, I asked the robot to move to my daughter's bedroom door, knock and say hello. However, I asked it to act as if it were Sheldon Cooper. It moved to the door and then spoke "Amy  knock, knock, knock, Amy  knock, knock, knock, Amy  knock, knock, knock".

Scott

 

Albert Margolis

Jun 3, 2026, 9:56:51 AM (yesterday) Jun 3
to hbrob...@googlegroups.com
LLMs can be given tools. When you ask Claude about the weather it matches that to tool descriptions, extracts the tool's parameters from the prompt and then calls the tool. This can be recursive. When you make an API call to an LLM you provide a prompt and a list of tool descriptions. You could provide tools like FindPlaceFromDescription(desc) that returns coordinates and GoToCoordinate(x, y, z, v). If FindPlaceFromDescription() has access to images of your house with location and pose, it can do what you want when you say "go to the room that has a bed and blue walls as fast as you can and stop next to the nightstand". 

That is pretty cool and doable fairly well with off-the-shelf components today. Unfortunately, that burns enough data center energy to warm a small city and runs up significant API costs if you do it every time you want the robot to do something. On the other hand, building a table of location words like {"my nightstand": (200, 500, 0), "fridge": (800, 1000, 0)} and requiring a structured command like "go my nightstand fast" can do the same thing with very little horsepower. I haven't looked in a few years but custom Alexa capabilities at least used to require that sort of structure. 
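
A minimal sketch of the table-plus-structured-command approach, using the two locations from the example above. The `SPEEDS` values, the default speed, and `parse_go` are assumptions made for illustration.

```python
# Location table from the example; coordinates are (x, y, z).
PLACES = {"my nightstand": (200, 500, 0), "fridge": (800, 1000, 0)}
SPEEDS = {"slow": 0.2, "fast": 1.0}   # made-up speed values
DEFAULT_SPEED = 0.5                   # used when no speed word is given

def parse_go(command):
    """Parse 'go <place> [slow|fast]' into (x, y, z, speed), or None if it does not fit."""
    words = command.lower().split()
    if not words or words[0] != "go":
        return None
    speed = DEFAULT_SPEED
    if words[-1] in SPEEDS:
        speed = SPEEDS[words.pop()]   # consume the trailing speed word
    place = " ".join(words[1:])       # everything between 'go' and the speed
    if place not in PLACES:
        return None
    return (*PLACES[place], speed)

print(parse_go("go my nightstand fast"))
print(parse_go("go fridge"))
```

The result has the same shape as the arguments of the GoToCoordinate(x, y, z, v) tool mentioned earlier, so the table lookup and an LLM tool call could feed the same motion code.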

- Al Margolis



rick rowland

Jun 3, 2026, 4:59:33 PM (yesterday) Jun 3
to HomeBrew Robotics Club
Looks like AIPI is working to release pins to control a robot this month. I have a couple of the modules, and they perform well so far. Just not sure about the subscriptions. It has a lot for the price; hope they come through...
Rick 

camp .

Jun 3, 2026, 5:18:27 PM (yesterday) Jun 3
to hbrob...@googlegroups.com
I ordered one because it looks too easy, and I want to control my robot (Smarty) with some voice AI. For $27, it's cheap entertainment. Will keep you posted.


Thanks,

Thomas Messerschmidt

Jun 3, 2026, 6:22:29 PM (23 hours ago) Jun 3
to hbrob...@googlegroups.com, hbrob...@googlegroups.com
Pins?


Thomas Messerschmidt

-

Need something prototyped, built or coded? I’ve been building prototypes for companies for 15 years. I am now incorporating generative AI into products.

Contact me directly or through LinkedIn:

https://www.linkedin.com/in/ai-robotics/



On Jun 3, 2026, at 2:18 PM, camp . <ca...@camppeavy.com> wrote:

pins

Thomas Messerschmidt

Jun 3, 2026, 6:26:30 PM (23 hours ago) Jun 3
to hbrob...@googlegroups.com
Amazon has them for about $15 without the battery pack. The subscription is optional.
> I ordered one because it looks too easy, and I want to control my robot (Smarty) with some voice AI. For $27, it's cheap entertainment. Will keep you posted.

rick rowland

Jun 3, 2026, 7:14:46 PM (22 hours ago) Jun 3
to hbrob...@googlegroups.com
image.png


rick rowland

Jun 3, 2026, 7:23:01 PM (22 hours ago) Jun 3
to hbrob...@googlegroups.com

Thomas Messerschmidt

Jun 3, 2026, 8:29:12 PM (21 hours ago) Jun 3
to hbrob...@googlegroups.com
Ah, pins. :)
