ROS Discussion Group, tonight! May 12 @ 7pm Pacific Time


camp .

May 12, 2026, 1:59:59 PM
to hbrob...@googlegroups.com
  It's Tuesday again! Whether you're a beginner or an expert, please join us if you're interested in ROS.  :-]  

Join Zoom Meeting
Meeting ID: 889 8347 8865, Passcode: 935511

Thanks,
Camp

camp .

May 12, 2026, 9:01:36 PM
to hbrob...@googlegroups.com
Tonight! - cp

--
You received this message because you are subscribed to the Google Groups "HomeBrew Robotics Club" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hbrobotics+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/hbrobotics/1428150331.1153589.1778608793420%40mail.yahoo.com.

Scott Horton

May 12, 2026, 11:49:54 PM
to HomeBrew Robotics Club
Here's the link to the YouTube video I played in the meeting: https://youtube.com/shorts/LRyp4u0X1vA

For this video, I told Elsabot to go to the playroom TV, see what it was showing, and then come back and tell me. It then used these LLM tool calls I have implemented for Elsabot:
1. Requested the list of known locations (which maps location names to x, y, yaw). From that it determined the Nav2 map location of the TV.
2. Requested the robot move to the playroom TV location.
3. Once the move request finished, it requested a camera frame and VLM analysis of the frame.
4. Requested the robot move back to the original location.
5. Described what it saw after it arrived back.
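
As a rough illustration, the tool interface behind steps 1-4 might look something like the schema below. This is a minimal sketch with hypothetical names, not Elsabot's actual interface:

```python
# Hypothetical sketch of an LLM tool schema for the steps above
# (names and shapes are illustrative, not Elsabot's actual code).
TOOLS = [
    {
        "name": "get_known_locations",
        "description": "Return known location names mapped to (x, y, yaw) map poses.",
        "parameters": {"type": "object", "properties": {}},
    },
    {
        "name": "move_to_location",
        "description": "Request a Nav2 move to a named location; returns when the move finishes.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
    {
        "name": "get_camera_frame",
        "description": "Capture a camera frame and return it for VLM analysis.",
        "parameters": {"type": "object", "properties": {}},
    },
]
```

With tools shaped like this, the model can chain the calls itself: look up the TV's pose, move there, grab a frame, move back.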

In addition to describing the scene on the TV, it also noticed the hand-written note I had placed in view.

It is using Gemma 4 26B, which is pretty amazing.

Scott

Sergei Grichine

May 13, 2026, 12:45:05 AM
to hbrob...@googlegroups.com
Thanks for sharing, Scott!

While trying to understand the big picture, I took the liberty of uploading Scott's zipped repository to ChatGPT as a "source". I then provided the following prompt:
  • "Analyze the zip archive sources and describe the architecture of this ROS2 robot. Focus on interactions between the LLM and Behavior Trees. Try to distill the interface used in these interactions."
Here is the response: https://chatgpt.com/share/6a03fca5-59e8-83ea-92cf-1574c3ee4770 -  I hope this helps clarify the system structure.

I look forward to exploring the code further. It is a fabulous piece of work - miles ahead of my own experiment with a face-and-gesture sensor.

Best Regards,
-- Sergei



Scott Horton

May 13, 2026, 9:07:49 AM
to HomeBrew Robotics Club
Sergei,
Thanks for your nice comments.

It is definitely a work in progress. There has been lots of iteration over the last month or so to see what you can do with an LLM integrated in that way. There are still lots of areas to refine, as ChatGPT pointed out.

Jim pointed out recently how he uses LangChain for similar high-level control. It's not clear at this point where the wheels fall off with my approach vs. a more structured one like LangChain or similar. However, I like the tight integration I am seeing with the path I am taking.

I'm still learning how to improve the integration with the model, especially for VLM. I found a fix the other day for the tool parser that vLLM uses for Gemma, which was causing images to be rejected when included in a tool-call response. With that fix applied, the model can request a frame using the 'get camera frame' tool call and then directly analyze it in the same LLM context. Without that fix, I needed to provide a tool that grabbed a frame, sent it to a separate LLM context, and then returned the VLM result to the primary context. The fix speeds up the response and allows the image results to be included in the context.
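
To make the single-context flow concrete: once the parser accepts images in tool responses, the 'get camera frame' result can carry the frame inline. A sketch of what such a message might look like in an OpenAI-style chat format (the exact shape depends on the vLLM/Gemma tool parser, so treat the field names as assumptions):

```python
import base64

def camera_frame_tool_result(jpeg_bytes, tool_call_id):
    """Build a hypothetical tool-response message that embeds a camera
    frame as a base64 data URL, so the same LLM context can analyze it
    directly (illustrative shape, not Elsabot's actual message format)."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
            },
        ],
    }
```

The point is that the frame lands in the primary context as a normal tool result, avoiding the round trip through a second VLM-only context.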

Based on a chat with Gemini on how best to use and integrate Gemma for some of the use cases I have in mind (hide and seek, laser tag), I learned there are additional ways to improve the responsiveness of the VLM (and manage context usage) by more carefully managing the size of the images passed to the model. Still lots to learn and test.

Regards,
Scott