Alfie build thread


Alan Timm

Apr 15, 2025, 11:38:01 PM
to RSSC-List
Hey there!

I'm getting closer to (re)assembling Alfie.  The 12V 20A buck converter is working well, although I think it's time to shorten a whole bunch of cables so that everything fits (it doesn't quite yet).

Also I've fallen into a bit of a rabbit hole with respect to on-board processing.  I rage-quit my Indiedroid Nova SBC and have moved on to the Radxa Rock 5C with 16GB of RAM.

There are some compelling options for on-device speech synthesis, speech recognition, and even large/small language models?!  It's crazy that you can run these on a Raspberry Pi sized device.
I think the Qwen models are capable of tool use.  You can run several combinations of these on an 8GB RAM SBC, and the whole stack with room to spare on a 16GB device.

Here's a sample of libretts_r_medium voice 4 (there are 903 voices available in total), linked in the message.
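If you want to generate samples yourself, the Piper CLI makes it nearly a one-liner. Here's a rough sketch in Python (the voice file name and text are placeholders; you'd download the .onnx voice file first):

import subprocess

# Sketch only -- point --model at whichever libritts_r medium .onnx voice
# file you downloaded. Text arrives on stdin, and --speaker selects the
# voice index within the multi-speaker model.
subprocess.run(
    ["piper", "--model", "en_US-libritts_r-medium.onnx",
     "--speaker", "4", "--output_file", "assistant.wav"],
    input=b"Hi there, I'm Alfie.",
    check=True,
)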

PXL_20250416_005108390.jpg
assistant.mp3

Gmail

Apr 18, 2025, 10:18:33 PM
to Alan Timm, RSSC-List
Alan,

ChatGPT 4o says,

“I can identify and classify thousands of object types in uploaded photos. Common categories include:

  • People (faces, age/gender estimates, activities)
  • Animals (species, breeds)
  • Plants (trees, flowers, leaves)
  • Food (types of dishes, ingredients)
  • Text (printed/handwritten, languages)
  • Vehicles (cars, planes, bikes)
  • Buildings (types, landmarks)
  • Everyday objects (furniture, tools, electronics)
  • Clothing (styles, colors, accessories)
  • Signs and labels (road signs, logos, warnings)”

Can you recommend a similar (free) on-device image classification model? I mean something more like ChatGPT and less like YOLO. I am OK if it requires a standard or even a gaming laptop with a high-end GPU.


Thomas

-  

Need something prototyped, built or coded? I’ve been building prototypes for companies for 15 years. I am now incorporating generative AI into products.

-

Need a great hardworking engineer? I am currently looking for a new job opportunity in robotics and/ or AI. 

Contact me directly or through LinkedIn:   



Chris Albertson

Apr 18, 2025, 11:21:22 PM
to Gmail, gestalt73, RSSC-list

On Apr 18, 2025, at 7:18 PM, Gmail <thomas...@gmail.com> wrote:


Can you recommend a similar (free) on-device image classification model? I mean something more like ChatGPT and less like YOLO. I am OK if it requires a standard or even a gaming laptop with a high-end GPU.

What you need is not so much a powerful GPU as one with huge amounts of VRAM.  The models that can identify all those things are huge, many billions of parameters, and it really has to be VRAM that the GPU can access.  Even a “small” 20-billion-parameter model will require about 20GB of VRAM (at 8-bit precision; double that at fp16), which is not to be found on a notebook PC GPU.  Possibly an Apple Mac could work because of its unified memory model.

But if you are after efficiency, a YOLO-like model trained on the images you care about is best, as it can run on a Raspberry Pi with one of those “AI chips” attached.

OK, but if you want a publicly available open-source LLM… go to Hugging Face and search for one.

Alan Timm

Apr 19, 2025, 1:18:36 PM
to RSSC-List
Hey Thomas,

Good morning!

There are a couple of ways to answer your question, and it all depends on how much iron you're willing to throw at the problem.

My current rabbit hole involves running these models on an RK3588 SBC with 16GB of RAM, so this 3.8B llava-phi3 model caught my eye:

It's generating text responses at about 6 tokens per second, but I haven't tried the image capabilities yet.  It's taking up about 9GB of RAM at the moment.

I'm also looking at this rkllama library, which purports to run models on the RK3588 NPU:

I'm not sure if (or how much) faster that will be than taking up all 8 CPU cores. I'll probably take a closer look this week.

But...  there's probably a near future where I need to add an NVIDIA Jetson board for some more GPU horsepower, in which case you might be looking at the 16GB Orin NX and carrier board:


I'd probably start with that llava-phi3 model and work upwards from there.
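For reference, querying a model like that with an image through Ollama's REST API looks roughly like this; a minimal sketch assuming a local Ollama install (the model tag and image path are placeholders):

import base64
import requests

# Ollama listens on port 11434 by default; images are sent base64-encoded.
img = base64.b64encode(open("homeoffice.jpg", "rb").read()).decode()
r = requests.post("http://localhost:11434/api/generate", json={
    "model": "llava-phi3",
    "prompt": "What's in the image?",
    "images": [img],
    "stream": False,
})
print(r.json()["response"])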

Alan

Alan Timm

Apr 19, 2025, 1:28:41 PM
to RSSC-List
Here's the result of passing in the attached image and asking "What's in the image?" on my Radxa Rock 5C (16GB RAM, 8-core SBC @ 1.8GHz).
The round-trip time was almost 2 minutes.  So not fast, but maybe useful?

>>> what is in /home/alfie/Pictures/homeoffice.jpg
Added image '/home/alfie/Pictures/homeoffice.jpg'
The image shows an old school desktop computer setup with a yellow plastic chair in front of it. The laptop
screen displays "03:58" and the mouse is black. There are two mugs next to the keyboard - one is green and
the other is white. On the desk, there is also a potted plant with green leaves.

total duration:       1m57.419420595s
load duration:        4.535755612s
prompt eval count:    716 token(s)
prompt eval duration: 1m38.395394584s
prompt eval rate:     7.28 tokens/s
eval count:           73 token(s)
eval duration:        14.425655452s
eval rate:            5.06 tokens/s
homeoffice.jpg

Chris Albertson

Apr 19, 2025, 2:55:05 PM
to gestalt73, RSSC-list
Here is the problem, or really the choice you have.

(1) You can use LLM-based technology and, after two minutes, get a written paragraph that nicely describes the image.  You would then have to process the paragraph to extract information.  This is good because the model can accept just about anything you show it. Or…

(2) You can run a version of YOLO, and it will return a list of objects with bounding-box coordinates, but it will only see objects it has been trained to see.  It runs on modest hardware, though: I was able to get 30 frames per second on a Linux PC, meaning YOLO could process live video in real time (my test data was a Hollywood action film).  The objects and boxes were stored in a database-like list that could be queried.
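To make that concrete, getting the query-able list of boxes is only a few lines with the ultralytics package; a minimal sketch (model checkpoint and image name are placeholders):

from ultralytics import YOLO

# yolov8n is the smallest pretrained checkpoint; it downloads on first use.
model = YOLO("yolov8n.pt")
for result in model("frame.jpg"):
    for box in result.boxes:
        # class name plus bounding box as (x1, y1, x2, y2) pixel coordinates
        print(result.names[int(box.cls)], box.xyxy[0].tolist())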

I think what you do depends on what the task is.  A navigation task needs the (x, y) coordinates of each object and can’t wait 2 minutes.  By “navigation” I mean not just rolling on wheels but also an arm grasping an object.

But perhaps the robot’s job is to answer questions like “Robbie, did UPS deliver my package? Is it on the porch?”  Then the LLM would be ideal.  But to open the door and pick up that box, you need more classic vision (photogrammetry), not AI.

It is interesting to see how Tesla handles this.  The cameras run at about 30 FPS, and the data is sent to about 5 different models, each run independently, in parallel.  Each model turns the image frames into data.  This may be the solution for robots: don’t choose.  The correct answer is “all of the above”.


Jim DiNunzio

Apr 19, 2025, 3:04:12 PM
to Alan Timm, RSSC-List
I got a nice Easter present after a 4-month wait.  I definitely want to try out a vision model like that on this, running 67 TOPS at a max of 25 watts. After that, I'll figure out a robot to wield it.
Jim

20250419_113901.jpg

Alan Timm

Apr 19, 2025, 9:02:40 PM
to RSSC-List
Hey Chris & Thomas,
   Yep, it all depends on what problem(s) you're trying to solve, how fast you need the feedback, and ultimately where you want the processing to occur.  Usually you optimize for two and put up with whatever's left for the third.

For Alfie, I'll host a handful of these SLMs on the SBC for a bit to see if there's any practical use for them.  So far Piper TTS is faster than real time, with < 1 second latency to first utterance.  I'll check out faster-whisper next.
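For anyone following along, the faster-whisper API is only a few lines; a minimal sketch (model size and file name are placeholders, and int8 keeps it CPU-friendly on an SBC):

from faster_whisper import WhisperModel

model = WhisperModel("tiny.en", device="cpu", compute_type="int8")
segments, info = model.transcribe("utterance.wav")
print(" ".join(segment.text for segment in segments))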

Hey Jim,
   Ohmygosh, you got one?!?!?  Nice!  There's a software unlock to update all the Jetson Nano and Orin boards to "Super" status, with a corresponding increase in power use and TOPS.

For Alfie, after I complete systems integration and get the ROS scaffolding up, it'll be time for "operation: hi five!" to train a neural model for him that gives high fives whenever someone holds up their hand the right way.  That'll tell me a lot more about what type of processing power I need to have on board, and I have the Orin and carrier board on a wishlist.  It'll connect to the Rock 5C over 1Gb Ethernet and will be nestled in the base under the battery.

Alan

Gmail

Apr 21, 2025, 4:07:27 PM
to Alan Timm, RSSC-List
Hey Alan,

Thanks for this and your other replies. When I get a few minutes (Hours? Days?) I will attempt to download, install, configure, and try out that model sometime later this week. 

Did you say that you found it took about 2 minutes for analysis of a photo? 

I’m going to be running this on my gaming laptop with its 4070 gpu. 

  • Intel® Core™ i9-14900HX Processor
  • NVIDIA® GeForce RTX™ 4070 Laptop GPU, 8GB GDDR6
  • 32GB DDR5 Memory
  • 1TB NVMe PCIe SSD
  • 16" 16:10 FHD+ (1920x1200), 144Hz
  • Thunderbolt™ 4



Thanks again! Wish me luck 🍀! 


Thomas

Alan Timm

Apr 21, 2025, 8:59:01 PM
to RSSC-List
Running on an RTX 4070?  That 3.8B vision model will run a LOT faster.  It took two minutes on my Raspberry Pi class SBC.  :-)

Alan

Alan Timm

Apr 21, 2025, 9:05:38 PM
to RSSC-List
Alfie can shrug now!

The tales that I could tell (and probably will next month) about what I ran into while getting this to work.

The initialization procedure uses one of the three opto switches from the delta printer to detect the max-down position, then travels 390mm to the top position.

That shrug at the top?  That's a flourish.  Totally unnecessary, but that's why I go up to 390mm and not 400mm.  You have to leave a little bit of room for the occasional shrug.  :-)
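In pseudocode the procedure is something like this (a sketch only; read_opto and move_mm are hypothetical stand-ins for the real opto-switch read and stepper commands):

def read_opto() -> bool:
    ...  # poll the delta printer's opto endstop (hardware-specific)

def move_mm(distance: float) -> None:
    ...  # command the closed-loop stepper over TTL serial (hardware-specific)

def home_lift():
    while not read_opto():  # creep down until the max-down switch trips
        move_mm(-1)
    move_mm(390)            # run up 390 of the 400mm of travel,
                            # keeping 10mm in reserve for the shrug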

screenshot_21042025_180100.jpg

Gmail

Apr 21, 2025, 9:20:19 PM
to Alan Timm, RSSC-List
Well, I don’t know. 🤷🏻 
😆



Thomas

Gmail

Apr 21, 2025, 9:24:49 PM
to Alan Timm, RSSC-List
I hope “a LOT faster” means under 15 seconds. 



Thomas

Alan Timm

Apr 21, 2025, 11:33:46 PM
to RSSC-List
I don't know what the performance difference is between a laptop RTX 4070 and a desktop RTX 3090...

But on my desktop RTX 3090 it was *a bit* faster...
like ~0.5 seconds total.

  ❯❯ /home/alansrobotlab : ollama run llava-phi3 --verbose
>>> what is in /home/alansrobotlab/Pictures/homeoffice.jpg
Added image '/home/alansrobotlab/Pictures/homeoffice.jpg'
1. A wooden desk with a laptop on it, next to two black coffee mugs and a plant. The time displayed on the laptop
screen is 03:58. There is also a yellow plastic chair with wheels tucked underneath the desk.
2. A window in the room that shows a view of trees outside.

total duration:       463.085074ms
load duration:        14.85164ms
prompt eval count:    589 token(s)
prompt eval duration: 23.863499ms
prompt eval rate:     24682.05 tokens/s
eval count:           76 token(s)
eval duration:        417.033728ms
eval rate:            182.24 tokens/s

Gmail

Apr 22, 2025, 12:36:53 AM
to Alan Timm, RSSC-List
Well, for basic world knowledge, < 3 seconds would be fine. For real-time robot navigation (navigating through a home by camera alone is one of my goals/use cases), 0.5 seconds might be a bit too slow.

Assuming 1.5 MPH, a robot would cover about two feet in a second (1.5 × 5280 / 3600 ≈ 2.2 ft/s). I suppose then that the robot would have to stop every so often and check its path. I have been doing a lot of experimentation with uploading videos to ChatGPT 4o using the API. ChatGPT 4o has gotten a lot better over the last few months. They are teasing us about version 5. I can’t wait!

OpenAI has also released (in beta) “live vision” video and audio analysis. I have been using it for the last several months, and I find it to be laggy. It falls behind as much as 30 seconds after only three or four minutes of use. Also, OpenAI limits me to using it for about 15 minutes a day. Still, it’s truly amazing for interactive conversations. All of a sudden, your robot is no longer blind. You want your robot to have conversations similar to the sci-fi robots Robby, C-3PO, Johnny 5, or Rosie? This is your answer!  BUT, I have tried it for robot navigation, and unfortunately I found that this feature is not yet ready for prime time. 😆



Thomas

Alan Timm

May 15, 2025, 11:30:47 PM
to RSSC-List
For those of you who attended last weekend's meeting, you heard Alfie's voice. Using Piper TTS with voice 65 (out of 906) is faster than real time.  He sounds pretty good and natural-ish for on-device generation.

More recently NVIDIA quietly released a new automatic speech recognition (ASR) model called Parakeet v2 0.6B.  It also runs much faster than real time and outperforms Whisper in both speed and accuracy.

The default 16-bit model transcribes speech at twice real time (3.6 seconds for 7 seconds of speech).

There's also an onnx-asr project and a converted ONNX model that transcribes speech at over 4 times real time (1.5 seconds for 7 seconds of speech).
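Usage is pleasantly small, if I'm reading the onnx-asr docs right; a sketch (the model identifier and file name here are assumptions):

import onnx_asr

# Model name and wav path are assumptions -- check the onnx-asr README
# for the exact identifier of the converted Parakeet model.
model = onnx_asr.load_model("nemo-parakeet-tdt-0.6b-v2")
print(model.recognize("seven_seconds_of_speech.wav"))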

I'll still need a speech detector (maybe a wake word detector?) and a diarization model, but I'm amazed at how well all this works on a Raspberry Pi 5 class SBC.

Alan
this_is_alfie.wav

Alan Timm

Jun 5, 2025, 5:32:40 PM
to RSSC-List
I've made a lot of progress with on-device functionality for Alfie.  Here's a quick demo of Silero speech detection and NVIDIA Parakeet ASR on the Radxa Rock 5C.

We'll talk a lot more about it the weekend after next!
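The Silero piece is tiny; a minimal sketch with the silero-vad pip package (the file name is a placeholder -- the real pipeline streams from the mic instead of reading a wav):

from silero_vad import load_silero_vad, read_audio, get_speech_timestamps

model = load_silero_vad()
wav = read_audio("mic_capture.wav")  # expects 16kHz mono
print(get_speech_timestamps(wav, model, return_seconds=True))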

screenshot_05062025_143051.jpg

Alan Timm

Jun 17, 2025, 12:09:21 AM
to RSSC-List
Following Brian's lead, I've started syncing my work to a GitHub repository:

Among other things, that keeps a copy of the code safe in case I do something dumb, which is known to happen.  :-)

Also, I think Jim makes a convincing argument for using a wake word.

Hey Jim, what was that shiny new wake word program you're using?


Alan

Jim DiNunzio

Jun 17, 2025, 2:35:44 AM
to Alan Timm, RSSC-List

Hi Alan,

 

I’m using Porcupine Wake Word by Picovoice. It runs locally on your machine and is free for non-commercial projects. You can create one wake word per month. Sign up, click the non-commercial option, and agree not to aspire to make any money with it (at least while using their tech!).

 

https://picovoice.ai/platform/porcupine/

https://picovoice.ai/docs/quick-start/porcupine-python/

 

You can see my example code utilizing two wake words:

This is a simple test which only requires pvporcupine, pyaudio, and the wake word .ppn file you get from Picovoice:

https://github.com/jimdinunzio/big-orange/blob/Python-3.9/python/tests/test_porcupine_wake_word.py
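For anyone who doesn't want to click through, the core loop boils down to something like this (a sketch without error handling; the AccessKey and keyword file are placeholders you get from the Picovoice console):

import struct
import pvporcupine
import pyaudio

porcupine = pvporcupine.create(access_key="YOUR_ACCESS_KEY",
                               keyword_paths=["hey_robot.ppn"])
pa = pyaudio.PyAudio()
stream = pa.open(rate=porcupine.sample_rate, channels=1,
                 format=pyaudio.paInt16, input=True,
                 frames_per_buffer=porcupine.frame_length)
while True:
    pcm = struct.unpack_from("h" * porcupine.frame_length,
                             stream.read(porcupine.frame_length))
    if porcupine.process(pcm) >= 0:  # returns keyword index, or -1
        print("wake word detected")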

 

As a career software guy, I’m a big fan of GitHub and development records. All Big Orange code (and my other projects’ code) has been on GitHub since 2020.

 

https://github.com/jimdinunzio/big-orange/

 

Jim


Sergei G

Jun 17, 2025, 11:27:41 AM
to Alan Timm, RSSC-List, j...@dinunzio.com
One of the overlooked useful features of GitHub is the ability to create and edit formatted README.md files right from the browser.

You can organize your notes and share your finds with the world (well, with the Club ;-)) - for free. It is probably the most durable/reliable storage of documentation and code at the moment.

Here is my frequently updated collection: https://github.com/slgrobotics/robots_bringup/tree/main


Best Regards,
-- Sergei



Thomas Messerschmidt

Jun 17, 2025, 8:23:33 PM
to j...@dinunzio.com, Alan Timm, RSSC-List
Thanks for sharing the links, Jim.


Thomas



Alan Timm

Jun 23, 2025, 12:08:32 AM
to RSSC-List
Alfie's next upgrade: the Jetson Orin NX 16GB.

It's about the size of two Pis stacked on top of each other and is capable of 100 TOPS with this carrier board.
It'll fit perfectly in the base once I move the buck converter.

Right out of the box it's generating LLM tokens at twice the speed of the Rock 5C with Ollama, which seems... a little slow.
I expected it to be a LOT faster.  18 vs 40 tokens per second isn't bad, but not really impressive for dedicated GPU hardware.

Plus, there was a press release stating that these boards can generate tokens much faster, but they don't say HOW.
I suspect they're using TensorRT-LLM to run the models, so that's what I've been working on this weekend.

screenshot_22062025_205909.jpg

Nathan Lewis

Jun 23, 2025, 9:47:42 AM
to RSSC-list
That's awesome! Does that board have a way to connect to the camera inputs on the Orin module?

- Nathan

Alan Timm

Jun 23, 2025, 10:45:10 PM
to RSSC-List
Hey Nate! 

I took a closer look at the carrier board and the expansion board, and there are no camera inputs.  :-(

That's kind of a bummer that they didn't make the cut.

They very recently released a slightly larger version of the carrier board that supports the full Super MAXN modes.  That one includes 4x CSI camera connectors.

Alan

Alan Timm

Jun 24, 2025, 12:22:10 AM
to RSSC-List
Oof, what an adventure. Here's how to run accelerated LLMs on the Jetson (and also on NVIDIA GPUs).

tl;dr:
mlc_llm chat HF://mlc-ai/Qwen3-1.7B-q4f16_1-MLC --device cuda
or...
mlc_llm serve HF://mlc-ai/Qwen3-1.7B-q4f16_1-MLC --device cuda

Firstly, in order to maintain sanity with the fast pace of change in all the AI stuff, there's a meta-package called jetson-containers that dockerizes most of the things you'd want to do on the board.  Super handy if you're running Jetson hardware.

Secondly, from their press release I figured out that they're running LLMs under mlc-llm, which compiles language models into something that runs much faster than under Ollama or Hugging Face Transformers.

So here are the final stats for Qwen3-0.6B, 4-bit quantized:
  • Radxa Rock 5C (Ollama):        18 tokens per second
  • Jetson Orin NX 16GB (Ollama):  37 tokens per second
  • Jetson Orin NX 16GB (mlc-llm): 98 tokens per second
That's pretty good.

And here are a few more stats on the Orin for different sizes of the model under MLC:
  • Qwen3-0.6B: 98 tokens per second
  • Qwen3-1.7B: 50 tokens per second
  • Qwen3-4B:   22 tokens per second
  • Qwen3-8B:   13 tokens per second

Alan Timm

Jun 24, 2025, 11:39:15 PM
to RSSC-List
Tonight I benchmarked a handful of Qwen3 models on my RTX 3090 and on my Jetson using Ollama and MLC in the background while working on other things.

I'd say that the performance improvement makes moving your SLMs to mlc_llm worthwhile.
I didn't expect there to be diminishing returns on larger models vs smaller models.  That's interesting.
The current gameplan is to host the model using mlc_llm serve, then interact with it using langgraph etc.
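Since mlc_llm serve speaks the OpenAI-style REST protocol, the client side should be as simple as this sketch (port 8000 is the default if I recall correctly, and the model string is whatever you passed to serve):

import requests

r = requests.post("http://127.0.0.1:8000/v1/chat/completions", json={
    "model": "HF://mlc-ai/Qwen3-1.7B-q4f16_1-MLC",
    "messages": [{"role": "user",
                  "content": "write a haiku about my third favorite mini slinkie"}],
})
print(r.json()["choices"][0]["message"]["content"])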


Proompt:   write a haiku about my third favorite mini slinkie

(Average of 3 runs)
(ollama models are unsloth Q4_0 quantized)
(mlc models are q4f16_1 quantized)

screenshot_24062025_203032.jpg

screenshot_24062025_202959.jpg

Alan Timm

Jul 5, 2025, 8:29:23 PM
to RSSC-List
OK, so this weekend is the weekend I integrate the new Jetson Orin NX into Alfie.  (I was getting frustrated bouncing back and forth between the two systems.)

Here's a quick shot of just how small the Jetson modules are (they're only 70mm wide), and another shot of the naked module with the carrier board.  It fits exactly where the previous stack of the Radxa Rock 5C and the 12V buck converter was.
Now I just need to design a new cubbyhole for the new 30A buck converter.

There's just enough IO for everything except for one thing -- there's no GPIO on the board.  Luckily I have a few spare GPIOs on the Waveshare driver board, so I'll move the shoulder limit switch over to that.
  • USB-A - OAK-D Lite depth camera
  • USB-A - Waveshare board 1 comms
  • USB-C - Seeed Studio mic array
  • USB 2.0 header - Waveshare board 2 comms
  • Debug serial - host communications over USB pigtail
  • Serial header - closed-loop stepper TTL serial comms at 115,200 baud

On the LLM front... after spending an inordinate amount of time optimizing the Qwen3 0.6B model for speed, I remembered one of the first things that Seth said and...  the 0.6B model isn't very useful. So I've moved on to the Qwen3 1.7B model and am getting ~50 tokens per second with it.

screenshot_05072025_171323.jpg

screenshot_05072025_171357.jpg

Alan Timm

Oct 10, 2025, 3:48:05 PM
to RSSC-List
Oof, it's been a minute, hasn't it?
With Dani's help, Alfie has been reassembled, and he's been online consistently for the past week or two.

Here's a look at the new electronics bay with that Jetson stuffed in.  It's a tight fit, but it works.
screenshot_10102025_124202.jpg

I also reverse engineered the bottom plate and added in vents to help keep everything cool.
screenshot_10102025_124535.jpg

screenshot_10102025_124349.jpg

Alan Timm

Oct 12, 2025, 4:04:50 PM
to RSSC-List
Hey guys, quick update since I wasn't able to stick around for show'n'tell this time.

The code for the Waveshare general driver boards is about 80% complete.
VSCode + PlatformIO + FreeRTOS + micro-ROS is kinda awesome once you get the hang of it.

At this point I have the boards:
  • generating full 100Hz status packets, including diagnostic info
  • capturing critical serial bus servo stats for up to 10 servos
  • passing along 9-axis IMU info
  • accepting 100Hz command packets for the same
I just posted an update to the repo with all of the changes I've been working on.
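If you want to sanity-check that 100Hz stream from the ROS 2 side, a small rclpy counter does it; a sketch (topic name and message type are placeholders for the real driver-board ones):

import rclpy
from rclpy.node import Node
from std_msgs.msg import String  # stand-in; the real status message differs

class RateCheck(Node):
    def __init__(self):
        super().__init__("rate_check")
        self.count = 0
        self.create_subscription(String, "/driver/status", self.on_msg, 100)
        self.create_timer(1.0, self.report)

    def on_msg(self, msg):
        self.count += 1

    def report(self):
        self.get_logger().info(f"{self.count} msgs/sec")
        self.count = 0

rclpy.init()
rclpy.spin(RateCheck())

(ros2 topic hz does the same job from the command line, but this is handy inside a launch stack.)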

And I know I'm repeating myself but "GET YOURSELF SOME GITHUB COPILOT!"
Even the free plan is incredibly useful.

I've been partnering with Copilot on everything from code refactors to understanding why my FreeRTOS + micro-ROS solution wasn't generating update messages at 100Hz like I thought it should.
It's like having an infinitely patient subject matter expert looking over your shoulder to jump in and offer advice, explanations, and help when you need it.

Alan

screenshot_12102025_125703.jpg