4 bit ai


A J

May 6, 2024, 6:16:55 PM
to HomeBrew Robotics Club
Hi All,

One of the rumors is that the new GPU will support a 4-bit instruction set.

Does anybody know if the robot boards or USB AI sticks support 4-bit models?

Chris Albertson

May 7, 2024, 2:30:34 PM
to hbrob...@googlegroups.com
It hardly matters what the “USB AI sticks” support because they can only run very small networks.  They have a small amount of memory, and swapping over the USB interface is dead slow.  They are best used for image classification of low-resolution images, and they are very good at recognizing humans and other objects.  My Coral USB dongle only does 8-bit integers.

I just read the link.  I found this: “up to 36GB of memory in the RTX 5090.”  4-bit would be a specialist feature, not usable except in some special cases.  But 36GB would enable running a good set of models locally.  This would be the GPU to get for AI projects.

What remains to be seen is whether the 5090 installed in a PC is cost-competitive with Apple Silicon.  The PC needs a 1000-watt power supply and some way to cool 1000 watts of heat.  The 5090 will sell for maybe $2,000, and you’d need another $2K for the PC, so let’s say a $4K upfront cost.  Then, at the cost of power in California, that 1000-watt power supply can add $300 per month to your electric bill.  That is $3,600 per year.

The Apple Mac Studio is looking good by comparison, especially if you are paying 20 cents per kWh for power.


--
You received this message because you are subscribed to the Google Groups "HomeBrew Robotics Club" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hbrobotics+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hbrobotics/8a36d910-7a48-4f2b-8d14-c12b39f0bec8n%40googlegroups.com.

Stephen Williams

May 7, 2024, 5:52:48 PM
to hbrob...@googlegroups.com

The 4-bit modes are used to hold gigantic models in much less memory.  The lower the number of bits per 'float', the less fidelity, but sometimes it seems to work close to the same.  Maybe some of these models use different numbers of bits for different parts of the model?  (Want to dig into this soon...)
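To make the memory arithmetic concrete, here is a rough sketch (plain NumPy, not any particular inference framework) of the simplest form of 4-bit quantization: scale the tensor, round into [-8, 7], and pack two values per byte.  The function names and the single per-tensor scale are illustrative assumptions, not anyone's real API.

```python
import numpy as np

def quantize_4bit(weights):
    """Symmetric 4-bit quantization: map floats to integers in [-8, 7],
    then pack two 4-bit values into each byte."""
    scale = np.abs(weights).max() / 7.0           # one scale for the tensor
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    nibbles = (q & 0x0F).astype(np.uint8)         # two's-complement nibbles
    if len(nibbles) % 2:                          # pad to an even count
        nibbles = np.append(nibbles, 0)
    packed = (nibbles[0::2] << 4) | nibbles[1::2]
    return packed, scale

def dequantize_4bit(packed, scale, n):
    hi = (packed >> 4).astype(np.int8)
    lo = (packed & 0x0F).astype(np.int8)
    q = np.empty(len(packed) * 2, dtype=np.int8)
    q[0::2], q[1::2] = hi, lo
    q = np.where(q > 7, q - 16, q)                # undo two's-complement wrap
    return q[:n].astype(np.float32) * scale

w = np.array([0.9, -0.31, 0.07, -0.65], dtype=np.float32)
packed, s = quantize_4bit(w)
print(packed.nbytes, dequantize_4bit(packed, s, len(w)))   # 2 bytes vs 16
```

Going from float32 to packed 4-bit is an 8x reduction; the price is that every recovered weight can be off by up to half a quantization step.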

The GPUs use very little power when not active.  My liquid-cooled RTX 4090 very clearly powers up the cooler when stressed to various levels, then is silent otherwise - a pretty good measure of power usage.

Pretty sure we're going to learn how to compress networks down to be much more compact soon.  Of course we'll use that to create effectively even larger networks.  But we should get some very useful aspects to fit nice size / power envelopes.  When we settle on specific network models, and even specific weights to some extent, we can head toward the ASIC and/or custom processor architectures to get power down to almost nothing.  This is what the Movidius chips did for camera image processing: gigantic bandwidth, good per-pixel processing, on milliwatts.

In the mean time, a heavy-duty PC is going to be expensive & power hungry...

Seems like the main players are: NVidia, Apple, Qualcomm, Intel, and AMD.  Qualcomm is probably going to keep the lead for low-power high function mobile, with NVidia & Apple fighting out the very high end + low power, i.e. Low-SWaP.  Apple is pretty good on mobile, but not on broadly usable mobile: NVidia & Qualcomm have that end locked down.

Streaming pose + camera views + sensing to a cloud brain to get control inputs is viable for a certain range of things.  Having a local control loop that can sample + control in less than the 20-200ms range of wireless network round-trips, while still being guided by the cloud brain, might be the best combination.

sdw

Stephen Williams

May 7, 2024, 6:26:14 PM
to hbrob...@googlegroups.com

The new iPad Pro with the M4 chip can provide 38 trillion operations per second at low power, half the wattage of the previous version.

It supports Thunderbolt 4 (and USB4) which provides 32Gbps of PCIe.  Theoretically, you could connect Jetsons, Snapdragons, and even some microcontrollers to one or more iPad Pros (etc.) over PCIe at up to 32Gbps at low power.  And have some nice displays, removable 'brain'.

What do you know (err, wa-d'ya-know), there is already an open source board to do Thunderbolt<->PCIe:  (And, separately, 10GigE interfacing on the Jetsons via M2.)

https://antmicro.com/blog/2023/08/high-speed-thunderbolt-and-10gbe-connectivity-for-jetson-based-vr-smart-cameras-and-industrial-devices/

https://github.com/antmicro/thunderbolt-pcie-adapter

sdw

--

Stephen D. Williams
Founder: VolksDroid, Blue Scholar Foundation

A J

May 10, 2024, 4:10:21 PM
to HomeBrew Robotics Club
Thanks Chris & Stephen for your informative replies.

With all the new products coming out it is tempting to upgrade the system.

Regards,

-andy

Chris Albertson

May 10, 2024, 4:31:10 PM
to hbrob...@googlegroups.com
Why bother to connect the Jetson to the phone?   Run the app on the phone.    But the phone has seriously limited RAM and costs too much.

The M2 chip in a Mac mini runs at half that: 17 TOPS in 8-bit and in 16-bit.  Pretty soon the M4 will be in the Mini and the performance will double.  Why bother with a Jetson or Snapdragon?  The M2 is faster, and the M4 will be faster still.
The Mini is easy to program (it is basically a UNIX PC) and sells for half the price of the phone.

The Mini also has much more RAM.  The phone can’t run large models because it has about 4GB of RAM.  The Mini can have as much as you can afford to pay for, much more than an Nvidia card, so running a 24GB model on a Mini is not a problem.

Pytorch and Tensorflow both run on the M2/M3 and (I assume) the M4.    

So far I’m seeing speech-to-text with translation running about 20X faster than real time on an M2 that is also running a web browser and 3D CAD.  Running the LLM will have to wait a couple of weeks.  My M2 is “last generation”; the M3 is current and the M4 was just announced.  The M2 is good enough.  It’s small too, about 8” square and 2” tall.






Stephen Williams

May 10, 2024, 9:43:13 PM
to hbrob...@googlegroups.com

A Mac mini is nice, less expensive because it doesn't have a display, battery, or mobile efficiency.  But on-robot compute could benefit from those, if it is worth including an expensive brain.

The new iPad Pro M4 tablets have up to 16GB LPDDR5X RAM.
https://www.trendforce.com/news/2024/05/08/news-apple-unveiled-m4-chip-for-ai-heralding-a-new-era-of-ai-pc/

I'm not clear how much of a pain it would be to write an app to make use of the GPU, neural chip, etc.  A Mini is going to be a lot easier to develop for.  Probably best to develop for the Mini, then deploy somewhat widely on the iPad etc.

I'm a lot more comfortable developing for Android, so any Android device is good there.  Especially if you can root it, but that isn't so required anymore.

The Samsung S24 Ultra has 12GB of RAM.  In a few years, refurb S24s are going to make great robot computers.  I have one old Samsung for just that reason.

sdw

Chris Albertson

May 11, 2024, 1:08:11 AM
to hbrob...@googlegroups.com

On May 10, 2024, at 6:43 PM, 'Stephen Williams' via HomeBrew Robotics Club <hbrob...@googlegroups.com> wrote:

A Mac mini is nice, less expensive because it doesn't have a display, battery, or mobile efficiency.  But on-robot compute could benefit from those, if it is worth including an expensive brain.

I like to do software-first.  By the time it works, there will be better hardware to run it.  The key is to code to an interface that will last; I think either Pytorch or TensorFlow will be around for a while.

I’ve also given up on placing big computers inside robots.  It gets expensive because the robot needs to be very large.  WiFi can now do >1Gb/s and can be faster than normal Ethernet.  And WiFi 7 at 5 Gbps is here now.



The new iPad Pro M4 tablets have up to 16GB LPDDR5X RAM.
https://www.trendforce.com/news/2024/05/08/news-apple-unveiled-m4-chip-for-ai-heralding-a-new-era-of-ai-pc/

I'm not clear how much of a pain it would be to write an app to make use of the GPU, neural chip, etc.  A Mini is going to be a lot easier to develop for.  Probably best to develop for the Mini, then deploy somewhat widely on the iPad etc.

Just write to the Pytorch interface.  The same code will run on the CPU, GPU, or whatever is faster.  The check is done at run time, so the same binary runs on different hardware.  In the Apple world, Macs can run iPhone apps; the same binary will run on the Mac or iPad if the needed hardware is present.  (Obviously the Mac has no IMU, GPS, camera, or touch screen, but some apps will run on either platform.)

But the phones are all low-RAM.  Now that I have Whisper running, I’m looking at Apple’s openELM, as they claim it beats “Llama” (which is basically GPT-2) and will run on an iPhone or Mac (or anything else).

I’m looking at the openXLM source code and it appears to run on Linux/Nvidia or an iPhone or Mac.  From comments in the code, it seems they are still training on Linux-on-Intel.




I'm a lot more comfortable developing for Android, so any Android device is good there.


Don’t worry about it.  Code at a higher level, then compile for whatever hardware is best when the time comes.  There will be different hardware by then.


A J

May 11, 2024, 11:17:08 PM
to HomeBrew Robotics Club
The recent CPU & GPU chips are very fast, but software and AI might run faster if the chips were 10-15x bigger. It seems like it would take a lot of software to make a quadruped act like a dog, or a biped work seamlessly in a human environment.

Chris Albertson

May 12, 2024, 12:03:16 AM
to hbrob...@googlegroups.com

On May 11, 2024, at 8:17 PM, A J <aj48...@gmail.com> wrote:

The recent CPU & GPU chips are very fast but software and AI might run faster if the

chips were 10 - 15x bigger.

It’s only money.  If your computer is not fast enough you can buy 10 or 15 computers.  The typical data center server has about four H100 cards; each card is about $30K.  Servers tend to cost $200K each by the time you add racks, power, cooling, and so on.  The bigger companies have many thousands of these servers.


If all that were needed was more computing power, we would be there already.


What is needed is not 100X more of the same but a fundamental breakthrough: some smart person to invent the next step in AI.  LLMs were a big step forward, but they will NEVER be able to think and learn while they operate.  All the “smarts” happen in training; then they are static, with weights and biases fixed.

But in the meantime, we can build some interesting machines with what we have.   



 

Stephen Williams

May 12, 2024, 3:09:55 PM
to hbrob...@googlegroups.com
On 5/10/24 10:07 PM, Chris Albertson wrote:
I’m looking at the openXLM source code and it appears to run on Linux/Nvidia or an iPhone or Mac.  From comments in the code, it seems they are still training on Linux-on-Intel.


openXLM?

Chris Albertson

May 12, 2024, 4:23:42 PM
to hbrob...@googlegroups.com
Sorry, a typo: openELM.  (ELM = Efficient Language Model.)

This is Apple’s latest; their goal was not to be smarter than GPT-4 but to be MUCH faster and run locally on an iPhone or Mac.  It can also run on an Intel PC, but you’d want to have a mid-range Nvidia GPU card.

Apple claims that openELM is comparable to Meta’s Llama 2 but uses less than half as many parameters and took much less time to train.  If this is true, I’ll likely end up running openELM.  But right now my goal is to set up a server that runs OpenAI’s API.  The server will be able to use different LLMs.  To make my job easier I’m initially going to use Llama 2, because everything around it is more mature and seems better documented.  I can switch later.

Running an OpenAI API server locally will be a resource for any future robot, and I like the idea of a stable API.  It’s not going to be finished today; there is much reading, and I have to track down many dependencies and make each of them work.



A J

May 20, 2024, 1:05:30 PM
to HomeBrew Robotics Club
The compute power for robots seems to have a pretty good path for the next decade. If industry could support quality open-source collaborations across hardware platforms it would help. With so many programmers in the world, what dev model would work best?


Chris Albertson

May 20, 2024, 2:41:24 PM
to hbrob...@googlegroups.com

On May 20, 2024, at 10:05 AM, A J <aj48...@gmail.com> wrote:

The compute power for robots seems to have a pretty good path for the next decade.

If industry could support quality open source collaborations across hardware platforms

it would help. With so many programmers in the world what Dev Model would work best?

What model?  ROS packages.   

If you package your software as a ROS 2 package then others can use it.  But it needs to do a function that is widely needed and used by many different robots.  The packages that are installed by default are good examples: every mobile robot needs localization, and any robot with arms needs trajectory planning (using MoveIt).  Visual odometry is another.

For walking robots, we need what I’d call “balance planning”.  I think not falling down requires some thinking ahead.  Then of course you need a very general-purpose cyclic motion generator that can drive bipeds, quadrupeds, and even octo-spider robots.  The more general-purpose it is, the more widely it would be used.

How to start?  All the existing packages took the same route: one person wrote it, then others saw that it was useful.  Most asked for changes, but a few contributed changes, and it snowballed.  As the package became better it was more widely used and gained more contributions.

The thing that hangs it up is that the user-to-contributor ratio might be 5,000 to 1.  For example, how many people download and run SLAM, and how many contribute improvements to it?  5,000 to 1 might be optimistic.

You get collaboration only after you have 10,000 users; then you have two “collaborators”.  With 60,000 users you can have a core development team.  You will never get the team first; you have to write it yourself first.  OK, the person who starts it could be one, two, or maybe three people, but the number is usually one.

The best example of this is Linux.  Linus wrote an entire UNIX-like kernel himself, then a few people wanted to try it and, as said, one in a few thousand offered to help.  But now, with literally billions of Linux users, there are many thousands of contributors, including corporate contributors.  (And yes, Linux has billions of users.  Even Microsoft and Apple run Linux in their data centers.  Every Android phone, 99% of all WiFi routers and TV set-top boxes, and even every one of those SpaceX Starlink satellites runs Linux.  Tesla cars run Linux, and the list goes on and on.)

But no group came together to write Linux; it takes one person (or a tiny group) to get the ball rolling.  I can’t think of any exceptions to this.

I’ve been thinking of cyclic pattern generators and I have one working, but I don’t like it.  The code does not seem elegant and there is no generalized API.  I think I’ll need to rewrite it a couple more times; you learn something on each round.

My pattern generator is a compromise between laboriously writing Python code to move a foot up, then forward, then down to the ground, with 1,000 variations of that for running, turning, stopping, walking up stairs, and whatever.  The other extreme is to connect motor torque values to the output of a neural network and let the robot flail around for millions of simulated hours.  The problem with that is that every new thing you want to teach it takes a million hours.  Baby horses learn to walk in hours, and baby humans seem to know how to walk as soon as their legs are strong enough.  We never see babies flailing around making random leg motions; they seem to already have the basic idea but are not strong enough and fall.  We are all “hard-wired” to some degree.  Hearts don’t have to learn how to beat, and we don’t have to learn to breathe.  Walking has the same hard-wired system.  A pattern generator implements what is hard-wired in animals: it is both hard-wired and can learn.
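For anyone curious what the simplest form of such a generator looks like, here is a toy sketch under big assumptions: each leg is a phase oscillator with a fixed offset, mapped to a swing/stance foot trajectory.  All the names and numbers are made up for illustration; a real gait engine would add turning, speed control, and sensor feedback.

```python
import math

class PatternGenerator:
    """Minimal cyclic gait generator: each leg is a phase oscillator,
    offset from its neighbors, mapped to a foot trajectory.
    Parameter names are illustrative, not from any real package."""
    def __init__(self, n_legs, freq_hz=1.0):
        # evenly staggered phase offsets give a basic walking sequence
        self.offsets = [i / n_legs for i in range(n_legs)]
        self.freq = freq_hz

    def foot_targets(self, t, stride=0.08, lift=0.03):
        """Return (x, z) foot position for each leg at time t (seconds)."""
        targets = []
        for off in self.offsets:
            phase = (t * self.freq + off) % 1.0
            if phase < 0.5:                      # swing: foot in the air
                p = phase / 0.5
                x = stride * (p - 0.5)
                z = lift * math.sin(math.pi * p)
            else:                                # stance: foot on the ground
                p = (phase - 0.5) / 0.5
                x = stride * (0.5 - p)
                z = 0.0
            targets.append((x, z))
        return targets

gen = PatternGenerator(n_legs=4)
for t in (0.0, 0.25):
    print([(round(x, 3), round(z, 3)) for x, z in gen.foot_targets(t)])
```

The hard-wired part is the cycle itself; a learned layer could then modulate stride, lift, and phase offsets rather than discovering leg motion from scratch.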

I don’t expect anyone to want to help with this even if it would be applicable to many robots.


You also see the problem when the Unitree G1 was announced.  Many people said “I want to buy one”, but as of today I’ve yet to see someone say, “Let’s write a really good kinematics system for the G1”.  It is that 5,000-to-1 ratio again.  We will not hear “let’s write an X system for the G1” until there are about 20,000 G1 robots.









Dan

May 20, 2024, 4:49:54 PM
to hbrob...@googlegroups.com
Using Linux as an example might not be the best way to start an open source project.  Read the Wiki history of Linux: originally Linus created a kernel, not a full OS.  He ported other people's GNU code to get the kernel to do something.  He also kept the original kernel under his own license for a few years.

Additionally, IMHO, a robot OS should not take 30 seconds to boot up.  Imagine a robot pilot that, for any number of reasons, reboots while landing.

Using other people's code breeds lazy programmers.  Sure, open source is free, but in my 45 years as a professional contract programmer I have NEVER seen two systems that work the same.  Ultimately, to serve everyone, more and more functions are added until the project is so bloated that it is unmanageable, inefficient, and slow.

Do what you like but I opt for less code that does what I want and not what 5000 other people think it should do.

The code I write boots in under a second, and has high-speed communication and collaboration with other processors that provide distributed edge inference (without internet connectivity) for their individual tasks.

But by all means...keep drinking the Linux/ROS2 Kool Aid.

camp .

May 20, 2024, 5:11:18 PM
to hbrob...@googlegroups.com
But by all means...keep drinking the Linux/ROS2 Kool Aid.

If you're waiting on me to write a mapping, localization, navigation, or visualization routine, forget about it.  But with ROS I am able to copy and paste my way.  It's not so much laziness as just getting it done, plus interacting with folks who know a lot more than me; speaking a common language.

- Camp



Chris Albertson

May 20, 2024, 6:17:40 PM
to hbrob...@googlegroups.com

On May 20, 2024, at 2:11 PM, camp . <ca...@camppeavy.com> wrote:

But by all means...keep drinking the Linux/ROS2 Kool Aid.

If you're waiting on me to write a mapping, localization, navigation, or visualization routine, forget about it.  But with ROS I am able to copy and paste my way.  It's not so much laziness as just getting it done, plus interacting with folks who know a lot more than me; speaking a common language.

I agree, but it is not just your preferences.  It is the impossibility of doing otherwise.

The problem is the finite lifespan of humans.  Some years ago I wrote a system based on the Postgres DBMS.  Yes, I could have said SQL is bloated and too big and then written my own “better” database software.  But it took a team of people 10 years to write Postgres, and they did not start from zero.  To duplicate their effort would have taken me about 50 years of full-time work.

The other problem is education.   I could write a robot localizer that fused inputs from multiple sensors but I only studied the usual two years of calculus and one semester of linear algebra.  I would need a bit more education to be able to write what I needed myself.   And that assumes I did not forget 80% of what I used to know.

I think if you can say “I wrote it all myself in a few weeks”, it is the same as saying “it is very small and not very complex”.



Dan

May 20, 2024, 7:39:30 PM
to hbrob...@googlegroups.com
There is no denying that ROS has fantastic features used by many people.  It just doesn't fit my requirements.

That's why we have chocolate AND vanilla ice-cream.

Store bought and home brewed.

But on another topic....4 bit ai...
I have heard a lot of buzz lately about custom hardware (ASICs) that are optimized for specific LLMs.

I posit that forgotten technologies from the 70s may hold a key for better normalization in AI.  RADIX-50 was a technique that packed three characters from a 40-character set (space, A-Z, $, ., and the digits) into a single 16-bit memory location, with a little room to spare (40^3 = 64000 < 65536).
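A quick sketch of the RADIX-50 idea (the DEC PDP-11 variant, from memory, so treat the exact character set as approximate):

```python
# RADIX-50 (DEC PDP-11 flavor): a 40-character alphabet, three characters
# per 16-bit word, since 40**3 = 64000 fits in 65536.
CHARSET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ$.%0123456789"

def rad50_encode(s):
    """Pack a string (space-padded to a multiple of 3) into 16-bit words."""
    s = s.upper().ljust(-(-len(s) // 3) * 3)     # pad with spaces
    words = []
    for i in range(0, len(s), 3):
        a, b, c = (CHARSET.index(ch) for ch in s[i:i+3])
        words.append(a * 1600 + b * 40 + c)      # base-40 positional code
    return words

def rad50_decode(words):
    out = []
    for w in words:
        out.append(CHARSET[w // 1600] + CHARSET[(w // 40) % 40] + CHARSET[w % 40])
    return "".join(out)

w = rad50_encode("ABC")
print(w, rad50_decode(w))   # three characters in one 16-bit word
```

The trick is positional arithmetic in base 40 rather than a fixed number of bits per character, which is exactly the "don't round up to a power of two" spirit being discussed.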

In the early 80s variable bit size CPUs were gaining some strength too.

So I wonder how much more efficient a variable-bit-size GPU would be.  We could normalize to 5, 6, or 7 bits, for example, depending on the data.

Just a thought.





Chris Albertson

May 20, 2024, 9:32:53 PM
to hbrob...@googlegroups.com


But on another topic....4 bit ai...
I have heard a lot of buzz lately about custom hardware (ASICs) that are optimized for specific LLMs.


The current devices, and future ones, are pretty much forced to use 4-, 8-, 16-, 32-, or 64-bit words.  Things like 7- or 5-bit words just do not fit.  The reason is that the device needs to be versatile enough to run multiple different models, perhaps even several different models all at the same time.  So they all use a variable word length, chosen from those sizes.

Yes, just what you asked for: variable size.  Well, variable except in the lowest-cost chips, where variability adds complexity and drives the cost up.  Coral's TPU chip is 8-bit only, but it costs about $35 retail and there is no way it could run an actual LLM.  It is best used for smaller neural networks.

It is most efficient if, when you shorten a full word, you then have two.  If you cut it unevenly, you have some odd-sized leftover bits that you might not get to use.  So today, if the model is quantized to 8 bits, we can grab a chunk of VRAM, divide each of the 64-bit words into eight 8-bit words, and have no waste.

The maximum size is best if it is a power of two, so that you can subdivide the word just by adding or ignoring address pins (because the address pins have only 2 states).

Finally, we can choose the bit width for any model.  Let’s say you wanted to run Meta’s “Llama 2” on your own computer.  You would likely test it at different bit widths and select whatever you thought was the best compromise of accuracy, battery life, and speed.  Nvidia GPUs can run all kinds of data types (16-bit integers and floats, 8-bit, 64-bit) and it is cheaper because they can combine or split units, as long as all the valid sizes are powers of two.
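That "test it at different bit widths" loop is easy to sketch.  This toy uses seeded random data in place of real model weights, so the absolute numbers mean nothing; the point is how the mean quantization error falls as the width grows:

```python
import numpy as np

# Toy "pick a bit width, measure the damage" loop.  The tensor here is
# random stand-in data, not weights from any real model.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)

errs = {}
for bits in (4, 8, 16):
    qmax = 2 ** (bits - 1) - 1                  # symmetric signed range
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    errs[bits] = float(np.abs(q * scale - w).mean())
    print(f"{bits:2d}-bit: mean abs error {errs[bits]:.2e}")
```

Each extra bit roughly halves the step size, so the error shrinks accordingly; the compromise is then speed and memory against however much of that error the model can tolerate.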

Dan

May 20, 2024, 11:19:53 PM
to hbrob...@googlegroups.com
I think you missed the concept of what is meant by variable bit size in hardware.

Older, and now newer, technologies are variable at the bit level, not at a power of two.
And please try not to use the ambiguous nomenclature of "word"; not all word lengths are 16 bits, just as not all bytes are 8 bits.  It is better to use something like U16, which basically says unsigned 16 bits.  If you need an example of a byte that is NOT 8 bits and a word that is NOT 16, you can google it.

The reason the variable-bit processors died out in the early 80s was, as you indicated, that it cost more to put in the logic needed to accomplish this style of processing.

But now a new problem has crept up that calls for a resurgence of these technologies: the power usage of large data centers running inference on LLMs, and some not so large.
The bottom line is that power is a function of the number of transistors.  It takes fewer transistors to run a 7-bit machine than an 8-bit one.  Variable-bit memories can transfer on arbitrated variable-width buses.  Example: on a 32-bit-wide bus I can simultaneously move 5 data streams of 7, 8, 9, 5, and 3 bits.
Machine-code instructions are designed to handle smaller data sizes and can also embed multiple data in the instruction.
Smaller models run faster; in this case I can run 5 models of different sizes very efficiently.
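That bus example can be illustrated in software; the field widths below are purely illustrative, chosen because 7+8+9+5+3 = 32, so five streams fill one 32-bit word with nothing wasted:

```python
# Hypothetical illustration of the variable-width-bus idea: five fields
# of 7, 8, 9, 5, and 3 bits share one 32-bit bus word.
WIDTHS = (7, 8, 9, 5, 3)

def pack(values):
    """Pack one sample from each stream into a single 32-bit word."""
    word, shift = 0, 0
    for v, width in zip(values, WIDTHS):
        assert 0 <= v < (1 << width), "value out of range for its field"
        word |= v << shift
        shift += width
    return word

def unpack(word):
    """Recover the five stream samples from one bus word."""
    out = []
    for width in WIDTHS:
        out.append(word & ((1 << width) - 1))
        word >>= width
    return out

streams = [100, 200, 300, 20, 5]
print(hex(pack(streams)), unpack(pack(streams)))
```

Real hardware would do this with wiring rather than shifts, but the accounting is the same: any mix of widths that sums to the bus width moves in one cycle.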

I hope you can see the value of this style of hardware.  Sure, it costs more, but the energy savings over time are much larger.




Stephen Williams

May 21, 2024, 3:31:25 PM
to hbrob...@googlegroups.com

I am also uncomfortable piling on thick, complex, inefficient layers just because it is easy to do so.  It's the general problem I have with a lot of Java stacks & approaches.  ROS2 still has me a bit uncomfortable, although it has a great argument for use in prototyping and trying out drivers for new hardware.  I suspect that for anything that gets past the prototype stage, I'll want to build something lean that is equivalent.  I'm unclear how practical that is in various scenarios, but I know I'll want to.

On the OS, I guess it depends on a lot of factors.  Linux is always going to have the best ease of development, deployment, etc.  Zephyr is probably a good target for embedded compute above extremity controllers.

NVIDIA Jetson boards can be configured to boot in 3 seconds:

https://docs.nvidia.com/jetson/archives/r34.1/DeveloperGuide/text/SD/Kernel/BootTimeOptimzation.html

Depending on details, it should be possible to boot Linux in under a second.  Some Android phones could boot very quickly, although radio startup often takes a bit after that.

There is a major Linux-like, Linux-inheriting embedded OS for more hard-core real-time & fast startup cases: Zephyr.  It is now a Linux Foundation project, and is being supported by more and more hardware manufacturers.  My impression of it is that it borrows as much as possible from Linux to solve hard problems, but is directly focused on being a good RTOS.  Probably the most important thing borrowed is the device tree & device drivers.  I suspect it overlaps the Linux ABI quite a bit.

https://www.zephyrproject.org/

"Zephyr supports more than 600 boards. Search our list for the hardware used in your application. This diversity of supported boards gives developers and product manufacturers multiple options to solve their embedded RTOS challenges with Zephyr. If your board is not supported out of the box, adding support for a new board is simple."

Zephyr has micro-ROS, a port of ROS2:

https://www.zephyrproject.org/micro-ros-a-member-of-the-zephyr-project-and-integrated-into-the-zephyr-build-system-as-a-module/

I happened to find Zephyr when I found civetweb and was integrating it along with many other C/C++ libraries into a CMake build.  I saw a reference to Zephyr in the civetweb CMakeLists.txt.  Zephyr is a whole OS as a single CMake build.  Normally a cross-compiling build I expect.  That's very nice.


I'm comfortable stripping down Linux, or using already stripped-down Linux instances.  I used to regularly reconfigure, rebuild, and modify the Linux kernel.  A number of alternatives to the monolithic kernel have been created, but through a number of clever techniques, the Linux kernel has mostly kept up and been better.  But it does take a certain level of complexity, at least to run fast.  There are some fun projects that run Linux kernels in an emulator on hardware that doesn't have an MMU, using external storage to get enough memory.  Not practical, but neat.  However, with Zephyr, I don't expect to be doing such Linux hacking.  And powerful enough microcontrollers are inexpensive enough that things like uClinux are probably not worth the effort.

https://www.eetimes.com/how-uclinux-provides-mmu-less-processors-with-an-alternative/

https://popovicu.com/posts/789-kb-linux-without-mmu-riscv/

This is an interesting article on related topics: https://jaycarlson.net/embedded-linux/


sdw

Steve " 'dillo" Okay

May 21, 2024, 4:30:03 PM
to HomeBrew Robotics Club
On Monday, May 20, 2024 at 1:49:54 PM UTC-7 Dan wrote:
Using Linux as an example might not be the best way to start an open source project.  Read the Wiki history of Linux: originally Linus created a kernel, not a full OS.  He ported other people's GNU code to get the kernel to do something.  He also kept the original kernel under his own license for a few years.

Additionally, IMHO, a robot OS should not take 30 seconds to boot up.  Imagine a robot pilot that, for any number of reasons, reboots while landing.

I just got back from a week at a client site working on an autonomous vehicle which took up to 5 minutes to start.
The remote start signal was sent from their office in SF to a box w/ a cellular modem in it that tripped a relay to switch the whole thing onto battery-power.
It took a couple minutes for the command to wend its way through their systems, hit the mobile network and ping the box mounted on the back of the car.

30 seconds isn't that long to come up from a dead start.  It's long enough for a user to push the button, turn away to grab a controller or their laptop, etc., and then the system is up.
People start their cars all the time, which happens more or less instantly, and then sit there for the next 5 minutes fiddling with the controls or their phone or something before driving away.

'dillo

Dan

May 21, 2024, 5:04:26 PM
to 'Stephen Williams' via HomeBrew Robotics Club
Thanks Stephen, this is very good information.
One of my gripes with POSIX-style RTOSes in general comes during driver optimization and debugging: the use of IOCTL pointers to driver functions makes it difficult to step through code.
My personal preference is FreeRTOS.  It allows me to write very fast and efficient drivers in C and to easily step through the C and assembly code.  I also NEVER use the compiler optimizer because of the mess it causes with the debugger.

I think there is a micro-ROS port to FreeRTOS.

Lately I have been attending the TinyML group meetups.  Since most of what I work on is battery powered I need to be small and low power. The newer NPUs are bringing useful ML apps to the edge.  If anyone monitoring this thread is using an NPU that they like please start a new thread.
I would be very interested in getting some first hand info.



Stephen Williams

May 21, 2024, 5:34:00 PM
to hbrob...@googlegroups.com

OK for a prototype perhaps, but in general, horrifying.  In just about all circumstances, communications should take at most 3 seconds, and usually <1s.  OS startup should be fast, although to be fully up might take longer to do full system checks, scans, validations.  There are a lot of bad messaging systems out there, often connected to convoluted cloud computing systems.

My 2023 Toyota Highlander Hybrid is ready to shift into reverse in about 1-2 seconds from when I hit the start button.  I know it is ready when it unlocks shifting.  I do all of my fiddling after I've started rolling.  The entertainment system takes a few more seconds to start up.  It isn't really booting of course, just in power-save mode with the system computer always mostly ready.  A lot like a mobile phone.  Robots, generally, will be always-on like a mobile phone, tablet, or computer.  MacBooks are always running, even when mostly sleeping: they wake up periodically in a very low power, low clock rate mode to check email, etc.  (I had a friend who managed the email group at Apple at one time who mentioned that feature.)

sdw


Chris Albertson

May 30, 2024, 2:52:28 PM
to hbrob...@googlegroups.com

On May 21, 2024, at 2:04 PM, 'Dan' via HomeBrew Robotics Club <hbrob...@googlegroups.com> wrote:



I think there is a micro-ROS port to FreeRTOS.


Yes, but it might be moot.  You will find that under the hood FreeRTOS or Arm’s Mbed has been incorporated into the Arduino IDE.  For example, Arduino’s support for the RP2040 has the Mbed RTOS included.  FreeRTOS is also used in some cases.  It is transparent to the user.


MicroPython is that way too; if you use it, you in effect also have an RTOS.  MicroPython has true async I/O and tasking across CPU cores with no GIL.
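For what it's worth, MicroPython's asyncio module (formerly uasyncio) mirrors a subset of CPython's asyncio API, so a cooperative-tasking sketch like this runs on both.  The task contents here are placeholders; on a board, the sleeps would typically be sensor polls or servo updates.

```python
# Two periodic tasks cooperatively interleaved on one core.
# Runs unchanged under CPython asyncio and MicroPython's asyncio subset.
import asyncio

async def blink(name, period, count, log):
    for i in range(count):
        await asyncio.sleep(period)          # yields to the other task
        log.append((name, i))

async def main():
    log = []
    await asyncio.gather(blink("fast", 0.01, 4, log),
                         blink("slow", 0.02, 2, log))
    return log

log = asyncio.run(main())
print(log)
```

The same pattern scales to a robot's sense/plan/act loops, with each loop as a task running at its own period.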