Re: [HBRobotics] python cuda

Chris Albertson

May 25, 2025, 7:44:10 PM
to hbrob...@googlegroups.com
I doubt many people directly use CUDA from Python. The way we'd do it is to use (say) NumPy to do the math, and then NumPy uses the fastest way to do the work on that specific computer. In my case on a Mac it might use Apple's "Metal" API; on a PC with Nvidia you would use CUDA if present, or just the Intel vectorized instructions if there were no graphics card.

I think in Python you really need an intermediate layer between you and the hardware. For example, if I need to multiply five numbers by three, a loop is dead-dog slow, but this is very fast:

import numpy as np

A = np.array([1.23, 2.34, 3.45, 4.56, 5.67])
B = 3.0 * A

Did I just use CUDA on my NVIDIA GPU? Who knows? I can assume it was done in the most efficient way on whatever computer it runs on, and I know 100% that I did not run a for-loop inside an interpreter. This is a trivial example, but imagine something more complex with 10,000 data points.
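To make the point concrete, here is a minimal sketch (plain NumPy, nothing exotic) that does the same multiply both ways on 10,000 points and checks they agree; whatever backend NumPy uses, the vectorized form never runs the loop in the interpreter:

```python
import time
import numpy as np

N = 10_000
A = np.random.rand(N)

t0 = time.perf_counter()
B_loop = np.array([3.0 * x for x in A])   # element-by-element in the interpreter
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
B_vec = 3.0 * A                           # one vectorized op in compiled code
t_vec = time.perf_counter() - t0

assert np.allclose(B_loop, B_vec)         # same numbers, very different speed
print(f"loop: {t_loop:.6f}s  vectorized: {t_vec:.6f}s")
```

On a typical machine the vectorized line wins by a wide margin, and the gap grows with the array size.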




On May 25, 2025, at 1:00 PM, A J <aj48...@gmail.com> wrote:

Good news for the Bot builders that want to use Python: Nvidia now supports CUDA in Python. The CuTile model will be more array-oriented than the thread model in C/C++.


--
You received this message because you are subscribed to the Google Groups "HomeBrew Robotics Club" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hbrobotics+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/hbrobotics/5f68e9df-ccc1-4c4e-8cd8-31d13c3a9b4cn%40googlegroups.com.


A J

May 26, 2025, 12:47:59 AM
to HomeBrew Robotics Club
Hopefully this will bring the power of C/C++ CUDA closer to the Python user. The library is not due out until later this year.
In the past, people could use PyTorch or CuPy to get GPU acceleration for NumPy-like code.
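For illustration, a hedged sketch of that drop-in style: CuPy mirrors the NumPy API closely enough that the same expression can target the GPU or the CPU depending on which module is importable (this sketch falls back to NumPy when CuPy or a CUDA device isn't available):

```python
try:
    import cupy as xp    # GPU arrays, if CuPy and a CUDA device are available
except ImportError:
    import numpy as xp   # same API on the CPU

A = xp.arange(1_000_000, dtype=xp.float32)
B = 3.0 * A + 1.0        # one vectorized expression either way
print(float(B[2]))       # -> 7.0
```

The nice part of the design is that the calling code doesn't change; only the import decides where the work runs.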


Marco Walther

May 26, 2025, 3:46:37 AM
to hbrob...@googlegroups.com, A J
On 5/25/25 21:47, A J wrote:
> Hopefully this will bring the power of C\C++ Cuda closer to the Python
> user. The library is not due out until later this year.
> In the past, people could use PyTorch or CuPy to get GPU acceleration
> for NumPy like code.

NumPy is already C/C++ under the covers. And TensorFlow & Keras and many
others are essentially just Python-interfaced C/C++/CUDA/... libraries.
It depends somewhat on where you get your build from, but the source code
has supported CUDA for many years.

-- Marco

>
> https://hwbusters.com/news/a-new-era-for-gpu-programming-nvidia-finally-adds-native-python-support-to-cuda-millions-of-users-incoming/
>
> https://developer.nvidia.com/blog/introducing-tile-based-programming-in-warp-1-5-0/
>
> https://thenewstack.io/nvidia-finally-adds-native-python-support-to-cuda/

Steve " 'dillo" Okay

May 27, 2025, 10:06:12 AM
to HomeBrew Robotics Club

There's also Numba, which I read about recently (because a project wanted it as a dependency); it's basically JIT-compiled Python for math ops, in the same spirit as NumPy.
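A minimal sketch of the Numba style, hedged with a fallback so it still runs where the numba package isn't installed (the @njit decorator compiles the function, loop and all, to machine code on first call):

```python
import numpy as np

try:
    from numba import njit          # assumes the numba package is present
except ImportError:
    def njit(f):                    # no-op stand-in so the sketch still runs
        return f

@njit
def add_one(a):
    out = np.empty_like(a)
    for i in range(a.shape[0]):     # with numba, this loop is compiled, not interpreted
        out[i] = a[i] + 1.0
    return out

print(add_one(np.array([1.0, 2.0, 3.0])))   # -> [2. 3. 4.]
```

The interesting design point is that the explicit loop, normally the slow path in CPython, becomes fine once Numba compiles it.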

It's all good though.

With LLMs and other "code-spinning engines" these days able to push out the actual characters,
I think the focus has shifted to finding a point where engineers/builders/creators/$PEOPLE can
meet the tools they can use right away. There's always going to be a need for people who
understand what's going on under the hood and are able to dissect and fix, so they'll use one set of
tools and others will use what suits them. Which is fine, especially when you consider the number of
these packages that boldly declare themselves to be "...a wrapper for $PACKAGE".

'dillo

Chris Albertson

May 27, 2025, 12:15:13 PM
to hbrob...@googlegroups.com
Yes, Numba is a Python compiler that converts Python to machine code.  It's for those who think interpreted Python is too slow.  But you would still use NumPy just like you would if using the normal CPython.

MicroPython does something like this too, but rather than using a different Python, you put decorators above a function and it will be compiled to machine code.
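For reference, the MicroPython version looks roughly like this; it runs on a MicroPython board, not under CPython, and is sketched from memory of the @micropython.native code-emitter docs:

```python
import micropython

@micropython.native        # emit native machine code for this function
def scale(buf, factor):
    for i in range(len(buf)):
        buf[i] = buf[i] * factor
```

There is also @micropython.viper for an even more restricted, faster dialect.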

I think the way to work is just to write the code and get it to work, then later, if needed, worry about getting it to run faster. Surprisingly, with simple robots, the problem is almost never that the controller runs too slow. And if that does happen, it tends to be only some tiny part that is too slow.




A J

May 29, 2025, 10:36:33 PM
to HomeBrew Robotics Club
There is a lot to be said for NumPy vectorization. I just read a blog about CuPy (the NumPy for GPUs). For an array of 300 million elements, doing a simple += using a loop on the GPU was about 30x slower than NumPy vectorized on the CPU. But NumPy with a loop was about 180x slower than the vectorized version.
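The loop-vs-vectorized part of that result is easy to reproduce at a smaller size with plain NumPy (the 30x/180x figures above are from the blog as reported; this sketch only demonstrates the shape of the comparison):

```python
import numpy as np

a = np.ones(100_000, dtype=np.float32)

b = a.copy()
for i in range(b.shape[0]):   # the slow version: one interpreted += per element
    b[i] += 1.0

c = a + 1.0                   # the vectorized version: one compiled operation

assert np.allclose(b, c)      # identical results, very different speed
```

Timing the two halves with time.perf_counter shows the same orders-of-magnitude gap the blog describes, shrinking or growing with the array size.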

Chris Albertson

May 30, 2025, 12:16:43 AM
to hbrob...@googlegroups.com


Yes. I see so many C++ programmers doing things like this in Python:

A = np.array([1, 2, 3, 4, 5])
B = np.empty_like(A)
for i in range(len(A)):
    B[i] = A[i] + 1

When of course, as you say, it is about 100X faster to write

B = A + 1

Then they complain that Python is slow.

We are moving to a kind of programming where we describe what needs to be done, not how to do it. AI will take this a step further.

Charles de Montaigu

Jun 7, 2025, 7:46:19 AM
to hbrob...@googlegroups.com
.... a new self-configuring AI architecture (Liquid AI?) and quantum AI are what come next!



On Mon, May 26, 2025, 02:47, A J <aj48...@gmail.com> wrote:
Hi Chris,

   I think that you are right that most people who are using Python will not want to write lower-level CUDA code inline.
CuPy has drop-in replacements for many NumPy and SciPy functions for GPU acceleration. I think MKL support was
added so multi-core math is available.

   Intel, AMD, and Apple all have CPUs with integrated GPUs, NPUs, and vector instructions, and they all support Python.
If they make it easier to port ROS code to multi-core/multi-accelerator hardware, it would be good for business.

   The 2 nm and under chips will start shipping next year, so many of these laptops or phones could power bots.
   Some of the Nvidia charts seem to show very fast results for quadruped control.

   But remember, many new MS or MA majors are exposed to some AI programming in school, so working with Python
should not be that hard.

   I saw one small cloud company offering Blackwell service at under $5 per hour. One Google executive said that
25% of their code is generated by AI.

   The real question is what will come next by 2030?





On Sunday, May 25, 2025 at 4:44:10 PM UTC-7 Chris Albertson wrote: