Problem with FFT spectrogram example


Radio Geek

Mar 21, 2021, 9:50:22 AM
to reikna
I am trying to run the FFT spectrogram example from here:


I get the following error:

(base) C:\Users\engel\Documents\python_test>python demo_specgram.py
Traceback (most recent call last):
  File "demo_spegram.py", line 221, in <module>
    specgram_reikna = Spectrogram(
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\core\computation.py", line 206, in compile
    return self._get_plan(
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\core\computation.py", line 192, in _get_plan
    return self._build_plan(plan_factory, thread.device_params, *args)
  File "fft1a.py", line 200, in _build_plan
    plan.computation_call(self._transpose, output, temp)
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\core\computation.py", line 500, in computation_call
    self._append_plan(computation._get_plan(
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\core\computation.py", line 192, in _get_plan
    return self._build_plan(plan_factory, thread.device_params, *args)
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\algorithms\transpose.py", line 169, in _build_plan
    self._add_transpose(plan, device_params,
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\algorithms\transpose.py", line 147, in _add_transpose
    plan.kernel_call(
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\core\computation.py", line 466, in kernel_call
    kernel = self._thread.compile_static(
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\cluda\api.py", line 563, in compile_static
    return StaticKernel(self, template_src, name, global_size,
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\cluda\api.py", line 777, in __init__
    vs = VirtualSizes(
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\cluda\vsize.py", line 322, in __init__
    raise OutOfResourcesError(
reikna.cluda.OutOfResourcesError: Requested local size is greater than the maximum 256

I know my pyopencl installation is OK, because I can run other pyFFT examples. What is wrong? Is the example obsolete? I have tried this with a smaller array size, but got the same result. -regards- Bill

Bogdan Opanchuk

Mar 21, 2021, 8:18:30 PM
to reikna
This is caused by Transpose not being smart enough to select the proper local size. There is a somewhat crude workaround in place, but it seems to have failed, and Transpose tried to execute a kernel with a bigger local size than the device can handle. What device are you using? Could you go to C:\Users\engel\anaconda3\lib\site-packages\reikna\algorithms\transpose.py::Transpose._add_transpose() and after the lines

        if block_width ** 2 > device_params.max_work_group_size:
            # If it is not CPU, current solution may affect performance
            block_width = int(numpy.sqrt(device_params.max_work_group_size))

add

        print(block_width, device_params.max_work_group_size)

and see what it outputs? If max_work_group_size were 256, as the error suggests, it should have settled on block_width=16 and local_size=(16, 16). But something has apparently gone wrong, either with this code or with the OpenCL driver reporting inconsistent values (which happens occasionally).
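
The clamping step quoted above can be tried in isolation. This is only a minimal sketch: `clamp_block_width` is a hypothetical helper name, the starting width of 32 is an assumption for illustration, and `math.isqrt` stands in for the `int(numpy.sqrt(...))` call in the quoted lines so that the snippet has no dependencies.

```python
import math


def clamp_block_width(block_width, max_work_group_size):
    """Shrink a square block so that block_width ** 2 fits in one work group.

    Hypothetical helper mirroring the check quoted from _add_transpose();
    it is not part of reikna.
    """
    if block_width ** 2 > max_work_group_size:
        # Largest integer width whose square does not exceed the limit
        block_width = math.isqrt(max_work_group_size)
    return block_width


# A device limit of 256 forces a 16x16 block
print(clamp_block_width(32, 256))
# A device limit of 1024 leaves a 32x32 block unchanged
print(clamp_block_width(32, 1024))
```

So if the device really reported 256 at this point, the printed block width would be 16, and a value of 32 would mean the limit seen here was at least 1024.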

Radio Geek

Mar 22, 2021, 10:43:45 AM
to reikna
Hello Bogdan - thanks for the response.  I added this line, and it printed:    32  1024

The device I am trying to use is: GeForce GTX 1060 6GB

-regards - Bill

Bogdan Opanchuk

Mar 22, 2021, 4:08:19 PM
to reikna
(I wrote a message and Google Groups seemingly ate it; sorry if both end up reaching you. I have to retype it now.)

This is quite strange: 32 is the correct block size, so the kernel requests a 32x32 local size, which should fit into the 1024 maximum, but at the virtual size selection stage it turns out that the maximum is only 256. Could you do a little more debugging?

1. In the same place in `_add_transpose()`, print `device_params.max_work_item_sizes`
2. In reikna/cluda/vsize.py::VirtualSizes.__init__(), at the start, print `device_params.max_work_group_size`, `device_params.max_work_item_sizes`, `virtual_global_size`, `virtual_local_size` and `max_local_size`
3. In the same function, right before the first occurrence of `raise OutOfResourcesError`, print `virtual_local_size` and `max_work_group_size`

Also, what is that `fft1a.py` in the call stack? Would it be possible to see the full code you're running?

Radio Geek

Mar 22, 2021, 9:48:26 PM
to reikna
I had to modify the print code a bit (different python version?) to -

- in _add_transpose() - print ('device_params.max_work_item_sizes=',device_params.max_work_item_sizes)
- in vsize.py:
 print(' at _init_ of vsize.py: device_params.max_work_group_size, device_params.max_work_item_sizes, virtual_global_size, virtual_local_size and max_local_size')
 print (device_params.max_work_group_size, device_params.max_work_item_sizes, virtual_global_size, virtual_local_size and max_local_size)

and

print ('virtual_local_size and max_work_group_size=',virtual_local_size, max_work_group_size)

Results:

 at _init_ of vsize.py: device_params.max_work_group_size, device_params.max_work_item_sizes, virtual_global_size, virtual_local_size and max_local_size
1024 [1024, 1024, 64] 80640 1024
32 1024
device_params.max_work_item_sizes= [1024, 1024, 64]
 at _init_ of vsize.py: device_params.max_work_group_size, device_params.max_work_item_sizes, virtual_global_size, virtual_local_size and max_local_size
1024 [1024, 1024, 64] (1, 320, 1056) 1024
 at _init_ of vsize.py: device_params.max_work_group_size, device_params.max_work_item_sizes, virtual_global_size, virtual_local_size and max_local_size
1024 [1024, 1024, 64] (1, 320, 1056) 256
virtual_local_size and max_work_group_size= (32, 32, 1) 256
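
A note for reading these results: the second print call in vsize.py passes `virtual_local_size and max_local_size` as a single argument, and in Python an `and` expression evaluates to one of its operands rather than to both. That is presumably why each results line shows four values instead of five. A minimal illustration, using made-up values chosen to resemble the output above:

```python
# Illustrative values only; they are not read from a real device
virtual_local_size = (32, 32, 1)
max_local_size = 256

# "a and b" evaluates to b when a is truthy, so only one value survives
combined = virtual_local_size and max_local_size
print(combined)

# When the left operand is None (falsy), the expression evaluates to None
missing = None and max_local_size
print(missing)
```

Under that reading, the last number on each results line would be `max_local_size` whenever `virtual_local_size` is non-empty.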

Traceback (most recent call last):
  File "demo_specgram.py", line 223, in <module>
    specgram_reikna = Spectrogram(
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\core\computation.py", line 206, in compile
    return self._get_plan(
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\core\computation.py", line 192, in _get_plan
    return self._build_plan(plan_factory, thread.device_params, *args)
  File "demo_specgram.py", line 202, in _build_plan
    plan.computation_call(self._transpose, output, temp)
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\core\computation.py", line 500, in computation_call
    self._append_plan(computation._get_plan(
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\core\computation.py", line 192, in _get_plan
    return self._build_plan(plan_factory, thread.device_params, *args)
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\algorithms\transpose.py", line 174, in _build_plan
    self._add_transpose(plan, device_params,
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\algorithms\transpose.py", line 152, in _add_transpose
    plan.kernel_call(
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\core\computation.py", line 466, in kernel_call
    kernel = self._thread.compile_static(
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\cluda\api.py", line 563, in compile_static
    return StaticKernel(self, template_src, name, global_size,
  File "C:\Users\engel\anaconda3\lib\site-packages\reikna\cluda\api.py", line 777, in __init__

The "fft1a.py" in the call stack is just the filename I first saved the demo program (demo_specgram.py) under; I have now renamed it to demo_specgram.py to avoid confusion.
In case this is related to Python version, I have the following:

Python 3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.

-regards - Bill

Bogdan Opanchuk

Mar 23, 2021, 7:21:01 PM
to reikna
Thank you, I think I understand what's happening now. Transpose was trying to make a kernel with a 32x32 local size, but OpenCL reported that the maximum size for the kernel is 256 work items, not because that is the theoretical maximum for the device, but because it is the maximum for this specific kernel (generally determined by the number of registers the GPU has available). I've never had this problem with Transpose, because it was always limited by the theoretical maximum work group size, so the computation was not made adaptive (it didn't try decreasing local sizes until it could compile). Other computations, like FFT, are adaptive in that sense.
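
To illustrate what "adaptive" means here, the following is a rough sketch of the retry idea, not the code in reikna or in the fix branch. `OutOfResources`, `fake_compile` and `compile_adaptively` are hypothetical names, and the per-kernel limit of 256 is hard-coded to mimic the error reported in this thread.

```python
class OutOfResources(Exception):
    """Stand-in for an error such as reikna.cluda.OutOfResourcesError."""


def fake_compile(block_width, kernel_limit=256):
    # Hypothetical compile step: pretends the driver reports a per-kernel
    # work group limit that is lower than the device-wide maximum.
    local_size = block_width * block_width
    if local_size > kernel_limit:
        raise OutOfResources(
            f"Requested local size {local_size} is greater than the maximum {kernel_limit}"
        )
    return (block_width, block_width)


def compile_adaptively(compile_fn, block_width):
    # Halve the block width until the kernel fits, instead of failing
    # on the first attempt.
    while block_width >= 1:
        try:
            return compile_fn(block_width)
        except OutOfResources:
            block_width //= 2
    raise OutOfResources("no block width fits this kernel")


local_size = compile_adaptively(fake_compile, 32)
print(local_size)
```

With the assumed limit of 256, the first attempt at 32x32 fails and the retry settles on a 16x16 local size. Halving is just one possible way to step down; a real implementation could choose the next size differently.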

So, I created a branch https://github.com/fjarri/reikna/tree/specgram-fix - could you try it and see if it works for you? I also fixed the normalization for the spectrogram comparison; I haven't looked at this example for years, and apparently the normalization in matplotlib was changed at some point.
