FP 64 support

Jeremy Davis

Aug 30, 2012, 10:21:46 PM
to aparapi...@googlegroups.com
Hello,
I have a video card that I think supports FP64 (NVIDIA GeForce GT 650M, on OSX 10.8, OpenCL 1.2).
But when running code that uses doubles, I see:
WARNING: Reverting to Java Thread Pool (JTP) for class com.test.GpuTest$1: FP64 required but not supported

Does anyone have any thoughts on this? Do I need to recompile my .dylib with special settings, or is something else required?

Thanks,
-JD

Jeremy Davis

Aug 30, 2012, 11:32:03 PM
to aparapi...@googlegroups.com
Maybe it isn't FP64 capable?

Ryan R. LaMothe

Aug 31, 2012, 12:37:31 AM
to aparapi...@googlegroups.com
If you are using the current trunk code, there is some checked-in code that lets you investigate a number of OpenCL properties. Try using OpenCLDevice.firstGPU().

If you would like a complete print-out of all available OpenCL information, that functionality is not yet available in Aparapi, but is available by using http://nativelibs4java.sourceforge.net/webstart/javacl/HardwareReport.jnlp

Witold Bołt

Aug 31, 2012, 2:18:38 AM
to aparapi...@googlegroups.com
Hi.

FP64 support in Aparapi relies on the cl_khr_fp64 extension.

The default OpenCL driver on OSX 10.8 for the GeForce 650M exposes the following extensions:

cl_APPLE_SetMemObjectDestructor 
cl_APPLE_ContextLoggingFunctions 
cl_APPLE_clut 
cl_APPLE_query_kernel_names 
cl_APPLE_gl_sharing 
cl_khr_gl_event
cl_khr_byte_addressable_store
cl_khr_global_int32_base_atomics 
cl_khr_global_int32_extended_atomics 
cl_khr_local_int32_base_atomics 
cl_khr_local_int32_extended_atomics 
cl_APPLE_fp64_basic_ops

So cl_khr_fp64 is NOT available on the GPU. Perhaps cl_APPLE_fp64_basic_ops could be used instead, but that would probably require changes to Aparapi's kernel code generation.
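
The check itself comes down to looking for the exact name cl_khr_fp64 among the space-separated names in the device's extension string. A minimal sketch in plain Java, assuming the string has already been obtained (for example, copied from the clInfo output); the ExtensionList class and its has method are made up for illustration and are not part of Aparapi:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not part of Aparapi: wraps a CL_DEVICE_EXTENSIONS string. */
final class ExtensionList {
    private final Set<String> names;

    ExtensionList(String spaceSeparated) {
        // Split on whitespace so that only whole extension names are matched,
        // never a fragment of a longer name.
        names = new HashSet<>(Arrays.asList(spaceSeparated.trim().split("\\s+")));
    }

    /** True if the device advertises exactly this extension name. */
    boolean has(String extension) {
        return names.contains(extension);
    }
}
```

With the GeForce GT 650M string listed above, has("cl_khr_fp64") is false and has("cl_APPLE_fp64_basic_ops") is true.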

Br,

BTW: To query OpenCL extensions and other info I used the little clInfo program that I've attached. It's a modified version of this code: http://graphics.stanford.edu/~yoel/notes/clInfo.c (the only modifications are to the headers, so that it compiles on OSX). I built it with the following command:

gcc clInfo.c -o clInfo -framework OpenCL
 
clInfo.c

Ryan R. LaMothe

Aug 31, 2012, 2:26:12 AM
to aparapi...@googlegroups.com
Witold is correct. We've encountered this problem before on Mac OS X, but I couldn't find the issue ticket for it. If there isn't one, could you please file a new one and include the output from the link I sent you?




On 31 Aug 2012, at 05:32, Jeremy Davis <jerd...@speakeasy.net> wrote:

gfrost

Aug 31, 2012, 10:28:34 AM
to aparapi...@googlegroups.com
Ahh, I was not aware of cl_APPLE_fp64_basic_ops.

We had a similar problem with AMD devices (AMD had its own FP64 extension, which I think predated the Khronos extension), which we fixed.

So my guess is this will be fixable with a KernelRunner change to detect the capability/extension and a KernelWriter change to create the required #pragma in the generated OpenCL code.
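
As a rough illustration of that idea, here is a sketch in plain Java that picks the pragma from a device's extension string. The Fp64Pragma class is hypothetical and is not the actual KernelRunner or KernelWriter code; the only thing taken as given is the standard OpenCL C form "#pragma OPENCL EXTENSION <name> : enable". Whether cl_APPLE_fp64_basic_ops could be enabled the same way is unknown, so the sketch does not try it:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch, not Aparapi's real code: chooses an FP64 pragma for a device. */
final class Fp64Pragma {
    // Extensions to try, in order of preference. cl_APPLE_fp64_basic_ops is left out
    // on purpose because it is not known whether it can stand in for cl_khr_fp64.
    private static final List<String> CANDIDATES = Arrays.asList("cl_khr_fp64", "cl_amd_fp64");

    /** Returns the pragma line to emit, or empty if no known FP64 extension is advertised. */
    static Optional<String> forDevice(String deviceExtensions) {
        List<String> advertised = Arrays.asList(deviceExtensions.trim().split("\\s+"));
        for (String candidate : CANDIDATES) {
            if (advertised.contains(candidate)) {
                return Optional.of("#pragma OPENCL EXTENSION " + candidate + " : enable");
            }
        }
        return Optional.empty();
    }
}
```

An empty result would correspond to the "FP64 required but not supported" case, where the kernel falls back to the Java Thread Pool.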

Jeremy can you open an issue for this?

BTW I included a 'poor man's' clinfo (called cltest) in the com.amd.aparapi.jni tree. On Windows it can be built with 'ant cltest', which creates cltest_x64.exe. I know that I did not add the Apple voodoo needed for this build to work on Mac OS X.

Witold/Ryan/Jeremy, can you take a stab at adding this to build.xml so we can check it in? This cltest utility is useful for some of the capability tests and for questions related to group/memory size. It will also allow us to add some Aparapi-specific queries, maybe even some low-level performance checks.

Gary  

Jeremy Davis

Aug 31, 2012, 11:17:00 AM
to aparapi...@googlegroups.com
My head is spinning from the quality of support in this group. Thanks!

clinfo.c worked perfectly.
HardwareReport.jnlp threw an exception.

Opened Issue 67.
Issues 27, 40, and 50 are the only related (fixed) issues I could find.

Here is the output of clinfo.c:
Found 1 platform(s).
platform[0x7fff0000]: profile: FULL_PROFILE
platform[0x7fff0000]: version: OpenCL 1.2 (Jun 20 2012 14:18:19)
platform[0x7fff0000]: name: Apple
platform[0x7fff0000]: vendor: Apple
platform[0x7fff0000]: extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
platform[0x7fff0000]: Found 2 device(s).
device[0xffffffff]: NAME: Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
device[0xffffffff]: VENDOR: Intel
device[0xffffffff]: PROFILE: FULL_PROFILE
device[0xffffffff]: VERSION: OpenCL 1.2 
device[0xffffffff]: EXTENSIONS: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats
device[0xffffffff]: DRIVER_VERSION: 1.1

device[0xffffffff]: Type: CPU 
device[0xffffffff]: EXECUTION_CAPABILITIES: Kernel Native 
device[0xffffffff]: GLOBAL_MEM_CACHE_TYPE: Read-Write (2)
device[0xffffffff]: CL_DEVICE_LOCAL_MEM_TYPE: Global (2)
device[0xffffffff]: SINGLE_FP_CONFIG: 0xbf
device[0xffffffff]: QUEUE_PROPERTIES: 0x2

device[0xffffffff]: VENDOR_ID: 4294967295
device[0xffffffff]: MAX_COMPUTE_UNITS: 8
device[0xffffffff]: MAX_WORK_ITEM_DIMENSIONS: 3
device[0xffffffff]: MAX_WORK_GROUP_SIZE: 1024
device[0xffffffff]: PREFERRED_VECTOR_WIDTH_CHAR: 16
device[0xffffffff]: PREFERRED_VECTOR_WIDTH_SHORT: 8
device[0xffffffff]: PREFERRED_VECTOR_WIDTH_INT: 4
device[0xffffffff]: PREFERRED_VECTOR_WIDTH_LONG: 2
device[0xffffffff]: PREFERRED_VECTOR_WIDTH_FLOAT: 4
device[0xffffffff]: PREFERRED_VECTOR_WIDTH_DOUBLE: 2
device[0xffffffff]: MAX_CLOCK_FREQUENCY: 2600
device[0xffffffff]: ADDRESS_BITS: 64
device[0xffffffff]: MAX_MEM_ALLOC_SIZE: 4294967296
device[0xffffffff]: IMAGE_SUPPORT: 4294967297
device[0xffffffff]: MAX_READ_IMAGE_ARGS: 4294967424
device[0xffffffff]: MAX_WRITE_IMAGE_ARGS: 4294967304
device[0xffffffff]: IMAGE2D_MAX_WIDTH: 8192
device[0xffffffff]: IMAGE2D_MAX_HEIGHT: 8192
device[0xffffffff]: IMAGE3D_MAX_WIDTH: 2048
device[0xffffffff]: IMAGE3D_MAX_HEIGHT: 2048
device[0xffffffff]: IMAGE3D_MAX_DEPTH: 2048
device[0xffffffff]: MAX_SAMPLERS: 16
device[0xffffffff]: MAX_PARAMETER_SIZE: 4096
device[0xffffffff]: MEM_BASE_ADDR_ALIGN: 1024
device[0xffffffff]: MIN_DATA_TYPE_ALIGN_SIZE: 128
device[0xffffffff]: GLOBAL_MEM_CACHELINE_SIZE: 6291456
device[0xffffffff]: GLOBAL_MEM_CACHE_SIZE: 64
device[0xffffffff]: GLOBAL_MEM_SIZE: 17179869184
device[0xffffffff]: MAX_CONSTANT_BUFFER_SIZE: 65536
device[0xffffffff]: MAX_CONSTANT_ARGS: 8
device[0xffffffff]: LOCAL_MEM_SIZE: 32768
device[0xffffffff]: ERROR_CORRECTION_SUPPORT: 0
device[0xffffffff]: PROFILING_TIMER_RESOLUTION: 1
device[0xffffffff]: ENDIAN_LITTLE: 1
device[0xffffffff]: AVAILABLE: 1
device[0xffffffff]: COMPILER_AVAILABLE: 1
device[0x1022600]: NAME: GeForce GT 650M
device[0x1022600]: VENDOR: NVIDIA
device[0x1022600]: PROFILE: FULL_PROFILE
device[0x1022600]: VERSION: OpenCL 1.1 
device[0x1022600]: EXTENSIONS: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops 
device[0x1022600]: DRIVER_VERSION: CLH 1.0

device[0x1022600]: Type: GPU 
device[0x1022600]: EXECUTION_CAPABILITIES: Kernel 
device[0x1022600]: GLOBAL_MEM_CACHE_TYPE: None (0)
device[0x1022600]: CL_DEVICE_LOCAL_MEM_TYPE: Local (1)
device[0x1022600]: SINGLE_FP_CONFIG: 0x1e
device[0x1022600]: QUEUE_PROPERTIES: 0x2

device[0x1022600]: VENDOR_ID: 16918016
device[0x1022600]: MAX_COMPUTE_UNITS: 2
device[0x1022600]: MAX_WORK_ITEM_DIMENSIONS: 3
device[0x1022600]: MAX_WORK_GROUP_SIZE: 1024
device[0x1022600]: PREFERRED_VECTOR_WIDTH_CHAR: 1
device[0x1022600]: PREFERRED_VECTOR_WIDTH_SHORT: 1
device[0x1022600]: PREFERRED_VECTOR_WIDTH_INT: 1
device[0x1022600]: PREFERRED_VECTOR_WIDTH_LONG: 1
device[0x1022600]: PREFERRED_VECTOR_WIDTH_FLOAT: 1
device[0x1022600]: PREFERRED_VECTOR_WIDTH_DOUBLE: 1
device[0x1022600]: MAX_CLOCK_FREQUENCY: 405
device[0x1022600]: ADDRESS_BITS: 32
device[0x1022600]: MAX_MEM_ALLOC_SIZE: 268435456
device[0x1022600]: IMAGE_SUPPORT: 1
device[0x1022600]: MAX_READ_IMAGE_ARGS: 256
device[0x1022600]: MAX_WRITE_IMAGE_ARGS: 16
device[0x1022600]: IMAGE2D_MAX_WIDTH: 8192
device[0x1022600]: IMAGE2D_MAX_HEIGHT: 8192
device[0x1022600]: IMAGE3D_MAX_WIDTH: 2048
device[0x1022600]: IMAGE3D_MAX_HEIGHT: 2048
device[0x1022600]: IMAGE3D_MAX_DEPTH: 2048
device[0x1022600]: MAX_SAMPLERS: 32
device[0x1022600]: MAX_PARAMETER_SIZE: 4352
device[0x1022600]: MEM_BASE_ADDR_ALIGN: 1024
device[0x1022600]: MIN_DATA_TYPE_ALIGN_SIZE: 128
device[0x1022600]: GLOBAL_MEM_CACHELINE_SIZE: 0
device[0x1022600]: GLOBAL_MEM_CACHE_SIZE: 0
device[0x1022600]: GLOBAL_MEM_SIZE: 1073741824
device[0x1022600]: MAX_CONSTANT_BUFFER_SIZE: 65536
device[0x1022600]: MAX_CONSTANT_ARGS: 9
device[0x1022600]: LOCAL_MEM_SIZE: 49152
device[0x1022600]: ERROR_CORRECTION_SUPPORT: 0
device[0x1022600]: PROFILING_TIMER_RESOLUTION: 1000
device[0x1022600]: ENDIAN_LITTLE: 1
device[0x1022600]: AVAILABLE: 1
device[0x1022600]: COMPILER_AVAILABLE: 1

Witold Bołt

Aug 31, 2012, 5:05:31 PM
to aparapi...@googlegroups.com
Hi.

I've just committed some really simple changes to trunk (revision 646) that enable building and running cltest on OSX.

Here is the output from my system:

houp: ~/code/aparapi-read-only/com.amd.aparapi.jni$ dist/cltest
clGetPlatformIDs(0,NULL,&platformc) OK!
There is 1 platform
platform 0{
   CL_PLATFORM_VENDOR.."Apple"
   CL_PLATFORM_VERSION."OpenCL 1.2 (Jun 20 2012 14:18:19)"
   CL_PLATFORM_NAME...."Apple"
   Platform 0 has 2 devices{
      Device 0{
         CL_DEVICE_TYPE..................... CPU (0x0) 
         CL_DEVICE_MAX_COMPUTE_UNITS........ 8
         CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS. 3
             dim[0] = 1024
             dim[1] = 1
             dim[2] = 1
         CL_DEVICE_MAX_WORK_GROUP_SIZE...... 1024
         CL_DEVICE_MAX_MEM_ALLOC_SIZE....... 2147483648
         CL_DEVICE_GLOBAL_MEM_SIZE.......... 8589934592
         CL_DEVICE_LOCAL_MEM_SIZE........... 32768
         CL_DEVICE_PROFILE.................. FULL_PROFILE
         CL_DEVICE_VERSION.................. OpenCL 1.2 
         CL_DRIVER_VERSION.................. 1.1
         CL_DEVICE_OPENCL_C_VERSION......... OpenCL C 1.2 
         CL_DEVICE_NAME..................... Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
         CL_DEVICE_EXTENSIONS............... cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats
      }
      Device 1{
         CL_DEVICE_TYPE..................... GPU (0x0) 
         CL_DEVICE_MAX_COMPUTE_UNITS........ 2
         CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS. 3
             dim[0] = 1024
             dim[1] = 1024
             dim[2] = 64
         CL_DEVICE_MAX_WORK_GROUP_SIZE...... 1024
         CL_DEVICE_MAX_MEM_ALLOC_SIZE....... 268435456
         CL_DEVICE_GLOBAL_MEM_SIZE.......... 1073741824
         CL_DEVICE_LOCAL_MEM_SIZE........... 49152
         CL_DEVICE_PROFILE.................. FULL_PROFILE
         CL_DEVICE_VERSION.................. OpenCL 1.1 
         CL_DRIVER_VERSION.................. CLH 1.0
         CL_DEVICE_OPENCL_C_VERSION......... OpenCL C 1.1 
         CL_DEVICE_NAME..................... GeForce GT 650M
         CL_DEVICE_EXTENSIONS............... cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_APPLE_fp64_basic_ops 
      }
   }
}

As for the cryptic cl_APPLE_fp64_basic_ops: has any of you managed to find any documentation about it? I can't, which is strange. I have no idea whether it is in any way compatible with cl_khr_fp64, but it is obviously not the same thing, since the CPU device advertises both.

BTW: attached is a patch that enables CPU mode on OSX, which has been broken for some time due to the strange Mac CPU driver. It is not yet in the main code line, since we are still discussing whether this is the best way to apply it. Anyway, it works ;) and if you need CPU mode on OSX, simply copy that file to [aparapi]/com.amd.aparapi.jni and then issue:

patch -p0 < workgroupsize.patch 

After that, recompile the native driver, and you should be able to run the samples on the CPU. In case of any problems, let me know!

Br,
Witek

workgroupsize.patch

sara shafaei

Nov 28, 2012, 3:20:24 PM
to aparapi...@googlegroups.com
Hi,

I am getting the same error: FP64 required but not supported
I am on Mac OS X 10.7.2, and here is the output for my system:


Found 1 platform(s).
platform[0x7fff0000]: profile: FULL_PROFILE
platform[0x7fff0000]: version: OpenCL 1.1 (Jul 25 2011 15:56:07)

platform[0x7fff0000]: name: Apple
platform[0x7fff0000]: vendor: Apple
platform[0x7fff0000]: extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event
platform[0x7fff0000]: Found 2 device(s).
    device[0xffffffff]: NAME: Intel(R) Core(TM) i7-2635QM CPU @ 2.00GHz

    device[0xffffffff]: VENDOR: Intel
    device[0xffffffff]: PROFILE: FULL_PROFILE
    device[0xffffffff]: VERSION: OpenCL 1.1
    device[0xffffffff]: EXTENSIONS: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats
    device[0xffffffff]: DRIVER_VERSION: 1.1

    device[0xffffffff]: Type: CPU
    device[0xffffffff]: EXECUTION_CAPABILITIES: Kernel Native
    device[0xffffffff]: GLOBAL_MEM_CACHE_TYPE: Read-Write (2)
    device[0xffffffff]: CL_DEVICE_LOCAL_MEM_TYPE: Global (2)
    device[0xffffffff]: SINGLE_FP_CONFIG: 0x3f

    device[0xffffffff]: QUEUE_PROPERTIES: 0x2

    device[0xffffffff]: VENDOR_ID: 4294967295
    device[0xffffffff]: MAX_COMPUTE_UNITS: 8
    device[0xffffffff]: MAX_WORK_ITEM_DIMENSIONS: 3
    device[0xffffffff]: MAX_WORK_GROUP_SIZE: 1024
    device[0xffffffff]: PREFERRED_VECTOR_WIDTH_CHAR: 16
    device[0xffffffff]: PREFERRED_VECTOR_WIDTH_SHORT: 8
    device[0xffffffff]: PREFERRED_VECTOR_WIDTH_INT: 4
    device[0xffffffff]: PREFERRED_VECTOR_WIDTH_LONG: 2
    device[0xffffffff]: PREFERRED_VECTOR_WIDTH_FLOAT: 4
    device[0xffffffff]: PREFERRED_VECTOR_WIDTH_DOUBLE: 2
    device[0xffffffff]: MAX_CLOCK_FREQUENCY: 2000
    device[0xffffffff]: ADDRESS_BITS: 64
    device[0xffffffff]: MAX_MEM_ALLOC_SIZE: 2147483648
    device[0xffffffff]: IMAGE_SUPPORT: 1
    device[0xffffffff]: MAX_READ_IMAGE_ARGS: 128
    device[0xffffffff]: MAX_WRITE_IMAGE_ARGS: 8

    device[0xffffffff]: IMAGE2D_MAX_WIDTH: 8192
    device[0xffffffff]: IMAGE2D_MAX_HEIGHT: 8192
    device[0xffffffff]: IMAGE3D_MAX_WIDTH: 2048
    device[0xffffffff]: IMAGE3D_MAX_HEIGHT: 2048
    device[0xffffffff]: IMAGE3D_MAX_DEPTH: 2048
    device[0xffffffff]: MAX_SAMPLERS: 16
    device[0xffffffff]: MAX_PARAMETER_SIZE: 4096
    device[0xffffffff]: MEM_BASE_ADDR_ALIGN: 1024
    device[0xffffffff]: MIN_DATA_TYPE_ALIGN_SIZE: 128
    device[0xffffffff]: GLOBAL_MEM_CACHELINE_SIZE: 6291456
    device[0xffffffff]: GLOBAL_MEM_CACHE_SIZE: 64
    device[0xffffffff]: GLOBAL_MEM_SIZE: 8589934592

    device[0xffffffff]: MAX_CONSTANT_BUFFER_SIZE: 65536
    device[0xffffffff]: MAX_CONSTANT_ARGS: 8
    device[0xffffffff]: LOCAL_MEM_SIZE: 32768
    device[0xffffffff]: ERROR_CORRECTION_SUPPORT: 0
    device[0xffffffff]: PROFILING_TIMER_RESOLUTION: 1
    device[0xffffffff]: ENDIAN_LITTLE: 1
    device[0xffffffff]: AVAILABLE: 1
    device[0xffffffff]: COMPILER_AVAILABLE: 1
    device[0x1021b00]: NAME: ATI Radeon HD 6490M
    device[0x1021b00]: VENDOR: AMD
    device[0x1021b00]: PROFILE: FULL_PROFILE
    device[0x1021b00]: VERSION: OpenCL 1.1
    device[0x1021b00]: EXTENSIONS: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes
    device[0x1021b00]: DRIVER_VERSION: 1.0

    device[0x1021b00]: Type: GPU
    device[0x1021b00]: EXECUTION_CAPABILITIES: Kernel
    device[0x1021b00]: GLOBAL_MEM_CACHE_TYPE: None (0)
    device[0x1021b00]: CL_DEVICE_LOCAL_MEM_TYPE: Local (1)
    device[0x1021b00]: SINGLE_FP_CONFIG: 0x1e
    device[0x1021b00]: QUEUE_PROPERTIES: 0x2

    device[0x1021b00]: VENDOR_ID: 16915200
    device[0x1021b00]: MAX_COMPUTE_UNITS: 2
    device[0x1021b00]: MAX_WORK_ITEM_DIMENSIONS: 3
    device[0x1021b00]: MAX_WORK_GROUP_SIZE: 1024
    device[0x1021b00]: PREFERRED_VECTOR_WIDTH_CHAR: 16
    device[0x1021b00]: PREFERRED_VECTOR_WIDTH_SHORT: 8
    device[0x1021b00]: PREFERRED_VECTOR_WIDTH_INT: 4
    device[0x1021b00]: PREFERRED_VECTOR_WIDTH_LONG: 2
    device[0x1021b00]: PREFERRED_VECTOR_WIDTH_FLOAT: 4
    device[0x1021b00]: PREFERRED_VECTOR_WIDTH_DOUBLE: 0
    device[0x1021b00]: MAX_CLOCK_FREQUENCY: 750
    device[0x1021b00]: ADDRESS_BITS: 32
    device[0x1021b00]: MAX_MEM_ALLOC_SIZE: 134217728
    device[0x1021b00]: IMAGE_SUPPORT: 1
    device[0x1021b00]: MAX_READ_IMAGE_ARGS: 128
    device[0x1021b00]: MAX_WRITE_IMAGE_ARGS: 8
    device[0x1021b00]: IMAGE2D_MAX_WIDTH: 8192
    device[0x1021b00]: IMAGE2D_MAX_HEIGHT: 8192
    device[0x1021b00]: IMAGE3D_MAX_WIDTH: 2048
    device[0x1021b00]: IMAGE3D_MAX_HEIGHT: 2048
    device[0x1021b00]: IMAGE3D_MAX_DEPTH: 2048
    device[0x1021b00]: MAX_SAMPLERS: 16
    device[0x1021b00]: MAX_PARAMETER_SIZE: 1024
    device[0x1021b00]: MEM_BASE_ADDR_ALIGN: 32768
    device[0x1021b00]: MIN_DATA_TYPE_ALIGN_SIZE: 128
    device[0x1021b00]: GLOBAL_MEM_CACHELINE_SIZE: 0
    device[0x1021b00]: GLOBAL_MEM_CACHE_SIZE: 0
    device[0x1021b00]: GLOBAL_MEM_SIZE: 134217728
    device[0x1021b00]: MAX_CONSTANT_BUFFER_SIZE: 65536
    device[0x1021b00]: MAX_CONSTANT_ARGS: 8
    device[0x1021b00]: LOCAL_MEM_SIZE: 32768
    device[0x1021b00]: ERROR_CORRECTION_SUPPORT: 0
    device[0x1021b00]: PROFILING_TIMER_RESOLUTION: 37
    device[0x1021b00]: ENDIAN_LITTLE: 1
    device[0x1021b00]: AVAILABLE: 1
    device[0x1021b00]: COMPILER_AVAILABLE: 1

So I don't have cl_APPLE_fp64_basic_ops either. Is there any way to make it work?
Thanks,
Sara

gfrost

Nov 28, 2012, 5:25:41 PM
to aparapi...@googlegroups.com
So your GPU device does not appear to support 64-bit float operations. As noted above, we are not really sure what cl_APPLE_fp64_basic_ops actually offers. It might be enough for us to accept it.

Of course another 'option' is to convert your algorithm to use Java float (which will map to 32 bit float operations in OpenCL), if this suffices for your needs. 
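
To make the float option concrete, here is a rough sketch in plain Java. It assumes, purely for illustration, an escape-time loop of the Mandelbrot kind; the class name, the method names, and the escape radius of 2 are assumptions rather than code from this thread. The two methods differ only in the floating-point type:

```java
/** Illustrative escape-time loops; not taken from any code posted in this thread. */
final class MandelbrotCount {
    /** Single-precision version: uses only Java float, so no FP64 support is needed. */
    static int countFloat(float cx, float cy, int maxIterations) {
        float x = 0f;
        float y = 0f;
        int i = 0;
        // Iterate z = z^2 + c until |z|^2 exceeds 4 (escape radius 2) or the limit is hit.
        while (i < maxIterations && x * x + y * y <= 4f) {
            float nextX = x * x - y * y + cx;
            y = 2f * x * y + cy;
            x = nextX;
            i++;
        }
        return i;
    }

    /** Double-precision version of the same loop, for comparison. */
    static int countDouble(double cx, double cy, int maxIterations) {
        double x = 0.0;
        double y = 0.0;
        int i = 0;
        while (i < maxIterations && x * x + y * y <= 4.0) {
            double nextX = x * x - y * y + cx;
            y = 2.0 * x * y + cy;
            x = nextX;
            i++;
        }
        return i;
    }
}
```

Both versions run all 100 iterations for countFloat(0f, 0f, 100) and countDouble(0.0, 0.0, 100), since the origin never escapes. Float carries only about 7 significant decimal digits, so the two can disagree once the coordinates need finer resolution than that.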

Clearly your CPU device (even though it is Intel, which I can't encourage :) ) has 64 bit support, but the requirement is that your OpenCL driver exposes your CPU 64 bit support.

So you might consider setting the execution mode to ExecutionMode.CPU which (with Witold's patch above) might allow your code to work using OpenCL + CPU. Sometimes this gives a performance boost over multi-threaded Java, but probably not the boost that you were hoping for.
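
The order of preference described here (GPU if it has FP64, otherwise OpenCL on the CPU, otherwise plain Java threads) can be sketched in plain Java. Everything in this block is hypothetical, including the Target enum; it is not Aparapi's API and only models the decision from the two extension strings:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the fallback order for kernels that use doubles. */
final class Fp64Fallback {
    enum Target { GPU, CPU, JAVA_THREAD_POOL }

    private static boolean hasFp64(String extensions) {
        List<String> names = Arrays.asList(extensions.trim().split("\\s+"));
        return names.contains("cl_khr_fp64") || names.contains("cl_amd_fp64");
    }

    /** Picks where a double-precision kernel could run, given each device's extension string. */
    static Target choose(String gpuExtensions, String cpuExtensions) {
        if (hasFp64(gpuExtensions)) {
            return Target.GPU;
        }
        if (hasFp64(cpuExtensions)) {
            return Target.CPU; // OpenCL on the CPU device
        }
        return Target.JAVA_THREAD_POOL; // plain multi-threaded Java
    }
}
```

For the two devices in the output above, choose would return CPU, because only the Intel CPU device lists cl_khr_fp64.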

Gary

Ryan R. LaMothe

Nov 28, 2012, 9:34:36 PM
to aparapi...@googlegroups.com

We know about cl_APPLE_fp64_basic_ops on OS X. Weren't we discussing supporting it, but didn't want to introduce platform dependencies?

With the other 'breaking' OS X bugs, we pretty much have to introduce these kinds of dependencies at this point.

BTW the OS X patch did not solve the invalid workgroup size bug for our test code, unfortunately...

sara shafaei

Nov 29, 2012, 12:55:28 PM
to aparapi...@googlegroups.com
Thanks, Gary, for your reply. You are right, I didn't get the performance that I expected from OpenCL. I had a multi-threaded Java Mandelbrot program and wanted to boost its performance using OpenCL. To my surprise, the performance was not good: even using float in the OpenCL kernel is a little slower than using double in my multi-threaded Java version. My question is: if I use FP64 implemented in software (as you said), is it much slower than hardware double? I used FP128.cl and it is very, very slow on my machine.
What CPU and GPU do you recommend to get the best results?
Thanks,
~Sara

Ryan R. LaMothe

Nov 29, 2012, 3:51:56 PM
to aparapi...@googlegroups.com

Can you supply any source code to review?

sara shafaei

Nov 30, 2012, 9:28:58 AM
to aparapi...@googlegroups.com
For the OpenCL code, I mostly used your example and just applied the same limits and iteration count so that I could compare it with the multi-threaded one. Do you need the multi-threaded code?
I also tried OpenCL on the CPU. It is faster than the GPU, but still not fast enough.
Thanks,
~Sara

Ryan R. LaMothe

Dec 1, 2012, 1:07:17 PM
to aparapi...@googlegroups.com

I am asking for a couple of reasons:

- Peer code review

- Comparison on different hardware. We have everything from AMD W9000s to laptop GPUs to test against.

- What is your exact hardware? If you have 'clinfo' available on your system, please attach output. If not, just a basic hardware description will work (brand/GPU family/video ram/CPU/etc)

Thanks!

sara shafaei

Dec 3, 2012, 11:45:41 AM
to aparapi...@googlegroups.com
Thanks! That is great. I am changing the OpenCL part a little. I will send it to you as soon as I am done. It would be great to test it on other hardware.
I have attached the output from clinfo here.
Thanks again!
myclinfo.txt

gfrost

Dec 4, 2012, 10:17:31 AM
to aparapi...@googlegroups.com
Thanks for the cltest output; it really helps us dig into your platform/runtime.

From the output we can see that your GPU does not support FP64:

device[0x1021b00]: EXTENSIONS: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes

We search for the extension strings 'cl_khr_fp64' or 'cl_amd_fp64'.
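
For anyone who wants to run the same search over saved clinfo output, here is a minimal sketch in plain Java. It assumes the "device[0x...]: EXTENSIONS: ..." line format shown in this thread; the ClInfoScan class is made up for illustration and is not how Aparapi itself performs the check:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner for the clinfo-style output posted in this thread. */
final class ClInfoScan {
    // Matches lines such as "device[0x1021b00]: EXTENSIONS: cl_khr_gl_event ..."
    private static final Pattern EXTENSIONS_LINE =
        Pattern.compile("device\\[(0x[0-9a-fA-F]+)\\]:\\s*EXTENSIONS:\\s*(.*)");

    /** Maps each device id to whether it lists cl_khr_fp64 or cl_amd_fp64. */
    static Map<String, Boolean> fp64ByDevice(String clinfoOutput) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String line : clinfoOutput.split("\\r?\\n")) {
            Matcher m = EXTENSIONS_LINE.matcher(line.trim());
            if (!m.matches()) {
                continue; // not an EXTENSIONS line
            }
            boolean fp64 = false;
            for (String name : m.group(2).trim().split("\\s+")) {
                if (name.equals("cl_khr_fp64") || name.equals("cl_amd_fp64")) {
                    fp64 = true;
                }
            }
            result.put(m.group(1), fp64);
        }
        return result;
    }
}
```

Run over the output posted above, this would report true for device 0xffffffff (the Intel CPU) and false for device 0x1021b00 (the Radeon HD 6490M).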

Also, it is a fairly old GPU (it only has two compute units available), so I would not expect great performance here.

Gary

sara shafaei

Dec 5, 2012, 1:11:58 PM
to aparapi...@googlegroups.com
Thanks Gary,
I am still working on the code. Yes, my GPU is not very good, which is kind of a surprise to me. I got my Mac at the end of 2011, so I expected a better one.
I had to try JOCL a little bit too, since I had to write my own .cl file to support double and didn't know whether that was possible in Aparapi. I want to run and test my program on another machine with a better GPU later on. So, do you think I can get better speed with JOCL, or can Aparapi reach the same speed?
Thanks,
~Sara

gfrost

Dec 5, 2012, 1:22:23 PM
to aparapi...@googlegroups.com
Sara

Actually, there is a mechanism whereby you can write your own OpenCL and bind it to a Java interface. I responded to another discussion topic just this morning from lzero (the topic was "run pre-generated OpenCL").

https://groups.google.com/forum/?fromgroups=#!topic/aparapi-discuss/n99VlLhc6mU


Gary

sara shafaei

Dec 5, 2012, 1:28:26 PM
to aparapi...@googlegroups.com

Thanks Gary!
That is great! I will try it.