ArrayFire Users

Jun 15, 2015, 10:56:54 AM
Release Notes

Bug Fixes

* Fixed header to work in Visual Studio 2015
* Fixed a bug in batched mode for FFT based convolutions
* Fixed graphics issues on OSX
* Fixed various bugs in visualization functions

Other improvements

* Improved fractal example
* New OSX installer
* Improved Windows installer
  * Default install path has been changed
* Fixed bug in machine learning examples


Major Updates

* ArrayFire is now open source
* Major changes to the visualization library
* Introducing handle based C API
* New backend: CPU fallback available for systems without GPUs
* Dense linear algebra functions available for all backends
* Support for 64 bit integers

Function Additions
* Data generation functions
    * range()
    * iota()

* Computer Vision Algorithms
    * features()
        * A data structure to hold features
    * fast()
        * FAST feature detector
    * orb()
        * ORB feature descriptor extractor

* Image Processing
    * convolve1(), convolve2(), convolve3()
        * Specialized versions of convolve() to enable better batch support
    * fftconvolve1(), fftconvolve2(), fftconvolve3()
        * Convolutions in frequency domain to support larger kernel sizes
    * dft(), idft()
        * Unified functions for calling multi-dimensional FFTs
    * matchTemplate()
        * Match a kernel in an image
    * sobel()
        * Get sobel gradients of an image
    * rgb2hsv(), hsv2rgb(), rgb2gray(), gray2rgb()
        * Explicit function calls to colorspace conversions
    * erode3d(), dilate3d()
        * Explicit erode and dilate calls for image morphing

* Linear Algebra
    * matmulNT(), matmulTN(), matmulTT()
        * Specialized versions of matmul() for transposed inputs
    * luInPlace(), choleskyInPlace(), qrInPlace()
        * In place factorizations to improve memory requirements
    * solveLU()
        * Specialized solve routines to improve performance
    * OpenCL backend now supports Linear Algebra functions

* Other functions
    * lookup() - lookup indices from a table
    * batchFunc() - helper function to perform batch operations

* Visualization functions
    * Support for multiple windows
    * window.hist()
        * Visualize the output of the histogram

    * Removed old pointer based C API
    * Introducing handle based C API
    * Just In Time compilation available in C API
    * C API has feature parity with C++ API
    * bessel functions removed
    * cross product functions removed
    * Kronecker product functions removed

Performance Improvements
* Improvements across the board for OpenCL backend

API Changes
* `print` is now af_print()
* seq(): The step parameter is now the third input
    * seq(start, step, end) changed to seq(start, end, step)
* gfor(): The iterator now needs to be seq()
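
The reordering of the seq() arguments is easy to miss when porting older code. The sketch below is plain C++ and does not call ArrayFire; `expand_seq` is a hypothetical helper that lists the values described by a (start, end, step) triple, assuming an inclusive end point.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of ArrayFire): enumerate the values of a
// sequence given in the new argument order (start, end, step).
// Assumptions: the end point is inclusive and the step is non-zero.
std::vector<double> expand_seq(double start, double end, double step) {
    assert(step != 0.0);
    std::vector<double> values;
    // Number of whole steps that fit between start and end.
    const double span = (end - start) / step;
    if (span < 0) return values;  // step points away from end: empty sequence
    const long count = static_cast<long>(std::floor(span)) + 1;
    for (long i = 0; i < count; ++i) {
        values.push_back(start + static_cast<double>(i) * step);
    }
    return values;
}
```

A call written for the old order, such as seq(0, 2, 10) meaning 0, 2, ..., 10, is read under the new order as start 0, end 2, step 10. In this sketch that yields the single value 0, so such calls need to be rewritten as seq(0, 10, 2).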

Deprecated Function APIs
Deprecated APIs are in af/compatible.h

* devicecount() changed to getDeviceCount()
* deviceset() changed to setDevice()
* deviceget() changed to getDevice()
* loadimage() changed to loadImage()
* saveimage() changed to saveImage()
* gaussiankernel() changed to gaussianKernel()
* alltrue() changed to allTrue()
* anytrue() changed to anyTrue()
* setunique() changed to setUnique()
* setunion() changed to setUnion()
* setintersect() changed to setIntersect()
* histequal() changed to histEqual()
* colorspace() changed to colorSpace()
* filter() deprecated. Use convolve1() and convolve2()
* mul() changed to product()
* deviceprop() changed to deviceProp()

Known Issues
* OpenCL backend issues on OSX
    * AMD GPUs not supported because of driver issues
    * Intel CPUs not supported
    * Linear algebra functions do not work on Intel GPUs.
* Stability and correctness issues with open source OpenCL implementations such as Beignet, GalliumCompute.

ArrayFire Users

Jun 29, 2015, 2:03:10 PM
Release Notes

Bug Fixes

* Added missing symbols from the compatible API
* Fixed a bug affecting corner rows and elements in grad()
* Fixed linear interpolation bugs affecting large images in the following:
    - approx1()
    - approx2()
    - resize()
    - rotate()
    - scale()
    - skew()
    - transform()


* Added missing documentation for constant()
* Added missing documentation for `array::scalar()`
* Added supported input types for functions in `arith.h`

Shehzan Mohammed

Aug 31, 2015, 5:42:59 PM
to ArrayFire Users,
Release Notes

v3.1.0 - Feature Release

Function Additions
* Computer Vision Functions
    * nearestNeighbour() - Nearest Neighbour with SAD, SSD and SHD distances
    * harris() - Harris Corner Detector
    * susan() - Susan Corner Detector
    * sift() - Scale Invariant Feature Transform (SIFT)
        * "Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image," David G. Lowe, US Patent 6,711,293 (March 23, 2004). Provisional application filed March 8, 1999. Assignee: The University of British Columbia. For further details, contact David Lowe or the University-Industry Liaison Office of the University of British Columbia.
        * SIFT is available for compiling but does not ship with ArrayFire
          hosted installers/pre-built libraries
    * dog() -  Difference of Gaussians

* Image Processing Functions
    * ycbcr2rgb() and rgb2ycbcr() - RGB <-> YCbCr color space conversion
    * wrap() and unwrap() - Wrap and Unwrap
    * sat() - Summed Area Tables
    * loadImageMem() and saveImageMem() - Load and Save images to/from memory
        * af_image_format - Added imageFormat (af_image_format) enum

* Array & Data Handling
    * copy() - Copy
    * array::lock() and array::unlock() - Lock and Unlock
    * select() and replace() - Select and Replace
    * Get array reference count (af_get_data_ref_count)

* Signal Processing
    * fftInPlace() - 1D in place FFT
    * fft2InPlace() - 2D in place FFT
    * fft3InPlace() - 3D in place FFT
    * ifftInPlace() - 1D in place Inverse FFT
    * ifft2InPlace() - 2D in place Inverse FFT
    * ifft3InPlace() - 3D in place Inverse FFT
    * fftR2C() - Real to complex FFT
    * fftC2R() - Complex to Real FFT

* Linear Algebra
    * svd() and svdInPlace() - Singular Value Decomposition

* Other operations
    * sigmoid() - Sigmoid
    * Sum (with option to replace NaN values)
    * Product (with option to replace NaN values)

* Graphics
    * Window::setSize() - Window resizing using Forge API

* Utility
    * Allow users to set print precision (print, af_print_array_gen)
    * saveArray() and readArray() - Stream arrays to binary files
    * toString() - toString function returns the array and data as a string

* CUDA specific functionality
    * getStream() - Returns default CUDA stream ArrayFire uses for the current device
    * getNativeId() - Returns native id of the CUDA device
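
The NaN-aware Sum and Product additions listed under "Other operations" substitute a caller-supplied value for NaN entries before reducing. A minimal sketch of that behaviour in plain C++ follows; it is not ArrayFire code, and `sum_nan` and `product_nan` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: sum where every NaN entry is replaced by nan_value.
double sum_nan(const std::vector<double>& data, double nan_value) {
    double total = 0.0;
    for (double x : data) {
        total += std::isnan(x) ? nan_value : x;
    }
    return total;
}

// Hypothetical helper: product where every NaN entry is replaced by nan_value.
double product_nan(const std::vector<double>& data, double nan_value) {
    double total = 1.0;
    for (double x : data) {
        total *= std::isnan(x) ? nan_value : x;
    }
    return total;
}
```

Passing 0 as the replacement for a sum, or 1 for a product, makes NaN entries neutral, so they are effectively ignored.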

* dot
    * Allow complex inputs with conjugate option
* AF_INTERP_LOWER interpolation
    * For resize, rotate and transform based functions
* 64-bit integer support
    * For reductions, random, iota, range, diff1, diff2, accum, join, shift
      and tile
* convolve
    * Support for non-overlapping batched convolutions
* Complex Arrays
    * Fix binary ops on complex inputs of mixed types
    * Complex type support for exp
* tile
    * Performance improvements by using JIT when possible.
* Add AF_API_VERSION macro
    * Allows disabling of API to maintain consistency with previous versions
* Other Performance Improvements
    * Use reference counting to reduce unnecessary copies
* CPU Backend
    * Device properties for CPU
    * Improved performance when all buffers are indexed linearly
* CUDA Backend
    * Use streams in CUDA (no longer using default stream)
    * Using async cudaMem ops
    * Add 64-bit integer support for JIT functions
    * Performance improvements for CUDA JIT for non-linear 3D and 4D arrays
* OpenCL Backend
    * Improve compilation times for OpenCL backend
    * Performance improvements for non-linear JIT kernels on OpenCL
    * Improved shared memory load/store in many OpenCL kernels (PR 933)
    * Using cl.hpp v1.2.7

Bug Fixes
* Common
    * Fix compatibility of c32/c64 arrays when operating with scalars
    * Fix median for all values of an array
    * Fix double free issue when indexing (30cbbc7)
    * Fix bug in rank
    * Fix default values for scale throwing exception
    * Fix conjg raising exception on real input
    * Fix bug when using conjugate transpose for vector input
    * Fix issue with const input for array_proxy::get()
* CPU Backend
    * Fix randn generating same sequence for multiple calls
    * Fix setSeed for randu
    * Fix casting to and from complex
    * Check NULL values when allocating memory
    * Fix offset issue for CPU element-wise operations

New Examples
* Match Template
* Susan
* Heston Model (contributed by Michael Nowotny)

Distribution Changes
* Fixed automatic detection of ArrayFire when used with CMake on Windows
* Compiling ArrayFire with FreeImage as a static library for Linux x86

Known Issues
* OpenBlas can cause issues with QR factorization in CPU backend
* FreeImage older than 3.10 can cause issues with loadImageMem and

ArrayFire Users

Sep 14, 2015, 10:46:31 AM
to ArrayFire Users
Release Notes

v3.1.1 - Bug Fix Release


* CUDA backend now depends on CUDA 7.5 toolkit
* OpenCL backend now requires OpenCL 1.2 or greater

Bug Fixes

* Fixed bug in reductions after indexing
* Fixed bug in indexing when using reverse indices


* `cmake` now includes `PKG_CONFIG` in the search path for CBLAS and LAPACKE libraries
* heston_model.cpp example now builds with the default ArrayFire cmake files after installation


ArrayFire Users

Sep 28, 2015, 9:29:18 AM
to ArrayFire Users,
Release Notes

v3.1.2 - Bug Fix Release

Bug Fixes

* Fixed bug in assign that was causing test to fail
* Fixed bug in convolve. Frequency condition now depends on kernel size only
* Fixed bug in indexed reductions for complex type in OpenCL backend
* Fixed bug in kernel name generation in ireduce for OpenCL backend
* Fixed non-linear to linear indices in ireduce
* Fixed bug in reductions for small arrays
* Fixed bug in histogram for indexed arrays
* Fixed compiler error in CPUID for non-compliant devices
* Fixed failing tests on i386 platforms
* Add missing AFAPI


* Documentation: Added missing examples and other corrections
* Documentation: Fixed warnings in documentation building
* Installers: Send error messages to log file in OSX Installer

ArrayFire Users

Oct 19, 2015, 10:13:45 AM
to ArrayFire Users,
Release Notes

v3.1.3 - Final Bug Fix Release for 3.1

Bug Fixes

* Fixed bugs in various OpenCL kernels without offset additions
* Remove ARCH_32 and ARCH_64 flags
* Fix missing symbols when freeimage is not found
* Use CUDA driver version for Windows
* Improvements to SIFT
* Fixes for Windows compilation when not using MKL #1047
* Fixes for building without LAPACK


* Documentation: Fixed documentation for select and replace
* Documentation: Fixed documentation for af_isnan

ArrayFire Users

Nov 16, 2015, 11:35:08 AM
to ArrayFire Users,

Release Notes


Major Updates

  • Added Unified backend
  • Support for 16-bit integers (s16 and u16)
    • All functions that support 32-bit integer types (s32, u32) now also support 16-bit integer types

Function Additions

Other Improvements

Build Improvements

  • Submodules update is now automatically called if not cloned recursively
  • Fixes for compilation on Visual Studio 2015
  • Option to use fallback to CPU LAPACK for linear algebra functions in case of CUDA 6.5 or older versions.

Bug Fixes

Documentation Updates

  • Improved tutorials documentation
  • Added return type information for functions that return different type arrays

New Examples


  • All installers now include the Unified backend and corresponding CMake files
  • Visual Studio projects include Unified in the Platform Configurations
  • Added installer for Jetson TX1
  • SIFT and GLOH do not ship with the installers as SIFT is protected by patents that do not allow commercial distribution without licensing.



          ArrayFire Users

          Dec 7, 2015, 10:06:05 AM
          to ArrayFire Users,
          Release Notes


          Bug Fixes


          • Tests can now be used as a standalone project
            • Tests can now be built using pre-compiled libraries
            • Similar to how the examples are built
          • The install target now installs the examples source irrespective of the BUILD_EXAMPLES value
            • Examples are not built if BUILD_EXAMPLES is off


          • HTML documentation is now built and installed in docs/html
          • Added documentation for af::seq class
          • Updated Matrix Manipulation tutorial
          • Examples list is now generated by CMake
            • Examples are now listed as dir/example.cpp
          • Removed dummy groups used for indexing documentation (affected doxygen < 1.8.9)

          ArrayFire Users

          Sep 13, 2016, 4:22:33 PM
          to ArrayFire Users,

          Release Notes


          Major Updates


          • Sparse Matrix and BLAS
            • Support for CSR and COO storage types.
            • Sparse-Dense Matrix Multiplication and Matrix-Vector Multiplication as a part of af::matmul() using AF_STORAGE_CSR format for sparse.
            • Conversion between dense matrices and CSR and COO storage types.
          • Faster JIT
            • Performance improvements for CUDA and OpenCL JIT functions.
            • Support for evaluating multiple outputs in a single kernel. See af::array::eval() for more.
          • Random Number Generation
            • af::randomEngine(): A random engine class to handle setting the type and seed for random number generator engines.
            • Supported engine types are defined by af_random_engine_type.
          • Graphics
            • Using Forge v0.9.0
            • Vector Field plotting functionality.
            • Removed GLEW and replaced with glbinding.
              • Removed usage of GLEW after support for MX (multithreaded) was dropped in v2.0.
            • Multiple overlays on the same window are now possible.
              • Overlays support for same type of object (2D/3D)
              • Supported by af::Window::plot, af::Window::hist, af::Window::surface, af::Window::vectorField.
            • New API to set axes limits for graphs.
              • Draw calls do not automatically compute the limits. This is now under user control.
              • af::Window::setAxesLimits can be used to set axes limits automatically or manually.
              • af::Window::setAxesTitles can be used to set axes titles.
            • New API for plot and scatter:
              • af::Window::plot() and af::Window::scatter() can now handle 2D and 3D and determine the appropriate order.
              • af_draw_plot_nd()
              • af_draw_plot_2d()
              • af_draw_plot_3d()
              • af_draw_scatter_nd()
              • af_draw_scatter_2d()
              • af_draw_scatter_3d()
          • New interpolation methods
            • Applies to
              • af::resize()
              • af::transform()
              • af::approx1()
              • af::approx2()
          • Support for complex mathematical functions
            • Add complex support for trig_mat, af::sqrt(), af::log().
          • af::medfilt1(): Median filter for 1-d signals
          • Generalized scan functions: scan_func_scan and scan_func_scanbykey
            • Now supports inclusive or exclusive scans
            • Supports binary operations defined by af_binary_op.
          • Image Moments functions
          • Add af::getSizeOf() function for af_dtype
          • Explicitly instantiate af::array::device() for `void *`
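
The difference between the inclusive and exclusive scans mentioned in the list above, and the role of the binary operation, can be sketched in plain C++. This is an illustration only, not ArrayFire's implementation; `generalized_scan` and its parameters are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper: prefix scan over `in` with a caller-supplied binary
// operation. `identity` must be the neutral element of `op` (0 for addition,
// 1 for multiplication).
std::vector<int> generalized_scan(const std::vector<int>& in,
                                  const std::function<int(int, int)>& op,
                                  int identity, bool inclusive) {
    std::vector<int> out;
    out.reserve(in.size());
    int acc = identity;
    for (int x : in) {
        if (inclusive) {
            acc = op(acc, x);    // output includes the current element
            out.push_back(acc);
        } else {
            out.push_back(acc);  // output covers only the preceding elements
            acc = op(acc, x);
        }
    }
    return out;
}
```

For the input 1, 2, 3, 4 with addition, the inclusive scan gives 1, 3, 6, 10 and the exclusive scan gives 0, 1, 3, 6.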

          Bug Fixes

          • Fixes to edge-cases in morph_mat.
          • Makes JIT tree size consistent between devices.
          • Delegate higher-dimension in convolve_mat to correct dimensions.
          • Indexing fixes with C++11.
          • Handle empty arrays as inputs in various functions.
          • Fix bug with single-element input to af::median.
          • Fix bug in calculation of time from af::timeit().
          • Fix bug in floating point numbers in af::seq.
          • Fixes for OpenCL graphics interop on NVIDIA devices.
          • Fix bug when compiling large kernels for AMD devices.
          • Fix bug in af::bilateral when shared memory is over the limit.
          • Fix bug in kernel header compilation tool bin2cpp.
          • Fix initial values for morph_mat functions.
          • Fix bugs in af::homography() CPU and OpenCL kernels.
          • Fix bug in CPU TNJ.


          • CUDA 8 and compute 6.x (Pascal) support, current installer ships with CUDA 7.5.
          • User controlled FFT plan caching.
          • CUDA performance improvements for image_func_wrap, image_func_unwrap and approx_mat.
          • Fallback for CUDA-OpenGL interop when no devices support OpenGL.
          • Additional forms of batching with the transform_func_transform functions.
          • Update to OpenCL2 headers.
          • Support for integration with external OpenCL contexts.
          • Performance improvements to internal copy in CPU Backend.
          • Performance improvements to af::select and af::replace CUDA kernels.
          • Enable OpenCL-CPU offload by default for devices with Unified Host Memory.
            • To disable, use the environment variable AF_OPENCL_CPU_OFFLOAD=0.


          • Compilation speedups.
          • Build fixes with MKL.
          • Error message when CMake CUDA Compute Detection fails.
          • Several CMake build issues with Xcode generator fixed.
          • Fix multiple OpenCL definitions at link time.
          • Fix lapacke detection in CMake.
          • Update build tags of
          • Fix builds with GCC 6.1.1 and GCC 5.3.0.


          • All installers now ship with ArrayFire libraries built with MKL 2016.
          • All installers now ship with Forge development files and examples included.
          • CUDA Compute 2.0 has been removed from the installers. Please contact us directly if you have a special need.


          • Added example simulating gravity for demonstration of vector field.
          • Improvements to financial/black_scholes_options.cpp example.
          • Improvements to graphics/gravity_sim.cpp example.
          • Fix graphics examples to use af::Window::setAxesLimits and af::Window::setAxesTitles functions.

          Documentation & Licensing

          • ArrayFire copyright and trademark policy
          • Fixed grammar in license.
          • Add license information for glbinding.
          • Remove license information for GLEW.
          • Random123 now applies to all backends.
          • Random number functions are now under random_mat.


          The following functions have been deprecated and may be modified or removed
          permanently from future versions of ArrayFire.

          • af::Window::plot3(): Use af::Window::plot instead.
          • af_draw_plot(): Use af_draw_plot_nd or af_draw_plot_2d instead.
          • af_draw_plot3(): Use af_draw_plot_nd or af_draw_plot_3d instead.
          • af::Window::scatter3(): Use af::Window::scatter instead.
          • af_draw_scatter(): Use af_draw_scatter_nd or af_draw_scatter_2d instead.
          • af_draw_scatter3(): Use af_draw_scatter_nd or af_draw_scatter_3d instead.

          Known Issues

          Certain CUDA functions are known to be broken on Tegra K1. The following ArrayFire tests are currently failing:

          • assign_cuda
          • harris_cuda
          • homography_cuda
          • median_cuda
          • orb_cuda
          • sort_cuda
          • sort_by_key_cuda
          • sort_index_cuda

          Shehzan Mohammed

          Oct 17, 2016, 9:56:10 AM
          to ArrayFire Users,


          Installer CUDA Version: 8.0 (Required)
          Installer OpenCL Version: 1.2 (Minimum)


          • Installers for Linux, OS X and Windows
            • CUDA backend now uses CUDA 8.0.
            • Uses Intel MKL 2017.
            • CUDA Compute 2.x (Fermi) is no longer compiled into the library.
          • Installer for OS X
            • The libraries shipping in the OS X Installer are now compiled with Apple Clang v7.3.1 (previously v6.1.0).
            • The OS X version used is 10.11.6 (previously 10.10.5).
          • Installer for Jetson TX1 / Tegra X1
            • Requires JetPack for L4T 2.3 (containing Linux for Tegra r24.2 for TX1).
            • CUDA backend now uses CUDA 8.0 64-bit.
            • Using CUDA's cusolver instead of CPU fallback.
            • Uses OpenBLAS for CPU BLAS.
            • All ArrayFire libraries are now 64-bit.


          • Add sparse array support to af::eval().
          • Add OpenCL-CPU fallback support for sparse af::matmul() when running on a unified memory device. Uses MKL Sparse BLAS.
          • When using CUDA libdevice, pick the correct compute version based on device.
          • OpenCL FFT now also supports prime factors 7, 11 and 13.
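
One way to read the FFT note above is that a transform length is covered when it factors entirely into supported primes. The sketch below is plain C++, not ArrayFire or OpenCL code; `is_supported_fft_length` is a hypothetical helper, and the base set of 2, 3 and 5 is an assumption, since the note only states that 7, 11 and 13 were added.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true when n factors completely into the primes
// 2, 3, 5, 7, 11 and 13.
// Assumption: 2, 3 and 5 were already supported before this release.
bool is_supported_fft_length(std::size_t n) {
    static const std::size_t primes[] = {2, 3, 5, 7, 11, 13};
    if (n == 0) return false;
    for (std::size_t p : primes) {
        while (n % p == 0) n /= p;  // strip every factor of p
    }
    return n == 1;  // nothing left over means all factors were supported
}
```

Under these assumptions a length of 1001 (7 x 11 x 13) qualifies, while 34 (2 x 17) does not.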

          Bug Fixes

          • Allow CUDA libdevice to be detected from custom directory.
          • Fix aarch64 detection on Jetson TX1 64-bit OS.
          • Add missing definition of af_set_fft_plan_cache_size in unified backend.
          • Fix initial values for af::min() and af::max() operations.
          • Fix distance calculation in af::nearestNeighbour for CUDA and OpenCL backend.
          • Fix OpenCL bug where scalars were passed incorrectly to compile options.
          • Fix bug in af::Window::surface() with respect to dimensions and ranges.
          • Fix possible double free corruption in af_assign_seq().
          • Add missing eval for key in af::scanByKey in CPU backend.
          • Fixed creation of sparse values array using AF_STORAGE_COO.


          • Add a Conjugate Gradient solver example to demonstrate sparse and dense matrix operations.

          CUDA Backend

          • When using CUDA 8.0, compute 2.x are no longer in default compute list.
            • This follows CUDA 8.0 deprecating computes 2.x.
            • Default computes for CUDA 8.0 will be 30, 50, 60.
          • When using CUDA pre-8.0, the default selection remains 20, 30, 50.
          • CUDA backend now uses -arch=sm_30 for PTX compilation as default.
            • Unless compute 2.0 is enabled.

          Known Issues

          • af::lu() on CPU is known to give incorrect results when built and run on OS X 10.11 or 10.12 and compiled with Accelerate Framework.
            • Since the OS X Installer libraries use MKL rather than Accelerate Framework, this issue does not affect those libraries.

          Miguel Lloreda

          Dec 28, 2016, 5:31:58 PM
          to ArrayFire Users


          Installer CUDA Version: 8.0 (Required)
          Installer OpenCL Version: 1.2 (Minimum)

          Deprecation Announcement

          This release supports CUDA 6.5 and higher. The next ArrayFire release will
          support CUDA 7.0 and higher, dropping support for CUDA 6.5. Reasons for no
          longer supporting CUDA 6.5 include:

          • CUDA 7.0 NVCC supports the C++11 standard (whereas CUDA 6.5 does not), which is used by ArrayFire's CPU and OpenCL backends.
          • Very few ArrayFire users still use CUDA 6.5.

          As a result, the older Jetson TK1 / Tegra K1 will no longer be supported in
          the next ArrayFire release. The newer Jetson TX1 / Tegra X1 will continue to
          have full capability with ArrayFire.



          • Implemented sparse storage format conversions between AF_STORAGE_CSR and AF_STORAGE_COO.
            • Directly convert between AF_STORAGE_COO <--> AF_STORAGE_CSR using the af::sparseConvertTo() function.
            • af::sparseConvertTo() now also supports converting to dense.
          • Added cast support for sparse arrays.
            • Casting only changes the values array and the type. The row and column index arrays are not changed.
          • Reintroduced automated computation of chart axes limits for graphics functions.
            • The axes limits will always be the minimum/maximum of the current and new limit.
            • The user can still set limits from API calls. If the user sets a limit from the API call, then the automatic limit setting will be disabled.
          • Using boost::scoped_array instead of boost::scoped_ptr when managing array resources.
          • Internal performance improvements to getInfo() by using const references to avoid unnecessary copying of ArrayInfo objects.
          • Added support for scalar af::array inputs for af::convolve() and set functions.
          • Performance fixes in af::fftConvolve() kernels.
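
For readers unfamiliar with the two storage formats named above, the following sketch shows what a COO <--> CSR conversion involves. It is plain C++ written for illustration and is not ArrayFire's implementation; the `Coo` and `Csr` structs and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical containers: COO keeps one (row, col, value) triple per
// non-zero; CSR replaces the row indices with rows + 1 offsets into the
// column and value arrays.
struct Coo {
    int rows;
    std::vector<int> row_idx;
    std::vector<int> col_idx;
    std::vector<double> values;
};

struct Csr {
    int rows;
    std::vector<int> row_ptr;
    std::vector<int> col_idx;
    std::vector<double> values;
};

Csr coo_to_csr(const Coo& coo) {
    Csr csr;
    csr.rows = coo.rows;
    csr.row_ptr.assign(coo.rows + 1, 0);
    // Count the non-zeros in each row, then prefix-sum into offsets.
    for (int r : coo.row_idx) ++csr.row_ptr[r + 1];
    for (int r = 0; r < coo.rows; ++r) csr.row_ptr[r + 1] += csr.row_ptr[r];
    // Place each entry into its row's slot; input order within a row is kept.
    std::vector<int> next(csr.row_ptr.begin(), csr.row_ptr.end() - 1);
    csr.col_idx.resize(coo.values.size());
    csr.values.resize(coo.values.size());
    for (std::size_t k = 0; k < coo.values.size(); ++k) {
        const int dst = next[coo.row_idx[k]]++;
        csr.col_idx[dst] = coo.col_idx[k];
        csr.values[dst] = coo.values[k];
    }
    return csr;
}

Coo csr_to_coo(const Csr& csr) {
    Coo coo{csr.rows, {}, csr.col_idx, csr.values};
    // Expand each row's offset range back into explicit row indices.
    for (int r = 0; r < csr.rows; ++r) {
        for (int k = csr.row_ptr[r]; k < csr.row_ptr[r + 1]; ++k) {
            coo.row_idx.push_back(r);
        }
    }
    return coo;
}
```

Only the row information changes form between the two layouts. The column indices and values are carried over, reordered by row when the COO input is not already grouped by row.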


          • Support for Visual Studio 2015 compilation.
          • Fixed FindCBLAS.cmake when PkgConfig is used.

          Bug fixes

          • Fixes to JIT when tree is large.
          • Fixed indexing bug when converting dense to sparse af::array as AF_STORAGE_COO.
          • Fixed af::bilateral() OpenCL kernel compilation on OS X.
          • Fixed memory leak in af::regions() (CPU) and af::rgb2ycbcr().


          • Major OS X installer fixes.
            • Fixed installation scripts.
            • Fixed installation symlinks for libraries.
          • Windows installer now ships with more pre-built examples.


          • Added af::choleskyInPlace() calls to cholesky.cpp example.


          • Added u8 as supported data type in getting_started.md.
          • Fixed typos.

          CUDA 8 on OSX

          Known Issues

          • Known failures with CUDA 6.5. These include all functions that use sorting. As a result, sparse storage format conversion between AF_STORAGE_COO and AF_STORAGE_CSR has been disabled for CUDA 6.5.

          ArrayFire Users

          Apr 22, 2019, 5:22:26 PM
          to ArrayFire Users
          v3.6.3 Release


          • Graphics are now a runtime dependency instead of a link time dependency #2365
          • Reduce the CUDA backend binary size using runtime compilation of kernels #2437
          • Improved batched matrix multiplication on the CPU backend by using Intel MKL's cblas_Xgemm_batched #2206
          • Print JIT kernels to disk or stream using the AF_JIT_KERNEL_TRACE environment variable #2404
          • void* pointers are now allowed as arguments to af::array::write() #2367
          • Slightly improve the efficiency of JITed tile operations #2472
          • Made random number generation on the CPU backend consistent with CUDA and OpenCL #2435
          • Handled very large JIT tree generations #2484 #2487

          Bug Fixes

          • Fixed af::array::array_proxy move assignment operator #2479
          • Fixed input array dimensions validation in svdInplace() #2331
          • Fixed the typedef declaration for window resource handle #2357.
          • Increase compatibility with GCC 8 #2379
          • Fixed af::write tests #2380
          • Fixed a bug in broadcast step of 1D exclusive scan #2366
          • Fixed OpenGL related build errors on OSX #2382
          • Fixed multiple array evaluation. Performance improvement. #2384
          • Fixed buffer overflow and expected output of kNN SSD small test #2445
          • Fixed MKL linking order to enable threaded BLAS #2444
          • Added validations for forge module plugin availability before calling resource cleanup #2443
          • Improve compatibility on MSVC toolchain(_MSC_VER > 1914) with the CUDA backend #2443
          • Fixed BLAS gemm func generators for newest MSVC 19 on VS 2017 #2464
          • Fix errors on exits when using the cuda backend with unified #2470


          • Updated svdInplace() documentation following a bugfix #2331
          • Fixed a typo in matrix multiplication documentation #2358
          • Fixed a code snippet demonstrating C-API use #2406
          • Updated hamming matcher implementation limitation #2434
          • Added illustration for the rotate function #2453


          • Use cudaMemcpyAsync instead of cudaMemcpy throughout the codebase #2362
          • Display a more informative error message if CUDA driver is incompatible #2421 #2448
          • Changed forge resource management to use smart pointers #2452
          • Deprecated intl and uintl typedefs in API #2360
          • Enabled graphics by default for all builds starting with v3.6.3 #2365
          • Fixed several warnings #2344 #2356 #2361
          • Refactored initArray() calls to use createEmptyArray(). initArray() is for internal use only by Array class. #2361
          • Refactored void* memory allocations to use unsigned char type #2459
          • Replaced deprecated MKL API with in-house implementations for sparse to sparse/dense conversions #2312
          • Reorganized and fixed some internal backend API #2356
          • Updated compilation order of CUDA files to speed up compile time #2368
          • Removed conditional graphics support builds after enabling runtime loading of graphics dependencies #2365
          • Marked graphics dependencies as optional in CPack RPM config #2365
          • Refactored a sparse arithmetic backend API #2379
          • Fixed const correctness of af_device_array API #2396
          • Update Forge to v1.0.4 #2466
          • Manage Forge resources from the DeviceManager class #2381
          • Fixed non-mkl & non-batch blas upstream call arguments #2401
          • Link MKL with OpenMP instead of TBB by default
          • Use clang-format to format source code


          Special thanks to our contributors:
          Alessandro Bessi
          Jacob Kahn
          William Tambellini

          Pradeep Garigipati

          Feb 13, 2020, 8:42:04 AM
          to ArrayFire Users
          v3.7.0 Release

          The source code with sub-modules can be downloaded directly from the following link: arrayfire-full-3.7.0.tar.bz2

          Major Updates

          • Added the ability to customize the memory manager(Thanks jacobkahn and flashlight) [#2461]
          • Added 16-bit floating point support for several functions [#2413] [#2587] [#2585] [#2587] [#2583]
          • Added sumByKey, productByKey, minByKey, maxByKey, allTrueByKey, anyTrueByKey, countByKey [#2254]
          • Added confidence connected components [#2748]
          • Added neural network based convolution and gradient functions [#2359]
          • Added a padding function [#2682]
          • Added pinverse for pseudo inverse [#2279]
          • Added support for uniform ranges in approx1 and approx2 functions. [#2297]
          • Added support to write to preallocated arrays for some functions [#2599] [#2481] [#2328] [#2327]
          • Added meanvar function [#2258]
          • Added support for sparse-sparse arithmetic [#2312]
          • Added rsqrt function for reciprocal square root [#2500]
          • Added a lower level af_gemm function for general matrix multiplication [#2481]
          • Added a function to set the cuBLAS math mode for the CUDA backend [#2584]
          • Separate debug symbols into separate files [#2535]
          • Print stacktraces on errors [#2632]
          • Support move constructor for af::array [#2595]
          • Expose events in the public API [#2461]
          • Add setAxesLabelFormat to format labels on graphs [#2495]
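
The by-key reductions in the list above (sumByKey and its siblings) combine values that share a key. The sketch below shows the idea for a sum in plain C++, under the assumption that only runs of consecutive equal keys are merged. It is not ArrayFire's implementation, and `sum_by_key` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sum values over runs of consecutive equal keys.
// Returns the reduced keys and one sum per run.
std::pair<std::vector<int>, std::vector<double>> sum_by_key(
    const std::vector<int>& keys, const std::vector<double>& vals) {
    assert(keys.size() == vals.size());
    std::vector<int> out_keys;
    std::vector<double> out_vals;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || keys[i] != keys[i - 1]) {
            // A new run starts: emit the key and begin a fresh sum.
            out_keys.push_back(keys[i]);
            out_vals.push_back(vals[i]);
        } else {
            out_vals.back() += vals[i];  // same run: accumulate
        }
    }
    return {out_keys, out_vals};
}
```

With keys 0, 0, 1, 1, 1, 2 and values 1 to 6 this yields keys 0, 1, 2 and sums 3, 12, 6. The other by-key reductions follow the same pattern with a different combining operation.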



          • Fix multi-config generators [#2736]
          • Fix access errors in canny [#2727]
          • Fix segfault in the unified backend if no backends are available [#2720]
          • Fix access errors in scan-by-key [#2693]
          • Fix sobel operator [#2600]
          • Fix an issue with the random number generator and s16 [#2587]
          • Fix issue with boolean product reduction [#2544]
          • Fix array_proxy move constructor [#2537]
          • Fix convolve3 launch configuration [#2519]
          • Fix an issue where the fft function modified the input array [#2520]
          • Added a work around for nvidia-opencl runtime if forge dependencies are missing [#2761]


          Special thanks to our contributors:

          Umar Arshad

          Mar 28, 2020, 2:31:54 PM
          to ArrayFire Users

          v3.7.1 Release

          The source code with sub-modules can be downloaded directly from the following link: arrayfire-full-3.7.1.tar.bz2


          • Improve mtx download for test data #2742
          • Improve Documentation #2754 #2792 #2797
          • Remove verbose messages in older CMake versions #2773
          • Reduce binary size with the use of NVRTC #2790
          • Use texture memory to load LUT in orb and fast #2791
          • Add missing print function for f16 #2784
          • Add checks for f16 support in the CUDA backend #2784
          • Create a thrust policy to intercept temporary buffer allocations #2806


          • Fix segfault on exit when ArrayFire is not initialized in the main thread
          • Fix support for CMake 3.5.1 #2771