CuPy v8.0.0a1 has been released

ecas...@preferred.jp

Feb 14, 2020, 1:41:12 AM
to CuPy Japanese User Group
CuPy v8.0.0a1 has been released! The release notes are as follows.

This is the release note of v8.0.0a1. See here for the complete list of solved issues and merged PRs.

Known packaging issues:

  • Wheel packages for CUDA 10.2 (cupy-cuda102) are currently unavailable on PyPI. They will be published once the requested file size limit increase is approved.
  • CuPy build fails when using CUDA 8.0 on Windows (#3076). Due to this issue, cupy-cuda80 wheel packages for Windows are unavailable for this version. Linux or CUDA 9.0+ users are unaffected.

Highlights

This release adds support for CUDA 10.2 and NumPy 1.18.
CuPy 8.0.0a1 comes with several new features, including better sparse matrix support. For users who write their own CUDA kernels, grid synchronization is now available in RawKernel and RawModule, and the block size of ElementwiseKernel can be tuned. There are also noticeable performance improvements thanks to extended CUB support in several CuPy functions.

Changes without compatibility

  • Update slicing of CSR and CSC matrices for compatibility with SciPy 1.4.0 (#2776)
    • Changed to follow SciPy: empty slices are now returned for such cases.
  • Separate code and path arguments in RawModule (#2784)
  • Avoid device synchronization in cupy.allclose (#2799)
    • Changed cupy.allclose to return a 0-dim cupy.ndarray instead of a float value to avoid device synchronization.
  • Remove dtype argument from min/max (#2875)
  • Rename arg of isscalar (#2974)
    • Renamed the argument of cupy.isscalar from num to element.
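
As a rough illustration of the cupy.allclose and cupy.isscalar changes listed above, the sketch below converts the result explicitly wherever a Python bool is needed. The `xp` alias and the NumPy fallback are assumptions added for illustration, so that the snippet also runs on a machine without a GPU; they are not part of the release.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # fails early if no usable CUDA device is present
except Exception:
    xp = np  # assumption: fall back to NumPy, whose API CuPy mirrors

a = xp.asarray([1.0, 2.0, 3.0])
b = a + 1e-9

# With CuPy v8 the result may be a 0-dim array that lives on the device.
# Calling bool() is the point where the host has to wait for the value.
close = xp.allclose(a, b)
is_close = bool(close)

# The argument of isscalar is named `element` (it was `num` in CuPy before).
scalar_check = xp.isscalar(element=3.0)

print(is_close, scalar_check)
```

Code that used the return value of cupy.allclose directly in an `if` statement should keep working, since the conversion to bool then happens implicitly; the explicit `bool()` only makes the synchronization point visible.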

New Features

  • Added min, max, argmin, argmax to sparse csr and csc matrices (#2711, thanks @dloney!)
  • Add helpers to measure execution times (#2740)
  • Add digitize (#2758)
  • Support loading PTX in cupy.RawModule (#2782, thanks @leofang!)
  • Fix cupyx.scipy.ndimage.map_coordinates for cases with coords > 2d (#2813, thanks @grlee77!)
  • Detect synchronization (#2819)
  • Add ptp ndarray method and function (#2859, thanks @grlee77!)
  • Add convex analysis ufuncs to cupyx.scipy.special (#2861, thanks @grlee77!)
  • Allow ElementwiseKernel to set the block_size (#2914)
  • Support grid synchronization in RawKernel and RawModule (#2925)
  • Add cupy.conjugate and make cupy.conj its alias (#2982)
  • Add a keyword-only plan argument to cupyx.scipy.fft.* (#2998, thanks @leofang!)
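
A few of the new functions above (digitize, ptp, conjugate) follow their NumPy counterparts, so a small sketch can show what they compute. The `xp` alias, the NumPy fallback, and the `to_host` helper are hypothetical names introduced for illustration only; the input values are arbitrary.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # fails early if no usable CUDA device is present
except Exception:
    xp = np  # assumption: fall back to NumPy, whose API CuPy mirrors


def to_host(array):
    """Return a NumPy array from a CuPy or NumPy array (hypothetical helper)."""
    return array.get() if hasattr(array, "get") else np.asarray(array)


x = xp.asarray([0.2, 6.4, 3.0, 1.6])
bins = xp.asarray([0.0, 1.0, 2.5, 4.0, 10.0])

# digitize: index i of the bin such that bins[i-1] <= x < bins[i].
bin_index = to_host(xp.digitize(x, bins))

# ptp ("peak to peak"): maximum minus minimum.
spread = float(xp.ptp(x))

# conjugate and its alias conj are expected to give the same result.
z = xp.asarray([1 + 2j, 3 - 4j])
same = bool((xp.conjugate(z) == xp.conj(z)).all())

print(bin_index, spread, same)
```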

Enhancements

  • Support sorting complex arrays (#2745, thanks @leofang!)
  • Fix slow import of cupy (#2759, thanks @cgohlke!)
  • Update slicing of CSR and CSC matrices for compatibility with SciPy 1.4.0 (#2776, thanks @grlee77!)
  • Add nogil to CUB (#2787, thanks @y1r!)
  • Avoid device synchronization in cupy.allclose (#2799)
  • Skip zero valued coefficients in cupyx.scipy.ndimage.convolve (#2846, thanks @grlee77!)
  • Add CUB reduction support to mean (#2860, thanks @grlee77!)
  • Sort type map in _kernel.pyx (#2881)
  • Make test helper decorators pdb-friendly (#2888)
  • Declare device synchronization at runtime.free() (#2898)
  • Ignore error when peer access is already enabled (#2901, thanks @leofang!)
  • Add CUDA 10.2 support (#2910, thanks @ksangeek!)
  • Show warning for cuFFT bug in irfftn (#2922)
  • Use cuTensor for einsum (#2928)
  • Improve error message for wrong number of arguments in elementwise kernels (#2932)
  • Use asynchronous copy in cupy.copyto (#2942)
  • MemoryPointer.__repr__ (#2981)
  • Allow multiple axes in expand_dims (#2992)
  • Check size before accessing an empty vector's data pointer (#3025)
  • Improve compatibility of random.randint (#2828)
  • Support 64 bit extent randint (#2829)
  • Disallow boolean subtraction (#2874)
  • Remove dtype argument from min/max (#2875)
  • Fix handling of dtypes in cupy.mean (#2903, thanks @grlee77!)
  • Disallow boolean negative (#2973)
  • Rename arg of isscalar (#2974)
  • Fix linspace(..., num=1, endpoint=False, retstep=True) (#2975)
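
Two of the enhancements above, multiple axes in expand_dims and the rejection of boolean subtraction, can be sketched as follows. This assumes the CuPy behavior matches NumPy 1.18 or later, including that boolean subtraction raises TypeError; the `xp` alias and the NumPy fallback are illustrative assumptions, not part of the release.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # fails early if no usable CUDA device is present
except Exception:
    xp = np  # assumption: fall back to NumPy (1.18+ for tuple axes)

a = xp.zeros((3, 4))

# A tuple of axes inserts several length-1 dimensions in a single call.
expanded = xp.expand_dims(a, axis=(0, 2))
expanded_shape = tuple(expanded.shape)

# Subtracting boolean arrays is rejected (assumed to raise TypeError, as in NumPy).
flags = xp.asarray([True, False, True])
other = xp.asarray([True, True, False])
try:
    flags - other
    subtraction_rejected = False
except TypeError:
    subtraction_rejected = True

# logical_xor is the usual replacement for boolean "subtraction".
difference = xp.logical_xor(flags, other)
diff_list = [bool(v) for v in difference.tolist()]

print(expanded_shape, subtraction_rejected, diff_list)
```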

Performance Improvements

  • Avoid numpy.can_cast call to improve guess routine (#2673)
  • Improve caching in ElementwiseKernel (#2688)
  • Remove memory copy to improve memory range checking (#2699)
  • Avoid can_cast calling to reduce overhead (#2704)
  • Use getrfBatched in linalg.slogdet (#2735)
  • Reduce overhead in calls to multi-dimensional FFTs (#2746, thanks @grlee77!)
  • Allow squashing f-contiguous axes for faster reduction (#2822)
  • Support CUB prefix sum & product (#2919, thanks @leofang!)
  • Improve performance of element-wise einsum where no contraction is necessary (#2960)

Bug Fixes

  • Fix true_divide with dtype argument (#2076)
  • keepdims should always preserve all dimensions in CUB-based reductions (#2725, thanks @grlee77!)
  • Update thrust::complex headers with a bug fix (#2741, thanks @leofang!)
  • Separate code and path arguments in RawModule (#2784)
  • Avoid looking up null pointers' attributes (#2802, thanks @leofang!)
  • Fix range used in cupyx.scipy.ndimage filter origin check (#2805, thanks @grlee77!)
  • Detect interpreter shutdown for proper __del__ behavior (#2809)
  • Fix split and array_split with indices overrun (#2814)
  • Fix split and array_split with unordered indices supplied (#2815)
  • Fix compilation error caused when Thrust is enabled (#2838)
  • Fix testing.shaped_random for shape () (#2870)
  • Fix argmin/argmax dtype argument (#2872)
  • Fix imag for 0-size array (#2886)
  • Fix logic to check explicit size argument in ElementwiseKernel (#2909)
  • Set the default value for thread_local.linalg if not defined (#2915)
  • Fix cupy.cuda.cub.device_segmented_reduce() not being used (#2921, thanks @leofang!)
  • Fix complex type checks in _correlate_or_convolve (#2923)
  • Fix ParameterInfo as a cache key (#2941)
  • Avoid invalid in-place division in CUB-based mean (#2943, thanks @grlee77!)
  • Fix empty vector access (#3020)
  • Fix nvcc command lookup (#3028)

Code Fixes

  • Use intptr_t for cuSOLVER handles (#2718)
  • Merge reduction implementations (#2732)
  • Rename and reorder private functions in reduction.pxi (#2767)
  • Avoid using PyThread API (#2769)
  • Remove unused cuParamSetTexRef() (#2770, thanks @leofang!)
  • Separate reduction code from _kernel.pyx (#2785)
  • Refactor reduction code (#2801)
  • Refactor ops (#2817)
  • Separate CArray and family from core.pyx (#2831)
  • Add missing blank lines (#2887)
  • Readability fix in memory.pyx (#2899)
  • Clean up _scalar.pyx (#2917)
  • Enhance type and argument manipulation in elementwise and reduction kernels (#2940)
  • Remove intermediate aliases of cupy.sort (#2944, thanks @rushabh-v!)
  • Silence sign comparison warnings (#2949, thanks @leofang!)
  • Fix typos in comments (#2978)
  • Remove dependency on six (#2980)
  • A nit-picking code fix (#2988)
  • Rename _op variable in cub.pyx (#3002)
  • Remove code paths for unsupported Python versions (#3004)

Documentation

  • Fix docs of options argument in RawKernel and RawModule (#2643)
  • Document device synchronization (#2798)
  • Fix typo in scipy.fft docs (#2804, thanks @grlee77!)
  • Fix the docstring format of cupy.asarray (#2821, thanks @leofang!)
  • Update cuTENSOR version in docs (#2948)
  • Document get_allocator function (#2953, thanks @jakirkham!)
  • Add NumPy 1.18 to installation guide (#3005)
  • Fix typo in note (#3012, thanks @Schoyen!)
  • Add cupy-cuda102 (#3057)

Installation

  • Do not let Python 2 users build CuPy v7+ (#2766, thanks @leofang!)
  • Fix an issue that cuComplex_bridge.h is not installed (#2984)
  • Fix ROCm build errors (#3071)

Examples

  • Fix GMM example for matplotlib 3 (#2996)
  • Use cupy.random in kmeans example (#3026)

Tests

  • Test cuTENSOR v1.0.0 (#2727)
  • Use more stable input to test linalg.matrix_power (#2788)
  • Remove Python 3.4 matrix from Travis CI (#2794)
  • Drop ChainerCV's test in master branch (#2803)
  • Refactor array testing decorators (#2818)
  • Fix decorator usage in tests (#2820)
  • Add f-contiguous reduction tests (#2830)
  • Test ifloordiv with numpy 1.18 (#2852)
  • Fix test_helper.py for NumPy 1.18 (#2883)
  • Avoid 0s in the diagonal of TestSolveTriangular inputs (#2927)
  • Add tests for size argument with no input (#2931)
  • Print installed packages in pytest (#2979)
  • Make testing.parameterize pdb-friendly (#3024)
  • Require scipy in test_gmm (#3048)

Others

  • Allow install without thrust (#2730)
  • Add Mergify configuration file (#2894)
  • Make cupyx.time.repeat experimental (#2897)
  • Make cupyx.allow_synchronize experimental (#2947)
  • Some fixes to .pfnci/script.sh (#3041)
  • Set CUPY_CI environment variable in Travis CI and AppVeyor (#3058)
  • Bump version to v8.0.0a1 (#3069)