CuPy v8.0.0a1 has been released! The release notes follow.
This is the release note of v8.0.0a1. See here for the complete list of solved issues and merged PRs.
Known packaging issues:
- Wheel packages for CUDA 10.2 (cupy-cuda102) are currently unavailable on PyPI. Packages will be published after getting approval of the file size limit increase.
- CuPy build fails when using CUDA 8.0 on Windows (#3076). Due to this issue, cupy-cuda80 wheel packages for Windows are unavailable for this version. Linux or CUDA 9.0+ users are unaffected.
Highlights
This release adds support for CUDA 10.2 and NumPy 1.18.
CuPy 8.0.0a1 comes with several new features, such as better sparse matrix support. For users who write their own CUDA kernels, RawKernel and RawModule now support grid synchronization, and ElementwiseKernel allows tuning the block size. There are also noticeable performance improvements thanks to extended CUB support in several CuPy functions.
Changes without compatibility
- Update slicing of CSR and CSC matrices for compatibility with SciPy 1.4.0 (#2776)
  - Fixed to follow SciPy: empty slices are returned for such cases.
- Separate code and path arguments in RawModule (#2784)
- Avoid device synchronization in cupy.allclose (#2799)
  - Changed cupy.isclose to return a 0-dim cupy.ndarray instead of a float value to avoid device synchronization.
- Remove dtype argument from min/max (#2875)
- Rename arg of isscalar (#2974)
  - Renamed the argument of cupy.isscalar to element, previously named num.
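As a rough illustration of the last two items, the sketch below calls isscalar with the new element keyword and casts explicitly instead of passing dtype to min/max. The xp alias and the NumPy fallback are assumptions added so that the snippet also runs on a machine without a GPU; they are not part of CuPy.

```python
# Assumption: fall back to NumPy when CuPy or a CUDA device is unavailable.
try:
    import cupy as xp
    xp.zeros(1)  # forces CUDA initialization so failures surface here
except Exception:
    import numpy as xp

# The argument of isscalar is now named `element` (previously `num`).
scalar_check = xp.isscalar(element=3.0)
array_check = xp.isscalar(element=xp.arange(3))

# min/max no longer accept `dtype`; cast the input explicitly instead.
a = xp.array([1.5, -2.0, 4.25], dtype=xp.float64)
smallest = a.astype(xp.float32).min()
largest = a.astype(xp.float32).max()
```

Code that previously passed num= or dtype= as keywords would need the same kind of adjustment.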
New Features
- Added min, max, argmin, argmax to sparse csr and csc matrices (#2711, thanks @dloney!)
- Add helpers to measure execution times (#2740)
- Add digitize (#2758)
- Support loading PTX in cupy.RawModule (#2782, thanks @leofang!)
- Fix cupyx.scipy.ndimage.map_coordinates for cases with coords > 2d (#2813, thanks @grlee77!)
- Detect synchronization (#2819)
- Add ptp ndarray method and function (#2859, thanks @grlee77!)
- Add convex analysis ufuncs to cupyx.scipy.special (#2861, thanks @grlee77!)
- Allow ElementwiseKernel to set the block_size (#2914)
- Support grid synchronization in RawKernel and RawModule (#2925)
- Add cupy.conjugate and make cupy.conj its alias (#2982)
- Add a keyword-only plan argument to cupyx.scipy.fft.* (#2998, thanks @leofang!)
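A few of the NumPy-compatible additions above (digitize, ptp, and conjugate) can be tried with a short sketch like the following. The xp alias and the NumPy fallback are assumptions for machines without a GPU, and the sample values are made up for illustration.

```python
# Assumption: fall back to NumPy when CuPy or a CUDA device is unavailable.
try:
    import cupy as xp
    xp.zeros(1)  # forces CUDA initialization so failures surface here
except Exception:
    import numpy as xp

# digitize: index of the bin each value falls into.
x = xp.array([0.2, 6.4, 3.0, 1.6])
bins = xp.array([0.0, 1.0, 2.5, 4.0, 10.0])
bin_indices = xp.digitize(x, bins)

# ptp: peak-to-peak range (max - min) along an axis.
m = xp.array([[4, 9, 2], [7, 1, 5]])
row_ranges = xp.ptp(m, axis=1)

# conjugate, with conj as its alias.
z = xp.array([1 + 2j, 3 - 4j])
z_conj = xp.conjugate(z)
same_as_conj = bool((xp.conj(z) == z_conj).all())
```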
Enhancements
- Support sorting complex arrays (#2745, thanks @leofang!)
- Fix slow import of cupy (#2759, thanks @cgohlke!)
- Update slicing of CSR and CSC matrices for compatibility with SciPy 1.4.0 (#2776, thanks @grlee77!)
- Add nogil to CUB (#2787, thanks @y1r!)
- Avoid device synchronization in cupy.allclose (#2799)
- Skip zero valued coefficients in cupyx.scipy.ndimage.convolve (#2846, thanks @grlee77!)
- Add CUB reduction support to mean (#2860, thanks @grlee77!)
- Sort type map in _kernel.pyx (#2881)
- Make test helper decorators pdb-friendly (#2888)
- Declare device synchronization at runtime.free() (#2898)
- Ignore error when peer access is already enabled (#2901, thanks @leofang!)
- Add CUDA 10.2 support (#2910, thanks @ksangeek!)
- Show warning for cuFFT bug in irfftn (#2922)
- Use cuTensor for einsum (#2928)
- Improve error message for wrong number of arguments in elementwise kernels (#2932)
- Use asynchronous copy in cupy.copyto (#2942)
- MemoryPointer.__repr__ (#2981)
- Allow multiple axes in expand_dims (#2992)
- Check size before accessing empty vectors data ptr (#3025)
- Improve compatibility of random.randint (#2828)
- Support 64 bit extent randint (#2829)
- Disallow boolean subtraction (#2874)
- Remove dtype argument from min/max (#2875)
- Fix handling of dtypes in cupy.mean (#2903, thanks @grlee77!)
- Disallow boolean negative (#2973)
- Rename arg of isscalar (#2974)
- Fix linspace(..., num=1, endpoint=False, retstep=True) (#2975)
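Two of the enhancements above, multiple axes in expand_dims and sorting of complex arrays, can be sketched as follows. The xp alias and the NumPy fallback are assumptions for machines without a GPU (the fallback needs NumPy 1.18 or later for tuple axes), and the ordering noted for complex values is the NumPy convention, which CuPy is assumed to follow here.

```python
# Assumption: fall back to NumPy when CuPy or a CUDA device is unavailable.
try:
    import cupy as xp
    xp.zeros(1)  # forces CUDA initialization so failures surface here
except Exception:
    import numpy as xp

# expand_dims accepts a tuple of axes; new length-1 axes are
# inserted at positions 0 and 2 of the result.
a = xp.zeros((3, 4))
expanded = xp.expand_dims(a, axis=(0, 2))

# Complex values are ordered by real part first, then imaginary part.
z = xp.array([2 + 1j, 1 + 3j, 1 - 1j])
z_sorted = xp.sort(z)
```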
Performance Improvements
- Avoid numpy.can_cast call to improve guess routine (#2673)
- Improve caching in ElementwiseKernel (#2688)
- Remove memory copy to improve memory range checking (#2699)
- Avoid can_cast calling to reduce overhead (#2704)
- Use getrfBatched in linalg.slogdet (#2735)
- Reduce overhead in calls to multi-dimensional FFTs (#2746, thanks @grlee77!)
- Allow squashing f-contiguous axes for faster reduction (#2822)
- Support CUB prefix sum & product (#2919, thanks @leofang!)
- Improve performance of element-wise einsum where no contraction is necessary (#2960)
Bug Fixes
- Fix true_divide with dtype argument (#2076)
- keepdims should always preserve all dimensions in CUB-based reductions (#2725, thanks @grlee77!)
- Update thrust::complex headers with a bug fix (#2741, thanks @leofang!)
- Separate code and path arguments in RawModule (#2784)
- Avoid looking up null pointers' attributes (#2802, thanks @leofang!)
- Fix range used in cupyx.scipy.ndimage filter origin check (#2805, thanks @grlee77!)
- Detect interpreter shutdown for proper __del__ behavior (#2809)
- Fix split and array_split with indices overrun (#2814)
- Fix split and array_split with unordered indices supplied (#2815)
- Fix compilation error caused when thrust is enabled (#2838)
- Fix testing.shaped_random for shape () (#2870)
- Fix argmin/argmax dtype argument (#2872)
- Fix imag for 0-size array (#2886)
- Fix logic to check explicit size argument in ElementwiseKernel (#2909)
- Set the default value for thread_local.linalg if not defined (#2915)
- Fix cupy.cuda.cub.device_segmented_reduce() not being used (#2921, thanks @leofang!)
- Fix complex type checks in _correlate_or_convolve (#2923)
- Fix ParameterInfo as a cache key (#2941)
- Avoid invalid in-place division in CUB-based mean (#2943, thanks @grlee77!)
- Fix empty vector access (#3020)
- Fix nvcc command lookup (#3028)
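To make two of the edge cases above concrete, the sketch below exercises array_split with an index past the end of the array and imag on a 0-size array. It assumes the fixed behavior matches NumPy, where an overrunning index yields an empty trailing piece. The xp alias and the NumPy fallback are assumptions for machines without a GPU.

```python
# Assumption: fall back to NumPy when CuPy or a CUDA device is unavailable.
try:
    import cupy as xp
    xp.zeros(1)  # forces CUDA initialization so failures surface here
except Exception:
    import numpy as xp

# Split indices that run past the end of the array.
a = xp.arange(5)
parts = xp.array_split(a, [2, 10])
part_sizes = [int(p.size) for p in parts]

# imag on an array with no elements.
empty = xp.zeros((0,), dtype=xp.complex64)
empty_imag = empty.imag
```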
Code Fixes
- Use intptr_t for cuSOLVER handles (#2718)
- Merge reduction implementations (#2732)
- Rename and reorder private functions in reduction.pxi (#2767)
- Avoid using PyThread API (#2769)
- Remove unused cuParamSetTexRef() (#2770, thanks @leofang!)
- Separate reduction code from _kernel.pyx (#2785)
- Refactor reduction code (#2801)
- Refactor ops (#2817)
- Separate CArray and family from core.pyx (#2831)
- Add missing blank lines (#2887)
- Readability fix in memory.pyx (#2899)
- Clean up _scalar.pyx (#2917)
- Enhance type and argument manipulation in elementwise and reduction kernels (#2940)
- Remove intermediate aliases of cupy.sort (#2944, thanks @rushabh-v!)
- Silence sign comparison warnings (#2949, thanks @leofang!)
- Fix typos in comments (#2978)
- Remove dependency to six (#2980)
- A nit-picking code fix (#2988)
- Rename _op variable in cub.pyx (#3002)
- Remove code paths for unsupported Python versions (#3004)
Documentation
- Fix docs of options argument in RawKernel and RawModule (#2643)
- Document device synchronization (#2798)
- Fix typo in scipy.fft docs (#2804, thanks @grlee77!)
- Fix the docstring format of cupy.asarray (#2821, thanks @leofang!)
- Update cuTENSOR version in docs (#2948)
- Document get_allocator function (#2953, thanks @jakirkham!)
- Add NumPy 1.18 to installation guide (#3005)
- Fix typo in note (#3012, thanks @Schoyen!)
- Add cupy-cuda102 (#3057)
Installation
- Do not let Python 2 users build CuPy v7+ (#2766, thanks @leofang!)
- Fix an issue that cuComplex_bridge.h is not installed (#2984)
- Fix ROCm build errors (#3071)
Examples
- Fix GMM example for matplotlib 3 (#2996)
- Use cupy.random in kmeans example (#3026)
Tests
- Test cuTENSOR v1.0.0 (#2727)
- Use more stable input to test linalg.matrix_power (#2788)
- Remove Python 3.4 matrix from Travis CI (#2794)
- Drop ChainerCV's test in master branch (#2803)
- Refactor array testing decorators (#2818)
- Fix decorator usage in tests (#2820)
- Add f-contiguous reduction tests (#2830)
- Test ifloordiv with numpy 1.18 (#2852)
- Fix test_helper.py for NumPy 1.18 (#2883)
- Avoid 0s in the diagonal of TestSolveTriangular inputs (#2927)
- Add tests for size argument with no input (#2931)
- Print installed packages in pytest (#2979)
- Make testing.parameterize pdb-friendly (#3024)
- Require scipy in test_gmm (#3048)
Others
- Allow install without thrust (#2730)
- Add Mergify configuration file (#2894)
- Make cupyx.time.repeat experimental (#2897)
- Make cupyx.allow_synchronize experimental (#2947)
- Some fixes to .pfnci/script.sh (#3041)
- Set CUPY_CI environment variable in Travis CI and AppVeyor (#3058)
- Bump version to v8.0.0a1 (#3069)