CuPy v8.0.0b5 has been released


ecas...@preferred.jp

Jul 30, 2020, 4:56:35 AM

to CuPy Japanese User Group

CuPy v8.0.0b5 has been released! The release notes are below.

This is the release note of v8.0.0b5. See here for the complete list of solved issues and merged PRs.

Highlights

CUB is now bundled with CuPy so that everyone can use it out-of-the-box (thanks @leofang!). This release also introduces the CUPY_ACCELERATORS environment variable, a mechanism for enabling acceleration through different libraries. You can enable CUB and cuTENSOR by setting export CUPY_ACCELERATORS=cub,cutensor.
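
For readers who launch CuPy programs from Python rather than from a shell, the following is a minimal sketch of building an environment with CUPY_ACCELERATORS set. The helper name with_accelerators is hypothetical and not part of CuPy; the sketch only prepares the variable and assumes that CuPy reads it when it is imported.

```python
import os


def with_accelerators(*names, base_env=None):
    """Return a copy of an environment with CUPY_ACCELERATORS set.

    `names` are accelerator names such as "cub" or "cutensor", joined with
    commas in the same form as the shell example in the release note.
    This helper is illustrative only; it is not a CuPy API.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUPY_ACCELERATORS"] = ",".join(names)
    return env


# Build an environment that requests both CUB and cuTENSOR.
env = with_accelerators("cub", "cutensor", base_env={"PATH": "/usr/bin"})
print(env["CUPY_ACCELERATORS"])
```

The resulting dictionary can be passed as the env argument of subprocess.run when starting a script that imports CuPy. Setting os.environ["CUPY_ACCELERATORS"] inside the same process should also work if it is done before the first import of cupy, under the assumption stated above.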

The new features include an implementation of the SciPy ndimage filters contributed by @coderforlife and the introduction of the cupy_backends library, used to decouple the CUDA ecosystem APIs from CuPy itself.
Currently, cupy_backends is considered an undocumented API and it is subject to further refactoring. In the meantime, you can still continue to use cupy.cuda.* APIs.

Changes without compatibility

Supported Platform (#3670)

As announced previously, we dropped support for CUDA 8.0 and 9.1. We are also going to drop support for NumPy 1.15 and SciPy 1.2 or earlier in the upcoming release.

CUB (#2584, #3461, #3562)

CUB is now bundled in the source tree. As a consequence, gcc-6 or later is required for the CuPy v8 build. If you are building CuPy from source on systems with legacy gcc, follow the instructions below. These steps are not necessary for general users using wheel packages.

### Ubuntu 16
$ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
$ sudo apt-get update
$ sudo apt-get install g++-6
$ export NVCC="nvcc --compiler-bindir gcc-6"

### CentOS 6 and 7:
$ sudo yum install centos-release-scl
$ sudo yum install devtoolset-7-gcc-c++
$ source /opt/rh/devtoolset-7/enable
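
Before building from source, it may help to check whether the compiler on the PATH already meets the gcc-6 requirement. The sketch below is not part of CuPy's build system; the function names are hypothetical, and it assumes that `gcc -dumpversion` prints a version string whose first dot-separated component is the major version (for example 5.4.0 or 7).

```python
import shutil
import subprocess

MIN_GCC_MAJOR = 6  # the release note states gcc-6 or later is required


def gcc_major(version_text):
    """Extract the major version from text such as '5.4.0' or '7'."""
    return int(version_text.strip().split(".")[0])


def gcc_is_new_enough(compiler="gcc"):
    """Return True or False, or None when the compiler is not on PATH."""
    path = shutil.which(compiler)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-dumpversion"], capture_output=True, text=True, check=True
    )
    return gcc_major(result.stdout) >= MIN_GCC_MAJOR
```

Calling gcc_is_new_enough() with no argument checks the default gcc; passing "gcc-6" checks the compiler installed by the Ubuntu steps above.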

CUB-related environment variables (CUB_PATH, CUB_DISABLED) are no longer effective. To boost reduction kernels and several functions such as min, max, sum, and scan, you need to enable CUB by setting the environment variable CUPY_ACCELERATORS=cub.
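
When migrating an existing setup, a small audit of the environment can show whether the old variables are still present and whether CUB has been requested the new way. This is a sketch only: audit_cub_env is a hypothetical helper, and the comma-separated parsing mirrors the example value in this note rather than CuPy's own parsing code.

```python
import os

# Variables named in the release note as no longer effective.
LEGACY_CUB_VARS = ("CUB_PATH", "CUB_DISABLED")


def audit_cub_env(env=None):
    """Report leftover legacy CUB variables and whether 'cub' is requested."""
    env = os.environ if env is None else env
    legacy = [name for name in LEGACY_CUB_VARS if name in env]
    requested = [
        item.strip()
        for item in env.get("CUPY_ACCELERATORS", "").split(",")
        if item.strip()
    ]
    return {"legacy_vars": legacy, "cub_requested": "cub" in requested}


# An old-style setup: a legacy variable is set and CUB is not requested.
report = audit_cub_env({"CUB_DISABLED": "1"})
print(report)
```

A report with a non-empty legacy_vars list and cub_requested set to False suggests the environment still relies on the pre-v8 variables.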

cuTENSOR (#3592)

In response to the introduction of CUPY_ACCELERATORS, you need to explicitly specify the option CUPY_ACCELERATORS=cutensor to enable cuTENSOR.

Others

Avoid early compilation when initializing a RawModule instance (#3534)
Remove CHAINER_SEED (#3674)
Remove sum_duplicate parameter in sparse min/max/argmin/argmax (#3676)

New Features

Support multistage reduction and indexing in cupy.fuse (#2734, thanks @xuzijian629!)
Implementation of ndimage filters (#3184, thanks @coderforlife!)
Add cupy.convolve (#3371, thanks @Dahlia-Chehata!)
Move CUDA low-level API to cupy_backends namespace (#3386)
Add choose_conv_method (#3464, thanks @Dahlia-Chehata!)
Add cupy.poly1d (#3466, thanks @Dahlia-Chehata!)
Sparse mean (#3487, thanks @cjnolet!)
Add support for cusolverDn<t>syevj and cusolverDn<t>syevjBatched (#3488, thanks @dmargala!)
ndimage rank-based filters (#3500, thanks @coderforlife!)
ndimage common linear filters (#3505, thanks @coderforlife!)
Implement flatiter.__iter__() (#3508)
Implement has_sorted_indices, has_canonical_format, sort(ed)_indices() for sparse matrices (#3509)
Add multi-gpu support to time (#3519)
Add cupy.correlate (#3525, thanks @Dahlia-Chehata!)
Add cupyx.scipy.sparse.kron() (#3528)
Support ncclSend / ncclRecv from NCCL 2.7 (#3567)
Add cupyx.scipy.fft.next_fast_len (#3571)
ndimage generic filters (#3614, thanks @coderforlife!)
Support CSR matrix multiply (#3647)
Support CSR matrix division (#3680)
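
As an illustration of two of the additions above, cupy.convolve and cupy.correlate, here is a minimal sketch written against a generic array module xp. It assumes the CuPy functions accept the same a, v, and mode arguments as their NumPy counterparts. The function name convolve_and_correlate is hypothetical, and the example is run with NumPy so that it works without a GPU.

```python
import numpy as np


def convolve_and_correlate(xp):
    """Run 1-D convolution and correlation with the given array module.

    `xp` is expected to be numpy or cupy; both are assumed to expose
    convolve(a, v, mode=...) and correlate(a, v, mode=...).
    """
    a = xp.asarray([1.0, 2.0, 3.0])
    v = xp.asarray([0.0, 1.0, 0.5])
    conv = xp.convolve(a, v, mode="full")    # output length len(a) + len(v) - 1
    corr = xp.correlate(a, v, mode="valid")  # single value for equal lengths
    return conv, corr


# CPU run with NumPy; on a machine with CuPy v8 and a GPU,
# `import cupy` and pass `cupy` instead of `np`.
conv, corr = convolve_and_correlate(np)
print(conv, corr)
```

Writing the function against xp keeps the same code usable for both libraries, which is one way to compare CuPy results with NumPy.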

Enhancements

Build the cupy.cuda.cub module by default (#2584)
Expose cuda IPC runtime calls (#3290)
Merge CUPY_CUB_BLOCK_REDUCTION_DISABLED and CUB_DISABLED (#3461)
Support CUB histogram (#3473)
Support cuTENSOR 1.1 (#3477)
Added functionality to print nvcc and nvrtc output (#3485, thanks @mnicely!)
Support axis=None in sparse min/max (#3515)
Small fixes for CUB block reduction kernels (#3520)
Avoid early compilation when initializing a RawModule instance (#3534)
Improve _prepare_mask_indexing_single (#3539)
Support batched slogdet with complex numbers (#3551, thanks @yoshipon!)
Fix hip header files (#3566)
Remove compute_30 when CUDA 11 (#3578)
Change einsum not to use cuTENSOR when accelerator is not set (#3592)
Update CUDA 11.0 FP16 header to production release version (11.0.2) (#3668)
Drop support for CUDA 8.0 and 9.1 (#3670)
Remove CHAINER_SEED (#3674)

Performance Improvements

Use cuTENSOR in cupy.sum (#2939)
Reduce numpy.ndarray creation in cuTENSOR operation preparation (#3393)
Improve scan operation (#3540)
Improve _ArgInfo init (#3549)
Fix small performance issue (#3550)
Improve _fft_convolve (#3560)
Reduce device synchronization in poly1d instantiation (#3563, thanks @Dahlia-Chehata!)
Reuse FFT plan for convolve/correlate (#3587)
Improve efficiency of cupy.fft.fftfreq and cupy.fft.rfftfreq (#3653, thanks @grlee77!)

Bug Fixes

Fix cupyx.scipy.ndimage.sum taking zero-dimensional input (#3425)
Use CUSPARSE_VERSION instead of CUDA_VERSION (#3491)
Fix sparse min/max to return sparse matrix (#3536)
Fix boolean indexing (#3538)
Support 0-size ndarray and fix possible error in __del__ at fft (#3543)
Fix cupy.percentile type assignment in asarray (#3570)
Fix array creation for ndarray list of arrays of different dtypes (#3605)
Change sorting order of COO sparse matrix for cuSPARSE (#3620)
Add __name__ to custom kernels (#3626)
Fix sparse argmin/argmax return shape (#3639)
Fix missing imports and cupy.show_config (#3642)
Fix sparse matrix related test failures on CUDA 11 (#3649)
Fix error message broken (#3669)
Remove sum_duplicate parameter in sparse min/max/argmin/argmax (#3676)
Fix broken imports for cupy.cuda.* (#3685)
Fix Windows build failure of cuSparse generic API (#3690)
Fix compile option on HIP environment (#3604)

Code Fixes

Use .data() for std::vector (#3022)
Add short comments for the internals (#3475)
Use absolute import (#3496)
Make type dispatcher from cupy.cuda.cub reusable (#3546)
Clean up CUB-related stuff (#3562)
Suppress compile warnings (#3573)
Remove unused descriptor definition (#3594)

Documentation

Add sample code for image resizing (#3559, thanks @pmixer!)
Update documentation of CUPY_ACCELERATORS (#3596)
Update url and email (#3608)
Add a warning for sum_duplicates (#3624)
Remove Chainer related docs (#3673)

Installation

Add missing cupy_cub.cu in package data (#3572)
Fix rpath for wheel build (#3689)

Tests

Test against scipy.fft when available (#3032)
Add tests for _cub_reduction (#3462)
Add mock tests to ensure cupy.cuda.cub is used (#3467)
Fix to set testing.slow correctly (#3501)
Check NumPy compatibility in flatiter tests (#3514)
Fix slogdet tests to check dtypes of return values (#3577)
Fix negative value test in test_helper (#3579)
Deprecate numpy_cupy_array_list_equal (#3582)
Use numpy_cupy_array_equal instead of numpy_cupy_array_list_equal (#3599)
Checks return types in testing.numpy_cupy_* (#3621)
Add tests for sparse max with axis=None (#3638)
Parameterize sparse min/max/argmin/argmax tests (#3656)
Expose accelerator internal API to one level up (#3664)

Others

Fix to raise ValueError for invalid order (#3498)
Fix to raise ValueError for invalid clipmode (#3499)
Fix to raise TypeError for invalid subscripts in einsum (#3502)
Use builtins directly (#3651, thanks @larsoner!)
Add link to Twitter account (#3529)
Update style checker version for Python 3.7 (#3585)
Bump version to v8.0.0b5 (#3687)
