CuPy v8.0.0 has been released

Emilio Castillo

Oct 1, 2020, 3:07:01 AM

to CuPy Japanese User Group

CuPy v8.0.0 has been released! The release notes follow.

Highlights

The CuPy v8.0.0 release includes a number of new features, as well as enhanced NumPy/SciPy functionality coverage.

TensorFloat-32 (TF32) Support
- CuPy now supports TensorFloat-32 (TF32), a new feature available on NVIDIA Ampere GPUs with CUDA 11. Set the CUPY_TF32=1 environment variable to speed up matrix multiplications in routines such as cupy.matmul and cupy.tensordot.
Official support for NVIDIA cuTENSOR and CUB libraries
- Several CuPy routines can now use the cuTENSOR and CUB libraries to further improve performance. Set the CUPY_ACCELERATORS=cub,cutensor environment variable to benefit from these libraries.
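A minimal sketch of turning on both switches from Python rather than from the shell. It assumes the variables only need to be set before CuPy is first imported (prefixing the command, e.g. `CUPY_TF32=1 CUPY_ACCELERATORS=cub,cutensor python script.py`, is equivalent). `load_cupy` is a hypothetical helper that returns `None` when CuPy or a GPU is missing, so the script still runs on a CPU-only machine. TF32 only takes effect on Ampere GPUs with CUDA 11, and the accelerators only for routines that support them.

```python
import os

# CuPy reads these variables, so set them before the first `import cupy`.
os.environ["CUPY_TF32"] = "1"                      # allow TF32 math for float32 matmul (Ampere + CUDA 11)
os.environ["CUPY_ACCELERATORS"] = "cub,cutensor"  # accelerator backends to enable


def load_cupy():
    """Hypothetical helper: return cupy if it imports and a GPU is visible, else None."""
    try:
        import cupy
        if cupy.cuda.runtime.getDeviceCount() == 0:
            return None
        return cupy
    except Exception:  # ImportError, or a CUDA error when no driver/GPU is present
        return None


cp = load_cupy()
total = None
if cp is None:
    print("CuPy or a GPU is not available; skipping")
else:
    a = cp.ones((512, 512), dtype=cp.float32)
    c = cp.matmul(a, a)        # float32 matrix product: eligible for TF32
    total = float(cp.sum(c))   # reduction: eligible for CUB/cuTENSOR acceleration
    print(total)               # each entry of c is 512, so the total is 512**3
```

Because every input is 1.0, the result is exact even with TF32's reduced mantissa, which makes it a convenient smoke test.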
Enhanced kernel fusion
- Previously, when combining multiple kernels into one with cupy.fuse, only a single reduction operation (cupy.sum, etc.) could be used, and only at the end. With the new kernel fusion mechanism in CuPy v8, multiple element-wise operations can now be combined with interleaved reductions.
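A sketch of the kind of function the new fusion mechanism is meant to accept: standardizing a vector uses two reductions, each followed by element-wise work, which the old "one reduction at the end" rule could not express in a single fused kernel. It assumes the v8 `cupy.fuse` handles this pattern (reduction results broadcast back into element-wise ops, a Python scalar argument `n`); `load_cupy` is a hypothetical helper, and a NumPy reference is used to check the GPU result.

```python
import numpy as np


def load_cupy():
    """Hypothetical helper: return cupy if it imports and a GPU is visible, else None."""
    try:
        import cupy
        if cupy.cuda.runtime.getDeviceCount() == 0:
            return None
        return cupy
    except Exception:
        return None


def standardize_ref(x):
    """NumPy reference: (x - mean) / std, using two reductions."""
    n = x.size
    mean = np.sum(x) / n
    centered = x - mean
    return centered / np.sqrt(np.sum(centered * centered) / n)


x_host = np.arange(8, dtype=np.float64)
expected = standardize_ref(x_host)
result = None

cp = load_cupy()
if cp is not None:
    @cp.fuse()
    def standardize(x, n):
        mean = cp.sum(x) / n                    # reduction 1
        centered = x - mean                     # element-wise op using its result
        var = cp.sum(centered * centered) / n   # reduction 2 on an intermediate
        return centered / cp.sqrt(var)          # element-wise op after reduction 2

    result = cp.asnumpy(standardize(cp.asarray(x_host), x_host.size))
    print(np.allclose(result, expected))
```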
Automatic tuning of kernel launch parameters
- CuPy now supports discovering the optimal CUDA kernel launch parameters depending on the data and device properties for better performance. See the API reference (cupyx.optimizing.optimize) for details.
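A minimal sketch of using the `cupyx.optimizing.optimize` context manager around a reduction. It assumes Optuna is installed (the tuner depends on it) and that `cupy.sum` is among the kernels it can tune; the exact search behavior is described in the API reference. `load_cupy` and `optuna_available` are hypothetical helpers that let the script skip cleanly when a dependency is missing.

```python
def load_cupy():
    """Hypothetical helper: return cupy if it imports and a GPU is visible, else None."""
    try:
        import cupy
        if cupy.cuda.runtime.getDeviceCount() == 0:
            return None
        return cupy
    except Exception:
        return None


def optuna_available():
    """Hypothetical helper: cupyx.optimizing relies on Optuna being importable."""
    try:
        import optuna  # noqa: F401
        return True
    except ImportError:
        return False


cp = load_cupy()
tuned_sum = None
if cp is not None and optuna_available():
    import cupyx.optimizing

    x = cp.ones(1 << 20, dtype=cp.float32)
    # Supported kernels launched inside this block (here the cupy.sum
    # reduction) may be benchmarked with different launch parameters,
    # and the best configuration found is used.
    with cupyx.optimizing.optimize():
        tuned_sum = float(cp.sum(x))
    print(tuned_sum)  # 1048576.0
```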
Memory pool sharing with external libraries
- With the new PythonFunctionAllocator API, you can have CuPy manage GPU memory through arbitrary Python functions instead of its built-in memory pool. This improves interoperability with external libraries; for example, you can flexibly use CuPy inside PyTorch to preprocess data or to run custom CUDA kernels. With the allocator bundled in pytorch-pfn-extras, CuPy can easily use the PyTorch memory pool.
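A sketch of the callback contract: `malloc_func(size, device_id)` returns a raw device pointer as an int, and `free_func(pointer, device_id)` releases it. In a real integration the callbacks would call into another library's allocator; here they simply wrap `cupy.cuda.runtime.malloc`/`free` and count calls, which is an illustrative choice, not how pytorch-pfn-extras does it. `load_cupy` and the `stats` counter are hypothetical.

```python
def load_cupy():
    """Hypothetical helper: return cupy if it imports and a GPU is visible, else None."""
    try:
        import cupy
        if cupy.cuda.runtime.getDeviceCount() == 0:
            return None
        return cupy
    except Exception:
        return None


cp = load_cupy()
stats = {"mallocs": 0, "frees": 0}  # hypothetical bookkeeping for the demo


def traced_malloc(size, device_id):
    """Called by CuPy for each new chunk; must return a raw device pointer (int)."""
    with cp.cuda.Device(device_id):
        ptr = cp.cuda.runtime.malloc(size)
    stats["mallocs"] += 1
    return ptr


def traced_free(ptr, device_id):
    """Called by CuPy when a chunk obtained from traced_malloc is released."""
    with cp.cuda.Device(device_id):
        cp.cuda.runtime.free(ptr)
    stats["frees"] += 1


total = None
if cp is not None:
    allocator = cp.cuda.memory.PythonFunctionAllocator(traced_malloc, traced_free)
    cp.cuda.set_allocator(allocator.malloc)
    try:
        a = cp.arange(1024, dtype=cp.float32)  # allocated through traced_malloc
        total = float(a.sum())                  # 0 + 1 + ... + 1023
        del a
    finally:
        # Hand allocation back to CuPy's default memory pool.
        cp.cuda.set_allocator(cp.get_default_memory_pool().malloc)
    print(stats, total)
```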
Improved NumPy/SciPy function coverage
- Many functions have been added, including the NumPy polynomials package (a result of Google Summer of Code 2020, thanks @Dahlia-Chehata!), the SciPy image processing package, and extended support for the SciPy sparse matrix package.

For the list of all backward-incompatible changes in v8, please refer to the Upgrade Guide.

Notes on Wheel Packages

CuPy packages for CUDA 10.1 (cupy-cuda101), 10.2 (cupy-cuda102), and 11.0 (cupy-cuda110) are built with cuDNN v8 support but do not bundle the cuDNN shared libraries (see #3724 for the discussion). To use cuDNN features, download the cuDNN library with the following command: python -m cupyx.tools.install_library --library cudnn --cuda X.X. Alternatively, install cuDNN v8.0.x via the system package manager (e.g., apt install libcudnn8 or yum install libcudnn8), or install it manually and set the LD_LIBRARY_PATH environment variable.
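The X.X placeholder corresponds to the CUDA version encoded in the wheel name. A small sketch that derives it and builds the command quoted above as an argument list; `cuda_version_from_package` is a hypothetical helper based only on the three package names listed here, and the download is performed only when `--install` is passed explicitly.

```python
import subprocess
import sys
from typing import List


def cuda_version_from_package(package: str) -> str:
    """Hypothetical helper: 'cupy-cuda110' -> '11.0', the form passed to --cuda."""
    prefix = "cupy-cuda"
    digits = package[len(prefix):]
    if not package.startswith(prefix) or len(digits) < 2 or not digits.isdigit():
        raise ValueError(f"not a CUDA-specific CuPy wheel name: {package!r}")
    return f"{digits[:-1]}.{digits[-1]}"


def cudnn_install_command(cuda_version: str) -> List[str]:
    """The install_library invocation from the release notes, as an argv list."""
    return [sys.executable, "-m", "cupyx.tools.install_library",
            "--library", "cudnn", "--cuda", cuda_version]


cmd = cudnn_install_command(cuda_version_from_package("cupy-cuda110"))
print(" ".join(cmd))

# Run it only when explicitly requested, since it downloads cuDNN.
if "--install" in sys.argv:
    subprocess.run(cmd, check=True)
```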

Changes since v8.0.0rc1

See here for the complete list of merged PRs after the v8.0.0rc1 release. For all changes since the v7 series, please refer to the release notes of the pre-releases (alpha1, beta1, beta2, beta3, beta4, beta5, rc1).

Highlights

Add a cache for reusing FFT plans, which greatly reduces CPU time. (thanks @leofang!)
Support for cuTENSOR 1.2, and acceleration of cupy.prod, cupy.max, cupy.min, cupy.ptp and cupy.mean via CUPY_ACCELERATORS.
Greatly improved sparse matrix support, with new operators and the ability to set items.
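A sketch of observing the FFT plan cache: repeating a transform with the same shape, dtype and device should reuse the cached cuFFT plan instead of creating a new one. It assumes the cache is exposed as `cupy.fft.config.get_plan_cache()` with `clear()`, `get_curr_size()` and `show_info()`; check the `cupy.fft` reference for your version. `load_cupy` is a hypothetical helper.

```python
import numpy as np


def load_cupy():
    """Hypothetical helper: return cupy if it imports and a GPU is visible, else None."""
    try:
        import cupy
        if cupy.cuda.runtime.getDeviceCount() == 0:
            return None
        return cupy
    except Exception:
        return None


cp = load_cupy()
sizes = None
if cp is not None:
    cache = cp.fft.config.get_plan_cache()  # plan cache for the current device
    cache.clear()
    x = cp.random.random(4096).astype(cp.complex64)

    y1 = cp.fft.fft(x)                 # first call: a cuFFT plan is created and cached
    after_first = cache.get_curr_size()
    y2 = cp.fft.fft(x)                 # same shape/dtype/device: the cached plan is reused
    after_second = cache.get_curr_size()

    sizes = (after_first, after_second)
    print(sizes, np.allclose(cp.asnumpy(y1), cp.asnumpy(y2)))
    cache.show_info()                  # prints the cache limits and current contents
```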

New Features

Support sparse matrix pointwise maximum and minimum (#3943)
Support sparse matrix pointwise division by vectors or matrices (#3964)
Add cupy.testing.shaped_sparse_random (#3976)
Add compressed sparse __setitem__ (#3998)
Add sparse.linalg.norm (#4040)
Add cuTENSOR 1.2 support (#3970)
Add a cuFFT plan cache (#4010)
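A sketch combining two of the new sparse features: item assignment on a CSR matrix (#3998) and `cupyx.scipy.sparse.linalg.norm` (#4040), checked against NumPy. It updates an entry that is already stored; inserting new nonzeros is also part of SciPy's semantics, but costs more. It assumes `norm` defaults to the Frobenius norm as in SciPy; `load_cupy` is a hypothetical helper.

```python
import numpy as np


def load_cupy():
    """Hypothetical helper: return cupy if it imports and a GPU is visible, else None."""
    try:
        import cupy
        if cupy.cuda.runtime.getDeviceCount() == 0:
            return None
        return cupy
    except Exception:
        return None


dense = np.array([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 3.0]])
updated = dense.copy()
updated[1, 1] = 5.0
expected_norm = float(np.linalg.norm(updated))  # Frobenius norm: sqrt(1 + 25 + 9)

gpu_norm = None
cp = load_cupy()
if cp is not None:
    import cupyx.scipy.sparse as sparse
    import cupyx.scipy.sparse.linalg as sparse_linalg

    m = sparse.csr_matrix(cp.asarray(dense))   # CSR matrix built from a dense CuPy array
    m[1, 1] = 5.0                              # compressed sparse __setitem__
    gpu_norm = float(sparse_linalg.norm(m))    # Frobenius norm by default
    print(gpu_norm, expected_norm)
```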

Enhancements

Update FP16 header to CUDA 11.0 Update 1 (11.0.3) (#3986)
Bump cuDNN version to v8.0.3 (#3996)

Performance Improvements

Use _csr_row_index for CSR matrix major-axis slicing with step (#3898)
Improve CSR matrix column fancy indexing (#3960)
Improve cupyx.scipy.sparse int x int indexing (#4003)
Avoid using CUlinkState unless absolutely necessary (#4016)
Use cuTENSOR in cupy.prod, cupy.max, cupy.min, cupy.ptp and cupy.mean (#4046)

Bug Fixes

Fix dtype in CSR matrix division (#3924)
Fix _compressed_sparse_matrix._minor_slice for step > 1 case (#3952)
Fix csr_matrix._get_intXslice for step < 0 case (#3957)
Handle transfer to cupy view (#3962)
Fix sparse.__getitem__ not to return view of input (#3993)
Fix managed memory leak (#4032)
Use __dealloc__ instead of __del__ for cdef class (#4037)

Code Fixes

Rename cupyx.scatter submodule (#3921)
Hide private names in cupyx/scipy/__init__.py (#3923)
Rename submodule under cupyx.scipy.fftpack (#3926)
Refactor CSR sparse matrix row fancy indexing (#3930)
Rename cupyx.runtime submodule (#3937)
Rename cupy.util submodule to cupy._util (#3938)
Rename cupy.statistics submodule to cupy._statistics (#3939)
Rename submodule under cupy.prof package (#3940)
Hide private names in cupyx.time (#3990)
Hide private names in cupy.cusparse (#4005)
Rename cupy.math submodule to cupy._math (#4028)
Hide private names in cupy.cudnn (#4029)
Rename cupy.logic submodule to cupy._logic (#4030)
Hide private names in cupy/__init__.py (#4039)

Documentation

Add cupy.searchsorted to doc (#3925)
Update cupyx.scipy API documentation (#3997)

Tests

Fix test fail when cuDNN is unavailable (#3910)
Fix 32-bit boundary test to run on Windows (#3913)
Add v8 to the list of known branches in FlexCI script (#3914)
Fix side effects in some tests (#3953)
Fix some tests to check compatibility with SciPy's behavior (#3956)
Refactor sparse indexing tests (#3977)
Fix cupy.ndim test style (#4034)
Fix bugs and test suites to make ROCm/HIP happy - Part 1 (#3929)

Others

Disable GitHub checks annotations of Codecov (#4022)
Bump version to v8.0.0 (#4049)

Contributors

The CuPy Team would like to thank all those who contributed to this release!

@anaruse @cjnolet @grlee77 @kalvdans @leofang @saswatpp
