It would be helpful to know the arguments passed to cublasSgemm at the time of the failure.
Can you attach a debugger and find out, or is it not easily reproducible?
Thanks,
Cliff
--
You received this message because you are subscribed to the Google Groups "Caffe Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to caffe-users...@googlegroups.com.
To post to this group, send email to caffe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/caffe-users/2c3aa2fb-271e-47f2-9609-4f4820ee4442%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
[17938.298821] NVRM: Xid(PCI:0000:01:00): 31, Ch 00000002, engmask 00000101, intr 10000000
F1021 14:40:44.983147 12312 math_functions.cu:32] Check failed: status == CUBLAS_STATUS_SUCCESS(13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED
cuTransB:1. cuTransA:0, N:363, M:96, K:2809, alpha:1, B:0x4049c0000, ldb:2809, A:0x42c7e7580, lda:2809, beta:1, C:0x404e12200, N:363

void caffe_gpu_gemm<float>(const CBLAS_TRANSPOSE TransA,
    const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
    const float alpha, const float* A, const float* B, const float beta,
    float* C) {
  // Note that cublas follows fortran order.
  int lda = (TransA == CblasNoTrans) ? K : M;
  int ldb = (TransB == CblasNoTrans) ? N : K;
  cublasOperation_t cuTransA =
      (TransA == CblasNoTrans) ? CUBLAS_OP_N : CUBLAS_OP_T;
  cublasOperation_t cuTransB =
      (TransB == CblasNoTrans) ? CUBLAS_OP_N : CUBLAS_OP_T;
  // CUBLAS_CHECK(cublasSgemm(Caffe::cublas_handle(), cuTransB, cuTransA,
  //     N, M, K, &alpha, B, ldb, A, lda, &beta, C, N));
  cublasStatus_t status = cublasSgemm(Caffe::cublas_handle(), cuTransB,
      cuTransA, N, M, K, &alpha, B, ldb, A, lda, &beta, C, N);
  CHECK_EQ(status, CUBLAS_STATUS_SUCCESS)
      << caffe::cublasGetErrorString(status)
      << "\ncuTransB:" << cuTransB << ", cuTransA:" << cuTransA
      << ", N:" << N << ", M:" << M << ", K:" << K
      << ", alpha:" << alpha << ", B:" << B << ", ldb:" << ldb
      << ", A:" << A << ", lda:" << lda << ", beta:" << beta
      << ", C:" << C << ", N:" << N;
}

This event is logged when a fault is reported by the MMU, such as when an illegal address access is made by an applicable unit on the chip. Typically these are application-level bugs, but can also be driver bugs or hardware bugs.
When this event is logged, NVIDIA recommends the following:
Note: The cuda-memcheck tool instruments the running application and reports which line of code performed the illegal read.
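
Since Xid 31 points at an illegal address access, one cheap sanity check (besides cuda-memcheck) might be to compare the logged dimensions against the sizes of the buffers actually passed in. The following is only a rough sketch: GemmFootprint, gemm_footprint and gemm_buffers_fit are hypothetical helper names, not part of Caffe or cuBLAS, and it assumes densely packed row-major matrices as in caffe_gpu_gemm above, so that op(A) is M x K, op(B) is K x N and C is M x N.

```cpp
#include <cstddef>

// Hypothetical helper, not part of Caffe: number of floats each operand of
// C = alpha * op(A) * op(B) + beta * C must hold when the matrices are
// densely packed. Transposing changes the layout, not the element count.
struct GemmFootprint {
  std::size_t a_elems;  // op(A) is M x K
  std::size_t b_elems;  // op(B) is K x N
  std::size_t c_elems;  // C is M x N
};

inline GemmFootprint gemm_footprint(int M, int N, int K) {
  GemmFootprint f;
  f.a_elems = static_cast<std::size_t>(M) * static_cast<std::size_t>(K);
  f.b_elems = static_cast<std::size_t>(K) * static_cast<std::size_t>(N);
  f.c_elems = static_cast<std::size_t>(M) * static_cast<std::size_t>(N);
  return f;
}

// Returns true only if every buffer holds at least as many floats as the
// GEMM call would read or write. The *_count arguments are the allocated
// element counts, which the caller has to supply.
inline bool gemm_buffers_fit(const GemmFootprint& need, std::size_t a_count,
                             std::size_t b_count, std::size_t c_count) {
  return a_count >= need.a_elems && b_count >= need.b_elems &&
         c_count >= need.c_elems;
}
```

With the values from the log (M:96, N:363, K:2809), gemm_footprint gives 269664 floats for A, 1019667 for B and 34848 for C. If any of the buffers behind the logged pointers is smaller than that, the GEMM would run past its end, which would be consistent with the MMU fault. If they all fit, the fault may come from somewhere else.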
You shouldn't have to run Caffe (or any other regular CUDA app, for that matter) as root to get it to work. If you do, that sounds like a bug of some kind.
--Cliff