Running cudadecoder on Jetson Nano

Gary

Mar 4, 2021, 12:33:12 PM

to kaldi-help

The Nano is an aarch64 platform.

When I first ran cudadecoder on the Nano, it showed this error:

ERROR ([5.5.776~1-fcc6a3]:SynchronizeGpu():cudamatrix/cu-device.cc:629) cudaError_t 713 : "pointer does not correspond to a registered memory region" returned from 'cudaGetLastError()'

Someone said:

cudaHostRegister() is not supported on ARM platforms.
This is because the caching attribute of an existing allocation can't be changed on the fly.

If required, please use cudaHostAlloc() with the flag cudaHostAllocMapped to allocate device-mapped host-accessible memory.

Reference: https://devtalk.nvidia.com/default/topic/1032259/cudaerrornotsupported-when-calling-cv-cuda-cudahostregister-on-nvidia-tx2/
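A minimal sketch of following that advice, using a hypothetical `PinnedStagingBuffer` helper (not Kaldi code): the buffer stores the pointer that cudaHostAlloc() returns, grows to 2x the requested size like the Kaldi snippet, and passes only that pointer to cudaFreeHost(). The CUDA calls are compiled only under nvcc (`__CUDACC__`); otherwise it falls back to std::malloc/std::free so the growth logic can be checked without a GPU. Per the CUDA runtime documentation, the device-side address of a mapped allocation is obtained with cudaHostGetDevicePointer(), and mapped allocations may also need cudaSetDeviceFlags(cudaDeviceMapHost) before the CUDA context is created; neither step is shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical growable host staging buffer. Under nvcc it owns page-locked,
// device-mapped memory from cudaHostAlloc(); otherwise plain malloc'd memory.
class PinnedStagingBuffer {
 public:
  PinnedStagingBuffer() = default;
  PinnedStagingBuffer(const PinnedStagingBuffer&) = delete;
  PinnedStagingBuffer& operator=(const PinnedStagingBuffer&) = delete;
  ~PinnedStagingBuffer() { Release(); }

  // Ensures room for at least `count` floats. On growth the old contents are
  // discarded (like Resize(..., kUndefined)) and 2x `count` is allocated.
  float* Reserve(std::size_t count) {
    if (count <= capacity_) return data_;  // reuse: no new allocation
    Release();
    const std::size_t new_capacity = count * 2;
    void* p = nullptr;
#ifdef __CUDACC__
    if (cudaHostAlloc(&p, new_capacity * sizeof(float),
                      cudaHostAllocMapped) != cudaSuccess) {
      p = nullptr;
    }
#else
    p = std::malloc(new_capacity * sizeof(float));
#endif
    if (p == nullptr) throw std::bad_alloc();
    data_ = static_cast<float*>(p);  // the buffer itself keeps the new pointer
    capacity_ = new_capacity;
    ++allocations_;
    assert(capacity_ >= count);
    return data_;
  }

  float* data() const { return data_; }
  std::size_t capacity() const { return capacity_; }        // in floats
  std::size_t allocations() const { return allocations_; }  // growth count

 private:
  void Release() {
    if (data_ == nullptr) return;
#ifdef __CUDACC__
    cudaFreeHost(data_);  // only ever called on memory from cudaHostAlloc()
#else
    std::free(data_);
#endif
    data_ = nullptr;
    capacity_ = 0;
  }

  float* data_ = nullptr;
  std::size_t capacity_ = 0;
  std::size_t allocations_ = 0;
};
```

For example, `Reserve(100)` allocates room for 200 floats, a later `Reserve(150)` returns the same pointer without allocating, and `Reserve(300)` frees the old block and allocates room for 600. Growing geometrically keeps the number of pinned allocations low, which helps because page-locked allocation can be comparatively expensive.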

I modified the code in cudadecoder/batched-threaded-nnet3-cuda-pipeline.cc: the cudaHostRegister()/cudaHostUnregister() calls are commented out and cudaHostAlloc() is called instead.

void BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures(
    int32 first, std::vector<TaskState *> &tasks,
    OnlineCudaFeaturePipeline &feature_pipeline) {
  KALDI_ASSERT(config_.gpu_feature_extract == true);
  nvtxRangePushA("CopyBatchWaves");
  // below we will pack waves into a single buffer for efficient transfer
  // across device

  // first count the total number of elements and create a single large
  // vector
  int count = 0;
  for (int i = first; i < tasks.size(); i++) {
    count += tasks[i]->task_data->wave_samples->Dim();
  }

  // creating a thread local vector of pinned memory.
  // wave data will be staged through this memory to get
  // more efficient non-blocking transfers to the device.
  thread_local Vector<BaseFloat> pinned_vector;

  if (pinned_vector.Dim() < count) {
    // WAR: Not pinning memory because it seems to impact
    // correctness; we are continuing to look into a fix but want to
    // commit this workaround as a temporary measure.
    if (pinned_vector.Dim() != 0) {
      //cudaHostUnregister(pinned_vector.Data());
      cudaFreeHost(pinned_vector.Data());
    }

    // allocated array 2x size
    pinned_vector.Resize(count * 2, kUndefined);
    //cudaHostRegister(pinned_vector.Data(),
    //                 pinned_vector.Dim() * sizeof(BaseFloat), 0);
    void *ptr = pinned_vector.Data();
    cudaHostAlloc((void **)&ptr, pinned_vector.Dim() * sizeof(BaseFloat),
                  cudaHostAllocMapped);
  }

With this change it can successfully decode a few utterances, but it sometimes still throws this error while decoding:

ERROR ([5.5.847~1527-6a95a]:SynchronizeGpu():cudamatrix/cu-device.cc:629) cudaError_t 1 : "invalid argument" returned from 'cudaGetLastError()'
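One possible explanation, offered as a hedged reading of the modified code rather than a confirmed diagnosis: cudaHostAlloc() stores the address of a new pinned block in `ptr`, which is only a local copy of `pinned_vector.Data()`. The Vector keeps using its own pageable memory, the pinned block is never freed, and the next time the buffer has to grow, `cudaFreeHost(pinned_vector.Data())` receives memory that did not come from cudaHostAlloc() or cudaMallocHost(), which the CUDA documentation requires. cudaError_t 1 is cudaErrorInvalidValue ("invalid argument"), and since that return value is not checked it would only surface at the next cudaGetLastError(), which fits an error that appears after a few utterances. The plain C++ sketch below reproduces only the pointer-copy part; `FakeHostAlloc` and `BufferUsesNewAllocation` are hypothetical names, with `FakeHostAlloc` standing in for cudaHostAlloc() so the example runs without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for cudaHostAlloc(void**, size_t, unsigned int):
// like the real call, it writes the address of a NEW allocation into *out.
int FakeHostAlloc(void** out, std::size_t bytes) {
  *out = std::malloc(bytes);
  return *out != nullptr ? 0 : 2;
}

// Mirrors the modified snippet: `ptr` starts as a copy of the buffer's
// address, then FakeHostAlloc overwrites only that copy.
// Returns true if the buffer ended up backed by the new allocation.
bool BufferUsesNewAllocation(std::vector<float>& buffer) {
  void* ptr = buffer.data();
  int status = FakeHostAlloc(&ptr, buffer.size() * sizeof(float));
  assert(status == 0);
  (void)status;
  bool same = (ptr == static_cast<void*>(buffer.data()));
  std::free(ptr);  // in the Kaldi snippet this block is never freed (a leak)
  return same;
}
```

The buffer's own pointer is unchanged after the call, so nothing downstream ever uses the new allocation. A fix along these lines would need the buffer to own the pointer returned by cudaHostAlloc() (rather than a Kaldi Vector, whose storage Kaldi manages itself) and to pass only that pointer to cudaFreeHost().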

What is the correct way to run cudadecoder on the Nano?

Thanks

Gary

Mar 5, 2021, 6:59:06 AM

Conclusion:

Use cudadecoder2 instead of cudadecoder1.

cudadecoder1 frequently allocates and frees GPU memory; cudadecoder2 manages GPU memory allocation better.

Closing this issue.

Daniel Povey

Mar 5, 2021, 9:17:22 AM

Good.
