Run cudadecoder on Jetson Nano

Gary

Mar 4, 2021, 12:33:12 PM
to kaldi-help
The Jetson Nano is an aarch64 platform.

When I first ran cudadecoder on the Nano, it showed this error:
ERROR ([5.5.776~1-fcc6a3]:SynchronizeGpu():cudamatrix/cu-device.cc:629) cudaError_t 713 : "pointer does not correspond to a registered memory region" returned from 'cudaGetLastError()' 

Someone said:
cudaHostRegister() is not supported on ARM platforms.
This is because the caching attribute of an existing allocation can't be changed on the fly.
If required, please use cudaHostAlloc() with the flag cudaHostAllocMapped to allocate device-mapped host-accessible memory.
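The quoted advice means the staging buffer has to come from cudaHostAlloc() in the first place, instead of being pinned after an ordinary allocation. Here is a minimal sketch of that ownership pattern, assuming a grow-only buffer like the one in the pipeline code. The class stores the exact pointer its allocator returned and frees only that pointer. `StagingBuffer` and `MallocPolicy` are hypothetical names. malloc/free stand in for cudaHostAlloc(..., cudaHostAllocMapped)/cudaFreeHost so the sketch builds without CUDA. In the Kaldi code, the returned pointer could then be wrapped in a non-owning `SubVector<BaseFloat>`, but that has not been tested on the Nano.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical allocator policy. On the Nano the two hooks would wrap the
// CUDA runtime instead, e.g.
//   cudaHostAlloc(&p, bytes, cudaHostAllocMapped);  // allocate mapped memory
//   cudaFreeHost(p);                                // release it
// and the device-side address would come from cudaHostGetDevicePointer(&d, p, 0).
// Plain malloc/free is used here so the sketch compiles without CUDA.
struct MallocPolicy {
  static void *Alloc(std::size_t bytes) { return std::malloc(bytes); }
  static void Free(void *p) { std::free(p); }
};

// Grow-only staging buffer that owns exactly the pointer its allocator
// returned, so Free() only ever receives a pointer that Alloc() produced.
template <typename Policy>
class StagingBuffer {
 public:
  StagingBuffer() = default;
  StagingBuffer(const StagingBuffer &) = delete;
  StagingBuffer &operator=(const StagingBuffer &) = delete;
  ~StagingBuffer() { Release(); }

  // Ensures room for at least `count` floats. On growth it reallocates to
  // 2x the request, like the "allocated array 2x size" step in the pipeline.
  float *Reserve(std::size_t count) {
    if (capacity_ < count) {
      Release();
      void *p = Policy::Alloc(2 * count * sizeof(float));
      if (p == nullptr) throw std::bad_alloc();
      data_ = static_cast<float *>(p);
      capacity_ = 2 * count;
    }
    assert(data_ != nullptr || count == 0);
    return data_;
  }

  std::size_t Capacity() const { return capacity_; }

 private:
  // Frees the block previously returned by Policy::Alloc, if any.
  void Release() {
    if (data_ != nullptr) Policy::Free(data_);
    data_ = nullptr;
    capacity_ = 0;
  }

  float *data_ = nullptr;
  std::size_t capacity_ = 0;
};
```

The buffer is reallocated only when a batch exceeds the current capacity, so the free/alloc pair runs rarely, and always on memory this class allocated itself.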

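I modified cudadecoder/batched-threaded-nnet3-cuda-pipeline.cc as follows. The changed lines comment out cudaHostUnregister()/cudaHostRegister() and call cudaFreeHost()/cudaHostAlloc() instead: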

 void BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures(
    int32 first, std::vector<TaskState *> &tasks,
    OnlineCudaFeaturePipeline &feature_pipeline) {
  KALDI_ASSERT(config_.gpu_feature_extract == true);
  nvtxRangePushA("CopyBatchWaves");
  // below we will pack waves into a single buffer for efficient transfer
  // across device

  // first count the total number of elements and create a single large
  // vector
  int count = 0;
  for (int i = first; i < tasks.size(); i++) {
    count += tasks[i]->task_data->wave_samples->Dim();
  }

  // creating a thread local vector of pinned memory.
  // wave data will be staged through this memory to get
  // more efficient non-blocking transfers to the device.
  thread_local Vector<BaseFloat> pinned_vector;

  if (pinned_vector.Dim() < count) {
    // WAR:  Not pinning memory because it seems to impact
    // correctness we are continuing to look into a fix but want to
    // commit this workaround as a temporary measure.
    if (pinned_vector.Dim() != 0) {
      //cudaHostUnregister(pinned_vector.Data());
      cudaFreeHost(pinned_vector.Data());
    }

    // allocated array 2x size
    pinned_vector.Resize(count * 2, kUndefined);
    //cudaHostRegister(pinned_vector.Data(),
    //                 pinned_vector.Dim() * sizeof(BaseFloat), 0);
    void *ptr = pinned_vector.Data();
    cudaHostAlloc((void **)&ptr, pinned_vector.Dim() * sizeof(BaseFloat), cudaHostAllocMapped);
  }

With this change it successfully decodes a few utterances, but it still sometimes throws this error during decoding:

ERROR ([5.5.847~1527-6a95a]:SynchronizeGpu():cudamatrix/cu-device.cc:629) cudaError_t 1 : "invalid argument" returned from 'cudaGetLastError()'
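The thread does not confirm the cause of this second error, but the patch has one likely contributor. cudaHostAlloc() writes the address of a newly allocated block into its first argument. In the patch, `ptr` is only a local copy of pinned_vector.Data(), so the call overwrites `ptr` and pinned_vector keeps its ordinary pageable storage. The mapped block is leaked. The next time the buffer must grow, cudaFreeHost(pinned_vector.Data()) receives a pointer that cudaHostAlloc() never returned. The runtime would be expected to reject that with cudaErrorInvalidValue (cudaError_t 1, "invalid argument"). Because the return value is not checked, the error would only surface at the next cudaGetLastError() in SynchronizeGpu(). The first call (Dim() == 0) skips the free, which would also fit the observation that the first few utterances decode fine. The sketch below reproduces only the pointer mix-up, in plain C++. `FakeHostAlloc` and `PatchPinsVectorStorage` are hypothetical names, and malloc stands in for the CUDA call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for cudaHostAlloc(void **pHost, size_t size, unsigned flags):
// on success it writes a *newly allocated* pointer into *p_host.
int FakeHostAlloc(void **p_host, std::size_t bytes) {
  *p_host = std::malloc(bytes);
  return *p_host != nullptr ? 0 : 1;
}

// Repeats the pattern from the patch and reports whether the vector's own
// storage ended up being the newly allocated block.
bool PatchPinsVectorStorage(std::size_t count) {
  std::vector<float> pinned_vector(count);
  void *ptr = pinned_vector.data();  // ptr is only a local copy of the address
  int err = FakeHostAlloc(&ptr, pinned_vector.size() * sizeof(float));
  assert(err == 0);
  // ptr now points at the new block; pinned_vector.data() is unchanged.
  bool pinned = (ptr == pinned_vector.data());
  std::free(ptr);  // the patch never frees this block, so there it leaks
  return pinned;
}
```

For any non-empty vector the function returns false: the allocation never replaces the vector's storage.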

What is the correct way to run cudadecoder on the Nano?
Thanks

Gary

Mar 5, 2021, 6:59:06 AM
to kaldi-help
Conclusion: use cudadecoder2 instead of cudadecoder1.
cudadecoder1 frequently allocates and frees GPU memory.
cudadecoder2 handles GPU memory allocation better.
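The thread does not describe how cudadecoder2 manages memory. As a generic illustration only, not the cudadecoder2 implementation, the usual alternative to allocating and freeing per batch is to allocate one buffer up front, sized for the largest batch, and reuse it. `PreallocatedBuffer` is a hypothetical name. `std::unique_ptr<float[]>` stands in for a CUDA allocation so the sketch compiles without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical fixed-capacity staging buffer: allocated once, sized for the
// largest batch the decoder is configured to accept, then reused.
class PreallocatedBuffer {
 public:
  explicit PreallocatedBuffer(std::size_t max_samples)
      : data_(new float[max_samples]), capacity_(max_samples) {}

  // Returns storage for `count` samples without allocating anything.
  float *Acquire(std::size_t count) {
    if (count > capacity_)
      throw std::length_error("batch exceeds preallocated capacity");
    return data_.get();
  }

  std::size_t Capacity() const { return capacity_; }

 private:
  std::unique_ptr<float[]> data_;  // single allocation, freed on destruction
  std::size_t capacity_;
};
```

An oversized batch is rejected rather than triggering a reallocation. This keeps allocation out of the decoding loop, at the cost of fixing the maximum batch size in advance.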

Closing this issue.

Daniel Povey

Mar 5, 2021, 9:17:22 AM
to kaldi-help
Good.

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/6d45fe2e-1a41-4f48-8764-01f79a84705cn%40googlegroups.com.