The Nano is an aarch64 platform.
When I first ran cudadecoder on the Nano, it showed this error:
ERROR ([5.5.776~1-fcc6a3]:SynchronizeGpu():cudamatrix/cu-device.cc:629) cudaError_t 713 : "pointer does not correspond to a registered memory region" returned from 'cudaGetLastError()'
Someone said:
cudaHostRegister() is not supported on ARM platforms.
This is because the caching attribute of an existing allocation can't be changed on the fly.
If required, please use cudaHostAlloc() with the flag cudaHostAllocMapped to allocate device-mapped host-accessible memory.
I modified cudadecoder/batched-threaded-nnet3-cuda-pipeline.cc as follows (my changes were marked in red):
void BatchedThreadedNnet3CudaPipeline::ComputeBatchFeatures(
    int32 first, std::vector<TaskState *> &tasks,
    OnlineCudaFeaturePipeline &feature_pipeline) {
  KALDI_ASSERT(config_.gpu_feature_extract == true);
  nvtxRangePushA("CopyBatchWaves");
  // below we will pack waves into a single buffer for efficient transfer
  // across device
  // first count the total number of elements and create a single large
  // vector
  int count = 0;
  for (int i = first; i < tasks.size(); i++) {
    count += tasks[i]->task_data->wave_samples->Dim();
  }
  // creating a thread local vector of pinned memory.
  // wave data will be staged through this memory to get
  // more efficient non-blocking transfers to the device.
  thread_local Vector<BaseFloat> pinned_vector;
  if (pinned_vector.Dim() < count) {
    // WAR: Not pinning memory because it seems to impact
    // correctness we are continuing to look into a fix but want to
    // commit this workaround as a temporary measure.
    if (pinned_vector.Dim() != 0) {
      //cudaHostUnregister(pinned_vector.Data());
      cudaFreeHost(pinned_vector.Data());
    }
    // allocated array 2x size
    pinned_vector.Resize(count * 2, kUndefined);
    //cudaHostRegister(pinned_vector.Data(),
    //                 pinned_vector.Dim() * sizeof(BaseFloat), 0);
    void *ptr = pinned_vector.Data();
    cudaHostAlloc((void **)&ptr, pinned_vector.Dim() * sizeof(BaseFloat), cudaHostAllocMapped);
  }
It can successfully decode a few utterances, but it sometimes still throws this error during decoding:
ERROR ([5.5.847~1527-6a95a]:SynchronizeGpu():cudamatrix/cu-device.cc:629) cudaError_t 1 : "invalid argument" returned from 'cudaGetLastError()'
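A possible reading of the modified code, offered as an assumption rather than a confirmed diagnosis: cudaHostAlloc() does not pin an existing buffer. It allocates a new one and writes its address into the local ptr, so the memory owned by pinned_vector stays unpinned and the new allocation is never used. On the next resize, cudaFreeHost() is then called on pinned_vector.Data(), which was not returned by cudaHostAlloc(), and that would be consistent with the "invalid argument" error. The sketch below shows one way to avoid the mismatch: a grow-only staging buffer that owns its allocation, so the pointer it frees is always the pointer it allocated. StagingBuffer, AllocFn, FreeFn, DefaultAlloc and DefaultFree are hypothetical names, not part of Kaldi or CUDA, malloc/free stand in for the CUDA calls so the sketch compiles without a GPU, and float stands in for BaseFloat.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocation hooks. On a CUDA device these would wrap
// cudaHostAlloc(..., cudaHostAllocMapped) and cudaFreeHost; plain
// malloc/free stand in here so the sketch builds without CUDA.
using AllocFn = void *(*)(std::size_t bytes);
using FreeFn = void (*)(void *ptr);

inline void *DefaultAlloc(std::size_t bytes) { return std::malloc(bytes); }
inline void DefaultFree(void *ptr) { std::free(ptr); }

// Hypothetical grow-only staging buffer. It owns its allocation, so the
// pointer handed to the free hook always came from the alloc hook.
class StagingBuffer {
 public:
  explicit StagingBuffer(AllocFn alloc = DefaultAlloc,
                         FreeFn free_fn = DefaultFree)
      : alloc_(alloc), free_(free_fn) {}
  ~StagingBuffer() { Release(); }
  StagingBuffer(const StagingBuffer &) = delete;
  StagingBuffer &operator=(const StagingBuffer &) = delete;

  // Ensures room for at least `count` floats. Grows to 2x the request,
  // mirroring the original code. Old contents are not preserved.
  // Returns false if the allocation hook fails.
  bool Reserve(std::size_t count) {
    if (count <= capacity_) return true;
    Release();
    const std::size_t new_capacity = count * 2;
    void *p = alloc_(new_capacity * sizeof(float));
    if (p == nullptr) return false;
    data_ = static_cast<float *>(p);
    capacity_ = new_capacity;
    return true;
  }

  // Valid only after a successful Reserve().
  float *Data() {
    assert(data_ != nullptr);
    return data_;
  }
  std::size_t Capacity() const { return capacity_; }

 private:
  // Frees the current allocation, if any, through the matching hook.
  void Release() {
    if (data_ != nullptr) free_(data_);
    data_ = nullptr;
    capacity_ = 0;
  }

  AllocFn alloc_;
  FreeFn free_;
  float *data_ = nullptr;
  std::size_t capacity_ = 0;
};
```

In the pipeline, a thread_local StagingBuffer would take the place of pinned_vector: call Reserve(count), then copy the waves into Data(). With CUDA-backed hooks, the return codes of cudaHostAlloc() and cudaFreeHost() should be checked inside the hooks.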
What is the correct way to run cudadecoder on the Nano?
Thanks.