Hi Yoshiki,
Sorry for the delay.
First, regarding the discrepancy in the number of positions in the SONICOM HRTF dataset: it arises from the repetition of the top measurement at the 90-degree elevation. The measurement rig uses a loudspeaker arch, and the participant is rotated and measured every 5 degrees, so the 90-degree elevation position is measured multiple times (36 times in total). Initially, the SONICOM HRTF dataset retained the 35 redundant measurements, and the original upsampling paper used those HRTFs, which contain 828 transfer functions. The redundant positions have since been removed, leaving only the measurements from the 793 unique positions.
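To make the counting concrete, here is a minimal Python sketch. The grid is a hypothetical toy (two loudspeakers and 36 rotation steps, chosen only so that the overhead position is hit 36 times); it is not the actual SONICOM measurement grid, and `unique_positions` is an illustrative helper rather than anything from the challenge code.

```python
def unique_positions(positions):
    """Drop repeated (azimuth, elevation) pairs, keeping first-seen order.

    All measurements at elevation 90 are treated as one position, because
    azimuth is meaningless directly overhead.
    """
    seen = set()
    unique = []
    for azimuth, elevation in positions:
        key = (0, 90) if elevation == 90 else (azimuth % 360, elevation)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Hypothetical toy rig: 36 rotation steps, loudspeakers at elevations 0 and 90.
toy_grid = [(az, el) for az in range(0, 180, 5) for el in (0, 90)]
toy_unique = unique_positions(toy_grid)

# The overhead position is measured 36 times, so 35 copies are redundant.
toy_redundant = len(toy_grid) - len(toy_unique)
print(len(toy_grid), len(toy_unique), toy_redundant)  # 72 37 35

# Counts quoted in this thread: 828 transfer functions, 35 of them redundant.
remaining = 828 - 35
print(remaining)  # 793
```

The same subtraction explains the two numbers you saw: 828 in the older files, 793 in the current ones.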
Second, regarding the thresholds: they were based on the fact that an HRTF selection method with no personalisation achieves an LSD of around 7 to 8. We therefore believe the thresholds are quite generous; if they are not met, the HRTFs will probably not be very realistic.
The challenge is meant to encourage methods that can deal with sparse measurements, such as ML techniques, rather than the baselines, which will fail in this scenario. So, the baselines' performance was not used to calculate the threshold values, and in fact, the baselines do not pass the thresholds.
For transparency, below is the performance of the two baselines in terms of their mean LSD:
Barycentric baseline:
100 positions: 3.67
19 positions: 4.85
5 positions: 7.69
3 positions: 8.37
SHT baseline (a vanilla approach with no pre-processing):
100 positions: 4.19
19 positions: 5.39
5 positions: 16.83
3 positions: 16.46
Note that both baselines fail to meet the threshold of 7.4 at the sparsest levels (5 and 3 positions). This is not that surprising, as they can only perform a weighted sum of the existing points without any prior knowledge.
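If you want to reproduce that comparison, a small Python sketch follows. It uses only the mean LSD figures listed above and the 7.4 threshold. Treating "pass" as "mean LSD not above the threshold" and applying the same threshold at every sparsity level are assumptions made for illustration, and `failing_levels` is a hypothetical helper, not part of the evaluation code.

```python
LSD_THRESHOLD = 7.4  # value quoted in this thread

# Mean LSD per baseline, keyed by number of measured positions (from this email).
MEAN_LSD = {
    "barycentric": {100: 3.67, 19: 4.85, 5: 7.69, 3: 8.37},
    "sht": {100: 4.19, 19: 5.39, 5: 16.83, 3: 16.46},
}


def failing_levels(results, threshold):
    """Map each method to the sparsity levels whose mean LSD exceeds the threshold."""
    return {
        method: sorted(n for n, lsd in per_level.items() if lsd > threshold)
        for method, per_level in results.items()
    }


failures = failing_levels(MEAN_LSD, LSD_THRESHOLD)
for method, levels in failures.items():
    print(f"{method}: above threshold at {levels} positions")
```

Both baselines come out above the threshold at 3 and 5 positions and below it at 19 and 100.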
Please also note a couple of things:
Although there will be an overall winner, we will also mention the 'winner' for each sparsity level.
You are not limited to upsampling with a single algorithm. We are asking (only) for 12 SOFA files. So, you can use different methods depending on the sparsity level.
I hope this helps to clarify any confusion. Let me know if you have any more questions.
Cheers,
Aidan, along with the rest of the LAP team
---------------------------------------------------------
Dr Aidan Hogg
Lecturer at Queen Mary University of London
Honorary Research Associate at Imperial College London
Centre for Digital Music
Email: a.h...@qmul.ac.uk
--
The IEEE Signal Processing Society is sponsoring the Listener Acoustic Personalisation Challenge.
Hi Yoshiki,
We agree with you that the ITD threshold of 31.6 μs is very strict (probably too strict), and the vanilla baseline approaches often fail to pass it. In light of this, we have decided to increase the ITD threshold to 62.5 μs.
The main reason is that it seems unfair to disqualify an entry only because its ITD is two samples out. ITD performance will still count towards the ranking.
The new threshold value of 62.5 μs (3 samples) has been chosen because the maximum ITD is approximately 660 μs when a sound source is placed at 90° azimuth, and the normal human detection threshold for an ITD is around 10 μs. Thus, the new threshold is around 10% of the average maximum ITD value instead of 5%.
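The arithmetic behind those figures can be checked with a short Python sketch. The 48 kHz sampling rate is an assumption (it is not stated in this thread; it is simply the rate at which 3 samples equal 62.5 μs), and the helper names are illustrative only.

```python
MAX_ITD_US = 660.0          # approximate maximum ITD quoted above
OLD_THRESHOLD_US = 31.6     # previous ITD threshold
NEW_THRESHOLD_US = 62.5     # new ITD threshold
ASSUMED_FS_HZ = 48_000      # assumption: sampling rate not stated in the thread


def samples_to_us(n_samples, fs_hz):
    """Convert a delay in samples to microseconds."""
    return n_samples / fs_hz * 1e6


def percent_of_max(threshold_us, max_itd_us=MAX_ITD_US):
    """Express a threshold as a percentage of the maximum ITD."""
    return 100.0 * threshold_us / max_itd_us


three_samples_us = samples_to_us(3, ASSUMED_FS_HZ)
old_pct = percent_of_max(OLD_THRESHOLD_US)
new_pct = percent_of_max(NEW_THRESHOLD_US)

print(f"3 samples at {ASSUMED_FS_HZ} Hz = {three_samples_us:.1f} us")
print(f"old threshold = {old_pct:.1f}% of max ITD")
print(f"new threshold = {new_pct:.1f}% of max ITD")
```

Under that assumption the old threshold is a little under 5% of the maximum ITD and the new one a little under 10%, which matches the rounded figures above.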
We will update the description document and evaluation code to reflect this new ITD threshold and will send out an announcement shortly. We will also update the plots (in the description document) to reflect the LAP challenge configuration.
In the meantime, as requested, please see the ITD results (together with the ILD and LSD results) for the two baselines below:
Barycentric baseline:
100 positions:
Mean ITD Error: 33.609
Mean ILD Error: 0.564
Mean LSD Error: 3.298
19 positions:
Mean ITD Error: 37.544
Mean ILD Error: 1.798
Mean LSD Error: 4.865
5 positions:
Mean ITD Error: 36.757
Mean ILD Error: 4.683
Mean LSD Error: 7.699
3 positions:
Mean ITD Error: 49.634
Mean ILD Error: 6.861
Mean LSD Error: 8.368
SHT baseline (a vanilla approach with no pre-processing):
100 positions:
Mean ITD Error: 42.126
Mean ILD Error: 0.587
Mean LSD Error: 4.195
19 positions:
Mean ITD Error: 42.455
Mean ILD Error: 1.685
Mean LSD Error: 5.388
5 positions:
Mean ITD Error: 43.542
Mean ILD Error: 7.923
Mean LSD Error: 16.831
3 positions:
Mean ITD Error: 51.818
Mean ILD Error: 9.140
Mean LSD Error: 16.469
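As a rough illustration of what the threshold change means for these baselines, the Python sketch below compares the mean ITD errors listed above against the old and new thresholds. It assumes the ITD errors are reported in μs (the units are not stated next to the numbers) and that an entry fails when its mean error exceeds the threshold; `count_failures` is a hypothetical helper, not the challenge's evaluation code.

```python
OLD_ITD_THRESHOLD_US = 31.6
NEW_ITD_THRESHOLD_US = 62.5

# Mean ITD error per baseline, keyed by number of measured positions.
# Assumption: values are in microseconds.
MEAN_ITD_ERROR = {
    "barycentric": {100: 33.609, 19: 37.544, 5: 36.757, 3: 49.634},
    "sht": {100: 42.126, 19: 42.455, 5: 43.542, 3: 51.818},
}


def count_failures(results, threshold_us):
    """Count (method, sparsity level) pairs whose mean ITD error exceeds the threshold."""
    return sum(
        error > threshold_us
        for per_level in results.values()
        for error in per_level.values()
    )


old_failures = count_failures(MEAN_ITD_ERROR, OLD_ITD_THRESHOLD_US)
new_failures = count_failures(MEAN_ITD_ERROR, NEW_ITD_THRESHOLD_US)
print(f"above old threshold: {old_failures} of 8")
print(f"above new threshold: {new_failures} of 8")
```

With these numbers, every baseline configuration exceeds 31.6 μs and none exceeds 62.5 μs.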
Thank you so much for your help with this, and we apologise again for the last-minute adjustments. Given this is the first time we are running the LAP challenge, we are trying to be as flexible and open as possible.
Thank you for being accommodating.
Cheers,
Aidan, along with the rest of the LAP team