Diarized Speech
If everything went well, you should have a file called rttm in the directory $nnet_dir/xvectors_$name/plda_scores_threshold_${threshold}/. The 2nd column is the recording ID, the 4th column is the start time of a segment, and the 5th is its duration. The 8th column is the speaker label assigned to that segment.
SPEAKER mfcny 0 86.200 16.400 <NA> <NA> 1 <NA> <NA>
SPEAKER mfcny 0 103.050 5.830 <NA> <NA> 1 <NA> <NA>
SPEAKER mfcny 0 109.230 4.270 <NA> <NA> 1 <NA> <NA>
SPEAKER mfcny 0 113.760 8.625 <NA> <NA> 1 <NA> <NA>
SPEAKER mfcny 0 122.385 4.525 <NA> <NA> 2 <NA> <NA>
SPEAKER mfcny 0 127.230 6.230 <NA> <NA> 2 <NA> <NA>
SPEAKER mfcny 0 133.820 0.850 <NA> <NA> 2 <NA> <NA>
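To make the column layout concrete, here is a minimal Python sketch that reads SPEAKER lines like the ones above and sums the speech time per speaker label. The function names parse_rttm and speech_per_speaker are hypothetical helpers written for this illustration, not part of Kaldi, and the sketch assumes whitespace-separated fields as in the sample.

```python
from collections import defaultdict


def parse_rttm(lines):
    """Parse RTTM SPEAKER lines into (recording, start, duration, speaker) tuples."""
    turns = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank or non-SPEAKER lines
        recording = fields[1]        # 2nd column: recording ID
        start = float(fields[3])     # 4th column: start time in seconds
        duration = float(fields[4])  # 5th column: duration in seconds
        speaker = fields[7]          # 8th column: speaker label
        turns.append((recording, start, duration, speaker))
    return turns


def speech_per_speaker(turns):
    """Sum the segment durations for each speaker label."""
    totals = defaultdict(float)
    for _recording, _start, duration, speaker in turns:
        totals[speaker] += duration
    return dict(totals)


# Two lines taken from the sample above.
sample = [
    "SPEAKER mfcny 0 86.200 16.400 <NA> <NA> 1 <NA> <NA>",
    "SPEAKER mfcny 0 122.385 4.525 <NA> <NA> 2 <NA> <NA>",
]
turns = parse_rttm(sample)
totals = speech_per_speaker(turns)
print(sorted(totals))  # the distinct speaker labels that appear
```

The labels are kept as strings because they are only cluster identifiers; the set of distinct labels, not the largest value, tells you how many clusters the file contains.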
wav.scp
VL180810112737108 ./wavFile/VL180810112737108.wav
segments
spk001-VL180810112737108-001 VL180810112737108 0.00 0.76
spk001-VL180810112737108-002 VL180810112737108 1.03 1.92
spk002-VL180810112737108-001 VL180810112737108 1.92 2.86
spk002-VL180810112737108-002 VL180810112737108 7.11 10.47
utt2spk
spk001-VL180810112737108-001 spk001
spk001-VL180810112737108-002 spk001
spk002-VL180810112737108-001 spk002
spk002-VL180810112737108-002 spk002
spk2utt
spk001 spk001-VL180810112737108-001 spk001-VL180810112737108-002
spk002 spk002-VL180810112737108-001 spk002-VL180810112737108-002
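The spk2utt file is just utt2spk inverted, so the two can be checked against each other. A minimal Python sketch of that inversion follows; the function name utt2spk_to_spk2utt is a hypothetical helper for illustration (Kaldi ships its own script, utils/utt2spk_to_spk2utt.pl, for this), and it assumes each utt2spk line is "utterance-id speaker-id".

```python
def utt2spk_to_spk2utt(utt2spk_lines):
    """Group utterance IDs by speaker and return spk2utt lines sorted by speaker."""
    spk2utt = {}
    for line in utt2spk_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        utt, spk = parts
        spk2utt.setdefault(spk, []).append(utt)
    return [f"{spk} {' '.join(utts)}" for spk, utts in sorted(spk2utt.items())]


# The utt2spk entries from the data directory above.
utt2spk = [
    "spk001-VL180810112737108-001 spk001",
    "spk001-VL180810112737108-002 spk001",
    "spk002-VL180810112737108-001 spk002",
    "spk002-VL180810112737108-002 spk002",
]
spk2utt_lines = utt2spk_to_spk2utt(utt2spk)
for entry in spk2utt_lines:
    print(entry)
```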
SPEAKER VL180810112737108 0 0.000 0.760 <NA> <NA> 1 <NA> <NA>
SPEAKER VL180810112737108 0 1.030 1.830 <NA> <NA> 1 <NA> <NA>
SPEAKER VL180810112737108 0 7.110 3.360 <NA> <NA> 1 <NA> <NA>
SPEAKER VL180810112737108 0 11.930 1.330 <NA> <NA> 1 <NA> <NA>
SPEAKER VL180810112737108 0 13.350 1.740 <NA> <NA> 1 <NA> <NA>
SPEAKER VL180810112737108 0 20.050 4.570 <NA> <NA> 1 <NA> <NA>
SPEAKER VL180810112737108 0 24.640 0.320 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 25.730 1.020 <NA> <NA> 1 <NA> <NA>
SPEAKER VL180810112737108 0 27.950 4.210 <NA> <NA> 1 <NA> <NA>
SPEAKER VL180810112737108 0 32.190 0.260 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 33.900 -13.895 <NA> <NA> 1 <NA> <NA>
SPEAKER VL180810112737108 0 20.005 -14.605 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 5.500 0.560 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 6.360 0.640 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 11.300 0.300 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 11.620 0.270 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 16.290 0.230 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 16.680 1.090 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 18.180 0.250 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 18.530 0.630 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 19.260 0.710 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 33.430 0.350 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 0.000 0.760 <NA> <NA> 8 <NA> <NA>
SPEAKER VL180810112737108 0 1.030 1.830 <NA> <NA> 8 <NA> <NA>
SPEAKER VL180810112737108 0 7.110 3.360 <NA> <NA> 8 <NA> <NA>
SPEAKER VL180810112737108 0 11.930 1.330 <NA> <NA> 8 <NA> <NA>
SPEAKER VL180810112737108 0 13.350 1.740 <NA> <NA> 8 <NA> <NA>
SPEAKER VL180810112737108 0 20.050 0.465 <NA> <NA> 1 <NA> <NA>
SPEAKER VL180810112737108 0 20.515 4.105 <NA> <NA> 8 <NA> <NA>
SPEAKER VL180810112737108 0 24.640 0.320 <NA> <NA> 7 <NA> <NA>
SPEAKER VL180810112737108 0 25.730 1.020 <NA> <NA> 8 <NA> <NA>
SPEAKER VL180810112737108 0 27.950 4.210 <NA> <NA> 8 <NA> <NA>
SPEAKER VL180810112737108 0 32.190 0.260 <NA> <NA> 7 <NA> <NA>
SPEAKER VL180810112737108 0 33.900 -13.895 <NA> <NA> 8 <NA> <NA>
SPEAKER VL180810112737108 0 20.005 -14.605 <NA> <NA> 9 <NA> <NA>
SPEAKER VL180810112737108 0 5.500 0.560 <NA> <NA> 2 <NA> <NA>
SPEAKER VL180810112737108 0 6.360 0.640 <NA> <NA> 9 <NA> <NA>
SPEAKER VL180810112737108 0 11.300 0.300 <NA> <NA> 9 <NA> <NA>
SPEAKER VL180810112737108 0 11.620 0.270 <NA> <NA> 3 <NA> <NA>
SPEAKER VL180810112737108 0 16.290 0.230 <NA> <NA> 4 <NA> <NA>
SPEAKER VL180810112737108 0 16.680 1.090 <NA> <NA> 9 <NA> <NA>
SPEAKER VL180810112737108 0 18.180 0.250 <NA> <NA> 5 <NA> <NA>
SPEAKER VL180810112737108 0 18.530 0.630 <NA> <NA> 9 <NA> <NA>
SPEAKER VL180810112737108 0 19.260 0.710 <NA> <NA> 9 <NA> <NA>
SPEAKER VL180810112737108 0 33.430 0.350 <NA> <NA> 6 <NA> <NA>
From this output, do the numbers 1 and 2 in the 8th column indicate which speaker spoke each segment?
Does the output simply label the speakers in this recording as 1 and 2?
The highest number I get in this output is "9", so does that mean there are 9 different speakers in the recording?