I am trying to combine data from two datasets using the utils/combine_data.sh script. At the validation step, it keeps removing one of the datasets using the filters. The output is shown below:
utils/combine_data.sh: combined utt2spk
utils/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/combine_data.sh [info]: not combining utt2dur as it does not exist
utils/combine_data.sh [info]: not combining feats.scp as it does not exist
utils/combine_data.sh: combined text
utils/combine_data.sh [info]: not combining cmvn.scp as it does not exist
utils/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/combine_data.sh: combined wav.scp
utils/combine_data.sh [info]: not combining spk2gender as it does not exist
utils/validate_data_dir.sh: Error: in data/combine_AmJm_2000_tmp, recording-ids extracted from segments and wav.scp
utils/validate_data_dir.sh: differ, partial diff is:
73a74,4896
> sp0.9-fabm2aa1
> sp0.9-fabm2ab2
> sp0.9-fabm2ac1
> sp0.9-fabm2ad2
> sp0.9-fabm2ae2
...
> sp1.1-mwjk2dq2
> sp1.1-mwjk2dr2
> sp1.1-mwjk2ds2
> sp1.1-mwjk2du2
> sp1.1-mwjk2dv2
> sp1.1-mwjk2dw2
[Lengths are kaldi.FfVS/recordings=146 versus kaldi.FfVS/recordings.wav=9792]
steps/make_mfcc.sh --cmd run.pl --nj 50 data/combine_AmJm_2000_tmp exp/make_mfcc/combine_AmJm_2000_tmp mfcc_perturbed
utils/validate_data_dir.sh: Error: in data/combine_AmJm_2000_tmp, recording-ids extracted from segments and wav.scp
utils/validate_data_dir.sh: differ, partial diff is:
73a74,4896
> sp0.9-fabm2aa1
> sp0.9-fabm2ab2
> sp0.9-fabm2ac1
> sp0.9-fabm2ad2
> sp0.9-fabm2ae2
...
> sp1.1-mwjk2dq2
> sp1.1-mwjk2dr2
> sp1.1-mwjk2ds2
> sp1.1-mwjk2du2
> sp1.1-mwjk2dv2
> sp1.1-mwjk2dw2
[Lengths are kaldi.gdNw/recordings=146 versus kaldi.gdNw/recordings.wav=9792]
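
The diff above compares the recording-ids in the second field of segments with those in the first field of wav.scp. A check along these lines may help narrow down which directory contributes the unmatched recording-ids. It is only a sketch: missing_recordings is a made-up helper name, not a Kaldi utility, and it assumes the standard data directory layout (segments lines of the form "utt-id recording-id start end", wav.scp lines of the form "recording-id path-or-command").

```shell
# Hypothetical helper (not part of Kaldi): print every recording-id that
# appears in <dir>/wav.scp but is never referenced in <dir>/segments.
missing_recordings() {
  dir=$1
  if [ ! -f "$dir/segments" ]; then
    # A directory without a segments file has no recording-ids to compare.
    echo "missing_recordings: no segments file in $dir" >&2
    return 1
  fi
  # First file (segments): remember field 2, the recording-id.
  # Second file (wav.scp): print field 1 if it was never seen in segments.
  awk 'FILENAME == ARGV[1] { seen[$2] = 1; next }
       !($1 in seen)       { print $1 }' "$dir/segments" "$dir/wav.scp"
}

# Example call on the combined directory from the log above:
#   missing_recordings data/combine_AmJm_2000_tmp | head
```

Running it on each input directory before combining, and then on the combined directory, would show whether one of the inputs lacks a segments file or has one that covers only part of its wav.scp.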