The `recordings` are read with `scipy.io.wavfile.read`, which can return the following data types:
=====================  ===========  ===========  =============
WAV format             Min          Max          NumPy dtype
=====================  ===========  ===========  =============
32-bit floating-point  -1.0         +1.0         float32
32-bit PCM             -2147483648  +2147483647  int32
16-bit PCM             -32768       +32767       int16
8-bit PCM              0            255          uint8
=====================  ===========  ===========  =============
Note that 8-bit PCM is unsigned.
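The integer ranges in the table can be checked directly with `np.iinfo`. The snippet below is only a small illustration; the helper name `pcm_range` is hypothetical and not part of NumPy or SciPy.

```python
import numpy as np

def pcm_range(dtype):
    """Return (min, max) of an integer NumPy dtype as Python ints."""
    info = np.iinfo(dtype)
    return int(info.min), int(info.max)

# Print the representable range of each integer PCM dtype from the table.
for dt in (np.int32, np.int16, np.uint8):
    print(np.dtype(dt).name, pcm_range(dt))
```

The `uint8` range starts at 0, which is why 8-bit PCM needs different handling from the signed formats.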
I use the following function to do the conversion (or rather the normalization to [-1, 1], which is the usual practice for audio data):
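import numpy as np

def _to_dtype(data: np.ndarray, dtype: np.dtype = np.float32) -> np.ndarray:
    """Convert integer PCM samples to `dtype`, scaled by the integer type's range."""
    if data.dtype == dtype:
        return data
    if data.dtype in (np.int8, np.uint8, np.int16, np.int32, np.int64):
        # Divide by max + 1 (e.g. 32768 for int16) so signed samples land in [-1, 1).
        # Note: unsigned uint8 input lands in [0, 1) because no offset is removed.
        data = data.astype(dtype) / (np.iinfo(data.dtype).max + 1)
    return data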
The result of `_to_dtype` is at least correct for int16 data (i.e. identical to the data read using librosa or torchaudio):