A novel machine learning-based approach is proposed for detecting DTMF tones affected by noise and by frequency and time variations, using the k-nearest neighbour (KNN) algorithm. The features used to train the proposed KNN classifier are extracted with Goertzel's algorithm, which estimates the absolute discrete Fourier transform (DFT) coefficient values at the fundamental DTMF frequencies, with or without their second harmonic frequencies. The proposed KNN classifier is configured in four different ways, which differ in whether the model is trained with or without augmented data and whether the second harmonic DFT coefficient values are included as features.
The model trained on the augmented data set, with the absolute DFT values at the second harmonics of the eight fundamental DTMF frequencies included as additional features, achieved the best performance: a macro F1 classification score of 0.980835, a five-fold stratified cross-validation accuracy of 98.47% and a test data set detection accuracy of 98.1053%.
The generated DTMF signals have been classified and detected using the proposed KNN classifier, which uses the DFT coefficients at both the fundamental and the second harmonic frequencies for better classification. Additionally, the proposed KNN classifier has been compared with existing models to demonstrate its superiority and state-of-the-art performance.
Nagi et al. [20] proposed an AI-based method that uses support vector machines (SVM) to detect DTMF signals contaminated by white Gaussian noise (WGN). Pao et al. [21] assigned three weighting functions and compared the performance of weighted k-nearest neighbour (KNN), weighted D-KNN and conventional KNN in identifying ten digits in a Mandarin database. A genetic algorithm combined with KNN has been suggested for content-based audio retrieval [22]; the audio files presented at the input of the system are ranked at its output by their similarity, computed from varying features. Ali et al. [23] investigated the performance of the KNN classifier on heterogeneous data by measuring similarity with distance measures suited to binary and numerical data. Daponte et al. [24] presented a new method to decode DTMF tones efficiently using an artificial neural network (ANN) and implemented it on a digital signal processor (DSP). After a suitable training phase, the ANN produces an output identifying the DTMF signal. Salamon et al. [25] proposed a convolutional neural network (CNN)-based architecture to classify environmental sounds and demonstrated the importance of data augmentation in overcoming data scarcity.
The major contributions of the proposed research are as follows: a machine learning-based DTMF detection technique is proposed for identifying DTMF signals that have been subjected to additive white Gaussian noise as well as large frequency and time variations.
The performance is analysed and compared with that of existing detection models using classification precision, recall and F1 score, together with the five-fold stratified cross-validation accuracy.
The absolute values of the DFT coefficients in Goertzel's algorithm are computed using a second-order recursive digital resonator [12], as shown in Figure 1. Instead of computing all N DFT values, the algorithm evaluates only those at the DTMF frequencies, using a bank of eight filters, or sixteen when the second harmonics are included. The DFT index is $k = N f / f_s$, where $f$, $N$ and $f_s$ are the DTMF frequency, the block length and the sampling frequency, respectively.
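As a sketch of the second-order resonator described above, the following Python function runs the Goertzel recursion at a single frequency and returns $|X(k)|$ with $k = N f / f_s$. The sampling rate of 8000 Hz, the block length N = 205 and the name `goertzel_magnitude` are illustrative assumptions, not values taken from the text.

```python
import math

import numpy as np

# Standard DTMF low-group (row) and high-group (column) frequencies in Hz.
DTMF_LOW = (697, 770, 852, 941)
DTMF_HIGH = (1209, 1336, 1477, 1633)


def goertzel_magnitude(x, f, fs):
    """Return |X(k)| for k = N*f/fs using the second-order Goertzel recursion."""
    n = len(x)
    k = n * f / fs                      # DFT index as defined in the text (not rounded here)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:
        # Resonator: s[n] = x[n] + 2cos(w) s[n-1] - s[n-2]
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # |X(k)|^2 = s[N-1]^2 + s[N-2]^2 - 2cos(w) s[N-1] s[N-2]
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return math.sqrt(max(power, 0.0))


fs, n = 8000, 205                       # assumed sampling rate (Hz) and block length N
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 697 * t) + np.sin(2 * np.pi * 1209 * t)   # key "1"
mags = {f: goertzel_magnitude(x, f, fs) for f in DTMF_LOW + DTMF_HIGH}
row = max(DTMF_LOW, key=mags.get)       # strongest low-group frequency
col = max(DTMF_HIGH, key=mags.get)      # strongest high-group frequency
print(row, col)                         # prints 697 1209
```

Only one multiply per sample is needed inside the loop, which is why the method is cheaper than a full FFT when just eight or sixteen bins are of interest.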
In the recent past, KNN models have been widely used for classification, detection and pattern recognition, among other applications in which high accuracy is required but a human-readable model is not. This section explains the preprocessing required for the proposed model.
A shortage of data is one of the most prevalent issues in data science problems. Data augmentation generates synthetic data from existing data sets and thereby enhances the generalization capability of the classifier model. The proposed DTMF detection method is designed to be robust to the noise, frequency and time variations that corrupt audio signals when DTMF tones are transmitted over a telecommunication network channel. To achieve this, the clean data set of 2032 audio files was augmented by turning each audio file into 10 corrupted audio files with different random errors, creating a final data set of 20,320 audio files. The audio files were corrupted by adding white Gaussian noise (AWGN) and by applying time stretching, time shifting and volume control.
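A minimal NumPy sketch of the four corruptions listed above follows. The parameter ranges (SNR between 5 and 30 dB, stretch factor 0.9 to 1.1, shifts of up to 10% of the signal length, gain 0.5 to 1.5) and all function names are assumptions for illustration; the text does not state the values used. The stretch here is a crude linear-interpolation resample, which also scales the tone frequencies, so it mixes time and frequency variation rather than being a pitch-preserving time stretch.

```python
import numpy as np


def add_awgn(x, snr_db, rng):
    """Add white Gaussian noise so that the result has the requested SNR in dB."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)


def time_stretch(x, rate):
    """Resample by linear interpolation; rate > 1 shortens the signal.
    Unlike a phase-vocoder stretch, this also scales every frequency by `rate`."""
    n_out = int(round(len(x) / rate))
    new_idx = np.linspace(0, len(x) - 1, n_out)
    return np.interp(new_idx, np.arange(len(x)), x)


def time_shift(x, max_shift, rng):
    """Circularly shift the signal by a random number of samples in [-max_shift, max_shift]."""
    return np.roll(x, rng.integers(-max_shift, max_shift + 1))


def augment(x, rng, n_copies=10):
    """Return n_copies corrupted versions of x (all parameter ranges are assumptions)."""
    copies = []
    for _ in range(n_copies):
        y = time_stretch(x, rng.uniform(0.9, 1.1))
        y = time_shift(y, max_shift=len(y) // 10, rng=rng)
        y = y * rng.uniform(0.5, 1.5)                 # volume control
        y = add_awgn(y, snr_db=rng.uniform(5, 30), rng=rng)
        copies.append(y)
    return copies


rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs                  # one second of key "5" (770 Hz + 1336 Hz)
clean = np.sin(2 * np.pi * 770 * t) + np.sin(2 * np.pi * 1336 * t)
corrupted = augment(clean, rng)
print(len(corrupted))                   # prints 10
```

Applied to every clean file, ten copies each turns 2032 files into 20,320, matching the size of the augmented set.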
In any automated audio recognition or classification system, arguably the most important step is extracting the features used to train the classifier model. These features must be distinctive: they should identify and differentiate the spectral content of the audio file while ignoring redundant information such as background noise. Our approach extracts the absolute values of the DFT components at the 8 DTMF frequencies, namely the 4 high-group and the 4 low-group frequencies, along with their second harmonics, giving a total of 16 computations. Goertzel's algorithm is employed to compute the magnitudes of these 16 individual DFT coefficients. Data exploration indicated that the coefficients computed by Goertzel's algorithm serve as viable features for training the model to produce accurate predictions.
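The sketch below assembles the feature vector described above: Goertzel magnitudes at the eight fundamentals, optionally followed by their eight second harmonics. It repeats a compact version of the recursion so that it runs on its own. The feature ordering, the 8000 Hz sampling rate and the helper names are assumptions. At 8000 Hz the highest second harmonic (2 × 1633 = 3266 Hz) stays below the 4000 Hz Nyquist limit, which the function checks.

```python
import math

import numpy as np

DTMF_LOW = (697, 770, 852, 941)
DTMF_HIGH = (1209, 1336, 1477, 1633)
FUNDAMENTALS = DTMF_LOW + DTMF_HIGH


def goertzel_magnitude(x, f, fs):
    """|X(k)| at k = N*f/fs via the second-order Goertzel recursion."""
    coeff = 2.0 * math.cos(2.0 * math.pi * f / fs)
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:
        # New state uses the previous two states; old s_prev becomes s_prev2.
        s_prev, s_prev2 = sample + coeff * s_prev - s_prev2, s_prev
    return math.sqrt(max(s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2, 0.0))


def extract_features(x, fs, include_harmonics=True):
    """Return 8 (fundamentals only) or 16 (plus second harmonics) Goertzel magnitudes."""
    freqs = list(FUNDAMENTALS)
    if include_harmonics:
        freqs += [2 * f for f in FUNDAMENTALS]
    if max(freqs) >= fs / 2:
        raise ValueError("sampling rate too low for the requested frequencies")
    return np.array([goertzel_magnitude(x, f, fs) for f in freqs])


fs = 8000
t = np.arange(205) / fs
x = np.sin(2 * np.pi * 852 * t) + np.sin(2 * np.pi * 1209 * t)   # key "7"
print(extract_features(x, fs).shape, extract_features(x, fs, False).shape)   # prints (16,) (8,)
```

The `include_harmonics` flag mirrors the two feature configurations compared across the KNN models.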
The proposed machine learning-based KNN classifier is shown in Figure 2. As discussed in the previous section, the data set is acquired by web scraping in accordance with the prevalent technical specifications for DTMF tones. Next, the data set is augmented with noise and other channel impairments to make the model more robust to noise and other interference. For training and testing, the data set is randomly divided in a 4:1 ratio: 80% of the data is used to train the model and the remaining 20% to test its accuracy. Evaluating on this held-out portion exposes over- or under-fitting of the model.
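The 4:1 random split and the training step can be sketched with scikit-learn, assumed here as a Python stand-in for the MATLAB tooling the text uses. The synthetic Gaussian clusters are hypothetical placeholders for the Goertzel feature matrix (one row per audio file, one column per feature), and `n_neighbors=5` is an arbitrary choice rather than a tuned value.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_classes, per_class, n_features = 16, 50, 16        # hypothetical sizes
centers = rng.uniform(0, 100, size=(n_classes, n_features))
y = np.repeat(np.arange(n_classes), per_class)
X = centers[y] + rng.normal(0, 1, size=(len(y), n_features))   # placeholder features

# Random 4:1 split: 80% for training, the remaining 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)          # fraction of test samples classified correctly
print(len(X_train), len(X_test), round(test_accuracy, 3))
```

KNN has no training phase beyond storing the samples, so `fit` is cheap and the cost moves to prediction, where each query is compared against the stored training rows.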
KNN Model A: The model is trained on the clean (non-augmented) data set of 2032 audio files but tested on the augmented data set to simulate real-world environments. The DFT coefficient values at the second harmonics are not considered in this model.
KNN Model B: The model is trained on the clean (non-augmented) data set of 2032 audio files and tested on the augmented data set. The DFT coefficient values at the second harmonics are considered in this model.
Training the model on the augmented data set makes it more robust to noise and other signal interference. Moreover, including the second harmonic frequency values as features enhances the model's ability to distinguish DTMF tones from noise. We hypothesised that combining both measures would yield a model less susceptible to noise, and the results support this hypothesis.
If so, the algorithm deems that x is overexploiting. The exploration ratio therefore controls the trade-off between exploring new points for a better global solution and concentrating near points that have already been examined. The resulting optimised hyperparameter values for each KNN model are shown in Table 2.
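The text tunes each model with Bayesian optimisation governed by an exploration ratio. As a simpler, swapped-in alternative, the sketch below runs an exhaustive grid search over the number of neighbours, the distance metric and the distance weighting, scoring each candidate by five-fold stratified cross-validation accuracy. It assumes scikit-learn; the grid values and the placeholder data are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
centers = rng.uniform(0, 100, size=(16, 16))
y = np.repeat(np.arange(16), 40)
X = centers[y] + rng.normal(0, 5, size=(len(y), 16))   # placeholder feature matrix

param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9],
    "metric": ["euclidean", "manhattan", "chebyshev"],
    "weights": ["uniform", "distance"],
}
# Five folds, each preserving the class proportions of the full data set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```

`best_score_` is the mean accuracy over the five stratified folds for the best candidate, the same kind of figure as the cross-validation accuracy reported for the models. A grid search evaluates every combination (30 here), whereas Bayesian optimisation chooses each next point from a surrogate model and usually needs fewer evaluations on larger search spaces.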
The proposed KNN classifier models are built and compared on the reported performance metrics, and the results are evaluated to choose the best model. To judge all models impartially, a test data set formed by randomly selecting 20% of the augmented data set, which best simulates a real-world telecommunication channel, is given as input to each model, and the predicted responses are used to plot a confusion matrix (confusion chart). The performance metrics are computed from this confusion chart, since it shows how well the model performs on new, unseen data. The metrics analysed are the macro-precision, macro-recall and macro F1 classification scores, together with the overall test data set detection accuracy. The proposed KNN models are simulated and validated in MATLAB R2019.
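As a sketch of how such metrics follow from a confusion chart, the NumPy function below computes per-class precision and recall, averages them into macro scores, and takes overall accuracy as the sum of the diagonal divided by the total count. Here macro F1 is the mean of the per-class F1 scores, one common definition; the harmonic mean of macro-precision and macro-recall is another and generally gives a slightly different value. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def macro_metrics(cm):
    """cm[i, j] counts test samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)          # column sums: how often each class was predicted
    actual = cm.sum(axis=1)             # row sums: how often each class truly occurs
    # Guard against classes that are never predicted (or absent) to avoid 0/0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return precision.mean(), recall.mean(), f1.mean(), accuracy


# Toy two-class confusion matrix (rows = true class, columns = predicted class).
p, r, f1, acc = macro_metrics([[8, 2], [1, 9]])
print(round(p, 4), round(r, 4), round(f1, 4), round(acc, 4))   # prints 0.8535 0.85 0.8496 0.85
```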
The first three models yielded results inferior to those of the final model (KNN Model D). Since these models are not the focus of this article, we do not discuss their results in detail; their macro precision, recall and F1 classification scores, along with their test data set detection accuracy, are generated and discussed in the following subsection.
The confusion charts for the five-fold stratified cross-validation and for the test data set of KNN Model D are shown in Figures 3 and 4, respectively. The per-class recall, precision and F1 score computed on the test data set are tabulated in Table 3.
From Figure 3 , it is observed that the proposed KNN model D achieved a mean five-fold stratified cross-validation detection accuracy of 98.47%. Moreover, Figure 4 shows that the proposed model is able to detect the DTMF signals in the test data set with an overall detection accuracy of 98.1053%.
The values of macro-precision and macro-recall are 0.980938 and 0.98075, respectively, and the macro F1 classification score is 0.980835. The performance metrics of the four proposed KNN models are tabulated in Table 4.
From Table 4, it is observed that the proposed KNN classifier Model D, which uses Goertzel's algorithm to compute DFT values at all 16 frequencies (the 8 fundamental frequencies and their second harmonics), has the highest metrics and is thus the best of the four models for decoding DTMF signals. Since the model is tested on an augmented, noisy data set that reflects the noise encountered in real-world environments such as telecommunication channels, the accuracy achieved can be considered robust and reliable.