The stethoscope has been considered an invaluable diagnostic tool ever since it was invented in the early 1800s. Auscultation is non-invasive, real-time, inexpensive, and very informative1,2,3. Recent electronic stethoscopes have made lung sounds recordable, which has facilitated studies on the automatic analysis of lung sounds4,5. Abnormal lung sounds include crackles, wheezes, rhonchi, stridor, and pleural friction rubs (Table 1). Crackles, wheezes, and rhonchi are the most common among them, and detecting these sounds greatly aids the diagnosis of pulmonary diseases6,7. Crackles, which are short, explosive, and non-musical, are produced by patients with parenchymal lung diseases such as pneumonia, interstitial pulmonary fibrosis (IPF), and pulmonary edema1,8,9. Wheezes are musical high-pitched sounds associated with airway diseases such as asthma and chronic obstructive pulmonary disease (COPD). Rhonchi are musical low-pitched sounds similar to snores, usually indicating secretions in the airway, and are often cleared by coughing1.
Although auscultation has many advantages, the ability to analyze respiratory sounds varies greatly among clinicians depending on individual clinical experience6,10. Salvatore et al. found that hospital trainees misidentified about half of all pulmonary sounds, as did medical students11. Melbye et al. reported significant inter-observer differences in discriminating expiratory rhonchi and low-pitched wheezes from other sounds, potentially compromising diagnosis and treatment12. These limitations of auscultation have raised the need for a standardized system that can accurately classify respiratory sounds using artificial intelligence (AI). AI-assisted auscultation can support the proper diagnosis of respiratory disease and identify patients in need of emergency treatment. It can be used to screen and monitor patients with various pulmonary diseases, including asthma, COPD, and pneumonia13,14.
Although this field has been actively studied, it is still in its infancy and has significant limitations. Many studies enrolled patients of a limited age group (children only), and some analyzed the sounds of a small number of patients. Studies that used the respiratory sounds of the ICBHI 2017 or the R.A.L.E. Repository database were limited in the types of abnormal sounds covered: the ICBHI database contained crackles and wheezes only, and the R.A.L.E. database lacked rhonchi39.
In this study, we aimed to classify normal respiratory sounds, crackles, wheezes, and rhonchi. We built a database of 1,918 respiratory sounds from adult patients with pulmonary diseases and healthy controls. We then used transfer learning and a convolutional neural network (CNN) to classify these respiratory sounds, combining pre-trained image feature extraction from the time-series respiratory sounds with CNN classification. In addition, we measured how accurately medical students, interns, residents, and fellows categorized breathing sounds, to assess the accuracy of auscultation-based classification in real clinical practice.
In clinical settings, distinguishing abnormal breathing sounds from normal sounds is very important for screening emergency situations and deciding whether to perform additional tests. Our sound database included 1,222 normal sounds and 696 abnormal sounds. We first checked how accurately our deep learning-based algorithm could distinguish abnormal respiratory sounds from normal sounds (Fig. 1). The precision, recall, and F1 score for abnormal lung sounds were 84%, 80%, and 81%, respectively (Table 3). The accuracy was 86.5% and the mean AUC was 0.93 (Fig. 2).
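As a minimal illustration of how such binary (normal vs. abnormal) metrics can be computed, the following scikit-learn sketch uses placeholder labels and scores; the variable names and threshold are our assumptions and not part of the original analysis.

```python
# Hypothetical evaluation sketch for the normal-vs-abnormal task.
# y_true: 0 = normal, 1 = abnormal; y_prob: predicted probability of "abnormal".
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                   # placeholder labels
y_prob = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6])   # placeholder scores
y_pred = (y_prob >= 0.5).astype(int)                           # assumed 0.5 threshold

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))           # abnormal class
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_prob))
```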
Scheme of the classification of respiratory sounds using deep learning. The lung sound database contains normal sounds, crackles, wheezes, and rhonchi. Deep learning was used for two types of classification: the first step discriminates normal sounds from abnormal sounds; the second categorizes abnormal sounds into crackles, wheezes, and rhonchi. (ER: emergency room, ICU: intensive care unit).
ROC of the model for classifying abnormal lung sounds into crackles, wheezes, and rhonchi. Each plot illustrates the ROC of the algorithm on the independent testing set for crackles, wheezes, and rhonchi, with a mean AUC of 0.92.
Respiratory sounds, especially abnormal sounds, have very complicated structures, with noise and positional dependency in time. From a sound-analysis and, in particular, mathematical point of view, their 2-D spectral-domain representation carries more information than the one-dimensional time series. Moreover, a deep learning architecture provides automatic feature extraction, overcoming the difficulties of complicated data, especially image data. For this reason, we adopted a CNN, which is a powerful method in image classification. To find the most effective strategy for classifying respiratory sounds, we also compared the accuracy, precision, recall, and F1 score of each analytic method (Table 5). The CNN classifier showed the best performance with VGG, especially VGG16, compared with InceptionV3, DenseNet201, ResNet50, and ResNet101. Since the VGG architecture has better capability, especially in extracting image features for classification using transfer learning40,41, we adopted it for our AI models.
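The paper does not publish its exact architecture or hyperparameters; the following Keras sketch only illustrates the general idea of VGG16 transfer learning over spectrogram images. The input size, classifier head, and four-class output are our assumptions.

```python
# Hypothetical sketch: frozen VGG16 feature extractor with a small dense head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 4  # normal, crackles, wheezes, rhonchi (assumed label set)

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # transfer learning: keep pre-trained image features fixed

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```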
Additionally, we compared the performance of CNN and SVM classifiers to investigate how the classifier depends on the feature extractor. The CNN showed better performance than the SVM, and VGG16 was the best feature extractor for both the CNN and the SVM. Moreover, the CNN was more computationally efficient than the SVM (Table 6).
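One possible setup for such an SVM baseline, again a sketch under our own assumptions rather than the authors' code, is to feed globally pooled VGG16 features to a scikit-learn SVM; the placeholder data exists only to make the sketch runnable.

```python
# Hypothetical sketch: SVM classifier on frozen VGG16 features.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Frozen feature extractor: VGG16 convolutional base + global average pooling.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
extractor = models.Sequential([base, layers.GlobalAveragePooling2D()])

# Placeholder spectrogram "images" and labels (illustrative only).
X = np.random.rand(8, 224, 224, 3).astype("float32")
y = np.array([0, 1, 2, 3, 0, 1, 2, 3])  # 4 sound classes (assumed)

features = extractor.predict(X, verbose=0)           # shape (8, 512)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(features, y)
print("training accuracy:", svm.score(features, y))
```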
Several studies have tried to automatically classify lung sounds. Chamberlain et al. classified lung sounds with a semi-supervised deep learning algorithm; the AUCs were 0.86 for wheezes and 0.74 for crackles26. Guler et al. used a multilayer perceptron trained with a backpropagation algorithm to predict the presence or absence of adventitious sounds27. They enrolled 56 patients, and a network with two hidden layers yielded a classification performance of 93.8%27.
In addition, our comparison of different feature extractors demonstrated that the CNN classifier performed much better with VGG, especially VGG16, than with InceptionV3 or DenseNet201. The main contribution of this study is the development of a predictive model for respiratory sound classification that combines a pre-trained image feature extractor applied to the time-series respiratory sound with a CNN classifier.
Our deep learning-based classification can detect abnormal lung sounds with an AUC of 0.93 and an accuracy of 86.5%. It achieved similar results when categorizing abnormal sounds into the subcategories of crackles, wheezes, and rhonchi. Considering that these results come from analyzing sounds recorded in a real clinical setting with various noises, they are impressive. We believe these accuracies are adequate for primary screening and follow-up testing of patients with respiratory diseases.
Our test results showed that the auscultation accuracy of interns and residents was less than 80% for all four kinds of sounds, and rhonchi were the most difficult sounds to discriminate. The result of this test is not conclusive because the number of participants was small. However, there are clearly marked differences in the ability of individual clinicians to classify breathing sounds. This suggests that AI-assisted classification could standardize the identification and categorization of breath sounds and greatly aid the diagnosis of pulmonary diseases.
Some respiratory sounds contain two or more mixed abnormal breath sounds. Such sounds are sometimes difficult to classify even for experts, and there may be disagreement among them. Few published studies have classified mixed abnormal breathing sounds, so research on these sounds is needed. In addition, noises such as coughs, voices, heart sounds, and medical alarms are frequently recorded along with breath sounds and reduce the accuracy of analysis, so noise-filtering technology is required.
We found that our deep learning-based classification could classify respiratory sounds accurately. The transfer learning approach, combining pre-trained image feature extraction from respiratory sounds with CNN classification, worked well and helped improve classification accuracy. Although the analysis of mixed abnormal sounds and the filtering of noise remain challenging, recent innovations in analytic algorithms and recording technology will accelerate progress in respiratory sound analysis. In the near future, deep learning-based automated stethoscopes are expected to be used in telemedicine and home care (Fig. 5).
Summary of deep learning-assisted classification of respiratory sounds. Respiratory sounds were collected from patients with pulmonary diseases. The sounds were validated and classified by pulmonologists. The sounds were converted to Mel-spectrograms, and features were extracted by VGG16 (transfer learning). Respiratory sounds were then classified by a CNN. Deep learning-based classification of respiratory sounds can be helpful for the screening, monitoring, and diagnosis of pulmonary diseases.
Recorded sounds ranged from a few seconds to several tens of seconds. We divided them into 6-s segments with 50% overlap. For example, a 14.5-s audio file of wheezing is divided into 3 cycles according to the start and end times (Fig. 7). To perform feature extraction and build the 3-dimensional input data, we used the Mel-spectrogram, the average of the harmonic and percussive Mel-spectrograms, and the derivative of the Mel-spectrogram, computed with the Python library librosa47.
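A minimal sketch of this preprocessing with librosa is shown below. The 6-s window and 50% overlap follow the description above; the sampling rate, number of Mel bands, and example audio clip are our assumptions.

```python
# Hypothetical preprocessing sketch: 6-s segments with 50% overlap, then a
# 3-channel "image" per segment: Mel-spectrogram, mean of the harmonic and
# percussive Mel-spectrograms, and the Mel-spectrogram's time derivative (delta).
import numpy as np
import librosa

def segment(y, sr, seg_sec=6.0, overlap=0.5):
    """Split a waveform into fixed-length segments with the given overlap."""
    seg_len = int(seg_sec * sr)
    hop = int(seg_len * (1.0 - overlap))
    return [y[s:s + seg_len] for s in range(0, max(len(y) - seg_len, 0) + 1, hop)]

def to_three_channel(seg, sr, n_mels=128):
    mel_db = librosa.power_to_db(
        librosa.feature.melspectrogram(y=seg, sr=sr, n_mels=n_mels))
    harm, perc = librosa.effects.hpss(seg)  # harmonic/percussive separation
    mel_h = librosa.power_to_db(
        librosa.feature.melspectrogram(y=harm, sr=sr, n_mels=n_mels))
    mel_p = librosa.power_to_db(
        librosa.feature.melspectrogram(y=perc, sr=sr, n_mels=n_mels))
    hp_mean = (mel_h + mel_p) / 2.0           # average of H/P Mel-spectrograms
    delta = librosa.feature.delta(mel_db)     # derivative along the time axis
    return np.stack([mel_db, hp_mean, delta], axis=-1)  # (n_mels, frames, 3)

# Usage with a bundled librosa example clip (any mono recording would do).
y, sr = librosa.load(librosa.example("trumpet"), sr=22050)
segments = segment(y, sr)
if segments:
    print(to_three_channel(segments[0], sr).shape)
```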