Multi Label Sound Classification using Deep Learning Models

Tasnim Akter Onisha, Jongyeop Kim, Jongho Seol

Research output: Contribution to book or proceeding › Conference article › peer-review

Abstract

Accurate and automated sound classification provides a strong foundation for diverse advanced deep learning applications in the audio and music domain. This study applies Convolutional Neural Networks (CNNs) and a combined LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) model to instrument classification from audio signals, contributing to intelligent audio processing systems. Our proposed approach relies exclusively on Mel-frequency cepstral coefficients (MFCCs) extracted from the audio data during preprocessing. A large and complex dataset comprising nineteen instrument classes is used for training and evaluation. The experimental results are promising: on the multi-label instrument classification task, our proposed CNN architecture achieves an accuracy of 97%, while the LSTM-GRU model achieves a lower accuracy of 80%. Although less accurate than the CNN, the LSTM-GRU model can capture temporal dependencies, which offers insight into the dynamics of instrument audio sequences. These findings are relevant to researchers and practitioners in audio signal processing and machine learning.
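The abstract states only that MFCCs are extracted from the audio; it does not give the extraction settings. For readers unfamiliar with the feature, the following is a minimal, dependency-free Python sketch of the standard MFCC pipeline (framing, Hamming window, power spectrum, triangular mel filterbank, log, DCT-II). The parameter values (16 kHz sampling, 25 ms frames, 10 ms hop, 512-point FFT, 26 mel filters, 13 coefficients) and the helper names `mfcc`, `power_spectrum`, `mel_filterbank` and `dct_ii` are illustrative assumptions, not the authors' configuration or code.

```python
import math


def hz_to_mel(hz):
    # HTK-style mel scale (an assumption; other mel formulas exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    # Naive DFT of the zero-padded frame; keeps bins 0..n_fft//2.
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    cos_t = [math.cos(2.0 * math.pi * j / n_fft) for j in range(n_fft)]
    sin_t = [math.sin(2.0 * math.pi * j / n_fft) for j in range(n_fft)]
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(x * cos_t[(k * n) % n_fft] for n, x in enumerate(padded))
        im = -sum(x * sin_t[(k * n) % n_fft] for n, x in enumerate(padded))
        spec.append((re * re + im * im) / n_fft)
    return spec


def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2.
    top = hz_to_mel(sr / 2.0)
    mel_points = [top * i / (n_mels + 1) for i in range(n_mels + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in mel_points]
    filters = []
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            weights[k] = (right - k) / (right - center)
        filters.append(weights)
    return filters


def dct_ii(values, n_coeffs):
    # Orthonormal DCT-II, keeping only the first n_coeffs coefficients.
    n = len(values)
    out = []
    for k in range(n_coeffs):
        s = sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
        out.append((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)) * s)
    return out


def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_mels=26, n_mfcc=13):
    # Defaults are assumed common values, not the paper's settings.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]  # Hamming window
    fbank = mel_filterbank(n_mels, n_fft, sr)
    features = []  # one list of n_mfcc coefficients per frame
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spec = power_spectrum(frame, n_fft)
        energies = [sum(w * p for w, p in zip(f, spec)) for f in fbank]
        log_energies = [math.log(e + 1e-10) for e in energies]  # avoid log(0)
        features.append(dct_ii(log_energies, n_mfcc))
    return features


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2.0 * math.pi * 440.0 * t / sr) for t in range(1600)]  # 0.1 s at 440 Hz
    feats = mfcc(tone, sr=sr)
    print(len(feats), "frames x", len(feats[0]), "coefficients")
```

The naive DFT keeps the sketch self-contained but is slow; in practice a library routine such as `librosa.feature.mfcc` would typically be used, and its default mel scale and filter normalization differ from the HTK-style choices here, so coefficient values will not match exactly. The resulting frames-by-coefficients matrix is the kind of 2-D input a CNN can treat like an image or an LSTM/GRU stack can read as a time sequence.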

Original language: English
Title of host publication: 2024 IEEE/ACIS 22nd International Conference on Software Engineering Research, Management and Applications, SERA 2024 - Proceedings
Editors: Teruhisa Hochin, Jixin Ma, Osamu Mizuno
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 129-134
Number of pages: 6
ISBN (Electronic): 9798350391343
State: Published - 2024
Event: 22nd IEEE/ACIS International Conference on Software Engineering Research, Management and Applications, SERA 2024 - Honolulu, United States
Duration: May 30 2024 to Jun 1 2024

Publication series

Name: 2024 IEEE/ACIS 22nd International Conference on Software Engineering Research, Management and Applications, SERA 2024 - Proceedings

Conference

Conference: 22nd IEEE/ACIS International Conference on Software Engineering Research, Management and Applications, SERA 2024
Country/Territory: United States
City: Honolulu
Period: 05/30/24 to 06/01/24

Scopus Subject Areas

  • Computer Science Applications
  • Software
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Artificial Intelligence

Keywords

  • audio classification
  • CNN
  • deep learning
  • GRU
  • instrument recognition
  • LSTM
  • MFCCs
  • multi-label classification
  • sound classification
