A COMPARATIVE ANALYSIS OF OPTIMIZED NEURAL NETWORK AND LARGE-SCALE LANGUAGE MODELS FOR MUSIC GENRE CLASSIFICATION
Abstract
Abstract: The rapid growth of the digital music industry calls for accurate music genre classification systems that improve the user experience in streaming services. This study compares a domain-specific Long Short-Term Memory (LSTM) network with three Large Language Models (LLMs), namely HuBERT, WavLM, and Wav2Vec 2.0, for Music Genre Classification (MGC). The LSTM was trained on Mel-spectrograms derived from the GTZAN dataset, while the LLMs were fine-tuned on a smaller set of raw audio samples because of computational constraints. All models were tested on datasets with identical genre labels to keep the evaluation fair. The LSTM achieved the highest accuracy, 97.10%, outperforming HuBERT (86.00%), WavLM (83.00%), and Wav2Vec 2.0 (80.00%). It also generalized better and trained stably without overfitting, whereas the LLMs struggled to differentiate genres with similar acoustic characteristics. These findings indicate that general-purpose pre-trained models, although powerful, are less effective on music-specific tasks because of domain mismatch. Incorporating music-specific features and architectures therefore remains essential for accurate and reliable automatic genre classification.
Keywords: audio large language models; comparative deep learning; music genre classification.
Abstract (Indonesian version): The rapid growth of the digital music industry demands accurate music genre classification systems to improve the user experience in streaming services. This research is motivated by the rapid development of deep learning models, particularly LSTM networks and large language models (LLMs) such as HuBERT, WavLM, and Wav2Vec 2.0, which have shown strong audio representation capabilities. Its aim is to compare a domain-specific Long Short-Term Memory (LSTM) network with three Large Language Models (LLMs), namely HuBERT, WavLM, and Wav2Vec 2.0, on the Music Genre Classification (MGC) task. The method involves training the LSTM on Mel-spectrogram data transformed from the GTZAN dataset, while the LLMs are fine-tuned on a smaller amount of raw audio data because of computational constraints. All models are tested on datasets with the same genre labels to ensure a fair evaluation. The results show that the LSTM model achieved the highest accuracy of 97.10%, whereas HuBERT, WavLM, and Wav2Vec 2.0 obtained 86.00%, 83.00%, and 80.00%, respectively. The LSTM model generalized better without overfitting, while the LLMs tended to struggle to distinguish genres with similar acoustic characteristics. The study concludes that domain mismatch significantly limits the performance of general-purpose models when applied to music-based tasks. The use of music-specific features and architectures is therefore essential for building more accurate genre classification systems.
Keywords (Indonesian version): music genre classification; large language models; comparative deep learning.