EFFICIENTNET MODEL FOR BONE AGE PREDICTION
Abstract
Accurate bone age estimation is essential for monitoring pediatric growth, diagnosing endocrine disorders, and supporting clinical decision-making. Although deep learning has improved prediction accuracy, few studies have systematically examined how increasing model depth affects performance and reliability. This study evaluates progressively deeper convolutional neural networks, specifically EfficientNet variants B0 to B5, for bone age estimation from hand radiographs. Experiments used 12,611 hand X-ray images from the RSNA Pediatric Bone Age Challenge dataset, obtained from Kaggle. To ensure a fair comparison, all models were trained with the same training pipeline. Performance was evaluated using Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Concordance Correlation Coefficient (CCC), and the Pearson correlation coefficient. Prediction accuracy improved consistently as model depth increased. Among the evaluated models, EfficientNet-B5 performed best, with an MAE of 21.5 months, a MAPE of 6.23%, a CCC of 0.9148, and a Pearson’s r of 0.9203. These findings indicate that model scaling plays an important role in prediction robustness and clinical reliability. Future work should emphasize external validation across diverse populations and incorporate interpretability techniques, such as Grad-CAM, to improve clinical transparency and trust.
Keywords: bone age prediction; deep learning; model evaluation; clinical validation
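For readers less familiar with the four evaluation metrics, the following is a minimal sketch of how they can be computed from true and predicted bone ages in months. It is illustrative only and is not the evaluation code used in this study; the function name `regression_metrics` and the sample values are hypothetical. CCC follows Lin's standard definition with population (biased) variances.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, MAPE (%), CCC, and Pearson's r for two 1-D sequences.

    Hypothetical helper for illustration; assumes no true value is zero
    (MAPE is undefined otherwise) and that both inputs vary.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true

    # Mean Absolute Error, in the same unit as the inputs (months here).
    mae = np.mean(np.abs(err))

    # Mean Absolute Percentage Error, relative to the true value.
    mape = 100.0 * np.mean(np.abs(err) / np.abs(y_true))

    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()  # population variances
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))

    # Pearson's r measures linear association only.
    pearson = cov / np.sqrt(var_t * var_p)

    # CCC additionally penalizes shifts in mean and differences in scale.
    ccc = 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

    return {"mae": mae, "mape": mape, "ccc": ccc, "pearson": pearson}


if __name__ == "__main__":
    # Made-up ages in months, not data from the RSNA set.
    true_age = [60.0, 96.0, 120.0, 150.0, 180.0]
    pred_age = [70.0, 106.0, 130.0, 160.0, 190.0]  # constant +10 month bias
    print(regression_metrics(true_age, pred_age))
```

In the made-up example, every prediction is offset by the same 10 months, so Pearson's r stays at 1 while CCC drops below 1. This is why reporting CCC alongside Pearson's r is informative for agreement between predicted and reference bone age.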