PERFORMANCE ANALYSIS OF RESNET50 AND INCEPTIONV3 MODELS FOR AN IMAGE CAPTION GENERATOR

Hasif Priyambudi, Arifiyanto Hadinegoro

Abstract


Abstract: Automatically generating image captions is one of the challenges in computer vision. This capability can be helpful in many applications, for example search engines. Many image classification architectures are currently available for building an image captioning model. In this article, we compare the performance of the ResNet50 and InceptionV3 models for image captioning. We use 2000 images (1800 for training and 200 for validation), each paired with 5 example captions, to train the models. After the models are trained, we evaluate them on 100 held-out images, each with 5 reference captions that were not used during training or validation. The result of this research is that the InceptionV3 model outperforms ResNet50: it scores 0.53 on BLEU-1, 0.35 on BLEU-2, 0.18 on BLEU-3, 0.09 on BLEU-4, and 0.35 on METEOR, while the ResNet50 model scores 0.51 on BLEU-1, 0.31 on BLEU-2, 0.16 on BLEU-3, 0.06 on BLEU-4, and 0.33 on METEOR.
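The BLEU-n scores reported above measure clipped n-gram precision between a generated caption and its reference captions, scaled by a brevity penalty. A minimal pure-Python sketch of a single-sentence BLEU-n (the example sentences are illustrative, not drawn from the paper's dataset):

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_n(references, candidate, n=1):
    """Clipped n-gram precision times a brevity penalty (single sentence)."""
    cand = ngram_counts(candidate, n)
    if not cand:
        return 0.0
    # Clip each candidate n-gram count by its max count over the references
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    precision = clipped / sum(cand.values())
    # Brevity penalty against the reference closest in length
    ref_len = min((len(r) for r in references), key=lambda rl: abs(rl - len(candidate)))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / max(len(candidate), 1))
    return bp * precision

references = ["a dog runs across the grass".split(),
              "a brown dog is running on a field".split()]
candidate = "a dog is running on the grass".split()
bleu1 = bleu_n(references, candidate, n=1)  # every unigram is covered -> 1.0
bleu2 = bleu_n(references, candidate, n=2)  # 5 of 6 bigrams match -> 0.83
```

As in the results above, scores drop as n grows, since longer n-grams must match the references exactly.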

           
Keywords: caption image; inceptionv3; LSTM; resnet50
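In the CNN + LSTM setup the keywords refer to, the CNN (ResNet50 or InceptionV3) encodes the image and the LSTM decoder generates the caption one word at a time until an end token appears. A minimal sketch of that greedy decoding loop, where `predict_next` and the `startseq`/`endseq` tokens are hypothetical stand-ins for the trained model's interface:

```python
def greedy_decode(predict_next, start_token="startseq", end_token="endseq", max_len=20):
    """Generate a caption word by word until end_token or max_len.

    predict_next(prefix) -> next word; in a real pipeline this would wrap
    the CNN image features plus the LSTM decoder (hypothetical interface).
    """
    prefix = [start_token]
    for _ in range(max_len):
        word = predict_next(prefix)
        prefix.append(word)
        if word == end_token:
            break
    # Drop the control tokens before returning the caption text
    return " ".join(w for w in prefix if w not in (start_token, end_token))

# Toy stand-in model: emits a fixed caption regardless of the prefix
script = iter(["a", "dog", "runs", "endseq"])
caption = greedy_decode(lambda prefix: next(script))
print(caption)  # a dog runs
```

Greedy decoding picks the single most probable word at each step; beam search, which keeps several candidate prefixes, is a common alternative when caption quality matters more than speed.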

 

 






DOI: https://doi.org/10.33330/jurteksi.v9i3.2277



Lembaga Penelitian dan Pengabdian Kepada Masyarakat (LPPM) STMIK ROYAL 

Copyright © LPPM STMIK ROYAL

 

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.