PERFORMANCE ANALYSIS OF RESNET50 AND INCEPTIONV3 MODELS FOR AN IMAGE CAPTION GENERATOR

Hasif Priyambudi, Arifiyanto Hadinegoro

Abstract


Abstract: Automatically generating image captions is one of the challenges in computer vision. This capability can be helpful in many applications, for example search engines. Many image classification architectures are currently available for building an image captioning model. In this article, we compare the performance of the ResNet50 and InceptionV3 models for image captioning. We use 2000 images (1800 for training and 200 for validation), each paired with 5 example captions, to train the models. After the models are trained, we evaluate them on 100 held-out images, each with 5 reference captions that were not used during training or validation. The result of this research is that the InceptionV3 model outperforms ResNet50: it scores 0.53 on BLEU-1, 0.35 on BLEU-2, 0.18 on BLEU-3, 0.09 on BLEU-4, and 0.35 on METEOR, while the ResNet50 model scores 0.51 on BLEU-1, 0.31 on BLEU-2, 0.16 on BLEU-3, 0.06 on BLEU-4, and 0.33 on METEOR.
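The BLEU-n scores reported above measure clipped n-gram precision between a generated caption and its reference captions, scaled by a brevity penalty. A minimal pure-Python sketch of a single-sentence BLEU-n (the example sentences are illustrative, not drawn from the paper's dataset):

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu_n(references, candidate, n=1):
    """Clipped n-gram precision times a brevity penalty (single sentence)."""
    cand = ngram_counts(candidate, n)
    if not cand:
        return 0.0
    # Clip each candidate n-gram count by its max count over the references
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    precision = clipped / sum(cand.values())
    # Brevity penalty against the reference closest in length
    ref_len = min((len(r) for r in references), key=lambda rl: abs(rl - len(candidate)))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / max(len(candidate), 1))
    return bp * precision

references = ["a dog runs across the grass".split(),
              "a brown dog is running on a field".split()]
candidate = "a dog is running on the grass".split()
bleu1 = bleu_n(references, candidate, n=1)  # every unigram is covered -> 1.0
bleu2 = bleu_n(references, candidate, n=2)  # 5 of 6 bigrams match -> 0.83
```

As in the results above, scores drop as n grows, since longer n-grams must match the references exactly.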

           
Keywords: caption image; inceptionv3; LSTM; resnet50
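In the CNN + LSTM setup the keywords refer to, the CNN (ResNet50 or InceptionV3) encodes the image and the LSTM decoder generates the caption one word at a time until an end token appears. A minimal sketch of that greedy decoding loop, where `predict_next` and the `startseq`/`endseq` tokens are hypothetical stand-ins for the trained model's interface:

```python
def greedy_decode(predict_next, start_token="startseq", end_token="endseq", max_len=20):
    """Generate a caption word by word until end_token or max_len.

    predict_next(prefix) -> next word; in a real pipeline this would wrap
    the CNN image features plus the LSTM decoder (hypothetical interface).
    """
    prefix = [start_token]
    for _ in range(max_len):
        word = predict_next(prefix)
        prefix.append(word)
        if word == end_token:
            break
    # Drop the control tokens before returning the caption text
    return " ".join(w for w in prefix if w not in (start_token, end_token))

# Toy stand-in model: emits a fixed caption regardless of the prefix
script = iter(["a", "dog", "runs", "endseq"])
caption = greedy_decode(lambda prefix: next(script))
print(caption)  # a dog runs
```

Greedy decoding picks the single most probable word at each step; beam search, which keeps several candidate prefixes, is a common alternative when caption quality matters more than speed.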

 

 






DOI: https://doi.org/10.33330/jurteksi.v9i3.2277



Lembaga Penelitian dan Pengabdian Kepada Masyarakat (LPPM) STMIK ROYAL 

Copyright © LPPM STMIK ROYAL

 

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.