Performance Evaluation on the Effect of Different Text Representation Models on the Image Captioning Systems

Authors

  • Jafar Alkheir
  • Samer Sulaiman
  • Rasha Mualla

Keywords:

Deep Learning, Natural Language Processing, Image Representation, Text Representation, FastText Model, GloVe Model.

Abstract

This research addresses image captioning, one of the most important recent topics in machine learning in general and deep learning in particular. An image-captioning system is built on the ResNet50 model, a deep convolutional neural network (CNN), which is used to extract the image representation features. For the textual representation, five different models are proposed, based mainly on the GloVe and FastText models provided by Twitter and Facebook, respectively. The effect of different vocabulary dictionaries on the performance of the proposed system is also studied. The MS-COCO dataset is used, from which a subset of 10,000 images is taken: 9,000 images are used for training and validation, and the remaining 1,000 images, distinct from the training set, are used for testing. This test set is applied to all five designed models. To assess the precision of the five proposed systems and how well the generated descriptions match the original ones, several performance measures are used: Accuracy, Average of Depth Similarity, Top-1, Top-5, and BLEU. The results show that the systems based on FastText models perform best, although they take longer than the GloVe-based models.
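The Top-1 and Top-5 measures named in the abstract can be illustrated with a minimal sketch. It assumes the captioning model outputs one score per vocabulary word at each prediction step and that a prediction counts as correct when the reference word is among the k highest-scoring candidates; the function name `top_k_accuracy` and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    targets: Sequence[int],
    k: int,
) -> float:
    """Fraction of samples whose target index is among the k highest scores.

    Assumption: each row of `scores` holds one score per vocabulary word,
    and `targets` holds the index of the reference word for that row.
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not scores:
        return 0.0

    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if target in top_k:
            hits += 1
    return hits / len(scores)


# Toy example: 3 prediction steps over a 3-word vocabulary (illustrative only).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_targets = [1, 1, 0]

top1 = top_k_accuracy(toy_scores, toy_targets, k=1)
top2 = top_k_accuracy(toy_scores, toy_targets, k=2)
```

With a realistic vocabulary, k=5 would be used in place of the k=2 shown for the toy data; Top-1 is the special case in which only the single highest-scoring word is accepted.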

Published

2020-10-01

How to Cite

1.
Alkheir J, Sulaiman S, Mualla R. Performance Evaluation on the Effect of Different Text Representation Models on the Image Captioning Systems. Engineering Sciences Series [Internet]. 2020 Oct 1 [cited 2021 Jan 16];42(4). Available from: http://journal.tishreen.edu.sy/index.php/engscnc/article/view/9915