In this work, different Long Short-Term Memory (LSTM) encoder-decoder artificial neural networks are investigated. These networks differ in their complexity. The aim of this work is to evaluate ...
Captioning an image involves using a combination of vision and language models to describe the image in an expressive and concise sentence. Successful captioning task requires extracting as much ...