
dc.contributor.advisor: Fadi Al Machot
dc.contributor.advisor: Habib Ullah
dc.contributor.author: Srivastava, Sushant Kumar
dc.date.accessioned: 2024-08-23T16:28:46Z
dc.date.available: 2024-08-23T16:28:46Z
dc.date.issued: 2024
dc.identifier: no.nmbu:wiseflow:7110333:59110544
dc.identifier.uri: https://hdl.handle.net/11250/3147981
dc.description.abstract: In the dynamic arena of automated image captioning, significant resources, including energy and manpower, are required to train state-of-the-art models. These models, though effective, require frequent and costly retraining to maintain or improve their performance. Our motivation in this thesis was to explore alternative methods that improve caption accuracy and address the unsustainable need for constant retraining. This study assesses the performance of existing state-of-the-art models such as BLIP and GPT-2 on two key datasets, COCO and FLICKR, evaluating their effectiveness in generating captions and their potential biases across different image types using metrics such as BLEU, METEOR, and ROUGE. Our primary goal was to develop approaches that produce captions closer to human-generated text, aiming to surpass existing models in quality and efficiency without the need for retraining. We introduced a technique called ‘Weighted Summarization’, which combines artificial neural networks with strategic refinements to leverage the strengths of pre-trained models and set a new benchmark in automated image captioning. Our approach achieved scores on the COCO dataset (BLEU: 0.322, METEOR: 0.328, ROUGE-1 f: 0.452, ROUGE-2 f: 0.187, ROUGE-L f: 0.415) and on the FLICKR dataset (BLEU: 0.181, METEOR: 0.300, ROUGE-1 f: 0.348, ROUGE-2 f: 0.107, ROUGE-L f: 0.311), demonstrating improved performance and caption quality over existing models.
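
The caption-quality metrics named in the abstract (BLEU, METEOR, and the ROUGE-1/2/L F-scores) can be computed with standard Python tooling. The snippet below is a minimal illustrative sketch, assuming the nltk and rouge packages; the example captions are hypothetical and the code is not the evaluation pipeline used in the thesis.

# Minimal sketch: scoring one generated caption against one reference caption
# with BLEU, METEOR, and ROUGE. Assumes the `nltk` and `rouge` packages;
# the captions below are hypothetical examples, not data from the thesis.
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge import Rouge

# nltk.download('wordnet')  # uncomment on first run; required by meteor_score

reference = "a dog runs across a grassy field"
candidate = "a dog is running through the grass"

ref_tokens, cand_tokens = reference.split(), candidate.split()

# Sentence-level BLEU with smoothing so short captions do not collapse to zero.
bleu = sentence_bleu([ref_tokens], cand_tokens,
                     smoothing_function=SmoothingFunction().method1)

# METEOR (recent nltk versions expect pre-tokenised input).
meteor = meteor_score([ref_tokens], cand_tokens)

# ROUGE-1, ROUGE-2, and ROUGE-L F-scores.
rouge_f = Rouge().get_scores(candidate, reference)[0]

print(f"BLEU:      {bleu:.3f}")
print(f"METEOR:    {meteor:.3f}")
print(f"ROUGE-1 f: {rouge_f['rouge-1']['f']:.3f}")
print(f"ROUGE-2 f: {rouge_f['rouge-2']['f']:.3f}")
print(f"ROUGE-L f: {rouge_f['rouge-l']['f']:.3f}")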
dc.language: eng
dc.publisher: Norwegian University of Life Sciences
dc.title: Semantic Enhancements in Image Captioning: Leveraging Neural Networks to Improve BLIP and GPT-2
dc.type: Master thesis

