dc.contributor.advisor: Stefan Schrunner
dc.contributor.advisor: Pål Halvorsen
dc.contributor.advisor: Steven Hicks
dc.contributor.author: Helland, Eirik Duesund
dc.date.accessioned: 2023-07-26T16:27:15Z
dc.date.available: 2023-07-26T16:27:15Z
dc.date.issued: 2023
dc.identifier: no.nmbu:wiseflow:6839521:54591693
dc.identifier.uri: https://hdl.handle.net/11250/3081492
dc.description.abstract: In lower-resource language settings, domain-specific tasks such as paragraph classification of football articles present significant challenges. Traditional machine learning models struggle to capture the linguistic complexity of such paragraphs, emphasizing the need for more advanced approaches. This thesis investigates the potential of Norwegian pre-trained BERT (Bidirectional Encoder Representations from Transformers) models for paragraph classification in Norwegian football articles, a domain requiring a nuanced understanding of the Norwegian language. BERT is a Transformer-based architecture for language processing tasks that learns word representations from the context on both sides of each word in a sentence. Specifically, this thesis compares the performance of Transformer-based BERT models with traditional machine learning models in multi-class and multi-label classification tasks. An existing dataset of about 5,500 football article paragraphs is used to evaluate multi-class classification, and a newly annotated multi-label dataset of just over 2,000 samples is introduced for the multi-label assessment. The results reveal promising performance for the Norwegian pre-trained BERT models in both tasks: an accuracy of ∼0.88 and a weighted-average F1-score of ∼0.87 in the multi-class task, and an accuracy of ∼0.40 and a weighted-average F1-score of ∼0.58 in the multi-label task, significantly outperforming the traditional machine learning models. This study highlights the effectiveness of Transformer-based models in lower-resource language settings and emphasizes the need for continued research and development in Natural Language Processing for underrepresented languages.
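
The multi-label setup the abstract describes can be sketched briefly. The snippet below is a minimal illustration rather than the thesis's actual pipeline: it assumes the Hugging Face transformers and scikit-learn libraries, the publicly available Norwegian checkpoint NbAiLab/nb-bert-base (the record does not name the exact models used), a placeholder five-label scheme, and one made-up paragraph. It also illustrates one plausible reading of the gap between the multi-label accuracy (∼0.40) and the weighted-average F1-score (∼0.58): if the reported accuracy is subset accuracy, every label on a paragraph must match for a sample to count as correct.

# Minimal sketch: multi-label paragraph classification with a Norwegian BERT.
# Assumptions (not from the record): the Hugging Face "transformers" library,
# the public checkpoint "NbAiLab/nb-bert-base", a placeholder 5-label scheme,
# and a single made-up training paragraph.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from sklearn.metrics import accuracy_score, f1_score

MODEL_NAME = "NbAiLab/nb-bert-base"  # assumed checkpoint; any Norwegian BERT fits
NUM_LABELS = 5                       # placeholder label count

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # selects BCE-with-logits loss
)

# One made-up paragraph ("The home side took the lead after a corner in the
# first half.") with multi-hot targets: a paragraph may carry several labels.
texts = ["Hjemmelaget tok ledelsen etter en corner i første omgang."]
labels = torch.tensor([[1.0, 0.0, 1.0, 0.0, 0.0]])

batch = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss  # fine-tuning would minimize this
loss.backward()

# Inference: independent sigmoid per label, thresholded at 0.5.
with torch.no_grad():
    preds = (torch.sigmoid(model(**batch).logits) > 0.5).int().numpy()

y_true = labels.int().numpy()
# accuracy_score on multi-hot matrices is subset accuracy: every label on a
# paragraph must match, so it is typically much lower than weighted F1.
print("subset accuracy:", accuracy_score(y_true, preds))
print("weighted F1:", f1_score(y_true, preds, average="weighted", zero_division=0))

For the multi-class task, the same model class would be loaded without problem_type (the default single-label objective applies a cross-entropy loss) and predictions would be an argmax over the logits.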
dc.language: eng
dc.publisher: Norwegian University of Life Sciences
dc.title: Tackling Lower-Resource Language Challenges: A Comparative Study of Norwegian Pre-Trained BERT Models and Traditional Approaches for Football Article Paragraph Classification
dc.type: Master thesis

