dc.contributor.advisor  Tomic, Oliver
dc.contributor.advisor  Liland, Kristian Hovde
dc.contributor.advisor  Berget, Ingunn
dc.contributor.author  Olavsrud, Marius Aleksander
dc.date.accessioned  2021-01-06T09:12:36Z
dc.date.available  2021-01-06T09:12:36Z
dc.date.issued  2020
dc.identifier.uri  https://hdl.handle.net/11250/2721646
dc.description.abstract  The purpose of this thesis is to examine how topic modeling can be used as a tool to explore large sets of text data. The thesis was written on assignment from Nofima Food Research Institute. A set of about 52 000 texts of unknown content and varying length was downloaded through an external web-harvesting company (Webhose.io). The texts were collected with a search query built from food-related vegetarian and vegan keywords, as this is a field of interest to Nofima. Latent Dirichlet Allocation (LDA) is used to create and model the topics. LDA is a generative probabilistic model that explains observed data through unobserved groups: each document is treated as a mixture of topics, and each topic is a probability distribution over words. The collected texts are split into smaller subsets based on type and length and are then preprocessed to remove non-relevant information. A subset of medium-length texts is used for the modeling. The data is then analysed with LDA, using the coherence score as the metric for determining the optimal number of topics, and the results are visualised using pyLDAvis. Lastly, a small subset of the same texts is read manually by a group of Nofima employees to validate the quality of the results and to gain a better understanding of the type of data analysed. The study found that topic modeling can be used to explore a large data set and to gain meaningful insight into parts of its content. Several topics were found to include vegetarian- and vegan-related words, some of which had a high probability within the topic in question. The process also revealed several challenges that needed to be addressed, for example many unrelated documents, large numbers of words unrelated to a given topic, choosing the optimal number of topics, and visualising the topics.  en_US
dc.language.iso  eng  en_US
dc.rights  Attribution-NonCommercial-NoDerivatives 4.0 International  *
dc.rights.uri  http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no  *
dc.subject  Topic modeling  en_US
dc.subject  LDA  en_US
dc.title  Natural Language Processing and Topic Modeling for Exploring the Vegetarian and Vegan Trends  en_US
dc.type  Master thesis  en_US
dc.description.version  submittedVersion  en_US
dc.source.pagenumber  90  en_US
dc.description.localcode  M-DV  en_US
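The abstract names the main steps of the analysis: filtering texts by length, preprocessing, fitting LDA, and choosing the number of topics by coherence score. The Python sketch below illustrates those steps under stated assumptions; it is not the code used in the thesis. To stay within the standard library it fits LDA with collapsed Gibbs sampling and scores topics with the UMass coherence measure. The abstract does not say which inference method or coherence measure the thesis used (gensim's `LdaModel`, for example, uses online variational Bayes, and its `CoherenceModel` offers several measures, including `u_mass` and `c_v`). The stopword list, the length thresholds in `medium_length`, the hyperparameters and all function names are hypothetical.

```python
import math
import random
import re

# Hypothetical stopword list; a real pipeline would use a much larger one.
STOPWORDS = {"the", "and", "for", "with", "are", "this", "that", "from"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and very short tokens."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS and len(w) > 2]


def medium_length(docs, lo=5, hi=500):
    """Keep documents whose token count lies in [lo, hi] (hypothetical thresholds)."""
    return [d for d in docs if lo <= len(d) <= hi]


def gibbs_lda(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return topic-word counts and vocabulary."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]      # tokens in document d assigned to topic t
    nkw = [[0] * V for _ in range(k)]  # times word v is assigned to topic t
    nk = [0] * k                       # total tokens assigned to topic t
    z = []                             # current topic of every token
    for d, doc in enumerate(docs):     # random initial assignment
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][wid[w]] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, t = wid[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][t] -= 1
                nkw[t][v] -= 1
                nk[t] -= 1
                # p(z = j | rest) is proportional to
                # (n_dj + alpha) * (n_jv + beta) / (n_j + V * beta).
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta) / (nk[j] + V * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][v] += 1
                nk[t] += 1
    return nkw, vocab


def top_words(nkw, vocab, n=5):
    """Return the n most frequent words of each topic, most probable first."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in nkw]


def umass_coherence(topics, docs):
    """Mean UMass coherence: sum over word pairs of log((D(w_m, w_l) + 1) / D(w_l))."""
    doc_sets = [set(d) for d in docs]

    def df(*words):  # number of documents containing all the given words
        return sum(1 for s in doc_sets if all(w in s for w in words))

    total = 0.0
    for words in topics:
        for m in range(1, len(words)):
            for l in range(m):
                total += math.log((df(words[m], words[l]) + 1) / df(words[l]))
    return total / len(topics)


def select_k(docs, ks, n_top=5):
    """Fit one model per candidate k and return the k with the highest coherence."""
    scores = {}
    for k in ks:
        nkw, vocab = gibbs_lda(docs, k)
        scores[k] = umass_coherence(top_words(nkw, vocab, n_top), docs)
    return max(scores, key=scores.get), scores
```

For example, `docs = medium_length([preprocess(t) for t in texts])` followed by `best_k, scores = select_k(docs, ks=range(2, 11))` would fit nine models and keep the one whose top words co-occur most consistently across documents (higher UMass scores indicate more coherent topics). Coherence is only one signal; as the abstract notes, the chosen topics still need manual inspection.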


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International