Maximum Entropy COICOP Classification using Entity Forest
Master thesis
Permanent link
https://hdl.handle.net/11250/3085703
Publication date
2023
Collections
- Master's theses (KBM) [888]
Abstract
This thesis proposes a generative approach to COICOP classification, using entity resolution and maximum entropy classification as a formal framework. The current limitations in COICOP classification are related to the corpus of item descriptions and a lack of data. I propose a new perspective on the classification task, arguing that the underlying problem is the data itself; corpus and feature engineering are therefore crucial to improving classification. The proposed approach engineers the corpus by constructing an entity forest from the item descriptions, in which the terms of each description are mapped to the roots and branches of trees in the forest. The approach is illustrated by a proof of concept using data from Statistics Norway. This thesis provides insight into the problems with previous approaches to COICOP classification and shows how we can potentially achieve true resolution and more accurate classification.