Enhancing Food Composition Databases: Predicting Missing Values via Knowledge Graph Embeddings
M. Možina, S. Žitnik, B. Koroušić Seljak, T. Eftimov
The 2023 KDD Undergraduate Consortium (KDD-UC), 2023 SIKDD
Long Beach, CA, USA, 6 - 10 August, 2023
Food composition databases (FCDBs) are an integral part of food and nutrition research, dietary assessment, and related fields such as health and environmental science. Like other scientific domains, however, food composition suffers from missing data. Missing values reduce the accuracy and reliability of analyses based on food composition and can therefore limit how the databases are used. To address this issue, researchers have explored various imputation methods. The simplest and most common approach is to compute the mean or median of the available data in the same FCDB, or to borrow values from other FCDBs. However, such simple methods may produce notable errors. In this paper, we investigate the use of knowledge graph embedding models for borrowing and imputing missing values in FCDBs. We used the ComplEx model from the AmpliGraph library, and the results are very promising. By embedding the nodes in a low-dimensional space, the model can capture the underlying structure and relationships in the data, which supports accurate imputation of the missing values. Ultimately, the proposed technique could lead to more accurate and reliable analyses in nutritional research and dietary monitoring.
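As a rough illustration of how a ComplEx-style model ranks candidate values, the sketch below implements the standard ComplEx scoring function, Re(sum_k s_k * r_k * conj(o_k)), in plain Python. It does not use the AmpliGraph API and is not the authors' pipeline: the entity and relation names, the framing of imputation as ranking discretized nutrient levels, and the random (untrained) embeddings are all assumptions made for illustration.

```python
import random


def complex_score(subject, relation, obj):
    """ComplEx triple score: real part of sum_k s_k * r_k * conj(o_k)."""
    return sum(
        s * r * o.conjugate() for s, r, o in zip(subject, relation, obj)
    ).real


def random_embedding(dim, rng):
    """Random complex vector; a stand-in for a trained embedding."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(dim)]


rng = random.Random(0)
dim = 4

# Hypothetical food entity and relation (names are illustrative only).
apple = random_embedding(dim, rng)
has_vitamin_c_level = random_embedding(dim, rng)

# Hypothetical candidate tail entities: discretized nutrient levels.
candidates = {name: random_embedding(dim, rng) for name in ("low", "medium", "high")}

# Score every candidate triple (apple, has_vitamin_c_level, level)
# and pick the highest-scoring level as the imputed value.
scores = {
    name: complex_score(apple, has_vitamin_c_level, emb)
    for name, emb in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, best)
```

With trained embeddings, the highest-scoring candidate would be taken as the imputed value. Because the relation vector is complex, the score is not symmetric in subject and object, which lets the model represent directed relations.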