Data Catalogs: A solid Basis for Semantic Integration

Student: Johannes Schrott (2022)

Supervisor: a.Univ.-Prof. DI Dr. Wolfram Wöß


Data catalogs aim to provide an overview of the data available in an organization: they manage the metadata of data sources and annotate them with further (business) context. Initially, background knowledge on data catalogs is provided by presenting definitions and descriptions of them. As there is no detailed description of how data catalogs are realized from a conceptual perspective, a systematic literature review is presented. Its main objectives are 1) identifying a data catalog's conceptual components (metadata management, business context, data responsibility roles and the FAIR principles) and 2) finding implementation guidelines for data catalogs. Using the components from the systematic literature review, a selection of data catalog tools is analyzed. The majority of these tools do not use standardized semantic technologies for representing the business context component. Consequently, we developed GOLDCASE, an ontology layer to be placed on top of an existing data catalog tool. It connects the existing business context of the data catalog tool with a more expressive business ontology. As a result, a preliminary stage of semantic integration is reached. GOLDCASE is queried using classes of the business ontology, and each query determines the data sources holding the corresponding data. A proof-of-concept implementation of GOLDCASE is realized at an industry partner and evaluated against the components from the literature review.
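
The querying idea described above can be illustrated with a small sketch. It is not the GOLDCASE implementation: the class `OntologyLayer`, its fields, and all example names (`Customer`, `crm.customers`, and so on) are hypothetical, and the use of subclass relations to widen a query is an assumption about what a "more expressive" business ontology might offer. The sketch only shows how a query by ontology class could be resolved to data sources via the catalog's existing business terms.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyLayer:
    """Toy stand-in for an ontology layer over a data catalog (hypothetical)."""

    # Business ontology: class name -> names of its direct subclasses.
    subclasses: dict[str, set[str]] = field(default_factory=dict)
    # Link between ontology classes and the catalog's business terms.
    class_to_terms: dict[str, set[str]] = field(default_factory=dict)
    # Catalog content: business term -> data sources annotated with it.
    term_to_sources: dict[str, set[str]] = field(default_factory=dict)

    def descendants(self, cls: str) -> set[str]:
        """Return cls together with all of its direct and indirect subclasses."""
        seen: set[str] = set()
        stack = [cls]
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # guards against cycles in the hierarchy
            seen.add(current)
            stack.extend(self.subclasses.get(current, ()))
        return seen

    def sources_for(self, cls: str) -> set[str]:
        """Determine the data sources holding data for an ontology class."""
        sources: set[str] = set()
        for c in self.descendants(cls):
            for term in self.class_to_terms.get(c, ()):
                sources |= self.term_to_sources.get(term, set())
        return sources


# Hypothetical example data, not taken from the thesis.
layer = OntologyLayer(
    subclasses={"Customer": {"BusinessCustomer"}},
    class_to_terms={"Customer": {"client"}, "BusinessCustomer": {"b2b_account"}},
    term_to_sources={"client": {"crm.customers"}, "b2b_account": {"erp.accounts"}},
)

print(sorted(layer.sources_for("Customer")))
```

In this sketch, a query for `Customer` also reaches the data source linked to its subclass `BusinessCustomer`, which is one way an ontology can return more than a flat glossary lookup would.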