Student: Katharina Wolf (Start: 2019)
Supervisor: a.Univ.-Prof. DI Dr. Wolfram Wöß
Co-Supervisor: DI Lisa Ehrlinger, BSc
Motivation and Challenges
Data profiling is the process of examining the data available from an existing information source (e.g., a database or a file) and collecting statistics or informative summaries about that data. Such a data profile could serve as a basis for ongoing data quality measurement. Abedjan et al. (2015) provide a comprehensive classification of data profiling tasks for relational data. At our institute, we developed a program called BlocK-DaQ (Blockchain-based Knowledge Graph for Data Quality Measurement), which allows users to create a reference data profile for relational data. However, there is so far no work on data profiling for data stored in graph databases.
The aim of this thesis is to develop a concept of what a data profile for a graph could look like, what information it should contain, and in which format it should ideally be stored. In addition, a program should be implemented that automatically generates a reference data set for a graph database.
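To make the idea of a graph data profile more tangible, the following is a minimal sketch of what such a profile might contain. It is not the concept this thesis is meant to develop. It assumes a simple in-memory labelled property graph (the `Node` and `Edge` classes, `profile_graph`, and `to_json` are hypothetical names introduced only for illustration). It computes a few statistics loosely analogous to single-column cardinality and null-value tasks in relational profiling: node counts per label, edge counts per type, property completeness per label, and degree statistics. JSON is used purely as one candidate storage format.

```python
import json
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a simple labelled property graph (hypothetical in-memory model)."""
    id: int
    labels: set[str]
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed relationship between two node ids (hypothetical model)."""
    source: int
    target: int
    type: str
    properties: dict[str, object] = field(default_factory=dict)


def profile_graph(nodes: list[Node], edges: list[Edge]) -> dict:
    """Collect a few summary statistics that a graph data profile might contain.

    The selection of statistics is an illustrative assumption, not a fixed standard.
    """
    # Cardinalities: how many nodes carry each label, how many edges have each type.
    label_counts = Counter(label for n in nodes for label in n.labels)
    edge_type_counts = Counter(e.type for e in edges)

    # Completeness per label: share of nodes with that label whose property is non-null.
    completeness: dict[str, dict[str, float]] = {}
    for label in label_counts:
        members = [n for n in nodes if label in n.labels]
        keys = sorted({k for n in members for k in n.properties})
        completeness[label] = {
            k: sum(n.properties.get(k) is not None for n in members) / len(members)
            for k in keys
        }

    # Total degree (in + out) per node: each edge counts once for each endpoint.
    degree: Counter = Counter()
    for e in edges:
        degree[e.source] += 1
        degree[e.target] += 1
    degrees = [degree[n.id] for n in nodes]

    return {
        "node_count": len(nodes),
        "edge_count": len(edges),
        "label_counts": dict(label_counts),
        "edge_type_counts": dict(edge_type_counts),
        "property_completeness": completeness,
        "min_degree": min(degrees, default=0),
        "max_degree": max(degrees, default=0),
        "avg_degree": sum(degrees) / len(nodes) if nodes else 0.0,
    }


def to_json(profile: dict) -> str:
    """Serialize a profile as JSON, one possible (assumed) storage format."""
    return json.dumps(profile, indent=2, sort_keys=True)


if __name__ == "__main__":
    nodes = [
        Node(1, {"Person"}, {"name": "Ann", "age": 34}),
        Node(2, {"Person"}, {"name": "Bob", "age": None}),
        Node(3, {"City"}, {"name": "Linz"}),
    ]
    edges = [
        Edge(1, 3, "LIVES_IN"),
        Edge(2, 3, "LIVES_IN"),
        Edge(1, 2, "KNOWS"),
        Edge(1, 3, "VISITED"),
    ]
    print(to_json(profile_graph(nodes, edges)))
```

In this toy example, the `age` property of `Person` nodes is only half complete because one value is null. A completeness statistic of this kind is one way a stored reference profile could later be compared against new measurements of the same graph.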