Data Quality Measurement: Readability Dimension
Supervisor: a.Univ.-Prof. DI Dr. Wolfram Wöß
Co-Supervisor: DI Lisa Ehrlinger, BSc
Motivation and Challenges
Data is central to decision-making in enterprises and organizations (e.g., smart factories and predictive maintenance) as well as in private life (e.g., booking platforms). Especially in artificial intelligence applications, like self-driving cars, trust in data-driven decisions depends directly on the quality of the underlying data. Therefore, it is essential to know the quality of the data in order to assess the trustworthiness of the derived decisions.
A Java-based tool, QuaIIe, has been developed at our institute. It analyzes different information sources and calculates metrics to estimate the data and schema quality of an information system. Currently, metrics can be calculated for the quality dimensions accuracy, correctness, completeness, pertinence, timeliness, minimality, and normalization. However, an investigation of the readability dimension at both the schema and the data level is missing.
Objective
The main objective of this master's thesis is to evaluate current data quality (DQ) approaches to the readability dimension with respect to its definition and possible metrics. Based on existing work, an approach for actually measuring readability should be developed (e.g., using intelligent string matching or dictionary-based approaches with tools such as WordNet). The resulting metric should be implemented and evaluated within our existing DQ tool QuaIIe.
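To give a first impression of what a dictionary-based readability metric at the schema level might look like, the following minimal Java sketch splits identifiers into tokens and reports the fraction of tokens found in a word list. It is an illustration under stated assumptions, not the metric to be developed: the class name ReadabilitySketch, the methods tokenize and score, and the tiny in-memory dictionary (a stand-in for a lexical resource such as WordNet) are all hypothetical and are not part of QuaIIe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration only; not part of QuaIIe. Requires Java 9 or later. */
final class ReadabilitySketch {

    // Assumption: a tiny in-memory word list stands in for a real
    // lexical resource such as WordNet.
    static final Set<String> DICTIONARY =
            Set.of("customer", "name", "order", "date", "address", "id");

    /** Splits an identifier such as "customerName" or "order_date" into lowercase tokens. */
    static List<String> tokenize(String identifier) {
        // Insert a space at lower-to-upper camelCase boundaries,
        // then split on every run of non-letter characters.
        String spaced = identifier.replaceAll("([a-z])([A-Z])", "$1 $2");
        List<String> tokens = new ArrayList<>();
        for (String part : spaced.split("[^A-Za-z]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Returns the fraction of tokens found in the dictionary, in [0, 1]; 0 if there are no tokens. */
    static double score(String identifier) {
        List<String> tokens = tokenize(identifier);
        if (tokens.isEmpty()) {
            return 0.0;
        }
        long known = tokens.stream().filter(DICTIONARY::contains).count();
        return (double) known / tokens.size();
    }

    public static void main(String[] args) {
        // Print the score of a few example schema element names.
        for (String name : List.of("customerName", "order_date", "cst_nm", "customerXyz")) {
            System.out.printf(Locale.ROOT, "%s -> %.2f%n", name, score(name));
        }
    }
}
```

With this word list, fully spelled-out names such as "customerName" and "order_date" score 1.0, whereas the abbreviated "cst_nm" scores 0.0. Such a naive ratio ignores abbreviations, domain-specific vocabulary, and data-level values, which is where string matching and richer lexical resources could come into play.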