Measuring the Quality of Knowledge Graphs
Supervisor: a.Univ.-Prof. DI Dr. Wolfram Wöß
Co-Supervisor: DI Lisa Ehrlinger, BSc
Motivation and Challenges
The term "knowledge graph" has been shaped by the introduction of Google's Knowledge Graph in 2012 and is now used to describe large open-source graphs such as DBpedia or Wikidata as well as smaller "corporate knowledge graphs" in companies. Most of these graphs are based on the W3C's Semantic Web standards (such as RDF, RDF Schema, …) and store data in human-readable textual form. The aim is to allow intuitive retrieval of data, supported by contextual information (e.g., does the entity "Buffalo" refer to the animal or to the Japanese company?).
Since open-source knowledge graphs (KGs) often have poor data quality (i.e., many missing or incorrect facts), quality assessment of KGs has received increasing attention over the last few years. In addition to the completeness and accuracy of the data in a KG, the readability of its textual information is an important aspect. A KG should be readable enough to represent the modeled domain in a natural and clear way and to be self-explanatory to the user. Good readability also supports semantic expressiveness and enables automatic mappings to other KGs.
The objective of this master's thesis is to evaluate existing approaches for measuring the quality of knowledge graphs, with a special focus on readability. In addition, a concept should be developed for how the readability of the textual information in knowledge graphs can actually be measured, for example, by using string-matching techniques. The concept should be implemented in a programming language of choice (e.g., Java, Python, …), and the readability measure should be demonstrated on a popular KG (e.g., DBpedia).
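To illustrate what a string-matching-based readability measure might look like, the following minimal Python sketch splits an identifier into word tokens and reports the share of tokens that closely match a word from a vocabulary. It is only one possible starting point, not the concept to be developed in the thesis. The function names, the tiny vocabulary, and the similarity threshold of 0.8 are all hypothetical choices made for this example; the similarity itself is computed with `difflib.SequenceMatcher` from the Python standard library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini-vocabulary for illustration only; a real evaluation
# would use a full dictionary or a domain-specific word list.
VOCABULARY = {"birth", "place", "date", "has", "name", "population", "total"}


def split_identifier(local_name):
    """Split a camelCase, snake_case, or hyphenated identifier into lowercase tokens."""
    # Insert a space at every lowercase/digit -> uppercase boundary (camelCase).
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local_name)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def best_similarity(token, vocabulary):
    """Return the highest similarity ratio between the token and any vocabulary word."""
    return max(
        (SequenceMatcher(None, token, word).ratio() for word in vocabulary),
        default=0.0,
    )


def readability_score(local_name, vocabulary=VOCABULARY, threshold=0.8):
    """Fraction of tokens that (approximately) match a vocabulary word.

    The threshold is an assumed value, not an established standard.
    """
    tokens = split_identifier(local_name)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if best_similarity(t, vocabulary) >= threshold)
    return hits / len(tokens)


if __name__ == "__main__":
    # A descriptive name versus an opaque identifier.
    for name in ["birthPlace", "population_total", "P19"]:
        print(name, split_identifier(name), readability_score(name))
```

In this sketch, a descriptive name such as `birthPlace` is split into dictionary words and scores high, whereas an opaque identifier such as `P19` scores low. A measure along these lines would still need to be validated against human judgments of readability.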