Student: Patrick Haidinger (2022)
Supervisor: a.Univ.-Prof. DI Dr. Wolfram Wöß
In this Bachelor's thesis, a system for the efficient storage and automatic evaluation of unstructured data is presented. To this end, an architecture on the cloud provider AWS is designed that processes and stores the data. The algorithms implemented on this infrastructure extract user data from the HTML DOM as natural-language text and evaluate it semantically. By choosing suitable components, an efficient and highly scalable solution was created that can a) evaluate and process large amounts of data within a few minutes and b) visualize this data. The experiments conducted show that heuristic extraction tools such as Trafilatura perform best for extracting the user data, while newer deep-learning-based algorithms perform best for the natural language processing. The use of AWS's highly scalable DynamoDB database and the AWS Lambda compute environment enables near-real-time evaluations. The appendix visualizes a selection of the evaluated data, giving concise overviews of the unstructured data and illustrating the performance of the deployed infrastructure. The overall goal is to gain an overview of a specific subject area with the help of this data and efficient algorithms.
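The extraction-and-storage step described above can be sketched as a single Lambda-style function that pulls the main text out of an HTML page and shapes it into a DynamoDB item. This is a minimal, standard-library-only sketch under stated assumptions, not the thesis implementation and not Trafilatura: Trafilatura combines several more robust heuristics, whereas this stand-in only keeps text inside `<p>` elements and skips typical boilerplate containers. The event shape (`url`, `html`), the attribute names (`pk`, `text`, `word_count`) and `TABLE_NAME` are hypothetical.

```python
import hashlib
import json
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Simplified stand-in for main-text extraction (NOT Trafilatura):
    keeps text inside <p> elements and ignores boilerplate subtrees."""

    SKIP_TAGS = {"script", "style", "nav", "footer", "header", "aside"}

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._skip_depth = 0      # > 0 while inside a boilerplate subtree
        self._in_p = False
        self._buffer: list[str] = []

    def finish(self) -> None:
        """Store the current paragraph (whitespace-normalized) and reset state."""
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "p" and self._skip_depth == 0:
            if self._in_p:        # HTML allows omitting </p>
                self.finish()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            self.finish()

    def handle_data(self, data):
        if self._in_p and self._skip_depth == 0:
            self._buffer.append(data)


def extract_main_text(html: str) -> str:
    """Return the extracted paragraphs joined by newlines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.finish()               # flush a trailing unclosed <p>
    return "\n".join(parser.paragraphs)


def handler(event, context):
    """Lambda-style entry point; the event shape {"url", "html"} is hypothetical."""
    text = extract_main_text(event["html"])
    item = {
        "pk": hashlib.sha256(event["url"].encode("utf-8")).hexdigest(),
        "url": event["url"],
        "text": text,
        "word_count": len(text.split()),
    }
    # In a deployment the item would be persisted with boto3, e.g.
    # boto3.resource("dynamodb").Table(TABLE_NAME).put_item(Item=item)
    return {"statusCode": 200, "body": json.dumps(item)}
```

Hashing the URL gives a fixed-length partition key, so repeated crawls of the same page overwrite one item instead of creating duplicates; the downstream semantic evaluation (e.g. a deep-learning model) would then read the `text` attribute.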