Institute of Computational Perception

Khaled Koutini

My PhD project

Learning general-purpose audio representations with deep neural networks

Audio classification and tagging are central tasks in the field of machine listening. They are essential for machines to recognize their environment and identify events in their surroundings, and are thus a critical component of machine perception. These tasks are relevant to a wide range of applications, including content-based multimedia information retrieval, context-aware smart devices, and monitoring systems. One significant barrier to machine audio recognition is the high cost and scarcity of high-quality labeled data, which is a particular problem for powerful learners such as deep neural networks. We investigate the incorporation of inductive biases into neural network architectures and the training process in order to improve generalization when training on small audio datasets. We also investigate how to extract representations from models trained on large-scale, general-purpose datasets so that they can be transferred to specialized tasks.


Supervisor: Gerhard Widmer, JKU Linz, Austria

Dates: 01 Jan 2020 – Ongoing