CP's team (Khaled Koutini, Jan Schlüter, Hamid Eghbal-zadeh, Gerhard Widmer) scores top places in several tasks of the HEAR 2021 NeurIPS Challenge (Holistic Evaluation of Audio Representations).
The aim of the challenge is to push machine listening models to be as holistic as the human ear, i.e., to develop models that perform well across a variety of everyday domains. Each model is meant to produce a general-purpose audio representation that serves as a strong basis for audio classification and sequence labeling. The representations are evaluated with a benchmark suite that spans several domains, including speech, environmental sound, and music. Results for all tasks are published at https://neuralaudio.ai/hear2021-leaderboard.html.
CP's team used their latest state-of-the-art transformer model, the details of which are published in a preprint (https://arxiv.org/abs/2110.05069). The source code for training such a model and the pre-trained models are available at https://github.com/kkoutini/PaSST.