ValMIR

On Valid and Reliable Experiments in Music Information Retrieval

Project Summary

Every experimental science is based on the notion of valid and reliable experiments, i.e. experiments that actually measure what they are intended to measure and that yield repeatable results. Music Information Retrieval (MIR), the interdisciplinary science of retrieving information from music, conducts experiments with a multitude of methods from machine learning, statistics, signal processing, artificial intelligence and other fields. It relies on the proper evaluation of all these methods to measure the success of new algorithms or, more generally, to chart the progress of the whole field. The central role of computer experiments and their statistical evaluation within MIR is now widely accepted and understood, but the more fundamental notions of validity and reliability of MIR experiments are still rarely discussed within the field.

This lack of awareness of valid and reliable MIR experimentation is at the heart of a number of seemingly puzzling phenomena in recent MIR research. Marginally and imperceptibly altered data, so-called adversarial examples, can drastically reduce the performance of state-of-the-art MIR systems. It has even been claimed that such easily fooled MIR systems do not use musical knowledge at all. Other authors have pointed out that, due to a lack of inter-rater agreement when annotating ground truth data, performance in many MIR tasks can never exceed a certain glass ceiling, since it is not meaningful for an algorithm to model the judgments of specific raters. A problem of algorithmic bias arises from the difficulty of learning in high-dimensional spaces, where some data objects act as "hubs": they are abnormally close to many other data objects and thereby disturb music recommendation, since hub songs are recommended over and over again.
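The hubness phenomenon can be made concrete with the k-occurrence N_k(x), i.e. the number of times a data object x appears among the k nearest neighbours of all other objects. One common way to quantify hubness is the skewness of the N_k distribution: a strongly right-skewed distribution means a few objects appear in far more neighbour lists than the average of k. The following is a minimal NumPy sketch, assuming Euclidean distances and synthetic i.i.d. Gaussian data as a hypothetical stand-in for music feature vectors. The function names `k_occurrence` and `hubness_skewness` are illustrative only and are not the API of scikit-hubness (listed under Results).

```python
import numpy as np


def k_occurrence(X: np.ndarray, k: int = 10) -> np.ndarray:
    """Return N_k: how often each point occurs in the k-NN lists of all other points."""
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(X ** 2, axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    # Indices of the k nearest neighbours of every point (one row per query point)
    knn = np.argsort(dist, axis=1)[:, :k]
    # Count how often each index occurs across all k-NN lists
    return np.bincount(knn.ravel(), minlength=X.shape[0])


def hubness_skewness(n_k: np.ndarray) -> float:
    """Skewness of the N_k distribution; large positive values indicate hubs."""
    centered = n_k - n_k.mean()
    return float(np.mean(centered ** 3) / n_k.std() ** 3)


rng = np.random.default_rng(0)
for dim in (3, 100):
    # Hypothetical stand-in for audio feature vectors: i.i.d. Gaussian data
    X = rng.standard_normal((1000, dim))
    n_k = k_occurrence(X, k=10)
    print(f"dim={dim:3d}  skewness={hubness_skewness(n_k):.2f}  "
          f"max N_k={n_k.max()} (mean is k=10)")
```

With this setup the skewness for the 100-dimensional data should come out clearly larger than for the 3-dimensional data, even though the mean of N_k is exactly k in both cases. In a recommender that suggests each song's nearest neighbours, the points with very large N_k are the hub songs that keep being recommended.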

Although a small but growing body of literature on these MIR problems exists, what is still lacking is an understanding of their true nature: they are problems of validity and reliability in MIR experimentation. Since a failure to comprehend this fundamental issue is severely impeding progress in MIR, the main goals of this project are: (i) to provide a framework for valid and reliable experimentation in MIR; and (ii) to advance the state of the art concerning adversarial examples, inter-rater agreement and algorithmic bias by conducting exemplary valid and reliable MIR experiments.

The main focus of this project is on MIR, where the phenomena mentioned above are especially apparent. The same problems, however, also have ramifications for machine learning in general, so our research has the potential to advance progress in MIR and far beyond.

Project Details

Funding Type

Austrian Science Fund (FWF)

Project Number

P31988

Principal Investigator

Arthur Flexer

Project Period

May 2019 - April 2023

Funding Amount

€ 347.476,50

People and Cooperations

Katharina Hoedt

Arthur Flexer

Results

Scientific Publications

Feldbauer R., Rattei T., Flexer A.: scikit-hubness: Hubness Reduction and Approximate Neighbor Search, Journal of Open Source Software, 5(45), 1957, 2020. DOI: https://doi.org/10.21105/joss.01957

Flexer A., Lallai T., Rašl K.: On Evaluation of Inter- and Intra-Rater Agreement in Music Recommendation, Transactions of the International Society for Music Information Retrieval, 4(1), pp. 182–194, 2021. DOI: http://doi.org/10.5334/tismir.107

Flexer A., Lallai T.: Can We Increase Inter- and Intra-Rater Agreement in Modeling General Music Similarity?, in Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, 2019. Also available as OFAI-TR-2019-01.

Foscarin F., Hoedt K., Praher V., Flexer A., Widmer G.: Concept-Based Techniques for "Musicologist-friendly" Explanations in a Deep Music Classifier, in Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022.

Hoedt K., Flexer A., Widmer G.: Defending a Music Recommender Against Hubness-Based Adversarial Attacks, in Proceedings of the 19th Sound and Music Computing Conference, 2022.

Hoedt K., Praher V., Flexer A., Widmer G.: Constructing adversarial examples to investigate the plausibility of explanations in deep audio and image classifiers, Neural Computing and Applications, 35, 10011-10029, 2023. DOI: https://doi.org/10.1007/s00521-022-07918-7

Paischer F., Prinz K., Widmer G.: Audio Tagging With Convolutional Neural Networks Trained With Noisy Data, Technical Report, DCASE2019 Challenge, 2019.

Praher V., Prinz K., Flexer A., Widmer G.: On the Veracity of Local, Model-agnostic Explanations in Audio Classification: Targeted Investigations with Adversarial Examples, in Proceedings of the 22nd International Society for Music Information Retrieval Conference (ISMIR'21), 2021.

Prinz K., Flexer A.: End-to-End Adversarial White Box Attacks on Music Instrument Classification, arXiv:2007.14714 [eess.AS], 2020.

Prinz K., Flexer A.: Weak Multi-Label Audio-Tagging with Class Noise, Late Breaking/Demo at the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, 2019.

Prinz K., Flexer A., Widmer G.: On End-to-End White-Box Adversarial Attacks in Music Information Retrieval, Transactions of the International Society for Music Information Retrieval, 4(1), pp. 93–104, 2021. DOI: http://doi.org/10.5334/tismir.85

Prinz K., Flexer A., Widmer G.: The Impact of Label Noise on a Music Tagger, in Proceedings of the 13th International Workshop on Machine Learning and Music, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2020. See also arXiv:2008.06273 [eess.AS].

Sturm B.L.T., Flexer A.: A Review of Validity and its Relationship to Music Information Research, in Proceedings of the 24th International Society for Music Information Retrieval Conference, Milan, Italy, 2023.

Sturm B.L.T., Flexer A.: Validity in Music Information Research Experiments, arXiv:2301.01578 [cs.SD], 2023.

Events

October 14, 2020: Organized a special session titled "Do we really care about the validity of MIR research?" together with Bob Sturm and Julian Urbano at the 21st International Society for Music Information Retrieval Conference (ISMIR).

February 27, 2020: Research visit by Bob Sturm, including a public talk "On Horses in Machine Learning".
