Machine Learning for Symbolic Music Processing

Student Projects in Machine Learning for Symbolic Music Processing

Contacts (unless stated otherwise): Carlos Cancino-Chacón, Silvan Peter

These student projects can be started at any time (including holidays) and can span semester boundaries.

Remark: We are open to new proposals. If you are interested in Symbolic Music Processing, feel free to contact us!

Topics on Expressive Performance Modeling and Generation

Reimplementation and ablation study of VirtuosoNet for Expressive Performance Generation

Description: VirtuosoNet is a state-of-the-art neural network for expressive piano performance generation. The model relies on a complex architecture based on hierarchical attention and recurrent neural networks (RNNs), variational autoencoder (VAE) learning, and a multilevel loss function. The aim of this project is both to reimplement the model in our framework for symbolic music processing (partitura, Basis Mixer) in a clean and testable fashion and to conduct ablation studies on the major architectural building blocks of this model.

paper: http://archives.ismir.net/ismir2019/paper/000112.pdf

repo: https://github.com/jdasam/virtuosoNet

Keywords: Deep Learning, VAE, Ablation study, Models of Expressive Performance

Transformer Models for Expressive Performance Generation

Description: Transformers are neural networks for sequence processing that are based entirely on an attention mechanism and discard any recurrent neural network (RNN) based structure. This makes them an interesting and novel candidate for expressive piano performance generation. This project aims at implementing, testing, and developing a transformer-based model for expressive performance in our framework for symbolic music processing (partitura, Basis Mixer) in a clean and reusable fashion.
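
The attention mechanism at the core of such models can be sketched as follows. This is a minimal pure-Python illustration of scaled dot-product self-attention as defined in the paper linked below; the toy note features and all function names are assumptions made for illustration and are not part of partitura or Basis Mixer. A full transformer additionally uses learned projections, multiple heads, and positional encodings.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy features per note: [normalized pitch, normalized duration].
notes = [[0.60, 0.25], [0.62, 0.25], [0.64, 0.50]]
context, attention = scaled_dot_product_attention(notes, notes, notes)
```

Every note attends to every other note in a single step, which is what allows the model to do without recurrence.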

paper: https://arxiv.org/pdf/1706.03762.pdf

Keywords: Deep Learning, Transformer, Models of Expressive Performance

Temporal Convolutional Network (“WaveNet”) based Expressive Performance Generation

Description: Temporal convolutional networks (TCNs) are sequence-processing convolutional neural networks (CNNs) that use dilated convolutions with hierarchically increasing receptive fields. Sometimes called WaveNets after the first such model in audio processing, they discard any recurrent neural network (RNN) based structure. This makes them an interesting and novel candidate for expressive piano performance generation. This project aims at implementing, testing, and developing a TCN-based model for expressive performance in our framework for symbolic music processing (partitura, Basis Mixer) in a clean and reusable fashion.
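
How stacked dilated convolutions widen the receptive field can be illustrated with a small sketch. The code below is a pure-Python toy under stated assumptions (kernel size 2, dilations doubling per layer, unit weights, no nonlinearities); the function names are hypothetical and do not correspond to any existing library.

```python
def causal_dilated_conv(signal, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros before t = 0."""
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            index = t - k * dilation
            if index >= 0:
                acc += weight * signal[index]
        output.append(acc)
    return output


def receptive_field(kernel_size, dilations):
    """Number of input steps that can influence one output step."""
    return 1 + (kernel_size - 1) * sum(dilations)


dilations = [1, 2, 4, 8]

# Push a unit impulse through the stack to see how far it spreads.
impulse = [1.0] + [0.0] * 19
response = impulse
for dilation in dilations:
    response = causal_dilated_conv(response, [1.0, 1.0], dilation)

field = receptive_field(kernel_size=2, dilations=dilations)
```

With four layers the impulse reaches 16 time steps, so the receptive field grows exponentially with depth while the number of weights grows only linearly.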

paper: https://arxiv.org/abs/1609.03499

Keywords: Machine Learning, TCN, Models of Expressive Performance

GAN (and possibly CAN) training of Expressive Performance Generation models

Description: Generative adversarial networks (GANs) are a class of neural network models with a corresponding training paradigm. In short, GANs train a generator network by propagating a loss through a discriminator network that is trained at the same time. In doing so, they effectively use the discriminator as a learned, complex loss function, circumventing problematic domain-agnostic loss function definitions such as the L2 norm. Choosing a suitable loss is an acute problem in models of expressive performance, which makes GANs interesting model/training candidates for this task. A possible extension of this project is the development of a creative adversarial network (CAN) framework for expressive performance.
This project aims at implementing, testing, and developing a GAN-based model for expressive performance in our framework for symbolic music processing (partitura, Basis Mixer) in a clean and reusable fashion.
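
The two competing objectives can be written down compactly. The sketch below computes them from discriminator outputs only, assuming the discriminator returns a probability that a performance is real; it uses the non-saturating generator loss suggested in the GAN paper, and the probability values are made up for illustration. The networks themselves and the gradient updates are omitted.

```python
import math

EPS = 1e-12  # avoids log(0)


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real samples should score 1, generated ones 0."""
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss: the generator wants its samples to be scored as real."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs for a batch of real and generated performances.
d_real = [0.9, 0.8]
d_fake = [0.2, 0.1]

loss_d = discriminator_loss(d_real, d_fake)
loss_g = generator_loss(d_fake)
```

Note that the generator never sees a hand-crafted distance such as the L2 norm; its only training signal is the discriminator's judgement.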

paper GAN: https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
paper CAN: https://research.fb.com/wp-content/uploads/2017/08/creative-adversarial-networks.pdf

Keywords: Machine Learning, GAN, CAN, Models of Expressive Performance

Topics on Music Alignment

Symbolic online score-to-performance alignment using Hidden semi-Markov Models

Description: In symbolic music processing, score-to-performance alignment refers to matching (i.e., aligning) the notes of a MIDI performance with the corresponding notes in the score (generally provided in a format such as MusicXML/MEI/MIDI). Hidden Markov models (HMMs) provide a convenient probabilistic framework for alignment systems, but have several limitations when aligning complex polyphonic music. Hidden semi-Markov models (HSMMs) are an extension of HMMs that allow for explicitly modeling the duration of a state. This project aims to develop, implement, and evaluate an HSMM-based system for real-time score-to-performance alignment. Of particular interest is a systematic comparison with other common alignment frameworks (DTW, HMM) in the context of aligning complex pieces of piano music. The implementation is expected to be part of our framework for symbolic music processing and alignment (partitura, maps) in a clean and reusable fashion.
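
The difference in duration modeling can be made concrete with a small sketch. In an HMM, the time spent in a state follows a geometric distribution determined by the self-transition probability, so the most likely duration is always one step; an HSMM can use any duration distribution instead. The explicit distribution below is an arbitrary, hypothetical choice used only for illustration, and the geometric distribution is truncated at max_duration.

```python
def geometric_duration_pmf(self_transition, max_duration):
    """Duration distribution implied by an HMM state with a self-transition."""
    return [
        self_transition ** (d - 1) * (1.0 - self_transition)
        for d in range(1, max_duration + 1)
    ]


def explicit_duration_pmf(expected_duration, max_duration):
    """Hypothetical HSMM duration model peaked at the notated duration."""
    weights = [
        1.0 / (1.0 + abs(d - expected_duration))
        for d in range(1, max_duration + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def most_likely_duration(pmf):
    """Duration (in steps, starting at 1) with the highest probability."""
    return 1 + max(range(len(pmf)), key=lambda i: pmf[i])


hmm_pmf = geometric_duration_pmf(self_transition=0.75, max_duration=8)
hsmm_pmf = explicit_duration_pmf(expected_duration=4, max_duration=8)
```

Whatever the self-transition probability, the HMM considers a duration of one step most likely, whereas the explicit model can favor the duration suggested by the score.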

Keywords: Statistical Models, HSMM, HMM, Alignment

Deep Dynamic Programming for Robust Music Alignment

Description: In symbolic music processing, score-to-performance alignment refers to matching (i.e., aligning) the notes of a MIDI performance with the corresponding notes in the score (generally provided in a format such as MusicXML/MEI/MIDI). Dynamic programming based methods such as dynamic time warping (DTW) and its online counterpart, online time warping (OLTW), are common approaches, particularly for aligning audio, although they have not been explored as thoroughly for MIDI. Two aspects that dramatically affect the performance of music alignment based on these methods are the choice of features (i.e., the representation of the input music signals) and the choice of a local metric (distance measure) for comparing the inputs. The goal of this research is to develop neural models that learn both features and metrics for aligning musical sequences (in MIDI) using DTW-based loss functions. Of particular interest is the development of methods that work well for real-time score following (using OLTW). The implementation is expected to be part of our framework for symbolic music processing and alignment (partitura, maps) in a clean and reusable fashion.
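
The role of the local metric is easiest to see in code. The following is a minimal sketch of classic offline DTW with a pluggable distance function; the pitch sequences and the absolute pitch difference used as the metric are toy assumptions, standing in for the learned features and metrics this project is about.

```python
def dtw(seq_a, seq_b, local_distance):
    """Return (total_cost, path) of the optimal DTW alignment."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy example: the performance repeats the second score note.
score_pitches = [60, 62, 64, 65]
performance_pitches = [60, 62, 62, 64, 65]

total_cost, path = dtw(
    score_pitches, performance_pitches, lambda a, b: abs(a - b)
)
```

Here the path maps score note 1 to both performed notes 1 and 2. Replacing the lambda with a learned distance over learned features is the kind of change the project description refers to.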

Keywords: Alignment, DTW, Deep Learning

Topics on Music Structure Analysis

IDyOM for polyphonic music

Description: Information Dynamics of Music (IDyOM) is a framework for constructing multiple-viewpoint, variable-order Markov models for predictive modelling of probabilistic structure in symbolic music. IDyOM computes conditional probability distributions representing the estimated likelihood of each event in a sequence. IDyOM is currently available as a Lisp implementation and is formalised for monophonic sequences of discrete symbols. The aim of this project is both to reimplement IDyOM in our framework for symbolic music processing (partitura) in a clean and testable fashion and to derive possible extensions towards a polyphonic formalisation.
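
The kind of quantity such a model produces can be sketched with a much simpler stand-in. The code below is not IDyOM: it assumes a single viewpoint (MIDI pitch), a fixed first-order Markov model, and add-one smoothing, and all names and melodies are made up for illustration. It estimates the conditional probability of each note given its predecessor and reports the information content, -log2 p, in bits.

```python
import math
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count note-to-note transitions and collect the pitch alphabet."""
    counts = defaultdict(Counter)
    alphabet = set()
    for seq in sequences:
        alphabet.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts, sorted(alphabet)


def conditional_probability(counts, alphabet, prev, nxt):
    """P(nxt | prev) with add-one smoothing over the known alphabet."""
    total = sum(counts[prev].values())
    return (counts[prev][nxt] + 1) / (total + len(alphabet))


def information_content(counts, alphabet, seq):
    """Information content in bits of each event after the first."""
    return [
        -math.log2(conditional_probability(counts, alphabet, prev, nxt))
        for prev, nxt in zip(seq, seq[1:])
    ]


# Hypothetical training melodies as MIDI pitch sequences.
melodies = [[60, 62, 64, 62, 60], [60, 62, 64, 65, 64]]
counts, alphabet = train_bigram(melodies)

ic = information_content(counts, alphabet, [60, 62, 64, 60])
```

The familiar step from 60 to 62 carries about one bit, while the unseen leap from 64 to 60 carries more, reflecting that it is less expected under the model.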

repo: https://github.com/mtpearce/idyom/wiki

Keywords: Statistical Models, Music Structure

Information-theoretic segmentation of piano music

Description: Musical form refers to the structural organization of the musical material of a piece. In Western classical music, pieces are usually composed of sections, which can be divided into smaller segments, usually referred to as phrases. Phrases are musical segments that have a complete musical sense of their own. Work on segmentation of classical music has focused mostly on melodic segmentation, but segmentation of complex polyphonic music remains an open problem. Cognitively plausible models such as Information Dynamics of Music (IDyOM) have been used to predict segmentation by providing an information-theoretic framework for modeling musical expectation. This project aims to develop models of music segmentation for piano music using an information-theoretic approach based on neural networks for predicting musical expectation. The outcomes of this project are expected to contribute to our framework for symbolic music processing (partitura) in a clean and reusable fashion.
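
One simple way to turn an expectation model into segment boundaries is to look for unexpected events. The sketch below is an illustrative heuristic under stated assumptions, not a published method: it takes a list of per-note information content values (made up here) and marks a boundary candidate wherever the value is a local peak that exceeds the mean by k standard deviations.

```python
from statistics import mean, pstdev


def boundary_candidates(information_content, k=1.0):
    """Indices of local peaks above mean + k * standard deviation."""
    threshold = mean(information_content) + k * pstdev(information_content)
    boundaries = []
    for i in range(1, len(information_content) - 1):
        value = information_content[i]
        is_peak = (
            value > information_content[i - 1]
            and value >= information_content[i + 1]
        )
        if is_peak and value > threshold:
            boundaries.append(i)
    return boundaries


# Hypothetical information content (in bits) for ten consecutive notes.
ic_values = [1.0, 1.2, 0.9, 4.5, 1.1, 1.0, 0.8, 3.9, 1.2, 1.0]
boundaries = boundary_candidates(ic_values, k=1.0)
```

In this toy input the notes at indices 3 and 7 stand out, which would suggest that new phrases begin there. In the project, the information content would come from a neural expectation model rather than from fixed numbers.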

Keywords: Statistical Models, Machine Learning, Deep Learning, Form Segmentation, Structure Segmentation, Data Annotation

Segmentation of piano music using score and performance information

Description: Musical form refers to the structural organization of the musical material of a piece. In Western classical music, pieces are usually composed of sections, which can be divided into smaller segments, usually referred to as phrases. Phrases are musical segments that have a complete musical sense of their own. Musicians use performance cues, such as variations in tempo and dynamics, to clarify the musical structure (in a role similar to punctuation in written language). The aim of this project is to combine score- and performance-based features for segmenting piano music into musical phrases, as well as to study the relation between the structural information of a piece of music and the way it is performed. The outcomes of this project are expected to contribute to our frameworks for symbolic music processing and expressive performance generation (partitura, Basis Mixer) in a clean and reusable fashion.

Keywords: Statistical Models, Machine Learning, Performance Analysis, Structure Segmentation, Data Annotation

Convolutional Networks for predicting difficulty of piano pieces

Description: Determining the difficulty of a piano piece, i.e., determining how hard or complicated the piece is to play, is a complex and subjective task that usually requires a lot of expertise. With more people turning to self-learning or online music education services (such as Yousician), a system that determines the difficulty of a piece could be very useful, either on its own or as part of a recommendation system that suggests new pieces to the user. This project aims to develop and test CNN-based models for predicting the difficulty of piano music based on 2D representations of a musical score (such as a piano roll). The implementation of such a system is intended to be a part of our framework for symbolic music processing (partitura) in a clean and reusable fashion.
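
A piano roll of the kind a CNN could take as input can be built in a few lines. The sketch below assumes notes are given as (MIDI pitch, onset in beats, duration in beats) tuples and uses a grid of four steps per beat over the 88 piano keys; the note list, the resolution, and the function name are illustrative assumptions and not partitura's API.

```python
def piano_roll(notes, time_step=0.25, pitch_range=(21, 108)):
    """Binary pitch-by-time matrix from (pitch, onset, duration) tuples."""
    low, high = pitch_range
    spans = []
    for pitch, onset, duration in notes:
        if not low <= pitch <= high:
            raise ValueError(f"pitch {pitch} outside range {pitch_range}")
        start = int(round(onset / time_step))
        # Every note occupies at least one time step.
        end = max(start + 1, int(round((onset + duration) / time_step)))
        spans.append((pitch - low, start, end))
    n_steps = max(end for _, _, end in spans)
    roll = [[0] * n_steps for _ in range(high - low + 1)]
    for row, start, end in spans:
        for t in range(start, end):
            roll[row][t] = 1
    return roll


# Hypothetical excerpt: a C-E dyad for one beat, then a G for half a beat.
notes = [(60, 0.0, 1.0), (64, 0.0, 1.0), (67, 1.0, 0.5)]
roll = piano_roll(notes)
```

The result is an 88 x 6 matrix in which rows are pitches and columns are sixteenth-note steps, the two dimensions a 2D convolution would slide over.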

Keywords: Machine Learning, Deep Learning, Classification, Supervised Learning, Data Annotation

Predict Score Notation from MIDI performance

Description: Transcription models commonly start in the audio domain and end in a MIDI or piano roll representation. From a musicological perspective, however, a crucial step is missing: the derivation of a score from a MIDI or piano roll representation. Musical scores in modern staff notation contain a lot of information besides the note pitches, onsets, and offsets. This information includes clefs, key signatures, time signatures, pitch spelling, measures, quantised beat positions (position of a note relative to the measure), quantised note durations, expression markings such as slurs and crescendo markings, repeats, and so on. Deriving this information from MIDI data is a complex multitask learning problem of interdependent classification tasks. This project aims at implementing, testing, and developing a multitask model for score prediction based on MIDI in our framework for symbolic music processing (partitura) in a clean and reusable fashion. The model can be either a tractable probabilistic model, a (deep) neural network, or a combination thereof.
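
To make one of these subtasks concrete, the sketch below quantises performed onsets to beat positions with a simple rule-based baseline. It assumes a known, constant tempo and a fixed grid of four divisions per beat, both of which a learned model would have to infer; the onset times and function names are made up for illustration.

```python
from fractions import Fraction


def quantize_onsets(onsets_sec, tempo_bpm, divisions_per_beat=4):
    """Snap onset times in seconds to the nearest grid position in beats."""
    beat_duration = 60.0 / tempo_bpm
    quantized = []
    for onset in onsets_sec:
        steps = round(onset / beat_duration * divisions_per_beat)
        quantized.append(Fraction(steps, divisions_per_beat))
    return quantized


def position_in_measure(onset_beats, beats_per_measure=4):
    """Return (measure number starting at 1, beat offset within the measure)."""
    measure, position = divmod(onset_beats, beats_per_measure)
    return int(measure) + 1, position


# Hypothetical performed onsets in seconds at 120 beats per minute.
performed = [0.02, 0.49, 1.03, 1.51, 2.26]
beats = quantize_onsets(performed, tempo_bpm=120)
last_position = position_in_measure(beats[-1])
```

The slightly uneven onsets land on beats 0, 1, 2, 3 and 4.5, and the last note falls in measure 2 at half a beat after the barline. Real performances with tempo changes and expressive timing are where such fixed grids break down and a learned model becomes necessary.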

Keywords: Statistical Models, Machine Learning, Multitask Learning, Classification

