Machine Learning for Symbolic Music Processing

Student Projects in Machine Learning for Symbolic Music Processing

Contacts (if not stated otherwise): Carlos Cancino-Chacón, Silvan Peter

These student projects can be started at any time (including holidays) and can span semester boundaries.

Remark: We are open to new proposals. If you are interested in Symbolic Music Processing, feel free to contact us!

Topics on Expressive Performance Modeling and Generation

  • Reimplementation and ablation study of VirtuosoNet for Expressive Performance Generation

Description: VirtuosoNet is a state-of-the-art neural network for expressive piano performance generation. The model relies on a complex architecture based on hierarchical attention and recurrent neural networks (RNNs), variational autoencoder (VAE) learning, and a multilevel loss function. The aim of this project is both to reimplement the model in our framework for symbolic music processing (partitura, Basis Mixer) in a clean and testable fashion and to conduct ablation studies on the major architectural building blocks of this model.

paper: http://archives.ismir.net/ismir2019/paper/000112.pdf

repo:  https://github.com/jdasam/virtuosoNet

Keywords: Deep Learning, VAE, Ablation study, Models of Expressive Performance

  • Transformer Models for Expressive Performance Generation

Description: Transformers are neural networks for sequence processing that are based entirely on attention mechanisms and discard any recurrent neural network (RNN) based structure. This makes them interesting and novel candidates for expressive piano performance generation. This project aims at implementing, testing, and developing a transformer-based model for expressive performance in our framework for symbolic music processing (partitura, Basis Mixer) in a clean and reusable fashion.
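As a rough illustration of the attention mechanism mentioned above, the following minimal NumPy sketch computes single-head scaled dot-product attention over a short sequence. The array `note_features` is a hypothetical stand-in for per-note score features; a real transformer would add learned query/key/value projections, multiple heads, and positional information, none of which are shown here.

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    # Subtract the row-wise maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values, weights

# Hypothetical input: 5 notes, each described by 8 features.
rng = np.random.default_rng(0)
note_features = rng.normal(size=(5, 8))

# Self-attention: every note attends to every note in the sequence.
output, weights = scaled_dot_product_attention(
    note_features, note_features, note_features
)
```

Each row of `weights` is a probability distribution over the notes in the sequence, so every output vector is a weighted average of all input vectors, computed without any recurrence.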

paper: https://arxiv.org/pdf/1706.03762.pdf

Keywords: Deep Learning, Transformer, Models of Expressive Performance

  • Temporal Convolutional Network (“Wavenet”) based Expressive Performance Generation

Description: Temporal convolutional networks (TCNs) are sequence-processing convolutional neural networks (CNNs) that use dilated convolutions with hierarchically increasing receptive fields. Sometimes called Wavenets after the first such model in audio processing, they discard any recurrent neural network (RNN) based structure. This makes them interesting and novel candidates for expressive piano performance generation. This project aims at implementing, testing, and developing a TCN-based model for expressive performance in our framework for symbolic music processing (partitura, Basis Mixer) in a clean and reusable fashion.
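To make the idea of hierarchically increasing receptive fields concrete, here is a small NumPy sketch of a causal dilated convolution, stacked with dilations 1, 2, and 4. The function names and the all-ones kernel are illustrative assumptions; an actual TCN would use learned kernels, nonlinearities, and residual connections.

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation):
    """1D causal convolution: out[t] depends on x[t], x[t - d], x[t - 2d], ..."""
    k = len(kernel)
    pad = (k - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])
    out = np.zeros(len(x))
    for i, w in enumerate(kernel):
        # kernel[i] is applied to the input delayed by i * dilation steps.
        start = pad - i * dilation
        out += w * padded[start:start + len(x)]
    return out

def receptive_field(kernel_size, dilations):
    """Number of input steps that can influence one output step of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)

dilations = [1, 2, 4]
impulse = np.zeros(16)
impulse[0] = 1.0

# Push a unit impulse through the stack to see how far its influence spreads.
response = impulse
for d in dilations:
    response = causal_dilated_conv(response, np.array([1.0, 1.0]), d)
```

With kernel size 2, doubling the dilation in each layer doubles the receptive field, so three layers already cover 8 time steps; the impulse response above is nonzero for exactly that many steps.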

paper: https://arxiv.org/abs/1609.03499

Keywords: Machine Learning, TCN, Models of Expressive Performance

  • GAN (and possibly CAN) training of Expressive Performance Generation models

Description: Generative adversarial networks (GANs) are a class of neural network models with a corresponding training paradigm. In short, GANs train a generator network by propagating a loss through a discriminator network that is trained at the same time. In doing so, they effectively use the discriminator as a complex loss function, circumventing problematic domain-agnostic loss function definitions such as the L2 norm. The choice of loss function is an acute problem in models of expressive performance, which makes GANs interesting model and training candidates for this task. A possible extension of this project is the development of a creative adversarial network (CAN) framework for expressive performance.
This project aims at implementing, testing, and developing a GAN-based model for expressive performance in our framework for symbolic music processing (partitura, Basis Mixer) in a clean and reusable fashion.
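The role of the discriminator as a loss function can be sketched with the two standard GAN objectives. The snippet below is a minimal NumPy illustration that assumes the discriminator outputs probabilities in (0, 1); the function names and example values are hypothetical, and the generator objective shown is the non-saturating variant suggested in the GAN paper.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy: real performances -> 1, generated ones -> 0."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: make the discriminator output 1 on fakes."""
    return -np.mean(np.log(d_fake + eps))

# Hypothetical discriminator outputs for two real and two generated performances.
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.1, 0.2])

loss_d = discriminator_loss(d_real, d_fake)  # low: discriminator separates well
loss_g = generator_loss(d_fake)              # high: generator is easily detected
```

In training, the two losses are minimised in alternation, so the generator never sees a hand-crafted distance such as the L2 norm, only the gradient coming through the discriminator.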

paper GAN: https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
paper CAN: https://research.fb.com/wp-content/uploads/2017/08/creative-adversarial-networks.pdf

Keywords: Machine Learning, GAN, CAN, Models of Expressive Performance

Topics on Music Alignment

  • Symbolic online score-to-performance alignment using Hidden semi-Markov Models

Description: In symbolic music processing, score-to-performance alignment refers to matching (i.e., aligning) the notes of a MIDI performance with the corresponding notes in the score (generally provided in a format such as MusicXML/MEI/MIDI). Hidden Markov models (HMMs) provide a convenient probabilistic framework for alignment systems, but they have several problems when aligning complex polyphonic music. Hidden semi-Markov models (HSMMs) are an extension of HMMs that allow for explicitly modeling the duration of a state. This project aims to develop, implement, and evaluate an HSMM-based system for real-time score-to-performance alignment. Of particular interest is a systematic comparison with other common alignment frameworks (DTW, HMM) in the context of aligning complex pieces of piano music. The implementation is expected to be part of our framework for symbolic music processing and alignment (partitura, maps) in a clean and reusable fashion.
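One way to see why explicit duration modeling matters: in a standard HMM, the time spent in a state is implicitly geometric (governed only by the self-transition probability), so the most likely duration is always a single step. An HSMM can instead attach an arbitrary duration distribution to each state. The sketch below contrasts the two; the discretised Gaussian is purely an illustrative assumption, not a recommendation for the project.

```python
import numpy as np

def hmm_duration_pmf(self_transition_prob, max_duration):
    """Implicit HMM state duration: P(d) = a^(d - 1) * (1 - a), for d = 1, 2, ..."""
    d = np.arange(1, max_duration + 1)
    return self_transition_prob ** (d - 1) * (1.0 - self_transition_prob)

def explicit_duration_pmf(expected_duration, spread, max_duration):
    """Hypothetical HSMM duration model: a discretised, truncated Gaussian."""
    d = np.arange(1, max_duration + 1)
    unnormalised = np.exp(-0.5 * ((d - expected_duration) / spread) ** 2)
    return unnormalised / unnormalised.sum()

durations = np.arange(1, 11)
hmm_pmf = hmm_duration_pmf(self_transition_prob=0.75, max_duration=10)
hsmm_pmf = explicit_duration_pmf(expected_duration=4, spread=1.0, max_duration=10)

most_likely_hmm = durations[np.argmax(hmm_pmf)]    # always 1 for an HMM
most_likely_hsmm = durations[np.argmax(hsmm_pmf)]  # follows the expected duration
```

For score following, the expected duration of a score state could be derived from the notated note value and the current tempo estimate, which is information a plain HMM cannot encode in this way.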

Keywords: Statistical Models, HSMM, HMM, Alignment

  • Deep Dynamic Programming for Robust Music Alignment

Description: In symbolic music processing, score-to-performance alignment refers to matching (i.e., aligning) the notes of a MIDI performance with the corresponding notes in the score (generally provided in a format such as MusicXML/MEI/MIDI). Methods based on dynamic programming, such as dynamic time warping (DTW) and its online counterpart, online time warping (OLTW), are common approaches, particularly for aligning audio, although they have not been as thoroughly explored for MIDI. Two aspects that dramatically affect the performance of music alignment based on these methods are the choice of features (i.e., the representation of the input music signals) and the choice of a local metric (distance measure) for comparing the inputs. The goal of this research is to develop neural models that learn both features and metrics for aligning musical sequences (in MIDI) using DTW-based loss functions. Of particular interest are methods that work well for real-time score following (using OLTW). The implementation is expected to be part of our framework for symbolic music processing and alignment (partitura, maps) in a clean and reusable fashion.
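For orientation, a textbook offline DTW can be written in a few lines. In the sketch below, the feature sequences and the `local_metric` argument are exactly the two components this project proposes to learn; here they are replaced by hand-picked stand-ins (MIDI pitch numbers and absolute pitch difference), which is an assumption made only for illustration.

```python
import numpy as np

def dtw(score_features, performance_features, local_metric):
    """Classic DTW: returns the total alignment cost and the warping path."""
    n, m = len(score_features), len(performance_features)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_metric(score_features[i - 1], performance_features[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])

    # Backtrack from the end to recover which elements were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1, j - 1], i - 1, j - 1),  # diagonal: advance both
            (cost[i - 1, j], i - 1, j),          # advance score only
            (cost[i, j - 1], i, j - 1),          # advance performance only
        ]
        _, i, j = min(steps, key=lambda step: step[0])
    return cost[n, m], path[::-1]

score_pitches = [60, 62, 64]
performance_pitches = [60, 62, 62, 64]  # the performer repeats the D

total_cost, path = dtw(score_pitches, performance_pitches, lambda a, b: abs(a - b))
```

In this toy case the path maps both performed Ds to the single score D at zero cost. Note that the hard `min` is not differentiable, so training features and metrics with a DTW-based loss typically requires a smoothed variant of this recursion.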

Keywords: Alignment, DTW, Deep Learning

Topics on Music Structure Analysis

  • IDyOM for polyphonic music

Description: Information Dynamics of Music (IDyOM) is a framework for constructing multiple-viewpoint, variable-order Markov models for predictive modelling of probabilistic structure in symbolic music. IDyOM computes conditional probability distributions representing the estimated likelihood of each event in a sequence. IDyOM is currently available as a Lisp implementation and is formalised for monophonic sequences of discrete symbols. The aim of this project is both to reimplement IDyOM in our framework for symbolic music processing (partitura) in a clean and testable fashion and to derive possible extensions towards a polyphonic formalisation.
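The core quantity derived from such conditional distributions is the information content (surprisal) of each event, -log2 p(event | context). The sketch below is a deliberately simplified stand-in, not IDyOM itself: it uses a fixed first-order (bigram) model over MIDI pitches with add-one smoothing, whereas IDyOM combines variable-order models over multiple viewpoints. All names and the toy melodies are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count how often each symbol follows each other symbol."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def information_content(counts, alphabet, sequence):
    """Surprisal in bits of each event given its predecessor (add-one smoothing)."""
    ics = []
    for prev, nxt in zip(sequence, sequence[1:]):
        total = sum(counts[prev].values())
        p = (counts[prev][nxt] + 1) / (total + len(alphabet))
        ics.append(-math.log2(p))
    return ics

# Toy training corpus of monophonic melodies (MIDI pitch numbers).
melodies = [[60, 62, 64, 62, 60], [60, 62, 64, 65, 64]]
alphabet = {60, 62, 64, 65}
model = train_bigram(melodies)

# The final leap 64 -> 60 never occurs in training and is therefore surprising.
ics = information_content(model, alphabet, [60, 62, 64, 60])
```

A polyphonic extension would have to decide what an "event" and its "context" are when several notes sound at once, which is the open question this project addresses.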

repo: https://github.com/mtpearce/idyom/wiki

Keywords: Statistical Models, Music Structure

  • Information-theoretic segmentation of piano music

Description: Musical form refers to the structural organization of the musical material of a piece. In Western classical music, pieces are usually composed of sections, which can be divided into smaller segments, usually referred to as phrases. Phrases are musical segments that have a complete musical sense of their own. Work on segmentation of classical music has focused mostly on melodic segmentation, but segmentation of complex polyphonic music remains an open problem. Cognitively plausible models such as Information Dynamics of Music (IDyOM) have been used to predict segmentation by providing an information-theoretic framework for modeling musical expectation. This project aims to develop models of music segmentation for piano music using an information-theoretic approach based on neural networks for predicting musical expectation. The outcomes of this project are expected to contribute to our framework for symbolic music processing (partitura) in a clean and reusable fashion.

Keywords: Statistical Models, Machine Learning, Deep Learning, Form Segmentation, Structure Segmentation, Data Annotation

  • Segmentation of piano music using score and performance information

Description: Musical form refers to the structural organization of the musical material of a piece. In Western classical music, pieces are usually composed of sections, which can be divided into smaller segments, usually referred to as phrases. Phrases are musical segments that have a complete musical sense of their own. Musicians use performance cues, such as variations in tempo and dynamics, to clarify the musical structure (in a role similar to punctuation in written language). The aim of this project is to relate both score- and performance-based features for segmenting piano music into musical phrases, as well as to study the relation between structural information of a piece of music and the way it is performed. The outcomes of this project are expected to contribute to our frameworks for symbolic music processing and expressive performance generation (partitura, Basis Mixer) in a clean and reusable fashion.

Keywords: Statistical Models, Machine Learning, Performance Analysis, Structure Segmentation, Data Annotation

  • Convolutional Networks for predicting difficulty of piano pieces

Description: Determining the difficulty of a piano piece, i.e., determining how hard/complicated the piece is to play, is a complex and subjective task that usually requires a lot of expertise. With more people turning to self-learning or online music education services (such as Yousician), a system that determines the difficulty of a piece could be very useful, either on its own or as part of a recommendation system that suggests new pieces to the user. This project aims to develop and test CNN-based models for predicting the difficulty of piano music based on 2D representations of a musical score (like a piano roll). The implementation of such a system is intended to be a part of our framework for symbolic music processing (partitura), in a clean and reusable fashion.
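To illustrate the kind of 2D input such a CNN would consume, here is a minimal NumPy sketch that renders a list of notes as a binary piano roll (pitch by time). The note tuple layout `(onset_beats, duration_beats, midi_pitch)`, the function name, and the `time_div` parameter are assumptions made for this example and are not tied to any particular library.

```python
import numpy as np

def piano_roll(notes, time_div=4, n_pitches=128):
    """Binary piano roll with `time_div` columns per beat.

    `notes` is a list of (onset_beats, duration_beats, midi_pitch) tuples.
    """
    end = max(onset + duration for onset, duration, _ in notes)
    n_steps = int(np.ceil(end * time_div))
    roll = np.zeros((n_pitches, n_steps), dtype=np.uint8)
    for onset, duration, pitch in notes:
        start = int(round(onset * time_div))
        # Every note occupies at least one column, however short it is.
        stop = max(start + 1, int(round((onset + duration) * time_div)))
        roll[pitch, start:stop] = 1
    return roll

# A broken C major triad: C4 and E4 for one beat each, then G4 for two beats.
notes = [(0, 1, 60), (1, 1, 64), (2, 2, 67)]
roll = piano_roll(notes)
```

The resulting array can be treated like a single-channel image. A difficulty classifier would typically crop or pool it to a fixed size, since pieces differ in length.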

Keywords: Machine Learning, Deep Learning, Classification, Supervised Learning, Data Annotation

  • Predict Score Notation from MIDI performance

Description: Transcription models commonly start in the audio domain and end in a MIDI or piano roll representation. From a musicological perspective, however, a crucial step is missing: the derivation of a score from a MIDI or piano roll representation. Musical scores in modern staff notation contain a lot of information besides the note pitches, onsets, and offsets. This information includes clefs, key signatures, time signatures, pitch spelling, measures, quantised beat positions (position of a note relative to the measure), quantised note durations, expression markings such as slurs and crescendo markings, repeats, and so on. Deriving this information from MIDI data is a complex multitask learning problem of interdependent classification tasks. This project aims at implementing, testing, and developing a multitask model for score prediction based on MIDI in our framework for symbolic music processing (partitura) in a clean and reusable fashion. The model can be either a tractable probabilistic model, a (deep) neural network, or a combination thereof.
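As a point of reference for one of these subtasks, the simplest conceivable baseline for quantised beat positions snaps performed onset times to a fixed metrical grid. The sketch below assumes a known, constant beat period, which is precisely what expressive performances do not provide; the function name, parameters, and onset values are hypothetical.

```python
def quantise_onsets(onsets_sec, beat_period_sec, divisions_per_beat=4):
    """Snap onset times (in seconds) to the nearest grid position (in beats)."""
    grid_sec = beat_period_sec / divisions_per_beat
    return [round(t / grid_sec) / divisions_per_beat for t in onsets_sec]

# Slightly uneven performed onsets at 120 bpm (beat period of 0.5 s).
performed_onsets = [0.02, 0.49, 0.77, 1.01]
beat_positions = quantise_onsets(performed_onsets, beat_period_sec=0.5)
```

Such a grid-based rule breaks down under tempo changes and rubato, and it says nothing about pitch spelling, voices, or measures, which is why the project frames score prediction as a set of interdependent learned tasks.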

Keywords: Statistical Models, Machine Learning, Multitask Learning, Classification