Towards Expressivity-aware Computer Systems in Music

Project Summary

What makes music so important, what can make a performance so special and stirring? It is the things the music expresses, the emotions it induces, the associations it evokes, the drama and characters it portrays. The sources of this expressivity are manifold: the music itself, its structure, orchestration, personal associations, social settings, but also - and very importantly - the act of performance, the interpretation and expressive intentions made explicit by the musicians through nuances in timing, dynamics, etc.

Thanks to research in fields like Music Information Retrieval (MIR), computers can do many useful things with music, from beat and rhythm detection to song identification and tracking. However, they are still far from grasping the essence of music: they cannot tell whether a performance expresses playfulness or ennui, solemnity or gaiety, determination or uncertainty; they cannot produce music with a desired expressive quality; they cannot interact with human musicians in a truly musical way, recognising and responding to the expressive intentions implied in their playing.

The project is about developing machines that are aware of certain dimensions of expressivity, specifically in the domain of (classical) music, where expressivity is both essential and - at least as far as it relates to the act of performance - can be traced back to well-defined and measurable parametric dimensions (such as timing, dynamics, articulation). We will develop systems that can recognise and characterise expressive qualities in music, search music by expressive aspects, and generate, modify, and react to such qualities. To do so, we will (1) bring together the fields of AI, Machine Learning, Music Information Retrieval (MIR), and Music Performance Research; (2) integrate theories from Musicology to build more well-founded models of music understanding; (3) support model learning and validation with massive musical corpora of a size and quality unprecedented in computational music research.

In terms of computational methodologies, we will rely on, and improve, methods from Artificial Intelligence - particularly: probabilistic models (for information fusion, tracking, reasoning and prediction); machine learning - particularly: deep learning techniques (for learning musical features, abstractions, and representations from musical corpora, and for inducing mappings for expression recognition and prediction); audio signal processing and pattern recognition (for extracting musical parameters and patterns relevant to expressivity); and information theory (for modelling musical expectation, surprise, uncertainty, etc.). This will be combined with high-level concepts and models of structure perception from fields like systematic and cognitive musicology, in order to create systems that have a somewhat deeper 'understanding' of music, musical structure, music performance, and musical listening, and the interplay of these factors in making music the expressive and rewarding art that it is. (A more detailed discussion of how we believe all these things relate to each other can be found in the "Con Espressione Manifesto").
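To give a small, concrete flavour of the information-theoretic side mentioned above, here is a purely illustrative sketch (the model and all names are our own choices for this example, not project code): musical "surprise" quantified as surprisal, in bits, under a smoothed bigram model of pitch sequences.

```python
import math
from collections import defaultdict

# Illustrative sketch (not project code): a smoothed bigram model over
# MIDI pitches, used to quantify the "surprise" (surprisal, in bits) of
# each note given its predecessor.
def train_bigram(sequences, smoothing=1.0):
    alphabet = sorted({p for s in sequences for p in s})
    counts = defaultdict(lambda: defaultdict(float))
    for s in sequences:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1

    def prob(prev, nxt):
        total = sum(counts[prev].values()) + smoothing * len(alphabet)
        return (counts[prev][nxt] + smoothing) / total

    return prob

def surprisal(prob, seq):
    """Surprisal of each note given its predecessor, in bits."""
    return [-math.log2(prob(a, b)) for a, b in zip(seq, seq[1:])]

melodies = [[60, 62, 64, 62, 60], [60, 62, 64, 65, 64]]  # MIDI pitches
prob = train_bigram(melodies)
print([round(s, 2) for s in surprisal(prob, [60, 62, 71])])  # → [1.0, 2.81]
```

The unheard-of continuation (62 → 71) receives a much higher surprisal than the familiar one (60 → 62), which is exactly the kind of quantity an expectation model can feed into higher-level reasoning about tension and surprise.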

With this research, we hope to contribute to a new generation of MIR systems that can support musical services and interactions at a new level of quality, and to inspire expressivity-centered research in other domains of the arts and human-computer interaction (HCI).

Project Details

Call identifier


Project Number


Principal Investigator

Gerhard Widmer

Project Period

Jan 2016 - Dec 2021

Funding Amount

€ 2,318,750.00

The Con Espressione Game ...

Play and contribute:

Do you have a bit of time to listen to some bits of music (and maybe contribute some empirical data to the project)? This is what the project is about:

Play The Con Espressione Game

... or, if you have very little time:

The Con Espressione Manifesto

Our Guiding Strategic Document

Here is our view (2017) on the current research landscape, and what research needs to be done in the coming years (within and beyond Con Espressione):

If you are interested in applying for a research position in the project, please think about how your research ideas and plans would fit in this scheme (or go beyond it, because we may have missed some crucial directions ...).






  • 2017-10-27:
    Demonstration of first prototype of our expressive accompaniment system The Accompanion v0.1 at the Late Breaking / Demo Papers Session at the ISMIR 2017 Conference, Suzhou, China.
    Demo Videos on a Bösendorfer CEUS: Mozart Sonata K.545, 2nd mvt. (Werner Goebl); "The Wild Geese" (Gerhard Widmer)

  • 2017-10-14:
    Gerhard Widmer as studio guest in the main evening news of Slovenian public TV station RTV Slovenia (via RTV Slo Archive)
  • 2017-10-14:
    Gerhard Widmer to give keynote lecture at Slovenian Conference on Artificial Intelligence, Ljubljana
  • 2017-04-13:
    Gerhard Widmer to talk at the Karajan Music Tech Conference as part of the 50th Easter Festival (Osterfestspiele), Salzburg.

  • 2017-01-13:
Our BasisMixer computational model of expressive music performance is said to have passed a musical "Turing Test" in a recent study: it produced a piano performance whose "humanness", as judged by listeners, was indistinguishable from a human musician's (and ranked best, in this respect, among a number of algorithms). See E. Schubert et al., "Algorithms Can Mimic Human Piano Performance", Journal of New Music Research (Jan. 2017).

Open Research Positions

Sept. 2019: Open Position for PostDoc or PhD Student

August 2019:

We are looking for a PostDoc or a PhD Student to work on the ERC project Con Espressione (see below). The position will be for 1 to 1.5 years (PostDoc) or 3 years (PhD student). The PhD student will have the opportunity to complete a doctorate in computer science at the Johannes Kepler University. The position and place of work itself will be at the Austrian Research Institute for Artificial Intelligence (OFAI), Vienna.

The Project:

The Con Espressione Project is funded by the European Research Council (ERC) in the form of an ERC Advanced Grant. Its overall goal is to develop machines that are aware of certain dimensions of expressivity in music. A particular focus of our research is on expressivity in musical performance, and on predictive computational performance models.
More information can be found on the project web page.

Research Focus:

Your research in the project will focus on improving our expressive and reactive/predictive piano accompaniment system (the "ACCompanion", see here for an early short description: <>), turning it into a musical companion that recognises and anticipates expressive intentions by the soloist, learns predictive tempo and performance models on-line, during rehearsal, and combines this with its own internal model of expressive performance in order to create, in real time, a natural and musically expressive accompaniment.

Required Qualifications and Skills:

  • Completed master's degree or PhD in AI/ML & Music, MIR, Musical Informatics
  • Experience in Machine Learning and Deep Neural Networks
  • Experience in, and a good understanding of probabilistic models (HMMs etc.) and inference
  • Interest in computational models of expressive music performance
  • Programming skills: Python, real-time programming
  • Strong background and interest in (classical) music and, ideally, piano


We offer a very competitive salary of

  • approx. EUR 55.000,- per year (before taxes) for a PostDoc
  • approx. EUR 40.000,- per year (before taxes) for a PhD student

for full-time employment (40 hours / week); social security and medical insurance are automatically included.

The PhD student will be employed at a 75% employment level for the first year, which will be raised to 100% if things go well.

Applications should consist of

  • a motivation letter
  • a curriculum vitae
  • a list of publications (or, for PhD candidates: a copy of your master's thesis)
  • a reference letter

and whatever else you may consider informative.

Please send your application, via email <>, to Gerhard Widmer.

Please make sure that your motivation letter explains your research background and experience and how this matches the above-mentioned requirements, and demonstrates that you have studied the Con Espressione project pages.

We encourage traditionally underrepresented groups, such as minorities and women, to apply.


Results 1:
Publications, Presentations, Media Coverage

Scientific Publications

Want to know more about the scientific work and results of the project?

Here's an up-to-date list of our scientific publications related to the project.

Public Presentations

Media Coverage

Results 2:
Demonstrators and Prototypes

Autonomous Expressive Accompaniment:
The "ACCompanion"

The ACCompanion (work in progress) is an automatic accompaniment system designed to accompany a pianist in music for two pianists (or two pianos). The ACCompanion will not just follow the soloist in real time and synchronise with her playing, but will also recognise and anticipate expressive intentions and playing style of the soloist, and contribute its own expressive interpretation of the accompaniment part (via the "BasisMixer" expressive performance model).
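To give a flavour of what "following and anticipating" means computationally, here is a minimal, purely illustrative sketch (class and parameter names are ours, not the ACCompanion's, and the real system uses far richer probabilistic models): an exponentially smoothed inter-beat-interval tracker that predicts the soloist's next beat.

```python
class TempoTracker:
    """Minimal on-line tempo model (illustrative sketch only): an
    exponentially smoothed inter-beat interval (IBI), used to predict
    when the soloist's next beat will fall."""

    def __init__(self, initial_ibi=0.5, alpha=0.3):
        self.ibi = initial_ibi    # current IBI estimate, in seconds
        self.alpha = alpha        # responsiveness vs. stability trade-off
        self.last_beat = None     # time of the last observed solo beat

    def observe_beat(self, t):
        """Update the tempo estimate from an observed solo beat at time t."""
        if self.last_beat is not None:
            observed = t - self.last_beat
            self.ibi = (1 - self.alpha) * self.ibi + self.alpha * observed
        self.last_beat = t

    def predict_next_beat(self):
        """Anticipate when the soloist's next beat should fall."""
        return self.last_beat + self.ibi

tracker = TempoTracker(initial_ibi=0.5)
for t in [0.0, 0.52, 1.06, 1.65]:        # the soloist slowing down slightly
    tracker.observe_beat(t)
print(round(tracker.ibi, 3), round(tracker.predict_next_beat(), 3))  # → 0.538 2.188
```

The point of predicting (rather than merely reacting) is that the accompaniment can place its own notes together with the soloist's, instead of always trailing slightly behind.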

Aug. 2019: First demonstrations with polyphonic music:


Man-Machine Collaboration in Expressive Performance:
The "Con Espressione!" Exhibit

The Con Espressione! Exhibit is an interactive system designed for popular science exhibitions. It demonstrates and enables joint human-computer control of expressive performance: the visitor controls overall tempo and loudness of classical piano pieces (such as Beethoven's "Moonlight" sonata Op.27 No.2) via hand movements (tracked by a LeapMotion sensor). In the background, our "Basis Mixer" expressive performance model adds subtle modifications to the performance, such as articulation and micro-timing (e.g., slight temporal differences in the note onsets when playing chords). The contribution of the Basis Mixer can be controlled and experimented with via a slider.
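The joint-control idea can be sketched roughly as follows (a toy illustration under our own assumptions; the function and parameter names are hypothetical and the exhibit's actual blending is more refined): the slider acts as an interpolation weight between the visitor's coarse control values and the model's predictions.

```python
def blend_performance(user, model, slider):
    """Joint human-computer control (illustrative sketch; names are
    hypothetical): linearly interpolate between the visitor's control
    values and the model's predictions.
    slider = 0.0 -> pure visitor control, 1.0 -> pure model."""
    return {k: (1 - slider) * user[k] + slider * model[k] for k in user}

visitor = {"tempo_bpm": 60.0, "velocity": 70.0}   # e.g. from hand tracking
model = {"tempo_bpm": 54.0, "velocity": 78.0}     # e.g. a model prediction
print(blend_performance(visitor, model, 0.5))     # → {'tempo_bpm': 57.0, 'velocity': 74.0}
```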

The exhibit was first shown at the La La Lab Science Exhibition ("The Mathematics of Music") in Heidelberg, Germany (May 2019 - April 2020; here is a video from the Exhibition Opening). The source code is openly available via a GitHub repository.

Here is a video from the Heidelberg Laureate Forum 2019, with Andreas Daniel Matt (managing director) explaining the La La Lab and the Con Espressione! Exhibit.

And here is a video showcasing the Bösendorfer CEUS computer-monitored grand piano.
(This was our birthday gift for the 20th Anniversary of the ISMIR Conference Series (Sept. 2019)).

A Generative Model of Expressive Piano Performance:
The "Basis Mixer"

The Basis Mixer is a comprehensive computational model of expressive music performance that predicts musically appropriate patterns for various performance parameters (tempo, timing, dynamics, articulation, ...) as a function of the score of a given piece. It is based on so-called basis functions (feature functions that describe various relevant aspects of the score) and state-of-the-art deep learning methods. A comprehensive description can be found in Carlos Cancino's Ph.D. thesis (Dec. 2018). The model has been used as an experimental tool for studying or verifying various hypotheses related to expressive piano performance, and can also be used to generate expressive performances for new pieces. An early version of the model is said to have passed a "musical Turing Test", producing a piano performance whose "humanness", as judged by listeners, was indistinguishable from a human musician's (and ranking best, in this respect, among a number of algorithms) in a recent study (E. Schubert et al., "Algorithms Can Mimic Human Piano Performance: The Deep Blues of Music", Journal of New Music Research, 2017).
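To convey the basis-function idea in its simplest (linear) form, here is an illustrative sketch (the feature choices and weights are invented for this example; the actual Basis Mixer learns its mapping from recorded performances with deep neural networks):

```python
import numpy as np

# Illustrative basis functions (invented for this sketch): each maps a
# score note to one numeric feature.
def basis_pitch(note):    return note["pitch"] / 127.0
def basis_downbeat(note): return 1.0 if note["beat"] % 4 == 0 else 0.0
def basis_duration(note): return note["duration"]

BASES = [basis_pitch, basis_downbeat, basis_duration]

def basis_matrix(score):
    """Stack basis-function values into a (notes x features) matrix."""
    return np.array([[b(n) for b in BASES] for n in score])

# In the simplest variant, a performance parameter such as MIDI velocity
# is a weighted sum of the basis functions; the weights would be learned
# from recorded performances (fixed here for illustration). The actual
# Basis Mixer replaces this linear map with deep neural networks.
weights = np.array([30.0, 8.0, -2.0])
bias = 50.0

score = [{"pitch": 60, "beat": 0, "duration": 1.0},
         {"pitch": 64, "beat": 1, "duration": 0.5}]
print((basis_matrix(score) @ weights + bias).round(2))  # → [70.17 64.12]
```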

The Basis Mixer is used as an autonomous expressive performance generator in several of our demonstrators (e.g., the ACCompanion and the Con Espressione! Exhibit).

Here is a web-based tool for experimenting with the model.
(Note [May 2019]: this is outdated; a new version with a better trained model and a new interface will come soon, in the context of the ISMIR 2019 Tutorial on Computational Modeling of Musical Expression given by Carlos Cancino et al.)


Emotion Recognition with Explanations:
The "Two-level Mood Recogniser"

Our Two-level Mood Recogniser is a deep neural network that learns to recognise emotional characteristics of a musical piece from audio, together with (and based on) human-interpretable, mid-level perceptual features. This permits it not only to make predictions regarding some emotion/mood-related qualities that humans may perceive in a piece of music, but also to provide explanations for its predictions, in terms of general musical concepts (such as "melodiousness" or "rhythmic complexity") that most (Western) music listeners may intuitively understand. These "mid-level" concepts can then be further traced back to aspects of the actual audio in a second level of explanation, if the user so wishes.
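The two-level idea can be sketched as follows (weights, feature names, and dimensions are invented for illustration; the real model is a trained deep network operating on audio). The key design choice is that the second level is linear, so every emotion prediction decomposes exactly into per-concept contributions, which is what makes the explanations possible:

```python
import numpy as np

# Illustrative two-level sketch (all weights invented):
#   level 1: audio features -> interpretable mid-level features
#   level 2: mid-level features -> emotion ratings, via a linear layer,
#            so each prediction decomposes into mid-level contributions.
MID_LEVEL = ["melodiousness", "rhythmic complexity", "dissonance"]

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 3))            # stand-in for the deep audio model
W2 = np.array([[ 0.9, -0.1],            # melodiousness: raises valence
               [ 0.2,  0.8],            # rhythmic complexity: raises arousal
               [-0.7,  0.4]])           # dissonance: lowers valence

def predict_with_explanation(audio_features):
    mid = np.tanh(audio_features @ W1)  # level 1: mid-level features
    emotion = mid @ W2                  # level 2: (valence, arousal)
    contributions = mid[:, None] * W2   # per-concept share of each prediction
    return mid, emotion, contributions

mid, emotion, contrib = predict_with_explanation(rng.normal(size=8))
for name, c in zip(MID_LEVEL, contrib[:, 0]):
    print(f"{name:>20s} contributes {c:+.3f} to valence")
```

Because the contributions sum exactly to the prediction, the system can say, e.g., "this piece sounds sad mostly because of low melodiousness and high dissonance".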

Here are two little demo pages with examples:


Generation of Expressive Rhythms with Decoupled Track Timing:
The "Non Sequitur" Sequencer

Non Sequitur is an experimental tool for generating rhythms with complex polyrhythms and micro-timing. It is based on several individual, partly dependent clocks realised as oscillators that can influence each other's periodicities by virtue of being connected in a network.
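A rough sketch of the underlying idea, under our own simplifying assumptions (the actual implementation differs): phase oscillators that "fire" when their phase wraps around, with coupling weights that let connected clocks pull each other's periods together.

```python
class Clock:
    """A phase oscillator: advances its phase and 'fires' (triggers an
    event) each time the phase wraps around. Illustrative sketch only."""
    def __init__(self, period):
        self.period = period   # seconds per cycle
        self.phase = 0.0

def step(clocks, coupling, dt=0.01):
    """Advance all clocks by dt. 'coupling' maps a clock index to a list
    of (neighbour index, coupling strength) pairs; coupled clocks pull
    each other's periods together."""
    fired = []
    for i, c in enumerate(clocks):
        for j, k in coupling.get(i, []):
            c.period += k * (clocks[j].period - c.period) * dt
        c.phase += dt / c.period
        if c.phase >= 1.0:
            c.phase -= 1.0
            fired.append(i)
    return fired

# Two mutually coupled clocks: their periods gradually converge,
# producing shifting, then locking, polyrhythmic trigger patterns.
clocks = [Clock(0.5), Clock(0.33)]
coupling = {0: [(1, 0.2)], 1: [(0, 0.2)]}
events = [(round(n * 0.01, 2), i)
          for n in range(300) for i in step(clocks, coupling)]
print(events[:6])
```

With weak coupling the clocks drift in and out of phase for a long time; with strong coupling they lock quickly, which is what gives the sequencer its range from free polyrhythm to near-synchrony.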

Play with our interactive Web Prototype.
The sequencer can also be used to control, e.g., a grand piano, to generate polyrhythmic "minimal music"...

The technical background is briefly described in a demo paper (Sound & Music Computing (SMC) Conference 2019, Malaga, Spain).

Automatic Sound and Music Recognition:
The "Listening Machine" at the Ars Electronica Center

The Listening Machine is an interactive exhibit designed for the Ars Electronica Center (AEC), to demonstrate real-time computational sound/music perception to the general public. It is based on a deep neural network that has been trained, via machine learning methods and using thousands of sound examples, to recognise different kinds of sounds, by finding out what patterns in the sound signal are characteristic of certain classes – for example, what distinguishes a flute from a trumpet, or spoken language from singing.

To come: video documentary produced by AEC on the occasion of the opening of the new AEC permanent exhibition (May 2019).

Rhythm Recognition and Tempo Tracking for Automatic Accompaniment:
Our "robod" Robo-Drummer

The robod is our little robot drummer that continually listens to its surroundings through a microphone, recognises when music is played, automatically determines the meter and downbeat, and accompanies the musicians in real time, adapting to expressive changes of tempo.
It was designed on the occasion of the BE OPEN Public Science Festival in the city center of Vienna (Sept. 2018), organised by the Austrian Science Fund (FWF) on the occasion of its 50th anniversary.
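As a toy illustration of just the tempo-estimation part (the real system does considerably more, including meter and downbeat detection; the function here is our own invented sketch): estimating a beat period from noisy onset times by quantising inter-onset intervals.

```python
from collections import Counter

def estimate_tempo(onset_times, resolution=0.05):
    """Crude tempo estimate (illustrative only; real beat trackers use
    autocorrelation or probabilistic models): quantise inter-onset
    intervals and take the most common one as the beat period."""
    iois = [b - a for a, b in zip(onset_times, onset_times[1:])]
    quantised = [round(x / resolution) * resolution for x in iois]
    beat_period = Counter(quantised).most_common(1)[0][0]
    return 60.0 / beat_period            # beats per minute

# Slightly noisy onsets around a 120 BPM pulse:
onsets = [0.0, 0.5, 1.0, 1.51, 2.0, 2.49, 3.0]
print(round(estimate_tempo(onsets)))     # → 120
```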

To come (Aug. 2019): Demonstration video.


Results 3:
Some Special Things

Awards, Competitions, and a "Turing Test"



This project receives funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme under grant agreement No 670035.

In addition, we gratefully acknowledge material support for this research (in the form of music, scores, access to musical instruments and performance spaces) from the following institutions:

The Bösendorfer Piano Company, Vienna, Austria

The Royal Concertgebouw Orchestra (RCO), Amsterdam, The Netherlands