Con Espressione

Towards Expressivity-aware Computer Systems in Music


Project Summary

What makes music so important, what can make a performance so special and stirring? It is the things the music expresses, the emotions it induces, the associations it evokes, the drama and characters it portrays. The sources of this expressivity are manifold: the music itself, its structure, orchestration, personal associations, social settings, but also - and very importantly - the act of performance, the interpretation and expressive intentions made explicit by the musicians through nuances in timing, dynamics etc. 

Thanks to research in fields like Music Information Research (MIR), computers can do many useful things with music, from beat and rhythm detection to song identification and tracking. However, they are still far from grasping the essence of music: they cannot tell whether a performance expresses playfulness or ennui, solemnity or gaiety, determination or uncertainty; they cannot produce music with a desired expressive quality; they cannot interact with human musicians in a truly musical way, recognising and responding to the expressive intentions implied in their playing. 

The project is about developing machines that are aware of certain dimensions of expressivity, specifically in the domain of (classical) music, where expressivity is both essential and - at least as far as it relates to the act of performance - can be traced back to well-defined and measurable parametric dimensions (such as timing, dynamics, articulation). We will develop systems that can recognise, characterise, generate, modify, and react to expressive qualities in music. To do so, we will (1) bring together the fields of AI, Machine Learning, Music Information Retrieval (MIR), and Music Performance Research; (2) integrate knowledge from musicology to build more well-founded models of music understanding; (3) train and validate computational models with massive musical corpora.

In terms of computational methodologies, we will rely on, and improve, methods from Artificial Intelligence - particularly: (deep) machine learning (for learning musical features, abstractions, and representations from musical corpora, and for inducing mappings for expression recognition and prediction); probabilistic modeling (for information fusion, tracking, reasoning and prediction); and audio signal processing and pattern recognition (for extracting musical parameters and patterns relevant to expressivity). This will be combined with models of structure perception from fields like systematic and cognitive musicology, in order to create systems that have a somewhat deeper 'understanding' of music, musical structure, music performance, and musical listening, and the interplay of these factors in making music the expressive art that it is. (A more detailed discussion of how we believe all these things relate to each other can be found in the "Con Espressione Manifesto").

With this research, we hope to contribute to a new generation of MIR systems that can support musical services and interactions at a new level of quality, and to inspire expressivity-centered research in other domains of the arts and human-computer interaction (HCI).

UPDATE (2021): Our journey will continue, in our new ERC project "Whither Music?" ...

Project Details

Call identifier


Project Number

670035


Principal Investigator

Gerhard Widmer

Project Period

Jan 2016 - Dec 2021

Funding Amount

€ 2,318,750.00

The Con Espressione Game ...

Play and contribute

Do you have a bit of time to listen to some bits of music (and maybe contribute some empirical data to the project)? This is what the project is about:

Play The Con Espressione Game

... or, if you have very little time: a very short version

New (2020/21): Evaluation and interactive visualisation of the Con Espressione Game data, and open research dataset

After about 2 years of data gathering in the Con Espressione Game, we collected, cleaned, and analysed the performance characterisations (thanks to all of you who contributed!), and can now offer the following results:

  • A first analysis of the data (in our ISMIR 2020 paper)
  • The Con Espressione Game data has also been released as an open research dataset.
  • In 2021, we conducted a pile sorting experiment (ICMPC 2021), where music experts sorted the most commonly used terms into thematic clusters ("piles"). Here is an interactive visualisation tool that permits you to explore the resulting concepts, and their connections to expressivity characterisations and the corresponding performances (scroll down for instructions).

The Con Espressione Manifesto

Our Guiding Strategic Document

Here is our view (2017) on the current research landscape, and what research needs to be done in the coming years (within and beyond Con Espressione):

(If you are interested in applying for a research position in the project, please think about how your research ideas and plans would fit into this scheme, or go beyond it, because we may have missed some crucial directions ...)







  • 2017-10-27:
    Demonstration of first prototype of our expressive accompaniment system The Accompanion v0.1 at the Late Breaking / Demo Papers Session at the ISMIR 2017 Conference, Suzhou, China.
    Demo Videos on a Bösendorfer CEUS: Mozart Sonata K.545, 2nd mvt. (Werner Goebl); "The Wild Geese" (Gerhard Widmer)

  • 2017-10-14:
    Gerhard Widmer as studio guest in the main evening news of Slovenian public TV station RTV Slovenia (via RTV Slo Archive)
  • 2017-10-14:
    Gerhard Widmer to give keynote lecture at Slovenian Conference on Artificial Intelligence, Ljubljana
  • 2017-04-13:
    Gerhard Widmer to talk at the Karajan Music Tech Conference as part of the 50th Easter Festival (Osterfestspiele), Salzburg.

  • 2017-01-13:
    Our BasisMixer computational model of expressive music performance is said to have passed a musical "Turing Test" (producing a piano performance whose "humanness", as judged by listeners, was indistinguishable from that of a human musician [and ranking best, in this respect, among a number of algorithms]) in a recent study ("Algorithms Can Mimic Human Piano Performance") by E. Schubert et al., published in J. New Music Research (Jan. 2017).

Results 1:
Publications, Resources, Presentations, Media Coverage

Scientific Publications (+ associated code & research data)

Want to know more about the scientific work and results of the project?

Here's an up-to-date list of our scientific publications related to the project.

Resources (Software & Data)

Publishing our methods, experimental software, and data is one of our guiding principles, and we try to do it wherever legal restrictions (e.g., copyright on music data) permit it. Research software and data associated with specific scientific papers are linked from our publications page (so that you have the appropriate papers to go with the resource).

More general resources (such as our computational model of expressive performance (the "Basismixer"), the Con Espressione Dataset, or the Partitura score manipulation software) can be found here.

Public Presentations

Media Coverage

Results 2:
Demonstrators and Prototypes

Autonomous Expressive Accompaniment:
The "ACCompanion"

The ACCompanion (work in progress) is an automatic accompaniment system designed to accompany a pianist in music for two pianists (or two pianos). The ACCompanion will not just follow the soloist in real time and synchronise with her playing, but will also recognise and anticipate expressive intentions and playing style of the soloist, and contribute its own expressive interpretation of the accompaniment part (via the "BasisMixer" expressive performance model).
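To give a rough idea of what "following the soloist" involves, here is a minimal sketch in Python. It is not the ACCompanion's actual method: it assumes that a score follower has already matched each solo note to its score position (that part is not shown), and it simply smooths the soloist's local tempo to predict when an upcoming accompaniment note should sound. The class name TempoFollower and its parameters are hypothetical.

```python
class TempoFollower:
    """Toy tempo follower (illustrative assumption, not the ACCompanion algorithm)."""

    def __init__(self, initial_seconds_per_beat=0.5, smoothing=0.5):
        self.seconds_per_beat = initial_seconds_per_beat
        self.smoothing = smoothing      # 0 = ignore the soloist, 1 = follow instantly
        self.last_score_beat = None     # score position of the last matched solo note
        self.last_time = None           # time (seconds) at which it was played

    def observe(self, score_beat, performed_time):
        """Update the tempo estimate from one matched solo note."""
        if self.last_score_beat is None:
            # First note: nothing to compare against yet.
            self.last_score_beat, self.last_time = score_beat, performed_time
            return
        beat_gap = score_beat - self.last_score_beat
        if beat_gap <= 0:
            return  # same score position (e.g. a chord): keep the first note as anchor
        local = (performed_time - self.last_time) / beat_gap
        a = self.smoothing
        self.seconds_per_beat = a * local + (1.0 - a) * self.seconds_per_beat
        self.last_score_beat, self.last_time = score_beat, performed_time

    def predict_time(self, score_beat):
        """Predict when a note at `score_beat` should be played."""
        if self.last_time is None:
            raise ValueError("no solo notes observed yet")
        return self.last_time + (score_beat - self.last_score_beat) * self.seconds_per_beat


follower = TempoFollower(initial_seconds_per_beat=0.5, smoothing=0.5)
follower.observe(0.0, 0.0)
follower.observe(1.0, 0.6)            # the soloist is slowing down
next_onset = follower.predict_time(2.0)
```

A real accompanist would also have to anticipate expressive intentions rather than merely react to them, which is exactly what this sketch leaves out.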

Aug. 2019: First demonstrations with polyphonic music:

Late 2020 (hopefully): Much more spectacular music (to come - as soon as Covid-19 permits it ...)

Man-Machine Collaboration in Expressive Performance:
The "Con Espressione!" Exhibit

The Con Espressione! Exhibit is an interactive system designed for popular science exhibitions. It demonstrates and enables joint human-computer control of expressive performance: the visitor controls overall tempo and loudness of classical piano pieces (such as Beethoven's "Moonlight" sonata Op.27 No.2) via hand movements (tracked by a LeapMotion sensor). In the background, our "Basis Mixer" expressive performance model adds subtle modifications to the performance, such as articulation and micro-timing (e.g., slight temporal differences in the note onsets when playing chords). The contribution of the Basis Mixer can be controlled and experimented with via a slider.
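The division of labour described above can be sketched as follows. This is an illustrative Python example under simplifying assumptions, not the exhibit's source code: tempo is treated as constant, and the per-note deviations that a performance model would supply are given as plain numbers. All names (NoteEvent, render_note, model_weight) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    onset_beats: float          # score position in beats
    base_velocity: int          # MIDI velocity of a neutral rendering (1-127)
    model_onset_shift: float    # assumed micro-timing deviation from the model, in seconds
    model_velocity_delta: int   # assumed per-note loudness deviation from the model


def render_note(note, tempo_bpm, loudness_scale, model_weight):
    """Combine the visitor's global controls with the model's note-level nuances.

    tempo_bpm and loudness_scale stand for the hand-controlled values;
    model_weight is the slider position in [0, 1].
    Returns (onset_seconds, midi_velocity).
    """
    if not 0.0 <= model_weight <= 1.0:
        raise ValueError("model_weight must lie in [0, 1]")
    seconds_per_beat = 60.0 / tempo_bpm
    onset = note.onset_beats * seconds_per_beat + model_weight * note.model_onset_shift
    velocity = note.base_velocity * loudness_scale + model_weight * note.model_velocity_delta
    velocity = max(1, min(127, round(velocity)))  # keep within the MIDI range
    return onset, velocity


note = NoteEvent(onset_beats=2.0, base_velocity=64,
                 model_onset_shift=0.02, model_velocity_delta=10)
plain = render_note(note, tempo_bpm=120, loudness_scale=1.0, model_weight=0.0)
nuanced = render_note(note, tempo_bpm=120, loudness_scale=1.0, model_weight=1.0)
```

With the slider at 0 the note is played exactly as the visitor's tempo and loudness dictate; at 1 the model's full deviations are added on top.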

The exhibit was first shown at the La La Lab Science Exhibition ("The Mathematics of Music") in Heidelberg, Germany (May 2019 - August 2020). The source code is openly available via a GitHub repository.


Some videos with and about the Con Espressione Exhibit:

A Generative Model of Expressive Piano Performance:
The "Basis Mixer"

The Basis Mixer is a comprehensive computational model of expressive music performance that predicts musically appropriate patterns for various performance parameters (tempo, timing, dynamics, articulation, ...) as a function of the score of a given piece. It is based on so-called basis functions (feature functions that describe various relevant aspects of the score) and state-of-the-art deep learning methods. A comprehensive description can be found in Carlos Cancino's Ph.D. thesis (Dec. 2018). The model has been used as an experimental tool for studying or verifying various hypotheses related to expressive piano performance, and can also be used to generate expressive performances for new pieces. An early version of the model is said to have passed a "musical Turing Test", producing a piano performance whose "humanness", as judged by listeners, was indistinguishable from that of a human musician [and ranking best, in this respect, among a number of algorithms] in a recent study (E. Schubert et al., "Algorithms Can Mimic Human Piano Performance: The Deep Blues of Music", J. New Music Research, 2017).

The Basis Mixer is used as an autonomous expressive performance generator in several of our demonstrators (e.g., the ACCompanion and the Con Espressione! Exhibit).
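To illustrate the idea of basis functions, here is a deliberately simplified Python sketch. It replaces the deep learning component mentioned above with a plain weighted sum, and the three basis functions, the note fields, and the weights are invented for illustration; none of them is taken from the actual model.

```python
# Each basis function maps one score note to a number describing one aspect of the score.

def pitch_basis(note):
    # Piano range 21..108 (MIDI) scaled to 0..1.
    return (note["pitch"] - 21) / 87.0


def downbeat_basis(note):
    return 1.0 if note["beat_in_bar"] == 0 else 0.0


def crescendo_basis(note):
    # How far the note lies into a crescendo marking (0 = outside or start, 1 = end).
    return note.get("crescendo_progress", 0.0)


BASIS_FUNCTIONS = [pitch_basis, downbeat_basis, crescendo_basis]


def basis_matrix(notes):
    """One row per note, one column per basis function."""
    return [[f(note) for f in BASIS_FUNCTIONS] for note in notes]


def predict_velocity(notes, weights, bias):
    """Linear stand-in for the learned mapping from basis functions to loudness."""
    return [bias + sum(w * x for w, x in zip(weights, row))
            for row in basis_matrix(notes)]


notes = [
    {"pitch": 21, "beat_in_bar": 0},
    {"pitch": 108, "beat_in_bar": 1, "crescendo_progress": 0.5},
]
velocities = predict_velocity(notes, weights=[10.0, 5.0, 20.0], bias=60.0)
```

In a trained model the weights (or the network replacing them) would be learned from recorded performances rather than set by hand.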



Emotion Recognition with Explanations:
The "Two-level Mood Recogniser"

Our Two-level Mood Recogniser is a deep neural network that learns to recognise emotional characteristics of a musical piece from audio, together with (and based on) human-interpretable, mid-level perceptual features. This permits it not only to make predictions regarding some emotion/mood-related qualities that humans may perceive in a piece of music, but also to provide explanations for its predictions, in terms of general musical concepts (such as "melodiousness" or "rhythmic complexity") that most (Western) music listeners may intuitively understand. These "mid-level" concepts can then be further traced back to aspects of the actual audio in a second level of explanation, if the user so wishes.
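The first level of explanation can be illustrated with a small Python sketch. It assumes, purely for illustration, that the step from mid-level features to an emotion rating is a weighted sum, so that each feature's contribution can be read off directly; the audio-to-feature network is not shown, and the feature values and weights below are made up.

```python
def explain_prediction(mid_level_values, weights, bias):
    """Return an emotion prediction and the mid-level features ranked by influence.

    mid_level_values: feature name -> value estimated from audio (assumed given)
    weights:          feature name -> weight of that feature for one emotion
    """
    contributions = {name: weights[name] * value
                     for name, value in mid_level_values.items()}
    prediction = bias + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return prediction, ranked


mid_level = {"melodiousness": 0.8, "rhythmic complexity": 0.2}
valence_weights = {"melodiousness": 0.5, "rhythmic complexity": -1.0}
valence, reasons = explain_prediction(mid_level, valence_weights, bias=0.1)
for name, contribution in reasons:
    print(f"{name}: {contribution:+.2f}")
```

The ranked list is the "explanation": it says which intuitive musical qualities pushed the prediction up or down, and by how much.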

Here are two little demo pages with examples:


Generation of Expressive Rhythms with Decoupled Track Timing:
The "Non Sequitur" Sequencer

Non Sequitur is an experimental implementation of a tool for generating complex rhythms with poly-rhythms and micro-timing. It is based on several individual, partly dependent clocks realised as oscillators that can influence each other's periodicities by virtue of being connected in a network.


  • Play with our interactive Web Prototype.
    (Quick start: choose a preset from the Reset button pull-down menu ..).
    (For documentation, push the INFO button)
  • The technical background is briefly described in a demo paper (Sound & Music Computing (SMC) Conference 2019, Malaga, Spain).
  • The sequencer can also be used to control, e.g., a grand piano, to generate polyrhythmic "minimal music"...
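
The idea of clocks that pull on each other's periodicities can be sketched with a generic Kuramoto-style network of phase oscillators. This is a stand-in chosen for illustration, not the model used in Non Sequitur; the function name, the coupling matrix, and the step size are assumptions.

```python
import math


def simulate_clocks(freqs_hz, coupling, duration_s, dt=0.001):
    """Simulate coupled phase oscillators and return each clock's tick times.

    freqs_hz: natural frequency of each clock
    coupling: coupling[i][j] is how strongly clock j pulls on clock i
    A clock ticks (triggers a note) whenever its phase completes a full cycle.
    """
    n = len(freqs_hz)
    phases = [0.0] * n
    ticks = [[] for _ in range(n)]
    steps = int(round(duration_s / dt))
    for step in range(1, steps + 1):
        new_phases = []
        for i in range(n):
            pull = sum(coupling[i][j] * math.sin(phases[j] - phases[i])
                       for j in range(n))
            new_phases.append(phases[i] + dt * (2 * math.pi * freqs_hz[i] + pull))
        for i in range(n):
            if new_phases[i] >= 2 * math.pi:
                new_phases[i] -= 2 * math.pi
                ticks[i].append(step * dt)
        phases = new_phases
    return ticks


# Two independent clocks give a strict 2-against-3 poly-rhythm ...
independent = simulate_clocks([2.0, 3.0], [[0.0, 0.0], [0.0, 0.0]], duration_s=1.05)
# ... while letting clock 0 pull on clock 1 bends the timing of clock 1.
coupled = simulate_clocks([2.0, 3.0], [[0.0, 0.0], [5.0, 0.0]], duration_s=1.05)
```

Varying the coupling strengths moves the result between strict poly-rhythm and elastic micro-timing.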

Automatic Sound and Music Recognition:
The "Listening Machine" at the Ars Electronica Center

The Listening Machine is an interactive exhibit designed for the permanent exhibition "Understanding AI" at the Ars Electronica Center (AEC), to demonstrate real-time computational sound/music perception to the general public. It is based on a deep neural network that has been trained, via machine learning methods and using thousands of sound examples, to recognise different kinds of sounds, by finding out what patterns in the sound signal are characteristic of certain classes – for example, what distinguishes a flute from a trumpet, or spoken language from singing.

Here is a video documentary produced by the AEC on the occasion of the opening of the new AEC permanent exhibition (May 2019).

Rhythm Recognition and Tempo Tracking for Automatic Accompaniment:
Our "robod" Robo-Drummer

The robod is our little robot drummer that continually listens to its surroundings through a microphone, recognises when music is played, automatically determines the meter and downbeat, and accompanies the musicians in real time, adapting to expressive changes of tempo.
It was designed for the BE OPEN Public Science Festival in the city center of Vienna (Sept. 2018), organised by the Austrian Science Fund (FWF) on the occasion of its 50th anniversary.

To come (still in the making ...): Demonstration video.


Results 3:
Some Special Things

Awards, Competitions, and a "Turing Test"



This project receives funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme under grant agreement No 670035.

In addition, we gratefully acknowledge material support for this research (in the form of music, scores, access to musical instruments and performance spaces) from the following institutions:

The Bösendorfer Piano Company, Vienna, Austria

The Royal Concertgebouw Orchestra (RCO), Amsterdam, The Netherlands