Institute of Computational Perception

Multimedia Data Mining

Student Projects in Multimedia Data Mining

Contacts (if not stated otherwise): Markus Schedl, Shah Nawaz

These student projects can be started at any time (including holidays) and can span semester boundaries.

Remark: We are open to new proposals. If you are interested in Multimedia Data Mining, feel free to contact us!


Contact Markus Schedl for the following topics:

  • Extracting music listening intents/purposes from music-related multimedia and behavioral data
  • Music lyrics analysis
  • Tracing geographic spread/flow of music around the globe
  • Predicting the popularity of online multimedia content
  • Intelligent music browsing interfaces
  • Personality prediction from social media/multimedia data
  • Zero-trust framework for an adversarial data rating system


Contact Shah Nawaz for the following topics:

  • Multimodal Learning System Robust to Missing Modalities: Multimodal data collected in the real world are often imperfect due to missing modalities, which can significantly degrade performance. The goal of the project is to develop a multimodal system that is robust to missing modalities. The work will focus on investigating whether single-stream transformers are robust to a missing modality (see the first sketch after this list).
  • Face-voice Association and Impact of Multiple Languages: Face-voice association links the faces and voices of celebrities through cross-modal verification, matching, and retrieval tasks. The goal of the project is to establish the relationship between faces and voices and to analyse the impact of multiple languages on this association.
  • Single-branch Network for Multimodal Training: With the rapid growth of social media platforms, users are sharing billions of multimedia posts containing audio, images, and text. Researchers have focused on building autonomous systems capable of processing such multimedia data to solve challenging multimodal tasks, including cross-modal retrieval, matching, and verification. Existing works use separate networks to extract embeddings of each modality in order to bridge the gap between them. The modular structure of these branched networks is fundamental to creating numerous multimodal applications and has become a de facto standard for handling multiple modalities. The goal of the project is to bridge the gap between multiple modalities with a single-branch network (see the second sketch after this list). Moreover, the study can focus on the internal workings of a single-branch network.
  • Emotion Recognition in Speech using Cross-Modal Transfer in the Wild: The goal of the project is to learn a representation that is aware of the emotional content in speech prosody by transferring emotional knowledge from synchronously extracted face images. For this to be possible, the emotional content of speech must correlate with the facial expression of the speaker.
  • Multimodal Pre-train then Transfer Learning Approach: The goal of the project is to investigate whether projecting all modalities into the same space with weight sharing helps to improve downstream vision or language tasks.
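
The following is a minimal sketch for the missing-modalities topic above, assuming token-level fusion in a single-stream transformer where an absent modality is replaced by a learned placeholder token. The module names, dimensions, pooling, and classification head are illustrative assumptions, not a prescribed design.

```python
# Hedged sketch: single-stream transformer fusion with a learned placeholder
# token standing in for a missing modality. Shapes and names are hypothetical.
import torch
import torch.nn as nn

class SingleStreamFusion(nn.Module):
    """All modality tokens share one transformer encoder; an absent
    modality is replaced by a single learned placeholder token."""

    def __init__(self, dim: int = 256, n_heads: int = 4, n_layers: int = 2, n_classes: int = 10):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.missing = nn.Parameter(torch.zeros(1, 1, dim))  # placeholder for an absent modality
        self.head = nn.Linear(dim, n_classes)

    def forward(self, image_tokens=None, text_tokens=None):
        batch = (image_tokens if image_tokens is not None else text_tokens).size(0)
        fill = self.missing.expand(batch, 1, -1)
        # Concatenate whatever modalities are present into one token sequence.
        seq = torch.cat([
            image_tokens if image_tokens is not None else fill,
            text_tokens if text_tokens is not None else fill,
        ], dim=1)
        encoded = self.encoder(seq)
        return self.head(encoded.mean(dim=1))  # mean-pool the joint sequence

if __name__ == "__main__":
    model = SingleStreamFusion()
    imgs = torch.randn(4, 16, 256)   # stand-in for image patch tokens
    txts = torch.randn(4, 12, 256)   # stand-in for text tokens
    print(model(imgs, txts).shape)   # both modalities present
    print(model(imgs, None).shape)   # text modality missing at test time
```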
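
For the single-branch topic, a second minimal sketch, assuming pre-extracted face and voice features of equal dimensionality and an InfoNCE-style contrastive objective. The actual project may use different feature extractors, losses, and data; all names here are hypothetical.

```python
# Hedged sketch: one shared branch embeds every modality into a joint space,
# instead of a separate sub-network per modality.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleBranchNet(nn.Module):
    """A single projection branch shared across modalities."""

    def __init__(self, feat_dim: int = 512, embed_dim: int = 128):
        super().__init__()
        # Assumes face and voice features already share feat_dim; otherwise
        # lightweight per-modality adapters would be needed first.
        self.branch = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # The same weights are applied regardless of the source modality.
        return F.normalize(self.branch(features), dim=-1)

def contrastive_loss(face_emb, voice_emb, temperature=0.07):
    # Symmetric InfoNCE-style loss over matching face/voice pairs
    # (one possible objective; the project may choose another).
    logits = face_emb @ voice_emb.t() / temperature
    targets = torch.arange(face_emb.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

if __name__ == "__main__":
    model = SingleBranchNet()
    face_feats = torch.randn(8, 512)   # stand-in for pre-extracted face features
    voice_feats = torch.randn(8, 512)  # stand-in for pre-extracted voice features
    print(contrastive_loss(model(face_feats), model(voice_feats)).item())
```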