MuSe 2021

The Multimodal Sentiment Challenge

Sentiment, Emotion, Physiological-Emotion, and Stress

A full-day workshop in conjunction with ACM Multimedia 2021

Thank you for this year's incredible participation of 61 teams from 16 countries and 47 academic institutions! Thanks also to the ACM Multimedia organisers in Chengdu, China. See you all next year!

Challenge: The Multimodal Sentiment Analysis Challenge (MuSe 2021) focuses on multimodal sentiment recognition in user-generated content and in stress-induced situations. The competition is a satellite event of the 29th ACM International Conference on Multimedia (Chengdu, China) and aims to compare multimedia processing and deep learning methods for automatic audio-visual, biological, and text-based sentiment and emotion sensing under a common set of experimental conditions.

The goal of MuSe is to provide a common benchmark test set for multimodal information processing and to bring together the Affective Computing, Sentiment Analysis, and Health Informatics research communities to compare the merits of multimodal fusion for a large number of modalities under well-defined conditions.

data: here || baseline paper: here || baseline git code: here || results on test: here || paper submission: here

We are pleased to announce that, now that the challenges have closed, the MuSe 2020 and MuSe 2021 data are available for research projects (academic institutions only)! To get access to the MuSe-CaR and/or Ulm-TSST datasets and the corresponding challenge labels, please download the respective End User License Agreement(s) (EULA).

MuSe 2021 featured four sub-challenges:

Based on last year's MuSe-CaR dataset, extended by a novel gold-standard fusion method:

  1. Multimodal Continuous Emotions in-the-Wild Sub-challenge (MuSe-Wilder): Predicting the level of emotional dimensions (arousal, valence) in a time-continuous manner from audio-visual recordings.

  2. Multimodal Sentiment Sub-challenge (MuSe-Sent): Predicting advanced intensity classes of emotions based on valence and arousal for segments of audio-visual recordings.

Based on the novel audio-visual-text Ulm-TSST dataset, covering people in stressed dispositions:

  1. Multimodal Emotional Stress Sub-challenge (MuSe-Stress): Predicting the level of emotional arousal and valence in a time-continuous manner from audio-visual recordings.

  2. Multimodal Physiological-Arousal Sub-challenge (MuSe-Physio): Predicting the level of psycho-physiological arousal from a) human annotations fused with b) galvanic skin response (also known as electrodermal activity, EDA) signals from the stressed people, as a regression task. Audio-visual recordings as well as other biological signals (heart rate and respiration) are offered for modelling.

Symbolic photo for MuSe-TSST by Tim Gouw on Unsplash
MuSe-CaR including features of GoCaRD
