Abstract
Recent advances in Multimodal Emotion Recognition in Conversations (MERC) focus on speaker-aware context modeling and multimodal fusion. However, three challenges remain: (1) the lack of standardized evaluation protocols complicates fair model comparison; (2) static datasets with fixed speakers, topics, and corpora hinder generalization to unseen scenarios; (3) overconfident predictions undermine reliability. To address these, this paper explores three critical questions: (1) Are existing MERC models performing adequately? (2) Can they generalize to diverse scenarios? (3) Can we trust their confidence? Through a rigorous reassessment of existing models, including tests on unseen scenarios, we identify key strengths, weaknesses, and synergies across different models, while highlighting the crucial role of confidence calibration in improving model reliability and fairness.
| Original language | English |
|---|---|
| Article number | 113087 |
| Journal | Pattern Recognition |
| Volume | 175 |
| DOIs | |
| State | Published - Jul 2026 |
| Externally published | Yes |
Scopus Subject Areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Artificial Intelligence
Keywords
- Calibration
- Conversational emotion recognition
- Generalization
Title
Is multimodal conversational emotion recognition satisfactory? Exploring the gaps in performance, generalization, and confidence