Emotions are messy. They don't just live in our words; they hide in our tone, our facial twitches, and the way we pause mid-sentence. That's exactly why unimodal emotion recognition (text alone or audio alone) often falls short. So, for my most recent...