Responding appropriately to others' facial expressions is key to successful social functioning. Despite the large body of work on face perception and spontaneous responses to static faces, little is known about responses to faces in dynamic, naturalistic situations, and no study has investigated how goal directed responses to faces are influenced by learning during dyadic interactions. To experimentally model such situations, we developed a novel method based on online integration of electromyography signals from the participants’ face (corrugator supercilii and zygomaticus major) during facial expression exchange with dynamic faces displaying happy and angry facial expressions. Fifty-eight participants learned by trial-and-error to avoid receiving aversive stimulation by either reciprocate (congruently) or respond opposite (incongruently) to the expression of the target face. Our results validated our method, showing that participants learned to optimize their facial behaviour, and replicated earlier findings of faster and more accurate responses in congruent versus incongruent conditions. Moreover, participants performed better on trials when confronted with smiling, when compared with frowning, faces, suggesting it might be easier to adapt facial responses to positively associated expressions. Finally, we applied drift diffusion and reinforcement learning models to provide a mechanistic explanation for our findings which helped clarifying the underlying decision-making processes of our experimental manipulation. Our results introduce a new method to study learning and decision-making in facial expression exchange, in which there is a need to gradually adapt facial expression selection to both social and non-social reinforcements.