Vocalic Markers of Deception and Cognitive Dissonance for Automated Emotion Detection Systems

Aaron Elkins
Post-Doctoral Research Associate, University of Arizona / Post-Doctoral Research Associate, Imperial College, London
This dissertation investigates vocal behavior, measured using standard acoustic and commercial vocal analysis software, as it occurs naturally when people lie, experience cognitive dissonance, or undergo a security interview conducted by an Embodied Conversational Agent (ECA). In study one, vocal analysis software used for credibility assessment was investigated experimentally. Using a repeated measures design, 96 participants lied and told the truth during a multiple-question interview. The vocal analysis software's built-in deception classifier performed at chance level. When the vocal measurements were analyzed independently of the software's interface, the variables FMain (Stress), AVJ (Cognitive Effort), and SOS (Fear) significantly differentiated between truth and deception. Using these measurements, a logistic regression and machine learning algorithms predicted deception with accuracy up to 62.8%. With standard acoustic measures, deception and stress predicted vocal pitch and voice quality. In study two, deceptive vocal and linguistic behaviors were investigated using a direct manipulation of arousal, affect, and cognitive difficulty by inducing cognitive dissonance. Participants (N = 52) made counter-attitudinal arguments out loud that were subjected to vocal and linguistic analysis. Participants experiencing cognitive dissonance spoke with higher vocal pitch, longer response latency, greater linguistic Quantity and Certainty, and lower Specificity. Linguistic Specificity mediated the relationship between dissonance and attitude change. Commercial vocal analysis software revealed that participants induced to experience cognitive dissonance exhibited higher initial levels of Say or Stop (SOS), a measurement of fear. Study three investigated the use of the voice to predict trust. Participants (N = 88) received a screening interview from an Embodied Conversational Agent (ECA) and reported their perceptions of the ECA.
A growth model was developed that predicted trust during the interaction using the voice, time, and demographics. In study four, border guard participants were randomly assigned to either the Bomb Maker (N = 16) or Control (N = 13) condition. Participants either did or did not assemble a realistic, but non-operational, improvised explosive device (IED) to smuggle past an ECA security interviewer. Participants in the Bomb Maker condition had 25.34% more variation in their vocal pitch than participants in the Control condition. This research supports the voice as a potentially reliable and valid measure of emotion and deception, suitable for integration into future technologies such as automated security screenings and advanced human-computer interactions.