Final Report

Abstract

The AAAI symposium on Turn-taking and Coordination in Human-Machine Interaction was held at Stanford University in Stanford, California, on March 23-25, 2015. The symposium brought together researchers from multiple disciplines (including multimodal systems, human-robot interaction, embodied conversational agents, computational linguistics, and spoken dialogue systems) to discuss a topic of common interest: the modeling, realization, and evaluation of turn-taking and real-time action coordination between humans and artificial interactive systems.

Final Report

Turn-taking in human-human conversation is an interactive, mixed-initiative, highly coordinated, and inherently multimodal process by which participants synchronize their verbal and non-verbal exchanges during interactions. Regulating turn-taking with artificial systems hinges critically on multimodal sensing, making decisions under uncertainty and time constraints, and coordinating behaviors across different output modalities. The turn-taking system must extract and integrate multiple audio-visual signals and knowledge sources to make inferences about user utterances, transition relevance places, floor control actions, speech sources and addressees in multiparty settings, and more.

This symposium brought together researchers from different fields that approach the common problem of turn-taking and coordination from somewhat different angles. The primary purpose was to build more common ground for researchers from these different backgrounds, to share perspectives, methodologies, and results from different investigations into the problem of turn-taking and coordination, and to promote communication, collaboration, and discussion on how to make progress in this space.

The presentations covered a wide range of themes, including, but not limited to: models of turn-taking; multimodal sensing, inference, and decision-making for turn-taking; annotations and characterizations of various turn-taking phenomena; the role of gaze and gesture in turn-taking and grounding; experimental methodologies like Wizard-of-Oz; understanding how various contextual and cultural factors, genre, rapport, or social relationships affect turn-taking; other coordinative processes like engagement; and, more broadly, various applications, opportunities, and challenges in this domain. While a significant proportion of the work presented focused on turn-taking in spoken language interaction, a number of other domains, such as interaction and coordination with robots, were also prominently featured.

The symposium also featured two invited talks. The first talk, given by Jill Fain Lehman from Disney Research Pittsburgh, showcased her group’s work over the past several years on developing language-based interaction systems and games for small groups of young children. Lehman highlighted the difficulties of coordinating turns with groups of children, especially as the children begin having more fun and the situation becomes chaotic. In the second invited talk, Jeremy Frank from NASA Ames presented work from NASA’s Autonomous Mission Operations project, which investigates the impact of long communication time delays on the ability to conduct human spaceflight mission operations. The contrast between these two ends of the turn-taking spectrum, from very fast-paced, chaotic yet fun interactions with children to systems that support long communication delays in spaceflight operations, framed the diversity of approaches to the complex coordination problems that were discussed throughout the symposium.

In a breakout session, the symposium participants discussed the challenges shared between different research fields concerning turn-taking with highly interactive systems. They also expressed a desire for the development of common infrastructure, frameworks, and tools, as well as shared corpora in this space. Several open questions were raised. How might computer scientists engage more with social scientists, designers, and artists who might bring a different perspective on and knowledge of turn-taking? What should be the agreed-upon units of interaction in turn-taking: whole turns, utterances, fixed time-slices, or something else? Can the community converge on a shared challenge problem for the field? Finally, the symposium participants expressed interest in attending future symposia or workshops within this space, possibly targeting specific conferences in relevant fields.

Sean Andrist (University of Wisconsin-Madison), Dan Bohus (Microsoft Research), Eric Horvitz (Microsoft Research), Bilge Mutlu (University of Wisconsin-Madison), and David Schlangen (Bielefeld University) served as co-chairs of this symposium. The papers of the symposium were published as AAAI Press Technical Report SS-15-07.