Speech Interaction Theory

Any advanced interactive speech system has many of the elements described in the overview, but no current system has them all. Together, the elements determine the observable behaviour, or performance, of the system during interaction. The system’s performance has a number of complex properties that emerge from the nature of these elements and that should be considered during development. We discuss these interdependent properties in terms of three performance elements: co-operativity, initiative, and the system’s influence on user behaviour. The text is adapted for the web from [Bernsen et al. 1998b].

Co-operativity

Habitable user-system interaction requires that both user and system behaviour be co-operative. It is a well-established fact that today’s interactive speech systems are based on the assumption of co-operative user dialogue behaviour [Eckert and McGlashan 1993, Smith and Hipp 1994]. This assumption does not, however, pose much of a problem for dialogue developers, because the penalty for non-co-operativity falls on the user: non-co-operative users simply fail to get their task done. There is no point in designing the dialogue for non-co-operative users who do not care whether they succeed with their task; indeed, such a design goal is impossible to achieve in the foreseeable future. If the system fails to be co-operative, however, the penalties can be severe, ranging from users having to repeatedly initiate clarification and repair meta-communication with the system, through failing to get the task done, to abandoning interactive speech technology altogether. We believe that system co-operativity is crucial to successful interaction model development: it contributes to smooth interaction and reduces the need for meta-communication.

Initiative

The interlocutor who determines the current topic of the interaction is said to have the initiative, or to control the course of the interaction. Initiative appears to be a function of the speech acts performed by the interlocutors. Depending on the speech act performed, a speaker who already has the initiative may offer it to the interlocutor, as in question (a): "How may I help you?"; or signal a wish to keep the initiative, as in question (b): "Where does the journey start?". The interlocutor may leave the initiative with the speaker, for instance by responding to question (b): "The journey starts in Copenhagen"; take the offered initiative, as in responding to question (a): "I would like to book a ticket from Copenhagen to Aalborg"; or take the initiative uninvited, as in responding to question (b): "I want to travel on Monday". The relationship between speech act and initiative is potentially useful to system developers. Whittaker and Stenton [1988] propose generalisations such as the following: the speaker has control when making a request, unless it is immediately followed by a request or directive; when making an assertion, unless it is a response to a request; and when issuing a directive (command); the listener gets control after a prompt, because the speaker thereby abdicates control. If valid, such rules may enable the system to derive who has the initiative once it has identified the speech act. Identifying the speech act is difficult, however. For instance, both (a) and (b) above look like requests (for information), but (a) acts as a prompt that gives the initiative away whereas (b) acts as a request that preserves it.
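To make the rule set concrete, the following minimal sketch encodes the generalisations as a lookup over speech-act types. It is written in Python; the type names, the `is_response` flag, and the assumption that a classifier has already identified the speech act are illustrative choices of ours, not part of the cited work.

```python
from enum import Enum, auto

class SpeechAct(Enum):
    ASSERTION = auto()   # statement of fact
    REQUEST = auto()     # question or request for information
    DIRECTIVE = auto()   # command
    PROMPT = auto()      # e.g. "How may I help you?"

def controller(act: SpeechAct, is_response: bool) -> str:
    """Attribute dialogue control following the Whittaker and Stenton
    [1988] generalisations quoted above.

    `is_response` stands in for the exception clauses (an assertion
    answering a request, or a request countered by another request or
    directive); deciding it, like identifying the speech act itself,
    is assumed to happen elsewhere."""
    if act is SpeechAct.PROMPT:
        return "listener"    # the speaker abdicates control
    if act in (SpeechAct.ASSERTION, SpeechAct.REQUEST) and is_response:
        return "listener"
    return "speaker"         # directives, and all unmarked cases

# Question (b), "Where does the journey start?", keeps the initiative:
assert controller(SpeechAct.REQUEST, is_response=False) == "speaker"
```

The sketch deliberately leaves the hard problem untouched: as the (a)/(b) example shows, it is the classification step, not the lookup, that current systems struggle with.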

It is useful to distinguish between the following modes of interaction from the point of view of who has the initiative, that is, who controls the course of the dialogue. An interactive speech system is called system directed if the system has the initiative throughout the interaction; user directed if the user has the initiative throughout; and mixed initiative if both (or all) interlocutors may take the initiative at some or all points during interaction. All of these modes may be found in today’s interactive speech systems except, perhaps, the "free" variety of mixed initiative interaction, in which any interlocutor may take the initiative at any time. Several advanced interactive speech systems, such as the Danish Dialogue System and the Philips train timetable inquiry system, use limited mixed initiative interaction, in which one of the interlocutors may take the initiative at some points during interaction. Free mixed initiative systems do not yet appear feasible for any but the simplest of tasks. The Sundial project experimented with free mixed initiative dialogue openings of the "Can I help you?" type. These openings turned out to strongly invite human-human-style, lengthy and complex accounts from users, which the system had no chance of understanding. Given the definitions above, most future advanced interactive speech systems may be expected to have limited mixed initiative. A further distinction among such systems is proposed by [Smith and Hipp 1994].
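One way to picture the distinction is as a policy switch in the dialogue manager. The sketch below is a hypothetical fragment assuming a slot-filling task like the travel examples above; the mode names, slot names, and the `understood` representation are invented for illustration and do not describe any of the systems mentioned. It shows one operational difference between system direction and limited mixed initiative: whether the system keeps user-volunteered information it did not ask for.

```python
from enum import Enum, auto

class InitiativeMode(Enum):
    SYSTEM_DIRECTED = auto()   # system holds the initiative throughout
    USER_DIRECTED = auto()     # user holds the initiative throughout
    LIMITED_MIXED = auto()     # either side may take it at designated points

def accept_slots(mode: InitiativeMode, prompted: str, understood: dict) -> dict:
    """Decide which understood slot values to keep this turn.

    Under system direction only the slot the system just asked for is
    accepted; under limited mixed initiative the user may over-answer
    and volunteer further task slots."""
    if mode is InitiativeMode.SYSTEM_DIRECTED:
        return {prompted: understood[prompted]} if prompted in understood else {}
    return dict(understood)

# System asks "Where does the journey start?" (prompted slot: origin);
# the user replies "From Copenhagen on Monday".
filled = accept_slots(InitiativeMode.LIMITED_MIXED, "origin",
                      {"origin": "Copenhagen", "date": "Monday"})
assert filled == {"origin": "Copenhagen", "date": "Monday"}
```

In system-directed mode the over-answered date would be discarded and asked for again later; in limited mixed initiative it is kept, while the system still chooses which question to ask next.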

Influencing user behaviour

By contrast with the system and its behaviour, users are system-external factors that cannot be controlled directly. The fact is, however, that interactive speech systems are vastly inferior to ordinary humans as communication partners. If users do not realise this, they may have unnecessary difficulty completing their interactive task with the system. Somehow, therefore, a reasonably adequate model of how to interact with the system must be communicated to users. Part of this user interaction model can be conveyed directly and explicitly. However, it would be counter-productive to try to explicitly communicate all the system’s peculiarities and relative deficiencies as an interactor. Rather, at least the following two sources, discussed in turn below, may help users build a reasonable user interaction model: implicit system "instructions" and explicit designer instructions.

Implicit system "instructions" are the more interesting of the two. What we call "implicit instructions" build on the fact that speakers adapt their behaviour to the observed properties of the listener. Some of these "instructions" are provided through the system’s vocabulary, grammar and style. Moreover, it appears that people tend to use less sophisticated spoken language when they believe they are communicating with a computer system rather than a human being [Amalberti et al. 1993]. This is useful, and any strategy that induces users to treat the system as an idiot savant should be considered by developers. Finally, of course, the system’s repair and clarification meta-communication will affect the user interaction model by making some of the system’s recognition and understanding difficulties clear to users. However, developers should not interpret the latter point as a licence to ignore the central goal of optimising system co-operativity. Strong system meta-communication facilities are not an acceptable alternative to smooth interaction that requires little or no meta-communication. Furthermore, strong meta-communication facilities do not yet exist in interactive speech systems.
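As one concrete illustration of implicit instruction through system phrasing, consider escalating re-prompts: each retry models, by example, the terser style and narrower vocabulary the recogniser copes with best. The sketch below is hypothetical; the slot name and wordings are invented, not drawn from any of the systems cited.

```python
def reprompt(slot: str, attempt: int) -> str:
    """Return a prompt for `slot`, tightened on each failed attempt.

    The wording itself is the implicit instruction: by growing terser
    and by exemplifying an acceptable answer, it nudges the user toward
    vocabulary, grammar and style the system can actually recognise."""
    prompts = {
        "date": [
            "On which date would you like to travel?",
            "Please state the travel date, for example 'Monday'.",
            "Say only the date.",
        ],
    }
    variants = prompts[slot]
    return variants[min(attempt, len(variants) - 1)]

# After one misrecognition the system re-prompts with an example answer;
# further failures constrain the expected reply even more.
print(reprompt("date", attempt=1))
```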

Explicit designer instructions comprise all sorts of (system-)external information provided to users prior to use of the system. Providing such information may make sense in, for example, controlled user tests. Similarly, speaker-dependent interactive speech systems may come with ample written instructions for their users. One of the crucial advantages of advanced interactive speech systems, however, is that speaker-independent spontaneous speech is a highly natural modality, extremely well suited to walk-up-and-use applications. For such systems it is often not possible to provide written instructional material.