[< | >] Contents

In the section headings the links '<', ' | ', and '>' refer to previous heading, contents, and next heading, respectively.

[< | >] The need for guidelines

The design of current interactive speech systems is subject to many constraints on the interaction between user and system. These constraints derive partly from technology, partly from limitations in engineering skills, and partly from insufficient theoretical foundations. Interaction design is complex and not fully understood. Yet it is possible to design fully usable or habitable advanced interactive speech systems for certain classes of task.

A key to successful interaction design, we claim, is to ensure adequate cooperativity on the part of the system during interaction. Habitable user-system interaction requires that not only the user's but also the system's interaction behaviour be cooperative. If this is not the case, penalties can be severe, ranging from users having to repeatedly initiate clarification and repair meta-communication with the system, through failing to get their task done, to abandoning interactive speech system technologies altogether.

A crucial interaction design goal, therefore, is to optimise system cooperativity in order to facilitate smooth interaction in domain communication, meta-communication and other types of communication. Miscommunication always leads to additional user-system exchanges and causes bumpy interaction. Cooperative communication facilitates smooth interaction and prevents unnecessary user-initiated clarification and repair meta-communication, as well as other kinds of unexpected user behaviour with which the system cannot cope. This is important because, with current technologies, the possibilities for on-line handling of clarification and repair meta-communication are seriously limited.

It is sometimes assumed that, as long as the system has powerful meta-communication abilities, it matters less how it behaves during domain communication. This is false, if only because bumpy interaction is always inefficient and induces user dissatisfaction. What is worse, however, is that really powerful meta-communication abilities are still not available. User needs for clarification meta-communication that arise from the way the system addresses the domain can easily surpass its meta-communication skills. For instance, if the system uses a patently ambiguous term, it is unlikely to be able to respond sensibly to a user who asks what the system means by that term. And if the user unknowingly selects a non-intended meaning of an ambiguous term, the interaction may be well on its way to failure without the system being able to do much about it.

High-quality, on-line repair and clarification meta-communication skills constitute only one aspect of what it means to have a cooperative system. Such skills are of course needed and important. In particular, the speech recognition capabilities of interactive speech systems are still fragile, and meta-communication functionality is needed to overcome the effects of system misrecognitions. Users will also sometimes need to have the system's latest utterance repeated, for instance because they did not pay enough attention to what the system just said. Beyond these two unavoidable types of user-initiated repair meta-communication, however, the system should not cause the need for other kinds of clarification and repair meta-communication. It is particularly important to avoid all or most forms of user-initiated clarification meta-communication. Users themselves are likely to cause the need for additional meta-communication functionality, but that is a different matter, which may sometimes pose hard problems for interaction model developers.

The concept of levels of interaction is another aspect of system cooperativity. It is useful for preventing transaction failure when user input is particularly difficult to understand. In such cases the system may ask the user to provide the problematic piece of information in progressively simpler ways. The bottom level may be a question to which the user should only answer yes or no. This process is called graceful degradation. However, system cooperativity requires more than meta-communication and graceful degradation.
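The levels-of-interaction idea described above can be illustrated with a small sketch. It is not taken from the Danish Dialogue System: the prompts, the two parser functions and the name `ask_with_degradation` are all hypothetical, and the parsers are simple stand-ins for speech recognition and parsing. The sketch only shows the control flow: the system re-asks for the same piece of information in progressively more constrained ways, ending with a yes/no question.

```python
from typing import Callable, List, Optional, Tuple

WEEKDAYS = {"monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"}


def parse_weekday(reply: str) -> Optional[str]:
    """Stand-in for recognition and parsing: accepts only a bare weekday name."""
    word = reply.strip().lower()
    return word if word in WEEKDAYS else None


def parse_yes_no(reply: str) -> Optional[bool]:
    """Bottom-level parser: accepts only 'yes' or 'no'."""
    word = reply.strip().lower()
    if word in ("yes", "no"):
        return word == "yes"
    return None


# Hypothetical prompts, ordered from most open to most constrained.
LEVELS: List[Tuple[str, Callable[[str], Optional[object]]]] = [
    ("On which day do you want to travel?", parse_weekday),
    ("Please say just the day of the week.", parse_weekday),
    ("Do you want to travel on Monday? Please answer yes or no.", parse_yes_no),
]


def ask_with_degradation(levels, get_reply):
    """Ask at each level in turn; return (level, value) for the first reply
    that is understood, or None if every level fails."""
    for depth, (prompt, parse) in enumerate(levels):
        value = parse(get_reply(prompt))
        if value is not None:
            return depth, value
    return None


# Scripted user replies in place of real spoken input.
replies = iter(["well, er, sometime early next week I guess", "Tuesday"])
print(ask_with_degradation(LEVELS, lambda prompt: next(replies)))
```

In this scripted run the first reply is not understood, so the system moves down one level and the second reply succeeds. A real system would also need to decide what to do when even the yes/no level fails, which the sketch leaves as a `None` result.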

Speaking generally, the system should always behave in a way which optimises the likelihood that cooperative users get their task done. At any stage during interaction, the cooperative user should know what to do and how to do it, without having been misled or left without guidance by a non-cooperative system. Cooperative interaction design addresses all forms of system communication, and it might be asked whether there is anything else to good interaction design apart from the design of a cooperative system. Perhaps there is, and politeness design might be a case in point, but we shall not address this issue here. The practical problem therefore becomes: how to design cooperative system behaviour? To our knowledge, this question has not been addressed in any systematic way. Answering the question appears to generate the set of guidelines for cooperative spoken interaction design presented below.

[< | >] Interaction aspects, generic and specific guidelines

The guidelines cover seven different aspects of interaction. The distinction between guideline and aspect is important because an aspect serves to highlight the property of interaction addressed by a particular guideline, thus identifying dimensions of cooperativity over and above the level of the cooperative guidelines themselves.

We distinguish between generic and specific guidelines. A generic guideline is general and typically states: "Do (make, be, avoid, provide etc.) X". A generic guideline may subsume one or more specific guidelines related to the generic guideline in a kind-of relationship. Specific guidelines specialise the generic guideline to certain classes of phenomena. Although subsumed by generic guidelines, the specific guidelines are important in interaction design because they serve to elaborate what the interaction model developer should be looking for when designing cooperative system behaviour.
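One way to picture the kind-of relationship between generic and specific guidelines is as a small tree. The following is a minimal sketch under stated assumptions, not the tutorial's own representation: the `Guideline` class, the `checklist` method and the aspect labels are hypothetical, and the parent of SG4 and SG5 is a placeholder because it is not stated in this text. Only the short guideline wordings for GG1, GG2, SG4 and SG5 are taken from the tutorial.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Guideline:
    ident: str   # e.g. "GG1" or "SG4"
    text: str    # short statement of the guideline
    aspect: str  # interaction aspect addressed (labels below are placeholders)
    specifics: List["Guideline"] = field(default_factory=list)

    def checklist(self) -> Iterator["Guideline"]:
        """Yield this guideline, then every specific guideline it subsumes."""
        yield self
        for specific in self.specifics:
            yield from specific.checklist()


# Generic guidelines with no specific guidelines attached in this sketch.
gg1 = Guideline("GG1", "Say enough", aspect="aspect-A")
gg2 = Guideline("GG2", "Don't say too much", aspect="aspect-A")

# Placeholder parent: which generic guideline subsumes SG4 and SG5 is an
# assumption made for illustration only.
parent = Guideline(
    "GG-parent", "(placeholder generic guideline)", aspect="aspect-B",
    specifics=[
        Guideline("SG4", "Tell what the system can and cannot do",
                  aspect="aspect-B"),
        Guideline("SG5", "Instruct on how to interact with the system",
                  aspect="aspect-B"),
    ],
)

for guideline in parent.checklist():
    print(guideline.ident, "-", guideline.text)
```

Walking the tree in this way gives a developer the generic guideline first and then the specific guidelines that elaborate what to look for, which mirrors how the specific guidelines are meant to be used during design.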

[< | >] Guidelines may overlap and conflict

It should be noted that guidelines may support one another as well as conflict when applied during actual interaction design. When guidelines conflict, the designers have to trade off different design options against one another, with each option having a different weighting of the guidelines. When designing a system introduction, for instance, developers may find that GG2 (don't say too much) conflicts with GG1 (say enough), SG4 (tell what the system can and cannot do) and SG5 (instruct on how to interact with the system). If the introduction is long and complex, and even if all the points made are valid and important, users tend to get bored and inattentive. On the other hand, if the introduction is brief or even non-existent, important information may have been left out, increasing the likelihood of miscommunication during task performance.

[< | >] Background and development of the guidelines

During the design, implementation and test of the interaction model for the Danish Dialogue System (1991-1996) [Bernsen et al. 1998b] we developed a set of guidelines for the design of cooperative spoken human-machine interaction [Bernsen et al. 1996a]. A first set of guidelines was developed on the basis of an analysis of 120 examples of user-system interaction problems identified in a corpus of dialogues from the Wizard of Oz (WOZ) simulations of the Danish Dialogue System. The guidelines were refined and consolidated through comparison with a well-established body of maxims of cooperative human-human dialogue [Grice 1975], which turned out to form a subset of our guidelines. The consolidated guidelines were then tested as a tool for the diagnostic evaluation of a corpus of 57 dialogues collected during a scenario-based, controlled user test of the implemented Danish Dialogue System. We found that nearly all dialogue design errors in the user test corpus could be classified as violations of our guidelines. Two specific guidelines on meta-communication, SG10 and SG11, had to be added, however. This was no surprise, as meta-communication had not been simulated and was therefore mostly absent from the WOZ corpus.

Testing the generality and transferability of the guidelines started in late 1996 [Bernsen et al. 1997b, Dybkjær et al. 1997c]. Generality is being tested by applying the guidelines:

  1. to systems that are different from the Danish Dialogue System;
  2. to systems that cover different task domains;
  3. as a design guide prior to implementation rather than as a tool for the diagnostic evaluation of an implemented system; and
  4. in less controlled circumstances than those that obtain in a controlled user test.

Transferability is being tested by investigating:

  1. what it takes for a novice interactive speech system developer to learn to master the guidelines; and
  2. how the required learning steps may be supported and "packaged" for transfer to other developers so that they can easily learn how to use the guidelines.

As regards generality, we have so far applied the guidelines as a dialogue design guide to part of a corpus from the Sundial project, and we have used them to evaluate a set of field test dialogues from the Philips corpus. This worked well, and no need for new guidelines was observed in either case. With respect to transferability, we trained a visiting researcher in using the guidelines. The results obtained seemed encouraging, taking into account the nature and amount of introductory material provided to the visitor.

This web-based tutorial is intended to be used by anybody who would like to learn how to use the guidelines. Any comment which may help us improve the tutorial will be very welcome.

In summary, the guidelines were developed in the following steps:

  1. An initial set of guidelines was produced on the basis of a corpus of simulated human-computer dialogues from Wizard of Oz experiments in the Danish Dialogue Project.
  2. The guidelines were compared to Grice's maxims for cooperative human-human conversation [Grice 1975], leading to refinement of the guidelines.
  3. The resulting set of guidelines was tested as a tool for diagnostic evaluation of spoken interaction in the controlled user test of the implemented Danish Dialogue System.
  4. The guidelines were tested on other corpora and by different evaluators to investigate their generality and transferability.
  5. A web-based guide on how to use the guidelines was developed.