How to use the guidelines as a design guide

This document provides a cookbook description of how to use the guidelines as a design guide during the development of a spoken language dialogue system. The same procedure may also be used for the purpose of diagnostic evaluation. The method is illustrated through a walkthrough of two dialogues from an early Sundial Wizard of Oz (WOZ) corpus which concerns flight information. Interaction problems are identified and analysed on the basis of the guidelines for cooperative human-computer dialogue. In addition to the walkthrough, an overview in terms of a typology of the problems observed in the corpus is provided.

The dialogues are mixed-initiative and we do not have the scenarios. Problem identification must therefore be done by applying the guidelines to each system utterance, both in isolation and in context, thus revealing potential as well as actual problems. The analysis of identified problems is design-oriented.

[< | >] Analysis of two Sundial WOZ dialogues

In the following we propose an overall approach to using the guidelines as an early design guide, based on our experience from analysing the Sundial corpus. The approach is illustrated through a walkthrough of two dialogues from that corpus.

[< | >] Cookbook description of the method

  1. Select a few dialogues, e.g. 3-5, for careful analysis. For each utterance, check each aspect and guideline in turn to see if it is violated. Make extensive notes on all possible problems identified in these dialogues.

    To get an idea of how to do this, look at the guidelines and at dialogue 8:9. Study the first system utterance, S8:9-1, for a moment.

    Then look at the analysis of the first system turn, S8:9-1. Did you find that many violations? Now consider the second system utterance, S8:9-2, and consider how many violations you can find. Our analysis is in S8:9-2.

  2. Categorise the identified problems into a typology of task dependent violations. Group them according to the violated generic and specific guidelines.

    See the Sundial typology to get an idea of the level of abstraction which seems appropriate.

  3. Now select a larger number of dialogues for analysis. Check each identified problem against the typology of violations. If a corresponding type is found, the violation is categorised as a case of this type, otherwise a new type is introduced and added to the typology.

    Consider the analysed dialogue 10:6. The annotation of guideline violations is much briefer than the one used in dialogue 8:9. Constructing the initial typology requires detailed and careful analysis, whereas a faster approach can be used once one has become better acquainted with the corpus. As one would expect, however, more complex cases will still need deeper analysis in order to resolve the design decisions, which may involve heavy trade-offs between e.g. naturalness, technical constraints, and economy.
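Steps 2 and 3 of the cookbook can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `classify` and the dictionary layout are ours, types are numbered in order of discovery, and a "corresponding type" is found by exact string match, whereas in practice the analyst decides whether two problems are of the same type.

```python
from collections import defaultdict


def classify(typology, guideline, description):
    """Return the type id of a violation, adding a new type if none matches.

    `typology` maps a guideline id (e.g. "SG3") to a list of type
    descriptions; the n-th description receives the id "<guideline>-SD<n>".
    Exact string matching stands in for the analyst's judgement here.
    """
    types = typology[guideline]
    if description not in types:
        # Step 3: no corresponding type found, so a new one is introduced.
        types.append(description)
    return f"{guideline}-SD{types.index(description) + 1}"


typology = defaultdict(list)

# Step 2: the first careful analyses seed the typology.
first = classify(typology, "SG3", "many variations in S's phrases")

# Step 3: a later occurrence of the same problem maps to the existing type.
again = classify(typology, "SG3", "many variations in S's phrases")

# A hypothetical new problem (not taken from the corpus) gets a new type.
new = classify(typology, "SG3", "hypothetical: prompt wording changes mid-dialogue")

print(first, again, new)  # SG3-SD1 SG3-SD1 SG3-SD2
```

Keeping the typology as plain data makes it easy to print as an overview once the larger set of dialogues has been processed.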

[< | >] Dialogue 8:9

Scenario 2b
S8:9-1 flight information can I help you (2.5)
U8:9-1 yeah good afternoon er:m (1.5) tomorrow (.5) eve- ahr yes tomorrow night (.5) (sa-em should be a flight arriving from warsaw (.5) er:m: it arrives (.5) I think a heathrow terminal two at around about half past nine (.7) could you confirm that please (5.5)
S8:9-2 please wait (37) I'm sorry there are no british airways flights from warsaw please try polish airlines (5)
U8:9-2 hm: (3)
S8:9-3 no (5.5)
U8:9-3 er:m (1) okay is there a flight that arrives (.5) at heathrow terminal two at nine thirty pee em tomorrow (4.5)
S8:9-4 please wait (31) flight pee oh two eight one from warsaw to london heathrow terminal two arrives at nine forty pee em (2.3)
U8:9-4 great (.5) that's it then (.5) thanks (3.5)
S8:9-5 thank you good bye

[< | >] Dialogue 8:9, analysed

Scenario 2b
S8:9-1 flight information can I help you (2.5)
Violation guideline GG3: The system apparently only handles BA flight information although sometimes it also possesses other information. Anyway, the introduction announces a much broader domain coverage than the system actually has. This may easily mislead users to ask out-of-domain questions which cannot be handled. (type GG3-SD1)
Violation guideline SG3: Probably deliberate (experimental) variation in system opening. The system's introduction varies from dialogue to dialogue. This probably reflects a deliberate decision to test the effect on users of various system introductions. Still, in a working application this may cause confusion in users resulting in user questions which the system cannot understand. The core problem is that not all system introductions provide the same information or are equally informative. The designers must optimise the introduction rather than varying it. (type SG3-SD1)
Violation guideline SG4: The system provides too little information about its capabilities and limitations. It is of course an ideal that little information is necessary. However, the risk is that the user will be misled and assume stronger or weaker system capabilities than are actually present. Designers should look out for symptoms to this effect. The present introduction suggests that users can ask about anything to do with flight information. No current system is able to do that. (type SG4-SD1)
Violation guideline SG5: The system provides no information on how to interact with it. It is of course an ideal that this should not be necessary. However, the risk is that the user will be misled and assume that one may interact with the system just like with a human operator, which is not possible today. Designers should look out for symptoms to this effect. (type SG5-SD1)
U8:9-1 yeah good afternoon er:m (1.5) tomorrow (.5) eve- ahr yes tomorrow night (.5) (sa-em should be a flight arriving from warsaw (.5) er:m: it arrives (.5) I think a heathrow terminal two at around about half past nine (.7) could you confirm that please (5.5)
User symptom guideline SG5: The user's question is rather verbose and redundant and the system has done nothing to prevent this by informing on how to interact with it. (type SG5-SD1)
S8:9-2 please wait (37) I'm sorry there are no british airways flights from warsaw please try polish airlines (5)
Violation guideline GG1: The system does not provide the requested information although it has it, cf. S8:9-4. (type GG1-SD7)
Violation guideline GG5: The system has announced that it can handle flight information in general. Therefore it seems irrelevant to tell the user that there are no British Airways flights from Warsaw. This was not what the user wanted to know, and apparently the system actually has the desired information, cf. utterance S8:9-4. (type GG5-SD4)
Violation guideline SG3: Different formulation from S8:1-6 and from S8:3-3. (type SG3-SD1)
Violation guideline SG8: Since the system has announced that it can handle flight information in general, the user can rightly expect the system to answer his question. However, this does not happen in this system utterance. (type SG8-SD1)
U8:9-2 hm: (3)
Note: Interestingly, the user questions the system's status as a perfect domain expert. This should be virtually inconceivable in an implemented system.
S8:9-3 no (5.5)
Note: The system is able to discuss its knowledge at meta-level. This is beside the point in current information systems which should be perfect domain experts.
Violation guideline GG4: The system should never say something for which it does not have sufficient evidence. (type GG4-SD1)
U8:9-3 er:m (1) okay is there a flight that arrives (.5) at heathrow terminal two at nine thirty pee em tomorrow (4.5)
Note: This is basically the same question as in U8:9-1 just formulated in a different way. So the user still believes that the system might have the requested information and tries to find out how to get it.
S8:9-4 please wait (31) flight pee oh two eight one from warsaw to london heathrow terminal two arrives at nine forty pee em (2.3)
Note: The system delivers a standard flight no. response package. Note that it turns out to know about flights other than British Airways. The user has no idea about the extent of the system's knowledge.
Violation guideline SG2: No feedback on day. The user asked about a flight tomorrow. This is not reflected in the system's feedback. Is there a flight every day from Warsaw to London arriving at 9:40 pm? (type SG2-SD1)
Violation guideline GG7: The system does not explicitly distinguish between scheduled and actual arrival time. It is not clear which day the system is talking about, and thus it is also not clear whether the time provided is the expected arrival time for today's flight or the arrival time indicated in the timetable. (type GG7-SD5)
U8:9-4 great (.5) that's it then (.5) thanks (3.5)
S8:9-5 thank you good bye
Violation guideline SG3: Another variation of the final system phrase in each dialogue. (type SG3-SD1)

[< | >] Dialogue 10:6

Scenario 19
S10:6-1 flight information british airways good day can I help you (1.5)
U10:6-1 yes I was wondering (.3) whether flight number bee ay two three eight from orlando has arrived (.3) it was supposed to arrive at twenty five to eight (8.5)
S10:6-2 please wait (37) flight bee ay two three eight from orlando has been delayed (1.7) it will be arriving at eleven fifteen (2)
U10:6-2 is that eleven sixteen in the evening (5)
S10:6-3 please wait (13) flight bee ay two three eight from orlando has been delayed it will be arriving at eleven fifteen (1.5)
U10:6-3 is that eleven sixteen (.) pee em: or ay: em: (5)
S10:6-4 please wait (17) eleven fifteen ay em
U10:6-4 thank you (4)
S10:6-5 thank you good bye

[< | >] Dialogue 10:6, analysed

Faster analysis after creating a task-specialised typology of problems. For phenomena already in the typology, only the reference is inserted.

Scenario 19
S10:6-1 flight information british airways good day can I help you (1.5)
Violation: guideline SG3 type SG3-SD1
Violation: guideline SG4 type SG4-SD1
Violation: guideline SG5 type SG5-SD1
U10:6-1 yes I was wondering (.3) whether flight number bee ay two three eight from orlando has arrived (.3) it was supposed to arrive at twenty five to eight (8.5)
S10:6-2 please wait (37) flight bee ay two three eight from orlando has been delayed (1.7) it will be arriving at eleven fifteen (2)
Violation: guideline GG7 type GG7-SD6
Violation: guideline SG3 type SG3-SD1
U10:6-2 is that eleven sixteen in the evening (5)
Note: The user now needs clarification because of the ambiguity of S10:6-2.
S10:6-3 please wait (13) flight bee ay two three eight from orlando has been delayed it will be arriving at eleven fifteen (1.5)
Violation: guideline GG5 type GG5-SD1
Violation: guideline GG7 type GG7-SD6
Violation: guideline GG2 type GG2-SD3: System was asked only to clarify the ambiguous time specification. Flight number and "Orlando" are only repetitions of the feedback in S10:6-2-b.
Violation: guideline SG10 type GG13-SD1: The user has misunderstood the time of arrival mentioned by the system. The system should initiate repair of this misunderstanding.
U10:6-3 is that eleven sixteen (.) pee em: or ay: em: (5)
Note: The user still needs clarification because of the failure to provide clarification in S10:6-3.
S10:6-4 please wait (17) eleven fifteen ay em
Violation: guideline SG10 type GG13-SD1
U10:6-4 thank you (4)
S10:6-5 thank you good bye
Violation: guideline SG3 type SG3-SD1: Another variation of the final system phrase in each dialogue.
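
Because the compact annotations follow a fixed pattern, they can be tallied mechanically. The following sketch is an assumption about how one might do this, not part of the original method: the regular expression, the `tally` function, and the list name are ours. The list reproduces the violation annotations of dialogue 10:6 in shortened form.

```python
import re
from collections import Counter

# Matches e.g. "Violation: guideline GG7 type GG7-SD6" (pattern is our assumption).
ANNOTATION = re.compile(r"guideline\s+(\w+)\s+type\s+(\w+-SD\d+)")


def tally(lines):
    """Count how often each violation type occurs in the annotation lines."""
    counts = Counter()
    for line in lines:
        match = ANNOTATION.search(line)
        if match:  # notes and transcript lines are skipped
            counts[match.group(2)] += 1
    return counts


# Violation annotations of dialogue 10:6, explanatory text shortened.
DIALOGUE_10_6 = [
    "Violation: guideline SG3 type SG3-SD1",
    "Violation: guideline SG4 type SG4-SD1",
    "Violation: guideline SG5 type SG5-SD1",
    "Violation: guideline GG7 type GG7-SD6",
    "Violation: guideline SG3 type SG3-SD1",
    "Note: The user now needs clarification.",
    "Violation: guideline GG5 type GG5-SD1",
    "Violation: guideline GG7 type GG7-SD6",
    "Violation: guideline GG2 type GG2-SD3: System was asked only to clarify.",
    "Violation: guideline SG10 type GG13-SD1: The user has misunderstood.",
    "Violation: guideline SG10 type GG13-SD1",
    "Violation: guideline SG3 type SG3-SD1",
]

counts = tally(DIALOGUE_10_6)
print(sum(counts.values()), "individual violations,", len(counts), "types")
# 11 individual violations, 7 types
```

Even in this short dialogue several individual violations collapse into the same type, which is why the type references alone suffice once the typology exists.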

[< | >] Typology of system problems found in the Sundial corpus

Early WOZ dialogues seem to produce more violations, and often more complex ones in which a single system utterance violates several different guidelines, than dialogues from later system development phases. In a corpus containing as many guideline violations as the Sundial WOZ corpus, finding all the individual violations is very time-consuming, if not practically impossible. It is also unnecessary, because what is needed for repairing the dialogue design is the types of guideline violation that occur. The number of individual violations may support estimates of system performance and acceptability but is of little importance otherwise, as many violations are identical. We therefore established a typology of guideline violations during the analysis of the corpus. This highly task-dependent typology provides an overview of the different ways in which each individual guideline was violated in the corpus and is useful for revising the dialogue model.

Guideline Violation
Numbered relative to the guidelines. "SD" refers to "SunDial". Violations may occur under more than one guideline, in which case they are cross-referenced.
GG1
Violation: System provides less information than required.
SD1. actual arrival/departure not stated (GG7-SD1)
SD2. scheduled arrival/departure not stated (GG7-SD2)
SD3. failed S clarification (GG5-SD1)
SD4. S should offer phone no.
SD5. S should specify the information it needs
SD6. S provides insufficient information for the user to determine if it is the wanted answer
SD7. S has information but does not provide it
SG1
Violation: System not fully explicit in communicating to users the commitments they have made.
SG2
Violation: Missing system feedback on user information.
SD1. no feedback on arrival/departure day, on BA and/or on route
SD2. missing/ambiguous feedback on time (GG7-SD3)
SD3. U: arriving flights?, S: leaving flights: imprecise feedback
GG2
Violation: System provides more information than required.
SD1. U: has phone no. S: offers phone no. (GG5-SD2)
SD2. S repeats more than the 4 phone no. digits asked for
SD3. flight no. and Orlando are superfluous
GG3
Violation: System provides false information.
SD1. "flight info." known to be false: S knows only BA
GG4
Violation: System provides information for which it lacks evidence.
SD1. system says it is not sure of the information it provided
GG5
Violation: System provides irrelevant information.
SD1. failed S clarification (GG1-SD3)
SD2. U: has phone no. S: offers phone no. (GG2-SD1)
SD3. departure time instead of arrival time provided
SD4. S: handles all flights - "BA does not handle Airline X."
SD5. S: encourages inquiry on airline unknown to it
GG6
Violation: Obscure system utterance.
SD1. S: "no flights are leaving Crete today" (GG7-SD4)
SD2. S: "flights between London and Aberdeen are not part of the BA shuttle service, there is a service from London Heathrow terminal one" (rc from GG5 to GG6)
GG7
Violation: Ambiguous system utterance.
SD1. actual arrival/departure not stated (GG1-SD1)
SD2. scheduled arrival/departure not stated (GG1-SD2)
SD3. missing/ambiguous feedback on time (SG2-SD2)
SD4. S: "no flights are leaving Crete today" (GG6-SD1)
SD5. scheduled vs. actual arrival/dep. time not distinguished
SD6. AM and PM not distinguished
SG3
Violation: System does not provide the same formulation of the same question to users everywhere in its dialogue turns.
SD1. many variations in S's phrases
GG8
Violation: Too lengthy expressions provided by system.
GG9
Violation: System provides disorderly discourse.
GG10
Violation: System does not inform users of important non-normal characteristics which they should, and are able to, take into account to behave co-operatively in dialogue.
SG4
Violation: Missing or unclear information on what the system can and cannot do.
SD1. too little said on what system can and cannot do: BA often missing, time-table enquiries always missing (SG8-SD1)
SG5
Violation: Missing or unclear instructions on how to interact with the system.
SD1. open S intro requires interaction instructions on waiting, verbosity etc.
GG11
Violation: System does not take users' relevant background knowledge into account.
SG6
Violation: Lacking anticipation of domain misunderstanding by analogy.
SG7
Violation: System does not, where possible, distinguish between the needs of novice and expert users.
GG12
Violation: System does not consider legitimate user expectations as to its own background knowledge.
SG8
Violation: Missing system domain knowledge and inference.
SD1. Too little said on what the system can and cannot do; this creates user expectations which the system cannot meet (SG4-SD1)
GG13
Violation: System does not initiate repair or clarification meta-communication in case of communication failure.
SD1. S should initiate repair of arrival time misunderstood by the user.
SG9
Violation: System does not initiate repair if it has failed to understand the user.
SG10
Violation: Missing clarification of inconsistent user input.
SG11
Violation: Missing clarification of ambiguous user input.
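
Since violations listed under more than one guideline are cross-referenced, a typology kept as data can be checked for consistency: every cross-reference should be mirrored by the entry it points to. The sketch below is one possible way to do this, not part of the original method; the dictionary layout and the function name `unmirrored` are our assumptions, and the entries are the parenthesised cross-references copied from the typology above.

```python
# Cross-references as given in parentheses in the typology above:
# type id -> type id it refers to.
XREFS = {
    "GG1-SD1": "GG7-SD1",
    "GG1-SD2": "GG7-SD2",
    "GG1-SD3": "GG5-SD1",
    "SG2-SD2": "GG7-SD3",
    "GG2-SD1": "GG5-SD2",
    "GG5-SD1": "GG1-SD3",
    "GG5-SD2": "GG2-SD1",
    "GG6-SD1": "GG7-SD4",
    "GG7-SD1": "GG1-SD1",
    "GG7-SD2": "GG1-SD2",
    "GG7-SD3": "SG2-SD2",
    "GG7-SD4": "GG6-SD1",
    "SG4-SD1": "SG8-SD1",
    "SG8-SD1": "SG4-SD1",
}


def unmirrored(xrefs):
    """Return the type ids whose cross-reference is not mirrored by its target."""
    return sorted(src for src, dst in xrefs.items() if xrefs.get(dst) != src)


print(unmirrored(XREFS))  # []
```

An empty result means each cross-referenced pair points both ways; a non-empty list would name the entries to revisit when the typology is extended with new types.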