How to use the guidelines as a design guide
This document provides a cookbook description of how to use the guidelines as a design guide during the development of a spoken language dialogue system. The same procedure may also be used for the purpose of diagnostic evaluation. The method is illustrated through a walkthrough of two dialogues from an early Sundial Wizard of Oz (WOZ) corpus which concerns flight information. Interaction problems are identified and analysed on the basis of the guidelines for cooperative human-computer dialogue. In addition to the walkthrough, an overview in terms of a typology of the problems observed in the corpus is provided.
The dialogues are mixed-initiative and we do not have the scenarios. Problem identification must therefore be done by applying the guidelines to each system utterance, isolated and in context, thus revealing potential as well as actual problems. Analysis of identified problems is design oriented.
- Analysis of two Sundial WOZ dialogues
In the following we propose an overall approach to using the guidelines as an early design guide based on the experiences from the analysis of the Sundial corpus. The approach is illustrated through a walkthrough of two different dialogues from the Sundial corpus.
Select a few dialogues, e.g. 3-5, for careful analysis. For each utterance, check each aspect and guideline in turn to see if it is violated. Make extensive notes on all possible problems identified in these dialogues.
Then look at the analysis of the first system turn, S8:9-1. Did you find that many violations? Now consider the second system utterance, S8:9-2, and consider how many violations you can find. Our analysis is in S8:9-2.
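The per-utterance check in this first step can be organised as a blank checklist that the analyst fills in by hand. The following is only a minimal sketch: the `Note` record, the `blank_checklist` helper, and the selection of guideline identifiers are illustrative assumptions, not part of the guidelines or of any Sundial tooling.

```python
from dataclasses import dataclass
from itertools import product

# Assumption: only a subset of guideline identifiers used in this walkthrough;
# the full guideline set is longer.
GUIDELINES = ["GG1", "GG2", "GG3", "GG4", "GG5", "GG7",
              "SG2", "SG3", "SG4", "SG5", "SG8"]


@dataclass
class Note:
    """One cell of the checklist: a system utterance checked against one guideline."""
    utterance_id: str
    guideline: str
    violated: bool = False
    comment: str = ""


def blank_checklist(utterance_ids, guidelines=GUIDELINES):
    """Return one empty note per (system utterance, guideline) pair."""
    return [Note(u, g) for u, g in product(utterance_ids, guidelines)]


# System turns of dialogue 8:9 as they appear in the walkthrough.
system_turns = ["S8:9-1", "S8:9-2", "S8:9-4", "S8:9-5"]
checklist = blank_checklist(system_turns)

# Record one finding from the walkthrough by hand.
for note in checklist:
    if note.utterance_id == "S8:9-1" and note.guideline == "GG3":
        note.violated = True
        note.comment = "Introduction announces broader coverage than BA flights."

violations = [n for n in checklist if n.violated]
```

Generating every utterance and guideline pair up front makes it harder to skip a guideline by accident. The judgement in each cell remains a manual one.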
Categorise the identified problems into a typology of task-dependent violations. Group them according to the violated generic and specific guidelines.
See the Sundial typology to get an idea of the level of abstraction which seems appropriate.
Now select a larger number of dialogues for analysis. Check each identified problem against the typology of violations. If a corresponding type is found, the violation is categorised as a case of this type, otherwise a new type is introduced and added to the typology.
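The bookkeeping behind this step can be sketched as follows. The `Typology` class and its `classify` method are hypothetical names introduced for illustration. Matching a problem to an existing type by identical description text is a deliberate simplification: in practice the analyst decides whether a problem is a case of a known type.

```python
from collections import defaultdict


class Typology:
    """Hypothetical container for violation types, keyed like 'SG3-SD1'."""

    def __init__(self):
        self.descriptions = {}            # type id -> short description
        self.cases = defaultdict(list)    # type id -> utterances showing it
        self._counters = defaultdict(int)  # guideline -> last SD number used

    def classify(self, guideline, description, utterance_id):
        """File the problem under a matching type, or introduce a new type."""
        for type_id, known in self.descriptions.items():
            if type_id.startswith(guideline + "-") and known == description:
                break  # a corresponding type already exists
        else:
            # No corresponding type: add a new one under this guideline.
            self._counters[guideline] += 1
            type_id = f"{guideline}-SD{self._counters[guideline]}"
            self.descriptions[type_id] = description
        self.cases[type_id].append(utterance_id)
        return type_id


typology = Typology()
first = typology.classify("SG3", "many variations in S's phrases", "S8:9-1")
second = typology.classify("SG3", "many variations in S's phrases", "S10:6-1")
third = typology.classify(
    "SG4", "too little said on what system can and cannot do", "S10:6-1"
)
```

Here `first` and `second` resolve to the same type, so only the list of cases grows, while `third` introduces a new type under a different guideline.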
Consider the analysed dialogue 10:6. The annotation of guideline violations is much briefer than the one used in dialogue 8:9. Constructing the initial typology requires a detailed and careful analysis, whereas a faster approach can be used once one has become better acquainted with the corpus. More complex cases can, however, be expected to need a deeper analysis in order to resolve design decisions, which may involve heavy trade-offs between e.g. naturalness, technical constraints, and economy.
|S8:9-1||flight information can I help you (2.5)|
|Violation guideline GG3: The system apparently only handles BA flight information although sometimes it also possesses other information. Anyway, the introduction announces a much broader domain coverage than the system actually has. This may easily mislead users to ask out-of-domain questions which cannot be handled. (type GG3-SD1)|
|Violation guideline SG3: Probably deliberate (experimental) variation in system opening. The system's introduction varies from dialogue to dialogue. This probably reflects a deliberate decision to test the effect on users of various system introductions. Still, in a working application this may cause confusion in users resulting in user questions which the system cannot understand. The core problem is that not all system introductions provide the same information or are equally informative. The designers must optimise the introduction rather than varying it. (type SG3-SD1)|
|Violation guideline SG4: The system provides too little information about its capabilities and limitations. It is of course an ideal that little information is necessary. However, the risk is that the user will be misled and assume stronger or weaker system capabilities than are actually present. Designers should look out for symptoms to this effect. The present introduction suggests that users can ask about anything to do with flight information. No current system is able to do that. (type SG4-SD1)|
|Violation guideline SG5: The system provides no information on how to interact with it. It is of course an ideal that this should not be necessary. However, the risk is that the user will be misled and assume that one may interact with the system just like with a human operator, which is not possible today. Designers should look out for symptoms to this effect. (type SG5-SD1)|
|U8:9-1||yeah good afternoon er:m (1.5) tomorrow (.5) eve- ahr yes tomorrow night (.5) (sa-em should be a flight arriving from warsaw (.5) er:m: it arrives (.5) I think a heathrow terminal two at around about half past nine (.7) could you confirm that please (5.5)|
|User symptom guideline SG5: The user's question is rather verbose and redundant and the system has done nothing to prevent this by informing on how to interact with it. (type SG5-SD1)|
|S8:9-2||please wait (37) I'm sorry there are no british airways flights from warsaw please try polish airlines (5)|
|Violation guideline GG1: The system does not provide the requested information although it has it, cf. S8:9-4. (type GG1-SD7)|
|Violation guideline GG5: The system has announced that it can handle flight information in general. Therefore it seems irrelevant to tell the user that there are no British Airways flights from Warsaw. This was not what the user wanted to know, and apparently the system actually has the desired information, cf. utterance S8:9-4. (type GG5-SD4)|
|Violation guideline SG3: Different formulation from S8:1-6 and from S8:3-3. (type SG3-SD1)|
|Violation guideline SG8: Since the system has announced that it can handle flight information in general, the user can rightly expect the system to answer his question. However, this does not happen in this system utterance. (type SG8-SD1)|
|Note: Interestingly, the user questions the system's status as a perfect domain expert. This should be virtually inconceivable in an implemented system.|
|Note: The system is able to discuss its knowledge at meta-level. This is beside the point in current information systems which should be perfect domain experts.|
|Violation guideline GG4: The system should never say something for which it does not have sufficient evidence. (type GG4-SD1)|
|U8:9-3||er:m (1) okay is there a flight that arrives (.5) at heathrow terminal two at nine thirty pee em tomorrow (4.5)|
|Note: This is basically the same question as in U8:9-1 just formulated in a different way. So the user still believes that the system might have the requested information and tries to find out how to get it.|
|S8:9-4||please wait (31) flight pee oh two eight one from warsaw to london heathrow terminal two arrives at nine forty pee em (2.3)|
|Note: The system delivers a standard flight no. response package. Note that it turns out to know about flights other than British Airways. The user has no idea about the extent of the system's knowledge.|
|Violation guideline SG2: No feedback on day. The user asked about a flight tomorrow. This is not reflected in the system's feedback. Is there a flight every day from Warsaw to London arriving at 9:40 pm? (type SG2-SD1)|
|Violation guideline GG7: The system does not explicitly distinguish between scheduled and actual arrival time. It is not clear which day the system is talking about, and thus it is also unclear whether the time provided is today's expected arrival time for the flight or the arrival time indicated in the timetable. (type GG7-SD5)|
|U8:9-4||great (.5) that's it then (.5) thanks (3.5)|
|S8:9-5||thank you good bye|
|Violation guideline SG3: Another variation of the final system phrase in each dialogue. (type SG3-SD1)|
Faster analysis after creating a task-specialised typology of problems. For phenomena already in the typology, only the reference is inserted.
|S10:6-1||flight information british airways good day can I help you (1.5)|
|Violation: guideline SG3 type SG3-SD1|
|Violation: guideline SG4 type SG4-SD1|
|Violation: guideline SG5 type SG5-SD1|
|U10:6-1||yes I was wondering (.3) whether flight number bee ay two three eight from orlando has arrived (.3) it was supposed to arrive at twenty five to eight (8.5)|
|S10:6-2||please wait (37) flight bee ay two three eight from orlando has been delayed (1.7) it will be arriving at eleven fifteen (2)|
|Violation: guideline GG7 type GG7-SD6|
|Violation: guideline SG3 type SG3-SD1|
|U10:6-2||is that eleven sixteen in the evening (5)|
|Note: The user now needs clarification because of the ambiguity of S10:6-2.|
|S10:6-3||please wait (13) flight bee ay two three eight from orlando has been delayed it will be arriving at eleven fifteen (1.5)|
|Violation: guideline GG5 type GG5-SD1|
|Violation: guideline GG7 type GG7-SD6|
|Violation: guideline GG2 type GG2-SD3: System was asked only to clarify the ambiguous time specification. Flight number and "Orlando" are only repetitions of the feedback in S10:6-2-b.|
|Violation: guideline GG13 type GG13-SD1: The user has misunderstood the time of arrival mentioned by the system. The system should initiate repair of this misunderstanding.|
|U10:6-3||is that eleven sixteen (.) pee em: or ay: em: (5)|
|Note: The user still needs clarification because of the failure to provide clarification in S10:6-3.|
|S10:6-4||please wait (17) eleven fifteen ay em|
|Violation: guideline GG13 type GG13-SD1.|
|U10:6-4||thank you (4)|
|S10:6-5||thank you good bye|
|Violation: guideline SG3 type SG3-SD1: Another variation of the final system phrase in each dialogue.|
Early WOZ dialogues seem to produce more violations than dialogues from later system development phases, and often more complex ones, in which a single system utterance violates several different guidelines. In a corpus containing as many guideline violations as the Sundial WOZ corpus, finding every individual violation is very time-consuming, if not practically impossible. It is also unnecessary, because repairing the dialogue design requires only the types of guideline violation that occur. The number of individual violations may support estimates of system performance and acceptability but is otherwise of little importance, as many violations are identical. We therefore established a typology of guideline violations during the analysis of the corpus. This highly task-dependent typology provides an overview of the different ways in which each individual guideline was violated in the corpus and is useful for revising the dialogue model.
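The difference between individual violations and violation types can be illustrated with a small tally. The pairs below are transcribed from the annotation of dialogue 10:6; the variable names are illustrative only.

```python
from collections import Counter

# (system utterance, violation type) pairs from the 10:6 annotation.
annotations_10_6 = [
    ("S10:6-1", "SG3-SD1"), ("S10:6-1", "SG4-SD1"), ("S10:6-1", "SG5-SD1"),
    ("S10:6-2", "GG7-SD6"), ("S10:6-2", "SG3-SD1"),
    ("S10:6-3", "GG5-SD1"), ("S10:6-3", "GG7-SD6"),
    ("S10:6-3", "GG2-SD3"), ("S10:6-3", "GG13-SD1"),
    ("S10:6-4", "GG13-SD1"),
    ("S10:6-5", "SG3-SD1"),
]

# How often each type occurs, and how many violations each utterance carries.
type_counts = Counter(vtype for _, vtype in annotations_10_6)
per_utterance = Counter(utt for utt, _ in annotations_10_6)

total_violations = sum(type_counts.values())   # individual violations
distinct_types = len(type_counts)              # what the design revision needs
```

In this short dialogue the 11 individual violations reduce to 7 distinct types, and S10:6-3 alone carries four of the violations.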
Violation types are numbered relative to the guidelines. "SD" refers to "SunDial". Violations may occur under more than one guideline, in which case they are cross-referenced.
Violation: System provides less information than required.
|SD1. actual arrival/departure not stated (GG7-SD1)|
|SD2. scheduled arrival/departure not stated (GG7-SD2)|
|SD3. failed S clarification (GG5-SD1)|
|SD4. S should offer phone no.|
|SD5. S should specify the information it needs|
|SD6. S provides insufficient information for the user to determine if it is the wanted answer|
|SD7. S has information but does not provide it|
Violation: System not fully explicit in communicating to users the commitments they have made
Violation: Missing system feedback on user information.
|SD1. no feedback on arrival/departure day, on BA and/or on route|
|SD2. missing/ambiguous feedback on time (GG7-SD3)|
|SD3. U: arriving flights?, S: leaving flights: imprecise feedback|
Violation: System provides more information than required.
|SD1. U: has phone no. S: offers phone no. (GG5-SD2).|
|SD2. S repeats more than the 4 phone no. digits asked for|
|SD3. flight no. and Orlando are superfluous|
Violation: System provides false information.
|SD1. "flight info." known to be false: S knows only BA|
Violation: System provides information for which it lacks evidence.
|SD1. system says it is not sure of the information it provided|
Violation: System provides irrelevant information.
|SD1. failed S clarification (GG1-SD3)|
|SD2. U: has phone no. S: offers phone no. (GG2-SD1)|
|SD3. departure time instead of arrival time provided|
|SD4. S: handles all flights - "BA does not handle Airline X."|
|SD5. S: encourages inquiry on airline unknown to it|
Violation: Obscure system utterance.
|SD1. S: "no flights are leaving Crete today" (GG7-SD4)|
|SD2. S: "flights between London and Aberdeen are not part of the BA shuttle service, there is a service from London Heathrow terminal one" (rc from GG5 to GG6)|
Violation: Ambiguous system utterance.
|SD1. actual arrival/departure not stated (GG1-SD1)|
|SD2. scheduled arrival/departure not stated (GG1-SD2)|
|SD3. missing/ambiguous feedback on time (SG2-SD2)|
|SD4. S: "no flights are leaving Crete today" (GG6-SD1)|
|SD5. scheduled vs. actual arrival/dep. time not distinguished|
|SD6. AM and PM not distinguished|
Violation: System does not provide same formulation of the same question to users everywhere in its dialogue turns.
|SD1. many variations in S's phrases|
Violation: Too lengthy expressions provided by system.
Violation: System provides disorderly discourse.
Violation: System does not inform users of important non-normal characteristics which they should, and are able to, take into account to behave co-operatively in dialogue.
Violation: Missing or unclear information on what the system can and cannot do.
|SD1. too little said on what system can and cannot do: BA often missing, time-table enquiries always missing (SG8-SD1)|
Violation: Missing or unclear instructions on how to interact with the system.
|SD1. open S intro requires interaction instructions on waiting, verbosity etc.|
Violation: System does not take users' relevant background knowledge into account.
Violation: Lacking anticipation of domain misunderstanding by analogy.
Violation: System does not distinguish, where possible, between the needs of novice and expert users.
Violation: System does not consider legitimate user expectations as to its own background knowledge.
Violation: Missing system domain knowledge and inference.
|SD1. Too little said on what the system can and cannot do; this creates user expectations which the system cannot meet (SG4-SD1)|
Violation: System does not initiate repair or clarification meta-communication in case of communication failure.
|SD1. S should initiate repair of arrival time misunderstood by the user.|
Violation: System does not initiate repair if it has failed to understand the user.
Violation: Missing clarification of inconsistent user input.
Violation: Missing clarification of ambiguous user input.
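Because cross-referenced violations appear under two guidelines, it is worth checking that every cross-reference is mirrored by its counterpart. The following is a sketch under stated assumptions: the `CROSS_REFS` table is transcribed from the parenthesised references in the typology above, the type identifier on the left of each pair is inferred from the reference that points back to it, and `parse_type_id` and `asymmetric_refs` are hypothetical helper names.

```python
import re

# Type ids look like "GG7-SD5": a guideline label followed by an SD number.
TYPE_ID = re.compile(r"^(GG|SG)(\d+)-SD(\d+)$")


def parse_type_id(type_id):
    """Split a type id into (guideline kind, guideline number, SD number)."""
    match = TYPE_ID.match(type_id)
    if match is None:
        raise ValueError(f"not a violation type id: {type_id!r}")
    kind, guideline_no, type_no = match.groups()
    return kind, int(guideline_no), int(type_no)


# Assumption: each entry maps a type to the type it cross-references.
CROSS_REFS = {
    "GG1-SD1": "GG7-SD1", "GG7-SD1": "GG1-SD1",
    "GG1-SD2": "GG7-SD2", "GG7-SD2": "GG1-SD2",
    "GG1-SD3": "GG5-SD1", "GG5-SD1": "GG1-SD3",
    "SG2-SD2": "GG7-SD3", "GG7-SD3": "SG2-SD2",
    "GG2-SD1": "GG5-SD2", "GG5-SD2": "GG2-SD1",
    "GG6-SD1": "GG7-SD4", "GG7-SD4": "GG6-SD1",
    "SG4-SD1": "SG8-SD1", "SG8-SD1": "SG4-SD1",
}


def asymmetric_refs(cross_refs):
    """Return (source, target) pairs whose target does not point back."""
    return [(a, b) for a, b in cross_refs.items() if cross_refs.get(b) != a]


for type_id in CROSS_REFS:
    parse_type_id(type_id)  # raises ValueError on a malformed identifier

problems = asymmetric_refs(CROSS_REFS)
```

An empty `problems` list means every cross-reference in the table has a counterpart pointing back to it.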