Reviews

In recent years several introductory and research books related to spoken dialogue systems (SDSs) have appeared. We see this as a sign that speech recognition technology has matured into a commercial reality.

This page presents a number of them; some have a more detailed review attached.

Notes on Balentine and Morgan 2001: How to Build a Speech Recognition Application - A Style Guide for Telephony Dialogues
Probably the most systematic and extensive collection of style guidelines for spoken dialogue systems. This book is a must for dialogue designers. Every guideline is systematically presented with examples, when to use it, when not to use it, competing guidelines, and references to supporting literature. Only two minuses: the binding is horrible, and the book is out of print.
Notes on Bernsen et al. 1998b: Designing Interactive Speech Systems. From First Ideas to User Testing
Naturally, we cannot give an impartial review of this book. Nevertheless, here are some notes. The book is a generalised description of the design and development of a task-oriented, telephone-based spoken dialogue system from a research project that ran from 1991 to 1997 [Bækgaard et al. 1995b]. It contains a speech interaction theory framework and a new theory that extends Grice's maxims into a set of Cooperativity Guidelines, together with an account of how to apply them in design and evaluation. Finally, it gives a complete account of the Wizard-of-Oz technique. To our own surprise, Springer had to reprint the book, and though it is now definitely out of print, its theories and results are referenced and further developed in later work, such as [Harris 2005], [Möller 2004], and [McTear 2004].
Notes on Cohen et al. 2004: Voice User Interface Design
Cohen is a cofounder of Nuance, and the book shows that he and his coauthors have solid knowledge of what they are writing about. They have clearly been in the field and have practical experience. The book covers the entire design process fairly completely, from requirements and high-level design through detailed design, persona, cognitive load, prosody and voice actors, to grammars, testing and tuning. And indeed, much more. This is a really good introductory book, and also a book in which experienced practitioners can refresh and expand their knowledge of spoken dialogue system design.
Notes on Geis 1995: Speech Acts and Conversational Interaction
This book presents Dynamic Speech Act Theory (DSAT) as an attempt to account for competence in naturally occurring dialogue. The book is relevant both to conversation theorists and to developers of conversational systems, such as advanced spoken dialogue systems, who work on dialogue management. A more detailed review is available.
Notes on Harris 2005: Voice Interaction Design - Crafting the New Conversational Speech Systems
Highly recommended reading. Harris has a linguistic background with SDS experience from both academia and industry. The book puts much emphasis on user-centered design, includes many examples (many of them from our work), and in general covers most aspects of spoken dialogue design. Some of the highlights are:
  • A coherent presentation of speech theory that is comprehensive and more firmly grounded in linguistics than most other literature related to spoken dialogue systems. This includes: Linguistic basics (sound, words, syntax, semantics, prosody, 30 pages). Speech acts and maxims/implicatures (Searle and Grice, 50 pages). Conversation analysis (turns and grounding, 30 pages). Rhetorical structure theory (Glue, 30 pages). Style (Diction, 10 pages).
  • Building speech interfaces, stressing cross-disciplinarity and user-centeredness. Crafting, habitability, etc. (20 pages). An unusual but useful section on putting the team together (15 pages). Users and tasks (20 pages). Building the discourse model (linguistically based, 35 pages). Casting, personality, etc. (45 pages). Dialogue patterns (75 pages). A lovely section on scripting, which emphasises the use of electronic dialogue models, something far too few people designing VUIs do (50 pages). Iterative evaluation and Wizard-of-Oz testing (40 pages).
  • A 30-page glossary.
In summary, a very comprehensive book giving a coherent introduction to linguistic background theory, a useful, normative "how to" for designing voice interfaces, and a fair summary of methods for testing and evaluating voice interfaces. If I were to say something negative, it would be that Harris sometimes sets up statements only to knock them down, like "speech is ungrammatical" (which of course is not true). But then again, the area of spoken dialogue has seen many engineers with no linguistic knowledge, and many computational linguists from machine translation with little knowledge of spoken language. I myself once heard one of the latter worriedly exclaim "there is almost only ellipses, how can we analyse that?", essentially meaning that anything not reducing to an S is not well-formed ;-)
Notes on Kotelly 2003: The Art and Business of Speech Recognition - Creating the Noble Voice
Kotelly was creative director at SpeechWorks, and this book shows deep insight into what makes a speech user interface design excellent. He cares equally about two sides of design: the customer (company branding, production, and deployment) and the users (natural flow, effective prompts, usability testing). I have many favourite sections in this book, but I would like to emphasise Section Five, "Developing the design", which fits very well with the design work I am currently doing, and which we are trying to capture in the DialogDesigner tool. Read this book if you are aiming for more than mechanical voice interfaces that sound like touch-tone menus.
Notes on Larson 2003: VoiceXML - Introduction to Developing Speech Applications
VoiceXML is becoming the vendor-independent language for specifying telephony-based spoken dialogue. Several books on VoiceXML have appeared, and most of them contain sections relevant to dialogue designers. Larson is central to the VoiceXML work; for example, he chairs the W3C Voice Browser Working Group responsible for VoiceXML 2.0. His book is an extremely well-written account of VoiceXML and its use, and it also contains fine sections with advice on dialogue design, including the Equal Error Rate approach to adjusting recognition confidence score thresholds.
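To illustrate the Equal Error Rate idea, here is a minimal sketch under our own assumptions, not a reproduction of Larson's presentation. It assumes you have recognition confidence scores from a tuning corpus, split into scores of correct recognitions and scores of incorrect ones (misrecognitions or out-of-grammar input). A result counts as accepted when its score is at or above the threshold. The sketch picks the candidate threshold at which the false accept rate (incorrect results accepted) and the false reject rate (correct results rejected) are closest. The function names and sample scores are hypothetical.

```python
def far_frr(threshold, correct_scores, incorrect_scores):
    """Return (false accept rate, false reject rate) at a given threshold.

    Assumption: a recognition result is accepted when score >= threshold.
    """
    false_rejects = sum(1 for s in correct_scores if s < threshold)
    false_accepts = sum(1 for s in incorrect_scores if s >= threshold)
    return (false_accepts / len(incorrect_scores),
            false_rejects / len(correct_scores))


def equal_error_threshold(correct_scores, incorrect_scores):
    """Return (threshold, FAR, FRR) where FAR and FRR are closest.

    Only observed scores are tried as candidate thresholds; the first
    candidate with the smallest |FAR - FRR| wins.
    """
    if not correct_scores or not incorrect_scores:
        raise ValueError("need scores for both correct and incorrect results")
    candidates = sorted(set(correct_scores) | set(incorrect_scores))
    best = None
    for t in candidates:
        far, frr = far_frr(t, correct_scores, incorrect_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, far, frr)
    _, t, far, frr = best
    return t, far, frr


if __name__ == "__main__":
    # Hypothetical confidence scores from a tuning corpus.
    correct = [0.9, 0.8, 0.75, 0.6, 0.4]
    incorrect = [0.7, 0.5, 0.3, 0.2, 0.1]
    t, far, frr = equal_error_threshold(correct, incorrect)
    print(f"threshold={t}, FAR={far:.2f}, FRR={frr:.2f}")
    # prints "threshold=0.6, FAR=0.20, FRR=0.20"
```

If the recognizer's scores are on the same 0.0 to 1.0 scale, a threshold found this way could then be applied through the VoiceXML 2.0 `confidencelevel` property. Whether equal error is the right operating point depends on the relative cost of false accepts and false rejects in the application.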
Notes on McTear 2004: Spoken Dialogue Technology. Toward the Conversational User Interface
A monograph targeted at both developers and students. It contains several chapters with tutorials on developing spoken and multimodal dialogue using the CSLU toolkit, VoiceXML, VoiceXML+XHTML, and SALT. Recommended reading.
Notes on Möller 2004: Quality of Telephone-based Spoken Dialogue Systems
Möller tackles the difficult questions of what quality is and how to measure it. He starts by establishing a structure for describing spoken dialogue systems. The structure includes the speech interaction theory and the cooperativity guidelines of Bernsen's and our own work, but supplements and extends them to form a framework for quality measurement. He then describes several assessment methods in detail, and even quality prediction methods. He advances the hypothesis that "quality models for the overall interaction with the SDS can cover only a part of the factors influencing perceived quality". The book represents an extensive and solid piece of research, synthesised into a theory of quality that can prove quite useful in practical SDS development and certainly in future research. The book is well written and highly recommended reading. It includes a brief glossary and extensive lists of abbreviations and interaction parameter definitions.
Notes on Smith and Hipp 1994: Spoken Natural Language Dialog Systems - A Practical Approach
The book describes a troubleshooting spoken dialogue system, the "Circuit Fix-It Shop". Its main focus is on managing mixed initiative and parsing input. The dialogue management is based on a theorem-proving model of dialogue, with user input seen as supplying missing axioms. Highly recommended reading, even now, more than 10 years after publication.
Notes on Steensig 2001: Sprog i virkeligheden - Bidrag til en interaktionel lingvistik (Language in Reality - Contributions to an Interactional Linguistics)
Every dialogue designer should pay close attention to conversation analysis: this is the discipline that makes apparent all the human conversational complexities that designers wish would silently disappear. This book presents an empirical method for describing human spoken dialogue, Interactional Linguistics. The method views language as interactional, i.e. linguistic production and understanding in dialogue are created jointly in the interaction between the participants, rather than separately by each participant in the conversation. Recommended reading. Available in Danish only.