One should not forget that speech is a highly complex phenomenon, and we are only beginning to make computers understand it. Below we list some of the capabilities that the human cognitive system somehow manages to implement (see also [Bernsen et al. 1998b, Figure 2.1]).
- Recognition of spontaneous speech, including the ability to recognise words and intonational patterns, generalising across differences in gender, age, dialect, ambient noise level, signal strength etc.
- A very large vocabulary of words from widely different domains.
- Syntactic-semantic parsing of the complex, prosodic, non-fully-sentential grammar of spoken language, including characteristics of spontaneous speech input such as hesitations (“ah”, “ehm”), repetitions (“could could I …”), false starts (“on Saturday, no, Sunday”), stress, glottal stops, and non-words (coughs, the sound of keystrokes).
- Resolution of discourse phenomena such as anaphora and ellipsis, and tracking of discourse structure including discourse focus and discourse history.
- Inferential capabilities ranging over knowledge of the domain, the world, social life, the shared situation and the participants themselves.
- Planning and execution of domain tasks and meta-communication tasks.
- Dialogue turn-taking based on cues, semantics, plans etc., with the interlocutor reacting in real time while the speaker is still speaking, i.e. the conversation is collaborative.
- Generation of language characterised by complex semantic expressiveness and style adapted to situation, message and dialogue interlocutor(s).
- Speech generation including phenomena such as stress and intonation.
- Extensive social communication (greetings, excuses etc.) and non-task-oriented dialogue with social functions.