
Markup of transcriptions

Hans Dybkjær, Laila Dybkjær, and Niels Ole Bernsen
email: dybkjaer@pdc.dk
Prolog Development Center A/S PDC

Written by Hans Dybkjær while at:
Centre for Cognitive Science, Roskilde University
P.O. Box 260, DK-4000 Roskilde, Denmark

[« | »] Abstract

The creation and markup of the user test transcriptions from the Danish spoken language dialogue system are described. Experiences with the use of the TEI Guidelines are reported. The TEI Guidelines have been extended with annotation for internal system modules.

keywords: spoken dialogue corpus, system dialogue, markup, TEI

Navigation

The hypertext links in front of each section header allow navigation between sections:

«
Go back to the previous section at the same level.
|
Go to the contents which link to individual sections.
»
Go forward to the next (sub-) section.

The navigation between scenarios in the transcription files works in a similar way.

[« | »] Contents

[« | »] List of tables

  1. Transcriptions survey.
  2. Database request types.
  3. Database return types.
  4. Database status values.
  5. Semantic slots.
  6. Topics.

1 [« | »] Introduction and survey

The Danish spoken language dialogue project has produced a demonstrator of an airline ticket reservation system to be used via the telephone [Baekgaard et al. 1995]. A user test of this system has been performed [Dybkjær et al. 1995a]. In the test, users would call the system, a wizard would type in user utterances, and the system would take care of the rest: text recognition, parsing, dialogue handling, database communication, and system phrase generation and replaying. In total, 12 subjects generated 2468 tokens and 168 types using 998 turns and 57 dialogues, cf. Table 1. Each subject solved four task scenarios.

The user sessions were logged: system internal key communication on a file and spoken conversation on tape. Afterwards the data were transformed into formal transcriptions. The present report describes this transformation and the resulting transcriptions.

The transcriptions serve several purposes:

In order to serve these purposes the TEI Guidelines have been used as a basis [Sperberg-McQueen and Burnard 1994]. The Guidelines define a document type definition (dtd) in SGML, thereby providing a formal, machine manipulable structure for text markup. Furthermore, the Guidelines aim at supporting the exchange of electronic texts between people.

Section 2 describes the transcription process. The full transcription format is defined and discussed in Section 3. These transcriptions form the basis of all further analysis of the user test material. Section 4 describes one important such use: extraction and pretty-printing for the purpose of qualitative analyses of user problems. Finally, Section 5 discusses the transcriptions and the use of the TEI structure. After the references, the appendices contain a number of code tables (A), a brief survey of the software used (B), and a small example of the log files (C).

Table 1 [« | »] Transcriptions survey.
The subjects used scenarios with either graphics representation (group G) or text representation (group T) of time information.
The subject numbers provide links to the transcriptions.
Subject* group  Date        Dialogues  User turns  User tokens  User types
2G              1995-1-13           4          90          139          17
3T              1995-1-13           4          67          160          53
4G              1995-1-16           4          72          235          59
5T              1995-1-16           5          85          299          45
6T              1995-1-16           5          86          261          55
7G              1995-1-19           5          97          158          32
8T              1995-1-19           4          68          139          44
9G              1995-1-19           4          61          107          25
10G             1995-1-15           4          64          114          30
11T             1995-2-9            7         139          450          76
12G             1995-1-25           4          67          109          26
13T             1995-2-10           7         102          297          67
Total 12        1995               57         998         2468         188**
*) Subject 1 was one of the designers, used for the set-up test, and the dialogue was never transcribed.
**) Total types is the union of the individual subject types.
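
The totals in Table 1 can be re-derived from the per-subject rows. The following is a minimal sketch: the tuple layout and the names ROWS and column_total are illustrative assumptions, and only the figures are taken from the table. It also shows why the type total cannot be obtained by addition: the per-subject type counts sum to far more than 188, because subjects share most of their vocabulary and the total is a union.

```python
# Rows of Table 1:
# (subject, group, dialogues, user turns, user tokens, user types)
ROWS = [
    (2, "G", 4, 90, 139, 17),
    (3, "T", 4, 67, 160, 53),
    (4, "G", 4, 72, 235, 59),
    (5, "T", 5, 85, 299, 45),
    (6, "T", 5, 86, 261, 55),
    (7, "G", 5, 97, 158, 32),
    (8, "T", 4, 68, 139, 44),
    (9, "G", 4, 61, 107, 25),
    (10, "G", 4, 64, 114, 30),
    (11, "T", 7, 139, 450, 76),
    (12, "G", 4, 67, 109, 26),
    (13, "T", 7, 102, 297, 67),
]


def column_total(index):
    """Sum one numeric column of ROWS."""
    return sum(row[index] for row in ROWS)


totals = {
    "subjects": len(ROWS),
    "dialogues": column_total(2),
    "turns": column_total(3),
    "tokens": column_total(4),
}

# Types are not additive: this sum counts shared words once per subject.
type_sum = column_total(5)
```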

2 [« | »] Transcription method

During the sessions the conversation and the system module communication were logged, and later the logged data were converted into machine-manipulable, electronic, structured transcriptions. These transcriptions were then used both for statistical processing and for extraction of various parts for the purpose of more qualitative analyses. This section describes how the logged data were transformed into transcriptions. A description of the transcription markup elements is given in Section 3.

2.1 [« | »] Logging

Each subject had one session with the system, solving the four task scenarios in one or more telephone calls. During a session a log file was created containing, cf. Appendix C:

The communication with the devices was time stamped although no direct link to the discourse structure can be inferred. The other data (keyed-in phrases and system phrases) would appear in roughly chronological order although in particular the keyed-in phrases appear delayed.

In addition the conversation between the system and the subject was recorded on a tape recorder.

2.2 [« | »] Transformation to marked-up transcriptions

For each session the log file and the tape recording were merged into one transcription following an extension of the TEI markup. The conversion was semi-automated, using emacs-lisp functions in several, iterative phases and with visual inspection of the transcriptions to check correctness and completeness. Overall, the following steps were taken:

The use of the keyed-in phrases as a first approximation of the user utterances saved a lot of time in the transcription process, since the transcriber could concentrate on the relatively few errors in what the wizard had keyed in.

An illustration of a log file and the corresponding transcription can be found in Appendix C.

3 [« | »] Transcription elements

This section describes the markup of the corpus. For the transcription markup the TEI Guidelines [Sperberg-McQueen and Burnard 1994, Ide and Véronis 1995] were used as a basis. The Guidelines provide a concrete set of SGML document type definitions (dtds) for the structured markup of any kind of text and contain recommendations on their use. In particular, the Guidelines contain a proposal for the markup of transcriptions of spoken language. Even though extensions have been necessary, the Guidelines provide a standard that may make other uses and others' uses of the material easier, and standard SGML tools can be applied in the manipulation of the transcriptions.

The transcriptions are TEI conformant in the TEI local processing format [Sperberg-McQueen and Burnard 1994, Chapter 28], with the possible exception of additional structure in parse and database transcriptions which have not been of primary interest. The transcriptions are in TEI interchange format with the exception that the common corpus header and the address of Centre for Cognitive Science have been delegated to separate files by means of external entities. The corpus header information is a subset of the information in the teiheader, and the address is highly redundant. Each transcription of a session (one subject) is placed on a separate file mainly due to processing difficulties caused by their size.

Briefly, the transcriptions contain the following elements, not mentioning the enclosing header and body elements. The elements <scenario> and <div> structure the transcriptions into dialogues and subdialogues, respectively. A dialogue contains a sequence of utterances <u>. Related to user utterances are what the user keyed in <keyed>, what was recognised <recognised>, and semantic information which the parser could extract <parse>. In between utterances, requests to the database occur <database>. Finally, within utterances certain lexical and extralinguistic information is recorded <t> <pause> <vocal> <kinesic>.

Each of the primary elements of the transcriptions is described in the subsections below. For each element the purpose, an example, the document type definition (dtd), and some comments are given. The full definitions are in the files dialogue.ent and dialogue.dtd.

3.1 [« | » html] Scenario (<scenario>)

Marks a new call to the system or a new task and identifies the task scenario.

example: <scenario representation=G group=1 scenario=2 variation=a>
This scenario code indicates a graphic representation (G) of the 1st task set, the 2nd scenario, in date variation a.

dtd:


<!ENTITY %  scenario 'INCLUDE' >
<![ %scenario; [
<!ELEMENT %n.scenario;         - o  EMPTY >
<!ATTLIST %n.scenario;
          representation     (G | T)             #IMPLIED
          group              NUMBER              %INHERITED
          scenario           NUMBER              %INHERITED
          variation          CDATA               'a'
          TEIform            CDATA               'scenario' >
]]>

<!ENTITY % newbody 'INCLUDE' >
<![ %newbody; [
<!ELEMENT %n.body;      - O  ((%n.div; | %n.scenario;)+) >
<!ATTLIST %n.body;           %a.global;
                             %a.declaring;
          TEIform            CDATA               'body'         >
]]>

comments: The structural place relative to the divisions is not enforced by the dtd. Moreover, the redeclaration of <body> is the only unclean modification [Sperberg-McQueen and Burnard 1994, Section 29.1] of the TEI dtd. Both defects are insignificant for the current purposes and may be repaired by a more refined content model for <body>.

3.2 [« | » html] Divisions (<div>)

Mark overall discourse segments within a dialogue. The divisions are defined relative to the dialogue model structure.

example: <div n="1" type="pre" id="C2-1">
The first division (n="1", hence the suffix -1 in the id) of subject two (C2) is a "pre" segment, i.e. initial greetings and user modeling.

dtd: This element is unchanged from the TEI Guidelines [Sperberg-McQueen and Burnard 1994, Section 7.1.1]. The attributes have the following restrictions in their use:

type
  The type attribute could be constrained to:
  pre
    Initial greetings and user modeling ("Do you know the system?").
  task
    Solving the task (always a reservation).
  mid
    Transition from one task to another ("Do you want more?").
  post
    Final "Do you want more?" and farewell.
n
  Divisions are numbered consecutively throughout the whole transcription for each subject.
id
  format: C m - n
  where m is the subject number and n is the value of the n attribute.

3.3 [« | » html] Turns (<u>)

Describes what is said, who said it, and what is talked about.

example: <u id="S2-3a" who="S" topic=customer> <pause dur=1>kundenummer - <t type=cardinal value="4">4</t></u>
This is the system's (S) first part (a) of the third turn (3) during conversation with the second subject (2). The utterance concerns the customer number (topic=customer).

dtd:


<!ENTITY % newu 'INCLUDE' >
<![ %newu; [
<!ELEMENT %n.u;         - -  ((%phrase | %m.comp.spoken)+)      >
<!ATTLIST %n.u;              %a.global;
                             %a.timed;
          trans              (smooth | latching | overlap |
                             pause)              smooth
          who                IDREF               %INHERITED
          topic              CDATA               %INHERITED
          TEIform            CDATA               'u'            >
]]>

This element extends the TEI Guidelines with the topic attribute [Sperberg-McQueen and Burnard 1994, Section 11.2.7]. In the user test transcriptions it can take the values listed in Table 6. The attributes have the following meanings:

id
  The identification of the turn. The identification is globally unique over all the transcriptions of all subjects.

  format: speaker subject - number part

  speaker
    Either S (system) or U (user).
  subject
    Number identifying the current subject (alias the user).
  number
    Running turn number, kept separately for S and U and starting from 1 for each subject.
  part
    For the system, the turn is divided into several sub-utterances (typically feedback plus a new request), each concerning one topic.
trans
  Not used here, but see Section 3.4 on overlap.
who
  The speaker of the utterance. Either S for system or U for user (alias the subject).
topic
  The topic of the sub-utterance, cf. Table 6. Only one topic is assigned.
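
The id format lends itself to mechanical decomposition. The following is a minimal sketch, not part of the original tool chain: the names TurnId and parse_turn_id are hypothetical, and the pattern assumes that the part is at most one lowercase letter, as in the examples S2-3a and U3-21a.

```python
import re
from typing import NamedTuple


class TurnId(NamedTuple):
    speaker: str   # "S" (system) or "U" (user)
    subject: int   # subject number
    number: int    # running turn number for this speaker
    part: str      # sub-utterance letter, "" if absent (assumption)


# speaker subject - number part, e.g. "S2-3a"
TURN_ID = re.compile(r"^([SU])(\d+)-(\d+)([a-z]?)$")


def parse_turn_id(text):
    """Split a turn id into its four components."""
    match = TURN_ID.match(text)
    if match is None:
        raise ValueError(f"not a turn id: {text!r}")
    speaker, subject, number, part = match.groups()
    return TurnId(speaker, int(subject), int(number), part)
```

For instance, parse_turn_id("S2-3a") yields speaker S, subject 2, turn number 3, part a. Division ids such as C2-1 are deliberately rejected, since they follow a different format.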

comments: For the system, turns have been divided into several (sub-) utterances each denoted with a separate <u> element. This is feasible for the highly structured phrases in the current system. In general it may not be possible to separate the turns into utterances with a single, distinct topic, and a more sophisticated markup may become necessary.

3.4 [« | » html] Overlap (<u>)

User utterances overlapping with system speech.

example: <u who="C-2" trans=overlap> Ni ti.</u>
The overlap is with the previous phrase, or just with the current turn. In most cases the user says something during a long pause in a system turn, but the system does not listen because it does not expect users to speak during its turn.

dtd: The same as for turns.

comments: Overlapping speech is in principle not different from other utterances, but since turns are defined relative to the system and it is not listening during its turns, overlapping speech is marked as utterances, but not as part of the turns. For this reason there is no reference to the turn it overlaps and it must be placed textually in the right context.

3.5 [« | » html] Keyed (<keyed>)

As typed in by the wizard, but after expansion of abbreviations, e.g. "29" abbreviates "niogtyve" (twenty-nine). The abbreviations are listed in the file P1.abb and were introduced in order to speed up the wizard's typing. The keyed-in text is the input to the simulated recogniser.

example: <keyed which="U3-21a">onsdagfo8rstefebruar</keyed>
The which attribute refers to the immediately preceding utterance.

dtd:


<!ENTITY %  keyed 'INCLUDE' >
<![ %keyed; [
<!ELEMENT %n.keyed;         - -  ((%phrase | %m.comp.spoken)+) >
<!ATTLIST %n.keyed;          %a.global;
          which              IDREF               %INHERITED
          TEIform            CDATA               'keyed' >
]]>

comments: The which attribute must refer to a user utterance, i.e. a <u> with attribute who="U". The abbreviation scheme performed well, but since only the expanded forms are recorded, expansion errors are not marked. For example, there was a clash between the expansion scheme for numbers and the input format of the simulated recogniser: textrec expected 'æ', 'ø', 'å' to be represented as 'a5', 'o8', 'a6', but typing these would be incorrectly expanded into 'afem', 'ootte', 'aseks' by the number expansion. For this reason 'lørdag', for instance, was typed 'lordag', which would be correctly recognised as 'lo8rdag' with a score of -3.0, a score close to zero, which tentatively means that the match is good.
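
The clash can be reproduced with a toy expansion. This is only a sketch under stated assumptions: the real abbreviation scheme is defined in P1.abb and is not shown in this report, so the table DANISH_DIGITS and the function expand_digits are hypothetical stand-ins that expand every digit to its Danish number word.

```python
import re

# Hypothetical stand-in for the number expansion in P1.abb;
# only the three digits mentioned in the text are listed.
DANISH_DIGITS = {"5": "fem", "6": "seks", "8": "otte"}


def expand_digits(keyed):
    """Replace each known digit by its Danish number word."""
    return re.sub(
        r"\d",
        lambda m: DANISH_DIGITS.get(m.group(), m.group()),
        keyed,
    )
```

With this expansion the recogniser codes 'a5', 'o8', and 'a6' turn into 'afem', 'ootte', and 'aseks', and a keyed-in 'lo8rdag' becomes 'lootterdag', which is why the wizard typed 'lordag' instead.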

3.6 [« | » html] Recognised (<recognised>)

The result of the simulated recognition by textrec. This simulated recogniser uses the same search algorithms as the real recogniser but with character-based textual word models instead of Markov models. The result is the input to the parsing.

example:

<recognised which="U3-21a" grammar="Date" score=-60.000000>fo8rste februar</recognised>

The keyed-in utterance "onsdagfo8rstefebruar" is recognised as "fo8rste februar" with score -60.0 according to the Date grammar. The score -60.0 is neither good nor bad.

dtd:


<!ENTITY %  recognised 'INCLUDE' >
<![ %recognised; [
<!ELEMENT %n.recognised;         - -  ((%phrase)+) >
<!ATTLIST %n.recognised;              %a.global;
          which              IDREF               %INHERITED
          grammar            CDATA               %INHERITED
          score              CDATA               %INHERITED
          TEIform            CDATA               'recognised' >
]]>

comments: The which attribute must refer to a user utterance, i.e. a <u> with attribute who="U". A zero score means a perfect match, and the more negative the score, the worse the match. The grammars have names intended to be intuitive; they are defined in the file gram. textrec ensures that the parser input is within the vocabulary known by the system and in a form similar to the output from the real recogniser. As a positive side effect, most typing errors were correctly eliminated.
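
Because the transcriptions use SGML minimisation (unquoted attribute values such as score=-60.000000), an XML parser cannot read them directly. For quick inspection, a regular expression may suffice. The sketch below is an illustration only: the name parse_recognised is hypothetical, and the pattern assumes the attribute order and quoting shown in the example above; a real SGML parser such as sgmls is the robust choice.

```python
import re

# Assumes the attribute order which, grammar, score as in the example.
RECOGNISED = re.compile(
    r'<recognised\s+which="(?P<which>[^"]+)"'
    r'\s+grammar="(?P<grammar>[^"]+)"'
    r"\s+score=(?P<score>-?\d+(?:\.\d+)?)>"
    r"(?P<text>.*?)</recognised>"
)


def parse_recognised(line):
    """Return which, grammar, score (as float) and text, or None."""
    match = RECOGNISED.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["score"] = float(fields["score"])
    return fields
```

A score of 0.0 in the returned dictionary then indicates a perfect match, and increasingly negative values indicate worse matches.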

3.7 [« | » html] Semantics (<parse>)

The semantic parsing result as transferred to the dialogue handling.

example:


<parse which="U3-21a">
| Current parsecontext: grammarset:  Command Command Command Yesno Date
| semantic objects:
| actionso(action ActionSO) [action --NULL--]
| yesnoso(choice BooleanSO) [choice --NULL--]
| dayso(day_of_week Day_of_weekSO, year BooleanSO, month IntSO, day IntSO) [day_of_week --NULL--, year --NULL--, month [ones [number --NULL--, sign --NULL--], tens [number --NULL--, sign --NULL--], hundreds [number --NULL--, sign --NULL--]], day [ones [number --NULL--, sign --NULL--], tens [number --NULL--, sign --NULL--], hundreds [number --NULL--, sign --NULL--]]]
| Resulting Parse Tree # 0
| Subgrammar[ 1 ]: Date
| l:[s_1,sem={month={ones={number=2}},day={ones={number=1}}}]:{cat=s}.[
|   l:[date_p_7,sem={month={ones={number=2}},day={ones={number=1}}}]:{cat=date_p}.[
|     l:[ord_p_2]:{cat=ord_p,scat=date,mth=yes}.[
|       l:[fo8rste_1]:{cat=ord,mth=yes,dalu=fo8rste,lex=fo8rste,scat=date,post_comb=no,int=1}.[
|       ]
|     ],
|     l:[februar_1]:{cat=month,dalu=februar,lex=februar,gend=comm,nb=sing,defs=indef,semtype=time,count=no,mth_nb=2,case=no}.[
|     ]
|   ]
| ]
| set Slot number "2"
|  set Slot number "1"
</parse>

The semantics of "fo8rste februar" (first February) is taken to be month number 2, day number 1. Note that the semantic object names are not part of the logs and must be inferred from the context. In this case the parse tree results in sem={month={ones={number=2}},day={ones={number=1}}}, and month and day unambiguously belong to the semantic object dayso.
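
The instantiated slots appear in the last lines of the parse body as 'set Slot' statements, so they can be pulled out without interpreting the parse tree. The following is a minimal sketch; the function name extract_slots is hypothetical and the pattern assumes the line format shown in the example (a leading '|', then set Slot, the slot name, and a quoted value).

```python
import re

# One "set Slot <name> "<value>"" statement per line, after the '|' prefix.
SLOT = re.compile(r'^\|\s*set Slot (\w+) "([^"]*)"\s*$', re.MULTILINE)


def extract_slots(parse_body):
    """Return the (slot, value) pairs in order of occurrence."""
    return SLOT.findall(parse_body)
```

Applied to the example, this gives the pairs (number, 2) and (number, 1). As noted above, the pairs alone do not say which semantic object each slot belongs to.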

dtd:


<!ENTITY %  parse 'INCLUDE' >
<![ %parse; [
<!ELEMENT %n.parse;         - -   (#PCDATA) >
<!ATTLIST %n.parse;              %a.global;
          which              IDREF               %INHERITED
          TEIform            CDATA               'parse' >
]]>

comments: The which attribute must refer to a user utterance, i.e. a <u> with attribute who="U". The parse tree has a formal structure that is not marked up in TEI. The semantic slots are used in qualitative analyses and are included in the HTML extract. Thus at least this field should have had a formal TEI markup. The '|' character is purely cosmetic and prevents overly smart auto-fill modes from spoiling the indentation structure.

3.8 [« | » html] Database (<database>)

The communication with the database is shown with all domain, type and status arguments.

example:


<database type="query" modifier="DAY">
| date: year:  VOID
|       month: 02
|       day:   1
|       dow: VOID
|       relday: VOID
|       sameday: VOID
</database>

<database type="answer" modifier="DAY">
| status: DB_OK
| date: year:  1995
|       month: 02
|       day:   1
|       dow: WED
|       relday: 19
|       sameday: VOID
</database>

In the example, the validity of the first day in the second month is checked, and the database answers positively and gives the missing information (completion).
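
Since the element content is plain #PCDATA, any further processing has to parse the field lines itself. A minimal sketch of such a reader follows. It rests on assumptions not stated in the report: that every line carries exactly one field as its last 'key: value' pair, and that the group prefix (here 'date:') can be dropped. The name parse_database_body is hypothetical.

```python
import re

# The last "key: value" pair on a line; a group prefix such as
# "date:" in front of it is ignored (assumption).
FIELD = re.compile(r"(\w+):\s*(\S+)\s*$")


def parse_database_body(body):
    """Map field names to their string values."""
    fields = {}
    for line in body.splitlines():
        match = FIELD.search(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields
```

Comparing the dictionaries for the query and the answer then shows which VOID fields the database completed, here year, dow, and relday.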

dtd:


<!ENTITY %  database 'INCLUDE' >
<![ %database; [
<!ELEMENT %n.database;         - -  (#PCDATA) >
<!ATTLIST %n.database;                   %a.global;
          type               (query | answer)    #IMPLIED
          modifier           CDATA               %INHERITED
          TEIform            CDATA               'database' >
]]>

comments: Chronologically, the communication takes place at the point of occurrence in the transcriptions. The modifier types and the statuses are listed in Tables 2, 3, and 4.

3.9 [« | »] Types and tokens (<t>)

Record the type of certain tokens. All other tokens (words) count as distinct types in themselves.

example: <t type=week-day value="WED">onsdag</t>

All week-day tokens mandag...søndag count as one type. The other meta-types are listed in the dtd.

dtd:


<!ENTITY %  t 'INCLUDE' >
<![ %t; [
<!ELEMENT %n.t;         - -  (#PCDATA) >
<!ATTLIST %n.t;                   %a.global;
          type               (airport | month | week-day | name |
                              cardinal | ordinal | false-start)    #IMPLIED
          value              CDATA               %INHERITED
          TEIform            CDATA               't' >
]]>

comments: In the transcriptions one word is one token. Since the vocabulary size is important in these human-computer spoken dialogue studies, the domain-specific tokens with a known range and a frequency known to be skewed are marked as an abstract token <t>. For example, all months January...December should be in the vocabulary even though only January and February occur in the corpus. All other tokens are implicitly taken to be their own type, e.g. 'rabat' is interpreted as if marked <t type=rabat>rabat</t>.
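
The counting rule can be stated compactly in code. The sketch below is an illustration, not the counting software used for Table 1: it assumes the utterances have already been split into (word, meta-type) pairs, with None as meta-type for unmarked words, and it folds case when comparing unmarked words, which is an assumption of this sketch.

```python
def count_tokens_and_types(tokens):
    """Count tokens and distinct types.

    tokens: iterable of (word, meta_type) pairs, where meta_type is
    the type attribute of an enclosing <t> element or None.
    """
    token_count = 0
    types = set()
    for word, meta_type in tokens:
        token_count += 1
        if meta_type is not None:
            # All tokens of a marked meta-type count as one type.
            types.add(("meta", meta_type))
        else:
            # Every other word is its own type.
            types.add(("word", word.lower()))
    return token_count, len(types)
```

For example, the four tokens onsdag (week-day), fredag (week-day), rabat, and rabat count as four tokens but only two types.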

3.10 [« | »] Extra-linguistic elements (<pause> <vocal> <kinesic>)

Pauses, vocal sounds like 'eh' and non-vocal events (kinesic) like keyclicks are recorded and marked.

example:
<pause dur=4>
<kinesic desc="key clicks">
<vocal desc="øh">

dtd: These elements are unchanged from the TEI Guidelines [Sperberg-McQueen and Burnard 1994, Section 11.2.7]. The attributes have the following restrictions in their use:

pause
The value of dur may be '.', '..', and '...' denoting estimated pauses of a duration up to about one second, or a number denoting a measured pause in whole seconds.
kinesic
The desc is always key-clicks. These were used as background noise in slight pauses after longer phrases.
vocal
The desc is e.g. "øh" and "eh".
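
The two kinds of dur value map directly onto the bracket notation used in the HTML transcriptions (Section 4.3). The following sketch shows one way to express that mapping; the function name render_pause is hypothetical, and the actual behaviour of the trans program is not documented here.

```python
def render_pause(dur):
    """Render a <pause> dur value in the bracket notation of Section 4.3."""
    if dur in {".", "..", "..."}:
        # Estimated pause of up to about one second.
        return f"[{dur}]"
    if dur.isdigit():
        # Measured pause in whole seconds.
        return f"[..{int(dur)}]"
    raise ValueError(f"unexpected dur value: {dur!r}")
```

Under this mapping the example <pause dur=4> would be shown as [..4].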

3.11 [« | »] Intonation ('?' '!' '.' '-')

The signs '?', '!', and '.' have their usual, orthographic meanings, but are not used consistently throughout the transcriptions. Thus no formal markup has been defined for these. They are generated automatically as output from the system.

For the system utterances '-' denotes concatenation of prerecorded phrases. No prerecorded phrase extends over more than one utterance. Thus '-' indicates some of the uneven intonational patterns that easily follow from generating speech through concatenation of prerecorded phrases.

example:

<u id="S3-50a" who="S" topic=outhour> Udrejse! klokken - <t type=cardinal value="11">11</t> - <t type=cardinal value="15">15</t></u>
<u id="S3-50b" who="S" topic=outhour>Er det rigtigt? </u>

Here the '!' after 'Udrejse' is a bit unusual orthographically, but corresponds to the intonation. After '15' there should have been a '.'. Note how '-' indicates concatenation of prerecorded phrases.

comments: A consistent use of '?', '!', and '.' would have enhanced the readability, which is particularly evident in the HTML version. Currently they seem, however, to have no formal or analytical use.

The use of '-' in system phrases is consistent and could easily be given a TEI conformant markup, e.g.


	<!ELEMENT  %n.sep     - o   EMPTY	>

      

The <sep> could be substituted for '-'.

4 [« | »] HTML transcriptions

The transcriptions are provided in an HTML version for readability and accessibility. They have been transformed automatically from the original transcriptions written in TEI by use of the program trans. In this section the interpretation of the HTML transcriptions is explained, whereas the full transcriptions and their origination are described in Section 3.

The transcriptions of the conversation with each subject are placed in a separate file, e.g. the dialogue transcriptions of subject number two are placed in the file code-2.htm. The header of this file states the subject number and the date and hour for the start of the recordings of the session with this subject. In addition there is a link to the full transcription file, the TEI file.

After the following example of an HTML transcription the central elements of the HTML transcriptions are described. Each element is described through purpose, format, and an example.

Example:

outhour U2-17a Ni ti .
keyed: niti
recognised [Hour/0.000000] : ni ti
semantics: number "9" number "10"
database [query/HOUR]
database [answer/DEPARTURES] DB_OK
outhour S2-18a [..6] klokken 9 10
database [query/RESERVE]
database [answer/RESERVATION] DB_OK
reservation S2-18b [..5] Der er nu reserveret 2 billetter til idnumre 4 og 2 testMH og testRH fra københavn til karup fredag den 13. i 1. klokken 9 10 Dit referencenummer er 49 Den samlede pris er et tusind og 20 kroner [key clicks]
deliver S2-18c Billetterne vil ligge til afhentning ved checkinpult nummer 19 i lufthavnen! Billetter skal afhentes senest 20 minutter før afgang!
Division 3: post
toptask S2-18d Vil du mere?
toptask U2-18a Nej tak.

In front of each utterance (bold faced) the topic and a unique identification of the utterance are given.

After the user uttered U2-17a: "Ni ti" ("nine ten") the wizard keyed in "niti" which was recognised by the simulated recogniser as "ni ti" using the Hour grammar and perfectly matched (score 0.0). The parser instantiates the semantic number slots of hours ("9") and minutes ("10"), respectively. Then the database is queried and it answers back that this departure is ok. All this yields a pause of 6 seconds ("..6") before the system makes the feedback S2-18a: "klokken 9 10".

Finally, the database is asked to reserve the ticket which is ok, and the system provides summarising feedback in S2-18b before it states a precomputed value of the deliver topic in S2-18c and at last poses the next question in S2-18d.

4.1 [« | » tei] Scenario header

Marks a new call to the system or a new task and identifies the task.

format: Scenario form - task set - number - version

form
Scenarios have time information represented either through graphics or through text. The codes G and T are used for graphic scenarios and text scenarios, respectively.
task set
A number indicating the set of four tasks. Each set comes both as a graphic and a text representation.
number
A number identifying the task scenario within the task set.
version
If a task set was used more than once the versions in the two uses may differ with respect to the precise dates, since during the sessions "today" is assumed to be the real, current day. The versions are chronologically marked with letters a, b, ...

Example: Subject 2 has a scenario G-1-2-a which indicates a graphic representation (G) of the 1st task set, the 2nd scenario, in date variation a.
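
A scenario code can be split back into the four attributes of the TEI <scenario> element (Section 3.1). The following is a minimal sketch; the function name parse_scenario_code is hypothetical, and the pattern assumes a single lowercase version letter.

```python
import re

# form - task set - number - version, e.g. "G-1-2-a"
SCENARIO = re.compile(r"^([GT])-(\d+)-(\d+)-([a-z])$")


def parse_scenario_code(code):
    """Return the scenario attributes encoded in a header code."""
    match = SCENARIO.match(code)
    if match is None:
        raise ValueError(f"not a scenario code: {code!r}")
    form, task_set, number, version = match.groups()
    return {
        "representation": form,
        "group": int(task_set),
        "scenario": int(number),
        "variation": version,
    }
```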

4.2 [« | » tei] Divisions

Mark overall discourse segments within a dialogue.

format: Division number: segment

number
  A running number, starting from 1 for each subject.
segment
  A code indicating the type of the segment. The following codes are in use:
  pre
    Initial greetings and user modeling ("Do you know the system?").
  task
    Solving the task (always a reservation).
  mid
    Intermediate transition from one task to another.
  post
    Final farewell.

Example: The first division for a subject is Division 1: pre, indicating that the first segment is the opening of the dialogue.

4.3 [« | » tei] Turns

Describes what is said, who said it, and what is talked about.

format: topic id text

topic
  The topic of the sub-utterance. Only one topic is assigned.
id
  The identification of the turn. The identification is globally unique over all the transcriptions of all the subjects.

  format: speaker subject - number part

  speaker
    Either S (system) or U (user).
  subject
    Number identifying the current subject (alias the user).
  number
    Running turn number, kept separately for S and U and starting from 1 for each subject.
  part
    For the system, the turn is divided into several sub-utterances (typically feedback plus a new request), each concerning one topic.
text
  Orthographic transcription of speech. Includes certain extra-linguistic information in brackets [], e.g.:
  pauses
    [.], [..], and [...] denote pauses up to about one second. [..number] denotes a pause measured in whole seconds.
  kinesic
    E.g. key-clicks.
  vocal
    E.g. "øh" and "eh".
  false starts
    E.g. "f".

Example:

outhour S2-16b Vil du have en af disse afgange?
outhour U2-16a Ja. [...] Ni ti .

This is a question and an answer concerning the outhour item. The transcription is for subject 2 and displays the second part, b, (question: "Would you like one of these departures?") of the system's 16th turn followed by the user's 16th turn (an answer "Yes. [pause about one second] nine ten.").

4.4 [« | » tei] Overlap

User utterances overlapping with system speech.

format: (overlap) text

text
The same as for the utterances.

Example:

outhour S2-17a Hvilken?
(overlap) Ni ti.
outhour U2-17a Ni ti .

This is the continuation of the above example. Only "yes" was caught by the system (because the wizard hit return during the pause) and the system asks which (S2-17a: "hvilken?") departure to select. Simultaneously, the user again says "ni ti" (overlap), and afterwards repeats.

4.5 [« | » tei] Keyed

As typed in by the wizard, but after expansion of abbreviations, e.g. ".29" for "niogtyve" (twenty-nine). This is the input to the simulated recogniser. It relates to the immediately preceding user utterance.

format: keyed string

string
What the wizard keyed in as input to the simulated recogniser. Whitespace is ignored by the simulated recogniser, so the wizard would typically not type in whitespace.

Example (U3-21a): keyed: onsdagfo8rstefebruar (wednesdayfirstfebruary)
Note how whitespace is not present.

4.6 [« | » tei] Recognised

The result of the simulated recognition by textrec. This program works like the real recogniser but with character-based textual word models instead of Markov models. The output from the simulated recognition is the input to the parsing and relates to the immediately preceding utterance.

format: recognised [grammar/score]: string

grammar
The name of the active grammar matching best the utterance.
score
The score value of the match. Zero means exact match.
string
The recognised string of lexical items.

Example (U3-21a): recognised [Date/-60.000000]: fo8rste februar (first February)
The keyed in utterance "onsdagfo8rstefebruar" is recognised as "fo8rste februar" with score -60.0 according to the Date grammar.

4.7 [« | » tei] Semantics

The semantic parsing result as transferred to the dialogue handling. The result relates to the immediately preceding utterance.

format: semantics: slots

slots
  The values instantiated by the parser in the semantic objects.
  format: slot "value"
  slot
    The name of the slot, cf. Table 5. Note that it may occur more than once, since more than one semantic object may contain a slot of the given name.
  value
    The value of the slot.

Example (U3-21a): semantics: number "2" number "1"
The semantics of "fo8rste februar" (first february) is taken to be month number 2, day number 1. Note that the semantic object names are not part of the logs and must be inferred from the context.

4.8 [« | » tei] Database

The communication with the database, abstracted to type and status. Chronologically, the communication takes place at the point of occurrence in the transcriptions.

format: database [QA/item] status

QA
Either query to the database or answer from the database.
item
The item in question, cf. Tables 2 and 3.
status
The status of requests, cf. Table 4.

Example (U3-21a): database [answer/DEPARTURES] DB_OK
A request to the database about the first of February is answered positively (DB_OK), returning a list of feasible departure times.
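
Lines of this form are regular enough to be read back mechanically, e.g. for counting database statuses. The sketch below assumes the layout shown in the examples, where queries carry no status; the function name parse_database_line is hypothetical.

```python
import re

# database [QA/item] status, where the status is absent for queries.
DATABASE_LINE = re.compile(r"^database \[(query|answer)/(\w+)\](?:\s+(\w+))?$")


def parse_database_line(line):
    """Return (QA, item, status); status is None when not given."""
    match = DATABASE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a database line: {line!r}")
    return match.groups()
```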

5 [« | »] Conclusion

The transcriptions from a user test of the Danish spoken language dialogue project have been described. The transcriptions have been successfully applied in statistical descriptions as well as in more qualitative analyses of user problems [Dybkjær 1995, Dybkjær et al. 1995a]. The formal structure derived from SGML has been essential in automating many of the applied statistics and transformations. Below some more general remarks on the transcription process are given.

Understanding the TEI Guidelines

The Guidelines are voluminous (1400 pages of technical documentation) and attempt to cover many different aspects. This makes it difficult to find exactly the elements intended for the problems at hand.

Of course one may expect to use some time in understanding a large apparatus, and more accessible beginner's guidelines are emerging [Sperberg-McQueen 1995, Burnard and Sperberg-McQueen 1995], but well-chosen examples from a range of applications would be helpful.

Software

We have used psgml/emacs for editing and sgmls for verifying. Both tools are general SGML tools and work nicely. However, the setup of directories of element and character files is not trivial.

Moreover, software for viewing/extracting chosen parts of the transcriptions must be programmed from scratch, as must software for counting phenomena in the dialogues. Even though special projects have special needs, it might be possible to construct general viewing and statistics packages for TEI spoken dialogue transcriptions, much like James Clark is programming a general parsing package for SGML [Clark 1995].

Internal modules

The transcriptions have annotations of internal modules. Apparently the Guidelines did not foresee this need, although they do allow conformant extensions for handling phenomena of this kind. In the current transcriptions they have for simplicity been represented directly in separate elements: <keyed>, <recognised>, <semantics>, and <database>.

The extra elements all capture a call to or result of a system internal module. This generality might be captured by:


<!ENTITY %  process 'INCLUDE' >

<![ %process; [

<!ELEMENT %n.process;         - -  ((%n.parameter;)+)		>

<!ATTLIST %n.process;              %a.global;

          name               IDREF               %INHERITED

          type               (call | result)     %INHERITED

          TEIform            CDATA               'process'	>

	]]>

      

The problem now is to model the parameter elements since these can be expected to vary much in form and function. One might define a general purpose structure:


<!ENTITY %  parameter 'INCLUDE' >

<![ %parameter; [

<!ELEMENT %n.parameter;         - -  (#PCDATA | %n.parameter;)*	>

<!ATTLIST %n.parameter;              %a.global;

          name               CDATA               %INHERITED

          type               (%m.partype)          %INHERITED

          TEIform            CDATA               'parameter'	>

	]]>





<!ENTITY % x.partype ''                                            >

<!ENTITY % m.partype '%x.partype enum | int | real |

			char | string | list'            >

      

With the list type encompassing records and arrays as well, and with %x.partype to account for user extensions, this yields a general model for parameters. However, it does not model pointer structures (graphs), although this could be remedied by adding a label type and a label attribute, and higher-order types have (as usual) no easy textual representation. Worse perhaps, the type structure is not enforced by the SGML structure: this can seemingly only be achieved by splitting the parameter element into distinct elements for each type. This has two drawbacks. First, it forces users/transcribers to declare new specialised parameter elements as they need them. Second, since SGML does not seem to support element type inheritance, the commonality between parameters is no longer explicit.
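The missing enforcement of the type structure could be compensated for by a check outside the SGML parser. The Python sketch below is only an illustration under assumptions: the Parameter class, its fields and the function type_errors are hypothetical names, each non-list parameter is assumed to carry its value as text, and the hour/hours/minutes names are borrowed from the database query in Appendix C.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The type names of %m.partype; user extensions (%x.partype) would be added here.
PARTYPES = {"enum", "int", "real", "char", "string", "list"}


@dataclass
class Parameter:
    name: str
    type: str
    value: Optional[str] = None  # text content; None for lists
    children: List["Parameter"] = field(default_factory=list)


def _parses(convert, text):
    """True if convert(text) succeeds, e.g. int("9")."""
    try:
        convert(text)
        return True
    except (TypeError, ValueError):
        return False


def type_errors(p, path=""):
    """Collect violations of the type structure that the DTD cannot express."""
    here = path + "/" + p.name
    errors = []
    if p.type not in PARTYPES:
        errors.append(here + ": unknown type " + p.type)
    elif p.type == "list":
        if p.value is not None:
            errors.append(here + ": list carries a text value")
        for child in p.children:
            errors.extend(type_errors(child, here))
    else:
        if p.children:
            errors.append(here + ": " + p.type + " has nested parameters")
        if p.value is None:
            errors.append(here + ": missing value")
        elif p.type == "int" and not _parses(int, p.value):
            errors.append(here + ": value " + repr(p.value) + " is not an int")
        elif p.type == "real" and not _parses(float, p.value):
            errors.append(here + ": value " + repr(p.value) + " is not a real")
        elif p.type == "char" and len(p.value) != 1:
            errors.append(here + ": value " + repr(p.value) + " is not a char")
    return errors


hour = Parameter("hour", "list", children=[
    Parameter("hours", "int", "9"),
    Parameter("minutes", "int", "ten"),  # deliberately ill-typed
])
for message in type_errors(hour):
    print(message)
```

Enum values are not checked here, since the general model has no place to declare the admissible values; this is one more point where distinct elements per type would carry more information.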

Microtags

At the low level of the corpus some formal tags have been used without defining a TEI conformant markup, cf. e.g. the use of '-' and the inner structure of <semantics> and <database>. The use and choice of these microtags are somewhat arbitrary and relate solely to the convenient readability of the transcriptions and to what structure was in the logging files. The affected structures were not in the focus of our analyses. Even so, with more advanced extraction and viewing tools it might certainly have been better to encode these phenomena TEI conformantly.

[« | »] References

[Baekgaard et al. 1995]
Anders Baekgaard, Niels Ole Bernsen, Tom Brøndsted, Poul Dalsgaard, Hans Dybkjær, Laila Dybkjær, Jan Kristiansen, Lars Bo Larsen, Børge Lindberg, Bente Maegaard, Bradley Music, Lene Offersgaard, and Claus Povlsen: The Danish spoken language dialogue project. A general overview. In proceedings of the ESCA workshop on spoken dialogue systems, Danmark, June 1995, pp 89-92.
[Burnard and Sperberg-McQueen 1995]
Lou Burnard and C. Michael Sperberg-McQueen: TEI Lite: An Introduction to Text Encoding for Interchange. Document No: TEI U 5, June 1995. URL: http://www.uic.edu:80/orgs/tei/intros/teiu5.html
[Bernsen et al. 1996]
Niels Ole Bernsen, Hans Dybkjær and Laila Dybkjær: Cooperativity in human-machine and human-human spoken dialogue. Discourse Processes, vol. 21, no. 2, 1996, 213-236.
[Clark 1995]
James Clark: SP. Parsing package for SGML. URL: http://www.jclark.com/sp.html
[Dybkjær 1995]
Hans Dybkjær: Priming and Vocabulary, [Dybkjær et al. 1995a, Annex 14].
[Dybkjær et al. 1995a]
Laila Dybkjær, Niels Ole Bernsen, and Hans Dybkjær: User test of the Danish spoken language dialogue system. Spoken Language Dialogue Systems, report 9b, Centre for Cognitive Science, Roskilde University. 14+200+105pp.
[Dybkjær et al. 1995b]
Hans Dybkjær, Laila Dybkjær and Niels Ole Bernsen: Design, formalisation and evaluation of spoken language dialogue. In proceedings of TWLT 9, Twente, June. pp 67-82.
[Ide and Véronis 1995]
Nancy Ide and Jean Véronis (eds.): Text Encoding Initiative. Background and Context. Kluwer Academic Publishers, Netherlands, 1995. 242pp.
[Sperberg-McQueen and Burnard 1994]
C. Michael Sperberg-McQueen and Lou Burnard (eds.): Guidelines for Electronic Text Encoding and Interchange. The Association for Computers and the Humanities, The Association for Computational Linguistics, and The Association for Literary and Linguistic Computing. TEI P3, Text Encoding Initiative, Chicago, Oxford, April 8, 1994. xxvi+1290pp. Also URL: http://etext.virginia.edu/TEI.html
[Sperberg-McQueen 1995]
C. Michael Sperberg-McQueen: Bare Bones TEI. A Very Very Small Subset of the TEI Encoding Scheme. Document no. TEI U6, URL: http://www.uic.edu:80/orgs/tei/intros/teiu6.html

Appendix A [« | »] Code tables

Table 2 [« | »] Database request types.
Name | Action | [parameters] | Return
CUSTOMER | Check the customer id. | [cust-nr] | STATUS
ROUTE | Check route between from and to. | [from, to] | STATUS
PERSON | Check id-number and return initials. | [cust-nr, id] | PERSON
DAY | Check and complete day of week and date. | [day] | DAY
PERIOD | Check if (outdate,homedate) for given discount. | [outday, homeday, farename] | PERIOD
HOUR | Check if there is a feasible departure. | [route, date, hour, farename, nr-of-persons] | DEPARTURES
TIMEOFDAY | Check if there are feasible departures. | [route, date, time-of-day, farename, nr-of-persons] | DEPARTURES
HOUR_INTERVAL | Check if outhour < homehour if outdate = homedate. | [outhour, homehour, outdate, homedate] | STATUS
FARE | Check parameters and compute a price. | [cust-nr, route, persons, return, farename] | FARE
RESERVE | Check parameters and make a reservation. | [cust-nr, route, persons, outdate, outhour, return, homedate, homehour, farename] | RESERVATION
DELIVERY | Check parameters and set delivery type. | [delivery, cust-nr, reference] | STATUS
RESERVATION | Delete or retrieve a reservation. | [action, cust-nr, reference] | RESERVATION

Table 3 [« | »] Database return types.
Name Parameters
STATUS Status, see Table 4
PERSON Status, initials
DAY Status, day
PERIOD Status, outday, homeday
DEPARTURES Status, free seats, seats, departures
FARE Status, fare
RESERVATION Status, days left, cust-nr, reference, from, to, outday, outhour, return, homeday, homehour, persons, fare, farename, delivery
Table 4 [« | »] Database status values.
All requests to the database will return at least a status field.
Status Description
DB_ERROR General: error
DB_OK General: no errors found
DB_SOLD_OUT Departure(s): sold out
DB_NONE Departure(s): none such
DB_DISCOUNT Departure(s): no red/green departure(s)
DB_OUT_BEFORE_HOME Period,hour_interval: don't out before home
DB_DAY_OF_MONTH Day: day of month does not fit within month
DB_MONTH Day: month not in JAN...DEC
DB_YEAR Day: year outside program astronomy
DB_DAY_BEFORE_TODAY Day: only queries after now
DB_DOW Day: day of week is inconsistent with date
DB_REL Day: relative day is inconsistent with dow/date
DB_SAMEDAY Day: day not same day as (relative) today
DB_OUTSIDE_PERIODS Day: no timetable for this day
DB_NO_ROUTE Route: no such route
DB_NO_FLIGHTS Flights: no flights at specified date
DB_NO_FLIGHT Flight: no flight with given number
DB_MIN_TIME Reservation: too few days between out and home
DB_MAX_TIME Reservation: too many days between out and home
DB_NO_RESERVATION Reservation: none with tried number
DB_NO_CUST Customer: none with tried number
DB_TWICE Person: Person id chosen for the second time
Table 5 [« | »] Semantic slots.
Slot Description
action correct | repeat | help
choice 0 (no) | 1 (yes)
from The name of a departure airport
to The name of a destination airport
airport The name of an airport
day_of_week Mandag ... søndag | today | tomorrow | sameday
time_of_day morgen | formiddag | middag | eftermiddag | aften | dag | nat
number Integer used for minutes*, hours, day of month, month, reference, number of persons, id-number, and customer number (cust-nr).
delivery airport | post
*) Negative minutes denote minutes to an hour, positive minutes after an hour.
Table 6 [« | »] Topics.
The topic usually refers to the current domain item. Exceptions are meta-correct and meta-repeat which refer to the discourse; for domain purposes they may be seen to inherit the previous topic.
Topic name Description of topic
toptask Overall greetings and "do you want more?".
knows Does the user know the system?
customer The number of the customer to pay for the tickets.
persons The number of persons to travel.
idN Obtaining the identification number of the Nth traveller, N<=9.
ids The whole set of travellers.
from The departure airport.
to The destination airport.
return Is it a return travel?
outdate The day for starting the travel.
outhour The departure time.
homedate The day for travelling home.
homehour The departure time for the home travel.
reservation The whole reservation and its making.
deliver Will the traveller receive the tickets via post or pick them up in the airport?
meta-correct The user wants to correct a previous item.
meta-repeat Please repeat the latest information.

Appendix B [« | »] Software

The creation, maintenance and use of the corpus have been supported by a number of software tools. A number of public SGML software tools are presented on the WWW. In the current project the following tools have been used:

Appendix B.1 [« | »] Trans package

The software package trans has routines for reading/writing restricted SGML markup documents. Currently the program trans is available:


	Transform dialogue/TEI transcript to html.

	Usage:

	   trans {options}  file

	where options are [default]:

           -h  Help [this table].

           -v  Version.

           -e  Print extra information.

	and file is the name of the transcription file.

      

The extra information provided with option -e is a summary of the internal modules communication, cf. Section 4.5.

The parsing is simple and directed by a definitions file with the format (quoted from the program text):


	// Definition ::= - '(' - name WS Def_type - ')' - .

	// Type ::=	s	// synthetic:	only <name attributes>

	//		t	// textual:	full but text body only

	//		c	// composite:	full but markup* body only

	//		g	// generic:	toplevel

	//		a	// auxiliary:	comments, declarations, ...

      
where

	// WS ::= (space | tab | cr | lf | ff) + .  // whitespace()

	// - ::= [WS] .  // skip()

	// In other words,	whitespace conform to C++ isspace predicate.

	//			- is optional whitespace,

	//			WS is required whitespace.

      

An example is given in trans.defs. The markup is read according to the syntax:


	// Markup ::= Synthetic | Textual | Composite | Auxiliary | Generic .

	// Synthetic ::= '<' Name Attribute* '>' .

	// Textual   ::= '<' Name Attribute* '>' Text             '<' '/' Name '>' .

	// Composite ::= '<' Name Attribute* '>' (Text | Markup)* '<' '/' Name '>' .

	// Generic   ::= (Text | Markup)* .

	// Auxiliary ::= '<' - '!' Text '>' .

	// Text ::= - ( Char \ {<,>} )* - .
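To illustrate how a definitions table can direct the reading of such markup, a minimal Python sketch follows. It is not the trans program: the DEFS table, the function parse and the tuple representation are hypothetical, attributes are kept as an unparsed string, and a stray '>' inside text is not detected.

```python
import re

# Hypothetical definitions (element name -> type letter); the actual
# contents of trans.defs are not reproduced in this report.
DEFS = {"u": "c", "t": "t", "keyed": "t", "recognised": "t", "pause": "s"}

# '<', optional '!' (auxiliary) or '/' (end tag), then everything up to '>'.
TAG_RE = re.compile(r"<\s*(!|/)?\s*([^<>]*)>")


def parse(text, pos=0, closing=None):
    """Read (Text | Markup)* from pos, stopping at the end tag for closing.

    Returns (nodes, new_pos). Text nodes are stripped strings, elements are
    (name, attribute_string, children) tuples, auxiliary markup is skipped.
    """
    nodes = []
    while pos < len(text):
        match = TAG_RE.search(text, pos)
        end_of_text = match.start() if match else len(text)
        chunk = text[pos:end_of_text].strip()  # Text ::= - chars -
        if chunk:
            nodes.append(chunk)
        if match is None:
            pos = len(text)
            break
        kind, body = match.group(1), match.group(2).strip()
        pos = match.end()
        if kind == "!":  # Auxiliary: comments, declarations
            continue
        if kind == "/":  # end tag: must close the element being read
            if body != closing:
                raise ValueError("unexpected end tag </%s>" % body)
            return nodes, pos
        parts = body.split(None, 1)
        if not parts or parts[0] not in DEFS:
            raise ValueError("undefined element <%s>" % body)
        name = parts[0]
        attrs = parts[1] if len(parts) > 1 else ""
        children = []
        if DEFS[name] != "s":  # synthetic elements have no body or end tag
            children, pos = parse(text, pos, closing=name)
            if DEFS[name] == "t" and any(not isinstance(c, str) for c in children):
                raise ValueError("textual element <%s> contains markup" % name)
        nodes.append((name, attrs, children))
    if closing is not None:
        raise ValueError("missing end tag </%s>" % closing)
    return nodes, pos


sample = '<u id="U2-17a"><t value="9">Ni</t> <t value="10">ti</t>.</u>'
tree, _ = parse(sample)
print(tree)
```

The type letter decides whether an end tag is expected (s versus t and c) and whether nested markup is accepted (t versus c), which is all the guidance such a simple reader needs.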

      

Appendix C [« | »] Log example

This appendix presents as an example part of a log file from the user test of the running system. An explanation of the transcription and its elements is given in sections 2 and 3. The example corresponds to this part of the transcription as abbreviated in the HTML version:

outhour U2-17a Ni ti .
keyed: niti
recognised [Hour/0.000000] : ni ti
semantics: number "9" number "10"
database [query/HOUR]
database [answer/DEPARTURES] DB_OK

The corresponding full TEI transcription is given here, bolding the above extracted parts:


<u id="U2-17a" who="C-2" topic=outhour><t type=cardinal value="9">Ni</t> <t type=cardinal value="10">ti</t>.</u>

<keyed which="U2-17a">niti</keyed>

<recognised which="U2-17a" grammar="Hour" score=0.000000>ni ti</recognised>

<parse which="U2-17a">

   | Current parsecontext: grammarset:  Command Command Command Yesno Yesno Hour Yesno Hour

   | semantic objects:

   | actionso(action ActionSO) [action --NULL--]

   | yesnoso(choice BooleanSO) [choice --NULL--]

   | time_of_dayso(time_of_day Time_of_daySO) [time_of_day --NULL--]

   | hourso(hours IntSO, minutes IntSO) [hours [ones [number --NULL--, sign --NULL--], tens [number --NULL--, sign --NULL--], hundreds [number --NULL--, sign --NULL--]], minutes [ones [number --NULL--, sign --NULL--], tens [number --NULL--, sign --NULL--], hundreds [number --NULL--, sign --NULL--]]]

   | Resulting Parse Tree # 0

   | Subgrammar[ 4 ]: Hour

   | l:[s_2,sem={hours={ones={number=9}},minutes={ones={number=10}}}]:{cat=s}.[

   |   l:[hour_p_3,sem={hours={ones={number=9}},minutes={ones={number=10}}}]:{cat=hour_p}.[

   |     l:[cardh_p_3]:{cat=cardh_p,hour_fg1=yes,minut_fg1=no,nb=plu,hour_fg2=yes,minut_fg2=yes,minut_fg3=yes,minut_fg4=yes,minut_fg5=yes}.[

   |       l:[ni_1]:{cat=card,hour_fg1=yes,minut_fg1=no,nb=plu,hour_fg2=yes,minut_fg2=yes,minut_fg3=yes,minut_fg4=yes,minut_fg5=yes,dalu=ni,lex=ni,scat=no,prae_comb=yes,post_comb=no,hour=no,int=9,tens=0}.[

   |       ]

   |     ],

   |     l:[cardh_p_3]:{cat=cardh_p,minut_fg1=yes,minut_fg4=no,hour=no,hour_fg1=yes,nb=plu,hour_fg2=yes,minut_fg2=yes,minut_fg3=yes,minut_fg5=yes}.[

   |       l:[ti_1]:{cat=card,hour_fg1=yes,minut_fg1=yes,nb=plu,hour_fg2=yes,minut_fg2=yes,minut_fg3=yes,minut_fg4=no,minut_fg5=yes,dalu=ti,lex=ti,scat=no,prae_comb=no,post_comb=no,int=10,tens=0}.[

   |       ]

   |     ]

   |   ]

   | ]

   | set Slot number "9"

   |  set Slot number "10"

</parse>

<database type="query" modifier="HOUR">

   | destination: CPH

   | destination: KAR

   | date: year:  1995

   |       month: 01

   |       day:   13

   |       dow: FRI

   |       relday: 0

   |       sameday: VOID

   | hour: hours: 9

   |     minutes: 10

   | discount: BLUE

   | travellers: 2

</database>

<database type="answer" modifier="DEPARTURES">

   | status: DB_OK

   | free: 1

   | total: 1

   | ----- departure:

   | hour: hours: 9

   |     minutes: 10

   | free: 8

   | discount: BLUE

</database>

      

The corresponding log is given below. Note how all communication with external modules is timestamped. The information could have been encoded via the use of a timeline, but this has been omitted. The time stamps have no useful correlation with the spoken dialogue. Instead the time line is implicit in the sequencing of the utterances and in the markup of pauses and overlap. Note, however, that elapsed turn and dialogue times are not recorded. Note also how most of the log has been deleted, retaining only what is directly relevant to the speech and its processing as part of the discourse. Again, the parts corresponding to the abbreviated HTML transcription have been highlighted in bold. The user phrase "Ni ti", the topic and the user utterance id as well as any lexical information are not present in the log but have been added later to the TEI transcriptions. Added comments are in italics.


	keyed: Sinput >

	keyed: Active Grammars: Command Hour Yesno

	 Switched on: Command

	 Switched on: Command

	 Switched on: Command

	 Switched on: Yesno

	 Switched on: Yesno

	 Switched on: Hour

	 Switched on: Yesno

	 Switched on: Hour

	> 36928 !SET REC1 ICM0 DONE

	        VOID  ;

	> 36929 !STA REC1 ICM0 DONE

	        VOID  ;

	> 36931 !BRK REC1 ICM0 DONE

	        VOID  ;

	> 36931 !SET REC1 ICM0 DONE

	        VOID  ;

	> 36931 !SET REC1 ICM0 DONE

	        VOID  ;

	> 36931 !SET REC1 ICM0 DONE

	        VOID  ;

	> 36931 !STA REC1 ICM0 DONE

	        VOID  ;

	> 36938 !PER PLA0 ICM0 DONE

	        VOID  ;

	9 10

	keyed: Sinput >niti

	keyed: grammar Hour, score 0.000000

	> 37257 EVE REC1 ICM0 RSENT

	        STRING6 "ni ti"  ;

	< 37258 SET ICM0 REC1 GSUB

	        INT4 0  ;

	Current parsecontext: grammarset:  Command Command Command Yesno Yesno Hour Yesno Hour

	semantic objects:

	actionso(action ActionSO) [action <<NULL>>]

	yesnoso(choice BooleanSO) [choice <<NULL>>]

	time_of_dayso(time_of_day Time_of_daySO) [time_of_day <<NULL>>]

	hourso(hours IntSO, minutes IntSO) [hours [ones [number <<NULL>>, sign <<NULL>>], tens [number <<NULL>>, sign <<NULL>>], hundreds [number <<NULL>>, sign <<NULL>>]], minutes [ones [number <<NULL>>, sign <<NULL>>], tens [number <<NULL>>, sign <<NULL>>], hundreds [number <<NULL>>, sign <<NULL>>]]]





	Resulting Parse Tree # 0

	Subgrammar[ 4 ]: Hour



	l:[s_2,sem={hours={ones={number=9}},minutes={ones={number=10}}}]:{cat=s}.[

	  l:[hour_p_3,sem={hours={ones={number=9}},minutes={ones={number=10}}}]:{cat=hour_p}.[

	    l:[cardh_p_3]:{cat=cardh_p,hour_fg1=yes,minut_fg1=no,nb=plu,hour_fg2=yes,minut_fg2=yes,minut_fg3=yes,minut_fg4=yes,minut_fg5=yes}.[

	      l:[ni_1]:{cat=card,hour_fg1=yes,minut_fg1=no,nb=plu,hour_fg2=yes,minut_fg2=yes,minut_fg3=yes,minut_fg4=yes,minut_fg5=yes,dalu=ni,lex=ni,scat=no,prae_comb=yes,post_comb=no,hour=no,int=9,tens=0}.[

	      ]

	    ],

	    l:[cardh_p_3]:{cat=cardh_p,minut_fg1=yes,minut_fg4=no,hour=no,hour_fg1=yes,nb=plu,hour_fg2=yes,minut_fg2=yes,minut_fg3=yes,minut_fg5=yes}.[

	      l:[ti_1]:{cat=card,hour_fg1=yes,minut_fg1=yes,nb=plu,hour_fg2=yes,minut_fg2=yes,minut_fg3=yes,minut_fg4=no,minut_fg5=yes,dalu=ti,lex=ti,scat=no,prae_comb=no,post_comb=no,int=10,tens=0}.[

	      ]

	    ]

	  ]

	]



	set Slot number "9"

	set Slot number "10"

	< 37382 SET ICM0 REC1 GSUB

	        INT4 3  ;

	< 37400 PER ICM0 APP0 APP_P5

	(database [query/HOUR])

	        LIST (

	          BINARY1 $02

	          BINARY1 $05

	          LIST (

	            LIST (

	              INT4 1995

	              BINARY1 $01

	              INT4 13

	            )

	            BINARY1 $04

	            INT4 0

	            VOID

	          )

	          LIST (

	            INT4 9

	            INT4 10

	          )

	          BINARY1 $02

	          INT4 2

	        ) ;

	< 37434 STA ICM0 REC1 -

	        VOID  ;

	keyed: [0 ni 2][2 ti 4]Resulting string: ni ti

	keyed:

	keyed: Active Grammars: Command

	 Switched on: Command

	 Switched on: Command

	 Switched on: Command

	> 37435 !SET REC1 ICM0 DONE

	        VOID  ;

	> 37436 !SET REC1 ICM0 DONE

	        VOID  ;

	> 37436 !STA REC1 ICM0 DONE

	        VOID  ;

	> 37613 EVE APP0 ICM0 APP_P5

	(database [answer/DEPARTURES])

	        LIST (

	          BINARY1 $01 (status DB_OK)

	          INT1 1

	          INT1 1

	          LIST (

	            LIST (

	              LIST (

	                INT1 9

	                INT1 10

	              )

	              INT1 8

	              BINARY1 $02

	            )

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	            VOID

	          )

	        ) ;

      


NISLab, University of Southern Denmark, Odense
comments to: laila@nis.sdu.dk
© 1995 CCS - Hans Dybkjær, revised: $Date: 1996/09/02 09:18:28 $, revised: 1997/05/17, footer and affiliation revised 2003-07-13 (HD) links corrected 2005-07-26 (HD)