Resource directory for the Z39.98-2012 Authoring and Interchange
SSML Integration Feature
version 1.0

Introduction

The Z39.98 SSML Integration Feature recasts a subset of the W3C Speech Synthesis Markup Language (SSML) Version 1.1 as a Feature for incorporation in Z39.98-AI Profiles.

The Z39.98 SSML Integration Feature is designed to be used in authoring contexts where speech output is targeted. The feature's content model definitions are designed so that a processing agent can safely ignore or filter out the SSML fragments when speech-related information is not relevant.

Each element contributed by the feature inherits the corresponding semantics as defined by SSML 1.1, unless specified otherwise.

This feature is maintained by the ANSI/NISO Z39.98 advisory committee under the auspices of NISO.

Normative References

The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this section are to be interpreted as described in RFC2119.

Version information

This resource directory represents version 1.0 of the SSML Integration feature.

This release may not be the most recently published (current) version of the SSML Integration feature. The current version should always be obtained from the static URI: http://www.daisy.org/z3998/2012/auth/features/ssml/current/

Identification

This feature must be identified as ssml in Z39.98-AI document feature declarations.

The canonical identity URI is: http://www.daisy.org/z3998/2012/auth/features/ssml/1.0/

Specification compliance

This version of the feature is compliant with the Z39.98-2012 Specification.

Normative schemata

The normative RelaxNG schema for version 1.0 of the SSML Integration feature is z3998-feature-ssml.rng.

Note - this feature schema does not represent an entire document model; it is intended for inclusion in host profiles.

The normative schema includes a number of modules and/or subschemas, which are listed in Appendix 1.

Available components

This feature makes the following components available for inclusion in host profiles:

Processing agent behavior requirements

This section defines processing agent behaviors that extend the default behaviors defined in Processing agent conformance definition.

Feature supported

If a processing agent supports this feature, it must comply with the following:

Processing of phoneme

The processing agent must support the SSML phoneme element, and process it as dictated by the SSML specification.

Upon encountering the ssml:ph and associated ssml:alphabet attributes on a non-SSML namespace element, the processing agent must process the element in the same way as the SSML phoneme element.
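The equivalence between the phoneme element and the ssml:ph attribute can be illustrated with a minimal Python sketch. It is not part of the normative text: the function name pronunciation_hint, the host elements p and span (shown without a namespace for brevity) and the phonetic strings are assumptions made for illustration only. The fallback to x-SAMPA follows the implicit alphabet value stated later in this document.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
DEFAULT_ALPHABET = "x-SAMPA"  # implicit value stated by this feature


def pronunciation_hint(element):
    """Return (ph, alphabet) if the element carries pronunciation data, else None."""
    if element.tag == f"{{{SSML_NS}}}phoneme":
        # On ssml:phoneme the attributes are unqualified.
        ph = element.get("ph")
        alphabet = element.get("alphabet", DEFAULT_ALPHABET)
    else:
        # On any other element the attributes are namespace qualified.
        ph = element.get(f"{{{SSML_NS}}}ph")
        alphabet = element.get(f"{{{SSML_NS}}}alphabet", DEFAULT_ALPHABET)
    if ph is None:
        return None
    return ph, alphabet


# Illustrative fragment; the phonetic strings are placeholders, not vetted transcriptions.
SAMPLE = (
    '<p xmlns:ssml="http://www.w3.org/2001/10/synthesis">'
    '<ssml:phoneme ph="t@meItoU" alphabet="x-SAMPA">tomato</ssml:phoneme> or '
    '<span ssml:ph="t@mA:toU">tomato</span>'
    '</p>'
)

root = ET.fromstring(SAMPLE)
hints = [pronunciation_hint(child) for child in root]
```

Both children yield a (ph, alphabet) pair, so downstream speech processing can treat the two notations uniformly.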

Processing of token

The processing agent must support the SSML token element, and process it as dictated by the SSML specification.

The w (word) element from the Z39.98-AI default namespace must be regarded as a synonym of the SSML token element.

Processing of lexicon

The processing agent must support the SSML lexicon element, and process it as dictated by the SSML specification.

The processing agent must support the PLS (application/pls+xml) media type, and process PLS documents as dictated by PLS.

Processing of prosody, say-as, sub and break

The processing agent should support the SSML prosody, say-as, sub and break elements, and process them as dictated by the SSML specification.

If it does not support these elements, the processing agent must employ the behavior defined in Ignore below.

Note that the expansion and name elements, when referenced from an abbr element, provide content that can be used synonymously with the content of the alias attribute on the ssml:sub element.

Phonetic Alphabets

Processing agents should support the X-SAMPA phonetic alphabet.

Processing in non-speech output contexts

In a non-speech output context, the processing agent must employ one of the behaviors defined in Feature recognized below.

Feature recognized

If a processing agent recognizes but does not support this feature, it must employ one of the following behaviors:

Abort

Upon encountering a document instance with this feature enabled, the processing agent issues a notification, and then aborts the processing.

Ignore

While traversing the document tree, the processing agent ignores any encountered XML element in the SSML namespace, and continues processing its children.

Any encountered attributes in the SSML namespace occurring on non-SSML namespace elements are also ignored.
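As an informal illustration of the ignore behavior, the following Python sketch walks a document tree and treats SSML elements as transparent: they are not reported, but their children are still visited. The function name visit, the handle callback and the sample fragment are assumptions for this example, not part of the feature.

```python
import xml.etree.ElementTree as ET

SSML_PREFIX = "{http://www.w3.org/2001/10/synthesis}"


def visit(element, handle):
    """Depth-first traversal that skips SSML elements but not their children."""
    if not element.tag.startswith(SSML_PREFIX):
        # Pass on only the attributes that are outside the SSML namespace.
        attributes = {
            name: value
            for name, value in element.attrib.items()
            if not name.startswith(SSML_PREFIX)
        }
        handle(element.tag, attributes)
    # Children are processed either way, including those of SSML elements.
    for child in element:
        visit(child, handle)


# Hypothetical fragment; host elements are shown without a namespace for brevity.
SAMPLE = (
    '<p xmlns:ssml="http://www.w3.org/2001/10/synthesis">'
    '<ssml:prosody rate="slow">'
    '<em ssml:ph="h@loU" class="greeting">hello</em>'
    '</ssml:prosody>'
    '</p>'
)

seen = []
visit(ET.fromstring(SAMPLE), lambda tag, attrs: seen.append((tag, attrs)))
```

The document itself is left unmodified; only the agent's view of it changes, which is what distinguishes this behavior from discard.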

Discard

The processing agent discards all elements and attributes contributed by this feature:

  • Any encountered elements in the SSML namespace are recursively replaced by their children, or by void if no children exist;
  • any encountered attributes in the SSML namespace occurring on non-SSML namespace elements are removed.
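The discard behavior can be sketched as an in-place tree rewrite. The Python below is one possible approach using the standard library's ElementTree, not a reference implementation: the names discard_ssml and _append_text are hypothetical, and the sketch assumes the root element passed in is itself outside the SSML namespace.

```python
import xml.etree.ElementTree as ET

SSML_PREFIX = "{http://www.w3.org/2001/10/synthesis}"


def _append_text(parent, index, text):
    """Attach text just before child position `index` of parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def discard_ssml(element):
    """Remove SSML attributes and replace SSML child elements by their children."""
    for name in [n for n in element.attrib if n.startswith(SSML_PREFIX)]:
        del element.attrib[name]
    index = 0
    while index < len(element):
        child = element[index]
        if child.tag.startswith(SSML_PREFIX):
            grandchildren = list(child)
            tail = child.tail
            element.remove(child)
            # Leading text of the SSML element moves up to the parent.
            _append_text(element, index, child.text)
            for offset, grandchild in enumerate(grandchildren):
                element.insert(index + offset, grandchild)
            # Text that followed the SSML element goes after the promoted children.
            _append_text(element, index + len(grandchildren), tail)
            # Do not advance: a promoted child may itself be an SSML element.
        else:
            discard_ssml(child)
            index += 1


# Hypothetical fragment; host elements are shown without a namespace for brevity.
SAMPLE = (
    '<p xmlns:ssml="http://www.w3.org/2001/10/synthesis">Call '
    '<ssml:say-as interpret-as="telephone"><span ssml:ph="x">555</span>-0100</ssml:say-as>'
    '<ssml:break time="500ms"/> now.</p>'
)

root = ET.fromstring(SAMPLE)
discard_ssml(root)
result = ET.tostring(root, encoding="unicode")
```

Text and tail handling is the delicate part: ElementTree stores text that follows an element on that element, so it has to be carried over explicitly when the element is removed.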

The abort behavior is the default; the ignore and discard behaviors must only be employed when the processing agent is explicitly instructed to do so by the client.

Processing agents that employ the ignore or discard behaviors should issue a notification.
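The selection rule stated above (abort unless the client explicitly asks for ignore or discard, with a notification in each case) might be expressed as follows. This is a sketch under assumed names: Behavior, FeatureAbort, choose_behavior and the notify callback are all hypothetical and do not correspond to any defined API.

```python
from enum import Enum


class Behavior(Enum):
    ABORT = "abort"
    IGNORE = "ignore"
    DISCARD = "discard"


class FeatureAbort(Exception):
    """Raised when processing stops because the SSML feature is enabled."""


def choose_behavior(client_instruction=None, notify=print):
    """Return the behavior to apply, defaulting to abort without an instruction."""
    # Only an explicit client instruction can select ignore or discard.
    behavior = Behavior(client_instruction) if client_instruction else Behavior.ABORT
    notify(f"SSML feature recognized but not supported: applying '{behavior.value}'")
    if behavior is Behavior.ABORT:
        raise FeatureAbort("document instance has the SSML feature enabled")
    return behavior


selected = choose_behavior("ignore", notify=lambda message: None)
```

An unknown instruction string raises ValueError from the Enum lookup, which keeps a misspelled option from silently falling back to a lossy behavior.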

Feature not recognized

If a processing agent does not recognize this feature, it must, as dictated in Processing agent conformance definition, abort processing and issue an error message.

Feature component definitions

The component definitions provided below follow the conventions used in Core Modules.

The SSML Element Integration Module

Provides a subset of the W3C Speech Synthesis Markup Language (SSML) Version 1.1 element set, suitable for integration in Z39.98-2012 Profiles.

The SSML Element Integration Module: Element overview
Name Default attribute model Default content model Default usage context
break strength?, time?, xml:id? empty Phrase.class
phoneme ph, alphabet?, z3998.Core.attrib, z3998.I18n.attrib (text | Text.class | Phrase.class)+ Phrase.class
phoneme ph, alphabet?, z3998.Core.attrib, z3998.I18n.attrib (text | Text.class)+ Text.class
prosody pitch?, contour?, range?, rate?, duration?, volume?, z3998.Core.attrib, z3998.I18n.attrib (text | Text.class | Phrase.class)+ Phrase.class
prosody pitch?, contour?, range?, rate?, duration?, volume?, z3998.Core.attrib, z3998.I18n.attrib (text | Text.class)+ Text.class
say-as interpret-as, format?, detail?, z3998.Core.attrib, z3998.I18n.attrib (text | Text.class | Phrase.class)+ Phrase.class
say-as interpret-as, format?, detail?, z3998.Core.attrib, z3998.I18n.attrib (text | Text.class)+ Text.extern.class
sub alias, z3998.Core.attrib, z3998.I18n.attrib (text | Text.class | Phrase.class)+ Phrase.class
sub alias, z3998.Core.attrib, z3998.I18n.attrib (text | Text.class)+ Text.class
token Phrase.attrib (text | Text.class)+ Phrase.class
token Text.attrib (text | Text.class)+ Text.class
lexicon uri, xml:id, type? empty The lexicon element is allowed in the document head.
The SSML Element Integration Module: Attribute overview
Name Default values Default usage context
strength 'none' | 'x-weak' | 'weak' | 'medium' | 'strong' | 'x-strong' ssml:break
time TimeValue ssml:break
ssml:onlangfailure 'changevoice' | 'ignoretext' | 'ignorelang' | 'processorchoice' Document.attrib, Phrase.attrib and Text.attrib
ph PhoneticExpression ssml:phoneme
alphabet alphabet ssml:phoneme
pitch 'x-low' | 'low' | 'medium' | 'high' | 'x-high' | 'default' | RelativeChange | PitchExpression ssml:prosody
contour PitchContour ssml:prosody
range 'x-low' | 'low' | 'medium' | 'high' | 'x-high' | 'default' | RelativeChange | PitchExpression ssml:prosody
rate 'x-slow' | 'slow' | 'medium' | 'fast' | 'x-fast' | 'default' | NonNegativePercentage ssml:prosody
duration TimeValue ssml:prosody
volume 'silent' | 'x-soft' | 'soft' | 'medium' | 'loud' | 'x-loud' | 'default' | VolumeExpression ssml:prosody
interpret-as 'date' | 'time' | 'telephone' | 'characters' | 'cardinal' | 'ordinal' ssml:say-as
format text ssml:say-as
detail text ssml:say-as
alias text ssml:sub
uri URI ssml:lexicon
type MediaType ssml:lexicon
The SSML break element

Controls the pausing or other prosodic boundaries between tokens.

Refer to SSML 1.1 for further information.

The SSML break element
Local name break
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Phrase.class
Default attribute model strength?, time?, xml:id?
Default content model empty
Optionality This element must not be omitted when activating this module.
The SSML phoneme element (Phrase)

Provides a phonemic/phonetic pronunciation for the contained text.

SSML 1.1 allows the phoneme element to be empty. In this feature, however, the model restrictions below require content, so the element should contain human-readable text that can be used for non-spoken rendering of the document.

Refer to SSML 1.1 for further information.

The SSML phoneme element (Phrase)
Local name phoneme
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Phrase.class
Default attribute model ph, alphabet?, z3998.Core.attrib, z3998.I18n.attrib
Default content model (text | Text.class | Phrase.class)+
Optionality This element must not be omitted when activating this module.

The following model restrictions apply to this element:

  • The ssml:phoneme element must not have ssml namespace element or attribute descendants.

  • The ssml:phoneme element must neither be empty nor contain only whitespace.
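The two restrictions above could be checked programmatically along these lines. The sketch is illustrative only: the function name phoneme_violations and the message strings are hypothetical, and it assumes that "empty" means having no child elements and no non-whitespace text.

```python
import xml.etree.ElementTree as ET

SSML_PREFIX = "{http://www.w3.org/2001/10/synthesis}"


def phoneme_violations(phoneme):
    """Return a list of restriction violations for one ssml:phoneme element."""
    problems = []
    descendants = list(phoneme.iter())[1:]  # iter() yields the element itself first
    for node in descendants:
        if node.tag.startswith(SSML_PREFIX):
            problems.append("SSML element descendant")
        if any(name.startswith(SSML_PREFIX) for name in node.attrib):
            problems.append("SSML attribute on descendant")
    # Assumed reading of "empty": no child elements and no non-whitespace text.
    if not descendants and not (phoneme.text or "").strip():
        problems.append("empty or whitespace-only content")
    return problems


NS_DECL = 'xmlns:ssml="http://www.w3.org/2001/10/synthesis"'
good = ET.fromstring(f'<ssml:phoneme {NS_DECL} ph="x">text</ssml:phoneme>')
nested = ET.fromstring(
    f'<ssml:phoneme {NS_DECL} ph="x"><ssml:sub alias="a">b</ssml:sub></ssml:phoneme>'
)
blank = ET.fromstring(f'<ssml:phoneme {NS_DECL} ph="x">  </ssml:phoneme>')
```

A schema-based validator remains the authoritative check; a function like this is mainly useful for producing friendlier diagnostics in an authoring tool.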

The SSML phoneme element (Text)

Provides a phonemic/phonetic pronunciation for the contained text.

The phoneme element may be empty. However, it is recommended that the element contain human-readable text that can be used for non-spoken rendering of the document.

Refer to SSML 1.1 for further information.

The SSML phoneme element (Text)
Local name phoneme
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Text.class
Default attribute model ph, alphabet?, z3998.Core.attrib, z3998.I18n.attrib
Default content model (text | Text.class)+
Optionality This element must not be omitted when activating this module.
The SSML prosody element (Phrase)

Permits control of the pitch, speaking rate and volume of speech output.

Refer to SSML 1.1 for further information.

The SSML prosody element (Phrase)
Local name prosody
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Phrase.class
Default attribute model pitch?, contour?, range?, rate?, duration?, volume?, z3998.Core.attrib, z3998.I18n.attrib
Default content model (text | Text.class | Phrase.class)+
Optionality This element must not be omitted when activating this module.

The following model restrictions apply to this element:

  • The ssml:prosody element must not have ssml:prosody descendants.

  • The ssml:prosody element must neither be empty nor contain only whitespace.

The SSML prosody element (Text)

Permits control of the pitch, speaking rate and volume of speech output.

Refer to SSML 1.1 for further information.

The SSML prosody element (Text)
Local name prosody
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Text.class
Default attribute model pitch?, contour?, range?, rate?, duration?, volume?, z3998.Core.attrib, z3998.I18n.attrib
Default content model (text | Text.class)+
Optionality This element must not be omitted when activating this module.
The SSML say-as element (Phrase)

Provides information on the type of text construct contained within the element to help specify the level of detail for rendering the contained text.

Refer to SSML 1.1 for further information.

The SSML say-as element (Phrase)
Local name say-as
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Phrase.class
Default attribute model interpret-as, format?, detail?, z3998.Core.attrib, z3998.I18n.attrib
Default content model (text | Text.class | Phrase.class)+
Optionality This element must not be omitted when activating this module.

The following model restrictions apply to this element:

  • The ssml:say-as element must neither be empty nor contain only whitespace.

The SSML say-as element (Text)

Provides information on the type of text construct contained within the element to help specify the level of detail for rendering the contained text.

Refer to SSML 1.1 for further information.

The SSML say-as element (Text)
Local name say-as
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Text.extern.class
Default attribute model interpret-as, format?, detail?, z3998.Core.attrib, z3998.I18n.attrib
Default content model (text | Text.class)+
Optionality This element must not be omitted when activating this module.
The SSML sub element (Phrase)

Indicates that the text in the alias attribute value replaces the contained text for pronunciation.

Refer to SSML 1.1 for further information.

The SSML sub element (Phrase)
Local name sub
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Phrase.class
Default attribute model alias, z3998.Core.attrib, z3998.I18n.attrib
Default content model (text | Text.class | Phrase.class)+
Optionality This element must not be omitted when activating this module.

The following model restrictions apply to this element:

  • The ssml:sub element must neither be empty nor contain only whitespace.

The SSML sub element (Text)

Indicates that the text in the alias attribute value replaces the contained text for pronunciation.

Refer to SSML 1.1 for further information.

The SSML sub element (Text)
Local name sub
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Text.class
Default attribute model alias, z3998.Core.attrib, z3998.I18n.attrib
Default content model (text | Text.class)+
Optionality This element must not be omitted when activating this module.
The SSML token element (Phrase)

Indicates that the content is a token in order to eliminate token (word) segmentation ambiguities of a synthesis processor.

Refer to SSML 1.1 for further information.

The SSML token element (Phrase)
Local name token
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Phrase.class
Default attribute model Phrase.attrib
Default content model (text | Text.class)+
Optionality This element must not be omitted when activating this module.

The following model restrictions apply to this element:

  • The ssml:token element must neither be empty nor contain only whitespace.

The SSML token element (Text)

Indicates that the content is a token in order to eliminate token (word) segmentation ambiguities of a synthesis processor.

Refer to SSML 1.1 for further information.

The SSML token element (Text)
Local name token
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Text.class
Default attribute model Text.attrib
Default content model (text | Text.class)+
Optionality This element must not be omitted when activating this module.
The SSML lexicon element

Specifies a reference to a lexicon document.

Refer to SSML 1.1 for further information.

The SSML lexicon element
Local name lexicon
Namespace http://www.w3.org/2001/10/synthesis
Default usage context The lexicon element is allowed in the document head.
Default attribute model uri, xml:id, type?
Default content model empty
Optionality This element must not be omitted when activating this module.
The SSML strength attribute

Indicates the prosodic strength of the break in the speech output.

Refer to SSML 1.1 for further information.

The SSML strength attribute
Local name strength
Namespace None
Default usage context ssml:break
Default value(s) 'none' | 'x-weak' | 'weak' | 'medium' | 'strong' | 'x-strong'
Optionality This attribute must not be omitted when activating this module.
The SSML time attribute

Indicates the duration of a pause to be inserted in the output in seconds or milliseconds.

Refer to SSML 1.1 for further information.

The SSML time attribute
Local name time
Namespace None
Default usage context ssml:break
Default value(s) TimeValue
Optionality This attribute must not be omitted when activating this module.
The SSML onlangfailure attribute

Describes the desired behavior of a synthesis processor upon language speaking failure. The value of this attribute is inherited by descendants.

Refer to SSML 1.1 for further information.

The SSML onlangfailure attribute
Local name ssml:onlangfailure
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Document.attrib, Phrase.attrib and Text.attrib
Default value(s) 'changevoice' | 'ignoretext' | 'ignorelang' | 'processorchoice'
Optionality This attribute must not be omitted when activating this module.
The SSML ph attribute

Specifies a phonemic/phonetic pronunciation for the text contained in the current element.

Refer to SSML 1.1 for further information.

The SSML ph attribute
Local name ph
Namespace None
Default usage context ssml:phoneme
Default value(s) PhoneticExpression
Optionality This attribute must not be omitted when activating this module.
The SSML alphabet attribute

Specifies which phonemic/phonetic pronunciation alphabet is used in the ph attribute.

If omitted, the implicit value x-SAMPA is assumed.

Refer to SSML 1.1 for further information.

The SSML alphabet attribute
Local name alphabet
Namespace None
Default usage context ssml:phoneme
Default value(s) alphabet
Optionality This attribute must not be omitted when activating this module.
The SSML pitch attribute

Specifies the baseline pitch for the contained text.

The labels x-low through x-high represent a sequence of monotonically non-decreasing pitch levels.

Refer to SSML 1.1 for further information.

The SSML pitch attribute
Local name pitch
Namespace None
Default usage context ssml:prosody
Default value(s) 'x-low' | 'low' | 'medium' | 'high' | 'x-high' | 'default' | RelativeChange | PitchExpression
Optionality This attribute must not be omitted when activating this module.
The SSML contour attribute

Sets the pitch contour for the contained text.

Refer to SSML 1.1 for further information.

The SSML contour attribute
Local name contour
Namespace None
Default usage context ssml:prosody
Default value(s) PitchContour
Optionality This attribute must not be omitted when activating this module.
The SSML range attribute

Specifies the pitch range (variability) for the contained text.

Refer to SSML 1.1 for further information.

The SSML range attribute
Local name range
Namespace None
Default usage context ssml:prosody
Default value(s) 'x-low' | 'low' | 'medium' | 'high' | 'x-high' | 'default' | RelativeChange | PitchExpression
Optionality This attribute must not be omitted when activating this module.
The SSML rate attribute

Specifies a change in the speaking rate for the contained text.

The values x-slow through x-fast represent a sequence of monotonically non-decreasing speaking rates.

Refer to SSML 1.1 for further information.

The SSML rate attribute
Local name rate
Namespace None
Default usage context ssml:prosody
Default value(s) 'x-slow' | 'slow' | 'medium' | 'fast' | 'x-fast' | 'default' | NonNegativePercentage
Optionality This attribute must not be omitted when activating this module.
The SSML duration attribute

Specifies a value in seconds or milliseconds for the desired time to take to read the contained text.

Refer to SSML 1.1 for further information.

The SSML duration attribute
Local name duration
Namespace None
Default usage context ssml:prosody
Default value(s) TimeValue
Optionality This attribute must not be omitted when activating this module.
The SSML volume attribute

Specifies the volume for the contained text.

If omitted, the implicit value +0.0dB is assumed.

Refer to SSML 1.1 for further information.

The SSML volume attribute
Local name volume
Namespace None
Default usage context ssml:prosody
Default value(s) 'silent' | 'x-soft' | 'soft' | 'medium' | 'loud' | 'x-loud' | 'default' | VolumeExpression
Optionality This attribute must not be omitted when activating this module.
The SSML interpret-as attribute

Indicates the content type of the contained text construct.

Refer to SSML 1.1 for further information.

The SSML interpret-as attribute
Local name interpret-as
Namespace None
Default usage context ssml:say-as
Default value(s) 'date' | 'time' | 'telephone' | 'characters' | 'cardinal' | 'ordinal'
Optionality This attribute must not be omitted when activating this module.
The SSML format attribute

In addition to interpret-as, provides further hints on the precise formatting of the contained text for content types that may have ambiguous formats.

Refer to SSML 1.1 for further information.

The SSML format attribute
Local name format
Namespace None
Default usage context ssml:say-as
Default value(s) text
Optionality This attribute must not be omitted when activating this module.
The SSML detail attribute

Indicates the level of detail to be read aloud or rendered.

Refer to SSML 1.1 for further information.

The SSML detail attribute
Local name detail
Namespace None
Default usage context ssml:say-as
Default value(s) text
Optionality This attribute must not be omitted when activating this module.
The SSML alias attribute

Specifies the string to be spoken instead of the string in the sub element.

Refer to SSML 1.1 for further information.

The SSML alias attribute
Local name alias
Namespace None
Default usage context ssml:sub
Default value(s) text
Optionality This attribute must not be omitted when activating this module.
The SSML uri attribute

Identifies the location of the lexicon document.

Refer to SSML 1.1 for further information.

The SSML uri attribute
Local name uri
Namespace None
Default usage context ssml:lexicon
Default value(s) URI
Optionality This attribute must not be omitted when activating this module.
The SSML type attribute

Specifies the media type of the lexicon document. The implicit value of this attribute is application/pls+xml, the media type associated with the Pronunciation Lexicon Specification.

Refer to SSML 1.1 for further information.

The SSML type attribute
Local name type
Namespace None
Default usage context ssml:lexicon
Default value(s) MediaType
Optionality This attribute must not be omitted when activating this module.
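To illustrate how the implicit type value interacts with lexicon references, the following Python sketch collects the lexicon elements from a document head and fills in application/pls+xml where type is omitted. The function name lexicon_references, the sample URIs and the second media type are hypothetical; the head element is shown without a namespace for brevity.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
PLS_MEDIA_TYPE = "application/pls+xml"  # implicit value of the type attribute


def lexicon_references(head):
    """Return (uri, media type) pairs for every ssml:lexicon under head."""
    return [
        (lexicon.get("uri"), lexicon.get("type", PLS_MEDIA_TYPE))
        for lexicon in head.iter(f"{{{SSML_NS}}}lexicon")
    ]


# Hypothetical head fragment with one defaulted and one explicit media type.
SAMPLE = (
    '<head xmlns:ssml="http://www.w3.org/2001/10/synthesis">'
    '<ssml:lexicon xml:id="lex1" uri="lexicons/names.pls"/>'
    '<ssml:lexicon xml:id="lex2" uri="lexicons/custom.xml" '
    'type="application/x.custom-lexicon"/>'
    '</head>'
)

references = lexicon_references(ET.fromstring(SAMPLE))
```

A processing agent that supports only PLS could then filter this list on the media type before attempting to load any lexicon document.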
The SSML Element Integration Module - Implementations
Schema Language
ssml-11.rng RelaxNG

Activation of this module depends on the Core, datatypes, global-classes, I18n and ssml-datatypes modules also being activated.

The SSML Feature module depends on this module being activated.

The SSML phoneme attribute module

Defines an adaptation of the SSML phoneme element as an attribute, enabling the provision of pronunciation information on elements that are not in the SSML namespace.

The SSML phoneme attribute module: Attribute overview
Name Default values Default usage context
ph PhoneticExpression Phrase.attrib and Text.attrib
alphabet alphabet On elements where ssml:ph occurs.
The SSML ph attribute

Specifies a phonemic/phonetic pronunciation for the text contained in the current element.

This attribute inherits the semantics of the ph attribute on the SSML ssml:phoneme element.

Note that this attribute is namespace-qualified and intended for use on non-SSML namespace elements, as opposed to the default (non-qualified) ph attribute, which is only allowed on the ssml:phoneme element.

Consult Speech Synthesis Markup Language (SSML) Version 1.1 for further information.

The SSML ph attribute
Local name ph
Namespace http://www.w3.org/2001/10/synthesis
Default usage context Phrase.attrib and Text.attrib
Value(s) PhoneticExpression
Value alterability The defined value(s) or datatype(s) are fixed, and must not be altered when activating this module.
Optionality This attribute must not be omitted when activating this module.

The following model restrictions apply to this attribute:

  • Elements with the ssml:ph attribute must not have ssml:phoneme descendants, nor descendants with the ssml:ph attribute.

  • The value of the ssml:ph attribute must neither be empty nor contain only whitespace.
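A check for these two restrictions might look like the sketch below. It is not derived from the normative schema: the function name ph_attribute_violations and the message strings are hypothetical, and the host elements in the sample are shown without a namespace for brevity.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
PH_ATTR = f"{{{SSML_NS}}}ph"
PHONEME_TAG = f"{{{SSML_NS}}}phoneme"


def ph_attribute_violations(root):
    """Return (tag, problem) pairs for elements that misuse the ssml:ph attribute."""
    problems = []
    for element in root.iter():
        value = element.get(PH_ATTR)
        if value is None:
            continue
        if not value.strip():
            problems.append((element.tag, "empty ssml:ph value"))
        # Skip the first item, which is the element itself.
        for descendant in list(element.iter())[1:]:
            if descendant.tag == PHONEME_TAG or PH_ATTR in descendant.attrib:
                problems.append((element.tag, "nested pronunciation"))
                break
    return problems


SAMPLE = (
    '<p xmlns:ssml="http://www.w3.org/2001/10/synthesis">'
    '<span ssml:ph="a"><em ssml:ph="b">x</em></span>'
    '<span ssml:ph=" ">y</span>'
    '</p>'
)

violations = ph_attribute_violations(ET.fromstring(SAMPLE))
```

The first span is reported because its em descendant carries its own ssml:ph; the second because its attribute value is whitespace only.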

The SSML alphabet attribute

Specifies which phonemic/phonetic pronunciation alphabet is used in the value of the ssml:ph attribute.

Note that this attribute is namespace-qualified and intended for use on non-SSML namespace elements in conjunction with the ssml:ph attribute.

If omitted, the implicit value x-SAMPA is assumed.

Consult Speech Synthesis Markup Language (SSML) Version 1.1 for further information.

The SSML alphabet attribute
Local name alphabet
Namespace http://www.w3.org/2001/10/synthesis
Default usage context On elements where ssml:ph occurs.
Value(s) alphabet
Value alterability The defined value(s) or datatype(s) are fixed, and must not be altered when activating this module.
Optionality This attribute must not be omitted when activating this module.
The SSML phoneme attribute module - Implementations
Schema Language
ssml-phoneme-attrib.rng RelaxNG

Activation of this module depends on the ssml-datatypes module also being activated.

The SSML Feature module depends on this module being activated.

The SSML Datatypes module

This module defines a set of datatypes related to SSML.

The SSML Datatypes module
Name Definition
PhoneticExpression A phonetic or phonemic expression.
alphabet The name of a pronunciation alphabet.
RelativeChange A relative change expression, as defined in relative change.
PitchExpression A number followed by the string 'Hz'.
PitchContour A pitch contour expression, as defined in pitch contour.
NonNegativePercentage An unsigned number immediately followed by "%", as defined in Non-negative percentage.
VolumeExpression A number preceded by "+" or "-" and immediately followed by "dB", as defined in prosody Element.
The SSML Datatypes module - Implementations
Schema Language
ssml-datatypes.rng RelaxNG

The ssml-elements and ssml-ph-attribs modules depend on this module being activated.
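Three of the datatypes defined in the table above are simple enough to approximate with regular expressions. The Python sketch below is an approximation, not a rendering of the normative RelaxNG definitions: the NUMBER pattern assumes a plain decimal number without sign or exponent, the PitchExpression pattern assumes 'Hz' follows the number with no intervening space, and the names PATTERNS and matches are hypothetical.

```python
import re

# Assumed number syntax: "n", "n.", ".n" or "n.n", without sign or exponent.
NUMBER = r"(?:\d+\.?\d*|\.\d+)"

PATTERNS = {
    "PitchExpression": re.compile(rf"{NUMBER}Hz"),
    "NonNegativePercentage": re.compile(rf"{NUMBER}%"),
    "VolumeExpression": re.compile(rf"[+-]{NUMBER}dB"),
}


def matches(datatype, value):
    """Return True if value is a complete match for the named datatype."""
    return PATTERNS[datatype].fullmatch(value) is not None


checks = {
    "200Hz": matches("PitchExpression", "200Hz"),
    "80%": matches("NonNegativePercentage", "80%"),
    "-10%": matches("NonNegativePercentage", "-10%"),
    "+0.0dB": matches("VolumeExpression", "+0.0dB"),
    "6dB": matches("VolumeExpression", "6dB"),
}
```

Using fullmatch rather than match ensures that trailing characters, such as a stray unit or whitespace, cause the value to be rejected.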

Informative References

Supporting software

Refer to the Z39.98-AI community portal for information on available software tools.

Appendix 1: Listing of modules in the normative schema

The list below represents the modules at the time of version 1.0 of this feature.

Occurrences of the keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in documentation fields embedded in these modules are to be interpreted as described in RFC2119.