DAISY Accessibility Framework




In this report from the architecture subcommittee meeting at RFB&D in Princeton, New Jersey on February 4th and 5th 2008, we propose a new modular architecture for Zed Next as opposed to the current monolithic Zed spec. The main objectives of such an architecture are twofold:

  1. a modular specification is easier to maintain and extend than a monolithic specification: we identify simpler components which can be combined to form the full DAISY specification;
  2. we can identify the core accessibility framework within DAISY that can also be used in different contexts, helping bring DAISY concepts to the mainstream.


Some of the terms used throughout this document:

  • Accessibility framework: the synchronization of equivalent content types, with navigation and semantics.
  • Fileset: the set of files necessary for the distribution of a single piece of content (e.g., in DAISY, the fileset for a book). The accessibility framework relies on files of different types to represent all the information necessary for playback. This report acknowledges the necessity of a set of files for distributing content conforming to the proposed accessibility framework but makes no recommendation for the final specification of a fileset; for instance, notions of distribution or packaging are not addressed.
  • Module: A sub-part of a system or a specification. A module encapsulates a particular well-defined type of functionality, is cohesive and has low coupling to allow efficient reuse in different contexts.
  • Profile: a profile is a collection of modules and possible additional requirements (see below for more on profiles) that forms a logical whole: an entity that - as opposed to a module - can actually be used for a particular real-world purpose.
  • Extension: an extension is the addition of new or optional modules to a profile to add new features; see for instance the MathML in DAISY extension.
  • Reference implementation: an implementation of future specifications (namely, an authoring tool), probably based on the DAISY SDK from the Urakawa project, should be provided to help acceptance and adoption of a new standard.
  • Discrete media vs. continuous media: this distinction also comes from SMIL. Continuous media, "such as stored audio or video files, [have] a measurable and well-understood duration." Discrete media, such as (static) images or text, have no intrinsic duration.

Accessibility Framework

Multimedia can help with accessibility by bringing together multiple channels providing the same information. DAISY uses multimedia to provide audio in addition to (or in place of) text, adding a supplementary information channel. In the future, we can imagine having additional channels, such as video, signing avatars, etc.

Having several concurrent channels providing the same information raises the issue of synchronizing them. DAISY uses SMIL for this purpose, but SMIL only concerns itself with a single, self-contained "presentation", whereas DAISY books can be arbitrarily long and are thus divided into smaller units. Moreover, the availability of both continuous and discrete media in DAISY books raises the issue of navigation inside the book. The problem is twofold:

  1. Structure navigation is the navigation inside the structure of the book, where the user can easily move to different chapters, sections or pages, when available. This is usually achieved through a table of contents of the publication;
  2. Local navigation is the navigation between the smallest synchronization units of the publication, such as sentences, paragraphs or phrases (depending on the granularity chosen by the producers of the book). As opposed to structure navigation, it is implicit.

We can also add time navigation, which is the navigation inside the timeline of continuous elements or of the book itself, but it is more of a playback issue and thus not a concern here.

The accessibility framework presented here is a formalization and a generalization of the aspects of the DAISY standard directly relevant to the aforementioned concepts of content, synchronization and navigation. The framework does not try to deal with other aspects such as packaging, distribution, DRM, etc. One important aim of this framework is to be specific enough that it can be used to extend DAISY without requiring further work, yet general enough that it can be adopted outside of the DAISY sphere to provide better accessibility to formats and kinds of documents which are not directly relevant to DAISY.


Content channels

The accessibility framework is built upon the idea of content channels. A content channel consists of one or more media types. If a channel consists of multiple media (e.g., text plus audio), these are expressed as a SMIL presentation. Other examples include: DTBook document, XHTML document, DocBook document, PDF, MP3 audio, QuickTime video.

A content channel has structure of some kind, and structural points can be identified and characterized semantically.

Any content channel can have an equivalent, alternative representation that itself can be structured as a content channel (parallel content channel). This alternative channel could use different media (e.g., audio version of text), or it could use the same media (e.g., two audio channels in different languages). Because parallel content channels have equivalent structures, they can be synchronized at structural points.

With the accessibility framework, any piece of content in any media can be made accessible by identifying structural points for navigation, parallel content channels, and synchronizing these.
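The idea of parallel content channels synchronized at structural points can be sketched as follows. This is only an illustration under stated assumptions: the class names, the `point_id` alignment key and the sample data are hypothetical and are not part of any DAISY specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StructuralPoint:
    point_id: str    # hypothetical key shared by equivalent points in parallel channels
    role: str        # semantic label, e.g. "chapter"
    position: float  # channel-specific locator (character offset, seconds, ...)


@dataclass
class ContentChannel:
    name: str
    media_type: str
    points: List[StructuralPoint] = field(default_factory=list)

    def locate(self, point_id: str) -> Optional[StructuralPoint]:
        """Return the structural point with the given id, if this channel has one."""
        for point in self.points:
            if point.point_id == point_id:
                return point
        return None


def synchronize(a: ContentChannel, b: ContentChannel) -> List[Tuple[StructuralPoint, StructuralPoint]]:
    """Pair up the structural points that two parallel channels have in common."""
    pairs = []
    for point in a.points:
        other = b.locate(point.point_id)
        if other is not None:
            pairs.append((point, other))
    return pairs


# Sample data (invented): a text channel and its audio equivalent.
text = ContentChannel("text", "text/plain", [
    StructuralPoint("ch1", "chapter", 0),
    StructuralPoint("ch2", "chapter", 1200),
])
audio = ContentChannel("narration", "audio/mpeg", [
    StructuralPoint("ch1", "chapter", 0.0),
    StructuralPoint("ch2", "chapter", 95.5),
])

sync_map = synchronize(text, audio)
```

Each entry of `sync_map` links a position in the text to the equivalent position in the audio, which is the information a player would need to switch channels at a structural point.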

We can classify contents in several categories:

  • Audio: can be compressed or uncompressed, with different qualities (number of channels, bit rate, sample rate, bandwidth) using different formats and codecs (MP3, RIFF/WAV, Speex, Ogg Vorbis, etc.). Audio as a media content usually has no associated visual display. Audio content can be indexed by time.
  • Video: similar to audio, but has an associated visual display with an intrinsic resolution. Some video containers allow several audio channels. Codecs and containers include QuickTime, MPEG-4, etc.
  • Text: plain text, with different encodings (e.g. US-ASCII, UTF-8, etc.). Can be indexed by character or byte position.
  • Images and graphics: raster graphics have a fixed resolution and can be compressed or uncompressed (RAW, BMP, GIF, JPG, PNG...); vector graphics have no intrinsic size and can be scaled by the user (see SVG below; also PDF or Flash).
  • XML: structured content of an arbitrary nature (text with XHTML, DTBook or DocBook; mathematics with MathML; synchronized multimedia with SMIL; animations and vector graphics with SVG; forms with XForms; compound formats with WICD; etc.) XML formats do not always carry display information, so styling is often provided separately through XSLT and/or CSS.

Content Rendering

Because there is such a variety of media contents, and within each category many codecs and formats, it is necessary to define rules for rendering known contents and, importantly, fallback mechanisms for rendering unknown contents. Alternatives can also be defined in a document (for instance, the MathML in DAISY specification requires images to be provided for clients that cannot render MathML formulas), and stylesheets (CSS and XSLT) can be provided.
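A fallback mechanism of this kind could look like the following minimal sketch. The function name, the ordered list of alternatives and the file names are assumptions made for illustration; the report does not prescribe how alternatives are declared.

```python
from typing import Iterable, Optional, Set, Tuple


def choose_rendition(alternatives: Iterable[Tuple[str, str]],
                     supported_types: Set[str]) -> Optional[str]:
    """Return the first alternative whose media type the player can render.

    `alternatives` is ordered from most to least preferred; each entry is a
    (media type, resource) pair. None means no alternative is renderable.
    """
    for media_type, resource in alternatives:
        if media_type in supported_types:
            return resource
    return None


# Hypothetical example: a formula offered as MathML with an image fallback.
formula = [
    ("application/mathml+xml", "formula.xml"),
    ("image/png", "formula.png"),
]

simple_player = {"image/png", "audio/mpeg"}
chosen = choose_rendition(formula, simple_player)
```

Here a player without MathML support ends up with the image, while a player that supports neither type gets `None` and would have to skip the content or announce that it cannot be rendered.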


Use of SMIL

SMIL can be used to synchronize multiple channels. Until now, DAISY has focused on the synchronization capabilities of SMIL to synchronize content provided outside of the SMIL documents, but SMIL presentations can of course include their own content. It would then become feasible to show several parts of a document at once; for instance, an image could remain visible in a dedicated region while the text and audio describing it are rendered. The SMIL DAISY Profile is a natural candidate for use in the accessibility framework.

Lightweight synchronization

In a lightweight player scenario (see below), supporting a complex XML grammar such as SMIL is a burden, but SMIL concepts such as media elements with time indexing (begin, end, and duration attributes) can still be used.
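As a rough sketch of what time-indexed media elements without a SMIL parser might look like, the following keeps a sorted list of clips and finds the one playing at a given time. The `Clip` record, its field names and the sample values are hypothetical; they only loosely mirror the begin, end and duration notions mentioned above.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clip:
    src: str      # audio file holding the clip
    begin: float  # start time within the file, in seconds
    end: float    # end time within the file, in seconds
    text_id: str  # identifier of the synchronized text fragment

    @property
    def duration(self) -> float:
        return self.end - self.begin


def clip_at(clips: List[Clip], t: float) -> Optional[Clip]:
    """Find the clip covering time t; clips must share one file and be sorted by begin."""
    starts = [clip.begin for clip in clips]
    index = bisect_right(starts, t) - 1
    if index >= 0 and t < clips[index].end:
        return clips[index]
    return None


# Sample data (invented): two sentences in one audio file.
clips = [
    Clip("ch1.mp3", 0.0, 4.2, "s1"),
    Clip("ch1.mp3", 4.2, 9.0, "s2"),
]

current = clip_at(clips, 5.0)
```

A flat, pre-sorted table like this can be loaded and searched cheaply, which is the kind of trade-off a resource-limited player would be looking for.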


Navigation is the ability to discover and move to structural points within the content. Therefore, different types of content structures suggest different navigation schemes. Some content is hierarchical: chapters, sections, subsections, etc. Other content is purely flat: a series of scenes within a video of a film. Different types of structures thus suggest different representations within the accessibility framework's navigation module.

In order to effectively discover and use structural points, the end user needs to understand the nature of the different points. That is, the structural points must have meanings associated with them. For example, we may identify a hierarchy of structural points within the text of a book (possibly expressed using elements such as XHTML's h1 through h6), but these are of limited use to an end user without understanding that elements at the highest level are "units", those at the next level are "chapters", etc.

Therefore, navigation within the DAISY Accessibility Framework will need to define not only the various kinds of content structures (e.g., hierarchical vs. flat), but also how to define and associate semantics with those structures. This is actually the hard part here: we have to clearly define roles and the mapping mechanism. Something like CSS would work well but may be hard to implement; XSLT may be too much. One starting point could be a simple XML grammar with CSS-like selectors.
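To make the mapping idea concrete, here is a minimal sketch of associating roles with elements through very simple CSS-like selectors. Everything in it is an assumption for illustration: the rule table, the two selector forms supported (`name` and `name.class`) and the sample document are invented, and the rules are written as Python data rather than in the XML grammar suggested above.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical rule table: (selector, role). The first matching rule wins.
ROLE_RULES = [
    ("h1", "unit"),
    ("h2", "chapter"),
    ("span.page", "page"),
]


def matches(element: ET.Element, selector: str) -> bool:
    """Support only 'name' and 'name.class' selectors."""
    name, _, cls = selector.partition(".")
    if element.tag != name:
        return False
    return not cls or cls in element.get("class", "").split()


def assign_roles(root: ET.Element, rules: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (role, label) pairs for every element matched by a rule, in document order."""
    result = []
    for element in root.iter():
        for selector, role in rules:
            if matches(element, selector):
                result.append((role, "".join(element.itertext()).strip()))
                break
    return result


doc = ET.fromstring(
    "<body><h1>Part One</h1><h2>Arrival</h2>"
    "<span class='page'>1</span><p>Some text.</p></body>"
)
roles = assign_roles(doc, ROLE_RULES)
```

Keeping the rules separate from the content means the same markup could be given different semantics by different profiles, without touching the document itself.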


Under the DAISY Accessibility Framework model, specification modules would need only describe very general characteristics of handling content channels, synchronization, and navigation. Since the Framework is entirely content-agnostic, there would be no definitions of allowable file formats, codecs, etc. Instead, the Framework modules need only provide the information necessary to define how to bring multiple content channels together into a meaningful, navigable, and hence accessible presentation. Individual profiles would then provide the specific information on allowed content, expected behaviors, etc.

The following lists some possible modules in the various categories.

Content channel modules

  • Content module: How to identify content types.
  • Structure module: How to identify structural points
  • Synchronization module: SMIL
  • Extensibility module: How to deal with new content types

Navigation modules

  • Hierarchical navigation module: NCX NavMap capabilities
  • Flat navigation module: NCX NavList capabilities
  • Page navigation module: Pages are a special kind of flat structure
  • Semantics module: How to define and associate meaning with navigation structures
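The difference between hierarchical and flat navigation structures can be illustrated with a small sketch. The `NavPoint` class and the sample labels are hypothetical and do not reproduce the NCX format; they only show a nested structure alongside a flat one.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class NavPoint:
    label: str
    target: str  # identifier of the structural point to jump to
    children: List["NavPoint"] = field(default_factory=list)


def walk(points: List[NavPoint], depth: int = 1) -> Iterator[Tuple[int, NavPoint]]:
    """Yield (depth, point) pairs in reading order."""
    for point in points:
        yield depth, point
        yield from walk(point.children, depth + 1)


def labels_up_to(points: List[NavPoint], max_depth: int) -> List[str]:
    """Labels reachable when the user navigates at max_depth or above."""
    return [point.label for depth, point in walk(points) if depth <= max_depth]


# Hierarchical structure (invented sample data).
nav_map = [
    NavPoint("Part One", "p1", [
        NavPoint("Chapter 1", "p1c1"),
        NavPoint("Chapter 2", "p1c2"),
    ]),
    NavPoint("Part Two", "p2"),
]

# Flat structure: pages are simply a list with no nesting.
page_list = [NavPoint("1", "pg1"), NavPoint("2", "pg2"), NavPoint("3", "pg3")]
```

With this representation a flat list is just a hierarchy of depth one, so the same traversal serves both kinds of structure.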

Modules outside of the Accessibility framework

  • Packaging and distribution: manifest, etc.; also, splitting a fileset over several media, distributing different filesets over the same medium, online distribution, etc.
  • Metadata: publication metadata; use of Dublin Core or other standard, specific DAISY metadata.
  • DRM: hopefully by the time the spec comes out DRM will be a thing of the past.


A profile is a set of content types, navigation particulars, and semantics. A profile must do the following:

  1. Identify goal, product type, etc. for this profile
  2. Identify content types supported
  3. Identify navigation features (i.e. navigation modules and roles)
  4. Identify special semantics (?)
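The requirements above could be captured in a simple record with a completeness check, as in this sketch. The `Profile` class, its field names and the sample values are hypothetical; special semantics are treated as optional here because the list above marks that item with a question mark.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Profile:
    goal: str                     # goal, product type, etc.
    content_types: Set[str]       # supported content types
    navigation_modules: Set[str]  # navigation features
    roles: Set[str] = field(default_factory=set)              # navigation roles
    special_semantics: Set[str] = field(default_factory=set)  # optional in this sketch

    def missing(self) -> List[str]:
        """List the required parts that this profile leaves undefined."""
        problems = []
        if not self.goal.strip():
            problems.append("goal")
        if not self.content_types:
            problems.append("content types")
        if not self.navigation_modules:
            problems.append("navigation features")
        return problems


# Invented example of a leisure-reading profile.
leisure = Profile(
    goal="leisure reading, audio only",
    content_types={"audio/mpeg"},
    navigation_modules={"hierarchical"},
    roles={"chapter"},
)
incomplete = Profile(goal="", content_types=set(), navigation_modules=set())
```

Calling `missing()` on a draft profile gives a quick list of what still has to be specified before the profile can be considered usable.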

Special purpose profiles

Having separate profiles for different purposes: leisure, education, LD, DAISY 2.02-like, etc.


Lightweight, middleweight, full profile; strict subsets. Full profile books should degrade gracefully to the lower-spec profiles by making lightweight profile data available. All players should play all profiles within their capabilities.
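The strict-subset relationship and the graceful degradation it allows can be sketched with sets of modules. The assignment of modules to each profile below is purely hypothetical; the report does not decide which modules belong to which profile.

```python
from typing import List, Optional, Set, Tuple, FrozenSet

# Hypothetical module assignments, for illustration only.
LIGHT = frozenset({"content", "flat-navigation"})
MIDDLE = LIGHT | {"hierarchical-navigation", "page-navigation"}
FULL = MIDDLE | {"synchronization", "semantics"}

PROFILES: List[Tuple[str, FrozenSet[str]]] = [
    ("lightweight", LIGHT),
    ("middleweight", MIDDLE),
    ("full", FULL),
]


def is_strict_chain(profiles: List[Tuple[str, FrozenSet[str]]]) -> bool:
    """Check that each profile is a strict subset of the next one."""
    return all(a < b for (_, a), (_, b) in zip(profiles, profiles[1:]))


def best_profile(player_modules: Set[str],
                 profiles: List[Tuple[str, FrozenSet[str]]]) -> Optional[str]:
    """Pick the richest profile whose modules the player fully supports."""
    best = None
    for name, modules in profiles:  # ordered from smallest to largest
        if modules <= player_modules:
            best = name
    return best


basic_player = {"content", "flat-navigation",
                "hierarchical-navigation", "page-navigation"}
playable = best_profile(basic_player, PROFILES)
```

Because the profiles nest, a full-profile book that also carries the lightweight data remains playable on any player that supports at least the smallest set.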

In the case of DAISY, there is at least a clear rationale for a lightweight profile intended for players with limited resources. A lightweight view of the book could be provided in parallel to the regular DAISY structure. That lightweight view would not replace the regular view of the book but complement it. Some elements to consider are:

  • A way to avoid SMIL files, which are not useful on a simple player except for local navigation; local navigation can be replaced by a fixed time-jump (parsing and synchronizing SMIL files may take too long);
  • Enforce a rule of one audio file per hierarchical navigation unit;
  • Provide a simplified NCX to support only basic navigation (the NCX file may also be too big).
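The first two elements can be sketched together: with one audio file per navigation unit, structure navigation reduces to opening a file, and local navigation reduces to a clamped time-jump. The function names, the unit table and the file names are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical table: one audio file per hierarchical navigation unit.
UNITS: List[Tuple[str, str]] = [
    ("Chapter 1", "ch1.mp3"),
    ("Chapter 2", "ch2.mp3"),
]


def open_unit(units: List[Tuple[str, str]], index: int) -> Tuple[str, float]:
    """Structure navigation: jump to a unit by opening its file at time zero."""
    _, audio_file = units[index]
    return audio_file, 0.0


def time_jump(position: float, delta: float, duration: float) -> float:
    """Local navigation replacement: move by a fixed step, clamped to the file."""
    return min(max(position + delta, 0.0), duration)


audio_file, position = open_unit(UNITS, 1)
position = time_jump(position, 10.0, 30.0)  # skip forward ten seconds
```

Neither operation requires parsing a synchronization document, which is the point of the lightweight view.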

Single/no profile

Single profile, that may or may not include all modules; or no profile at all (which is really the same thing.) A sensible recommendation at this stage would be to define a profile including a reasonably small (simple but not simplistic) number of modules, with some external modules for more full-fledged multimedia experience, better layout capabilities, etc.


It is important to address implementation problems early on, so that when the specification comes out, content and tools to author new content are available from day one. This concerns both the DAISY spec itself (for implementers of DAISY authoring and playback tools) and the accessibility framework (to encourage outside acceptance).

Guidelines for implementors

These can be part of the specification(s) or a separate document. Clear and exhaustive guidelines for playback behavior are necessary, and support for authoring tools (e.g. sample requirements, scenarios) is definitely useful.

Reference Implementation

In addition to the aforementioned guidelines, a usable reference implementation should be developed alongside the specification. It can be used as a testbed during the specification process and as a simple tool to create sample content illustrating the new aspects of the spec and to generate interest among content producers. The reference application should have integrated QA playback to also illustrate playback behavior. It will, however, have a limited scope, both to fit in the time window of the specification development and to avoid alienating external tool providers.

Potential Benefits and Risks

Benefits for end users

Through profiles, we provide the ability for content to be focused on a particular use case and/or content type.

  • This allows for minimizing noise in the user experience;
  • This can also allow for less costly tools to be provided. A user agent or authoring tool may implement support only for one particular profile.

As long as we have a mechanism for graceful fallback, we can ensure that all content can be read on all compliant user agents, albeit with a least-common-denominator feature set.

Benefits for specification maintainers

  • Through modularization and the encapsulation it implies, we get a specification that is easier to maintain and extend.
  • Through a genericized extension mechanism, extensions (such as MathML) will not require specification rework.

Benefits for other standards agencies

  • Through modularization and the encapsulation it implies, we get a specification that is easier to adopt in part (e.g. only adopt NCX navigation).

Benefits for non-DAISY content provision agencies

  • The DAISY Accessibility Framework can be adopted and used for content that is not explicitly endorsed by DAISY, thus allowing for a "DAISY-like" accessibilization of arbitrary content. Applying the DAISY Accessibility Framework to PDF is one example of such a scenario.


  • There is a potential risk for community fragmentation and tool interoperability problems.


Reiterate main points and explain why this is the way to go. Make recommendations about the choice of modules, profiles, etc. Outline next steps.


This report was prepared by Markus Gylling, James Pritchett and Julien Quint with contributions from Ole Holst Andersen, Marisa DeMeglio, and Dominic Labbé.
