ZedDist Video UseCases Requirements

From zedwiki

-- This page is currently in draft stage --

  • The parent for this wiki document is the 'landing page' of the Video-in-DAISY activity: ZedDist_Video
  • This document will eventually be merged into ZedDist_Design_Goals, which is the authoritative reference of all DAISY-next use-cases for the upcoming specification work. The information in this document is derived from other wiki pages such as Suggested_Video_Functions_in_Daisy.

Foreword: the DAISY Legacy vs the Future

This short section explains why video-in-DAISY is important, and what the needs are (in a nutshell).

DAISY Now

  • DAISY stands for Digital Accessible Information SYstem. Digital Talking Books (DTB) in DAISY format provide synchronized audio playback with text, images, and special content like mathematical formulas. DAISY audio books are therefore primarily designed for blind and visually-impaired persons.
  • In traditional paper-based publications, readers are able to navigate the text by page number, chapter, contents pages and indexes. These navigational features are also available in DAISY DTBs, giving the user complete control over how they access the content.
  • The reading flow of DAISY books is customizable to match the reading speed or granularity requirements of a wide range of users. As a result, many of the needs of persons with dyslexia or print-related disabilities are addressed by DAISY Digital Talking Books.

DAISY Next

  • Representatives of communities unrelated to visual impairments or print disabilities have been showing great interest in DAISY:
    • Deaf and hard-of-hearing persons rely on structured, multimodal information to address their accessibility needs (captions, sign language video, etc.).
    • People involved in disaster preparedness / emergency training often exhibit a broad variety of sensory and cognitive impairments, so training material authored in DAISY format is a powerful medium for quickly accessing critical, life-saving information (atmospheric sounds, descriptive text and images, video guidance through physical spaces, etc.).
    • Persons with autism learn better and faster when they use interactive knowledge tools based on simple navigation and multi-sensory feedback.
  • Due to the lack of an open standard that addresses these diverse use-cases, communities have been building their own proprietary systems (some have even adapted DAISY to meet their needs). As a result, one of the most-requested "new" features for the next version of the DAISY standard is video.
  • Extending DAISY publications into this new realm will lead to innovative tools and services, in a growing publishing industry that is rapidly adopting new multimedia capabilities. Within this practically uncharted territory, DAISY-next must set a high standard in terms of accessibility, ease-of-use and structured authoring practices.

Definition of Terms

Here is a reminder of the 3 major parts of a DAISY Publication, not so much from a technical perspective but more as a conceptual break-down of the system. We intentionally focus on full-text full-audio Digital Talking Books, audio-only books simply being a reduction of the following set:

  • The Document part: the XHTML or DTBOOK content (or any other suitable markup grammar). Contains semantically-structured text, images (and their textual captions), MathML, etc. The key characteristic of this part is that it does not contain timed media (such as audio and video), and it can be displayed without requiring the other parts of the DAISY book.
  • The Navigation part: the NCC or NCX. This part contains the structural backbone of the "Document", in the form of the table of contents (e.g. hierarchical chapter headings). It also indexes navigation lists, such as page numbers, notes, etc. The navigation points do not refer to the "Document" directly, instead they address specific landing targets in the time domain (see below).
  • The Synchronization part: the SMIL file. This part adds a temporal dimension to the "Document", giving DAISY its true multimedia nature. This defines the mapping (in the time domain) between parallel streams of information ("alternative media") that represent (semantically-speaking) the same content. For example in the case of DTBs, a sequence of audio clips is associated with the flow of text in the document, usually in the natural document order (from "top to bottom" / "start to end").
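The relationship between these three parts can be illustrated with a small data model. This is only a minimal sketch and not the DAISY file format: the class names (TextFragment, SyncPoint, NavPoint), the resolve helper and the file name "ch1.mp3" are assumptions made for illustration. It shows the one property stressed above: a navigation point targets an entry of the Synchronization part, and the Document is only reached through it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextFragment:
    """An addressable piece of the Document part (carries no timing)."""
    fragment_id: str
    text: str


@dataclass(frozen=True)
class SyncPoint:
    """One entry of the Synchronization part: a text fragment paired with an audio clip."""
    sync_id: str
    fragment_id: str
    audio_src: str      # hypothetical file name
    clip_begin: float   # seconds
    clip_end: float     # seconds


@dataclass(frozen=True)
class NavPoint:
    """One entry of the Navigation part; it targets a SyncPoint, not the Document."""
    label: str
    sync_id: str


def resolve(nav: NavPoint,
            sync_points: list[SyncPoint],
            document: dict[str, TextFragment]) -> tuple[SyncPoint, TextFragment]:
    """Follow a navigation point to its landing target in the time domain,
    then to the text fragment synchronized with it."""
    by_id = {s.sync_id: s for s in sync_points}
    sync = by_id[nav.sync_id]
    return sync, document[sync.fragment_id]


# Illustrative content only.
document = {
    "h1": TextFragment("h1", "Chapter 1"),
    "p1": TextFragment("p1", "It was a dark night."),
}
sync_points = [
    SyncPoint("s1", "h1", "ch1.mp3", 0.0, 2.5),
    SyncPoint("s2", "p1", "ch1.mp3", 2.5, 6.0),
]
toc = [NavPoint("Chapter 1", "s1")]

sync, fragment = resolve(toc[0], sync_points, document)
print(fragment.text, sync.audio_src, sync.clip_begin, sync.clip_end)
```

Note that the Document dictionary can be displayed on its own, without the other two structures, which mirrors the key characteristic of the Document part described above.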

For the sake of clarity, here are some commonly-used terms and their definitions:

  • Publication: the static DAISY material in its distribution form (e.g. the set of files that makes a complete DAISY Digital Talking Book).
  • Audio Book: a reduction of the broader concept of multimedia Publication, to mimic and enhance the functionality of traditional printed material, whilst providing access to an equivalent audible stream of information.
  • Edition: the subset of an original Publication, packaged independently as a distribution unit (i.e. not part of the original Publication, but a separate fileset).
  • View: a single Publication may contain several "renderings" of the same content, so that reading systems can pick the most suitable one depending on their functional requirements.
  • Presentation: the state of a DAISY Publication being consumed (e.g. the playback of a DAISY book and what becomes a communication interface with the user).
  • User-Agent: the playback tool used to transform the Publication into a useful rendering, such as the Presentation. This is the interface between the user and the content that enables the reading experience. It is responsible for exposing functionality, for allowing the user to override default or authored behaviors, etc. Technically speaking, a User-Agent can also be a processing engine such as a conversion or production tool that does not offer any Presentation behaviors, but in the context of this discussion we can think of it as a playback interface.
  • Authored Intent, User-Agent Override and Semantic Roles: it is a common (and recommended) practice for authors to create material with separated content semantics and presentational styles. For example, a heading should be marked as such (e.g. "h1" XML element) and should not just be represented as large bold text (the style is in fact often, if not always, external to the Document itself). Such separation of concerns gives the User-Agent a chance to override default behaviors whilst preserving the essence/meaning of the content (e.g. particular visual needs may need to be met, using a high-contrast color scheme, large print styles, etc. which are not defined in the original Publication). The capacity for the user to override Authored Intent is a strong accessibility requirement. Another example is layout: the author may position a video region in one particular spot, but some users may find it visually easier if it were located elsewhere. By marking this video region with additional Semantic Roles (e.g. "sign-language"), the User-Agent is better equipped to expose functionality in a way that makes sense to the user (e.g. users may decide to turn off sign-language channels if they do not understand signing). High levels of configurability are best achieved when the authored content includes sufficient semantic information.
  • Media Asset: the source media objects used in a Publication (e.g. video and audio files). A Presentation is essentially a composition of these Media Assets in the time domain (synchronization), and within a visual rendering context (layout, positioning).
  • Channel: a stream of information of a given type (e.g. video or text or images or audio, etc.) that serves a specific purpose in the Presentation and that can be turned on/off depending on the user's reading needs. For example, the Document of a DTB may be seen as the main "text channel". Conversely, its synchronized narration would be seen as the "audio channel". This concept is independent of the physical manifestation of Media Assets; it is more an abstract term to refer to how the publication is conceptually divided into several alternative content types that can be rendered in parallel (multimodal information).
  • Primary vs Secondary: a Publication usually contains a single Channel of Primary information, in the sense that it represents the original content from which alternative types of multimedia Channels are derived (to ensure content accessibility). In the case of DTBs, written text is often the initial material upon which the audio narration is based. The multimodal streams of information designed as Secondary Channels can normally be switched on/off depending on the user's needs. Note that this is not a means to distinguish mandatory versus optional content, but merely a way to mark content streams depending on their origins and purpose. A synonym of Primary is Master.
  • Track: some Media Assets may embed several tracks, which in turn may be selectable individually at the User-Agent level. Complex 5.1 audio, for example, may contain music on one track and voices on another. The ability to pick the desired tracks guarantees that users with special reading needs can focus on one particular portion of the content.
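To make the Channel, Primary/Secondary and Semantic Role terms more concrete, here is a minimal sketch of how a User-Agent might combine Authored Intent with a user override. Everything in it is hypothetical: the Channel class, its fields, the role strings and the channels_to_render function are illustrative names, not part of any DAISY specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    media_type: str        # "text", "audio", "video", ...
    role: str              # Semantic Role, e.g. "narration" or "sign-language"
    primary: bool = False  # marks origin of the content, not whether it is mandatory
    enabled: bool = True   # Authored Intent: is the channel on by default?


def channels_to_render(channels: list[Channel],
                       disabled_roles: set[str]) -> list[Channel]:
    """Apply a User-Agent Override on top of the authored defaults.

    A channel is rendered when the author enabled it and the user has not
    switched off its Semantic Role. Primary channels are treated like any
    other, since Primary does not mean mandatory.
    """
    return [c for c in channels if c.enabled and c.role not in disabled_roles]


# Illustrative publication: text is the Primary channel, the rest is derived.
publication = [
    Channel("document", "text", "main-text", primary=True),
    Channel("narration", "audio", "narration"),
    Channel("signing", "video", "sign-language"),
]

# A user who does not understand signing turns that role off.
rendered = channels_to_render(publication, disabled_roles={"sign-language"})
print([c.name for c in rendered])
```

The override is expressed against roles rather than channel names, which reflects the point made above: the User-Agent can only offer this kind of choice when the authored content carries enough semantic information.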

The Relationship Between DAISY and the Web

  • The goal of DAISY is to offer a highly-structured packaged publication format based upon navigable synchronized multimodal streams of information. By contrast, the web is a hyperlinked mesh of static documents (i.e. with no timed information), across which timed media assets are scattered as small individual islands of information that bear no relationship with one another ("black boxes").
  • Whereas accessibility for the web is implemented as an extra layer sitting on top of a loose and flexible document model, DAISY's accessibility is a byproduct of specialized information design: document structure and time-based synchronization are first-class citizens. DAISY has been designed as a multimedia format from the ground up, with accessibility requirements built into its core.
  • It is tempting to envision video frames visually-embedded (i.e. displayed) in the flow of the "Document", just like webpages. However this is probably out-of-scope. The rationale is: why reinvent the wheel? This use-case is already addressed by the web browsing paradigm, using implementation technologies that are becoming increasingly accessible (captions and audio descriptions superimposed in the video stream, for example).
  • The functionality that video-enabled DAISY-next can offer is based on a different approach to information design, driven by requirements in the publishing industry that set it apart from the web. The web may of course be used as a method to encode DAISY publications for distribution purposes (i.e. at the end of the production chain), but the master source of information encoded in the time-based DAISY format contains richer time and navigation semantics. By analogy, this is just how ePub currently works: it is a specialized e-book format that fulfills the needs of the publishing industry (e.g. single container, compressed, navigable using book semantics, with rich built-in metadata for archiving / searching, etc.). On the other hand, ePub books can be rendered with ease using traditional web technologies, therefore enabling the web-browser-centric experience as well as the full hardware- or software-based reading mode.

Overview of Use-Cases

This is an overview of each major type of multimedia publication envisioned for "video-in-DAISY". There may be many variations based upon this baseline proposal, but the individual requirements extracted from these use-cases should enable most scenarios.

Note: for each use-case, it is assumed that a top-level navigation structure may be included (i.e. table of contents), to allow users to navigate scenes, chapters, sections, etc.

  1. Video is master: the primary channel of information is a video clip (non-specific content, like a movie, documentary, interview, etc.), whose length generally corresponds to the total duration of the publication. A text channel provides time-aligned captions (to represent dialogue, noises, etc.), which can be displayed in the traditional way (bottom of the video or accurate overlay positioning within scenes), or as transcripts (e.g. separate vertical scrolling pane with highlight of the currently active caption). A channel of synchronized audio descriptions fills the silent gaps in the video track with audible descriptions. Alternate text is also provided for these descriptions. The video may need to pause for a moment, in order to give the user enough time to consume the extra descriptions. This kind of accessible motion picture addresses the needs of both deaf/hard-of-hearing and blind/visually-impaired users.
  2. Sign-Language / Lip-Reading is master: the primary channel of information is a video clip dedicated to displaying the full sign-language or lip-reading rendition of a story / technical description / etc. The text channel is synchronized with the signing/lip-speaking, but in the case of signing, it may not flow in the traditional DAISY audio-book way: instead, the order of the reading flow may make successive text highlights occur in a non-contiguous way, sometimes selecting more than one piece of text at a time. Users may choose to display one or the other, but usually not both sign-language and lip-reading at the same time (to avoid unnecessary visual overload). This kind of material addresses the needs of deaf / hard-of-hearing users who wish to have access to synchronized text as part of their sign-language or lip-reading experience.
  3. Video is secondary: the primary channel of information is a text document (just like with full-text full-audio Digital Talking Books). In addition to a potential audio narration of the text, short video clips are scattered throughout the text document at specific insertion points to temporarily illustrate facts in a visual manner. Video regions are not necessarily displayed "inline"; they may be popup windows triggered when the user reaches the video insertion point, or displayed in a dedicated fixed region. To fully enable accessibility, each video clip is produced using the technique explained in "Video is master" (text captions and audio descriptions). This kind of multimedia material primarily (but not exclusively) addresses the needs of publications produced in the context of disaster preparedness / emergency training, where the combination of text, images and motion pictures enables effective communication with users of varied physical and cognitive abilities.
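The "Video is master" use-case mentions that the video may need to pause so that the user has enough time to consume an audio description. A rough sketch of that timing logic follows. It assumes one possible model, not taken from any specification: each description is attached to a silent gap in the video, and the video pauses only for the part of the description that does not fit in the gap. The Description class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    gap_begin: float  # video time (seconds) where the silent gap starts
    gap_end: float    # video time (seconds) where the silent gap ends
    duration: float   # length of the audio description (seconds)


def required_pause(d: Description) -> float:
    """Seconds the video must pause so the description can finish.

    Zero when the description fits entirely inside the silent gap.
    """
    gap_length = d.gap_end - d.gap_begin
    return max(0.0, d.duration - gap_length)


def presentation_duration(video_duration: float,
                          descriptions: list[Description]) -> float:
    """Total playback time once the pauses are added to the video length."""
    return video_duration + sum(required_pause(d) for d in descriptions)


# Illustrative values: a 60-second clip with two described gaps.
descriptions = [
    Description(gap_begin=10.0, gap_end=14.0, duration=3.0),  # fits in the gap
    Description(gap_begin=30.0, gap_end=32.0, duration=5.0),  # overflows by 3 s
]

for d in descriptions:
    print(d.gap_begin, required_pause(d))
print(presentation_duration(60.0, descriptions))
```

With the audio description channel switched off, no pause is needed and the Presentation lasts exactly as long as the video, which is one reason the duration of the Presentation can differ from the duration of the underlying Media Asset.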

Requirements

See here: [1]
