ZedDist Video
This is a landing page for "video-in-DAISY".
Background
- The DAISY-Next Distribution format (affectionately referred to as "ZedDist" or "ZDIST", and formally known as "Z39.86-2010 Part B: Distribution Framework") is currently in its inception phase. Please consult the main ZedDist page to obtain the most up-to-date information.
- The ZedDist manifesto (see ZedDist_Design_Goals) highlights the design goals for this particular part of the revision of the DAISY standard. One of the targeted goals is to include video in the DAISY standard. What "video-in-DAISY" entails is to be defined in this document.
- Note: there is no video-related topic in "Part A: Authoring and Interchange Framework", see http://www.daisy.org/z3986/2010/ for details.
- Specific requests for video were made during the DAISY-Next Requirements Gathering phase, see here, here, here and here. The results from this external feedback were synthesized in this taxonomy page.
Document Organization
The main goal of this wiki page is to aggregate the material produced by the "video-in-DAISY" working group:
- Inception notes.
- Specification proposals.
- Conference call minutes.
- Face-to-face meeting reports.
Manifesto
Moved here: ZedDist_Video_Manifesto
Use-Cases
A condensed, user-friendly version of the full user requirements analysis is available here: ZedDist_RicherMultimedia
Working Group Participants
- Daniel Weck, DAISY Consortium, Software Developer (UK)
- Silvia Pfeiffer, Vquence CEO, Xiph (Ogg Vorbis, Theora) expert, W3C Media Fragments and TimedText participant (AUS)
- Sean Hayes, Microsoft Corporation, Media Accessibility Business Unit, W3C TimedText co-chair (UK)
- Bjorn Nyqvist & Patric Larsson, National Agency for Special Needs and Schools, "Linked" accessible book technology (Sweden)
- Wolfram Eberius, Freelance, developer of a video-enabled online DAISY player (Germany)
- Guillaume Olivrin, Meraka Institute (South-Africa)
- Geoff Freed, WGBH
- Douglas Blane, Open University - Digital Audio Project
- [REGRETS, but will contribute remotely] Tom Fiddian, Technology Development Manager at RNID (UK)
- WGBH ?
Objectives, Plan of Action
The suggested short-term plan is two-fold:
- Functional requirements - During the DAISY-Next Requirements Gathering phase, external feedback revealed a few valid use-cases for "video-in-DAISY" but the whole subject remains pretty vague (see links in the above 'Background' section). The group must establish a formal functional scope, by detailing real-world use-cases in prose (i.e. not in technical terms). Determining these high-level design goals must be a concerted effort, as outlined in the above 'Manifesto'.
- Integration mockups - The group should produce technical proofs of concept in order to grasp the architectural issues related to the inclusion of motion pictures in the DAISY standard. This would provide members of all ZedDist groups with a basic foundation on which to further discuss cross-cutting concerns, such as time versus structure navigation. This work would be based on the envisioned distribution format, so there would be a continuous dependency on the outcome from other ZedDist working groups (e.g. DTBOOK vs XHTML markup, modular NCX, etc.). For that reason, technical considerations should be sufficiently abstracted from the real distribution format (i.e. files, markup, etc.): the goal would be to examine the relationship between video integration and other ZedDist topics, not to establish any kind of normative definition.
Relationship with SMIL
Historically, the SMIL 3.0 DAISY Profile was created to provide a SMIL-compliant (i.e. a subset of SMIL Modules) distribution format covering the feature-set of DAISY 3.0 / ANSI-NISO Z39.86-2005, and to demonstrate that extending DAISY with richer interactive multimedia functionality was feasible using off-the-shelf technologies.
With hindsight, this standardization effort was driven by the need to find the right balance between richness and specialization. In other words, it is clear that the W3C SMIL 3.0 Language Profile (i.e. the entire specification) provides sufficient multimedia functionality to address DAISY requirements, but at the same time it is a complex beast that developers and designers find difficult to comprehend. In that light, the goal of the DAISY Profile was to define a subset of SMIL that was complete enough to implement DAISY Digital Talking Books, whilst offering sufficient scope for multimedia extensions (i.e. video, interactivity, etc.).
DAISY should avoid becoming too complex. The overwhelming richness of SMIL has undoubtedly hindered developer enthusiasm and slowed down market adoption. SMIL is a generic solution that lacks business focus and inevitably faces implementation costs that are hard to justify. By contrast, DAISY is designed to be 'productivized', i.e. turned into a rich ecosystem of tools and services.
The ZedDist group has yet to decide whether the SMIL 3.0 DAISY Profile should be used "as-is" as a building block for DAISY-Next. There seems to be an unspoken consensus that the Profile should be broken down even further in order to simplify implementations, and that the SMIL Modules themselves are too large. To be discussed.
Architectural Integration of Video
In a nutshell: the DAISY standard is currently based on a conveniently simple (and somewhat restrictive) 3-tier composition model for multimedia publications, i.e. navigation (NCX), document (DTBOOK), audio synchronization (SMIL). If we take the standard as it is, there are essentially 2 avenues to consider for integrating the video media type:
- (1) Video as master stream - video can be placed in a DAISY presentation as the main synchronization stream, in the same way that audio is woven / interspersed with text in the time domain (i.e. in the SMIL markup). With this approach, video is a simple "drop-in" replacement for the audio media type (i.e. use of "video" XML elements where "audio" tags normally go). In other words, "timed sections" of video (i.e. clip-begin, clip-end) can be synchronized with document "fragments" (i.e. structural text, images, MathML, etc.). The only difference would be the addition of a rendering surface for the video's visuals (i.e. rectangular area, or "region" in SMIL terms). It is worth noting that the SMIL 3.0 DAISY Profile fully supports this integration model, so technically speaking, an off-the-shelf standard is already available. A concrete use-case for this is a video presentation synchronized with projection slides (i.e. the video recording of a public talk). Another potential application is 3 parallel streams: sign-language, human narration and text (which can be independently switched on/off).
- (2) Video as embedded media - video clips can be placed inside the document structure (e.g. DTBOOK) as individual elements. This "compound document" model means that the main text document is the host for individual time-aware "islands". This resembles HTML5's video element, or generally speaking any webpage with a video area embedded in the flow of text/images. In this mode, it is possible to introduce captioning and audio descriptions at the level of the video element itself, in total isolation from the rest of the document. The problem with this approach is that most DAISY user-agents consider the SMIL markup definition as the timing and synchronization master: by adding a timed element inside the main text document, we introduce a conflict of interest, or at least a situation that needs to be disambiguated. We therefore need to clarify the responsibilities of each tier of the multimedia presentation and the ways in which content can be navigated globally or locally.
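To make approach (1) more concrete, the following is a minimal Python sketch, not a normative definition, of a single synchronization unit in which a "video" element sits where an "audio" element would normally go. The element and attribute names (par, text, video, clipBegin, clipEnd, region) follow common SMIL usage, but the exact vocabulary allowed by the SMIL 3.0 DAISY Profile has not been checked here. The file names, the fragment identifier and the region name "videoRegion" are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_par(par_id, text_src, video_src, clip_begin, clip_end, region="videoRegion"):
    """Build one <par> pairing a document fragment with a timed video clip.

    Names are illustrative assumptions, not taken from a validated DAISY profile.
    """
    par = ET.Element("par", {"id": par_id})
    # The document fragment (e.g. a slide or paragraph) shown during the clip.
    ET.SubElement(par, "text", {"src": text_src})
    # The video clip replaces the usual audio clip; "region" names the rendering surface.
    ET.SubElement(par, "video", {
        "src": video_src,
        "clipBegin": clip_begin,
        "clipEnd": clip_end,
        "region": region,
    })
    return par


# Hypothetical example: the first slide of a recorded talk.
par = make_par("par1", "book.xml#slide1", "talk.ogv", "0s", "42.5s")
print(ET.tostring(par, encoding="unicode"))
```

Apart from the rendering region, the structure is the same as for an audio clip, which is why this approach can be described as a "drop-in" replacement.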
Online Connectivity
In this day and age, it is crucial for a publication format to provide hooks into the fabric of the web, both to benefit from its powerful social aspects (e.g. collaborative bookmarks, notes, etc.) and to make use of its distributed storage infrastructure (which would dramatically reduce the footprint of DAISY files).
Provided that permissions are sufficient, a DAISY multimedia publication should be able to reference resources outside its "sandbox" document model, for example to stream video clips straight from YouTube (or any other online provider). Such modern video services include APIs to access content fragments at precise time intervals, as well as annotations of all kinds and, of course, subtitles.
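One possible way to address a time interval of a remote video is a temporal URI fragment of the form "#t=begin,end", as proposed by the W3C Media Fragments work. The Python sketch below only builds such a URL. It assumes plain seconds as the time unit, and the helper names and the example.org address are hypothetical. Whether a given online provider honours this syntax is a separate question.

```python
from urllib.parse import urlsplit, urlunsplit


def _fmt_seconds(value):
    """Format seconds with up to millisecond precision, without trailing zeros."""
    return f"{value:.3f}".rstrip("0").rstrip(".")


def with_time_fragment(url, begin_s, end_s=None):
    """Return url with a temporal fragment such as '#t=12.5,40' (seconds)."""
    if begin_s < 0 or (end_s is not None and end_s <= begin_s):
        raise ValueError("expected 0 <= begin < end")
    fragment = "t=" + _fmt_seconds(begin_s)
    if end_s is not None:
        fragment += "," + _fmt_seconds(end_s)
    parts = urlsplit(url)
    # Any existing fragment is replaced; scheme, host, path and query are kept.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, fragment))


# Hypothetical remote resource referenced from inside a publication.
clip_url = with_time_fragment("http://example.org/media/lecture.ogv", 12.5, 40)
print(clip_url)
```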
Accessible Motion Pictures
Generally speaking, the video media type combines 2 sensory modalities by synchronizing their corresponding tracks in the time domain:
- visual - animated pictures
- aural - audio track
The most commonly used methods to make the content of motion pictures accessible to people with sensory impairments are:
- Audio Descriptions
- Captions
- Sign Language
Audio Descriptions
The audio track in the original video may be totally absent, switched off, or silent in certain places. At the times when the default video lacks an aural signal, people who must rely on auditory feedback to access information (e.g. blind users) need an alternative stream called audio descriptions in order to be aware of what is happening in the visual stream of information. ADs are usually recorded human narration, precisely woven into ("open", cannot be switched off) or overlaid onto ("closed", can be turned on/off) the video's audio stream to cover the blank parts.
In practice, providing a text transcript of the audio descriptions is a good thing, because it enables the use of TTS (Text To Speech) or of hardware refreshable Braille displays, offering further accessibility options to the user.
Audio descriptions are not to be confused with "dubbing", which is a voice overlay in a different locale than the original spoken language in the video (i.e. a translation). By contrast, ADs do not overlap the original spoken soundtrack.
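Because ADs are meant to fit into the parts of the soundtrack that carry no speech, an authoring tool needs some way to locate those gaps. The Python sketch below is one simple illustration under stated assumptions: the speech intervals are already known (for instance from a caption track), times are in seconds, and the function name and the 1-second minimum gap are hypothetical choices rather than values taken from any guideline.

```python
def silent_gaps(speech, duration, min_gap=1.0):
    """Return (start, end) gaps between speech intervals lasting at least min_gap seconds.

    speech   -- iterable of (start, end) pairs, in seconds, possibly overlapping
    duration -- total duration of the video, in seconds
    """
    gaps = []
    cursor = 0.0  # end of the latest speech interval seen so far
    for start, end in sorted(speech):
        if start - cursor >= min_gap:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if duration - cursor >= min_gap:
        gaps.append((cursor, duration))
    return gaps


# Hypothetical dialogue timings for a 25-second clip.
speech = [(2.0, 5.0), (5.5, 9.0), (14.0, 20.0)]
gaps = silent_gaps(speech, duration=25.0)
print(gaps)
```

The half-second pause between the first two speech intervals is ignored because it is shorter than the minimum gap. A real tool would also need to check that each description actually fits the gap it is assigned to.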
References:
- http://www.rnib.org.uk/xpedio/groups/public/documents/publicwebsite/public_audiodescription.hcsp
- http://www.ofcom.org.uk/tv/ifi/guidance/tv_access_serv/archive/audio_description_stnds
- http://adinternational.org/ADIad.html
- http://joeclark.org/ad-principles.html
- http://ncam.wgbh.org/richmedia/tutorials/audiodesc.html
Captions
People who do not have access to the audio track of a video (e.g. deaf users) need an alternative representation of the missing information, usually in the form of captions (i.e. displayed text) or using sign-language. Captions are governed by a set of conventions regarding colors, placement, etc. Caption text aims at representing all kinds of aural data (i.e. not just human language, but also background noises, etc.). Again, this additional stream of information can either be "closed" or "open", hence the common term CC (Closed Captions).
Captions are not to be confused with subtitles, which are text transcriptions of human narration in a specific language that differs from the original spoken language of the video. Subtitles provide foreign language speakers with a translation, usually displayed onscreen in a fixed location, typically in the bottom center of the video frame using a single visual style.
References:
- http://captioningsucks.com
- http://screenfont.ca/learn
- http://www.webaim.org/techniques/captions/
- http://main.wgbh.org/wgbh/pages/mag/services/captioning/faq
- http://alastairc.ac/2006/09/captions-vs-subtitles
- http://www.captioncentral.com
- http://en.wikipedia.org/wiki/Subtitle_(captioning)
- http://www.ofcom.org.uk/tv/ifi/guidance/tv_access_serv/archive/subtitling_stnds
- http://joeclark.org/appearances/AEA/2007
- http://fawny.org/readingthetube.html#illos
- http://www.dcmp.org/ciy
- http://diveintomark.org/archives/2009/01/07/give-part-4-captioning
Sign Language
See here: ZedDist_Video_SignLanguage
References:
- http://www.sit.se/direkt/videoindaisy and Suggested_Video_Functions_in_Daisy
- http://www.ethnologue.com/show_family.asp?subid=23-16
- http://joeclark.org/access/captioning/bpoc/SL.html
- http://www.ofcom.org.uk/tv/ifi/guidance/tv_access_serv/archive/sl_dtt/
- http://signwriting.org/
- http://www.creaturediscomforts.org/watch
- http://www.ehow.com/VideoSearch.aspx?s=sign+language&Options=4
- http://www.cervantesvirtual.com
- http://www.lsfdico-injsmetz.fr
- http://www.ox.ac.uk/media/science_blog/090731.html
Interesting article about the Deaf culture:
Lots of interesting links and comments here:
Cross-Cutting Concerns
- Codecs (related to ZedDist discussion on audio codecs)
- Navigation (time-URLs, surface area/map, media fragments ?)
- Interactivity (presentation state, video playback current time, etc.)
- Polyglotism
- Feature switch based on user profile/preferences (adaptability)
- Single file container (compress or store video ?)
W3C Timed-Text DFXP
There are many captioning formats out there. There is, however, an emerging standard called Timed-Text which aims at unifying these various formats under a well-defined XML markup grammar. Timed-Text DFXP (Distribution Format Exchange Profile) is a specification published by the W3C (World Wide Web Consortium) that provides a feature-set designed to cover the needs of most existing caption formats (e.g. Microsoft SAMI, Apple QuickTime-Text, RealNetworks RealText, 3GPP, etc.).
TimedText is self-contained, meaning that it is in control of its own metadata, time and space domains (animations, transitions, layout, styling, etc.). It is designed to be an independent content channel that contains textual information of varying richness, usually to encode captions or subtitles (see the 'Accessible Motion Pictures' section above for more information). TimedText content is playable as a stand-alone media format/type, but at playback time it is more likely to be hosted or constrained by a video stream.
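As an illustration of the "independent content channel" idea, the Python sketch below reads the timed paragraphs of a very small Timed-Text document without any reference to a host video. It is a simplified sketch under several assumptions: the namespace URI is the one used by the published W3C Recommendation (earlier DFXP drafts used a different URI), only the simple offset form of time expressions (e.g. "3.2s") is handled, and styling, layout and metadata are ignored. The sample captions are invented.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the published Recommendation; drafts used another URI.
TT_NS = "http://www.w3.org/ns/ttml"

SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p begin="0.5s" end="3s">[door slams]</p>
      <p begin="3.2s" end="6s">Who is there?</p>
    </div>
  </body>
</tt>"""


def seconds(value):
    """Parse the simple offset form '12.5s'; other time expressions are rejected."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported time expression: {value!r}")
    return float(value[:-1])


def read_cues(xml_text):
    """Return a list of (begin, end, text) tuples, one per timed paragraph."""
    root = ET.fromstring(xml_text)
    cues = []
    for p in root.iter(f"{{{TT_NS}}}p"):
        text = "".join(p.itertext()).strip()
        cues.append((seconds(p.get("begin")), seconds(p.get("end")), text))
    return cues


cues = read_cues(SAMPLE)
for begin, end, text in cues:
    print(begin, end, text)
```

The first cue describes a background noise rather than speech, which is the distinction between captions and subtitles drawn in the 'Accessible Motion Pictures' section.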
W3C SMIL-Text
There may be some confusion with another technology called SMIL-Text. The W3C SYMM Working Group introduced it in the SMIL 3.0 specification not as a replacement for Timed-Text, but as a means to provide basic support for rich formatted text (SMIL-Text is a very lightweight implementation compared to Timed-Text).
As one would expect, SMIL-Text integrates very well with the rest of SMIL (such as the timing model, layout, transition, state, etc.), which makes it an alternative candidate for authoring subtitles or captions. By contrast, Timed-Text is more or less seen as a "black box" from a host SMIL presentation, which on one hand limits the interaction between the 2 formats but on the other hand guarantees the separation of concerns between the captions track and the parallel video stream.
Online Video Tools
- http://captiontube.appspot.com
- http://www.jeroenwijering.com/?item=JW_FLV_Media_Player / http://www.jeroenwijering.com/?item=Making_Video_Accessible
- http://corp.kaltura.com/static/developers
- http://www.tubecaption.com
- http://captioning.stanford.edu
- http://icant.co.uk/sandbox/youtube-captioning.html
- http://dotsub.com
- http://www.overstream.net
- http://www.subtitle-horse.com
- http://www.youtube.com/t/annotations_about
- http://icant.co.uk/easy-youtube
- http://www.nihilogic.dk/labs/youtubeannotations
- http://video.google.com/support/bin/answer.py?hl=en&answer=26577 / http://video.google.com/support/bin/answer.py?answer=27738
- http://www.veotag.com
- http://www.parleys.com/display/PARLEYS/Home
- http://www.cuts.com/faq/making-a-riff
- http://jumpcut.com
- http://eyespot.com
- http://jaycut.com
- http://www.moviemasher.com
Related Projects
SiGML (Signing Gesture Markup Language) and eSign
- http://www.sign-lang.uni-hamburg.de/esign/annualreport2004/sigml.html
- http://www.sign-lang.uni-hamburg.de/esign/demo.html
Swedish sign-language books
German video-enabled online DAISY player
Meraka, NAP portal, Thibologa / Guillaume Olivrin
- http://www.opensourcereleasefeed.com/interview/show/guillaume-olivrin-on-open-source-nap-and-sign-language-in-information-systems
- http://www.napsa.org.za/portal/public/content/view/viewServicesHome.jsf
- http://www.thibologa.co.za/
- http://www.meraka.org.za/~golivrin/
HTML 5 video/audio accessibility
- http://wiki.whatwg.org/wiki/Video_accessibility
- http://esw.w3.org/topic/HTML/MultimediaAccessibilty
- http://lists.w3.org/Archives/Public/public-html/2008Sep/att-0118/html5-media-accedssibility.html
Mozilla Foundation, Xiph (Ogg-Theora), Wikimedia
The Mozilla Foundation recently awarded a grant to Benetech for writing a DAISY reader Firefox plugin. At the same time, the foundation will be supporting Silvia Pfeiffer to conduct an accessibility study for video in Firefox. Some synergy may be possible between the two, which would offer an open and online experimentation platform for video-in-DAISY:
- http://blog.wikimedia.org/2009/01/26/mozilla-and-wikimedia-join-forces-to-support-open-video
- http://blog.mozilla.com/blog/2009/01/26/in-support-of-open-video
- http://blog.gingertech.net/2008/09/23/video-accessibility-for-firefox/
- https://wiki.mozilla.org/Accessibility/Video_Accessibility
- https://wiki.mozilla.org/Accessibility/Captioning_Work_Plan
- https://wiki.mozilla.org/WeeklyUpdates/2008-09-22#Foundation_Updates
- http://lists.xiph.org/mailman/listinfo/accessibility
- http://www.w3.org/2007/08/video/report.html
Open Video Conference
Open Video Alliance
- http://openvideoalliance.org/wiki/index.php?title=List_of_Open_Source_Video_Software
- http://openimages.eu/blog/2009/01/07/open-source-video-software-an-inventory
- http://www.openmedianow.org
WGBH / NCAM (National Center for Accessible Media)
Major actors in broadcast captioning, they also provide audio description services. They offer free captioning and description software called MAGpie. They have also led a study called "Beyond the Text", which looked into richer "book" content:
- http://ncam.wgbh.org/projects
- http://ncam.wgbh.org/ebooks
- http://ncam.wgbh.org/ebooks/prototypes.html
Emergency Preparedness
Disaster Preparedness has a strong case for the use of accessible motion pictures, and DAISY could be a fantastic implementation for accessible multimedia manuals. Deaf-Link provides a multi-modal and multi-device implementation of a communication infrastructure for Emergency Preparedness, which meets the needs of blind/visually-impaired people too. It is called Accessible Hazard Alert System and uses American Sign Language amongst other techniques. The video in the top-left corner is very interesting:
Participatory Culture Foundation
There is an interesting Wiki project maintained by the Participatory Culture Foundation that aims at fostering community efforts and contributions in the field of online video authoring and publishing. There is not much emphasis on accessibility yet, unfortunately:
The Rosa Lee Show
The "Rosa Lee Show" offers entertaining multimedia shows without any spoken words and sometimes with music. The emphasis is on sign language (ASL) and text captions, although the scene background is often projected with abstract visual displays during live shows.
WebMultimediale (Italian)
Various other things
