Media Overlays Playback Requirements
Version: Working Draft
Release date: To be decided.
Publications that support Media Overlays improve accessibility for people with disabilities. To give users the best possible reading experience, reading systems need to implement correct playback behaviour. The following guidelines explain the expected playback behaviour for Media Overlays.
The main objective of this document is to guide developers of accessible reading systems who have implemented, or intend to implement, EPUB 3.x Media Overlays functionality.
Note: The primary focus of these guidelines is playback behaviour for Media Overlays. Other details, such as the visual rendering of the navigation view or of scrolling views, are therefore not covered in this document.
The guidelines are based on EPUB 3.1 specifications, available at
However, the guidelines are also valid for the EPUB 3.0.1 and EPUB 3.0 specifications.
- Media Overlay: In the EPUB 3.x specification, the term “Media Overlay” refers to the synchronization between the text of a publication and an audio recording. It is implemented using the Synchronized Multimedia Integration Language (SMIL).
- Reading System: The EPUB 3.x hardware or software that delivers the reading experience.
- Playback: The time-based rendering of the audio of a Media Overlay by a reading system. During playback, the audio is assumed to be the primary user interface.
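As an illustration of the Media Overlay definition above, the following minimal sketch reads a SMIL document and lists its text/audio pairs. The sample file names, ids and clip times are hypothetical, and the helper name parse_clips is an assumption made for this example; only the SMIL namespace and the par, text and audio elements come from the EPUB 3.x Media Overlays format.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"

# Hypothetical sample overlay; file names, ids and times are made up.
SAMPLE_SMIL = """<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <par id="p1">
      <text src="chapter1.xhtml#s1"/>
      <audio src="audio/ch1.mp3" clipBegin="0s" clipEnd="2.5s"/>
    </par>
    <par id="p2">
      <text src="chapter1.xhtml#s2"/>
      <audio src="audio/ch1.mp3" clipBegin="2.5s" clipEnd="6s"/>
    </par>
  </body>
</smil>"""


def parse_clips(smil_xml):
    """Return one dict per <par> element, in document order."""
    root = ET.fromstring(smil_xml)
    clips = []
    for par in root.iter(f"{{{SMIL_NS}}}par"):
        text = par.find(f"{{{SMIL_NS}}}text")
        audio = par.find(f"{{{SMIL_NS}}}audio")
        if text is None or audio is None:
            continue  # this sketch keeps only pairs with both text and audio
        clips.append({
            "id": par.get("id"),
            "text_src": text.get("src"),
            "audio_src": audio.get("src"),
            # Clock values are kept as strings; parsing them is out of scope here.
            "clip_begin": audio.get("clipBegin"),
            "clip_end": audio.get("clipEnd"),
        })
    return clips


clips = parse_clips(SAMPLE_SMIL)
```

Each resulting entry ties one text fragment to one audio clip, which is the unit that the playback requirements in this document operate on.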
Where suitable synthetic speech is not available to the reading system for presenting a Media Overlays publication non-visually, the audio content of the publication and its presentation to the user must, as far as possible, provide access equivalent to that available to visual users of the same content.
In other words, the Media Overlays playback behaviour should be capable of functioning as a replacement for the text-to-speech output provided by reading systems or assistive technology. This is important for the following reasons:
- Some people with cognitive disabilities or hearing disabilities find it difficult to perceive synthetic voices.
- More than 80% of people with print disabilities live in developing countries, where quality text-to-speech engines are not available for most of the languages.
- Some languages do not have any written script, which rules out the use of text-to-speech engines.
- The text of the material may not exist in written form (for example, a recording of a presentation that has no transcript). For such presentations, navigation markup of the audio can make the content much more useful to its audience than a single audio file.
- Some reading systems do not incorporate a text-to-speech component, and use prerecorded audio prompts for providing non-visual access to the user interface.
1. Indication of availability of Media Overlays
The reading system should indicate the availability of Media Overlays as soon as a Media Overlays publication is loaded.
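One possible way to detect availability at load time is to look for manifest items that carry a media-overlay attribute in the Package Document. The sketch below assumes the Package Document is already available as a string; the sample manifest and the helper name overlay_items are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Hypothetical Package Document fragment, for illustration only.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch1" href="chapter1.xhtml"
          media-type="application/xhtml+xml" media-overlay="ch1_overlay"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch1_overlay" href="chapter1.smil"
          media-type="application/smil+xml"/>
  </manifest>
</package>"""


def overlay_items(opf_xml):
    """Map each manifest item id to the id of its Media Overlay, if it has one."""
    root = ET.fromstring(opf_xml)
    return {
        item.get("id"): item.get("media-overlay")
        for item in root.iter(f"{{{OPF_NS}}}item")
        if item.get("media-overlay")
    }


overlays = overlay_items(SAMPLE_OPF)
has_media_overlays = bool(overlays)  # drives the "audio available" indication
```

How the indication is presented (an icon, an audio prompt, an enabled play button) is left to the reading system.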
2. Table of contents
The table of contents consists of the list of heading labels in the Navigation Document. Each heading label has a hyperlink pointing to the respective heading in the Content Document. The Navigation Document should have a Media Overlay that maps each heading label to its corresponding audio clip. The syntax is the same as that of the Media Overlays used for Content Documents, but the reading system should provide special playback behaviour for the table of contents, as follows:
- The playback should start automatically as soon as a heading label receives keyboard or touch cursor focus.
- The playback should stop by itself after playing the audio corresponding to the selected heading label.
- If the user moves keyboard or touch cursor focus to another heading label while a heading label is being played, the current playback should stop immediately and the newly focused heading label should be played.
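The three rules above can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the class name TocPlayback, its method names and the event log are hypothetical, and a real reading system would start and stop an audio element where this code appends to the log.

```python
class TocPlayback:
    """Plays exactly one heading label at a time, driven by focus events."""

    def __init__(self, clip_for_label):
        self.clip_for_label = clip_for_label  # heading label id -> audio clip
        self.playing = None                   # id of the label being played
        self.log = []                         # stands in for real audio calls

    def on_focus(self, label_id):
        # Rule 3: moving focus interrupts whatever is currently playing.
        if self.playing is not None:
            self._stop()
        # Rule 1: playback starts as soon as a label receives focus.
        if label_id in self.clip_for_label:
            self.playing = label_id
            self.log.append(("play", label_id))

    def on_clip_ended(self):
        # Rule 2: playback stops after the focused label's clip; no auto-advance.
        if self.playing is not None:
            self._stop()

    def _stop(self):
        self.log.append(("stop", self.playing))
        self.playing = None


toc = TocPlayback({"toc-1": "nav.mp3#0-2", "toc-2": "nav.mp3#2-5"})
toc.on_focus("toc-1")   # starts toc-1
toc.on_focus("toc-2")   # interrupts toc-1, starts toc-2
toc.on_clip_ended()     # toc-2 finishes and playback stops
```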
3. Content Documents
3.1. The reading system must provide commands for the following operations:
- Playback command to continuously play the audio of all the Content Documents in the spine, until the end of the publication. If the Navigation Document is included in the spine, then the continuous playback command should play the Navigation Document as well. This is similar to reading the table of contents in a publication.
- Navigation command to move playback to next phrase and previous phrase (next and previous audio clip).
- Navigation command to move playback to next section and previous section.
- Navigation command to move playback to next page-marker and previous page-marker. (The page markers are the static page marks that are provided by the publisher.)
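The navigation commands listed above can be sketched over a flat, ordered list of clips. The Clip fields, the Navigator class and the choice to stay in place when no destination exists are assumptions made for this example, not behaviour mandated by the specification.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    text_id: str
    section: int                  # index of the section containing the clip
    is_page_marker: bool = False  # True for publisher-provided page marks


class Navigator:
    def __init__(self, clips):
        self.clips = clips
        self.index = 0

    def _move_to(self, index):
        # Stay where we are when no destination was found (index is None).
        if index is not None:
            self.index = index
        return self.clips[self.index]

    def _find(self, indices, predicate):
        return next((i for i in indices if predicate(self.clips[i])), None)

    def next_phrase(self):
        return self._move_to(min(self.index + 1, len(self.clips) - 1))

    def previous_phrase(self):
        return self._move_to(max(self.index - 1, 0))

    def next_section(self):
        current = self.clips[self.index].section
        after = range(self.index + 1, len(self.clips))
        return self._move_to(self._find(after, lambda c: c.section != current))

    def previous_section(self):
        current = self.clips[self.index].section
        before = range(self.index - 1, -1, -1)
        i = self._find(before, lambda c: c.section != current)
        if i is not None:
            # Rewind to the first clip of that previous section.
            while i > 0 and self.clips[i - 1].section == self.clips[i].section:
                i -= 1
        return self._move_to(i)

    def next_page(self):
        after = range(self.index + 1, len(self.clips))
        return self._move_to(self._find(after, lambda c: c.is_page_marker))

    def previous_page(self):
        before = range(self.index - 1, -1, -1)
        return self._move_to(self._find(before, lambda c: c.is_page_marker))


nav = Navigator([
    Clip("h1", 0), Clip("s1", 0), Clip("pg2", 0, True),
    Clip("h2", 1), Clip("s2", 1), Clip("pg3", 1, True), Clip("s3", 1),
])
visited = [
    nav.next_phrase().text_id,       # s1
    nav.next_section().text_id,      # h2
    nav.next_page().text_id,         # pg3
    nav.previous_section().text_id,  # h1
]
```

Continuous playback would then amount to calling next_phrase each time a clip ends, until the last clip of the last Content Document in the spine has been played.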
3.2. Synchronization of text with the corresponding audio
- The reading system should highlight the text chunks that are synchronized with the playing audio.
- The highlight should be consistently synchronized with audio playback throughout the publication. When continuous playback moves to the next Content Document, the highlight should also move seamlessly to the newly focused Content Document.
- The highlighted text should always be visible on the screen.
- If the user gives a navigation command to move to a destination, then both the audio playback and the text highlight should begin from that destination.
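Keeping the highlight in step with the audio comes down to finding which clip covers the current playback time. The following minimal sketch assumes the clips of one audio file are sorted by start time and given in seconds; the function and class names are hypothetical, and the events list stands in for the actual highlighting and scroll-into-view calls.

```python
from bisect import bisect_right


def active_clip_index(begins, ends, t):
    """Return the index of the clip whose [begin, end) range contains t, or None."""
    i = bisect_right(begins, t) - 1
    if i >= 0 and t < ends[i]:
        return i
    return None


class Highlighter:
    def __init__(self, text_ids, begins, ends):
        self.text_ids, self.begins, self.ends = text_ids, begins, ends
        self.current = None
        self.events = []  # one entry per highlight change

    def on_time_update(self, t):
        i = active_clip_index(self.begins, self.ends, t)
        target = self.text_ids[i] if i is not None else None
        if target != self.current:
            # A real reading system would move the highlight here and
            # scroll the highlighted element into view.
            self.current = target
            self.events.append(target)
        return self.current


highlighter = Highlighter(["s1", "s2", "s3"], [0.0, 2.5, 6.0], [2.5, 6.0, 9.0])
for t in (0.1, 1.0, 2.5, 3.0, 6.2):
    highlighter.on_time_update(t)
```

Because the highlight changes only when the active clip changes, frequent time updates from the audio player do not cause redundant repaints.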
3.3. Remember last reading position
The reading system should allow users to pause playback, and audio rendering should resume from the pause position when the play command is given again. The pause position should also be retained when the book is closed, so that playback resumes from it when the book is opened again. In other words, the reading system should keep a bookmark of the last reading position.
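A minimal sketch of retaining the last reading position across sessions follows. It assumes a JSON file as the store and a position made of a clip id plus an offset in seconds; the file name, publication identifier and function names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_position(store, publication_id, position):
    """Record the pause position for one publication in a JSON file."""
    data = json.loads(store.read_text()) if store.exists() else {}
    data[publication_id] = position
    store.write_text(json.dumps(data))


def load_position(store, publication_id):
    """Return the saved position, or None if the publication was never paused."""
    if not store.exists():
        return None
    return json.loads(store.read_text()).get(publication_id)


# Usage: pause, close the book, then reopen it.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "positions.json"
    save_position(store, "urn:uuid:example-book", {"par_id": "p42", "offset": 1.75})
    restored = load_position(store, "urn:uuid:example-book")
    missing = load_position(store, "urn:uuid:other-book")
```

On reopening, the reading system would seek to the restored clip and offset before starting playback, and start from the beginning when no position was saved.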
3.4. Skippable structures
If the user has switched off playback of skippable structures, such as sidebars, footnotes and pages, in the reading system preferences, the reading system should not play those structures during continuous playback.
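Skipping can be sketched as a filter applied to the playback order. The example assumes each clip already carries the set of epub:type values that apply to it, including those inherited from enclosing structures; the clip ids and the function name are hypothetical.

```python
def continuous_playback_order(clips, skipped_types):
    """Return the ids of the clips to play, leaving out switched-off structures."""
    skipped = set(skipped_types)
    return [clip["id"] for clip in clips if not (clip["types"] & skipped)]


# Hypothetical clip list; "types" holds the epub:type values that apply.
sample_clips = [
    {"id": "p1", "types": set()},
    {"id": "pg5", "types": {"pagebreak"}},
    {"id": "p2", "types": set()},
    {"id": "fn1", "types": {"footnote"}},
    {"id": "sb1", "types": {"sidebar"}},
    {"id": "p3", "types": set()},
]

# The user has switched off page breaks and sidebars, but not footnotes.
order = continuous_playback_order(sample_clips, {"pagebreak", "sidebar"})
```

With nothing switched off, the same function returns every clip in its original order.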
3.5. Escapable structures
The reading system should provide a command to jump past the playback of escapable structures. When a user encounters an escapable structure such as a table, list or sidebar during playback, it should be possible to move directly to the item that follows the structure, for example to the content after the end of a table.
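The escape command can be sketched as a search for the first clip after the current structure. This example assumes that escapable structures are not nested and that each clip records the id of its enclosing escapable structure, or None; the field name "escapable" and the function name are hypothetical.

```python
def escape_index(clips, current):
    """Return the index of the first clip after the current escapable structure.

    Returns the current index when the clip is not inside an escapable
    structure, and None when the structure runs to the end of the clip list.
    """
    container = clips[current]["escapable"]
    if container is None:
        return current
    i = current
    while i < len(clips) and clips[i]["escapable"] == container:
        i += 1
    return i if i < len(clips) else None


# Hypothetical clip list: a paragraph, three table cells, then a paragraph.
table_clips = [
    {"id": "p1", "escapable": None},
    {"id": "cell1", "escapable": "table1"},
    {"id": "cell2", "escapable": "table1"},
    {"id": "cell3", "escapable": "table1"},
    {"id": "p2", "escapable": None},
]

target = escape_index(table_clips, 2)  # user escapes while cell2 is playing
```

Playback and the text highlight would then continue from the returned clip, here the paragraph that follows the table.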