Daisy 3 Production Tool Requirements

Version: 2004-10-12

Status of document: version 1.0, following DC membership review and comment period during September 2004.

Requirements grouped by Activities

DTB Creation

Ability to create a simple source file (audio NCX fileset) from within the production tool for the creation of simple Audio/NCX books. [rejected]

Initial Generation

Import existing DTBOOK source document (full or partial text)
Upon DTBOOK source document import, make statements on how the xml grammar should be represented in the smil presentation and in the NCX
import ToC from DTBOOK source document (creating an audioNCX fileset)
import existing time-based data (audio); this process automatically creates structure. Option to have each imported audio file become a heading. Option to have audio file name become heading item text.
import full or partial existing Daisy 3 DTB into a blank project
start blank project (no preexisting data, but valid)
import bibliographic metadata from external xml document
import bibliographic metadata from dtbook document
specify a UID of a DTB
randomly generate a UID for a DTB
Do DTBOOK markup on raw text files [rejected]
Text import via external converter that takes publisher-provided format or OCR output, creating a text-based DTB [rejected]
automatically import metadata from the source document into the project upon project generation. [rejected]
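
The two UID requirements above (a user-specified UID and a randomly generated one) could be sketched as follows. This is a minimal illustration only: the function name `make_dtb_uid` and the `urn:uuid:` prefix are assumptions made for the example, not something this document prescribes.

```python
import uuid


def make_dtb_uid(user_supplied=None):
    """Return the UID given by the user, or a randomly generated one.

    Hypothetical helper: the name and the 'urn:uuid:' prefix are assumptions.
    """
    if user_supplied is not None:
        uid = user_supplied.strip()
        if not uid:
            raise ValueError("a user-specified UID must not be blank")
        return uid
    # uuid4() produces a random 128-bit identifier.
    return "urn:uuid:" + str(uuid.uuid4())
```

A tool would most likely also want to check a user-specified UID against whatever uniqueness rules the producing organisation applies; that check is left out here.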

New Content Creation

Live Audio Recording

record from live audio input
easily modify input levels during narration
record directly to wav and mp3 formats
fundamental actions/behavior available while recording: stop, record, playback, navigation
retake modes while recording: insert, overwrite, replace user selected portion (between punch in and mark, scope selection)
exposure of clip errors/warnings during recording
Total recorded time in current section, subsections included.
define a segment as being skippable or escapable during recording [applies primarily to audioNCX DTBs]
force a new SMIL time container during recording
force a new SMIL audio element (phrase) during recording
Record in double speed using one, two or four simultaneous inputs [rejected]
Transport functions suited to narrator editing: momentary cuing with sound/playback function
Calibration for the background noise level on the audio input [rejected]
Prevention of "empty" phrase creation at start of title/section.
Possibility to use different audio quality settings (sample rate/channels/bit rate) within a project, even within one section.
Not only detect audio clipping, but also detect and warn of levels that are consistently too low.
warning or info display if in overwrite mode in a section that already contains audio
ability to set automatic phrase detection during live recording
report providing a list of all clip errors/warnings generated by the tool, and ability to go directly to each from the report
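
The requirements above on exposing clip errors and on warning of levels that are consistently too low could be approximated as in the sketch below. It assumes audio arrives as blocks of float samples in the range -1.0 to 1.0. The thresholds (`clip_threshold`, `low_peak`, `low_run`) and the function names are illustrative assumptions, not values taken from this document.

```python
def check_levels(samples, clip_threshold=0.999, low_peak=0.1):
    """Classify one block of float samples in [-1.0, 1.0]."""
    if not samples:
        return "empty"
    peak = max(abs(s) for s in samples)
    if peak >= clip_threshold:
        return "clipping"
    if peak < low_peak:
        return "too_low"
    return "ok"


def level_warnings(blocks, low_run=3):
    """Return (block_index, message) pairs for a sequence of blocks.

    A 'consistently too low' warning is raised once, when `low_run`
    consecutive blocks have all been classified as too low.
    """
    warnings = []
    run = 0  # number of consecutive too-low blocks seen so far
    for i, block in enumerate(blocks):
        status = check_levels(block)
        if status == "clipping":
            warnings.append((i, "clipping"))
        run = run + 1 if status == "too_low" else 0
        if run == low_run:
            warnings.append((i, "consistently too low"))
    return warnings
```

The returned block indices are the kind of data a report could use to let the user go directly to each warning.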

Manual Import of Media into existing DTB project

Import of Audio

selection of audio files to import: point at audio files using file system browser
selection of audio files to import: point to part of external DTB to import audio (and nothing else) from
import one or several audio files simultaneously
prelisten to audio files before import
when several audio files are to be imported: define/alter sequence before import
select position where in the structure to insert the imported audio
Possibility to use different audio quality settings (sample rate/channels/bit rate) within a project, even within one section.
all codecs installed on the OS platform should be supported for import of audio.
Ability to import an audio file that spans multiple sections/headings.
the ability to import more than one WAV file at a time, with the ability to preview and order the files prior to actioning. [rejected]
The importing of existing timecoded audio [rejected]

Import/add of Text

text addition via: type/cut+paste text to be associated with selected audio; select new element properties: element name, attributes
text addition via: point to element in external Dtbook document; have the element and its children imported
import multiple DTBook source documents (full or partial text) to create a single valid DAISY 3 document. [rejected]
Ability to import text as a source file, and have automatic markup based on simple punctuation within the text.

Import of Images and Video

import video and images into defined range of SMIL presentation
import images into NCX
preview image and video before import
Preview, import several files at once, resize/crop after import, etc., as in the more elaborate description of importing audio. [rejected]

Automatic generation of Audio

select a portion of text (by range, dtbook element type, or for all empty sections within a certain range) and have an external TTS engine generate corresponding audio with synchronization

Automatic generation of Text

select a portion of audio (by range) and have STT generate corresponding text [rejected]

DTB Editing

editing actions should be atomic [pending]
delete selected portion or portions of the DTB; prompt with confirmation of delete before the deletion occurs; all related files updated upon deletion of selection portions; if invalidity would result from deletion, a warning and prompt should occur [rejected]
Ability to modify the source document from within the production tool. [rejected]

Modification of Media Content

import (of all spec supported mediatypes) possible at all times during production process
system to flag SMIL structure as having changed and provide a (wizard) utility to distribute the audio over the new SMIL structure
Eliminate the need to have an audio event follow a text event, such as an empty page. [rejected]

Modification of Text

modify text nodes of dtbook document
when text content is modified in dtbook, if corresponding element is within NCX: user prompt: want to edit corresponding NCX text
modify text nodes of NCX document
manually add/edit bibliographic metadata in OPF and DTBOOK

Modification of Audio

support native editing of wav and mp3
physically remove portions of audio not referenced from SMIL, NCX, or RESOURCE
Select audio for insertion in NCX label
make audiofiles sequential in relation to SMIL presentation
mark portion of audio (not necessarily file) for action performed by external wave editor
external editor to display only part of wave that is in smil presentation
Perform actions such as noise reduction, hiss removal, declipping, normalization on selected portions of audio [rejected]
Perform beep tone and section (pause other than phrase) detection [rejected]
Perform transcoding of audio files of DTB; for whole DTB or selected portion thereof. [rejected]
Delete audio from NCX item, the audio of child-items inclusive. In current tools this can only be done per NCC item.
Editing of audio in mp3 will not result in (additional) loss of audio quality. [rejected]
Possibility to join more than two sequential audio events, even if they are in different audio files.
Possibility to insert/replace a portion of audio with new audio of any length and make corrections efficiently while in insert/replace mode. [rejected]
audio file naming options at build: scramble, hexadecimal, numerically and with meaningful (per NCX heading) names. [rejected]

Modification of Images and Video

Modification of Structure

import full or partial external DTB into active DTB: option to insert or replace
create a range of pages and insert at selected position in presentation
for any structural change made: have an automatic, instant grammatical check of the validity of the change, with meaningful prompting
structural changes in NCX, DTBOOK should result in corresponding changes to affected documents of fileset
Ability to "clean" a Resource file so that only resources that are part of the current DTB remain.
When placing page numbers, they should increment automatically. When a page event is moved or deleted, the number order should refresh to remain in sequence.
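
The page-number requirement above (automatic increment, and refreshing the sequence when a page event is moved or deleted) might be sketched as below. The event representation (a dict with `time` and `page` keys) and the function name are assumptions made for the example. Only plain sequential numbering is covered; front matter or special page types are not.

```python
def renumber_pages(events, first_number=1):
    """Return page events in timeline order with sequential page numbers.

    `events` is a list of dicts with a 'time' key (seconds); any existing
    'page' value is ignored and replaced.
    """
    ordered = sorted(events, key=lambda e: e["time"])
    return [
        {"time": e["time"], "page": first_number + i}
        for i, e in enumerate(ordered)
    ]
```

Calling this after every move or delete keeps the numbers in sequence without the user renumbering by hand.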

Modification of DTBook Structure

modify structure of dtbook document (add element, remove element, element reordering, change element name, change attributes) - with corresponding changes in SMIL and NCX; when reorder: optional making of consequential changes in Spine, smil and ncx
modify granularity of dtbook document - with corresponding changes in SMIL
apply resources to dtbook elements

Modification of SMIL Structure

apply, redo, and remove phrase detection on selected portion of DTB
modify phrase detection parameters manually
have analysis of audio of a selected portion of DTB suggest/change phrase detection parameters automatically
manually adjust smil audio element clip-begin and clip-end time values
manual cut, copy, paste and delete of selected audio (a phrase, a range)
reordering, removing, adding of time containers
apply escapability and skippability to selected time container(s)
change attributes of escapable and skippable structures on a one by one basis
apply resources for escapable and skippable structures
Create links in the SMIL which reference other parts of the book.
Insert a pause of specified or unspecified length, associated with a SMIL element.
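
Phrase detection and its parameters, as referred to in the requirements above, could work roughly as in this sketch. It is a simple silence-threshold approach chosen for illustration. The document does not say which algorithm a tool should use, and the parameter names (`silence_level`, `min_silence`) are assumptions. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
def detect_phrases(samples, rate, silence_level=0.02, min_silence=0.3):
    """Return (clip_begin, clip_end) pairs in seconds for non-silent runs.

    A phrase is closed once at least `min_silence` seconds of samples
    stay below `silence_level` in absolute value.
    """
    min_gap = round(min_silence * rate)  # silence length in samples
    phrases = []
    start = None      # index of the first loud sample of the open phrase
    last_loud = None  # index of the most recent loud sample
    for i, s in enumerate(samples):
        if abs(s) >= silence_level:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_gap:
            phrases.append((start / rate, (last_loud + 1) / rate))
            start = None
    if start is not None:
        # Audio ended while a phrase was still open.
        phrases.append((start / rate, (last_loud + 1) / rate))
    return phrases
```

The returned pairs correspond to the clip-begin and clip-end values that the requirements also ask to be manually adjustable. Changing `silence_level` or `min_silence` and running the function again is one way to redo detection on a selected portion.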

Modification of NCX Structure

manual edit of clipbegin and clipend of NCX audio
edit textnodes of NCX text
create and delete sections of navMap
merge (two or more) adjacent sections of navMap
split adjacent sections of navMap
reordering and releveling of navPoints, one or several at once; optional making of consequential changes in DTBOOK; when reorder: optional making of consequential changes in Spine, smil and dtbook
create navTargets dynamically during recording
remove navlists and navTargets, one or several at once
add new navlist by semantic type
add navTargets by semantic type
reorder navTargets in a navList, one or several at once
manually identify navTargets and create the list that they go in
apply resources to NCX elements, singular and grouped
Possibility to copy-paste parts of the ncx structure to avoid manually duplicating large part of ncx structure.

DTB Navigation

DTB Navigation using NCX

navigate via navLists and navMap of NCX [global navigation]
find NCX audio that is silence only

DTB Navigation using DTBook

navigate via DTBook [local navigation]
navigate via DTBOOK element with particular name

DTB Navigation using SMIL Structure

navigate via SMIL time containers (previous/next)
navigate by SMIL audio element (previous/next 'phrase')
navigate by SMIL text element (previous/next)
navigate by skippable/escapable structure by type (previous/next)
navigate by time containers with text without synchronized audio (previous/next)

DTB Navigation using SMIL Audio

navigate by silent portion of audio (previous/next)
navigate by audio clip overload error (previous/next)
auto-mark the stop/start points within a wave file in each NCX, and ability to go directly to selected marks [pending]
navigate by recording stop and start point within audio files

DTB Navigation using Bookmark DTD

[using bookmark DTD] navigate by bookmark and hilite (next/previous)
[using bookmark DTD] navigate by bookmarks that contain certain value on label attribute (next/previous)
[using bookmark DTD] navigate by bookmarks and hilite that contain certain substring of note text (next/previous)
[using bookmark DTD] find lastMark

Project Management and Administration

Export and import of all application settings using external open format
all program settings must be exportable to permit importing into other instances of the program. [rejected]

User Management

support for user profiles
administrator and narrator user profile types
narrator user type has default restriction on performing actions that impacts project integrity
individual user profiles (that implement user types) should store: recording settings, GUI configuration, recent project list, last position within DTB (using lastMark)
Possibility to escape/not use the logon screen; a default user profile will be used.
ability to set user rights on all functions and operations [pending]
All user settings and program preferences should be saved so that they carry over to each new project. These should be saved as defaults. [rejected]

Content Management

assign passages to specific narrators; prohibit narrators from working on sections not assigned to them
User Log Off and Log On available without restart of application
track files that have been changed by external processes invoked from the system, and provide meaningful prompts
explicit save project feature, causing conformance creation and validity checks
recall saved state upon reopening; last position (lastmark) using bookmark DTD, recording settings
support all aspects of bookmark DTD (bookmarks, text/audio notes, highlights etc)
import bookmarks
merge bookmark files
modify and save recording settings
configurable recent projects list
system to provide information of location of all files associated with project, including restore points
Ability to explicitly trigger a full validation of DTB
built in quality assurance (QA) player [rejected]
mark sections of DTB as finished and have a clear and accessible display of this property
automatic save upon exiting program
ability to annotate bookmarks [rejected]
User (producer) marks, placed during recording (on the fly) or in editing mode. [rejected]
Ability to order the names of audio files based on user defined criteria [rejected]
Ability to choose media type/size when building in order to span multiple media. [rejected]
Require the ability to build an 'incomplete' book for distribution

Backup

user triggered restore point
option to autocreate restore points at set interval
compulsory autocreation of restore point at session start
compulsory autocreation of restore point previous to calling any external application
restore points should cross session boundaries
restore points have a date and time associated with them
restore points to include support for restoring wave files that have been physically altered
creating of restore points shall be fast
restore point history: within size limit set by user
automatic documentation/log if crash occurs
previews of restore points (or summaries of project state at the restore points) to be presented to the user before rollback begins
A roll-back function (with user defined roll-back time) [rejected]
Ability to create a manual restore point (not time dependent). [rejected]
All types of restore points should be capable of being made during recording, processing, etc. [rejected]
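
The requirements above that restore points carry a date and time and that the history stays within a size limit set by the user might be modelled as in the sketch below. The `RestorePoint` class, its fields, and the oldest-first discard policy are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RestorePoint:
    created: datetime      # date and time associated with the restore point
    size_bytes: int        # disk space the restore point occupies
    label: str = ""        # e.g. "session start" or "user triggered"


def prune_history(points, max_total_bytes):
    """Keep the newest restore points whose combined size fits the limit.

    Returns the kept points in chronological order; older points that do
    not fit are dropped.
    """
    kept = []
    total = 0
    for p in sorted(points, key=lambda p: p.created, reverse=True):
        if total + p.size_bytes > max_total_bytes:
            break
        kept.append(p)
        total += p.size_bytes
    return sorted(kept, key=lambda p: p.created)
```

A real tool would probably need to protect some restore points from pruning, such as the compulsory one made at session start. That policy is not covered here.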

Undo/Redo

multiple levels of undo/redo per session
information on currently active files and active fragments of files
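
Multiple levels of undo/redo within a session are commonly built on two stacks of reversible actions. The sketch below shows that general pattern only, under the assumption that every editing action can supply both an `apply` and a `revert` callable. The class name and the `limit` parameter are hypothetical.

```python
class UndoStack:
    """Two-stack undo/redo for actions given as (apply, revert) callables."""

    def __init__(self, limit=100):
        self._undo = []
        self._redo = []
        self._limit = limit  # maximum number of undo levels kept

    def do(self, apply, revert):
        """Perform an action and record it; this clears the redo history."""
        apply()
        self._undo.append((apply, revert))
        if len(self._undo) > self._limit:
            self._undo.pop(0)  # forget the oldest level
        self._redo.clear()

    def undo(self):
        """Revert the most recent action; return False if there is none."""
        if not self._undo:
            return False
        apply, revert = self._undo.pop()
        revert()
        self._redo.append((apply, revert))
        return True

    def redo(self):
        """Re-apply the most recently undone action; False if none."""
        if not self._redo:
            return False
        apply, revert = self._redo.pop()
        apply()
        self._undo.append((apply, revert))
        return True
```

Usage, with a list standing in for project state:

```python
text = []
stack = UndoStack()
stack.do(lambda: text.append("heading"), lambda: text.pop())
stack.undo()   # text is empty again
stack.redo()   # text contains "heading" again
```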

General Tool Usage

Copy-paste audio/text between multiple projects that are open simultaneously.
Possibility to set defaults that differ from "factory" defaults.

Learning the Tool

documentation provided in accessible and open format
context sensitive help
should cover all functions of the system
ability to translate helpfile
ability to associate additional (customized) helpfile to help implementation

Tool Installation

detection of previous installs
localization requirements apply to installer
accessibility requirements apply to installer

Using the tool with other (external) tools

call external programs and have these programs perform operations on DTB data; selected portions thereof, or whole DTB
provide system status (recording, paused, busy etc) via programming interface
DTB must be valid throughout production process
Interface for communicating with peripheral devices (e.g. remote control).
Allow project data save to/load from SQL Server (or equivalent) database. [rejected]
Include touch-screen support (USB/Serial). [rejected]
interoperability with library and production databases or other tools within a networked environment [rejected]
export project metadata to networked resources such as library repository or holdings [rejected]
If audio processing is to be done externally, then it is imperative that the production tool have DirectX and VST plugin capability.
Any plug-in should be applicable to live recording as well as editing functions.

Adapting the Tool to the Environment

shall run on Linux Gnome
shall run on Mac OS X
shall run on Windows operating systems currently supported by Microsoft
data stored to disk be stored in open, non-proprietary format
ability to select input and output audio devices
network support: user profiles and projects
network support: UNC paths

Adapting the Tool to the User

support APIs needed by major accessibility products
compatibility with W3C ATAG/UAAG
modify fontsize and colors of all controls of software
expose or hide viewports, controls in all main windows
external definition of configuration and layouts, stored in user profile, including accessibility configuration
availability of some predefined configurations and layouts, including accessibility configurations
wave form display to include markers describing smil structure
textual content display that supports CSS rendering of XML with realtime contextual highlighting
vu meter with dual viewport: graphic and textual
coexistence of wave form display and textual content display
multiple projects open simultaneously
Allow Page and Chapter key assignment to any keys, including USB keypads and other external serial/USB devices. [rejected]
an annunciator that indicates processing is occurring and the program has not crashed (progress bar indicating estimated time until completion and an audio cue, or some other cue) for all non-interactive operations that are not instantaneous
save narrator specific phrase detection settings by user (narrator)
Audio should be displayed as a waveform on the screen, and display event markers, etc. [rejected]
Allow editing functions directly to the audio display, which automatically updates smil refs, etc.
Ability to select an event marker (e.g. page or section) that is displayed on the waveform and drag it forwards or backwards. If it is dragged beyond another event, a warning should be given before it is actioned.
Ability to customise the screen to display only the minimal amount of icons/shortcuts needed. This should be saved as part of the program settings for all projects. [rejected]

Adapting the Tool to the Locale

support for unicode encoded xml documents
ability to import (with transcoding) documents in commonly used character sets
mix scripts in textual content
specify and modify language information of documents and fragments of documents
all text strings of GUI externally defined in open format that can be translated into local language
external translation tool to support updates to software without need for complete retranslation
all GUI controls able to display any operating system supported script/font
ability to choose any font supported by the OS for textual content display
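
Importing documents in commonly used character sets with transcoding, as required above, can be as small as a decode and re-encode step. The sketch assumes the source encoding is already known (from user input or detection, which is not shown) and that UTF-8 is the target. The function name is hypothetical.

```python
def transcode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode `data` from `source_encoding` to UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in the source
    encoding, and LookupError if the encoding name is unknown.
    """
    text = data.decode(source_encoding)
    return text.encode("utf-8")
```

For XML input, the `encoding` pseudo-attribute in the XML declaration would also have to be updated to match the new encoding. That step is left out here.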

Requirements grouped by Categories

Extensibility

Internationalization

Localization

Platform Support

GUI

Accessibility

Validity

Installation

Documentation/Help

Metadata

Import

Narrator

Audio Technician

Safety, Error and Mistake handling

Proofing

XML

Graphics (images, video)

Text-to-speech

Analog to Digital

Miscellaneous Properties

Requirements