The article ‘Reading the DAISY Way’ explained in plain language what DAISY is and why it is important both for those who are unable to read standard print and for publishers. This article explains what is inside a DAISY Digital Talking Book (DTB) and introduces some exciting future directions for DAISY.
The largest organisations in the world providing accessible library services are members of the DAISY Consortium. DAISY membership spans the globe, with Australia, Canada, Denmark, Germany, Japan, South Korea, the Netherlands, New Zealand, Norway, Spain, Sweden, Switzerland, the United Kingdom and the United States represented on the Board of Directors. The Board sets the policies and develops the strategic plans that drive the DAISY Consortium forward. A complete list of the organisations and companies belonging to the Consortium is provided on the DAISY website. Members around the world are producing DAISY books. Friends of DAISY, who are for-profit developers of hardware and software technologies, design and distribute related tools and systems that support DAISY playback, production, implementation and so on.
Although approaches to production and the types of DAISY books being created both vary, all parties are encouraged to ensure that their collections are valid and conform to DAISY specifications. Some organisations have built their DAISY collections rapidly by converting large numbers of their analogue master collections into DAISY DTBs.
The DAISY standard has evolved. Where DAISY 2.0 was based on HTML, DAISY 2.02 is based on XHTML, SMIL (Synchronized Multimedia Integration Language — a W3C Standard), audio and images. DAISY/NISO is based on XML, SMIL, audio and images.
DAISY is a multimedia standard. DAISY 2.02 has been the implementation standard since early 2001, and many organisations still use this fileset specification. The DAISY/NISO 2005-2 Standard, first approved in 2002 as ANSI/NISO Z39.86, supports the production of richer, further enhanced DAISY DTBs.
The DAISY Consortium is the designated maintenance agency for DAISY/NISO 2005-2. Organisations and companies producing DAISY DTBs are working to move to the DAISY/NISO standard, which is the focus of all new Consortium technical development (DAISY 2.02 support is strictly in maintenance mode). Neither DAISY 2.02 nor DAISY/NISO 2005 is proprietary; both are open standards.
It is the XML ‘tree structure’ (or parent-child/sibling-sibling relationship) that gives DAISY DTBs their nested navigation and provides readers with direct access to the content of books never before possible in audio or print books. The markup in a DAISY DTB content or textual component file conveys the structure and semantics of the information. The attributes within the XML elements contain additional information about the data being presented.
The heart of the DAISY/NISO standard, and therefore of DAISY DTBs, is a DTD called dtbook 2005‑2. A DTD defines a collection of allowed element and attribute names and the relationship between them; that is, which element is allowed as a child of each element, which attribute is allowed on which element and so on. Designed to identify common elements found in books and other publications, dtbook 2005‑2 defines the markup for the textual content of a DAISY DTB. It is a machine-readable list of allowable tags, the attributes that may be applied to them, and rules about where the tags may be used.
Also designed to facilitate conversion to an XML vocabulary, dtbook 2005‑2 borrows heavily from XHTML and adds the elements needed to ‘describe’ a book (such as page numbers, sidebars and annotations). It is a simple vocabulary, containing only 81 elements. While other vocabularies, such as DocBook, the Text Encoding Initiative (TEI) and TEI Lite, also describe book content, dtbook 2005‑2 was specifically designed to be simple and to reflect the structure of a book.
The maintenance of the DAISY/NISO standard includes availability of earlier versions of the DTDs, ensuring that a book produced previously will remain valid even though a newer version of the DTD has been recommended.
The DAISY Structure Guidelines: Part 1 (available from the DAISY website) explain that:
‘The DAISY DTB is a collection of digital files… which provides an accessible representation of the printed book for individuals who are blind, visually-impaired, or print-disabled… The structure of the book is designated by the XML tags and is accessible to the reader by use of a browser or a playback device.’
The degree to which the ‘body’ of a DAISY DTB is structured and marked up will determine the navigability and usefulness of the book. The structure of any DAISY DTB should be at the same level as the structure of the original publication. A novel, for example, usually has a one-level structure consisting of chapters. In contrast, a textbook or cookery book may have several structural levels that should be incorporated into the DAISY DTB, enabling the reader to navigate through sections quickly and easily.
From the Guidelines again:
‘However, extensive markup may demand significant resources. The level of structure and markup chosen by the producer will therefore depend upon striking a balance between the resources available, the requirements of the publication, and the needs of the visually impaired and print-disabled readers.’
Some organisations may be limited by resources available for production.
The DTB structure created by the producer determines the level of navigability available to the end user. The greater the structural markup, the greater the number of navigation points available to the reader. A DAISY/NISO 2005 DTB consists of some or all of the following files:
The NCX file is the navigation interface for the user, providing a view of all of the navigation points within the DTB. Each navigable point in this file is linked through the SMIL file to the exact corresponding point in the audio and XML textual content files, giving the reader direct navigation to those points within the DTB.
From the DAISY/NISO 2005 Standard:
‘The Navigation Control file for XML applications (NCX) provides the reader efficient and flexible access to the hierarchical structure of a DTB as well as direct access to selected elements such as page numbers, notes, figures, etc.’
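The chain from a navigation point through SMIL to text and audio can be illustrated with a minimal Python sketch. It is not a reading-system implementation: the markup is heavily simplified (namespaces, metadata and most required attributes are omitted), and the file names, ids and the `resolve` helper are invented for illustration. The DAISY/NISO Standard remains the authority on the real file formats.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical NCX: one navigation point pointing into a SMIL file.
NCX = """<ncx><navMap>
  <navPoint id="nav1" playOrder="1">
    <navLabel><text>Chapter 1</text></navLabel>
    <content src="book.smil#par1"/>
  </navPoint>
</navMap></ncx>"""

# Simplified, hypothetical SMIL: one <par> synchronising a text fragment
# with an audio clip.
SMIL = """<smil><body><seq>
  <par id="par1">
    <text src="book.xml#ch1"/>
    <audio src="chapter1.mp3" clipBegin="0:00:00" clipEnd="0:00:04.5"/>
  </par>
</seq></body></smil>"""


def resolve(nav_label, ncx_text, smil_files):
    """Follow a navigation label through SMIL to its text and audio targets."""
    ncx = ET.fromstring(ncx_text)
    for nav_point in ncx.iter("navPoint"):
        if nav_point.findtext("navLabel/text") != nav_label:
            continue
        # The content src has the form "file.smil#id".
        src = nav_point.find("content").get("src")
        smil_name, _, fragment = src.partition("#")
        smil = ET.fromstring(smil_files[smil_name])
        for par in smil.iter("par"):
            if par.get("id") == fragment:
                audio = par.find("audio")
                return {
                    "text": par.find("text").get("src"),
                    "audio": audio.get("src"),
                    "clipBegin": audio.get("clipBegin"),
                }
    return None  # label not found, or its SMIL target is missing


target = resolve("Chapter 1", NCX, {"book.smil": SMIL})
```

Here `target` names the text fragment and the audio clip that a player would jump to when the reader selects ‘Chapter 1’.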
Requirements for tagging and structuring
The following tags are required for a book to be valid under dtbook 2005‑2. The complete DAISY DTB is surrounded by the <dtbook></dtbook> tag pair. Within these, the <head></head> and <book></book> tag pairs must be present, in this order. The <head> tags identify information about the book that is separate from the content. The <book> tags enclose the content of the book.
Within <book>, the content may be divided into three sections: front matter, body matter and rear matter, presented in that order and tagged with the elements <frontmatter>, <bodymatter>, and <rearmatter>.
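The required skeleton can be checked with a few lines of Python. This is only a sketch of the ordering rules described above, not a substitute for validation against the DTD. The sample document is deliberately stripped down (no XML declaration, DOCTYPE, namespace or metadata), the `check_skeleton` helper is hypothetical, and it does not test which of the three sections are mandatory.

```python
import xml.etree.ElementTree as ET

# Simplified skeleton for illustration only.
SAMPLE = """<dtbook>
  <head/>
  <book>
    <frontmatter/>
    <bodymatter/>
    <rearmatter/>
  </book>
</dtbook>"""

SECTION_ORDER = ["frontmatter", "bodymatter", "rearmatter"]


def check_skeleton(xml_text):
    """Return a list of problems found in the top-level structure."""
    root = ET.fromstring(xml_text)
    if root.tag != "dtbook":
        return ["root element is not <dtbook>"]
    if [child.tag for child in root] != ["head", "book"]:
        return ["<dtbook> must contain <head> followed by <book>"]
    sections = [child.tag for child in root.find("book")]
    # Whatever sections are present must appear once each, in
    # front/body/rear order, with no other elements beside them.
    expected = [name for name in SECTION_ORDER if name in sections]
    if sections != expected:
        return ["sections of <book> are out of order or unexpected"]
    return []


problems = check_skeleton(SAMPLE)
```

An empty `problems` list means the skeleton follows the order described; swapping <head> and <book>, or placing <rearmatter> before <bodymatter>, produces a message instead.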
Most published materials are produced with some hierarchically arranged structural elements. Markup identifies the proper hierarchical structure in the DAISY DTB.
Levels describe the relative position of the major structural elements. Components at different levels in the hierarchy must be nested, that is, contained one within the other. A component at a lower level must be completely inside the component that is at a higher level. For example, a level 3 must be preceded by a level 2, and a level 2 by a level 1. A level 3 element that is not inside a level 2 element is invalid.
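The nesting rule lends itself to a short recursive check. The following Python sketch assumes the numbered level elements of dtbook (<level1>, <level2> and so on) and uses an invented cookery-book fragment. The `level_errors` helper is hypothetical and tests only this one rule.

```python
import xml.etree.ElementTree as ET

GOOD = """<bodymatter>
  <level1><h1>Soups</h1>
    <level2><h2>Cold soups</h2>
      <level3><h3>Gazpacho</h3></level3>
    </level2>
  </level1>
</bodymatter>"""

# A level 3 placed directly inside a level 1, skipping level 2.
BAD = "<bodymatter><level1><level3/></level1></bodymatter>"


def level_errors(element, parent_level=0):
    """Collect a message for every levelN not directly nested in level(N-1)."""
    errors = []
    for child in element:
        tag = child.tag
        if tag.startswith("level") and tag[5:].isdigit():
            level = int(tag[5:])
            if level != parent_level + 1:
                errors.append(f"<{tag}> found inside level {parent_level}")
            errors.extend(level_errors(child, level))
        else:
            # Headings, paragraphs etc. do not change the current level.
            errors.extend(level_errors(child, parent_level))
    return errors


good_result = level_errors(ET.fromstring(GOOD))
bad_result = level_errors(ET.fromstring(BAD))
```

The well-nested fragment yields no errors, while the second fragment is reported because its level 3 sits directly inside a level 1.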
Well-formedness and validity are fundamental to XML and thus to DAISY DTBs. Well-formedness refers to the syntactic correctness of the document, while validity refers to its grammatical correctness, that is, its conformance to the rules of the DTD. A document that is not well formed cannot be valid, but a document that is well formed may be invalid. Every document used within a DTB must be valid to the specification, including XML validity.
It is critical that DAISY DTBs produced around the world are valid and can be upgraded to new standards. The DAISY Consortium has developed open-source, comprehensive validation tools to ensure interoperability and has a commitment to provide utilities which support migration from one standard to the next. Migration can only be ensured if DTBs are 100% valid and conformant. DAISY validation tools go well beyond XML validation: they evaluate all the interrelationships of the files within a DAISY DTB.
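The difference can be seen in a small Python sketch. Python's standard `xml.etree.ElementTree` parser checks syntax only and does not validate against a DTD, so the grammar side is represented here by a single hand-written rule (<head> before <book>). The helper names and sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text parses as XML at all (syntax only)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def head_precedes_book(xml_text):
    """Stand-in for one grammar rule: <dtbook> holds <head>, then <book>."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root] == ["head", "book"]


UNCLOSED = "<dtbook><head></dtbook>"             # <head> is never closed
WRONG_ORDER = "<dtbook><book/><head/></dtbook>"  # parses, but breaks the rule
CORRECT = "<dtbook><head/><book/></dtbook>"
```

`UNCLOSED` fails at the syntax stage and so can never be valid. `WRONG_ORDER` parses cleanly yet breaks the ordering rule, which is exactly the case of a well-formed but invalid document.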
The Consortium continues to develop production and migration tools to ensure that DAISY books produced yesterday and today are valid and can be migrated to new standards and different formats.
Many of the tools developed by the DAISY Consortium are open source and all DAISY tools currently under development are open-source projects. Individuals and organisations around the globe are participating in DAISY open-source tool and standards development. Adoption within the mainstream is one of the key goals of the Consortium and LGPL (Lesser General Public License) open-source licensing encourages reuse of the software by for-profit companies.
Validation tools
The DAISY Consortium has developed two validators, one for DAISY 2.02 and another for DAISY/NISO. Both are open source and are available on the DAISY website for download by individuals or organisations that wish to create valid DAISY DTBs. The DAISY validation process for a typical book runs through several thousand tests on the interrelated files within the DTB and goes far beyond XML validation.
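To give a flavour of what checking ‘interrelated files’ means, the Python sketch below runs one kind of cross-file test: every `src` reference must point to a file that exists in the fileset and, where a fragment is given, to an id that exists in that file. It is an illustration only and does not reflect how the DAISY validators are implemented. The fileset, file names and the `broken_references` helper are hypothetical, and the markup is simplified.

```python
import xml.etree.ElementTree as ET


def broken_references(fileset):
    """Report src references that do not resolve within the fileset.

    fileset maps file name -> XML text, or None for non-XML files
    such as audio.
    """
    parsed = {
        name: ET.fromstring(text)
        for name, text in fileset.items()
        if text is not None
    }
    problems = []
    for name, root in parsed.items():
        for element in root.iter():
            src = element.get("src")
            if src is None:
                continue
            target, _, fragment = src.partition("#")
            if target not in fileset:
                problems.append(f"{name}: missing file {target}")
            elif fragment and target in parsed:
                ids = {e.get("id") for e in parsed[target].iter()}
                if fragment not in ids:
                    problems.append(f"{name}: no id '{fragment}' in {target}")
    return problems


FILESET = {
    "book.smil": '<smil><par id="p1"><text src="book.xml#ch1"/>'
                 '<audio src="ch1.mp3"/></par></smil>',
    "book.xml": '<dtbook><book><level1 id="ch1"/></book></dtbook>',
    "ch1.mp3": None,  # audio file, present but not parsed
}

report = broken_references(FILESET)
```

With the complete fileset the report is empty; removing the audio file or renaming the id in the text file produces a message naming the broken link.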
The DAISY Regenerator, also open source, incorporates the DAISY 2.02 Validator and was developed to upgrade DAISY DTBs from DAISY 2.0 to DAISY 2.02 as well as to correct errors and problems identified in 2.02 books. Producers who have collections of books in DAISY 2.0 format are encouraged to upgrade their collections to 2.02 to enable migration to DAISY/NISO and thus future-proof their collections.
The Consortium and its Members and Friends have developed tools to support DAISY DTB production. Two current and very important DAISY Consortium projects are the DAISY Pipeline and Urakawa. Both of these open-source projects are future-facing and will support the implementation of DAISY/NISO 2005.
1. The DAISY Pipeline
This is a collaborative software development project hosted by the DAISY Consortium. There is a clear need for tools to transform content from one format to another as simply and economically as possible.
Required transformations include:
- Publisher file to DAISY master
- DAISY to Braille format
- DAISY 2.02 to DAISY/NISO
- DAISY/NISO to 2.02 (for distribution purposes)
The goal of the Pipeline is to create a single point of coordination for DAISY-related document/fileset transformation developers. It has been called the ‘ultimate in conversion tools’. Detailed information about the DAISY Pipeline is available on the DAISY website.
2. The Urakawa Project and Obi
The Urakawa Project is a collaborative effort by the DAISY Consortium, INRIA (l’Institut National de Recherche en Informatique et en Automatique), CWI (Centrum voor Wiskunde en Informatica) and NRCD (National Rehabilitation Center for Persons with Disabilities). Its objective is to develop a multimedia authoring software toolkit including an object-oriented abstract data model, an Application Programming Interface (API), a code library and at least one sample application. The deliverables will be open source, royalty free, and available under licensing terms that will encourage commercial and non-commercial companies to build on the API and code library. Obi, a comprehensive audio-centric production tool, will be the first deliverable of the Urakawa Project. Still in the early development stages is Tobi, a text-centric production tool. Detailed information about the Urakawa Project is available on the DAISY website.
DAISY OK takes DAISY DTB production and reading system development beyond the basic requirements and will allow Members and Friends of the Consortium to participate in the DAISY OK Self-Certification process. ‘To create this rich reading experience, the books must be feature-rich and the reading systems must utilise the features to provide an enhanced reading experience… the DAISY OK self-certification process sets minimum requirements for books and reading systems. We strongly encourage book producers and reading system developers to go beyond what is set as the minimum.’
George Kerscher, Secretary General of the DAISY Consortium, wrote in his paper ‘New Applications of The DAISY Standard’:
‘The DAISY standards describe an XML vocabulary, but it is limited in what it can do today. The DAISY Consortium is extending the current elements we support, but more importantly, we hope to support all logical XML vocabularies in the future. This means that all publishing systems, including those optimised for health information could be used within the DAISY multimedia and navigation model. This is research and development we are targeting for the next several years.’
DTDs are being replaced by schema languages such as RELAX NG. The DAISY Consortium will be migrating to the more robust schema-based languages in the next version of the DAISY specification. DAISY will continue to be in the forefront of standards for accessible reading materials, with the goal of moving this multimedia standard into the mainstream. Why shouldn’t everyone be able to reap the benefits of ‘reading the DAISY way’?
Lynn Leith is Head of Information Services and Administrative Support for the DAISY Consortium. She has worked with the Canadian National Institute for the Blind’s Library for the Blind for 27 years in audio master production. She has been involved with the work of the international DAISY Consortium since 1996, participating in the development of requirements and standards for software production tools for DAISY DTBs and of many training and support materials.
E: lynn (dot) leith (at) gmail (dot) com
W: www.daisy.org