ZedAI Meta Data - Status Report - January 2009

Overview

The work of the meta data subgroup has been split into two parts: identifying meta data for the authoring and interchange format and identifying meta data for the distribution (talking book) format. This status report concerns itself only with the former.

The first task of the group was to analyze the current meta data set used in the DAISY/NISO standard, together with various meta data standards in use by producers, as a means of developing a consistent set of required/recommended meta data for the interchange format. This work is currently winding down, although discussions will likely continue on the list until the final recommendation is ready.

The second task, to commence in January, is to identify an extensible framework that will allow the required/recommended meta data to be expressed and that is also flexible enough to allow the various producers of DAISY files to customize to their needs. Information on this work will be added to this page as it becomes available.

Terminology

Producers, Publishers, Creators and Sources

The distinctions that have been made between producers, publishers and sources in previous versions of the standard are harder to delineate in the authoring and interchange format under development:

  1. When dealing with talking books, a print publisher can usually be assumed. That assumption does not hold when documents with a wide range of uses can be, and are expected to be, authored using the new text format.
  2. With the new standard, the producer and the publisher of the text files will, ideally, be the same entity, whereas previously the producer of a talking book was normally a library for the blind and the publisher was a traditional print publisher.
  3. The text standard requires a document, which makes for an ambiguous use of source as an identifier. When a talking book was the end product, the source was the print document. Considering that these text files may be output to many formats (talking books, braille, e-text, etc.), the files themselves are the source.

Although publisher, source and producer information are all important data points to retain, a need exists to structure this data in a way that makes more sense for the documents that are expected to be created.

Document and Producer

A clearer way to refer to the meta data for the authoring and interchange format is by use of the terms "document" and "producer", where:

  • document refers to all meta data that identifies the work. This information would include the creator, title, copyright, publisher (if there is one), etc. Essentially, this is all of the information that had generally been grouped under the Dublin Core banner and as the source.
  • producer refers to information specific to the production of the text file (who created it, when, which version, etc.)
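
The two groupings above can be pictured as a simple data structure. The following is only a sketch: the class and field names are hypothetical illustrations chosen for this example, not names proposed by the group or taken from any standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentMeta:
    """Meta data that identifies the work (hypothetical field names)."""
    title: str
    creator: str
    copyright: Optional[str] = None
    publisher: Optional[str] = None  # a publisher cannot always be assumed


@dataclass
class ProducerMeta:
    """Meta data about the production of the text file (hypothetical field names)."""
    produced_by: str
    produced_on: str
    version: str


@dataclass
class ZedAIMeta:
    """Container keeping the two groups apart."""
    document: DocumentMeta
    producer: ProducerMeta


meta = ZedAIMeta(
    document=DocumentMeta(title="Example Title", creator="Example Author"),
    producer=ProducerMeta(
        produced_by="Example Library",
        produced_on="2009-01-15",
        version="1",
    ),
)
```

Keeping the groups in separate structures means a work with no publisher is still fully described, while production details never mix with the description of the work itself.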


Meta Data List

The current list of required/recommended meta data can be found here.

This list is not meant to be a specific listing of elements from any standard, but a general overview of the meta data points that should be included in all ZedAI files. How best to represent the data is yet to be determined, but more on this topic can be found in the following sections.

Inclusion Methods

The method by which meta data will be included in the new authoring and interchange files is yet to be determined.

Embedded

Usually achieved through the use of a header or meta data element preceding the document content, this method is the one most often employed in shared data files. Its primary advantage is the simplicity of keeping the meta information with the content in a single file. Its major drawback is that the meta information is not accessible without obtaining and processing the entire content of the document, which in many cases defeats the purpose of meta data.
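
As a rough illustration of the embedded approach, the sketch below reads name/content pairs from a header that precedes the content. The element names (`document`, `head`, `meta`, `body`) and the sample values are assumptions made for this example only; no ZedAI markup has been decided. Note that the whole file, body included, is parsed before the header can be read, which is the drawback described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical file layout: a header carrying meta data, followed by the content.
SAMPLE = """<document>
  <head>
    <meta name="title" content="Example Title"/>
    <meta name="creator" content="Example Author"/>
  </head>
  <body><p>Content of the work.</p></body>
</document>"""


def read_embedded_meta(xml_text):
    """Return the name/content pairs found in the document header."""
    # fromstring parses the complete document, not just the header.
    root = ET.fromstring(xml_text)
    return {m.get("name"): m.get("content") for m in root.findall("./head/meta")}


embedded = read_embedded_meta(SAMPLE)
```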


Separated

Meta data can be separated from the content file either through storage in a separate XML file or in some cases through storage in a relational database. This approach makes the meta data more easily and quickly available.

The drawback for interchange in this kind of system is that multiple files have to be transferred, making loss of data a realistic problem. Processing of the data also becomes more of a challenge, as information will initially be contained in two separate XML documents or an XML file and a database.

(Database storage would tend to be a non-starter for the intended use under consideration.)
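
The risk of losing one of the files in transfer could be checked for on receipt. The sketch below assumes a purely hypothetical naming convention in which the meta data for `book.xml` travels in `book.meta.xml`; nothing of the kind has been specified.

```python
def missing_sidecars(filenames, meta_suffix=".meta.xml"):
    """List content files that arrived without their separate meta data file.

    The pairing of name.xml with name.meta.xml is an assumption made
    for this example only.
    """
    names = set(filenames)
    missing = []
    for name in sorted(names):
        if name.endswith(meta_suffix):
            continue  # this is a meta data file, not a content file
        if name.endswith(".xml"):
            sidecar = name[: -len(".xml")] + meta_suffix
            if sidecar not in names:
                missing.append(name)
    return missing


received = ["book1.xml", "book1.meta.xml", "book2.xml"]
orphans = missing_sidecars(received)
```

Here `book2.xml` would be reported, since its meta data file did not arrive with it.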


Inclusion Mechanisms

XHTML2/RDFa

RDFa could be used in conjunction with XHTML's existing meta tags to mark up meta data. This approach would be the most natural and elegant, as it would use the framework intended by the XHTML2 working group. Dublin Core has an RDFa extension, but it is the only one of the major meta data standards to have taken this step. MODS, ONIX, etc. do not have models for use in RDFa, and this group would not be in a position to migrate the standards (see Issues).

A similar approach would be to use RDFa on span elements within the front matter and/or body to identify pieces of meta data where they appear in the flow of the body. Although a viable approach, this method does present a problem when not all meta data appears in the content (which could lead to a confusing mix of meta and span elements to cover all required meta data). A separate meta data section in the document header would also be an inevitability, as producer information would not be storable as content of the documents.
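
To make the mix of meta and span elements concrete, the sketch below collects every element carrying an RDFa `property` attribute, taking the value from the `content` attribute when present and from the element text otherwise. The sample markup and values are invented for illustration, and the function is a simplified attribute scan, not a conforming RDFa processor.

```python
import xml.etree.ElementTree as ET

# Invented sample: one property in the header, one in the flow of the body.
SAMPLE = """<html xmlns:dc="http://purl.org/dc/terms/">
  <head>
    <meta property="dc:title" content="Example Title"/>
  </head>
  <body>
    <p>Written by <span property="dc:creator">Example Author</span>.</p>
  </body>
</html>"""


def collect_properties(xml_text):
    """Map each property name to its value, wherever it appears."""
    found = {}
    for el in ET.fromstring(xml_text).iter():
        prop = el.get("property")
        if prop is None:
            continue
        # Prefer the content attribute; fall back to the element's text.
        found[prop] = el.get("content") or (el.text or "").strip()
    return found


rdfa = collect_properties(SAMPLE)
```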


Meta Elements

Similar to the use of spans with RDFa, another approach that has been floated is to create unique elements for every piece of meta data. There are more drawbacks to this approach beyond the potential for missing meta data elements noted above. Having to define elements for all of the possible meta data that could be tagged is one. Creating content models to allow this tagging would be another. This approach would seem to work best in controlled situations, like when print book reproduction is the only concern.


Dublin Core Application Profile

A Dublin Core application profile is yet another possibility for implementing meta data, but its applicability may be hampered by its complexity. A profile would allow a standard set of meta data to be defined, but it is unclear at present how extensible the model is or how easily the meta data could be integrated. The standard is still sufficiently new that there is not a lot of literature or working examples to reference.

This approach may also be crippled by the lack of RDF implementations of other standards (see Issues).


Importing Namespaces

If all else fails, namespaces could be imported so that element sets can be used directly. This would be the least practical solution, as it would prove messy and convoluted, but is available as a last resort.
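
A minimal sketch of what direct use of an imported element set might look like, using the Dublin Core elements namespace. The enclosing `head` element is a placeholder assumed for this example, not a decided part of the format.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set namespace.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# "head" is a placeholder container assumed for illustration.
head = ET.Element("head")
title = ET.SubElement(head, f"{{{DC}}}title")
title.text = "Example Title"

serialized = ET.tostring(head, encoding="unicode")
```

Each additional standard imported this way brings its own namespace declaration and content model into the file, which is where the messiness noted above would come from.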


Issues

RDF

In order to use RDF for the meta data, RDF models need to exist for the standards that we anticipate using. Dublin Core is ahead of the curve in that it already has such a model available (http://dublincore.org/documents/dc-rdf/). Other standards (MODS, ONIX, etc.) do not appear to have been as progressive, although some (unofficial?) work appears to have been done for MODS.

Potential solutions to this problem include:

  • determining if IFLA can exert any pressure on the MODS standard maintainers to make an RDF model available
  • creating our own RDF model and submitting it to the LoC in the hope that they will adopt it


NVDL and Document Exposition

Namespace-based validation obviates the need to litter data files with information about specific schemas or DTDs to validate the files against. Validation can remain separate from the data because namespaces are used to segment the data and send each segment for the appropriate validation.

Is there consequently any need to take a step backward and put information about file conformance into the meta data? It might be helpful from a human perspective to know which version of the standard and what profile were intended, but ultimately the entity manipulating the file will have their own rules and processes to apply.

One concern that has been raised is that a file will have to be inspected for namespace usage in order to determine its suitability for particular transforms (i.e., has it been marked up for braille or only as a leisure book and how will that affect the generation of a braille file).
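
The kind of inspection described could be sketched as a scan of the element names in a file. The braille namespace URI and the element names below are hypothetical placeholders; no such namespace has been defined.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace standing in for braille-specific markup.
BRAILLE_NS = "http://example.org/ns/braille"

SAMPLE = f"""<document xmlns:brl="{BRAILLE_NS}">
  <body>
    <p>Plain text and <brl:contraction>marked text</brl:contraction>.</p>
  </body>
</document>"""


def namespaces_used(xml_text):
    """Return the set of namespace URIs used by elements in the file."""
    used = set()
    for el in ET.fromstring(xml_text).iter():
        tag = el.tag
        # ElementTree writes namespaced tags as "{uri}localname".
        if isinstance(tag, str) and tag.startswith("{"):
            used.add(tag[1:].split("}", 1)[0])
    return used


ready_for_braille = BRAILLE_NS in namespaces_used(SAMPLE)
```

Every element has to be visited before the answer is known, which is the cost that a conformance statement in the meta data would avoid.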


Unique Identifiers

In order for identifiers to be unique, there must be a central agency controlling the assignation of the identifiers. DAISY does not currently appear to be in a position to assign identifiers for every possible producer (assuming production outside of libraries for the blind).

IFLA may be in a better position to be this authority, but that would make the unique identifiers a data item that would have to be added when works are added to the global library catalogue.

There is no harm in adding identifiers as a recommended data point, but what instruction can we give producers on how to assign them without potential conflicts? (This applies to producers that are unable to use a library identifier and whose documents have no ISBN, etc.)


Optional Meta Data

One of the questions yet to be resolved is how much optional meta data should be included in the standard specification, if any at all.

Although the inclusion of any optional meta data will be at a producer's discretion, standardization has an advantage if the naming of optional items has, as much as possible, been handled in the specification. On the other hand, if the meta data is not considered essential to the files and is left to producer discretion, there is less need for it or for uniformity of naming.


Deliverables

Specification Prose

After the framework that will be used to express the meta data has been established and has received approval, work will begin on developing a definitive explanation of the meta data requirements for the specification. The text of this document will be posted to this site under the heading Specification Prose.


Implementer's Guide

In addition to the specification text, a practical guide to implementing meta data in the new standard will also be developed. This guide will most likely include more verbose descriptions and examples of meta data using the established framework. The text of this document will be posted to this site under the heading Implementer's Guide.
