ZedAI Meta Data - OutOfLine Proposal
From zedwiki
This page discusses a proposal of using out-of-line metadata in ZedAI.
Issue
The many production uses for book metadata (bibliographic, commercial, and so on) overlap considerably, but each also requires its own unique information, making it nearly impossible to find a single standard that covers all potential needs.
Metadata in these contexts, moreover, is consumed by many different systems during book production and delivery, making access to, and comprehensiveness of, the metadata critical to each producer.
By defining a single standard for metadata storage in ZedAI documents, we are potentially limiting production systems and causing more problems than a single standard was designed to solve.
Rationale
The concern for the ZedAI format is strictly with the use of the metadata within the context of a ZedAI document.
Consequently, the first question that has to be asked is: what function does metadata serve in these documents? The primary answer is that ZedAI documents will be read and used by ZedAI processing agents. Examples of such systems, as given in the draft specification, include "authoring tools (XML editors, XML-enabled word processors), transformation pipelines, business transfer chains, end-user provisioning interfaces, and conformance validators."
While some or all of these systems might have need of bibliographic or other metadata about a book, using the ZedAI document itself as the source of this metadata is not an intuitively sound design decision. An end-user delivery system, for example, would find it much more efficient to get information about the book from a library database than by opening the actual authoring format XML document and parsing it for standard metadata.
There is, moreover, no imperative that the complete book metadata actually appear in the book itself, so long as the book can be unambiguously identified and includes information on how to obtain additional metadata. With that information, a processing agent can get whatever other information might be necessary -- if any -- to complete its work.
Proposal
The proposed metadata solution for ZedAI is two-fold.
First, a core set of required metadata relating specifically to the ZedAI document will be inlined in each document. This required metadata will:
- identify the ZedAI document's creator, including the producing entity's name and location;
- uniquely identify the document, including a unique identifier (as defined by the producer), a version number and a release date.
Second, ZedAI documents will allow the optional inclusion of one or more out-of-line documents containing additional metadata. These documents will be referenced by URI, which will make it possible for the links to point to files (local or remote), web pages, or web services (see "Linking Metadata" below).
The external metadata objects would allow any processing agent to unambiguously identify the resource being processed and to discover any further information it might need.
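As a rough illustration of the three kinds of link target mentioned above, the fragment below uses the link syntax described under "Linking Metadata". It is a sketch only: the example.org URLs and the query parameter names are hypothetical and are not defined by the draft specification.

```xml
<!-- local file, relative to the ZedAI document -->
<meta rel="z:meta-record" resource="metaRecord.xml" />

<!-- remote file (hypothetical URL) -->
<meta rel="z:meta-record" resource="http://example.org/records/int-daisy-123456.xml" />

<!-- web service query (hypothetical URL and parameter names);
     note that the ampersand must be escaped in the attribute value -->
<meta rel="z:meta-record" resource="http://example.org/metadata?id=int-daisy-123456&amp;format=mods" />
```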
Benefits
The decoupling of metadata from ZedAI documents will allow the kind of flexibility that producers will need to integrate ZedAI documents into existing workflows and to create new workflows using the ZedAI format that aren't limited by a pre-defined metadata storage scheme.
A producer who prefers that bibliographic data accompany their documents, for example, could attach document-specific metadata in the form of a MODS or MARC record. A producer that intends to sell their content could instead opt to attach an ONIX record.
An additional advantage is that this method will allow metadata to be adapted for as-yet-undefined purposes without affecting a producer's existing production processes. If, for example, the IFLA global library initiative requires MARC metadata for inclusion of documents, a producer would only have to create the records and add an additional link to their documents in order for them to be accepted.
Drawbacks
The primary drawbacks to this approach to metadata are:
- the problems of building transformation tools capable of working with a potentially wide variety of metadata standards; and
- the sharing of documents where the sharing of metadata is a requirement.
By not defining a single standard for metadata, we are allowing greater internal flexibility at the potential expense of external interoperability.
This drawback can be mitigated by producers making their metadata available in more than one format for external use. Doing so adds overhead, but a mapping system is under development that may in future assist in transforming XML-based metadata schemes (see "Cross-vocabulary mapping initiatives" below).
Another potential answer to this problem would be to define a finite set of standards for producers and developers of production tools to use, to at least cap the variety that will be found in the wild.
Linking Metadata
Typically, an inlined link would identify the resource and, optionally or by requirement, also its nature: "this URL points to a MODS record", "this URL to an ONIX record", etc. Expressing the version of the resource is also possible, and should be optional.
As of the 20091005 version of the ZedAI document head, the standard meta element should suffice. The property values would be defined in an external RDF vocabulary, so that new types can be added.
<meta rel="z:meta-record" href="metaRecord.xml">
  <meta property="z:meta-record-type" about="metaRecord.xml" content="z:mods" />
  <meta property="z:meta-record-version" about="metaRecord.xml" content="3.3" />
</meta>
This approach also allows linking to several metadata sets where needed (even sets of the same type but different versions).
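A document linking to two records might look roughly like the fragment below. It is a sketch under assumptions: the file names are hypothetical, the resource attribute follows the complete head example, and the type value z:onix is inferred from the record type names proposed for the vocabulary rather than from a published definition. The second link omits the version property, since expressing the version is meant to be optional.

```xml
<!-- first record: MODS, with an explicit version -->
<meta rel="z:meta-record" resource="modsRecord.xml">
  <meta property="z:meta-record-type" about="modsRecord.xml" content="z:mods" />
  <meta property="z:meta-record-version" about="modsRecord.xml" content="3.3" />
</meta>

<!-- second record: ONIX, version left unstated -->
<meta rel="z:meta-record" resource="onixRecord.xml">
  <meta property="z:meta-record-type" about="onixRecord.xml" content="z:onix" />
</meta>
```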
Complete head example
<head xmlns:z="http://www.daisy.org/z3986/2010/vocab/decl/#"
      xmlns:dc="http://dublincore.org/2008/01/14/dcterms.rdf#">
  <meta property="dc:identifier" content="int-daisy-123456" />
  <meta property="dc:publisher" content="DAISY Consortium" />
  <meta property="dc:date" content="2009-11-12T13:50:05-05:00" />
  <!-- date serves also as version identifier -->
  <meta rel="z:meta-record" resource="metaRecord.xml">
    <meta property="z:meta-record-type" about="metaRecord.xml" content="z:mods" />
    <meta property="z:meta-record-version" about="metaRecord.xml" content="3.3" />
  </meta>
  <!-- the nesting of metas as above is not required, just stylistic -->
  <!-- note that several z:meta-records can coexist in the same instance, varying types or even varying versions of the same type -->
</head>
Consequences on the spec
- the current meta section is removed, and replaced by a section that includes:
- an informative explanation of the external meta record principle, noting that the referenced resource (URI) can point either to a physical file, local or remote, or to a web service (in which case the resource URI would carry a query string);
- a normative statement that allows zero, one or several such external records to be linked from the instance;
- a normative statement of which meta properties must be provided inline in document/head (identifier, date, publisher, ?location?)
- a normative statement that makes dc:date equal to a version identifier (and forces the syntax to W3C dateTime, as DC suggests)
- a normative reference to the decl vocab that identifies the (three) record properties (meta-record, meta-record-type and the optional meta-record-version)
- declaration that the above vocab also declares names for record types ("mods", "onix", "marc", "dcterms", doi?, etc)
- an informative discussion on the processing agents' "metadata harvest" process:
- noting that the document may contain additional metadata in the head (but restricted to the <meta>+RDFa attributes syntax)
- noting that the document may also contain additional metadata in the body (using RDFa), which minimizes duplication, e.g. <title role="fulltitle" property="dc:title">Origin of Species</title>
- noting that if the resource identified to be the metadata record is the document itself (<meta rel="meta-record" resource="."/>) then body RDFa is what is being used exclusively
- perhaps recommending that document authors make use of RDFa in body to allow processing agents to harvest at least minimally needed metadata (title, author) for common output formats, using inlined metadata only?
- Primer updates, including a set of code examples to make the above (both the external record linking syntax, and the interaction with inlined RDFa) clear. The current MODS sample could be reused, but recast as an external document sample.
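The interaction between the inline head metadata and body RDFa discussed in the harvest notes above could be sketched as follows. This is an illustration only: the document and body wrapper elements and the placement of the prefix declarations on the root are assumptions, and the self-referencing link uses the z: prefix for consistency with the other examples on this page.

```xml
<document xmlns:z="http://www.daisy.org/z3986/2010/vocab/decl/#"
          xmlns:dc="http://dublincore.org/2008/01/14/dcterms.rdf#">
  <head>
    <!-- required inline metadata -->
    <meta property="dc:identifier" content="int-daisy-123456" />
    <meta property="dc:publisher" content="DAISY Consortium" />
    <meta property="dc:date" content="2009-11-12T13:50:05-05:00" />
    <!-- the metadata record is the document itself, so a processing
         agent would rely on body RDFa for anything further -->
    <meta rel="z:meta-record" resource="." />
  </head>
  <body>
    <!-- title stated once, in the body, and exposed through RDFa -->
    <title role="fulltitle" property="dc:title">Origin of Species</title>
  </body>
</document>
```

In this arrangement the title appears only once, in the body, which is the duplication saving that the harvest discussion points to.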
Cross-vocabulary mapping initiatives
DOI News - June 2009
DOI News is a public news release; information contained within this newsletter can be reproduced and disseminated to all interested parties.
1. Launch of "Vocabulary Mapping Framework"
A new initiative, the Vocabulary Mapping Framework (VMF), has been announced by a consortium of partners. This will create an extensive and authoritative mapping of vocabularies from nine major content metadata standards, creating a downloadable tool to support interoperability across communities. The mapping will also be extensible to other standards.
The work builds on the principles of interoperability established in the indecs Content Model, and is an expansion of the existing RDA/ONIX Framework into a comprehensive vocabulary of resource relators and categories, which will be a superset of those used in major standards from the publisher/producer, education and bibliographic/heritage communities.
The International DOI Foundation, which fully endorses this work, will provide a web hosting facility for the Framework as part of its commitment to promoting the wider use of interoperable metadata, and will use the vocabulary mapping wherever possible to support the association of metadata with DOI names.
For additional information see:
- VMF project announcement, June 15, 2009, at http://www.doi.org/news/VMF_project_announcement_090615.pdf
- "indecs Content Model" at http://en.wikipedia.org/wiki/Indecs_Content_Model
- "RDA/ONIX Framework for resource categorization" at http://www.dlib.org/dlib/january07/dunsire/01dunsire.html
Links
The Digital Object Identifier (DOI®) System: http://www.doi.org/
