ZedDist Document Text
From zedwiki
This is a prep area for the ZedDist Document Text Component.
Background
As a solution to many of the text-related evolutionary targets in the design goals, strawman 2 proposes the use of XHTML derivative grammars for DTB textual content. (Note that this refers to the "traditional" notion of DTB text as referenced from smil:text and held in a standalone XML document. Timed text is a separate concept entirely.)
The following properties form the foundation for making this a viable solution:
- No need to assert validity to (any of the) W3C XHTML variants
- The grammars created would be based on the XHTML namespace and use the fundamentals of the XHTML document structure and element set, but do not necessarily need to be valid to XHTML1, 1.1 or 5. The success criteria here are
- that the grammars created can be passed to a browser in text/html mode, with a resulting fully functional browser DOM returned
- that CSS-based layout and rendering works without hitches in any reasonably compatible CSS 2.1 or higher implementation
- The grammars created can be derivatives of XHTML(5)
- If we find ourselves forced to extend the grammar as given by W3C/WHATWG, then from a labelling perspective "DAISY XHTML" or something similar would be suitable to clarify that this is not traditional (subsetted-only) XHTML. What we would create would be domain-specific derivatives of the W3C (X)HTML spec family (in other words, not a fork). While this bears resemblance to the XHTML Modularization approach, we have no firm requirement to be compliant with that spec either.
- Creating more than one grammar; "variants"
- Based on a pool of modules (see SVN sandbox), we allow several variants of DAISY XHTML to be created. Each ZedDist Profile that includes the Document Text component must create (and allow) one or several of these. Note: a ZedDist profile is not locked to allowing one variant, but can allow several.
- Need to harmonize with EPUB 3.0
- Ideally, the approach we take here (and the resulting grammars) should be in full harmony with the upcoming EPUB 3.0 spec. As a result, it is likely that our draft solutions will have to be revised and modified repeatedly.
Derivation patterns
The following methods are available (technically through the modular design) to allow us to create derivatives that satisfy the DTB context requirements:
- Subsetting of the XHTML element set
- XHTML elements that do not apply to our context are simply not available in our module pool. Variants (concrete grammars) may also subset individually by omitting certain modules from the grammar. Note also that, because the profiles are SMIL-based, a tentative assumption is that any HTML5 element that has a duration (video, audio, possibly canvas) would be excluded (and instead included in SMIL).
- Subsetting/tightening of the XHTML content models
- The traditional content models of XHTML can be subsetted and tightened where deemed necessary. Note that this subsetting can vary between the DAISY XHTML grammar variants.
- Use the attribute axis for semantic annotations
- Using technologies such as RDFa and Role, we can "enhance" existing elements in the XHTML namespace with semantics. Note that a vocabulary is under development in ZedAI, and this is intended to be used by ZedDist as well (for annotation in SMIL, NCX and Document Text contexts). An example:
<div role="footnote">
- Add new elements and attributes
- This solution can be employed as an alternative to, or in conjunction with, attribute-axis semantic annotations. The result is a namespace compound document. Behavior in (X)HTML browser DOMs needs to be carefully investigated.
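The difference between the last two patterns can be sketched in a few lines of Python. This is only an illustration, not part of the spec work: the function names are made up for this page, and the `urn:example:daisy` namespace is a placeholder for whatever namespace new elements would actually live in. The first function lists attribute-axis annotations; the second checks whether an instance stays entirely within the XHTML namespace or has become a namespace compound document.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def annotated_elements(xml_text):
    """Return (local element name, role value) pairs in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for elem in root.iter():
        role = elem.get("role")
        if role is not None:
            # ElementTree reports tags as "{namespace}local"; keep the local part.
            pairs.append((elem.tag.split("}", 1)[-1], role))
    return pairs


def all_in_xhtml_namespace(xml_text):
    """True if every element in the instance is in the XHTML namespace."""
    root = ET.fromstring(xml_text)
    return all(e.tag.startswith("{" + XHTML_NS + "}") for e in root.iter())


# Attribute-axis annotation only: every element is still plain XHTML.
annotated = (
    '<body xmlns="http://www.w3.org/1999/xhtml">'
    '<p>Text<a role="noteref" href="#n1">1</a></p>'
    '<div role="footnote" id="n1"><p>A note.</p></div>'
    '</body>'
)

# New element in a placeholder namespace: a namespace compound document.
compound = (
    '<body xmlns="http://www.w3.org/1999/xhtml" xmlns:d="urn:example:daisy">'
    '<p>Text</p><d:pagenum>3</d:pagenum>'
    '</body>'
)

print(annotated_elements(annotated))
print(all_in_xhtml_namespace(annotated), all_in_xhtml_namespace(compound))
```

The sketch says nothing about how a browser DOM treats the foreign element; that still needs the investigation mentioned above.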
Grammar variants
The below table lists DAISY XHTML grammar variants. Note that, in reality and in the way the spec is being built, each profile creates "its own" grammar or grammars using the module pool. So this table is not a final enumeration of what grammars are available to the profiles; it's more of a "guideline" to help us design the module pool correctly.
(In discussions so far, we have been talking about strict and loose versions. These terms have a history and connotations, and should therefore be replaced by other designations.)
| Designation | Rationale | Nature | Success Criteria |
|---|---|---|---|
| loose | Primarily intended to be used when "DAISY-fying" existing content off the web, or from other sources that have inherently poor predictability in terms of structure and semantics | Would exert limited content model subsetting, would allow but not require attribute-axis annotations and new elements/attributes | Automated clean-up processes are, in the vast majority of cases, able to generate content that conforms to this grammar variant. |
| strict | Primarily intended to be used when producing DTBs from "good" sources in terms of structure and semantics. | Would exert more extended content model subsetting. Could if necessary require attribute-axis annotations and new elements/attributes | The resulting instances possess a predictability and "richness", structurally and semantically, that is equal to, or surpasses, DTBook. |
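As an illustration of what "more extended content model subsetting" could mean in practice, here is a minimal Python sketch of one possible strict rule: at most one heading per section (the rule suggested for strict in the section row of the concept table on this page). In the real module pool such a rule would presumably be expressed in RelaxNG or Schematron; the function name and the use of `id` to report offenders are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
HEADINGS = {XHTML + "h%d" % n for n in range(1, 7)}  # h1 through h6


def sections_with_too_many_headings(xml_text):
    """Return the id of every section that has more than one heading child."""
    root = ET.fromstring(xml_text)
    offenders = []
    for section in root.iter(XHTML + "section"):
        # Only direct children count: headings inside a nested section
        # belong to that nested section, not to its parent.
        count = sum(1 for child in section if child.tag in HEADINGS)
        if count > 1:
            offenders.append(section.get("id"))
    return offenders


sample = """<body xmlns="http://www.w3.org/1999/xhtml">
  <section id="s1"><h1>One</h1>
    <section id="s1.1"><h2>Nested</h2></section>
  </section>
  <section id="s2"><h1>A</h1><h1>B</h1></section>
</body>"""

print(sections_with_too_many_headings(sample))
```

A loose variant would simply not apply this check.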
Grammar variant Q&A
- The table above does not seem to follow the ZedAI axis of "one grammar per content nature". Why?
- Well, we could do this, producing, for example, DAISY Book XHTML, DAISY Newsfeeds XHTML, DAISY Generic XHTML, etc. But first, it needs to be proven that it's a worthwhile exercise in this context. Note that the difference between these grammars in the distribution stage could be covered by attribute-axis annotations, which would allow preservation of context-specific semantics without requiring a unique grammar to be created. Think:
<section role="article"><span role="dateline">
Additional structures and semantics needed
TODO complete this table (notes, noterefs, annotations, glossaries, pagination, bridgeheads (zedai:hd) etc etc)
| Concept | Description | Doable as attribute-axis annotation (role)? | Doable as new element/attribute? | Available in loose? | Available in strict? |
|---|---|---|---|---|---|
| section | A major structural division (equal to dtbook:level and zedai:section) | Yes, but as HTML5 has section, we're better served using that | Yes, but as HTML5 has section, we're better served using that | Yes, but not required, and without any major content model restriction. The XHTML div element is interchangeable with section | Yes, and required, also requiring proper use of headings (zero or one heading per section). The XHTML div element is subordinate to section. |
| semantic annotation attribute | An attribute that allows semantic annotation of any element | This is that attribute. Whereas RDFa allows this generally, it does (in version 1.0 at least) not allow the single-attribute approach. The W3C role attribute (if alive), RDFa 1.1 (if addressing the single-attribute approach), ARIA role (if it is confirmed to have the necessary properties, see questions on ARIA role below), or "our own" role attribute would be used. Compare also TEI @type | Yes, we create this attribute if the W3C does not provide a solution for it, but it is likely that either ARIA role, PF role or RDFa 1.1 would provide a workable solution. | Yes | Yes |
| smilref attribute | An attribute that allows linking into SMIL presentations. | No, the value of @smilref is not an RDF property. | Yes. We probably want to keep things like this in a separate namespace. It's quite a hack to begin with; maybe it can be dropped entirely (up to reading device developers mainly) | Yes | Yes |
| front-, body- and rearmatter | Major structural divisions in a document. Equivalent to dtbook:frontmatter, dtbook:bodymatter and dtbook:rearmatter | Yes, syntax: <section role="x-matter">. Since these elements represent only their children, it could be argued that these document divisions are not necessary for distribution. On the other hand, they might be useful for screen readers and reading software | Yes, using dtbook vocabulary, <xmatter> | Yes, but not required | Yes |
| Sidebars and other floating information | Supplementary and floating information, often referred to as "grey boxes". Equivalent to dtbook:sidebar | Yes, using <div role="sidebar"/>, but HTML5 has the <aside> element that seems to have the same role. Syntax: <aside role="sidebar" class="classForStyling"/> Note: HTML5 ARIA suggests <aside role="complementary"> | While the <aside> element seems to cover our needs, we still need to figure out what to do with the dtbook:sidebar@render attribute. Should this attribute be implemented as a new attribute, or would a scripting solution based on the role name work? | Yes | Yes |
| Notes/Annotations | Footnotes, endnotes and annotations in the text | Yes, syntax: <div role="note" class="footnote/endnote" id="NoteId"/>. Alternatively <div role="footnote/endnote" id="NoteId"/> Note: HTML5 ARIA suggests <aside role="note"> | Yes, but is there anything to gain compared to the <div> solution? | Yes | Yes |
| Noterefs/Annorefs | Reference to notes and annotations | Yes, syntax: <a role="noteref/annoref" href="#NoteId"/> | Yes, but probably not without implementing <a href> in one way or the other | Yes | Yes |
| Pagination | Navigation points in the text representing page numbers. Note that the actual page is not represented as a container, though a virtual page container can be constructed from the page numbers. Equivalent to dtbook:pagenum | Yes, syntax: <span role="pagenum" class="front/normal/special">nn</span>. Alternatively: <span role="page-front/page-normal/page-special"/> | Yes, dtbook pagenum syntax could be used | Yes | Yes |
| Lists | Ordered, unordered, or preformatted lists | <list type="ol/ul"> translates into "traditional" HTML list vocabulary (<ol> and <ul>). The preformatted option would have to be implemented through styling: <ul style="list-style:none;"> and setting the class or role="preformattedlist" | Yes, but HTML <ul> and <ol> cover our needs | Yes | Yes |
| Speech synthesis instructions | Speech synthesizer instructions, such as references to PLS lexica (head/ssml:lexicon) and inline phonemic/phonetic pronunciation instructions (ssml:say-as, ssml:ph) | No, but inline pronunciation instructions could be done on the attribute axis only. See ZedAI's ssml-phoneme-attrib.rng | Yes, preferably using SSML 1.1 semantics (as does ZedAI, see z3986-feature-ssml.rng) | ||
| Image group | A container for one or more images and their captions and/or descriptions | No, not without implementing an ID reference (caption@idref). A less perfect solution could be: <div role="imggroup"><img/><p role="caption"/></div>, but that would mimic the new HTML5 element <figure> with its child <figcaption>, which would probably be a better choice. A disadvantage of <figure> compared to dtbook:imggroup is that it can only hold one <img> and one <figcaption>. A dtbook:imggroup with more than one image or caption would have to go through a complex transformation. | Yes, dtbook:imggroup syntax could be used | Yes | Yes |
| Headings, other than <h1>-<h6> | Headings that are not part of the main structure of the document, e.g. headings in lists and <div>. Equivalent to dtbook:bridgehead, docbook:bridgehead, dtbook:hd and zedai:hd | HTML5 doesn't have any elements equivalent to dtbook:bridgehead/hd. Both <header> and <hgroup> have other functions. This leaves us with <p role="hd">. <p> cannot be a child of <ul> and <ol>, but it can be a child of <li>. Example: <ul><li><p role="hd">List heading</p></li></ul>. | Yes, dtbook:hd and dtbook:bridgehead are a possibility, or better: a combination of the two (since dtbook:level/hd is not needed in HTML) | ||
| Table | Tabular data in a document, not intended for layout purposes. The HTML5 proposal is very close to the existing dtbook grammar and we should be able to use HTML5. Differences include the summary attribute - in HTML5 replaced by the <summary> element | No | No | ||
| Definition lists (e.g. glossaries) | Lists of terms and their definitions/explanations | <dl>, <dt> and <dd> seem to be unchanged compared to previous versions. Since dtbook has adopted the HTML definition list vocabulary, the HTML5 definition list can be used without problems | |||
| Forms | TODO | | | | |
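The Pagination row notes that a virtual page container can be constructed from the page numbers. A minimal Python sketch of that idea follows, assuming the `<span role="pagenum">` syntax from the table: all text after one page-number marker and before the next is attributed to that page. The function names are made up for this example, text before the first marker is dropped, and duplicate labels (for example a front page "1" and a normal page "1") are not handled.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def walk(elem):
    """Yield ("page", label) and ("text", string) events in document order."""
    if elem.tag == XHTML + "span" and elem.get("role") == "pagenum":
        yield ("page", "".join(elem.itertext()).strip())
        return
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from walk(child)
        if child.tail:
            # Text following a child (including a pagenum span) is its tail.
            yield ("text", child.tail)


def virtual_pages(xml_text):
    """Map each page label to the whitespace-normalized text on that page."""
    pages = {}
    current = None  # text before the first pagenum belongs to no page
    for kind, value in walk(ET.fromstring(xml_text)):
        if kind == "page":
            current = value
            pages[current] = ""
        elif current is not None:
            pages[current] += value
    return {label: " ".join(text.split()) for label, text in pages.items()}


sample = """<body xmlns="http://www.w3.org/1999/xhtml">
<p><span role="pagenum" class="normal">1</span>First page text that</p>
<p>continues here.<span role="pagenum" class="normal">2</span> Second page.</p>
</body>"""

print(virtual_pages(sample))
```

Note that a page can start in the middle of a paragraph, which is why the page is treated as a text range here rather than as an element container.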
Questions on ARIA and @role
Reading 3.2.6 Annotations for assistive technology products (ARIA) of the HTML5 spec, the following questions appear:
- What is the proper mechanism to add properties (== values for role)? It is clear that HTML5 does not endorse RDF and/or CURIE-based hookups of new vocabs here (even if the ARIA vocab is defined using RDF).
- Is it an HTML5 conformance error if a value appears in HTML5@role that is not part of the ARIA vocab?
- Are multiple (space separated) values supported in HTML5 @role?
Note: the current draft ARIA vocab is available here.
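To make the last two questions concrete: if multiple space-separated values turn out to be allowed, a checker would have to split the attribute value into tokens and compare each against the permitted vocabulary. The Python sketch below assumes exactly that; whether it matches HTML5/ARIA behaviour is the open question above. `KNOWN_ROLES` is a hypothetical stand-in vocabulary built from role names used as examples on this page, not the ARIA vocab or the ZedAI one.

```python
# Hypothetical vocabulary, for illustration only.
KNOWN_ROLES = {"footnote", "endnote", "note", "noteref", "annoref", "pagenum", "sidebar"}


def unknown_role_tokens(role_value, vocabulary=KNOWN_ROLES):
    """Return the tokens of a role attribute value that are not in the vocabulary.

    Assumes the value is a whitespace-separated list of tokens.
    """
    return [token for token in role_value.split() if token not in vocabulary]


print(unknown_role_tokens("note"))
print(unknown_role_tokens("sidebar complementary"))
```

Whether a non-empty result should count as a conformance error, a warning, or nothing at all depends on the answers to the questions above.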
Extension Points
TODO: concretely how to integrate CDR and CDI.
Implementation
The module pool is implemented using RelaxNG + ISO Schematron. Using an approach developed within ZedAI, informative W3C XML Schema versions of the schemas are autogenerated from the RelaxNG sources.
Being able to autogenerate DTDs is not a success criterion, in keeping with the sales pitch mantra "DAISY 4 - now with 100% less DTDs!"
