
NCX text encoding

Project: EPUB Maintenance
Component: Open Packaging Format (OPF)
Category: bug report
Priority: normal
Assigned: PSorotokin
Status: completed @ 2.0.1

While following the TeleRead comment thread on Michael Volz's new Firefox plug-in for rendering ePub, I noticed two comments about improper rendering of characters outside Basic Latin (such as em-dashes and accented characters) in the NCX.

After studying the OPF and DTBook specs regarding the NCX, I realized that, unlike Content Documents and the Package, we apparently do not require the NCX to be UTF-8/16 encoded. That is, the NCX may use any encoding, so long as a non-UTF-8/16 encoding is properly declared in the XML prolog.

Is this something we want to firm up in the OPF spec? (Of course, if I missed something in the OPF and DTBook specs and we already require UTF-8/16 encoding for the NCX, we should still state it more prominently.)

Description
Issue Id: 35
Resolution:

Amend OPF specification section 1.4.1.2 item (vii) to read "an NCX must be included and either UTF-8 or UTF-16-encoded; and"
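To make the amended requirement concrete, here is a rough sketch of how a validator might check an NCX file against it. This is an illustration under stated assumptions, not part of the OPF spec or of any existing checker: the function names `effective_encoding` and `ncx_encoding_ok` are hypothetical, and encoding detection is simplified to the byte order mark plus the XML declaration, loosely following the autodetection outline in XML 1.0 Appendix F.

```python
import re

# Hypothetical helper: matches the encoding pseudo-attribute of an XML
# declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL_RE = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)
_UTF16_BOMS = (b"\xfe\xff", b"\xff\xfe")
_UTF8_BOM = b"\xef\xbb\xbf"


def effective_encoding(data: bytes) -> str:
    """Return the (upper-cased) encoding an XML processor would apply.

    Simplified sketch: only the BOM and the XML declaration are inspected.
    """
    if data.startswith(_UTF16_BOMS):
        return "UTF-16"  # a UTF-16 byte order mark settles it
    body = data[len(_UTF8_BOM):] if data.startswith(_UTF8_BOM) else data
    match = _DECL_RE.match(body)
    if match:
        return match.group(1).decode("ascii").upper()
    return "UTF-8"  # no BOM and no encoding declaration: UTF-8 by default


def ncx_encoding_ok(data: bytes) -> bool:
    """Assumed check: the NCX must be UTF-8 or UTF-16 and actually decode."""
    enc = effective_encoding(data)
    if enc == "UTF-16":
        if not data.startswith(_UTF16_BOMS):
            return False  # XML 1.0 requires UTF-16 entities to begin with a BOM
        codec = "utf-16"
    elif enc == "UTF-8":
        codec = "utf-8-sig"  # tolerates an optional UTF-8 BOM
    else:
        return False  # e.g. ISO-8859-1 or windows-1252
    try:
        data.decode(codec)  # catch bytes that contradict the encoding
    except UnicodeDecodeError:
        return False
    return True
```

Under these assumptions, the check rejects a file declared as ISO-8859-1, and also an undeclared file whose accented characters were saved as Latin-1 bytes. That kind of mismatch is one plausible source of the garbled characters described in the report.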

Comments

#1

Allowing only UTF-8/16 encoding in XML stems from an interoperability requirement: this way, all Reading Systems need to process only the mandatory XML encodings and do not need to carry encoding tables for all other languages (which would be a considerable burden). Allowing some non-optional XML-formatted content to escape that restriction would defeat the whole point. My view is that not requiring the NCX to be UTF-8/16 encoded is simply an omission, and that this requirement, while not in the letter of the spec, is in its spirit.

#2

Peter, I agree that in spirit we intended the NCX to be UTF-8/16 encoded.

I propose that we explicitly state it in the updated specs.

#3

Assigned to: Anonymous » PSorotokin

#4

Status: open » proposed resolution

Proposal:

Amend OPF specification section 1.4.1.2 item (vii) to read "an NCX must be included and either UTF-8 or UTF-16-encoded; and"

I do not see a good place to put a blanket XML encoding requirement; we should probably add it once all the specs are unified. For now, we can only express it as a WG opinion on the planned annotated-spec wiki.

#5

Status: proposed resolution » errata

Review time is up.

#6

Status: errata » completed @ 2.0.1