A production chain to go from HTML to EPUB3 with audio?


I have been working for a while with the following production chain:

1. Scan and OCR a book
2. Correct the OCR and add structure in MS Word with the DAISY plugin
3. Export to DAISY XML with the DAISY plugin
4. Convert DAISY XML to DAISY Book using the "TTS Narrator" option in DAISY Pipeline. This step generates the audio files using the TTS of the PC.
5. Convert DAISY Book to EPUB3 using DAISY Pipeline 2. The output here is now an EPUB3 book with media overlays.
6. Consult the book with Readium/Chrome or any other EPUB3-compatible reader that can handle media overlays

This is all working fine, but now I want to take on some work where step 2 (correct the OCR and add structure) will be done by a number of people.
Ideally, I would set up a web application where the editing is done in a web browser (something like http://etherpad.org/), with one web page per book page.

The bit I can't do yet is step 3, i.e. how to transform my HTML to DAISY XML.
I could do this in JavaScript or XSLT, but I'd like to know if anyone else has already done this, i.e. converted HTML to validated DAISY XML?
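
To make the question more concrete, here is a minimal sketch of the kind of mapping I have in mind, written in Python rather than JavaScript or XSLT. It assumes each edited page is a well-formed XHTML fragment containing only h1 and p elements, and maps them to the DTBook elements level1, h1 and p. The function name pages_to_dtbook and the helper q are made up for illustration. The output is not validated against the DTBook DTD, and a real file would most likely need more metadata in head plus handling for page numbers, lists, tables and so on.

```python
import xml.etree.ElementTree as ET

# Namespace of DTBook 2005 (the "DAISY XML" format).
DTBOOK_NS = "http://www.daisy.org/z3986/2005/dtbook/"


def q(name):
    """Return the namespace-qualified DTBook tag for `name` (hypothetical helper)."""
    return f"{{{DTBOOK_NS}}}{name}"


def pages_to_dtbook(pages, title):
    """Map a list of well-formed XHTML pages (h1 and p only) to DTBook XML.

    Hypothetical sketch: the result is NOT checked against the DTBook DTD.
    """
    ET.register_namespace("", DTBOOK_NS)  # serialize without a prefix
    root = ET.Element(q("dtbook"), {"version": "2005-3"})
    head = ET.SubElement(root, q("head"))
    ET.SubElement(head, q("meta"), {"name": "dc:Title", "content": title})
    body = ET.SubElement(ET.SubElement(root, q("book")), q("bodymatter"))

    level = None  # the level1 element currently being filled
    for page in pages:
        for node in ET.fromstring(page).iter():
            tag = node.tag.split("}")[-1]  # drop any XHTML namespace
            if tag not in ("h1", "p"):
                continue  # everything else is ignored in this sketch
            if tag == "h1" or level is None:
                level = ET.SubElement(body, q("level1"))  # new section
            # Inline markup (em, strong, ...) is flattened to plain text.
            ET.SubElement(level, q(tag)).text = "".join(node.itertext())
    return ET.tostring(root, encoding="unicode")


pages = [
    "<div><h1>Chapter 1</h1><p>First <em>page</em>.</p></div>",
    "<div><p>Second page.</p></div>",
]
print(pages_to_dtbook(pages, "Demo book"))
```

Page boundaries are lost here because the pages are simply concatenated. DTBook has a pagenum element that could record them, but where it may appear would need checking against the DTD.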

Alternatively, has anyone taken HTML, generated the associated audio files somehow, and then generated EPUB3 without using DAISY Pipeline?
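
For this second route, my understanding is that the missing piece would be the media overlay itself: a SMIL document that pairs each text fragment in the XHTML with a clip in an audio file. Below is a rough Python sketch of generating one. The function name build_overlay, the file names and the clip timings are all made up for illustration. It assumes the timings are already known (for example from the TTS engine or an alignment tool) and that every fragment in the XHTML has an id.

```python
import xml.etree.ElementTree as ET

# Namespace used by EPUB3 media overlay (SMIL) documents.
SMIL_NS = "http://www.w3.org/ns/SMIL"


def build_overlay(text_href, audio_href, clips):
    """Build a media overlay document as a string.

    `clips` is a list of (fragment_id, begin_seconds, end_seconds) tuples.
    Hypothetical sketch: the timings must come from somewhere else.
    """
    ET.register_namespace("", SMIL_NS)  # serialize without a prefix
    smil = ET.Element(f"{{{SMIL_NS}}}smil", {"version": "3.0"})
    body = ET.SubElement(smil, f"{{{SMIL_NS}}}body")
    for n, (frag_id, begin, end) in enumerate(clips, start=1):
        # One <par> per fragment: the text and the audio clip play together.
        par = ET.SubElement(body, f"{{{SMIL_NS}}}par", {"id": f"par{n}"})
        ET.SubElement(par, f"{{{SMIL_NS}}}text", {"src": f"{text_href}#{frag_id}"})
        ET.SubElement(
            par,
            f"{{{SMIL_NS}}}audio",
            {
                "src": audio_href,
                "clipBegin": f"{begin:.3f}s",
                "clipEnd": f"{end:.3f}s",
            },
        )
    return ET.tostring(smil, encoding="unicode")


print(build_overlay("page1.xhtml", "audio/page1.mp3",
                    [("s1", 0.0, 2.5), ("s2", 2.5, 6.0)]))
```

The EPUB3 package document would then still have to reference this file from the matching XHTML item (the media-overlay attribute) and declare the audio durations, and everything would have to be zipped up as an EPUB container. That packaging is the part I would prefer not to write by hand.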

Best regards,

