A Question regarding a DOCX to XHTML Transformation


Dear Daisy 2 Developer Team,

I just recently found out about this tool and I'm really excited to use and adapt it to our needs.

We want to implement an automatic transformation from a well-structured DOCX document, annotated by means of document styles, to an XHTML document.

We already have an XSLT stylesheet in place that handles all necessary transformations. The stylesheet uses an external XML configuration file in which certain aspects of the transformation can be set (e.g. wrapper elements for images and captions, tags of the XHTML output, etc.).

Although I still have to dig deeper into XProc, I already have a first idea of how to implement a pipeline/script within your Daisy framework that will handle the DOCX-to-XHTML transformation.

I therefore wanted to ask whether you could give me feedback regarding its feasibility and possible caveats.

(a) When setting up a job for the DOCX-to-XHTML transformation, four parameters need to be specified by the end user:

- the source DOCX file
- the config.xml file for the XSLT transformation
- the name (filename) of the resulting XHTML document
- the location (file location) of the resulting XHTML document

(b) The first step of the pipeline/script would be to decompress the DOCX file by means of px:unzip.

(c) The next step would be the XSLT transformation by means of p:xslt. In order to work properly, the XSLT stylesheet needs the following:

- the path to the config.xml file
- the paths to the document.xml and footnotes.xml files within the DOCX archive

My idea is to hand this information to the XSLT stylesheet by means of parameters, which can be defined within the p:xslt step. These parameters would in turn receive their values from the parameters specified during the setup of the job (see a).

(d) The last step would be to save the resulting XHTML document at the location specified at the beginning (see a).
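
To make steps (c) and (d) more concrete, here is a rough sketch in plain XProc 1.0. It is only an illustration under several assumptions, not a working Daisy script: the option names (unpacked-dir, config, output), the stylesheet file name (docx-to-xhtml.xsl) and the stylesheet parameter names (config-path, footnotes-path) are all hypothetical. It assumes that step (b) has already unpacked the archive into a directory, that the main part and the footnotes sit at the usual word/document.xml and word/footnotes.xml locations, and that the output name and location are combined into a single URI. The px:unzip call is left out because its exact options would need to be checked against the framework.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0" name="main">

  <!-- Hypothetical job options; the names are placeholders -->
  <p:option name="unpacked-dir" required="true"/> <!-- URI of the unzipped DOCX (result of step b) -->
  <p:option name="config" required="true"/>       <!-- URI of config.xml -->
  <p:option name="output" required="true"/>       <!-- URI of the XHTML file to write -->

  <!-- Load the main document part; it becomes the source of the transformation -->
  <p:load name="document">
    <p:with-option name="href" select="concat($unpacked-dir, '/word/document.xml')"/>
  </p:load>

  <!-- Step (c): run the stylesheet and pass the paths as stylesheet parameters -->
  <p:xslt name="transform">
    <p:input port="stylesheet">
      <p:document href="docx-to-xhtml.xsl"/> <!-- placeholder file name -->
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
    <p:with-param name="config-path" select="$config"/>
    <p:with-param name="footnotes-path"
                  select="concat($unpacked-dir, '/word/footnotes.xml')"/>
  </p:xslt>

  <!-- Step (d): serialize the result to the location chosen for the job -->
  <p:store name="save">
    <p:with-option name="href" select="$output"/>
  </p:store>

</p:declare-step>
```

On the stylesheet side, the two values would be picked up by matching top-level xsl:param declarations, and the additional files could then be read with document().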

Thank you very much and regards, Matthias Einbrodt

Hi Matthias,

Your project looks really interesting!

What you describe is perfectly feasible with XProc. It's probably better to discuss the technical details on our developers mailing list or via email (mine is rdeltour at gmail dot com). I'll be happy to provide further assistance and pointers!

Best regards, Romain.

Hi Romain,

Thanks very much for your quick reply. I have already registered for the mailing list and am awaiting approval.

Regards, Matthias