Overview

The DAISY Pipeline is a generic framework for document- and DTB-related pipelined transformations. The DAISY Pipeline project is fundamentally a collaborative, multi-organization effort, created by and for the DAISY community. It was launched in 2005 and is maintained by the DAISY Consortium.

The DAISY Pipeline software components are cross-platform and open source (released under the business-friendly LGPL).

High Level Goals

The objective of the DAISY Pipeline project is to meet, efficiently and economically, the growing need for conversion and transformation utilities within the DAISY community. This translates into the following high-level goals:

  • Support the single source master concept
    A pivot document format (e.g. XML DTBook) is used to represent the book material before it is published to the possible output formats. This decouples the production and publishing phases: production is a matter of converting an input format to the pivot format, while publishing is a matter of converting the pivot format to an output format. Documents in the pivot format can be stored by organizations, and output formats can be generated from them on demand.
  • Support transformations at the production, publishing and maintenance stages
    Production is an automated, non-interactive conversion of input content into DAISY content, possibly used in conjunction with manual editing environments. The Pipeline supports, for instance, the following input formats: WordML 2003, ODF (OpenOffice.org), RTF, XHTML, Audacity, etc.
    Publishing is the conversion of documents into various output formats. The Pipeline supports, for instance, the following output formats: DAISY 2.02 DTBs, DAISY/NISO Z39.86 2005 DTBs, Open eBook (OPS 2.0, OCF), XHTML, RTF, LaTeX.
    Maintenance involves manipulating existing content, for example audio encoding, DTBook repair, and format migration (e.g. from DAISY 2.02 to DAISY 3).
  • Minimize overlap and duplication
    The collaborative nature of the project is meant to reduce duplication of effort and to maximize the sharing of best practices across the community.
  • Enable creation of common, reusable components
    Organizations shall be able to reuse the Pipeline functionality in multiple deployment scenarios. Organizations are free to contribute components in both for-profit and not-for-profit contexts.
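
The single source master concept can be illustrated with a small sketch. Everything in it is hypothetical: the class name, the format keys and the string-based "documents" are stand-ins chosen for illustration and are not part of the Pipeline code base. The point it tries to show is that, with a pivot format, each input format needs one importer and each output format needs one exporter, instead of one converter per input/output pair.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from the Pipeline code base.
class PivotSketch {
    // Production: one importer per input format, each producing the pivot format.
    static final Map<String, Function<String, String>> IMPORTERS = new LinkedHashMap<>();
    // Publishing: one exporter per output format, each consuming the pivot format.
    static final Map<String, Function<String, String>> EXPORTERS = new LinkedHashMap<>();

    static {
        // Documents are plain strings here; real converters would work on files.
        IMPORTERS.put("rtf", doc -> "pivot(" + doc + ")");
        IMPORTERS.put("xhtml", doc -> "pivot(" + doc + ")");
        EXPORTERS.put("daisy202", pivot -> "daisy202[" + pivot + "]");
        EXPORTERS.put("latex", pivot -> "latex[" + pivot + "]");
    }

    // Production phase: convert the input once; the result can be stored.
    static String produce(String inputFormat, String doc) {
        return lookup(IMPORTERS, inputFormat).apply(doc);
    }

    // Publishing phase: generate an output format on demand from the pivot.
    static String publish(String outputFormat, String pivot) {
        return lookup(EXPORTERS, outputFormat).apply(pivot);
    }

    private static Function<String, String> lookup(
            Map<String, Function<String, String>> converters, String format) {
        Function<String, String> converter = converters.get(format);
        if (converter == null) {
            throw new IllegalArgumentException("Unsupported format: " + format);
        }
        return converter;
    }

    public static void main(String[] args) {
        String master = produce("rtf", "book");
        System.out.println(publish("daisy202", master)); // prints daisy202[pivot(book)]
        System.out.println(publish("latex", master));    // prints latex[pivot(book)]
    }
}
```

With two importers and two exporters the sketch covers four input/output combinations; adding a third output format would take one more exporter rather than one per input format.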

Core Concepts

The three concepts of Transformer, Script and Job are central to understanding how the DAISY Pipeline works.

Transformers

Transformers represent the atomic units of functionality in the DAISY Pipeline. Examples of such functionality are validation, MP3 encoding, TTS synthesis, DTBook repair, etc.

Transformers are not directly exposed to the end user but are meant to be chained in a Script.

The set of transformers provided out of the box with the Pipeline is stored in the transformers directory of the Pipeline home.

Transformer Description File

Concretely, transformers are defined by an XML document - the Transformer Description File (usually called transformer.tdf) - and a set of implementation files (Java, XSLT, etc).

The TDF contains the following information:

  • the unique name of this transformer
  • some meta information (author, license, description, documentation link)
  • the name of the Java class responsible for the execution of the transformer
  • the list of the typed parameters expected by this transformer
  • an optional list of Java libraries bundled in this transformer

See the TDF documentation for further details on the TDF syntax.
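
As a rough aid to reading the list above, the information carried by a TDF could be modelled in memory along the following lines. This is only a sketch: the class and field names are invented for illustration and are not the actual TDF element names, and the "type" labels and sample values are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory model of the information a TDF carries.
// Field names are illustrative; they are not the actual TDF element names.
class TdfSketch {
    // One typed parameter expected by the transformer.
    static final class Parameter {
        final String name;
        final String type;       // assumed free-form type label
        final boolean required;

        Parameter(String name, String type, boolean required) {
            this.name = name;
            this.type = type;
            this.required = required;
        }
    }

    final String name;                   // unique name of the transformer
    final String author;                 // meta information
    final String license;
    final String description;
    final String documentationLink;
    final String executionClassName;     // Java class that executes the transformer
    final List<Parameter> parameters;    // typed parameters expected by the transformer
    final List<String> bundledLibraries; // optional; empty when nothing is bundled

    TdfSketch(String name, String author, String license, String description,
              String documentationLink, String executionClassName,
              List<Parameter> parameters, List<String> bundledLibraries) {
        this.name = name;
        this.author = author;
        this.license = license;
        this.description = description;
        this.documentationLink = documentationLink;
        this.executionClassName = executionClassName;
        this.parameters = Collections.unmodifiableList(new ArrayList<>(parameters));
        this.bundledLibraries = Collections.unmodifiableList(new ArrayList<>(bundledLibraries));
    }

    // Names of the parameters that must be supplied for this transformer.
    List<String> requiredParameterNames() {
        List<String> names = new ArrayList<>();
        for (Parameter p : parameters) {
            if (p.required) {
                names.add(p.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        TdfSketch tdf = new TdfSketch(
            "example_transformer", "Jane Doe", "LGPL", "Illustrative entry",
            "doc/example.html", "org.example.ExampleTransformer",
            Arrays.asList(new Parameter("input", "file", true),
                          new Parameter("verbose", "boolean", false)),
            Collections.<String>emptyList());
        System.out.println(tdf.requiredParameterNames()); // prints [input]
    }
}
```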

Java Transformer Execution Class

Most transformers are developed in Java and provide at least a custom transformer execution class. However, the Pipeline framework comes with a set of general-purpose transformer classes that make it possible to develop new transformers without writing a single line of Java code. These generic transformer classes can be used, for instance, to run an XSLT stylesheet or an executable (Python script, exe file).

The existing set of transformers is the best collection of samples for a new transformer developer. See, for instance, the dk_dbb_dtbook2rtf transformer for an example of a pure XSLT transformer.
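
To give an idea of what a generic "run this stylesheet" step involves, here is a minimal sketch based on the standard JAXP API (javax.xml.transform). It is not the Pipeline's own generic transformer class, whose name and parameters are not described here; the class name XsltStepSketch and the sample stylesheet are made up for the example. Note that the JAXP type named Transformer is unrelated to the Pipeline concept of the same name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch only: applies an XSLT stylesheet to an XML document with plain JAXP.
class XsltStepSketch {
    static String apply(String stylesheet, String inputXml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Compile the stylesheet (javax.xml.transform.Transformer, not a Pipeline transformer).
            Transformer xslt = factory.newTransformer(
                new StreamSource(new StringReader(stylesheet)));
            // Run it over the input document and collect the result as text.
            StringWriter out = new StringWriter();
            xslt.transform(new StreamSource(new StringReader(inputXml)),
                           new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT step failed", e);
        }
    }

    public static void main(String[] args) {
        String stylesheet =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/book'><xsl:value-of select='title'/></xsl:template>"
          + "</xsl:stylesheet>";
        System.out.println(apply(stylesheet, "<book><title>Example</title></book>"));
    }
}
```

A real transformer would read the stylesheet and the documents from files and take their locations from its parameters; strings are used here only to keep the sketch self-contained.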

Script

Scripts are XML documents representing a sequence of steps (i.e. a chain of transformers), each step performing an operation on the input data or the result of the previous step. When a user wants to run a new Pipeline Job, they first select a script and then provide values for its parameters.

All the transformers listed in a script must be provided with their required parameters. These parameters may be either hardcoded in the script file or exposed as script parameters. In the latter case, the user must provide the parameters when creating a new Pipeline Job from this script.
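
The difference between hardcoded and exposed parameters can be sketched as follows. The model is hypothetical: real scripts are XML documents whose grammar is not reproduced here, and the "${name}" placeholder notation, the transformer name and the parameter names are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of one script step; not the actual script grammar.
class ScriptSketch {
    // A step names a transformer and binds each of its parameters either to a
    // hardcoded value or to a script parameter written as "${name}" (assumed notation).
    static final class Step {
        final String transformerName;
        final Map<String, String> bindings;

        Step(String transformerName, Map<String, String> bindings) {
            this.transformerName = transformerName;
            this.bindings = new LinkedHashMap<>(bindings);
        }
    }

    // Replace "${name}" bindings with user-supplied values; keep hardcoded values as-is.
    static Map<String, String> resolve(Step step, Map<String, String> scriptParameters) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> binding : step.bindings.entrySet()) {
            String value = binding.getValue();
            if (value.startsWith("${") && value.endsWith("}")) {
                String key = value.substring(2, value.length() - 1);
                if (!scriptParameters.containsKey(key)) {
                    throw new IllegalArgumentException("Missing script parameter: " + key);
                }
                value = scriptParameters.get(key);
            }
            resolved.put(binding.getKey(), value);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> bindings = new LinkedHashMap<>();
        bindings.put("source", "${input}"); // exposed as a script parameter
        bindings.put("strict", "true");     // hardcoded in the script
        Step step = new Step("example_validator", bindings);

        Map<String, String> userValues = new LinkedHashMap<>();
        userValues.put("input", "book.xml");
        System.out.println(resolve(step, userValues)); // prints {source=book.xml, strict=true}
    }
}
```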

The DAISY Pipeline ships with a series of predefined scripts in the scripts directory of the Pipeline home. In addition, users can create their own scripts, either to refine or vary the predefined scripts or to expose an entirely new chain of transformers to the end user.

See the Script grammar documentation for further details on the script file syntax.

Job

A job is essentially the association of a script and a set of parameter values. When a user of the DAISY Pipeline (whether a human or a web service client) wants to use the Pipeline to transform some input data, they first select the script and then configure the parameters (input file, output directory, optional behavior, etc.). The result is a new job ready to be run in the Pipeline framework.
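
In code terms, a job can be thought of as a script reference plus a map of parameter values. The sketch below is hypothetical (the class JobSketch, the script name and the parameter names are invented) and only illustrates the idea that a job becomes ready to run once every required script parameter has a value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a job: a selected script plus the values given to its parameters.
class JobSketch {
    final String scriptName;
    final List<String> requiredParameters;
    final Map<String, String> values = new LinkedHashMap<>();

    JobSketch(String scriptName, List<String> requiredParameters) {
        this.scriptName = scriptName;
        this.requiredParameters = new ArrayList<>(requiredParameters);
    }

    // Configure one parameter; returns this job so calls can be chained.
    JobSketch set(String parameter, String value) {
        values.put(parameter, value);
        return this;
    }

    // Required parameters that have not been given a value yet.
    List<String> missingParameters() {
        List<String> missing = new ArrayList<>(requiredParameters);
        missing.removeAll(values.keySet());
        return missing;
    }

    // A job is ready to run once nothing required is missing.
    boolean isReady() {
        return missingParameters().isEmpty();
    }

    public static void main(String[] args) {
        JobSketch job = new JobSketch("example_script", Arrays.asList("input", "outputDir"));
        job.set("input", "book.xml");
        System.out.println(job.missingParameters()); // prints [outputDir]
        job.set("outputDir", "out");
        System.out.println(job.isReady());           // prints true
    }
}
```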

Software Components

The core of the DAISY Pipeline project is made up of three main parts: the Pipeline runtime framework, the utility library and the transformers.

Runtime Framework

The DAISY Pipeline runtime framework is the fundamental glue code that parses scripts, launches transformers, and executes jobs. It consists of the Java classes in the org.daisy.pipeline package and its sub-packages.
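
The following sketch shows one way a runtime could launch steps whose execution class is known only by name, feeding each step the result of the previous one. It uses standard Java reflection and is not taken from the org.daisy.pipeline code: the class RuntimeSketch, the two sample steps, and the use of UnaryOperator<String> as the step interface are all assumptions made for illustration.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical sketch; the actual runtime classes are not shown here.
class RuntimeSketch {
    // Stand-ins for transformer execution classes: data in, data out.
    public static class UpperCase implements UnaryOperator<String> {
        @Override
        public String apply(String input) {
            return input.toUpperCase(Locale.ROOT);
        }
    }

    public static class Exclaim implements UnaryOperator<String> {
        @Override
        public String apply(String input) {
            return input + "!";
        }
    }

    // Instantiate each class by name and pass each step the previous step's result.
    static String run(String input, String... classNames) {
        String data = input;
        for (String className : classNames) {
            try {
                Object instance = Class.forName(className)
                                       .getDeclaredConstructor()
                                       .newInstance();
                @SuppressWarnings("unchecked")
                UnaryOperator<String> step = (UnaryOperator<String>) instance;
                data = step.apply(data);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot launch " + className, e);
            }
        }
        return data;
    }

    public static void main(String[] args) {
        String result = run("book", UpperCase.class.getName(), Exclaim.class.getName());
        System.out.println(result); // prints BOOK!
    }
}
```

Loading classes by name is what allows a description file to point at its execution class without the runtime having a compile-time dependency on it.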

Utility Library

The DAISY Utility Library is a set of Java classes that provides utilities for a wide range of functionality such as file set manipulation, XSLT transformation, localization, IO operations, XML validation, audio manipulation, etc. It consists of the Java classes in the org.daisy.util package and its sub-packages.

Transformer Set

The transformer set comprises all the individual transformer components developed by the Pipeline contributors and made available for use and reference in Pipeline scripts. All the transformers should be located in the transformers directory of the Pipeline home.

Deployment Options

The current Pipeline core functionality is available in several flavors, depending on the deployment requirements.

Command Line Interface

The Command Line Interface is the most minimalist Pipeline distribution. It is deployed as a compressed archive (ZIP) and allows the Pipeline to be run from a command-line environment (the MS console, a shell terminal, a shell script, etc.).

See the Pipeline CLI page for more information.

Desktop Application: Pipeline GUI

The Pipeline GUI is a standalone desktop application. It is usually deployed via an installer. It gives the end user a rich graphical user interface to create Pipeline jobs, execute them, and track the execution progress and messages.

See the Pipeline GUI page for more information.

Embedded GUI: Pipeline Lite

The Pipeline Lite is a minimalist GUI for the Pipeline functionality, made of a set of dialogs ranging from a simple progress dialog to a dynamic job configuration dialog. It is intended to be embedded in third-party software that wants to include some Pipeline functionality (e.g. MS DAISY Translator Word Add-In or DAISY's own Obi).

See the Pipeline Lite page for more information.

Web Application: PipeOnline

PipeOnline is a web application for creating and executing Pipeline jobs over the wire. It is intended to be a robust database-backed application, with built-in execution queues, email notification, and persistence of usage statistics.

See the PipeOnline page for more information.

Remote Component: Pipeline WS

The Pipeline WS is a web service layer on top of the Pipeline functionality. It will allow the Pipeline to be driven remotely in a platform-independent manner. This is typically of interest to organizations that want to run an online Pipeline service or whose home environment is not directly compatible with the regular Pipeline Java API (e.g. an MS .NET environment).

See the Pipeline WS page for more information.