Closer to rendering: other parts of the food chain

By Rick Jelliffe
August 22, 2008

There is a kind of architectural battle going on for markup publishing systems. On the one hand is the old three-stage architecture, in which high-level, specific markup is transformed into formatting objects (such as XSL-FO), which are then transformed into low-level rendering instructions. (Think LaTeX->TeX->DVI, but it is older than that.)

On the other hand you have a two-step process. Here the markup languages attempt to straddle both the high-level specific markup and the formatting objects (e.g. HTML, ODF, OOXML), while smarter rendering systems do more of the processing themselves (e.g. accepting SVG) and take simpler and better-structured formats (think of the move from PostScript to structured PDF, and of PDF/A and XPS).
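To make the contrast concrete, here is a minimal sketch of the three-stage model as two composed functions. Everything in it is an assumption for illustration: the `Block` class, the style table, and the tuple-shaped drawing instructions are invented and do not correspond to XSL-FO, DVI, or any real format.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A hypothetical formatting object: text plus explicit layout properties."""
    text: str
    font_size_pt: float
    space_before_pt: float


def to_formatting_objects(doc):
    """Stage 1 -> 2: map high-level, specific markup onto formatting objects."""
    # Invented style table: element kind -> (font size, space before), in points.
    styles = {"title": (18.0, 0.0), "para": (10.0, 6.0)}
    return [Block(text, *styles[kind]) for kind, text in doc]


def to_render_ops(blocks, top_pt=800.0):
    """Stage 2 -> 3: lay out formatting objects as low-level drawing instructions."""
    ops, y = [], top_pt
    for block in blocks:
        # Move the baseline down by the block's leading space and its font size.
        y -= block.space_before_pt + block.font_size_pt
        ops.append(("text", 72.0, y, block.font_size_pt, block.text))
    return ops


def three_stage(doc):
    """The whole chain: specific markup -> formatting objects -> rendering."""
    return to_render_ops(to_formatting_objects(doc))


doc = [("title", "Closer to rendering"), ("para", "Other parts of the food chain.")]
ops = three_stage(doc)
```

In the two-step model the boundary moves: the authoring format already carries much of what `to_formatting_objects` decides here, and the renderer takes on more of the remaining work.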

The two-step model sends shivers down the spines of old-time SGML-ers. But despite its ascendancy, the continuing importance of content management systems, web servers, and other disguised pipelines still provides a workable habitat for higher-level markup and transformation-based systems.

Two drafts of standards are available on the web which relate more to the rendering side, and show that there are still different trade-offs in figuring out who does what in the processing chain:

  • OASIS Unstructured Operation Markup Language (UOML) is a Chinese-led effort which comes out of a format called SEP, which addressed similar issues to PostScript. The key design characteristic in UOML seems to be a desire for a small and simple set of primitives: you have elements for pages, matrix transforms and circles, for example. Of particular interest is that UOML is positioned as a format for direct editing: you import your HTML or ODF or OOXML into your editor, which works with the UOML natively. UOML provides a kind of specification for a minimally acceptable editing system (which is not to say trivial!). I don't see any provision for structured editing in it, and it is not concerned with issues such as white-space handling and hyphenation. I might have missed it, but I think it is simply not targeted at that level. In a sense this makes it a rather old-fashioned approach (like OOXML or linear ODF).
  • ISO/IEC 24754:2008 Minimum requirements for specifying document rendering systems is a Japanese-led effort which provides the minimum requirements to specify the features of a document rendering system that transforms formatting objects into rendering output. It may be used as a frame of reference... It enumerates features in the abstract only, and is designed to be useful for product and design evaluation: what a minimally acceptable rendering system has to cope with. (The SC34 WG2 team responsible has particular links to the printer industry.) The final version of the standard is available for purchase from ISO.

It strikes me that UOML would be in a much stronger position (and a better technology) if the UOML group audited it against the ISO minimum requirements standard. That would be proof that the UOML spec (still at draft stage) did not have gaps in functionality. (Or, if there were gaps, that these had good reasons for existing.)
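Such an audit is, at its core, a set comparison with room for justified exceptions. The sketch below assumes both documents can be reduced to lists of feature names; the names used here are placeholders I have made up, not items taken from ISO/IEC 24754 or from the UOML draft.

```python
def audit(required, supported, waivers=None):
    """Report required features that the spec lacks, split by whether a reason is given."""
    waivers = waivers or {}
    missing = sorted(set(required) - set(supported))
    return {
        # Gaps with no stated reason: these are the ones that need fixing.
        "unexplained": [f for f in missing if f not in waivers],
        # Gaps with a recorded rationale for existing.
        "justified": {f: waivers[f] for f in missing if f in waivers},
    }


# Placeholder feature names, for illustration only.
required = {"page-geometry", "glyph-placement", "hyphenation", "colour"}
supported = {"page-geometry", "glyph-placement", "colour"}

report = audit(required, supported, {"hyphenation": "handled upstream by the importing editor"})
bare_report = audit(required, supported)
```

An empty `unexplained` list is the "proof" in question; anything left in it is a gap to close or to justify.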

The main use I see for a low-level language like UOML, which does not even provide any structuring, annotation or metadata capabilities (which at least structured PDF does), is that a UOML engine would simplify conversion out of XSL-FO (or rendering of HTML, ODF, OOXML, etc). I would imagine a UOML engine could be built on top of most existing typesetting engines readily, or even on top of SVG.
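As a rough illustration of why SVG looks like a plausible substrate, the three kinds of primitive mentioned earlier (pages, matrix transforms, circles) each have a direct SVG counterpart. The Python function names below are hypothetical and are not UOML element names or syntax; only the emitted SVG elements and attributes are real.

```python
def circle(cx, cy, r):
    """A circle primitive maps onto SVG's <circle> element."""
    return f'<circle cx="{cx}" cy="{cy}" r="{r}"/>'


def transform(matrix, children):
    """A six-number affine matrix maps onto an SVG group with a matrix() transform."""
    a, b, c, d, e, f = matrix
    body = "".join(children)
    return f'<g transform="matrix({a} {b} {c} {d} {e} {f})">{body}</g>'


def page(width, height, children):
    """A page maps onto one root <svg> element of the given size."""
    body = "".join(children)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">{body}</svg>'
    )


# One page holding a circle of radius 40, translated by (100, 100).
svg = page(595, 842, [transform((1, 0, 0, 1, 100, 100), [circle(0, 0, 40)])])
```

A real engine would of course need text, fonts, images and clipping as well; this only shows that the mapping for the simplest primitives is nearly one-to-one.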

UOML (like UOF, ODF, OOXML, PDF, XPS, etc) seems like it might have been primarily influenced by particular products. Its main rival is perhaps PDF/X, with XPS as a lurker in the page-description-in-XML area.
