XSLT 2.0++: Streaming XML Transformations and Parallel XML Message Processing

By M. David Peterson
September 1, 2008

An interesting and welcome post from Dr. Michael Kay regarding the next revision of the XSLT specification:

Apart from getting a maintenance release out, my other immediate priority is to make some progress with the XSLT 2++ specification. The "theme" for the next release is streaming, and the Working Group (with a largely new membership) has been spending 18 months or so brainstorming about what's needed in the language to allow streaming transformations. Some of these ideas have already found their way, in experimental form, into Saxon 9.1 and are being well received by users. But the WG needs to switch now from brainstorming mode to design mode. We've got a meeting in Prague later this month that will hopefully achieve that transition, but to make this successful I need to prepare some concrete language proposals that we can use as a baseline. Fortunately (?) my consultancy diary is fairly light at the moment so I should be able to find some time for that.

Nice! Working with large XML data sets is problematic when the processor must load the entire tree into memory *first* and only then begin the transformation, so the ability to parse and transform that data incrementally, on the fly, will be a welcome addition.
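To make the difference concrete outside of XSLT, here is a minimal Python sketch of incremental processing using the standard library's `xml.etree.ElementTree.iterparse`. It is not XSLT streaming, only an analogy. The `<orders>/<order>/<price>` document shape and the `sum_prices_streaming` name are hypothetical. Each finished `<order>` is handled and then detached from the root, so memory use is bounded by the size of one order rather than the whole document.

```python
import io
import xml.etree.ElementTree as ET


def sum_prices_streaming(source):
    """Total every <order>/<price> without holding the whole tree in memory.

    Hypothetical document shape: <orders><order><price>1.5</price></order>...</orders>
    """
    total = 0.0
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the root's start tag
    for event, elem in context:
        if event == "end" and elem.tag == "order":
            price = elem.findtext("price")
            if price is not None:
                total += float(price)
            # Detach finished children from the root so processed
            # records can be garbage-collected.
            root.clear()
    return total


doc = io.BytesIO(
    b"<orders><order><price>1.5</price></order>"
    b"<order><price>2.5</price></order></orders>"
)
print(sum_prices_streaming(doc))  # 4.0
```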

But what I find even more interesting than working with large data sets is the ability to begin processing an XML message, no matter its size, upon receipt of the first angle bracket. That makes it possible to begin routing a message to its appropriate handler right away, rather than waiting for the entire message to be parsed and loaded into memory.
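A rough Python sketch of routing on the first angle bracket, using the standard library's `xml.etree.ElementTree.XMLPullParser`, which accepts input incrementally through `feed()`. The `routes` table and handler names are hypothetical, and the list of byte chunks stands in for data arriving off a socket. The routing decision is made as soon as the root start tag has been parsed, and the remaining chunks are never read.

```python
import xml.etree.ElementTree as ET


def route_on_first_tag(chunks, routes, default="unrouted"):
    """Choose a handler as soon as the root element's start tag is parsed.

    `chunks` simulates bytes arriving off the wire; `routes` is a hypothetical
    mapping from root element name to handler name.
    Returns (handler_name, number_of_chunks_consumed).
    """
    parser = ET.XMLPullParser(events=("start",))
    consumed = 0
    for chunk in chunks:
        parser.feed(chunk)
        consumed += 1
        for _event, elem in parser.read_events():
            # The first "start" event belongs to the root: route and stop.
            return routes.get(elem.tag, default), consumed
    return default, consumed


incoming = [b"<order id='7'>", b"<item sku='a'/>", b"</order>"]
print(route_on_first_tag(iter(incoming), {"order": "order_handler"}))
# ('order_handler', 1)
```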

This is where the schema-aware functionality of the XSLT 2.0 specification really begins to shine: if we can validate each element and attribute, along with its value, as part of the transformation process rather than validating the entire message before that process begins, we enter an entirely new realm in which type safety becomes an integral part of a dynamic process, one whose work on a single message can be spread across all available processors.
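Python's standard library has no XML Schema validator, so the following sketch swaps in a toy type table (`TYPES`, hypothetical) as a stand-in for schema types. What it illustrates is the ordering: each typed element is checked the moment its end tag is parsed, and processing stops at the first invalid value instead of validating the whole message up front.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for schema types: element name -> converter that
# raises ValueError on bad content. Schema-aware XSLT would use XML Schema.
TYPES = {"quantity": int, "price": float}


def validate_streaming(source):
    """Return (ok, detail), checking each typed element as it closes."""
    checked = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        convert = TYPES.get(elem.tag)
        if convert is None:
            continue  # untyped element: nothing to check
        try:
            convert((elem.text or "").strip())
        except ValueError:
            # Fail fast; the parser need not consume the rest of the input.
            return False, f"bad <{elem.tag}>: {elem.text!r}"
        checked += 1
    return True, f"{checked} typed elements ok"


good = io.BytesIO(b"<order><quantity>3</quantity><price>9.5</price></order>")
print(validate_streaming(good))  # (True, '2 typed elements ok')
```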

Take the following simplified diagram for example:

[Figure: XSLT Streaming Message Processing]

If each "process" in the diagram above represents a side-effect-free (and therefore thread-safe) handler that can be scheduled to run on the next available processor, and if you think of each handler as nothing more than an XSLT template to which a primary message router passes an in-memory pointer for further processing, you should hopefully see where I'm going with this.
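The router-plus-handlers idea can be sketched in Python with `concurrent.futures.ThreadPoolExecutor`. Everything here is illustrative: `HANDLERS`, `handle_order`, `handle_cancel` and the `<batch>` document are hypothetical. The router streams the document, and each time a direct child of the root closes it passes a reference to that element (the "in-memory pointer") to a side-effect-free handler running on a worker thread.

```python
import io
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


# Hypothetical side-effect-free handlers: each reads only the element it is
# given and returns a new value, so any worker thread can run any of them.
def handle_order(elem):
    return ("order", elem.get("id"))


def handle_cancel(elem):
    return ("cancel", elem.get("id"))


HANDLERS = {"order": handle_order, "cancel": handle_cancel}


def route_in_parallel(source, max_workers=4):
    """Stream `source`, dispatching each top-level child to a worker thread."""
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        context = ET.iterparse(source, events=("start", "end"))
        _event, root = next(context)
        depth = 0  # nesting depth below the root
        for event, elem in context:
            if event == "start":
                depth += 1
                continue
            depth -= 1
            if depth == 0:  # a direct child of the root just closed
                handler = HANDLERS.get(elem.tag)
                if handler is not None:
                    futures.append(pool.submit(handler, elem))
                # Detach it from the root; the worker keeps its own reference.
                root.clear()
    # Results are collected in document order, whichever thread finished first.
    return [f.result() for f in futures]


doc = io.BytesIO(b"<batch><order id='1'/><cancel id='2'/><order id='3'><note/></order></batch>")
print(route_in_parallel(doc))
# [('order', '1'), ('cancel', '2'), ('order', '3')]
```

One caveat: in CPython the global interpreter lock keeps threads from running CPU-bound Python code in parallel, so a genuinely multi-core version would more likely use `ProcessPoolExecutor` and hand each worker a serialized subtree (for example from `ET.tostring`) rather than a shared pointer.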

And if you can see where I'm going with this and also know a thing or two about XMPP, it shouldn't be much of a stretch to imagine how powerful the combination of XSLT 2.0++ and XMPP can (and most likely will) be, and therefore how important both technologies will become in our multi-core world of the future (make that the here and now).
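For readers who don't know XMPP: a client session is one long-lived XML document, a `<stream:stream>` root whose children are the individual stanzas (`<message/>`, `<presence/>`, `<iq/>`), and the root isn't closed until the session ends. A streaming processor can therefore handle each stanza as soon as its end tag arrives. Here is a minimal Python sketch of that idea, again using `XMLPullParser`; the chunk boundaries and addresses are made up, and a real client would also need TLS, authentication and the rest of the XMPP core protocol, all of which this ignores.

```python
import xml.etree.ElementTree as ET

CLIENT_NS = "{jabber:client}"


def stanzas_from_stream(chunks):
    """Yield each top-level stanza of an XMPP-style stream as soon as it closes.

    The <stream:stream> root is never closed here, just as on a live session.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    root = elem  # the <stream:stream> element
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a direct child of the stream root closed
                    root.remove(elem)  # keep the long-lived root from growing
                    yield elem


session = [
    b"<stream:stream xmlns:stream='http://etherx.jabber.org/streams'"
    b" xmlns='jabber:client'>",
    b"<message to='juliet@example.com'><body>hi</body></message>",
    b"<presence/>",
]
for stanza in stanzas_from_stream(session):
    print(stanza.tag.replace(CLIENT_NS, ""), stanza.findtext(CLIENT_NS + "body"))
# message hi
# presence None
```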

Now let's just hope it doesn't take the roughly seven years that separated XSLT 1.0 (1999) and 2.0 (2007) to gain access to XSLT 2.0++. ;-) My guess? XQuery 1.0, and with it XPath 2.0, have long since left the stable, so the XSLT 2.0++ working group doesn't have to sit around idle waiting for anyone else to catch up. This should be a good, positive thing for everyone involved, and could result in the next revision of the XSLT spec reaching both working draft and final Recommendation status in record time.

Or not. ;-) Time will tell, but in the meantime this certainly provides some interesting possibilities to begin thinking about.

---
NOTE: Sylvain Hellegouarch has been thinking about the streaming XML and XMPP space for quite some time now. Take a look at his writing on the subject, as well as his "Headstock" project, for more detail.

