Simple Pretty Printing with XSLT

By Eric Larson
July 12, 2008 | Comments: 7

The other day a coworker asked about a simple way to format an XML file. Basically, he just wanted to get some nicely indented output to look at. At first he tried to get things working with Eclipse's XML editor. From what I understand it worked, but it is a pretty heavyweight solution for getting some indented XML. Being the resident XML guru (which is a pretty scary thought!), I wrote a quick XSLT to help him out.

The idea is really basic:

  1. Take an identity function
  2. Strip the white space
  3. Indent the output

If you're not familiar with XSLT, the above might not make much sense, so let's review it.

In XSLT there is a pattern known as an Identity Template. It copies XML from the source document to the output verbatim. This is a great way to start working with XSLT because it gives you a simple way to change a lot of output with a small amount of code. Here is a simple identity template:

<xsl:template match="node() | @*">
  <xsl:copy>
    <xsl:apply-templates select="node() | @*" />
  </xsl:copy>
</xsl:template>

This is just one example. There are many different ways you can write an identity function depending on your needs. I wrote the above leaving out the text nodes (text()) because my goal was to make a nicely indented version of some XML. If I wanted to add a position attribute to each element in the document, it might look like this:

<xsl:template match="node() | @*">
  <xsl:copy>
    <!-- add a position attribute -->
    <xsl:attribute name="position">
      <xsl:value-of select="position()" />
    </xsl:attribute>
    <xsl:apply-templates select="node() | @*" />
  </xsl:copy>
</xsl:template>

When you consider I only added three lines to add a contextual attribute to every element in the document, it becomes pretty clear how powerful this pattern really is.
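
One caveat: the template above also fires for attributes and text nodes, which cannot carry an attribute of their own. The following is a sketch of one way to limit the addition to elements while staying in XSLT 1.0. Splitting the work into two templates is an assumption about how you might structure it, not the only option:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- elements: copy, add the attribute, then process attributes and children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:attribute name="position">
        <xsl:value-of select="position()" />
      </xsl:attribute>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- everything else is copied through untouched -->
  <xsl:template match="@* | text() | comment() | processing-instruction()">
    <xsl:copy />
  </xsl:template>
</xsl:stylesheet>
```

Keep in mind that position() counts every node selected by the apply-templates call, so white space text nodes and attributes take up positions too unless they are stripped or selected separately.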

The next two steps I mentioned are both really simple features of XSLT dealing with white space. White space between elements in XML is actually a text node. For example:

<some>
  <value>Hello World</value>
</some>

There are text nodes before and after the value element. The vast majority of the time this kind of white space is unimportant, so it is usually pretty safe to ignore it in processing. But when it does matter, it is good to know you have the ability to access and manipulate it as needed. It can also be helpful in debugging XSLT. If I were to select the children of the "some" element, I would have three nodes: the two text nodes and the value element. It is much easier to debug something like XSLT when you fully understand how it sees all the markup.
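
One quick way to see those nodes for yourself is to count them. The following is a minimal sketch, assuming the three-line document above is the input and that the processor keeps white space-only text nodes when it parses the source (some can be configured not to):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- plain text output so only the count is written -->
  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- node() selects every child of "some": text nodes as well as elements -->
    <xsl:value-of select="count(/some/node())" />
  </xsl:template>
</xsl:stylesheet>
```

Under those assumptions this writes 3 for the input above. With white space stripping turned on for all elements, the same count drops to 1, because only the value element is left.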

When you are processing XML with XSLT you can declare that you want to strip out the text nodes that do not have inherent value. I'd check the XSLT specification for more information on what exactly counts as important white space. The short version is that it is not a trim function: only text nodes made up entirely of white space characters are removed, and white space before or after non-white space characters in the same text node is left alone.

Stripping the white space in XSLT is performed with the "strip-space" element.

<xsl:strip-space elements="*" />

As you can see, the "elements" attribute allows you to specify the elements whose white space you want to strip. In the above example, I simply selected all elements.
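
The attribute takes a white space separated list of element names, so the declaration can be narrower than "*", and it can be paired with "xsl:preserve-space" to carve out exceptions. A sketch, where "pre" and "code" are hypothetical element names standing in for elements whose white space matters:

```xml
<!-- strip white space-only text nodes everywhere... -->
<xsl:strip-space elements="*" />
<!-- ...except directly inside these (hypothetical) elements -->
<xsl:preserve-space elements="pre code" />
```

When both declarations match, a specific element name takes precedence over the "*" wildcard, so the named elements keep their white space.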

Finally, in order to make the output look pretty, I want to indent it. This is often referred to as "pretty printing" in DOM implementations. Again, XSLT has a simple means of doing this via the "xsl:output" element:

<xsl:output method="xml" indent="yes" />

The "xsl:output" element is a very powerful aspect of XSLT. In addition to allowing different output "methods" such as XML, HTML and text, you can specify the media type, encoding and a wealth of other details.
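
As an illustration of those other details, here is a sketch of a more fully specified declaration. The particular values are assumptions chosen for the example, not requirements:

```xml
<!-- indented XML, UTF-8 encoded, without the XML declaration at the top -->
<xsl:output method="xml"
            indent="yes"
            encoding="UTF-8"
            omit-xml-declaration="yes"
            media-type="application/xml" />
```

How much indentation you get from indent="yes" is up to the processor. The specification permits the extra white space but does not dictate its exact form.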

In the end, here is my simple pretty printing XSLT:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*" />
  <xsl:output method="xml" indent="yes" />

  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

I'm sure my small pretty printer has some flaws, but I think it is OK. XML, as a technology, is rather full featured and complex. There is a wealth of tools that support almost every minute detail. That doesn't mean you have to use them all! While there are plenty of people who write XSLT all day, every day, there are just as many folks who could benefit from copying an identity template and making a few changes. Hopefully, the next time you have to do some mundane XML cleanup, the simple patterns I used above can help you knock it out and get on with life.


7 Comments

Using <xsl:strip-space> is really useful, but you must remember to use <xsl:preserve-space> to retain all the whitespace-only text nodes within mixed content. For example, if you were pretty-printing XHTML you'd use:

<xsl:preserve-space elements="h:p h:span h:em h:strong ..." />

If you don't do that, you might accidentally strip out whitespace that changes the meaning of the document. For example, if you have:

<p>a really <em>silly</em> <strong>example</strong></p>

then without preserving the whitespace in <p> elements, you'll get:

<p>a really <em>silly</em><strong>example</strong></p>

which won't look very good.

Jeni

Two small issues:

- you don't need the identity template, just using xsl:copy-of is sufficient (the serializer which adds the indentation whitespace operates on the result tree after the XSLT has created it, so it doesn't matter how you create the result tree)

- the identity template processes text() nodes because the node() pattern matches them.
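
A sketch of the first point, assuming the same strip-space and output declarations as the stylesheet in the article:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:strip-space elements="*" />
  <xsl:output method="xml" indent="yes" />

  <!-- copy the whole (already stripped) source tree in one step -->
  <xsl:template match="/">
    <xsl:copy-of select="." />
  </xsl:template>
</xsl:stylesheet>
```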

Also, if you did want to add a position attribute to elements, you might find the XSLT 2.0 identity transform more intuitive:


<xsl:template match="element()">
  <xsl:copy>
    <xsl:apply-templates select="@*, node()" />
  </xsl:copy>
</xsl:template>

<xsl:template match="attribute()|text()|comment()|processing-instruction()">
  <xsl:copy />
</xsl:template>

The (slight) problem with the 1.0 version is when an attribute is matched you copy it to the result, but then try and add a position attribute and then apply-templates to it, all of which gets ignored.

In the 2.0 version, you can modify just the element() matching template so no instructions need to be ignored.

(Personally I still use the 1.0 identity template even when writing 2.0 transforms, as I've written it a million times over, but for anyone struggling with learning the identity transform for the first time the 2.0 version could be more intuitive.)

@Andrew

This is true. I used to write my identity transform using an xsl:copy-of, but I was informed strangeness could occur due to the context (if I remember it correctly). With that said, I was unclear as to what the problems could be and whether they were processor specific, so I have been using the above method (more or less).

@Jeni

Very true! Thanks for raising the issue and bringing up preserve-space.

Btw, thanks to both of you for commenting and sharing your knowledge on the subject. My goal was to provide a bare bones example to get someone 80% of the way. Your notes and suggestions help to get that last 20%, so thanks!

hi eric,

wondering if the stylesheet is now corrected to take the 2 comments that you acknowledged ? it would help if you could put a command that could run in windows(where most NOOB developers reside).

Also, if someone wants to do a 1 line command line : try downloading this tool : http://xmlstar.sourceforge.net/ which works on windows as well!

then run the following command >"xmls fo > "

BTW, this tool helps in validation of a document as well. Not sure if this is the best tool BUT it works for many cases.

Thank you,

BR,
~A

hi there,

my angle brackets truncated things.

xmlstarlet :

to format : run xmls fo xml_file xml_formatted.xml

to validate run xmls val xml_file



BR,

~A

@ arjan

I haven't included the notes from comments simply because the concepts were somewhat optional in that different situations have different constraints. In other words, if the above works for you, great! If things don't work, then depending on what you want to do, the above comments might be helpful.

As for running the above template, I suggest trying out Amara for Python. It comes with a 4xslt command that will allow running a transformation:

4xslt source.xml pretty_print.xslt

It should work wherever Python works. The only caveat is that sometimes on windows you need to update your path to include the Python script directory.

There are other options as well, but that is what I use.

Thank you! This solved my problem!
