Localizing XSLT error messages for Schematron

By Rick Jelliffe
August 14, 2008 | Comments: 2

As part of the current upgrade to the Schematron skeleton code at schematron.com, I have been working on localization. An application is localized when it presents its interface using the language and cultural conventions of the user.

Now Schematron is already quite localized, because the assertion texts are entered by the schema developer, not the Schematron skeleton programmer. So Schematron starts a big step ahead of the grammar-based validators with their hard-coded messages. And the messages generated by the underlying XSLT system will be localized (or not) depending on the implementation.

But I was surprised to read a review of Schematron and other schema languages which cited the lack of localization as an important reason to not use it, so the next release of the skeleton has localized messages. Here is the approach I took.

Where previously there was code such as this:


<xsl:template match="iso:diagnostic" mode="check-diagnostics">
<xsl:if test="not(@id)">
<xsl:message>Markup Error: no id attribute in &lt;diagnostic&gt;</xsl:message>
</xsl:if>
</xsl:template>

now we call a named template to provide the text, so that all messages are replaced by unique numbers.


<xsl:template match="iso:diagnostic" mode="check-diagnostics">
<xsl:if test="not(@id)">
<xsl:message><xsl:call-template name="outputLocalizedMessage" ><xsl:with-param name="number">9</xsl:with-param></xsl:call-template></xsl:message>
</xsl:if>
</xsl:template>


We put the numbered messages into a fragment of top-level foreign elements. XSLT allows these and does nothing with them, which makes them a great way of systematizing complicated stylesheets: you can even make up your own configuration language. You can reference top-level foreign elements within the same stylesheet just by using the document("") function in XPath.
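
As a minimal sketch of that idea (not code from the skeleton), the stylesheet below carries one foreign element and reads it back with document(""). The cfg prefix, its namespace URI and the element names are made up for illustration. It assumes the processor can resolve the stylesheet's own URI, which is normally the case when the stylesheet is loaded from a file.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:cfg="http://example.org/my-config"
    exclude-result-prefixes="cfg">

  <xsl:output method="text"/>

  <!-- Top-level foreign element: it must be in a namespace other than the
       XSLT namespace. The processor ignores it. (Hypothetical names.) -->
  <cfg:settings>
    <cfg:greeting>Hello from the stylesheet itself</cfg:greeting>
  </cfg:settings>

  <xsl:template match="/">
    <!-- document('') returns the root node of this stylesheet document,
         so the foreign element can be reached with an ordinary path -->
    <xsl:value-of
        select="document('')/xsl:stylesheet/cfg:settings/cfg:greeting"/>
  </xsl:template>

</xsl:stylesheet>
```

Run against any input document, this should write out the text of the cfg:greeting element.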

In this case, I just borrow some elements from XHTML for convenience.


<xhtml:div class="ErrorMessages">
<!-- Where the error message contains dynamic information, the message has been split into an "a" and a "b" section.
This has been done even when the English does not require it, in order to accommodate different language grammars
that might position the dynamic information differently.
-->
<xhtml:p id="sch-message-1">Schema error: Schematron elements in old and new namespaces found</xhtml:p>
<xhtml:p id="sch-message-2">Schema error: in the queryBinding attribute, use 'xslt'</xhtml:p>
<xhtml:p id="sch-message-3a">Fail: This implementation of ISO Schematron does not work with schemas using the query language </xhtml:p>
<xhtml:p id="sch-message-3b"/>
<xhtml:p id="sch-message-4a">Phase Error: no phase has been defined with name </xhtml:p>
<xhtml:p id="sch-message-4b" />

...
</xhtml:div>
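
To show how an "a"/"b" pair might be stitched back together around the dynamic value, here is a small self-contained sketch. It is an assumption about the shape of the call site, not the skeleton's actual code: the $phase parameter and the lookupDefaultMessage template are hypothetical stand-ins, and the lookup handles only the default strings.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter standing in for the dynamic information -->
  <xsl:param name="phase">myPhase</xsl:param>

  <xhtml:div class="ErrorMessages">
    <xhtml:p id="sch-message-4a">Phase Error: no phase has been defined with name </xhtml:p>
    <xhtml:p id="sch-message-4b"/>
  </xhtml:div>

  <!-- Simplified stand-in lookup: default strings only, no langCode -->
  <xsl:template name="lookupDefaultMessage">
    <xsl:param name="number"/>
    <xsl:value-of
        select="document('')//xhtml:p[@id = concat('sch-message-', $number)]"/>
  </xsl:template>

  <xsl:template match="/">
    <xsl:message>
      <!-- text before the dynamic value -->
      <xsl:call-template name="lookupDefaultMessage">
        <xsl:with-param name="number">4a</xsl:with-param>
      </xsl:call-template>
      <!-- the dynamic value itself -->
      <xsl:value-of select="$phase"/>
      <!-- text after the dynamic value (empty in English) -->
      <xsl:call-template name="lookupDefaultMessage">
        <xsl:with-param name="number">4b</xsl:with-param>
      </xsl:call-template>
    </xsl:message>
  </xsl:template>

</xsl:stylesheet>
```

A language that puts the name before the rest of the sentence would leave its "a" string empty and put its text in the "b" string, without any change to the calling code.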

We introduce a parameter to the stylesheet to say which language is desired:

  <xsl:param name="langCode">default</xsl:param>

The outputLocalizedMessage named template is below.

	<!-- ===================================================== -->
	<!-- Localization 						                  -->
	<!-- ===================================================== -->
	<!--
		All messages generated by the skeleton during processing are localized.
		(This does not apply to the text that comes from Schematron schemas
		themselves, of course. Nor does it apply to messages in metastylesheets.)
		
		Stylesheets have a parameter $langCode which can be used to select the
		language code (e.g. from the command line)
		
		The default value of $langCode is "default". When this is used, the
		message text is taken from the strings below. We use XHTML, to provide
		the namespace. 
		
		If the $langCode is something else, then the XSLT engine will try to
		find a file called  sch-messages-$langCode.xhtml in the same directory
		as this stylesheet. Expect a fatal error if the file does not exist.
		
		The file should contain XHTML elements, with the text translated.
		The strings are located by using ids on each xhtml:p element.
		The ids are formed by sch-message-$number-$langCode such as  
		sch-message-1-en
		
		If there is no match in a localization file for a message, then the
		default will be used. This allows this XSLT to be developed with new
		messages added without requiring that any localization files be updated.
		
		In many cases, there are actually two localization strings per message.
		This happens whenever a message has an embedded value that is dynamically
		generated (using ). Having two strings, preceding and following,
		allows the translator to make idiomatic error messages. When there are
		two strings for a single message, they have numbers like 30a and 30b: 
		translators should check the reference to them in the XSLT above to
		see what the dynamically generated information is.  
	-->
	<xsl:template name="outputLocalizedMessage">
		<xsl:param name="number" />  
		 
		<xsl:choose>
		   <xsl:when test="string-length( $langCode ) = 0 or $langCode = 'default'" > 	   	 
				<xsl:value-of select='document("")//xhtml:p[@id=concat("sch-message-", $number)]/text()' />
			</xsl:when>
			<xsl:otherwise>
				<xsl:variable name="localizationDocumentFilename" >
					<xsl:value-of select="concat('sch-messages-', $langCode, '.xhtml')" />
				</xsl:variable>
				<xsl:variable name="theLocalizedMessage" >
					<xsl:value-of select=
				'document( $localizationDocumentFilename, /)//xhtml:p[@id=concat("sch-message-", $number, "-", $langCode)]/text()' />
				</xsl:variable>
				
				<xsl:choose>
					<!-- if we found any external message with that id, use it -->
					<xsl:when test=" string-length($theLocalizedMessage) > 0">
						<xsl:value-of select="$theLocalizedMessage" />
					</xsl:when>
					<xsl:otherwise>
						<!-- otherwise use the default strings -->		
						<xsl:value-of select='document("")//xhtml:p[@id=concat("sch-message-", $number)]/text()' />
					</xsl:otherwise>
				</xsl:choose>	
				 
			</xsl:otherwise>
		</xsl:choose>
	</xsl:template>
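
For illustration, a localization file for a langCode of "fr" might look something like the sketch below. The French wording is my own rough translation rather than an official one, and the choice of xhtml:div as the root element is an assumption: the lookup only requires xhtml:p elements with the right ids somewhere in the file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- sch-messages-fr.xhtml: hypothetical French localization file.
     Ids follow the sch-message-$number-$langCode pattern. -->
<xhtml:div class="ErrorMessages"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <!-- Message 1 is deliberately left out: the default string is used -->
  <xhtml:p id="sch-message-2-fr">Erreur de schéma : dans l'attribut queryBinding, utilisez 'xslt'</xhtml:p>
  <xhtml:p id="sch-message-4a-fr">Erreur de phase : aucune phase n'a été définie avec le nom </xhtml:p>
  <xhtml:p id="sch-message-4b-fr"/>
</xhtml:div>
```

The file would sit next to the stylesheet and be selected by setting the langCode parameter to fr, for example with the --stringparam option of xsltproc. Note that the template treats an empty localized string the same as a missing one, so an empty translation falls back to the default text.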

If anyone wants to contribute a localized version of the error messages in their own language, that would be great. Go to the Schematron.com website (in a few days' time), download the sch-messages-en.xhtml file, translate it, and send it back to me.


2 Comments

The above solution only seems to take into account Schematron's built-in error messages. For our application, a code generator, this is not important (the Schematron files are statically known to be correct). It's the assertion messages themselves that need to be localized to a multitude of different languages. Because of the hassle, our code generator currently only presents error messages in English. For the future, we're looking at replacing the plain-English assertion messages in the Schematron files with textual identifiers that we can use to look up the localized error messages.

What I would like to see is a way to associate resource bundles with Schematron files -- one for English, one for French, etc. This is the approach taken by Java.

There is an annex about this in the standard (see www.schematron.com/spec.html). You use the diagnostics elements with different xml:lang attributes, and select the text you need from the SVRL. In the upcoming revision, it will be easier to bundle diagnostics into separate resource files, and I hope to have the skeleton support for this completed by the end of the week.

The approach I am proposing for better inclusions, by the way, is this:

1) Include can go anywhere
2) If the include points to an element that is the same as the current parent, the contents of that are included (so if you have diagnostics/include that points to a diagnostics, the children of that diagnostics are included.)
3) Otherwise the element pointed to is included directly.
