As part of the current upgrade to the Schematron skeleton code at schematron.com, I have been working on localization. An application is localized when it presents its interface using the language and cultural conventions of the user.
Now, Schematron is already quite localized, because the assertion texts are written by the schema developer, not by the Schematron skeleton programmer. So Schematron starts a big step ahead of grammar-based validators with their hard-coded messages. The messages generated by the underlying XSLT system will be localized (or not) depending on the implementation.
Still, I was surprised to read a review of Schematron and other schema languages that cited the lack of localization as an important reason not to use it, so the next release of the skeleton will have localized messages. Here is the approach I took.
Where previously there was code such as this:
<xsl:template match="iso:diagnostic" mode="check-diagnostics">
  <xsl:if test="not(@id)">
    <xsl:message>Markup Error: no id attribute in &lt;diagnostic&gt;</xsl:message>
  </xsl:if>
</xsl:template>
Now we call a named template to provide the text, so that each message in the code is replaced by a unique number:
<xsl:template match="iso:diagnostic" mode="check-diagnostics">
  <xsl:if test="not(@id)">
    <xsl:message>
      <xsl:call-template name="outputLocalizedMessage">
        <xsl:with-param name="number">9</xsl:with-param>
      </xsl:call-template>
    </xsl:message>
  </xsl:if>
</xsl:template>
We put the message texts, keyed by those numbers, into a fragment of top-level foreign elements. XSLT allows these (as long as they are in a non-null namespace) and ignores them. They are a great way of systematizing complicated stylesheets; you can even make up your own configuration language. You can reference top-level foreign elements within the same stylesheet just by using the document("") function in XPath.
In this case, I just borrow some elements from XHTML for convenience.
<xhtml:div class="ErrorMessages">
<!-- Where the error message contains dynamic information, the message has been split into an "a" and a "b" section.
This has been done even when the English does not require it, in order to accommodate different language grammars
that might position the dynamic information differently.
-->
<xhtml:p id="sch-message-1">Schema error: Schematron elements in old and new namespaces found</xhtml:p>
<xhtml:p id="sch-message-2">Schema error: in the queryBinding attribute, use 'xslt'</xhtml:p>
<xhtml:p id="sch-message-3a">Fail: This implementation of ISO Schematron does not work with schemas using the query language </xhtml:p>
<xhtml:p id="sch-message-3b"/>
<xhtml:p id="sch-message-4a">Phase Error: no phase has been defined with name </xhtml:p>
<xhtml:p id="sch-message-4b" />...
</xhtml:div>
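To make the a/b convention concrete, here is a sketch of how a split message such as number 4 might be emitted, with the dynamic value placed between the two localized strings. The $phase parameter (holding the requested phase name) is an assumption for illustration, and the real skeleton code may be arranged differently.

```xml
<!-- Sketch: assumes a hypothetical stylesheet parameter $phase holding the
     requested phase name, plus the outputLocalizedMessage named template -->
<xsl:message>
  <!-- localized text before the dynamic value (sch-message-4a) -->
  <xsl:call-template name="outputLocalizedMessage">
    <xsl:with-param name="number">4a</xsl:with-param>
  </xsl:call-template>
  <!-- the dynamically generated value: the phase name that was not found -->
  <xsl:value-of select="$phase" />
  <!-- localized text after the dynamic value (sch-message-4b, empty in English) -->
  <xsl:call-template name="outputLocalizedMessage">
    <xsl:with-param name="number">4b</xsl:with-param>
  </xsl:call-template>
</xsl:message>
```

In English, 4a carries the whole sentence and 4b is empty, but a translator is free to put text on both sides of the phase name.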
We introduce a parameter to the stylesheet to say which language is desired:
<xsl:param name="langCode">default</xsl:param>
The outputLocalizedMessage named template is below.
<!-- ===================================================== -->
<!-- Localization                                          -->
<!-- ===================================================== -->
<!--
  All messages generated by the skeleton during processing are localized.
  (This does not apply to the text that comes from Schematron schemas
  themselves, of course. Nor does it apply to messages in metastylesheets.)

  Stylesheets have a parameter $langCode which can be used to select the
  language code (e.g. from the command line).

  The default value of $langCode is "default". When this is used, the
  message text is taken from the default xhtml:p strings embedded in this
  stylesheet. We use XHTML to provide the namespace.

  If $langCode is something else, then the XSLT engine will try to find a
  file called sch-messages-$langCode.xhtml in the same directory as this
  stylesheet. Expect a fatal error if the file does not exist. The file
  should contain XHTML elements, with the text translated. The strings are
  located by using ids on each xhtml:p element. The ids are formed as
  sch-message-$number-$langCode, such as sch-message-1-en.

  If there is no match in a localization file for a message, then the
  default will be used. This allows this XSLT to be developed with new
  messages added without requiring that any localization files be updated.

  In many cases, there are actually two localization strings per message.
  This happens whenever a message has an embedded value that is
  dynamically generated. Having two strings, preceding and following,
  allows the translator to make idiomatic error messages. When there are
  two strings for a single message, they have numbers like 30a and 30b:
  translators should check the references to them elsewhere in the XSLT
  to see what the dynamically generated information is.
-->
<xsl:template name="outputLocalizedMessage">
  <xsl:param name="number" />
  <xsl:choose>
    <xsl:when test="string-length( $langCode ) = 0 or $langCode = 'default'">
      <xsl:value-of select='document("")//xhtml:p[@id=concat("sch-message-", $number)]/text()' />
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="localizationDocumentFilename">
        <xsl:value-of select="concat('sch-messages-', $langCode, '.xhtml')" />
      </xsl:variable>
      <xsl:variable name="theLocalizedMessage">
        <!-- With no second argument, document() resolves the filename relative
             to this stylesheet, as the comment above requires (passing / would
             resolve it relative to the schema being processed instead) -->
        <xsl:value-of select='document( $localizationDocumentFilename )//xhtml:p[@id=concat("sch-message-", $number, "-", $langCode)]/text()' />
      </xsl:variable>
      <xsl:choose>
        <!-- if we found any external message with that id, use it -->
        <xsl:when test="string-length($theLocalizedMessage) > 0">
          <xsl:value-of select="$theLocalizedMessage" />
        </xsl:when>
        <xsl:otherwise>
          <!-- otherwise use the default strings -->
          <xsl:value-of select='document("")//xhtml:p[@id=concat("sch-message-", $number)]/text()' />
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
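A translator's localization file might look something like the following sketch of sch-messages-fr.xhtml, with an illustrative French translation of a few messages. The html/body wrapper is an assumption; the lookup only needs p elements in the XHTML namespace (assuming the stylesheet binds the xhtml prefix to http://www.w3.org/1999/xhtml) whose ids follow the sch-message-$number-$langCode pattern. Message 4 shows why the a/b split matters: French can quote the phase name in the middle of the sentence.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch of sch-messages-fr.xhtml: the html/body wrapper is an
     assumption; only the ids and the XHTML namespace matter to the lookup -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p id="sch-message-1-fr">Erreur de schéma : éléments Schematron trouvés dans l'ancien et le nouvel espace de noms</p>
    <p id="sch-message-2-fr">Erreur de schéma : dans l'attribut queryBinding, utilisez 'xslt'</p>
    <!-- message 4 surrounds the phase name, so French can quote it mid-sentence -->
    <p id="sch-message-4a-fr">Erreur de phase : la phase « </p>
    <p id="sch-message-4b-fr"> » n'a pas été définie</p>
  </body>
</html>
```

To select this file, pass fr as the langCode parameter; with xsltproc, for example, that is `--stringparam langCode fr`. Any message missing from the file falls back to the default English text.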
If you would like to contribute a localized version of the error messages in your own language, that would be great. Go to the Schematron.com website (in a few days' time), download the sch-messages-en.xhtml file, translate it, and send it back to me.







By 

The above solution only seems to take into account Schematron's built-in error messages. For our application, a code generator, this is not important (the Schematron files are statically known to be correct). It's the assertion messages themselves that need to be localized to a multitude of different languages. Because of the hassle, our code generator currently only presents error messages in English. For the future, we're looking at replacing the plain-English assertion messages in the Schematron files with textual identifiers that we can use to look up the localized error messages.
What I would like to see is a way to associate resource bundles with Schematron files -- one for English, one for French, etc. This is the approach taken by Java.
There is an annex about this in the standard (see www.schematron.com/spec.html). You use diagnostic elements with different xml:lang attributes, and select the text you need from the SVRL. In the upcoming revision, it will be easier to bundle diagnostics into separate resource files, and I hope to have the skeleton support for this completed by the end of the week.
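A minimal sketch of that annex approach, using an invented book/title document type: one assertion references a diagnostic per language, each tagged with xml:lang, so a validator that emits SVRL can report both texts and the consumer can pick the one it wants. The element names and ids are hypothetical.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt">
  <pattern>
    <!-- hypothetical document type: every book needs a title -->
    <rule context="book">
      <assert test="title" diagnostics="no-title-en no-title-fr">A book must have a title.</assert>
    </rule>
  </pattern>
  <!-- one diagnostic per language for the same assertion -->
  <diagnostics>
    <diagnostic id="no-title-en" xml:lang="en">The book has no title element.</diagnostic>
    <diagnostic id="no-title-fr" xml:lang="fr">Le livre n'a pas d'élément title.</diagnostic>
  </diagnostics>
</schema>
```

Adding a language then means adding one diagnostic per assertion and listing its id in the diagnostics attribute.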
The approach I am proposing for better inclusions, by the way, is this:
1) An include element can go anywhere.
2) If the include points to an element of the same type as the include's parent, the contents of that element are included (so if you have diagnostics/include that points to a diagnostics, the children of that diagnostics are included).
3) Otherwise, the element pointed to is included directly.
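Under the proposed rule 2, per-language diagnostics could live in separate resource files. In this sketch, a hypothetical file diagnostics-fr.sch has a diagnostics root element containing the French diagnostic elements; because the include's parent is also a diagnostics element, only those children would be pulled in. This relies on the proposal above, not on current processor behaviour.

```xml
<!-- Fragment of a main schema, in the ISO Schematron namespace -->
<diagnostics xmlns="http://purl.oclc.org/dsdl/schematron">
  <diagnostic id="no-title-en" xml:lang="en">The book has no title element.</diagnostic>
  <!-- diagnostics-fr.sch (hypothetical) has a diagnostics root; under proposed
       rule 2 its children, not the root itself, are included here -->
  <include href="diagnostics-fr.sch" />
</diagnostics>
```

Under rule 3, by contrast, an include whose target is a single diagnostic element would simply insert that element at the point of the include.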