New draft of ISO character entities from W3C

By Rick Jelliffe
July 27, 2008

I am pleased to see that the W3C MathML WG has produced a new draft of XML Entity definitions for Characters. These are the latest and greatest mappings from the entity names to Unicode. These are the names you use in HTML or XHTML: when you type &lceil; it should give you the character ⌈
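As a rough illustration of what such a mapping does, here is a minimal sketch using Python's standard library. It assumes Python's built-in `html.entities` tables, which follow HTML's mappings and are not necessarily identical to the new W3C draft; `lceil` is just the example name from the paragraph above.

```python
import html
from html.entities import name2codepoint

# Assumption: Python's built-in tables follow HTML's entity mappings,
# which may differ in places from the new W3C draft.
code_point = name2codepoint["lceil"]   # entity name -> Unicode code point
character = html.unescape("&lceil;")   # entity reference -> character

# Show the name, its code point in U+XXXX form, and the resulting character.
print(f"lceil -> U+{code_point:04X} -> {character}")
```

The same lookup, done by an XML parser against a DTD, is what makes the entity reference turn into the character in a document.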

There are three ways to use these entities:

  • Use a DTD and call in the sets as parameter entities
  • Use XSLT2 and its character map feature
  • Use (draft) ISO DSRL (Martin Bryan's implementation in the Zip file is a front end for XSLT2)
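The first of these, the DTD route, can be sketched roughly as follows. This is a simplified stand-in, not the real thing: instead of calling in a full entity set through a parameter entity, it declares a single entity by hand in the internal subset, and it assumes Python's standard `xml.etree.ElementTree` parser, which expands entities declared in the internal subset.

```python
import xml.etree.ElementTree as ET

# Assumption: one hand-written declaration stands in for a whole entity set.
# A production DTD would normally pull the set in via a parameter entity.
DOCUMENT = """<!DOCTYPE p [
  <!ENTITY lceil "&#x2308;">
]>
<p>&lceil;x</p>"""

# The parser replaces the entity reference with the declared character.
root = ET.fromstring(DOCUMENT)
print(root.text)
```

The parser never sees the name `lceil` as anything special; it only knows what the DTD tells it, which is why the choice of entity set decides which character ends up in the output.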

ISO/IEC JTC1 SC34 (the Document Processing and Description Languages committee) originally defined and owned these sets, which SGML-ers and DTD users may be familiar with through entity sets such as isopub. SGML was designed for the publishing industry, and mathematical typesetting has always had a need for many special characters. The American Mathematical Society was strongly involved and I am glad to see it is continuing its involvement. SC34 handed over maintenance of the entity sets to W3C MathML a year or so ago.

I advise people who are still using my PEN entity set to move over to the W3C mappings for new documents. You can tell if you are using the PEN entities because the files will have a .pen extension. My set was, I think, the first XML mapping from entities to Unicode (though Unicode had already had the ISO sets as an input), and it has not been kept up to date (by me, at least!). There have subsequently been several other mappings in common use, notably those used by HTML and those prepared by John Cowan and by OUP's Sebastian Rahtz.

They all have small differences, so professional publishers with maths probably will not want to change old files to use the new mappings (and risk the wrong symbols appearing). If you are using XML Catalogs, this should all be manageable.

The trouble is that HTML had a couple of dubious mappings, and the MathML group has been very loath to change them just to fit in with high quality publishers' legacy files and existing production systems. However, the current changes seem reasonable: mainly they seem to be moving away from using the CJK versions of characters to the versions in the symbol blocks.

phi continues to be a problem, and will always have variable mappings. If you want a straight phi, use phis.

I hope this set takes us one step closer to having all these entity sets built into XML. This is the kind of thing that computers are good at: turning names into numbers.
