Why XSL-T support in the browser is a failure

By Bryan Rasmussen
June 22, 2008 | Comments: 12

Julian Reschke sent out a mail to the XSL-List going over current evaluations of XSL-T in the browser.


His argument is that XSL-T in the browser is nearly doable at this point; that is to say, you serve your data as XML with a processing instruction that tells the browser which XSL-T stylesheet to use to transform it. I think it won't get done by anyone, ever, other than as an interesting example.
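To make the setup concrete, here is roughly what is being discussed (the file and element names below are made up for illustration): the XML document carries an xml-stylesheet processing instruction, and the browser fetches the referenced stylesheet and applies it before rendering anything.

<?xml version="1.0"?>
<?xml-stylesheet href="article.xsl" type="text/xsl"?>
<!-- illustrative document; element names are invented -->
<article>
  <title>An example document</title>
  <para>Content the stylesheet will turn into HTML.</para>
</article>

<!-- article.xsl: a minimal stylesheet the browser could apply -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/article">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <xsl:for-each select="para">
          <p><xsl:value-of select="."/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>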

Why do I say this? Well, there are lots of reasons not to do client-side transformations; off the top of my head I can think of the following:


  1. You can't control the transformation engine; you have to use whatever implementation the particular browser ships with. This means you will need to accept limitations in the implementation of extension functions, although for these kinds of problems there are workarounds such as David Carlisle's exsl:node-set hack (see the sketch right after this list).


  2. You can't add processing power to the client to handle complicated transformations the way you can on the server.

  3. For security reasons, access to the document() function may be shut off in the browser, whereas in your own server-side environment this is under your control.
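To be concrete about the kind of workaround meant in point 1 (this is a generic sketch of the function-available() guard technique, not necessarily the exact hack linked above; the variable contents are invented): the stylesheet declares both the EXSLT and the Microsoft namespaces and calls whichever node-set() function the browser's engine actually provides.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt"
    exclude-result-prefixes="exsl msxsl">

  <xsl:template match="/">
    <!-- a result tree fragment we want to treat as a node-set -->
    <xsl:variable name="rtf">
      <item>one</item>
      <item>two</item>
    </xsl:variable>
    <xsl:choose>
      <!-- engines that implement the EXSLT common module -->
      <xsl:when test="function-available('exsl:node-set')">
        <xsl:value-of select="count(exsl:node-set($rtf)/item)"/>
      </xsl:when>
      <!-- MSXML (Internet Explorer) uses its own namespace -->
      <xsl:when test="function-available('msxsl:node-set')">
        <xsl:value-of select="count(msxsl:node-set($rtf)/item)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>no node-set() extension available</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>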



However, these are technical reasons why XSL-T transformations in the browser, courtesy of the processing instruction, are still problematic, and, like the node-set issue mentioned above, technical issues can be solved within limits (though Firefox 2 raising an error on the node-set workaround discussed above is likely to limit acceptance of that method). But there is a more damning limitation that means we will not have XSL-T transformations via processing instructions anytime soon: doing it would mean cutting yourself off from meaningful indexing by Google.


Of course Google will index unknown XML formats as text, as you can see from this search, which is an attempt at crafting a query that will return one of Julian's example files.
But they will not, anytime soon, index the output of a transformation, which is what would need to happen for transformation inside the browser to make sense.

As a demonstration of this, examine the XML format used in Julian's examples, a format for representing RFCs; here's a sample (careful: the embedded stylesheet causes Firefox to freeze for about five seconds while it is processing). In it you will see a lot of xref elements, like so:
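Something along these lines (the target value here is purely illustrative):

<xref target="RFC2616" /> <!-- target value illustrative -->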

What's that? It's an element that gets evaluated and turned into a hyperlink to the appropriate RFC in the output. Now, Google determines the worth of documents by who links to them, but also partly by who they link to, and in this document it can't see that the document links to a lot of other documents that are RFCs. That is problematic for determining the ranking of results in a query for RFC 2160. The only way for Google to handle this is, of course, to load the stylesheet, do the transformation, and then index the output. But if Google does that, it opens itself up to various security hazards, not to mention more processing power spent, and so on.
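Inside a stylesheet, a template roughly like the following (a simplified sketch, not the actual rfc2629 XSLT) is what produces that link, which is exactly why the link is invisible to anything that doesn't run the transformation:

<!-- simplified sketch; the real rfc2629 stylesheets do considerably more -->
<xsl:template match="xref[starts-with(@target, 'RFC')]">
  <a href="http://www.ietf.org/rfc/{translate(@target, 'RFC', 'rfc')}.txt">
    <xsl:value-of select="@target"/>
  </a>
</xsl:template>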

Not that it can't be done; for an organization as rich in resources as Google, I suppose it would be relatively simple. So why not do it? Because Google needs to minimize the number of formats they MUST support, especially dynamic formats where the content needs to be interpreted in order to be indexed. Think of how long it took them to make some inroads on indexing Flash (as an example of Flash indexing, try this). The easiest way to avoid the need to index XML with XSL-T is to not do it until the practice becomes really widespread. Given the other benefits that server-side transformation offers, I can't see client-side transformation ever becoming anything but a curiosity, given the need to cut oneself out of the cycle of added value Google performs on all the data it indexes.


12 Comments

XSLT in the browser is a very viable way to reduce the amount of information exchanged between it and the server; in some cases, I've been able to reduce the amount of information by an order of magnitude. Of course you have to be willing to handle client- AND server-side translations, see http://www.informit.com/articles/article.aspx?p=677911 ... and it's not just a nice article, I have a working web site using this technology (and thousands of users amazed by its speed :)

Hi Ivan,

I agree that theoretically you can construct a site where you will be able to serve a lot of users using this technology, because of the ability to cut down on server-side work. However, the solution of serving a single-element XML document and using a specialized XSL-T stylesheet, as in your code here:

<?xml version="1.0" ?>
<?xml-stylesheet href="browserCheck.xsl" type="text/xsl"?>
<root />


is not what is usually meant by a use of the technology; it's more of a hack, a clever hack by which you get around the performance problems mentioned earlier but sacrifice the holy grail people are always talking up when discussing XML in the browser: that the markup should be meaningful.

And from what I've seen by skimming through your article (will have to look deeper later, but currently behind on my own XForms article) it looks like the Google objection is even worse in this situation. How do you handle the need to provide indexable versions of the site? Server side redirection based on User Agent?

I absolutely hate doing that if I can at all avoid it. :)

OK, I went and looked through the code a bit more; basically you're serving XML in the browser only to IE. I wonder about the maintainability of that, but I guess if you're not having any problems currently, then more power to you.

Cheers,
Bryan

@Bryan,

I couldn't disagree with you more on each and every one of your points. I have an entire platform built on top of client-side XSLT and it works flawlessly on every major browser. And your comment regarding Google is only true when you attempt to use a 100% custom XML language. When you use Atom feeds or XHTML or other web standards, Google is able to search the content just fine.

Sure, Google can index Atom or XHTML fine. That's because they have implemented indexing of those specific formats. That doesn't scale, by which I mean they are not going to implement indexing for all the formats that are out there. So you're right: it's a problem if you use a custom language; it's also a problem if you use a standard language that isn't one of the standards that Google indexes with the meaning intact. Which is pretty much all of them.

Have they really implemented XHTML indexing, or have they just ported their HTML indexing? I actually suppose the latter. As for Atom and various versions of RSS indexing: sure, they implemented them, but I interpret that as following from my earlier statement:

Google needs to minimize the number of formats they MUST support, especially dynamic formats where the content needs to be interpreted in order to be indexed.

The great thing about the newsfeed standards, for Google, is that the documents do not have content that gets changed by running the document in the browser. If you suppose a document fragment like the following:


<head>
  <script src="myAjaxScriptCreatesAllContent"></script>
</head>
<body>
  <div class="contentholder"></div>
</body>

or like this

<?xml version="1.0" ?>
<?xml-stylesheet href="some.xsl" type="text/xsl"?>
<contentholder />

then the main problem is that, in order to truly index the content in those documents, you need to interpret them.

Now think of the level of breakthrough the various newsfeed standards had to achieve to actually be indexed in a meaningful manner (except for Atom, which seems to be indexed because Google and everyone else has decided that it is the format of the future).

So I will accept that the newsfeed standards in the browser are a success, but not 'XML in the browser' as it is generally understood (or perhaps just as I've generally understood it to be generally understood, to go all meta here); at any rate, not the way Julian seemed to expect it would be used, with an example XML format that is what you would probably call a 100% custom one.

Anyway, where you say you disagree with me on all my points: you certainly don't disagree that by using a client-side transformation you are limited to the transformation engines in the particular clients (funny JavaScript-based implementations and possible Flash ones aside); you just disagree that this is any sort of problem.


Bryan,

The thing most likely to ease the issue of indexing 'custom' XML will be the use of RDFa. If people publishing XML content on the web wish to have their content indexed, then it is not the format that is important so much as the format of the metadata. In addition, upcoming standards like the WAI's ARIA roles not only act as landmarks for accessibility tools but can also be useful as transformation landmarks, or, in this case, indexing landmarks that help indexing tools identify where the links and the main content are.
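For instance, even a largely custom page could carry its metadata in RDFa attributes that an indexer can read without running any transformation (a rough sketch; the about URL and text are made up, and the Dublin Core terms are just one possible vocabulary):

<div xmlns:dcterms="http://purl.org/dc/terms/"
     about="http://example.org/notes/rfc-reading-list">
  <!-- illustrative markup: the URL above and the text below are invented -->
  <h2 property="dcterms:title">An RFC reading list</h2>
  <p>Start with <a rel="dcterms:references"
       href="http://www.ietf.org/rfc/rfc2616.txt">RFC 2616</a>.</p>
</div>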

As a lover of XSLT, you'll be hard-pressed to dent my enthusiasm for client-side XSLT, because its use goes beyond simple document transformation and will, in my opinion, become increasingly viable as a general client-side programming language.

Hi Phillip,

That's an interesting point about easing the indexing of custom XML. I've been thinking the same thing would apply if simple XLinks were used for defining linking, but it seems hardly anyone ever considers this when designing their custom formats.
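For example, a custom format could mark its links with the standard XLink attributes, so a generic tool at least knows where the links are (a minimal sketch; the element name is invented, the xlink:* attributes are the standard ones):

<reference xmlns:xlink="http://www.w3.org/1999/xlink"
           xlink:type="simple"
           xlink:href="http://www.ietf.org/rfc/rfc2616.txt">RFC 2616</reference>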

@Bryan,

>> So you're right: it's a problem if you use a custom language; it's also a problem if you use a standard language that isn't one of the standards that Google indexes with the meaning intact. Which is pretty much all of them.

So your argument is what? That we shouldn't have to be tethered by Google's "interpretation" of what is and is not a standard document format? Do you realize how silly that sounds?

Open your eyes and your brain, Bryan. The power that is client-side XSLT *IS NOT* the ability to transform any given XML syntax however you damn well please. Instead, it's the ability to utilize the strength and power of client-side content processing such that the server can focus on delivering raw *standards based* content that can then be transformed into whatever the end user wants it to be. You do recognize the fact that an *XML* parser is faster than an SGML parser by several magnitudes, right? And you do realize that offloading the cost of transforming raw data into HTML onto the client means the server is better able to handle more requests, right? Greater efficiency. A focus towards standards. The ability for each visitor to customize their viewing experience in line with their personal taste w/o causing the entire system to suffer as a result. And your argument is? That each XSLT processor has its quirks?

Dear God, Bryan: Seriously. Of all the client-side technologies available to the developer, you believe it's the various XSLT processors that present the biggest problems as far as standards compatibility? Look at libraries like Dojo or jQuery, both built by communities who have made it their life's passion to create a standard library that works across all four major browsers w/o modification. They spend each second of each hour of each day fighting against the differences in each browser, and you're attempting to argue that the real browser incompatibility problem is client-side XSLT implementations.

Sorry Bryan: You have no clue what you're talking about.

"So your argument is what? That we shouldn't have to be tethered by Googles "interpretation" of what is and is not a standard document format? Do you realize how silly that sounds?"

I'm sorry, but it's not my argument that we should have to be; it's that we currently are (by 'we' I mean the internet as a whole). It's been my experience that people want things indexed on Google and to score high. Is that the only thing that factors into consideration? No. But it's a big one, and considering all the other things that go against XSL-T in the browser, it's a big problem.

"The power that is client-side XSLT *IS NOT* the ability to transform any given XML syntax however you damn well please."

OK, however, the example given was a custom format being transformed any way you damn well please, and I recall from the early days that the promise of XSL-T in the browser was that we could have any given XML syntax transformed however we damn well pleased. Perhaps that was just implied by all those tutorials and examples with custom, non-standardized XML in the browser and XSL-T transforming it; if the implication should not have been made, then I guess that was a mistake in marketing.

" Instead, it's the ability to utilize the strength and power of client-side content processing such that the server can focus on delivering raw *standards based* content that can then be transformed into whatever the end user wants it to be."

OK, actually I think you're probably right that delivering a standardized format like Atom with XSL-T applied to it has some benefit.

"You do recognize the fact that an *XML* parser is faster than an SGML parser by several magnitudes, right?" And you do realize that offloading the cost of transforming raw data into HTML onto the client means the server is better able to handle more requests, right?"

Sure. However, I also realize that I can't improve the performance of the client machine; I can improve performance ON the client machine but not the performance OF the client machine.

"The ability for each visitor to customize their viewing experience inline with their personal taste w/o causing the entire system to suffer as a result. And your argument is? That each XSLT processor has its quirks?"

I don't think that sounds like a very fair generalization of what I said, but I think you're probably upset right now.

I understand why you would be upset, because you like XSL-T an awful lot; so do I, and maybe you think I'm talking it down. I don't think I am.

But sometimes the environments in which technologies exist are such that certain aspects of those technologies will not have the success we would want for them. I would like XSL-T in the browser to have that success (by XSL-T in the browser I mean specifically the transformation, via the processing instruction, of whatever XML is served).

In fact, your arguments earlier have convinced me that I should probably set up some feeds using this (I was thinking of some ways I could serve Atom feeds with XSL-T from AppEngine last night), but I'm doing it because I don't really care about providing anything that people (meaning, theoretically, a large number of people) will use with the technology.

Maybe I'm wrong; maybe XSL-T used in this way will have success beyond just a few folks here and there using it because they love XSL-T. I tend to be the overenthusiastic one, but I'm pessimistic on this. Maybe we should make a long bet. If I lose, I'd be pretty happy to do so.

"Dear God, Bryan: Seriously. Of all the client-side technologies available to the developer, you believe it's the various XSLT processors that present the biggest problems as far as standards compatibility? Look at libraries like Dojo or jQuery, both built by communities who have made it their life's passion to create a standard library that works across all four major browser w/o modification. They spend each second of each hour of each day fighting against the differences in each browser, and your attempting to argue that the real browser incompatibility problem is client-side XSLT implementations."

No. What I mean is that if I already have a large number of browser incompatibilities to deal with, I'm not going to want to take on handling even more of them, especially not for a technology that has the other problems mentioned (not being indexed, which is a big problem for most people). I realize that XSL-T's browser incompatibilities pale in comparison to other technologies', but they are hardly insignificant.

Anyway, maybe I've been rude in the way I've addressed the subject, or haven't phrased myself well, because I think you're way more worked up on the issue than I would expect anyone to get; even granting that you have invested time and effort in making XSL-T in the browser a working reality, it still seems like you got really, really worked up over this. I certainly appreciate everything you have done for XSL-T, so I apologize.


By the way, I don't know why none of my HTML tags, for styling, were coming through.

Sorry about the nearly illegible nature of comments on this page. It makes preview bad too.

The biggest problem I've come across with trying to make applications transform XSL client-side is that Opera has a major bug whereby, if you go to an XML page with JavaScript turned off, it'll fail processing when it hits a script tag.

To me, this is a big problem, as a good XSLT application probably uses Ajax, on-the-fly transformations, and at least some form of JavaScript. All the browsers I've tested a basic application on seem to do a pretty good job (even with on-the-fly transformations, even if you have to fall back to using ajaxslt for some browsers).

Plus, of course, it's hard to detect the browser without using JavaScript; the only method I can think of is to detect whether they're on Opera using PHP or something (ropey!), then serve XSLT without any reference to any JavaScript.

It's a real pain, very frustrating. A part of me says that the 0.0001% of people using Opera with JS disabled aren't an issue, but it means that all they get is an "XSLT Processing error!" screen.

Almost all drawbacks can be solved by having an application support both client-side and server-side XSLT. You can either white-list supported user agents or black-list unsupported ones. I prefer the former, and 98.8% of requests that result in HTML output are handled with client-side XSLT. The remaining requests, e.g. from IE 5.5 and lower, user agents with significant bugs in their XSLT processors, search engine crawlers, etc., are served run-of-the-mill HTML.

The only issue I see is incorrect user agent detection by means of proxying, user agent spoofing etc. However, this issue is negligible in my opinion when white-listing of supported user agents is employed.

Regarding processing time on the client, this can mostly be overcome by writing efficient XSL and well-canonicalised XML. Any response that would require significant processing time on the client can, again, be handled by doing the transformation on the server side. It is worth noting that almost all of my use cases show a significant improvement in total time between request and a rendered page, even on mobile phones and other low-performance clients.
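As one concrete example of what "efficient XSL" can mean in practice (a generic sketch; the element names are invented): declaring a key lets the processor look up cross-references directly instead of rescanning the whole document for each one.

<!-- invented element names; instead of select="//section[@id = current()/@ref]" in every template ... -->
<xsl:key name="section-by-id" match="section" use="@id"/>

<xsl:template match="link">
  <a href="#{@ref}">
    <xsl:value-of select="key('section-by-id', @ref)/title"/>
  </a>
</xsl:template>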

Mike Franklin's point is thus moot. As you said, the proportion of users in that case is negligible, and you would serve XHTML to these users anyway (by performing the XSLT on the server after detecting the user agent).

FTA, "0f course Google will index unknown XML formats as text [...]"

A candidate for server side XSLT. The second half of the article is therefore irrelevant.
