By Bryan Rasmussen
June 22, 2008

In an earlier post I argued that I did not foresee XSL-T in the browser via processing instruction ever working, because of how difficult it is for Google to index.

The main problem is always going to be that, to work out which parts of a document are links, you need to do one of the following:

  1. Look for absolute URIs to tell you what is being linked to (naively assuming that every URI is a link rather than, say, an identifier)
  2. Refer to some external interpretable format that describes the document's semantics, so that you can determine exactly which parts are links
  3. Make your own interpreter for the specific format
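
The first strategy can be sketched in a few lines of Python. This is only an illustration of the naive approach, not how any real indexer works; the helper names looks_like_absolute_uri and candidate_links are made up for this example, and "absolute URI" is crudely taken to mean "has a scheme and a host".

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def looks_like_absolute_uri(value):
    """Naive check (an assumption): a scheme plus a network location."""
    value = value.strip()
    # Anything empty or containing whitespace is treated as ordinary text.
    if not value or any(ch.isspace() for ch in value):
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        return False
    return bool(parts.scheme and parts.netloc)


def candidate_links(xml_text):
    """Collect attribute values and element text that look like absolute URIs."""
    found = []
    for element in ET.fromstring(xml_text).iter():
        values = list(element.attrib.values())
        if element.text:
            values.append(element.text)
        for value in values:
            if looks_like_absolute_uri(value):
                found.append(value.strip())
    return found
```

Run against a document whose root carries an attribute such as id="http://example.org/id/42", this reports the identifier as a link alongside the real ones, which is exactly the weakness of assuming that every URI is a link.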

Using an XSL-T stylesheet to transform the XML into (X)HTML that is then indexed is an example of the second strategy. That strategy could also be implemented with a much less powerful language than XSL-T, however: XML Schema could be used to identify nodes in the document that hold only anyURI values. One drawback is that typing something as anyURI does not mean it will be used as a link. The other is that there are a lot of schemas out there where link elements that should have been declared as anyURI are still declared as string.
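
As a rough sketch of the XML Schema variant, the following Python reads a schema, collects the names of elements and attributes declared with type anyURI, and pulls the matching values out of an instance document. It makes several simplifying assumptions that a real implementation could not: it matches on local names only, ignores namespaces and declaration scope, does not follow named or derived types, and treats any type whose local part is "anyURI" as the XML Schema one. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def local_name(qname):
    """Strip a '{namespace}' or 'prefix:' qualifier from a name."""
    return qname.rsplit("}", 1)[-1].rsplit(":", 1)[-1]


def any_uri_names(schema_text):
    """Names of elements and attributes declared with type anyURI."""
    declarations = (f"{{{XSD_NS}}}element", f"{{{XSD_NS}}}attribute")
    names = set()
    for decl in ET.fromstring(schema_text).iter():
        if decl.tag not in declarations:
            continue
        # Assumption: the local part of the type QName is enough to identify it.
        if decl.get("name") and local_name(decl.get("type", "")) == "anyURI":
            names.add(decl.get("name"))
    return names


def uri_typed_values(schema_text, instance_text):
    """Values in the instance whose element or attribute name is anyURI-typed."""
    names = any_uri_names(schema_text)
    values = []
    for element in ET.fromstring(instance_text).iter():
        for attr, value in element.attrib.items():
            if local_name(attr) in names:
                values.append(value.strip())
        if local_name(element.tag) in names and element.text:
            values.append(element.text.strip())
    return values
```

Both drawbacks show up immediately with such a sketch: an id attribute typed as anyURI is returned even though nobody would follow it as a link, and an element that holds a perfectly good link but is declared as string is never found.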

