
XSLT: Conversion to XML, HTML, XHTML, RTF

Inside XSLT, S. Holzner

For example, suppose your company's Web site uses Commerce One's XML-based software, which uses the Java Message Service (JMS) to communicate securely over the Internet. Your operation has been so successful that you have just absorbed your competitor. Unfortunately, for their site on the Internet, your former competitor uses another XML-based product, RosettaNet. How do you now convert an xCBL Commerce One purchase order written in XML into a RosettaNet purchase order, also written in XML but in a completely different dialect?

The answer, of course, is to use XSLT. XML-to-XML transformations of this kind are becoming increasingly common. More and more companies use JMS for secure communication over the Internet, and because JMS runs in Java, it makes sense to pair it with Java-based XSLT processors such as Xalan or Saxon.

XSLT is used not just to replace one element with another, but also to completely reorganize the contents of an XML document. For example, you might use XSLT to reorganize planets.xml by planetary density, creating a new XML document:

<?xml version="1.0" encoding="UTF-8"?>

<DATA>
<DENSITY>
<VALUE>.983</VALUE>
<NAME>Mercury</NAME>
<MASS>.0553</MASS>
<DAY>58.65</DAY>
<RADIUS>1516</RADIUS>
</DENSITY>

<DENSITY>
<VALUE>.943</VALUE>
<NAME>Venus</NAME>
<MASS>.815</MASS>
<DAY>116.75</DAY>
<RADIUS>3716</RADIUS>
</DENSITY>

<DENSITY>
<VALUE>1</VALUE>
<NAME>Earth</NAME>
<MASS>1</MASS>
<DAY>1</DAY>
<RADIUS>2107</RADIUS>
</DENSITY>

</DATA>
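A minimal sketch of a style sheet that could produce a document like this one. It assumes that planets.xml has a <PLANETS> root containing <PLANET> elements, and that each <PLANET> has <NAME>, <MASS>, <DAY>, <RADIUS>, and <DENSITY> children holding the values shown; that structure is an assumption inferred from the listings in this chapter, not something stated here.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Assumption: the source has the shape PLANETS/PLANET -->
  <xsl:template match="/PLANETS">
    <DATA>
      <xsl:apply-templates select="PLANET"/>
    </DATA>
  </xsl:template>

  <!-- Regroup each planet's data under a DENSITY element -->
  <xsl:template match="PLANET">
    <DENSITY>
      <VALUE><xsl:value-of select="DENSITY"/></VALUE>
      <NAME><xsl:value-of select="NAME"/></NAME>
      <MASS><xsl:value-of select="MASS"/></MASS>
      <DAY><xsl:value-of select="DAY"/></DAY>
      <RADIUS><xsl:value-of select="RADIUS"/></RADIUS>
    </DENSITY>
  </xsl:template>
</xsl:stylesheet>
```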

We'll also look at a transformation that completely changes the contents of planets.xml, leaving only a small amount of HTML and JavaScript that displays a few buttons in the browser.

So far, we've created new elements only by using literal result elements, that is, by writing the new elements directly into the style sheet. But, as we'll see in this chapter, you don't always know in advance the names of the elements you need to create. You could assemble such elements on the fly by emitting them as raw text, but that is a clear flaw, because the markup is then treated as text rather than as markup. In this chapter, we'll start using the XSLT elements <xsl:element>, <xsl:attribute>, <xsl:processing-instruction>, and <xsl:comment> to create new elements, attributes, processing instructions, and comments at run time. A good knowledge of these elements is essential when reorganizing XML content.
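As a rough illustration of these four elements, the sketch below names each new element after a planet at run time. It assumes the PLANETS/PLANET/NAME/MASS structure used in this chapter's listings; the file name planets.css and the attribute name mass are hypothetical, chosen only for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/PLANETS">
    <!-- Processing instruction; planets.css is a hypothetical file name -->
    <xsl:processing-instruction name="xml-stylesheet">type="text/css" href="planets.css"</xsl:processing-instruction>
    <!-- Comment node written to the result document -->
    <xsl:comment> Generated from planets.xml </xsl:comment>
    <PLANETS>
      <xsl:apply-templates select="PLANET"/>
    </PLANETS>
  </xsl:template>

  <xsl:template match="PLANET">
    <!-- The element name is computed at run time from the NAME child -->
    <xsl:element name="{NAME}">
      <!-- "mass" is a hypothetical attribute name -->
      <xsl:attribute name="mass">
        <xsl:value-of select="MASS"/>
      </xsl:attribute>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

The curly braces in name="{NAME}" form an attribute value template, so the element name comes from the source document instead of being fixed in the style sheet.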

We'll also look at using XSLT modes to perform multiple transformations on a document, and at how to apply only one of several templates that match.
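A minimal sketch of modes, assuming the same PLANETS/PLANET/NAME/MASS structure. The mode names toc and detail are hypothetical. The same <PLANET> elements are processed twice, once per mode, and each pass selects a different template:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/PLANETS">
    <HTML>
      <BODY>
        <!-- First pass: a list of planet names -->
        <UL>
          <xsl:apply-templates select="PLANET" mode="toc"/>
        </UL>
        <!-- Second pass over the same nodes: one paragraph per planet -->
        <xsl:apply-templates select="PLANET" mode="detail"/>
      </BODY>
    </HTML>
  </xsl:template>

  <xsl:template match="PLANET" mode="toc">
    <LI><xsl:value-of select="NAME"/></LI>
  </xsl:template>

  <xsl:template match="PLANET" mode="detail">
    <P>
      <xsl:value-of select="NAME"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="MASS"/>
    </P>
  </xsl:template>
</xsl:stylesheet>
```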

Most of this chapter explores the capabilities of the <xsl:output> element, so I will begin with a brief overview of it.

The <xsl:output> element

We first encountered the <xsl:output> element in Chapter 2, where we used it mainly to specify the type of the resulting document. This element determines, for example, whether the XSLT processor writes an XML declaration, <?xml version="1.0"?>, at the beginning of the document, and it can specify the MIME type (such as "text/xml" or "text/html") of documents that the XSLT processor sends from a web server to a browser. In addition, if you set the output type to HTML, most XSLT processors recognize that not all HTML elements require closing or opening tags, and so on.

The following list describes the attributes of <xsl:output>:

  • cdata-section-elements (optional). Specifies the names of the elements whose text content should be output as CDATA sections. Set to a whitespace-separated list of QNames;

  • doctype-public (optional). Specifies the public identifier to use in the <!DOCTYPE> declaration in the output. Set to a string value;

  • doctype-system (optional). Specifies the system identifier to use in the <!DOCTYPE> declaration in the output. Set to a string value;

  • encoding (optional). Specifies the character encoding. Set to a string value;

  • indent (optional). Determines whether the output document is indented to reflect its nesting structure. Set to "yes" or "no";

  • media-type (optional). Specifies the MIME type of the output. Set to a string value;

  • method (optional). Specifies the output format. Set to "xml", "html", "text", or a valid QName;

  • omit-xml-declaration (optional). Specifies whether the XML declaration will be omitted from the output. Set to "yes" or "no";

  • standalone (optional). Determines whether a standalone declaration will be included in the XML declaration in the output and, if so, sets its value. Set to "yes" or "no";

  • version (optional). Specifies the output version. Set to a valid NMTOKEN value.
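To show how several of these attributes fit together, here is a minimal sketch. The element names DOC and NOTE are hypothetical. With cdata-section-elements="NOTE", a conforming processor should write the text content of <NOTE> as a CDATA section instead of escaping the < character:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- DOC and NOTE are hypothetical element names used for illustration -->
  <xsl:output method="xml"
      version="1.0"
      encoding="UTF-8"
      indent="yes"
      omit-xml-declaration="no"
      media-type="text/xml"
      cdata-section-elements="NOTE"/>

  <xsl:template match="/">
    <DOC>
      <!-- Text of NOTE is emitted inside a CDATA section -->
      <NOTE>if (x &lt; y)</NOTE>
    </DOC>
  </xsl:template>
</xsl:stylesheet>
```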

The method attribute is the one most commonly used, because it determines the type of output that is produced from the result tree. Officially, the default output method is HTML, provided that all three of the following conditions are true:

  • The root node of the resulting tree has a child element.

  • The name of the document element of the result tree has the local part "html" (in any combination of uppercase and lowercase) and an empty namespace URI.

  • All text nodes before the first child element of the root node contain only whitespace characters.

If all three of these conditions are true, the default output method is set to HTML. Otherwise, the default output method is XML.
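A minimal sketch of a style sheet that relies on this default. It contains no <xsl:output> element, and its result tree begins with an <html> element in no namespace, so a conforming XSLT 1.0 processor should select the HTML output method on its own (and, for example, write the empty <br/> element without a closing tag):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No <xsl:output> element: the output method is left to the default rule -->
  <xsl:template match="/">
    <html>
      <body>
        <p>First line<br/>Second line</p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```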

However, do not rely on the default output method; it is better to set this attribute explicitly. The three common values of the method attribute are "html", "xml", and "text", and we'll look at them in the following sections.

Output method: HTML

For the HTML output method, the XSLT processor must take certain actions. For example, with this method the version attribute specifies the version of HTML; the default value is 4.0.

This method should not output a closing tag for empty elements. (For HTML 4.0, the empty elements are <AREA>, <BASE>, <BASEFONT>, <BR>, <COL>, <FRAME>, <HR>, <IMG>, <INPUT>, <ISINDEX>, <LINK>, <META>, and <PARAM>.) The HTML output method must recognize the names of HTML elements regardless of case.

According to the W3C, the HTML output method should not escape the contents of <SCRIPT> or <STYLE> elements. For example, the following literal result element:

<SCRIPT>
if (x &lt; y)
</SCRIPT>

or the following, using a CDATA section:

<SCRIPT>
<![CDATA[if (x < y) ]]>
</SCRIPT>

should be output as:

<SCRIPT>
if (x < y)
</SCRIPT>

The HTML output method should also not escape < characters that occur in attribute values.

When the output method is set to HTML, the processor can take the indent attribute into account. If this attribute is set to "yes", the XSLT processor can add (or remove) whitespace to indent the resulting document, as long as this does not affect how the browser displays the document. For the HTML output method, the default value is "yes".

As you might expect, the HTML output method terminates processing instructions with > rather than ?>, and it also supports minimized attributes, just as HTML does. For example, the tag

<TD NOWRAP="NOWRAP">

will be output as:

<TD NOWRAP>

For this method, you can set the media-type attribute, whose default value is "text/html". The HTML method should not escape an & character that appears in an attribute value if it is immediately followed by an opening curly brace ({). The encoding attribute specifies the encoding to use. If a <HEAD> element is present, this output method should add a <META> element immediately after the <HEAD> tag, defining the character encoding:

<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
.
.
.
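The rule about an & followed by a curly brace can be sketched as follows, based on the example in the XSLT 1.0 Recommendation (randomrbg is simply the macro name used there). In the style sheet, the ampersand must be written as &amp; and the braces doubled, because single braces in a literal result element's attribute would otherwise be read as an attribute value template. With the HTML output method, the attribute should then be written as BGCOLOR="&{randomrbg};" with the & left unescaped:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="utf-8"/>

  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>Script macro example</TITLE>
      </HEAD>
      <!-- &amp;{{ ... }} in the style sheet stands for &{ ... } in the result -->
      <BODY BGCOLOR="&amp;{{randomrbg}};">
        <P>Hello</P>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>
```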

Using the doctype-public or doctype-system attributes, you can output a document type declaration just before the first element, as we'll see when converting XML to XHTML.
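As a preview, here is a minimal sketch of how these two attributes might be used to request an XHTML 1.0 Transitional document type declaration. It assumes the PLANETS/PLANET/NAME structure used in this chapter's listings, and the page content itself is only a placeholder:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- XHTML is XML, so the xml output method is used here -->
  <xsl:output method="xml"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
      indent="yes"/>

  <xsl:template match="/PLANETS">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Planets</title>
      </head>
      <body>
        <!-- One paragraph per planet name -->
        <xsl:for-each select="PLANET">
          <p><xsl:value-of select="NAME"/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```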

These are the rules for HTML output. The following is an example of an XML-to-HTML conversion with a twist. In this case, the style sheet will actually generate JavaScript code from XML, demonstrating how to create JavaScript using XSLT. Specifically, we'll read planets.xml and create a new HTML document displaying three buttons, one for each of the three planets in planets.xml. When you click one of the buttons on the page, the mass of the corresponding planet will be displayed.

All we need (Listing 1) are two <xsl:for-each> elements: one to loop through the three planets and create the HTML for each button, and one to loop through the planets and create the JavaScript for each function. I'll use the names of the planets as the names of the JavaScript functions; when called, each function will display the mass of the corresponding planet. Note that to generate the JavaScript code, you only need the <xsl:value-of> element to get the names and masses of the planets. I'll also use two new XSLT elements, <xsl:element> and <xsl:attribute-set>, which we'll look at later in this chapter, to create a new element and define a set of attributes for it.

Listing 1. Converting to JavaScript

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>

<xsl:template match="/PLANETS">
<HTML>

<HEAD>
<TITLE>
The Mass Page
</TITLE>

<SCRIPT LANGUAGE='javascript'>

<xsl:for-each select="PLANET">
<xsl:text>
function </xsl:text><xsl:value-of select="NAME"/><xsl:text>()
{
display.innerHTML = 'The mass of </xsl:text>
<xsl:value-of select="NAME"/>
<xsl:text> equals </xsl:text>
<xsl:value-of select="MASS"/>
<xsl:text> Earth masses.'</xsl:text>
}
</xsl:for-each>
</SCRIPT>
</HEAD>

<BODY>
<CENTER>
<H1>The Mass Page</H1>
</CENTER>
<xsl:for-each select="PLANET">
<P/>
<xsl:element name="input" use-attribute-sets="attribs"/>
</xsl:for-each>
<P/>
<P/>
<DIV ID='display'></DIV>
</BODY>

</HTML>
</xsl:template>

<xsl:attribute-set name="attribs">
<xsl:attribute name="type">BUTTON</xsl:attribute>
<xsl:attribute name="value"><xsl:value-of select="NAME"/></xsl:attribute>
<xsl:attribute name="onclick"><xsl:value-of select="NAME"/>()</xsl:attribute>
</xsl:attribute-set>

</xsl:stylesheet>

The result, including the <SCRIPT> element that holds the new JavaScript code, appears in Listing 2.

Listing 2. Resulting JavaScript document

<HTML>
<HEAD>
<TITLE>
The Mass Page
</TITLE>

<SCRIPT LANGUAGE="javascript">
function Mercury()
{
display.innerHTML =
'The mass of Mercury equals .0553 Earth masses.'
}

function Venus()
{
display.innerHTML = 'The mass of Venus equals .815 Earth masses.'
}
function Earth()
{
display.innerHTML = 'The mass of Earth equals 1 Earth masses.'
}
</SCRIPT>
</HEAD>
<BODY>
<CENTER>
<H1>The Mass Page</H1>
</CENTER>

<P></P>
<input type="BUTTON" value="Mercury" onclick="Mercury()">
<P></P>
<input type="BUTTON" value="Venus" onclick="Venus()">
<P></P>
<input type="BUTTON" value="Earth" onclick="Earth()">
<P></P>
<P></P>
<DIV ID="display"></DIV>
</BODY>
</HTML>

As you can see, I used XSLT to loop through the planets and write the JavaScript code for each one. When you click a button, the mass of the corresponding planet is displayed.