XSLT replace for character entities

atulshin · November 1st, 2008, 02:09 AM

Hi all,

I have an XML file that contains HTML tags, and when I generate a PDF file the HTML tags are shown literally, e.g. the <br /> tag is output as it is, whereas I want a new line instead of the tag. I checked my XML file, and it contains the "br" tag in the form "&lt;br /&gt;".

My XSL code below works for a simple "br" tag but not for character entities.

      <xsl:template match="br" >
          <fo:block />
      </xsl:template>

How can I interpret character entities as HTML tags in XSLT?
I tried string replace but I think something else will work here.

shinyboy

mhkay · November 1st, 2008, 05:19 AM

>it contained "br" tag in the form "&lt;br /&gt;".

No, it didn't contain a "br" tag. If it was a tag it would be written <br/>. If a "<" is written as "&lt;", that is because the author doesn't want the character to be treated as part of a tag. If they don't want it treated as a tag, then why are you trying to treat it as one?

OK, that's harsh: this is a common situation (though in my view it is bad design). But it helps to be clear about the terminology. You have an XML text node whose contents contain a fragment of lexical unparsed HTML. If you want to process that HTML in a way that takes account of its structure then you first need to parse it. There are two ways to do that. You can try to parse it in XSLT code, but unless you're dealing with a very constrained subset of HTML that is going to be hard work. Or you can pass it to an HTML parser (or perhaps an XML parser if you know that it's actually XHTML). That will require use of extension functions - which might come from your vendor, like saxon:parse(), or which you might have to write yourself.
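
As a rough sketch of the extension-function route, assuming a Saxon release that provides saxon:parse() and assuming the escaped markup happens to be well-formed XML, something along these lines might work. The element name "description" and the wrapper element "div" are made up for illustration; substitute whatever element holds the escaped HTML in your document.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- "description" is a hypothetical name for the element whose text
       node holds the escaped markup, e.g. Line one&lt;br /&gt;Line two -->
  <xsl:template match="description">
    <fo:block>
      <!-- saxon:parse() needs a complete well-formed document, so the
           string is wrapped in a throwaway <div> root before parsing.
           The children of that root are then processed as real nodes. -->
      <xsl:apply-templates
          select="saxon:parse(concat('&lt;div&gt;', ., '&lt;/div&gt;'))/div/node()"/>
    </fo:block>
  </xsl:template>

  <!-- After parsing, br is a real element, so the original rule applies -->
  <xsl:template match="br">
    <fo:block/>
  </xsl:template>

</xsl:stylesheet>
```

This only holds up while the embedded fragment is well-formed XML. Unclosed tags such as <br>, or HTML-only entities such as &nbsp;, will make the parse fail, and in that case an HTML parser is needed instead.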

Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference