well balanced xml

aowss · January 24th, 2006, 02:50 AM

Hello,

I read in Kay's great book that the input document need not be well-formed (chapter 2).

When I try to transform an XML file that does not have an enclosing tag with Saxon (latest open source version), I get an error saying that the input document is not well-formed.

Am I missing something?

Thank you very much.

mhkay · January 24th, 2006, 05:15 AM

The XSLT processor can cope with trees in which the document node owns several elements or none - but the XML parser will never generate such a tree, because the input to the XML parser needs to be well-formed. You might get a well-balanced but not well-formed tree, for example, as the output of another XSLT transformation.
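To see the parser side of this concretely, here is a minimal sketch in Python (chosen only for brevity; the thread itself concerns Saxon). It assumes the standard library's `xml.etree.ElementTree` parser, and `is_well_formed` is a hypothetical helper name introduced for illustration. Any conforming XML parser should reject the same input.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the XML parser accepts `text` as a complete document.

    Hypothetical helper for illustration; it only reports whether
    ElementTree's parser raised a ParseError.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Well-balanced (every tag is closed) but with four top-level elements,
# so the parser refuses it.
print(is_well_formed("<a/><b/><c/><d/>"))

# The same kind of content with a single top-level element is accepted.
print(is_well_formed("<a><b/><c/><d/></a>"))
```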

If you want to parse a source file that's well-balanced but not well-formed, you can either wrap a start and end tag around the content before parsing it, or you can reference it from a skeleton document as an external entity:

a.xml

<a/><b/><c/><d/>

doc.xml

<!DOCTYPE doc [
<!ENTITY a SYSTEM "a.xml">
]>
<doc>&a;</doc>

and then use doc.xml as the input to your transformation.
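The first option, wrapping a start and end tag around the content before parsing, might look roughly like the following sketch. It is written in Python with the standard `xml.etree.ElementTree` parser purely for illustration; `parse_fragment` and the wrapper name `doc` are assumptions, not anything Saxon provides.

```python
import xml.etree.ElementTree as ET


def parse_fragment(fragment, wrapper="doc"):
    """Parse well-balanced XML by enclosing it in a synthetic root element.

    `wrapper` is a hypothetical element name; choose one that cannot
    clash with element names the stylesheet already matches.
    """
    return ET.fromstring("<{0}>{1}</{0}>".format(wrapper, fragment))


# The content of a.xml from the example above.
root = parse_fragment("<a/><b/><c/><d/>")

# Print the synthetic root and the names of the original top-level elements.
print(root.tag, [child.tag for child in root])
```

Note that simple string wrapping like this stops working if the fragment starts with an XML declaration, because a declaration is only allowed at the very beginning of a document.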

Michael Kay
http://www.saxonica.com/
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference

aowss · January 25th, 2006, 12:43 AM

Thank you very much for your quick reply.

If I understand properly, this means that to have the XSLT processor transform a well-balanced document, you need to bypass the XML parser.

Question 1: As the XML and XSLT object models are different, shouldn't the front-end parser to the XSLT processor be a modified XML parser that copes with the differences between the models and therefore accepts a well-balanced document?

Question 2: Would it work with the following scenario:

1. input is a CSV file
2. a piece of code parses the file and generates SAX events but does not generate SAX events corresponding to an enclosing tag
3. XSLT processor transforms with SAX events as input.
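Steps 1 and 2 of this scenario can be sketched as follows, in Python for brevity. The element names `row` and `cell` and the function `csv_to_sax` are hypothetical, and the standard library's `XMLGenerator` (a plain serializer) stands in for the consumer in step 3, since Python's standard library has no XSLT processor. The sketch only shows that the SAX interface itself does not force a single enclosing element; whether a particular XSLT processor accepts such a stream is a separate, product-specific question.

```python
import csv
import io
from xml.sax.saxutils import XMLGenerator


def csv_to_sax(csv_text, handler):
    """Fire SAX events for each CSV row, with no enclosing element.

    `handler` is any SAX ContentHandler; `row` and `cell` are
    hypothetical element names chosen for this sketch.
    """
    handler.startDocument()
    for row in csv.reader(io.StringIO(csv_text)):
        handler.startElement("row", {})
        for value in row:
            handler.startElement("cell", {})
            handler.characters(value)
            handler.endElement("cell")
        handler.endElement("row")
    handler.endDocument()


# XMLGenerator simply serializes the events it receives, so the result
# contains two top-level <row> elements: well-balanced, not well-formed.
out = io.StringIO()
csv_to_sax("1,apple\n2,pear\n", XMLGenerator(out, encoding="utf-8"))
print(out.getvalue())
```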

Question 3: chaining xslt transforms

You say 'You might get a well-balanced but not well-formed tree, for example, as the output of another XSLT transformation'.

Does this mean that if the transformations are chained, the second one will cope with a well-balanced tree?

- If yes, doesn't this mean that the output of the first transform is an XSLT model-based tree that is fed straight to the transformation engine?

- If yes, I understand that what prevents us from transforming a well-balanced document is the inability to build an XSLT model-based tree out of a document.

Sorry for all these questions.
Thanks for helping me clarify this.

mhkay · January 25th, 2006, 04:35 AM

The way this is defined in the 2.0 specs (which have a much more detailed and more formal treatment than the 1.0 specs), the input to the XSLT processor is a node in an XDM document (where XDM is the name of the XSLT/XPath/XQuery data model). There are defined mappings to XDM from an InfoSet (the output of an XML parser) and from a PSVI (the output of an XML Schema processor). Neither of those two mappings will ever produce a non-well-formed tree; but the specs allow for other means of constructing an XDM document. And in particular, since the output of an XSLT transformation is an XDM document that needn't be well-formed, you would expect that in a chain of transformations the intermediate results would have this characteristic. The same is true of temporary trees constructed (in variables) within a stylesheet.

Beyond this, you're asking about products rather than W3C specifications. Yes, many XSLT processors in the Java world will probably accept a non-well-formed stream of SAX events as input; but there's nothing in any spec that says they must.
