p2p.wrox.com Forums

rjonk August 23rd, 2006 04:11 AM

Testing XML if it is wellformed
I've been searching the XSLT 2.0 and XPath 2.0 books for a way to test whether external XML documents are well-formed.

My question is twofold:
What is the easiest way to test whether an XML document is well-formed and/or well-balanced before I process it with my XSLT? Does the Saxon processor provide a way to test for well-formed XML, for instance in a .bat file, before calling an XSLT template?
Is it possible to test nodes in an external document with the document() or doc() function without having an input XML document? Or do I always have to use a dummy XML document as input if I only want to test a node in an external document with the document() function?

joefawcett August 23rd, 2006 04:17 AM

Well, technically all XML is well-formed; otherwise it's not XML.
Most, if not all, parsers have the ability to load a document. At this stage they can tell you if it's an XML document, and usually if not why not. In MSXML for example:

var bLoaded = XmlDoc.load(<path to doc>);
if (bLoaded) {
  // document is XML
} else {
  // document is not XML; check XmlDoc.parseError for further details
}

Each parser does it slightly differently so you'll have to state which one you intend to use if you need further help and can't find the documentation.


Joe (Microsoft MVP - XML)

mhkay August 23rd, 2006 04:21 AM

I think the adjective "well-formed" is rather unfortunate, because talking about "testing XML to see if it is well-formed" suggests that there can be XML that isn't well-formed. In fact all XML is well-formed; if it's not well-formed, then it isn't XML.

The way to test whether an input file contains well-formed XML is to try parsing it with an XML parser, and if you get an error then it isn't.

XSLT and XPath can only process well-formed XML. If the input isn't well-formed you'll get an error from the XML parser long before the XSLT or XPath processor kicks in.

Michael Kay
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
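
Editorial sketch (not part of the original post): the "parse it and see whether you get an error" test described above might look roughly like this with the standard JAXP SAX parser. The class name WellFormedCheck and the exit-code convention are assumptions made for illustration; nothing here is Saxon-specific.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the name is illustrative only.
class WellFormedCheck {

    /** Returns true if the parser accepts the input, false if it reports an XML error. */
    static boolean isWellFormed(InputSource source) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Namespace awareness also makes the parser reject undeclared prefixes.
            factory.setNamespaceAware(true);
            // DefaultHandler ignores the content and rethrows fatal parse errors.
            factory.newSAXParser().parse(source, new DefaultHandler());
            return true;
        } catch (SAXException e) {
            System.err.println("Not well-formed: " + e.getMessage());
            return false;
        } catch (IOException | ParserConfigurationException e) {
            // I/O and configuration problems say nothing about well-formedness.
            throw new IllegalStateException("Could not read or parse input", e);
        }
    }

    /** Convenience overload for XML held in a string. */
    static boolean isWellFormed(String xml) {
        return isWellFormed(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java WellFormedCheck <file>");
            return;
        }
        boolean ok = isWellFormed(new InputSource(new File(args[0]).toURI().toString()));
        System.out.println(args[0] + (ok ? " is well-formed" : " is NOT well-formed"));
        if (!ok) {
            System.exit(1); // non-zero exit code, so a calling script can branch on it
        }
    }
}
```

Because the SAX parser streams the input and the handler discards every event, this check does not build a tree in memory, which may matter for large documents.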

rjonk August 24th, 2006 07:13 AM

Thanks for the info. I was trying to do it with the Saxon processor inside an XSLT document, using the document() function.
I want to use the Saxon (XSLT 2.0) processor to test XML against a specific standard and generate a findings report. I don't know what the content of the .xml document is; I only know which node should be in it. Before I even start report.xsl, I want to test that the document is well-formed and that it includes the node I want to test.

Does this mean I first have to parse/load the XML document (even if it is a big one) in Java before I can call the XSLT template? Or can I call some Java command inside the XSLT?

mhkay August 24th, 2006 10:49 AM

With Saxon (or any JAXP processor) you can define a URIResolver, which is a simple Java class that is called to process any URI passed to the document() function. In your URIResolver you can call the XML parser, and trap any error that it reports. If the document is OK you can return it, if it's not OK you can return a dummy document containing the error information. You don't have to parse the document twice, because you can return the parsed document tree as the response from the URIResolver if parsing is successful.

Michael Kay
Author, XSLT Programmer's Reference and XPath 2.0 Programmer's Reference
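
Editorial sketch (not part of the original post): a URIResolver along the lines described above could be written roughly as follows. The class name CheckingResolver and the parse-error element are hypothetical. The sketch returns a DOM tree via DOMSource for simplicity; whether that is the most efficient tree representation for a given processor is not something the thread settles. It also treats an unreadable file the same way as a malformed one, which is an assumption.

```java
import java.io.IOException;
import java.net.URI;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical resolver; class and element names are illustrative only.
class CheckingResolver implements URIResolver {

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        // Resolve a relative href against the base URI when one is supplied.
        String systemId = (base == null || base.isEmpty())
                ? href
                : URI.create(base).resolve(href).toString();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            try {
                // Parse once and hand the resulting tree straight to the processor.
                Document doc = builder.parse(systemId);
                return new DOMSource(doc, systemId);
            } catch (SAXException | IOException e) {
                // Parsing failed: return a small dummy document describing the error.
                Document error = builder.newDocument();
                Element root = error.createElement("parse-error");
                root.setAttribute("href", systemId);
                root.setTextContent(String.valueOf(e.getMessage()));
                error.appendChild(root);
                return new DOMSource(error, systemId);
            }
        } catch (ParserConfigurationException e) {
            throw new TransformerException(e);
        }
    }
}
```

The resolver would be registered through the standard JAXP setURIResolver method on the TransformerFactory or Transformer. A stylesheet could then check whether the root element of the document returned by document() is parse-error before going on to test for the node it expects.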

rjonk August 25th, 2006 12:53 AM

Thanks, that is a great solution. I will try that.

The extra options for saxon8, -it and -im, are also of some use:
I can call saxon8.jar with the -it option, and then I don't need a source document for my stylesheet and can use the document() function in my stylesheet. I can also use the -im option to start my transformation in a specific mode.

Thanks again.
