Subject: Parsing the xml which is having
Posted By: vikkiefd Post Date: 7/26/2008 12:00:13 AM
Hi,

When I try to parse an XML file, it throws the following SAXParseException:

The entity name must immediately follow the '&' in the entity reference.

The xml is as below :

<Request>
 ...
 <OfficeName>Johnson & Johnson</OfficeName>
..
</Request>

This XML is generated by an external program. When I validate it against my XSD, I get the exception mentioned above.

The corresponding schema part is as below:

<xs:element name="OfficeName">
  <xs:simpleType>
    <xs:restriction base="xs:string">
            <xs:maxLength value="32"/>
    </xs:restriction>
   </xs:simpleType>
</xs:element>

Please help me to resolve this issue. Thanks in advance for your help.

Reply By: Alain COUTHURES Reply Date: 7/26/2008 12:57:01 AM
& has to be written as an entity : &amp; instead of &

Reply By: vikkiefd Reply Date: 7/26/2008 1:01:28 AM
quote:
Originally posted by Alain COUTHURES

& has to be written as an entity : &amp; instead of &

Hi,

As this input XML is generated by an external program, I only have to validate it; I am not supposed to modify the incoming XML. So how can I overcome this exception?

Reply By: Alain COUTHURES Reply Date: 7/26/2008 1:03:16 AM
No way... This XML document is not valid !

Reply By: vikkiefd Reply Date: 7/26/2008 1:08:04 AM
quote:
Originally posted by Alain COUTHURES

No way... This XML document is not valid !

Why do you say the XML document is invalid? In real life it is quite possible to receive the '&' symbol in XML, as many organizations' names contain it. So in this kind of situation there should be some way to handle it.

Reply By: Alain COUTHURES Reply Date: 7/26/2008 1:18:57 AM
XML is a notation with rules: & has to be written as &amp; or be protected inside a CDATA section. Only then can & be present in text data.

The external program you are talking about is at fault for not representing & the way it has to be...

Reply By: vikkiefd Reply Date: 7/26/2008 1:25:36 AM
Ok. Here, the <OfficeName> node contains the value "Johnson & Johnson", which is a text value. So how do I have to write my schema so that '&' is also accepted in the value of that node?

Reply By: Alain COUTHURES Reply Date: 7/26/2008 1:33:13 AM
Since "Johnson & Johnson" is not written "Johnson &amp; Johnson" in the XML document, this document is not well-formed: it can't be parsed, and so no schema can be checked against it.

The XML document has to be modified !!!
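
To see why the schema never comes into play, here is a minimal Java sketch. The class name WellFormednessCheck and the helper parseError are made up for illustration, and it assumes the JDK's built-in JAXP parser. It only tries to parse the text; no XSD is involved at all, yet the raw '&' already fails at this step.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, not part of any library.
class WellFormednessCheck {

    // Returns null if the text parses as XML, otherwise the parser's error message.
    static String parseError(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // Well-formedness errors are fatal: the parser gives up here.
            return e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Raw '&': prints the parser's error message
        // (the default parser may also log the error to stderr).
        System.out.println(parseError("<OfficeName>Johnson & Johnson</OfficeName>"));
        // Escaped '&': parses fine, so this prints "null".
        System.out.println(parseError("<OfficeName>Johnson &amp; Johnson</OfficeName>"));
    }
}
```

Schema validation works on the parsed content, so a document that fails here never reaches the xs:maxLength check.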

Reply By: mhkay Reply Date: 7/26/2008 3:40:54 AM
& in XML must always be written &amp;

You need to fix the program that is generating this. It's a great idea for programs to generate output in XML, because so much software can accept XML. But if they generate stuff that isn't XML, it rather destroys the point.

Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference

Reply By: mhkay Reply Date: 7/26/2008 3:43:45 AM
>As this input xml is generated

Please don't refer to it as XML. It isn't. You must either fix the program so it generates XML, or you must convert its non-XML output to XML. It would be much better to fix it, so you can share in the benefits that come from using XML for data interchange rather than using proprietary non-standard formats.
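
If the generating program really cannot be changed, the "convert its non-XML output to XML" option might look roughly like the sketch below. This is only a heuristic under stated assumptions, not a general repair tool: the class AmpersandRepair and its method are hypothetical names, it assumes the only defect is a bare '&' in text, and it would wrongly touch an '&' inside a CDATA section or leave alone text such as "R&D;" that happens to look like an entity reference.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing step, run on the raw text BEFORE it is parsed.
class AmpersandRepair {

    // An '&' that is NOT followed by something shaped like a reference:
    //   name;      (e.g. &amp; &lt;)
    //   #digits;   (e.g. &#38;)
    //   #xhex;     (e.g. &#x26;)
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z_:][A-Za-z0-9_:.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Rewrites every bare '&' as "&amp;" and leaves existing references alone.
    static String escapeBareAmpersands(String text) {
        return BARE_AMP.matcher(text).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(
                escapeBareAmpersands("<OfficeName>Johnson & Johnson</OfficeName>"));
    }
}
```

The repaired text can then be handed to the parser and validated as usual. Fixing the generator remains the better option, since a text-level patch like this has to guess what the producer meant.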

Reply By: vikkiefd Reply Date: 7/28/2008 12:53:46 AM
Hi Michael,

Thanks for your reply. So is there any way to specify a CDATA datatype in the schema? That is, I want to know whether there is a datatype (like xsd:string etc.) that says a particular element is CDATA. If there is, I can probably change the schema for that particular element.

Reply By: vikkiefd Reply Date: 7/28/2008 12:55:08 AM
Hi. Thx for your reply.

Reply By: mhkay Reply Date: 7/28/2008 3:56:13 AM
CDATA can be used anywhere, around any text in your document. The schema doesn't have to permit it. It's not a datatype, it's just an alternative way of escaping special characters: instead of

<a>x &amp; y</a>

you can write

<a><![CDATA[x & y]]></a>

As far as the schema is concerned, and as far as the receiving application is concerned, these should be 100% equivalent.
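
A small Java sketch of that equivalence, assuming the JDK's built-in DOM parser (the class EscapingEquivalence and the helper textOf are illustrative names only). Both spellings are parsed and the element's text content is compared.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
class EscapingEquivalence {

    // Parses the document and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String escaped = textOf("<a>x &amp; y</a>");
        String cdata = textOf("<a><![CDATA[x & y]]></a>");
        System.out.println(escaped);               // x & y
        System.out.println(cdata);                 // x & y
        System.out.println(escaped.equals(cdata)); // true
    }
}
```

The application sees the same three characters "x & y" either way, which is why the schema has nothing to say about which form was used.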

Reply By: vikkiefd Reply Date: 7/28/2008 5:03:42 AM
Ok. So there is no way to specify in the XSD whether a particular XML element can contain the '&' and '<' characters. Is that right?

Reply By: mhkay Reply Date: 7/28/2008 5:21:43 AM
XSD constrains the logical content of the element, not its escaped lexical form. You can ban "<" by writing a pattern facet in the schema, and this will ban all forms of expressing "<" in the input, for example

&lt;
<![CDATA[<]]>
&#x003C;
&#60;
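
As a rough illustration of such a pattern facet, the Java sketch below validates a few inputs against a tiny schema using the JDK's javax.xml.validation API. The schema, the element name "a", the class PatternFacetDemo and the helper isValid are all assumptions made for this example; the pattern [^<]* (written [^&lt;]* inside the schema document) is just one possible way to ban "<".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
class PatternFacetDemo {

    // Illustrative schema: element "a" may hold any string without '<'.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='a'>"
          + "<xs:simpleType><xs:restriction base='xs:string'>"
          + "<xs:pattern value='[^&lt;]*'/>"
          + "</xs:restriction></xs:simpleType>"
          + "</xs:element></xs:schema>";

    private static Schema loadSchema() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
    }

    // True if the document is well-formed and valid against XSD.
    static boolean isValid(String xml) {
        Validator validator = loadSchema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // parse error or validation error
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<a>x &amp; y</a>"));      // true: no '<' in the value
        System.out.println(isValid("<a>&lt;</a>"));           // false
        System.out.println(isValid("<a><![CDATA[<]]></a>"));  // false
        System.out.println(isValid("<a>&#x003C;</a>"));       // false
        System.out.println(isValid("<a>&#60;</a>"));          // false
    }
}
```

All four spellings of "<" are rejected because the facet is applied to the parsed value, not to the characters typed in the file.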

