XSLT General questions and answers about XSLT. For issues strictly specific to the book XSLT 1.1 Programmers Reference, please post to that forum instead.
Welcome to the p2p.wrox.com Forums.
You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com

November 26th, 2008, 12:44 PM
Authorized User
Join Date: Aug 2008
Posts: 17
Thanks: 1
Thanked 0 Times in 0 Posts
Hidden EOF in CDATA needs removal
Hello.
I've encountered an error in some of the XML transforms I work on: "Unexpected end of file while parsing CDATA has occurred."
Essentially, the <Msg> element contains a CDATA section. Part of the transform strips the CDATA markers so that we can store the string in the DB.
Example:
"<Msg><![CDATA[Message goes here.]]></Msg>"
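For comparison, in C# the CDATA wrapper does not need to be removed by hand: a CDATA section is only an alternative way of writing text content, so once the document parses, the element's value is the bare string (in XSLT, `xsl:value-of` on the element gives the same result). A minimal sketch using LINQ to XML; `MsgExtractor` and `GetMessage` are illustrative names, not part of any existing code:

```csharp
using System.Xml.Linq;

static class MsgExtractor
{
    // Returns the text inside <Msg>. The parser has already dropped the
    // <![CDATA[ ... ]]> markers, so Value is just the message string.
    public static string GetMessage(string xml)
    {
        XElement msg = XElement.Parse(xml);
        return msg.Value;
    }
}
```

For example, `MsgExtractor.GetMessage("<Msg><![CDATA[Message goes here.]]></Msg>")` returns `Message goes here.`. If the input is not well-formed, `XElement.Parse` throws an `XmlException` instead.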
NOTE: I have no control over the contents of these xml files, they are produced in a hardware element, developed by others.
Now, what is happening is that occasionally there will be invalid characters that the XML parser does not know how to handle. That is fine, I can deal with those. What I can't deal with is that whatever message is copied into that CDATA section sometimes includes the EOF character. This causes the error message above, and even programs like Notepad stop displaying the contents of the XML file at that point, even though I am 100% certain that more content follows.
I'm pretty certain there is no way to remove it during the transform, so I was thinking of adding a for loop before the transform to find those extra EOFs and remove them... but I do not know what the EOF character is.
Has anybody experienced this problem, and does anyone know a workaround or fix? I am coding in C# for all actual code work.
If I missed including anything, please let me know. I tried to include everything I could that I felt was relevant.
Thank you!
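A sketch of the pre-processing loop proposed above, under a loud assumption: that the stray character is Control-Z (byte 0x1A, historically used as an end-of-file marker by CP/M and DOS), which should be confirmed with a hex editor before relying on it. It works on raw bytes, which is safe for ASCII or UTF-8 files because 0x1A never occurs inside a multi-byte UTF-8 sequence; a UTF-16 file would need a character-level approach instead. `EofStripper` and its methods are hypothetical names.

```csharp
using System.IO;

static class EofStripper
{
    // Control-Z (SUB). Assumption: this is the "EOF" byte found in the files.
    public const byte CtrlZ = 0x1A;

    // Returns a copy of data with every occurrence of the unwanted byte removed.
    public static byte[] RemoveByte(byte[] data, byte unwanted)
    {
        using (var output = new MemoryStream(data.Length))
        {
            foreach (byte b in data)
            {
                if (b != unwanted)
                    output.WriteByte(b);
            }
            return output.ToArray();
        }
    }

    // Writes a cleaned copy of inputPath to outputPath, leaving the original
    // untouched so the raw file is still available for diagnosis.
    public static void CleanFile(string inputPath, string outputPath)
    {
        byte[] original = File.ReadAllBytes(inputPath);
        File.WriteAllBytes(outputPath, RemoveByte(original, CtrlZ));
    }
}
```

The cleaned copy is what would then be handed to the transform; keeping the original makes it possible to show the producers of the files exactly what was wrong.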

November 26th, 2008, 12:50 PM
Wrox Author
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts
This isn't really an XSLT question. XSLT can only handle XML input. If your input isn't XML, then you'll have to turn it into XML before the XSLT can start. Rather than trying to patch-up broken XML, I would encourage you to look hard at the process creating the broken XML and fix it at source.
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
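In practical terms, that advice suggests checking that each incoming file is well-formed before it reaches the transform, so broken files can be rejected or set aside with a precise error location rather than failing halfway through. A minimal sketch using `XmlReader`, which reads the whole document without building a tree; `WellFormedness.Check` is a hypothetical name, and the default reader settings (which, among other things, reject DTDs) are assumed to be acceptable for these files:

```csharp
using System.IO;
using System.Xml;

static class WellFormedness
{
    // Reads the entire input. Returns null if it is well-formed XML,
    // otherwise a message that includes the line and position of the error.
    public static string Check(TextReader input)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(input))
            {
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlException ex)
        {
            return $"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
        }
    }
}
```

For example, `WellFormedness.Check(new StringReader("<Msg><![CDATA[abc"))` returns an error message for the unterminated CDATA section, while the well-formed `<Msg>` example from the first post returns `null`.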

November 26th, 2008, 12:59 PM
Friend of Wrox
Join Date: Aug 2007
Posts: 2,128
Thanks: 1
Thanked 189 Times in 188 Posts
EOF means 'End Of File'. There is no specific 'character' that is an EOF character (at least, I'm not aware of one).
What this probably means is that the XML parser is reading the CDATA section, and because of the invalid XML character doesn't realise that the CDATA has finished, and keeps reading till the end of the file.
As Michael said, there is nothing that XSLT can do to help you with this as it is an XML transformation language - and you don't have valid XML.
/- Sam Judson : Wrox Technical Editor -/

November 26th, 2008, 01:02 PM
Friend of Wrox
Join Date: Nov 2007
Posts: 1,243
Thanks: 0
Thanked 245 Times in 244 Posts
I agree with Michael that it is not an XSLT problem at all.
You will need to look at the file with a hex editor to see what kind of byte sequence is used in there.
Maybe the Wikipedia articles http://en.wikipedia.org/wiki/End_of_file, http://en.wikipedia.org/wiki/Control-Z help to identify what is in there.
--
Martin Honnen
Microsoft MVP - XML
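As a rough alternative to a hex editor, a short scan can list the offsets of bytes that XML 1.0 does not allow anywhere in a document, not even inside a CDATA section: the C0 control characters other than tab, line feed and carriage return, which include 0x1A (Control-Z) and 0x00. This sketch assumes a single-byte or UTF-8 encoded file; `ControlByteScanner` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ControlByteScanner
{
    // True for bytes 0x00-0x1F other than tab (0x09), LF (0x0A) and CR (0x0D).
    // XML 1.0 forbids these characters everywhere, including CDATA sections.
    static bool IsForbidden(byte b) =>
        b < 0x20 && b != 0x09 && b != 0x0A && b != 0x0D;

    // Returns the zero-based offsets of all forbidden bytes in data.
    public static List<int> Find(byte[] data)
    {
        var offsets = new List<int>();
        for (int i = 0; i < data.Length; i++)
        {
            if (IsForbidden(data[i]))
                offsets.Add(i);
        }
        return offsets;
    }

    // Prints each forbidden byte as "offset N: 0xHH" for comparison with a hex editor.
    public static void Report(string path)
    {
        byte[] data = File.ReadAllBytes(path);
        foreach (int offset in Find(data))
            Console.WriteLine($"offset {offset}: 0x{data[offset]:X2}");
    }
}
```

Running `ControlByteScanner.Report(path)` on one of the failing files prints one line per offending byte, which shows both where the problem is and which byte value it actually is.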

November 26th, 2008, 01:04 PM
Authorized User
Join Date: Aug 2008
Posts: 17
Thanks: 1
Thanked 0 Times in 0 Posts
Ya, I figured as much. And there is nothing I can do to control how the XML file is constructed. It's out of my hands.
I'll just have to figure out how to patch it correctly.
And I know that the CDATA section does close, and all the other tags close too, with further elements following.
Thanks for the help and confirming what I was afraid of.

November 26th, 2008, 01:31 PM
Wrox Author
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts
Please don't call it an "XML file". It's a non-XML file.
XML is a really good way of constructing interfaces between systems. Before you patch up someone else's broken XML data, do remember that you are losing all the benefits that come from using a standard interchange syntax. You're keeping things together with sticky tape. It won't hold for long. It might be pragmatic, but it's not good engineering.
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference

November 26th, 2008, 01:48 PM
Authorized User
Join Date: Aug 2008
Posts: 17
Thanks: 1
Thanked 0 Times in 0 Posts
Without getting into semantics too much, it still is an XML file. It has XML elements and, in rare cases, invalid characters get stuffed into it.
I've already made it clear to the people I can speak to that the XML files we receive are poorly formatted and in need of serious help. It falls on deaf ears, or on people who don't view it as a high priority.
I agree with you 100% that I'm losing the benefits, but this is what I have to work with.
It's hard to explain. I'm just going to have to find a way to patch it or ignore it altogether.
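If patching is the only option, a somewhat more general variant of the byte-stripping idea is to decode the file to text and drop every character that XML 1.0 does not permit. A sketch using `XmlConvert.IsXmlChar` from `System.Xml` (available from .NET Framework 4.0 onward); `XmlCharFilter` is a hypothetical name, and the sketch assumes the file has already been decoded with the correct encoding. It silently discards data, so logging what was removed is advisable:

```csharp
using System.Text;
using System.Xml;

static class XmlCharFilter
{
    // Returns text with every character that is illegal in XML 1.0 removed.
    // Valid surrogate pairs (characters outside the BMP) are kept;
    // lone surrogates are dropped.
    public static string RemoveInvalidChars(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                sb.Append(c).Append(text[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            else if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

A typical call would be `XmlCharFilter.RemoveInvalidChars(File.ReadAllText(path))`, with the result passed on to the transform through a `StringReader`.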

November 26th, 2008, 03:51 PM
Wrox Author
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts
>Without getting into semantics too much, it still is an XML file.
No it isn't. It might be nearly XML, but it's no more XML than 123456!?7 is a number.
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference

August 16th, 2011, 10:34 AM
Registered User
Join Date: Aug 2011
Posts: 1
Thanks: 0
Thanked 0 Times in 0 Posts
mhkay...you are a dick. A pompous, self-righteous dick.
BTW...123456!?7 is a number...with invalid characters stuffed into it.
THE GUARDIAN HAS SPOKEN!!!

August 16th, 2011, 02:48 PM
Wrox Author
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts
Yes, I'm rather pedantic. Most good programmers are. There's a connection there somewhere.
__________________
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference