Wrox Programmer Forums

You are currently viewing the XSLT section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
November 26th, 2008, 12:44 PM
Authorized User
Hidden EOF in CDATA needs removal

Hello.
I've encountered an error in some of the XML transforms I work on. The error is "Unexpected end of file while parsing CDATA has occurred."

Essentially, a <Msg> element contains a CDATA section. Part of the transform is to strip the CDATA markup so we can store the string in the DB.
Example:
"<Msg><![CDATA[Message goes here.]]></Msg>"

NOTE: I have no control over the contents of these XML files; they are produced by a hardware component developed by others.

What happens is that occasionally there are invalid characters that the XML parser does not know how to handle. That is fine; I can deal with those. What I can't deal with is that the message copied into that CDATA section sometimes includes the EOF character. That causes the error message above, and even programs like Notepad stop displaying the contents of the XML file at that point, even though I am 100% certain that more content is supposed to follow.

I'm pretty certain there is no way to remove that during the transform. So I was thinking of adding a for loop prior to the transform to find those extra EOFs and remove them, but I do not know what the EOF character is.

Has anybody experienced this problem and know of a workaround or fix? I am coding in C# for all actual code work.

If I missed including anything, please let me know. I tried to include everything I could that I felt was relevant.

Thank you!

 
November 26th, 2008, 12:50 PM
mhkay
Wrox Author

This isn't really an XSLT question. XSLT can only handle XML input. If your input isn't XML, then you'll have to turn it into XML before the XSLT can start. Rather than trying to patch up broken XML, I would encourage you to look hard at the process creating the broken XML and fix it at source.

Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
 
November 26th, 2008, 12:59 PM
samjudson
Friend of Wrox

EOF means 'End Of File'; there is no specific 'character' that is an EOF character (at least I'm not aware of one).

What this probably means is that the XML parser is reading the CDATA section, and because of the invalid XML character doesn't realise that the CDATA has finished, and keeps reading till the end of the file.

As Michael said, there is nothing that XSLT can do to help you with this as it is an XML transformation language - and you don't have valid XML.

/- Sam Judson : Wrox Technical Editor -/
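
A minimal sketch of how a stray control character inside a CDATA section makes a parser reject the whole document. It is written in Python with the standard library's `xml.etree.ElementTree` rather than the C# parser the original poster uses, so the error text will differ from the one quoted above. The corrupted sample, with a Ctrl-Z byte (0x1A) in the message, is a hypothetical stand-in for the real files; it does not confirm which byte those files actually contain.

```python
import xml.etree.ElementTree as ET

# Well-formed sample taken from the first post.
GOOD = b"<Msg><![CDATA[Message goes here.]]></Msg>"
# Hypothetical corrupted sample: a Ctrl-Z (0x1A) byte inside the CDATA section.
BAD = b"<Msg><![CDATA[Message goes\x1a here.]]></Msg>"


def try_parse(data: bytes) -> str:
    """Return the element text, or a short description of the parse failure."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"parse failed: {err}"


if __name__ == "__main__":
    print(try_parse(GOOD))
    print(try_parse(BAD))
```

The second call reports a parse failure even though the closing `]]>` and `</Msg>` are present, which is consistent with the point that the input is not well-formed XML as far as the parser is concerned.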
 
November 26th, 2008, 01:02 PM
Friend of Wrox

I agree with Michael that it is not an XSLT problem at all.
You will need to look at the file with a hex editor to see what kind of byte sequence is used in there.
Maybe the Wikipedia articles http://en.wikipedia.org/wiki/End_of_file and http://en.wikipedia.org/wiki/Control-Z will help to identify what is in there.

--
  Martin Honnen
  Microsoft MVP - XML
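
The hex-editor inspection suggested above can also be approximated in a few lines of code. The sketch below (Python, standard library only) lists the offset and value of every control byte that XML 1.0 does not allow, assuming the file uses an ASCII-compatible encoding such as UTF-8 or Latin-1. The function names `find_control_bytes` and `hex_context` and the sample bytes are hypothetical, chosen for illustration.

```python
from typing import List, Tuple

# Control bytes that XML 1.0 permits: tab, line feed, carriage return.
ALLOWED_CONTROL = {0x09, 0x0A, 0x0D}


def find_control_bytes(data: bytes) -> List[Tuple[int, int]]:
    """Return (offset, byte value) for each control byte XML 1.0 disallows.

    Assumes an ASCII-compatible encoding; the check is not meaningful
    for UTF-16 input, where 0x00 bytes are part of ordinary characters.
    """
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if value < 0x20 and value not in ALLOWED_CONTROL
    ]


def hex_context(data: bytes, offset: int, width: int = 8) -> str:
    """Hex dump of the bytes around `offset`, similar to one hex-editor row."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")


if __name__ == "__main__":
    # Hypothetical sample with a Ctrl-Z (0x1A) byte inside the CDATA section.
    sample = b"<Msg><![CDATA[abc\x1adef]]></Msg>"
    for offset, value in find_control_bytes(sample):
        print(f"offset {offset}: 0x{value:02X}  [{hex_context(sample, offset)}]")
```

Running this against one of the failing files would show whether the offending byte is 0x1A (Ctrl-Z), 0x00, or something else, which is the information needed before deciding how to handle it.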
 
November 26th, 2008, 01:04 PM
Authorized User

Ya, I figured as much. And there is nothing I can do to control how the XML file is constructed. It's out of my hands.

I'll just have to figure out how to patch it correctly.
And I know that the CDATA tag closes and all other tags close and continue on with other elements.

Thanks for the help and confirming what I was afraid of.
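
One way such a patch step might look, sketched in Python with the standard library (the poster's own code is C#, so this only illustrates the idea). It assumes the offending bytes are C0 control characters and that the file is in an ASCII-compatible encoding; neither is confirmed by the thread. The names `strip_control_bytes` and `parse_patched` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Tuple

# C0 control bytes that XML 1.0 disallows (everything below 0x20 except
# tab 0x09, line feed 0x0A and carriage return 0x0D).
# Assumption: ASCII-compatible encoding, so these byte values never occur
# as part of a legitimate multi-byte character.
ILLEGAL_CONTROL_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_bytes(raw: bytes) -> Tuple[bytes, int]:
    """Remove disallowed control bytes; return the cleaned data and a count."""
    return ILLEGAL_CONTROL_BYTES.subn(b"", raw)


def parse_patched(raw: bytes) -> ET.Element:
    """Clean the raw bytes first, then hand them to the XML parser."""
    cleaned, _removed = strip_control_bytes(raw)
    return ET.fromstring(cleaned)


if __name__ == "__main__":
    # Hypothetical corrupted input with a Ctrl-Z (0x1A) byte in the message.
    raw = b"<Msg><![CDATA[Message goes\x1a here.]]></Msg>"
    cleaned, removed = strip_control_bytes(raw)
    print(f"removed {removed} byte(s)")
    print(parse_patched(raw).text)
```

The removal is lossy: the stored message no longer matches what the device sent, so logging the count of removed bytes per file is probably worthwhile. The cleaned bytes can then be passed to the transform as usual.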



 
November 26th, 2008, 01:31 PM
mhkay
Wrox Author

Please don't call it an "XML file". It's a non-XML file.

XML is a really good way of constructing interfaces between systems. Before you patch up someone else's broken XML data, do remember that you are losing all the benefits that come from using a standard interchange syntax. You're keeping things together with sticky tape. It won't hold for long. It might be pragmatic, but it's not good engineering.

 
November 26th, 2008, 01:48 PM
Authorized User

Without getting into semantics too much, it still is an XML file. It has XML elements and, in rare cases, invalid characters get stuffed into it.
I've already made it clear to those that I can speak to that the XML files we receive are poorly formatted and are in need of serious help. It falls on deaf ears, or those that don't view it as high priority.
I agree with you 100% that I'm losing the benefits, but this is what I have to work with.
It's hard to explain. I'm just going to have to find a way to patch it or ignore it altogether.

 
November 26th, 2008, 03:51 PM
mhkay
Wrox Author

>Without getting into semantics too much, it still is an XML file.

No it isn't. It might be nearly XML, but it's no more XML than 123456!?7 is a number.

 
August 16th, 2011, 10:34 AM
Registered User

mhkay...you are a dick. A pompous, self-righteous dick.

BTW...123456!?7 is a number...with invalid characters stuffed into it.

THE GUARDIAN HAS SPOKEN!!!
 
August 16th, 2011, 02:48 PM
mhkay
Wrox Author

Yes, I'm rather pedantic. Most good programmers are. There's a connection there somewhere.




