Wrox Programmer Forums

  #1 (permalink)  
Old March 4th, 2004, 12:18 PM
Registered User
Join Date: Mar 2004
Parsing Problem - help!

Hello

I have created a basic RSS news reader, which works fine apart from one XML file: the BBC News feed. It seems to contain non-ASCII characters (upside-down question marks and pound signs), which cause a problem with the MSXML parser: it basically doesn't create a document object model :( If I manually take out these characters, it works a treat.

Can anybody help with this?

Thanks
Phil

  #2 (permalink)  
Old March 4th, 2004, 12:29 PM
joefawcett
Wrox Author
Join Date: Jun 2003
Location: Exeter, United Kingdom

Can you point to the RSS link? These problems are normally caused by the encoding being declared incorrectly, or by the process changing it at some stage. Can you show some of your code?

Joe (MVP - xml)
  #3 (permalink)  
Old March 4th, 2004, 12:40 PM
Registered User
Join Date: Mar 2004

Quote:
Originally posted by joefawcett
Can you point to the RSS link? These problems are normally caused by the encoding being declared incorrectly, or by the process changing it at some stage. Can you show some of your code?

Joe (MVP - xml)

The URL for the RSS feed is http://news.bbc.co.uk/rss/newsonline...lth/rss091.xml

My code is

<code>
url = Request.QueryString("url")
Call getXmlNews(url)

Sub getXmlNews(address)

    Set oXMLHttp = Server.CreateObject("MSXML2.ServerXMLHTTP")

    Dim URL
    Dim i
    Dim j
    Dim root

    URL = address
    oXMLHttp.open "GET", URL, False
    oXMLHttp.send() ' Send the request.

    If oXMLHttp.status = 200 Then

        sXML = oXMLHttp.responseText ' Retrieve from serverB.
        removeDoctype(sXML) ' Helper defined elsewhere (not shown in this post).

        Set oXML = Server.CreateObject("MSXML2.DomDocument")
        'Response.Write(sXML)

        oXML.loadXML(sXML)

        '*** Create title info

        newstitle = oXML.getElementsByTagName("title").item(0).text
        Response.Write("<div id=""maintitle"">" & newstitle & "</div>")
        Response.Write("")

        '*** Write news items

        Set objLst = oXML.getElementsByTagName("item")
        intNoOfHeadlines = objLst.length - 1
        'Response.Write(intNoOfHeadlines)
        strContent = ""

        For i = 0 To intNoOfHeadlines

            strContent = strContent & "<div class=""main"">"
            Set objHdl = objLst.item(i) ' Loop through each item.

            For j = 0 To objHdl.childNodes.length - 1

                If objHdl.childNodes(j).nodeName = "title" Then
                    strContent = strContent & "<div id=""title"">" & objHdl.childNodes(j).text & "</div>"
                End If

                If objHdl.childNodes(j).nodeName = "description" Then
                    strContent = strContent & "<div id=""description"">" & objHdl.childNodes(j).text & "</div>"
                End If

                If objHdl.childNodes(j).nodeName = "link" Then
                    strContent = strContent & "<div id=""link""><a href=""" & objHdl.childNodes(j).text & " "" target=""blank"">View Item</a></div>"
                End If

                If objHdl.childNodes(j).nodeName = "pubDate" Or objHdl.childNodes(j).nodeName = "dc:date" Then
                    strContent = strContent & "<div id=""pubdate"">Document Published: " & objHdl.childNodes(j).text & "</div>"
                End If

            Next

            strContent = strContent & "</div>"

        Next

        strContent = strContent & "<br /><br />" ' Make a little space at the bottom of the page.

    Else

        strContent = "Sorry...the news source is temporarily unavailable - Please try again later. " '& oXMLHttp.status

    End If

    Response.Write(strContent) ' Write out the content.

    Set oXML = Nothing
    Set oXMLHttp = Nothing

End Sub
</code>

Thanks for your quick response
phil

  #4 (permalink)  
Old March 4th, 2004, 12:56 PM
joefawcett
Wrox Author
Join Date: Jun 2003
Location: Exeter, United Kingdom

I will have a closer look if you wish, but at first glance the problem is this. The site is using an encoding of ISO-8859-1. Whenever you write a string in VB, via Response.Write in this case, you are writing a Unicode string encoded as UTF-16. You are also using the responseText property; if the XML is well formed you should be using oXMLHttp.responseXML. This is breaking the encoding. The most elegant way to fix this would be to abandon the DOM parsing and use XSLT to transform to HTML. You might get away with setting Response.CharSet to "ISO-8859-1"; you might need to set Response.CodePage as well.
As a test, try this after sending the request:
Code:
Response.ContentType = "text/xml"
set oXML = Server.CreateObject("MSXML2.DomDocument")
oXml.load oXMLHttp.responseXML
oXml.save Response
Response.end
As a secondary point, you'll get more consistent performance by using the correct ProgIDs. If you have the version 4 parser, use:
MSXML2.ServerXMLHTTP.4.0 and
MSXML2.DomDocument.4.0
With version 3, change the 4 to 3 :)
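
To make the encoding point concrete, here is a minimal sketch in Python rather than VBScript, since the effect is not specific to ASP. It uses a made-up ISO-8859-1 feed fragment containing a pound sign (not the actual BBC feed), and it assumes the failure comes from the raw bytes being decoded with the wrong charset before the parser sees them. Whether that is exactly what responseText does in this setup is not confirmed in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment: declared as ISO-8859-1, with a pound sign
# stored as the single byte 0xA3 (this is NOT the real BBC feed).
raw = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<rss><channel><title>Cost: \xa3" b"5</title></channel></rss>"
)

# 1) Hand the parser the raw bytes: it reads the declared encoding itself.
root = ET.fromstring(raw)
title_from_bytes = root.findtext("channel/title")

# 2) Decode the bytes to text first, assuming the wrong charset (UTF-8 here).
#    A lone 0xA3 byte is not valid UTF-8, so a strict decode fails.
try:
    raw.decode("utf-8")
    wrong_decode_failed = False
except UnicodeDecodeError:
    wrong_decode_failed = True

# 3) A lenient decode "succeeds" but silently replaces the pound sign
#    with U+FFFD, so the original character is lost before parsing starts.
lossy_text = raw.decode("utf-8", errors="replace")

print(title_from_bytes)
print(wrong_decode_failed)
print("\ufffd" in lossy_text)
```

The equivalent move in the ASP code is to let the parser work from the response as XML (responseXML) instead of converting it to a string first.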





--

Joe
  #5 (permalink)  
Old March 4th, 2004, 01:12 PM
Registered User
Join Date: Mar 2004

Hey Joe - many thanks for this

I have been considering using XSLT after having a tinker with it.
When you say XSLT, do you mean ASP with XSLT or just XSLT on its own?

I will play around with your suggestions.

Thanks again
phil

  #6 (permalink)  
Old March 5th, 2004, 05:28 AM
joefawcett
Wrox Author
Join Date: Jun 2003
Location: Exeter, United Kingdom

XSLT with ASP. This, combined with using the DOM directly and not moving between it and strings, should solve your problems.

--

Joe