Wrox Programmer Forums

You are currently viewing the XML section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
Posted by aldwinenriquez (Authorized User), August 26th, 2008, 12:40 AM
Searching for the body element in an HTML document not working

It seems that SelectSingleNode does not work well with HTML documents loaded as XML.

XmlDocument doc = new XmlDocument();
doc.Load(@"c:\temp\layout.html"); // load html as xml doc

// add namespace manager
XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
man.AddNamespace(string.Empty, "http://wwww.w3.org/1999/xhtml");

XmlNode body = doc.DocumentElement.SelectSingleNode("//body", man); // also tried without the namespace manager, but that didn't work either
if (body != null)
    Console.WriteLine(body.OuterXml);

However when I do doc.DocumentElement["body"], it gives me the node.
What am I missing here?


Below is the HTML document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
    <title>Untitled Page</title>
    <link href="workflow.css" rel="stylesheet" type="text/css" />
</head>
<body>
        <div style="height: 100%; width: 100%; overflow: auto; text-align: center; padding-left: 20px;">
        <br />
        <table class="workflow_container">
            <tr>
                <td>
                    <div id="dvDevelop" class="workflow_active">


                    </div>
                </td>
            </tr>
            <tr>
                <td>
                    <div class="down_arrow">
                        &nbsp;
                    </div>
                </td>
            </tr>
            <tr>
                <td>
                    <div id="dvReview" class="workflow_disable">

                        </div>
                </td>
            </tr>
        </table>
        <div class="right_arrow">
            &nbsp;
        </div>
        <div id="dvContentQC" class="workflow_next">

            </div>
        <div class="right_arrow">
            &nbsp;
        </div>
        <div id="dvPublish" class="workflow_disable">

            </div>
    </div>
<p>
    &nbsp;</p>
<p>
    &nbsp;</p>
<p>
    &nbsp;</p>
<p>
    &nbsp;</p>
                    <div class="workflow_disable">
                        Approve
                        <table>
                            <tr>
                                <td>
                                    3<br />
                                    APP-00-102-11
                                    <br />
                                    Apr 12, 2007
                                </td>
                            </tr>
                        </table>
                    </div>
                </body>
</html>
__________________
"Don't you ever give up!"
 
Posted by samjudson (Friend of Wrox), August 26th, 2008, 02:05 AM

1) I assume the "wwww" in the namespace URI above was introduced by the forum's URL-checking code, but note that the XHTML namespace is http://www.w3.org/1999/xhtml.

2) Have you tried adding the namespace with a prefix (such as xhtml) and then using "//xhtml:body" ?

/- Sam Judson : Wrox Technical Editor -/
 
Posted by aldwinenriquez (Authorized User), August 26th, 2008, 06:48 PM

XmlNode body = doc.DocumentElement.SelectSingleNode("//body",man);
This always returns null.

"//xhtml:body" does not work either.
 
Posted by joefawcett (Wrox Author), August 27th, 2008, 01:57 AM

Show the relevant code where you read the document and set up the NamespaceManager.

--

Joe (Microsoft MVP - XML)
 
Posted by aldwinenriquez (Authorized User), August 27th, 2008, 06:38 PM

XmlDocument doc = new XmlDocument();
doc.Load(@"c:\temp\layout.html"); // load html as xml doc

// add namespace manager
XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
man.AddNamespace(string.Empty, "http://wwww.w3.org/1999/xhtml");
XmlNode body = doc.DocumentElement.SelectSingleNode("//body", man); // this is where I am searching for the node
if (body != null)
    Console.WriteLine(body.OuterXml);

The HTML document is available in the first post as inline text.
 
Posted by joefawcett (Wrox Author), August 28th, 2008, 02:29 AM

Well, you don't assign a prefix; string.Empty cannot be used. You need to use something like "xhtml" as the prefix and "//xhtml:body" as your XPath.
If that doesn't work, then your HTML is not XHTML.
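
A minimal sketch of the prefix-based lookup suggested above, assuming the document's elements really are in the XHTML namespace. The class name XhtmlBodyLookup, the helper FindBody, and the inline sample markup are made up for illustration. The sample declares xmlns explicitly so that it runs without fetching a DTD; in the original file the namespace presumably comes from a default attribute in the XHTML DTD, which is an assumption the thread does not confirm.

```csharp
using System;
using System.Xml;

public static class XhtmlBodyLookup
{
    // The XHTML namespace URI. Note "www" with three w's, not "wwww".
    public const string XhtmlNs = "http://www.w3.org/1999/xhtml";

    // Returns the first body element in the XHTML namespace, or null if none exists.
    public static XmlNode FindBody(XmlDocument doc)
    {
        XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
        // Bind a non-empty prefix; "xhtml" is arbitrary, only the URI has to match.
        man.AddNamespace("xhtml", XhtmlNs);
        return doc.SelectSingleNode("//xhtml:body", man);
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        // Hypothetical sample document, not the poster's layout.html.
        doc.LoadXml("<html xmlns='http://www.w3.org/1999/xhtml'>" +
                    "<head><title>t</title></head><body><p>hi</p></body></html>");

        XmlNode body = FindBody(doc);
        Console.WriteLine(body != null ? body.OuterXml : "body not found");
    }
}
```

The prefix in the XPath expression does not have to match anything in the document itself; only the namespace URI registered with the manager is compared.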

--

Joe (Microsoft MVP - XML)
 
Posted by samjudson (Friend of Wrox), August 28th, 2008, 03:26 AM

Joe, according to the docs String.Empty can be used to define the default namespace. Should this not work then?

http://msdn.microsoft.com/en-us/libr...namespace.aspx

/- Sam Judson : Wrox Technical Editor -/
 
Posted by mhkay (Wrox Author), August 28th, 2008, 03:46 AM

In XPath 1.0 "//body" means "find body elements in no namespace", not "find body elements in the default namespace". So setting the default namespace should make no difference.
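
The point above can be checked with a small experiment. This is only a sketch: the class DefaultNamespaceDemo, the helper Count, the prefix "x", and both sample documents are assumptions made for illustration. It registers the XHTML namespace both as the default (empty prefix) and under a real prefix, then counts matches for several expressions.

```csharp
using System;
using System.Xml;

public static class DefaultNamespaceDemo
{
    private const string XhtmlNs = "http://www.w3.org/1999/xhtml";

    // Counts the nodes that an XPath expression selects in the given XML text.
    public static int Count(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
        // Accepted by AddNamespace, but XPath 1.0 ignores it for unprefixed names.
        man.AddNamespace(string.Empty, XhtmlNs);
        // "x" is an arbitrary prefix chosen for this sketch.
        man.AddNamespace("x", XhtmlNs);

        return doc.SelectNodes(xpath, man).Count;
    }

    public static void Main()
    {
        string namespaced = "<html xmlns='http://www.w3.org/1999/xhtml'><body/></html>";
        string plain = "<html><body/></html>";

        Console.WriteLine(Count(namespaced, "//body"));                   // 0
        Console.WriteLine(Count(namespaced, "//x:body"));                 // 1
        Console.WriteLine(Count(namespaced, "//*[local-name()='body']")); // 1
        Console.WriteLine(Count(plain, "//body"));                        // 1
        Console.WriteLine(Count(plain, "//x:body"));                      // 0
    }
}
```

The local-name() form sidesteps namespaces entirely, at the cost of also matching a body element from any other namespace.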

Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
 
Posted by joefawcett (Wrox Author), August 28th, 2008, 05:56 AM

Quote:
Originally posted by samjudson
 Joe, according to the docs String.Empty can be used to define the default namespace. Should this not work then?

http://msdn.microsoft.com/en-us/libr...namespace.aspx

/- Sam Judson : Wrox Technical Editor -/
It can be used to define the default namespace, but that doesn't help you use it in XPath, so I'm not sure what use that is.

--

Joe (Microsoft MVP - XML)




