Subject: Searching for body element in an HTML document not working
Posted By: aldwinenriquez Post Date: 8/26/2008 12:40:17 AM
It seems that SelectSingleNode does not work well with HTML documents loaded as XML.

XmlDocument doc = new XmlDocument();
doc.Load(@"c:\temp\layout.html");//load html as xml doc

//add namespace manager
XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
man.AddNamespace(string.Empty, "http://wwww.w3.org/1999/xhtml");

XmlNode body = doc.DocumentElement.SelectSingleNode("//body",man);//also tried without the namespace manager, but that didn't work either.
if(body != null)
 Console.WriteLine(body.OuterXml);

However, when I use doc.DocumentElement["body"], it gives me the node.
What am I missing here?


Below is the HTML document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
    <title>Untitled Page</title>
    <link href="workflow.css" rel="stylesheet" type="text/css" />
</head>
<body>
        <div style="height: 100%; width: 100%; overflow: auto; text-align: center; padding-left: 20px;">
        <br />
        <table class="workflow_container">
            <tr>
                <td>
                    <div id="dvDevelop" class="workflow_active">

                        
                    </div>
                </td>
            </tr>
            <tr>
                <td>
                    <div class="down_arrow">
                        &nbsp;
                    </div>
                </td>
            </tr>
            <tr>
                <td>
                    <div id="dvReview" class="workflow_disable">

                        </div>
                </td>
            </tr>
        </table>
        <div class="right_arrow">
            &nbsp;
        </div>
        <div id="dvContentQC" class="workflow_next">

            </div>
        <div class="right_arrow">
            &nbsp;
        </div>
        <div id="dvPublish" class="workflow_disable">

            </div>
    </div>
<p>
    &nbsp;</p>
<p>
    &nbsp;</p>
<p>
    &nbsp;</p>
<p>
    &nbsp;</p>
                    <div class="workflow_disable">
                        Approve
                        <table>
                            <tr>
                                <td>
                                    3<br />
                                    APP-00-102-11
                                    <br />
                                    Apr 12, 2007
                                </td>
                            </tr>
                        </table>
                    </div>
                </body>
</html>
Reply By: samjudson Reply Date: 8/26/2008 2:05:56 AM
1) I assume that the error above is in the url checking code on the forum, but the xhtml namespace is http://www.w3.org/1999/xhtml.

2) Have you tried adding the namespace with a prefix (such as xhtml) and then using "//xhtml:body" ?

/- Sam Judson : Wrox Technical Editor -/
Reply By: aldwinenriquez Reply Date: 8/26/2008 6:48:50 PM
XmlNode body = doc.DocumentElement.SelectSingleNode("//body",man);
This always returns null.

"//xhtml:body" does not work either.


"Dont you ever give up!"
Reply By: joefawcett Reply Date: 8/27/2008 1:57:01 AM
Show the relevant code where you read the document and set up the NamespaceManager.

--

Joe (Microsoft MVP - XML)
Reply By: aldwinenriquez Reply Date: 8/27/2008 6:38:55 PM
XmlDocument doc = new XmlDocument();
doc.Load(@"c:\temp\layout.html");//load html as xml doc

//add namespace manager
XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
man.AddNamespace(string.Empty, "http://wwww.w3.org/1999/xhtml");
XmlNode body = doc.DocumentElement.SelectSingleNode("//body",man);//this is where I am searching for the node
if(body != null)
 Console.WriteLine(body.OuterXml);

The HTML document is available in the first post as inline text.


"Dont you ever give up!"
Reply By: joefawcett Reply Date: 8/28/2008 2:29:19 AM
Well, you don't assign a prefix; string.Empty cannot be used. You need to use something like "xhtml" as the prefix and "//xhtml:body" as your XPath.
If that doesn't work, then your HTML is not XHTML.

--

Joe (Microsoft MVP - XML)
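
One way to check whether the parsed elements really are in the XHTML namespace is to print the NamespaceURI of a node you can already reach. The following is only a sketch: the inline document with an explicit xmlns attribute is an assumption standing in for layout.html, and the class name NamespaceCheck is hypothetical.

```csharp
using System;
using System.Xml;

class NamespaceCheck
{
    static void Main()
    {
        // Assumption: a tiny inline document stands in for c:\temp\layout.html.
        string xml =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
            "<head><title>Untitled Page</title></head>" +
            "<body><p>Hello</p></body></html>";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // The indexer matches on the element's qualified name ("body"),
        // so it finds the node regardless of which namespace it is in.
        XmlElement body = doc.DocumentElement["body"];

        // Prints the namespace the parser assigned to <body>.
        // For this inline document: http://www.w3.org/1999/xhtml
        Console.WriteLine(body.NamespaceURI);
    }
}
```

If the printed URI is empty, the elements are in no namespace and a plain "//body" should match. Otherwise the XPath needs a prefix bound to exactly that URI.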
Reply By: samjudson Reply Date: 8/28/2008 3:26:27 AM
Joe, according to the docs String.Empty can be used to define the default namespace. Should this not work then?

http://msdn.microsoft.com/en-us/library/system.xml.xmlnamespacemanager.addnamespace.aspx

/- Sam Judson : Wrox Technical Editor -/
Reply By: mhkay Reply Date: 8/28/2008 3:46:12 AM
In XPath 1.0, "//body" means "find body elements in no namespace", not "find body elements in the default namespace". So setting the default namespace should make no difference.

Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
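
The difference between an unprefixed and a prefixed name test can be seen in a short sketch. It assumes an inline document with an explicit xmlns attribute rather than the original layout.html. The prefix "xhtml" is arbitrary, and the class name XPathPrefixDemo is hypothetical.

```csharp
using System;
using System.Xml;

class XPathPrefixDemo
{
    static void Main()
    {
        // Assumption: inline XHTML with a declared default namespace.
        string xml =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
            "<head><title>Untitled Page</title></head>" +
            "<body><p>Hello</p></body></html>";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // Bind a non-empty prefix to the XHTML namespace URI.
        XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
        man.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");

        // Unprefixed name test: looks for <body> in no namespace.
        XmlNode unprefixed = doc.SelectSingleNode("//body", man);

        // Prefixed name test: looks for <body> in the XHTML namespace.
        XmlNode prefixed = doc.SelectSingleNode("//xhtml:body", man);

        Console.WriteLine(unprefixed == null);   // True
        Console.WriteLine(prefixed != null ? prefixed.LocalName : "(not found)");   // body
    }
}
```

The prefix used in the XPath does not have to match anything in the document. Only the namespace URI it is bound to has to match, character for character.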
Reply By: joefawcett Reply Date: 8/28/2008 5:56:20 AM
quote:
Originally posted by samjudson

Joe, according to the docs String.Empty can be used to define the default namespace. Should this not work then?

http://msdn.microsoft.com/en-us/library/system.xml.xmlnamespacemanager.addnamespace.aspx

/- Sam Judson : Wrox Technical Editor -/


It can be used to define the default namespace, but that doesn't help you use it in XPath, so I'm not sure what use that is.

--

Joe (Microsoft MVP - XML)
