XML: General XML discussions.
Welcome to the p2p.wrox.com Forums.
You are currently viewing the XML section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com

August 26th, 2008, 12:40 AM
Authorized User
Join Date: Jun 2005
Posts: 97
Thanks: 0
Thanked 0 Times in 0 Posts
Searching for body element in an HTML document not working
It seems that SelectSingleNode does not work well with HTML documents loaded as XML.
XmlDocument doc = new XmlDocument();
doc.Load(@"c:\temp\layout.html"); // load the HTML file as an XML document
// add namespace manager
XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
man.AddNamespace(string.Empty, "http://wwww.w3.org/1999/xhtml");
// also tried without the namespace manager, but that didn't work either
XmlNode body = doc.DocumentElement.SelectSingleNode("//body", man);
if (body != null)
    Console.WriteLine(body.OuterXml);
However, when I use doc.DocumentElement["body"], it gives me the node.
What am I missing here?
Below is the HTML document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title>Untitled Page</title>
<link href="workflow.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div style="height: 100%; width: 100%; overflow: auto; text-align: center; padding-left: 20px;">
<br />
<table class="workflow_container">
<tr>
<td>
<div id="dvDevelop" class="workflow_active">
</div>
</td>
</tr>
<tr>
<td>
<div class="down_arrow">
</div>
</td>
</tr>
<tr>
<td>
<div id="dvReview" class="workflow_disable">
</div>
</td>
</tr>
</table>
<div class="right_arrow">
</div>
<div id="dvContentQC" class="workflow_next">
</div>
<div class="right_arrow">
</div>
<div id="dvPublish" class="workflow_disable">
</div>
</div>
<p>
</p>
<p>
</p>
<p>
</p>
<p>
</p>
<div class="workflow_disable">
Approve
<table>
<tr>
<td>
3<br />
APP-00-102-11
<br />
Apr 12, 2007
</td>
</tr>
</table>
</div>
</body>
</html>
"Dont you ever give up!"

August 26th, 2008, 02:05 AM
Friend of Wrox
Join Date: Aug 2007
Posts: 2,128
Thanks: 1
Thanked 189 Times in 188 Posts
1) I assume that the error above comes from the URL-checking code on the forum, but the XHTML namespace is http://www.w3.org/1999/xhtml.
2) Have you tried adding the namespace with a prefix (such as xhtml) and then using "//xhtml:body" ?
/- Sam Judson : Wrox Technical Editor -/

August 26th, 2008, 06:48 PM
Authorized User
Join Date: Jun 2005
Posts: 97
Thanks: 0
Thanked 0 Times in 0 Posts
XmlNode body = doc.DocumentElement.SelectSingleNode("//body",man);
This always returns null.
"//xhtml:body" does not work either.
"Dont you ever give up!"

August 27th, 2008, 01:57 AM
Wrox Author
Join Date: Jun 2003
Posts: 3,074
Thanks: 1
Thanked 38 Times in 37 Posts
Show the relevant code where you read the document and set up the NamespaceManager.
--
Joe ( Microsoft MVP - XML)

August 27th, 2008, 06:38 PM
Authorized User
Join Date: Jun 2005
Posts: 97
Thanks: 0
Thanked 0 Times in 0 Posts
XmlDocument doc = new XmlDocument();
doc.Load(@"c:\temp\layout.html");//load html as xml doc
//add namespace manager
XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
man.AddNamespace(string.Empty, "http://wwww.w3.org/1999/xhtml");
XmlNode body = doc.DocumentElement.SelectSingleNode("//body",man);//this is where I am searching for the node
if(body != null)
Console.WriteLine(body.OuterXml);
The HTML document is available in the first post as inline text.
"Dont you ever give up!"

August 28th, 2008, 02:29 AM
Wrox Author
Join Date: Jun 2003
Posts: 3,074
Thanks: 1
Thanked 38 Times in 37 Posts
Well, you don't assign a prefix; string.Empty cannot be used. You need to use something like "xhtml" as the prefix and "//xhtml:body" as your XPath.
If that doesn't work then your HTML is not XHTML.
--
Joe ( Microsoft MVP - XML)
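
A minimal sketch of the prefix approach suggested above, assuming the document's elements really are in the XHTML namespace. To stay self-contained it parses a small inline string with an explicit xmlns declaration rather than c:\temp\layout.html; the class name PrefixExample, the helper FindBody, and the sample markup are illustrative assumptions, not code from the thread.

```csharp
using System;
using System.Xml;

class PrefixExample
{
    // Hypothetical helper: returns the first XHTML body element, or null if none matches.
    public static XmlNode FindBody(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // Bind a non-empty prefix to the XHTML namespace URI.
        XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
        man.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");

        // The prefix in the XPath is resolved through the namespace manager.
        return doc.SelectSingleNode("//xhtml:body", man);
    }

    static void Main()
    {
        // Hypothetical stand-in for layout.html with the namespace declared explicitly.
        string xml =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
            "<head><title>Untitled Page</title></head>" +
            "<body><p>Approve</p></body></html>";

        XmlNode body = FindBody(xml);
        Console.WriteLine(body != null ? body.OuterXml : "body not found");
    }
}
```

The prefix used in the XPath does not have to appear in the document itself; only the namespace URI has to match exactly, so a single mistyped character in the URI is enough to make the query return null.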

August 28th, 2008, 03:26 AM
Friend of Wrox
Join Date: Aug 2007
Posts: 2,128
Thanks: 1
Thanked 189 Times in 188 Posts
Joe, according to the docs String.Empty can be used to define the default namespace. Should this not work then?
http://msdn.microsoft.com/en-us/libr...namespace.aspx
/- Sam Judson : Wrox Technical Editor -/

August 28th, 2008, 03:46 AM
Wrox Author
Join Date: Apr 2004
Posts: 4,962
Thanks: 0
Thanked 292 Times in 287 Posts
In XPath 1.0 "//body" means "find body elements in no namespace", not "find body elements in the default namespace". So setting the default namespace should make no difference.
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference
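
The XPath 1.0 behaviour described above can be checked with a short sketch. It registers the XHTML namespace both as the default (string.Empty) and under a prefix, then runs three queries against a tiny inline document. The class name DefaultNamespaceExample, the helper Matches, and the prefix "x" are assumptions made for illustration.

```csharp
using System;
using System.Xml;

class DefaultNamespaceExample
{
    // Hypothetical helper: true when the XPath selects at least one node
    // in a small XHTML-namespaced document.
    public static bool Matches(string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<html xmlns=\"http://www.w3.org/1999/xhtml\"><body/></html>");

        XmlNamespaceManager man = new XmlNamespaceManager(doc.NameTable);
        man.AddNamespace(string.Empty, "http://www.w3.org/1999/xhtml"); // default namespace
        man.AddNamespace("x", "http://www.w3.org/1999/xhtml");          // prefixed binding

        return doc.SelectSingleNode(xpath, man) != null;
    }

    static void Main()
    {
        // Unprefixed name: matched against elements in no namespace.
        Console.WriteLine(Matches("//body"));                   // False
        // Prefixed name: matched against the XHTML namespace.
        Console.WriteLine(Matches("//x:body"));                 // True
        // Namespace-agnostic test on the local name only.
        Console.WriteLine(Matches("//*[local-name()='body']")); // True
    }
}
```

The local-name() form avoids the namespace manager altogether, at the cost of also matching body elements from any other namespace.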

August 28th, 2008, 05:56 AM
Wrox Author
Join Date: Jun 2003
Posts: 3,074
Thanks: 1
Thanked 38 Times in 37 Posts
It can be used to define the default namespace, but that doesn't help you use it in XPath, so I'm not sure what use that is.
--
Joe ( Microsoft MVP - XML)