Wrox Programmer Forums
.NET Framework 2.0 For discussion of the Microsoft .NET Framework 2.0.
You are currently viewing the .NET Framework 2.0 section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
December 27th, 2007, 06:31 AM
Registered User
Join Date: Dec 2007
Posts: 1
get_meta_tags equivalence

How can I extract the title and meta tags of an HTML page into an array in ASP.NET with C#? In PHP it's easy with get_meta_tags(), but I have not found an equivalent in .NET.

Please help with this.
December 29th, 2007, 06:35 AM
samjudson
Friend of Wrox
Join Date: Aug 2007
Posts: 2,128

You could build your own using regular expressions (which is basically what the PHP version does).

e.g. <meta[^>]+name="([^"]*)"[^>]+content="([^"]*)"[^>]+>
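For what it's worth, here is a rough sketch of how that pattern might be wrapped in C#. The class name MetaTagSketch and the method GetMetaTags are made up for illustration and are not part of the .NET Framework. One deliberate change: the final `[^>]+` is relaxed to `[^>]*`, so that a tag ending directly after the content value (`content="x">`) also matches. It still assumes name comes before content and that both values are double-quoted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not a framework API.
public static class MetaTagSketch
{
    // Assumes name precedes content and both values use double quotes.
    private static readonly Regex NameThenContent = new Regex(
        "<meta[^>]+name=\"([^\"]*)\"[^>]+content=\"([^\"]*)\"[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns name -> content pairs, loosely like PHP's get_meta_tags().
    public static Dictionary<string, string> GetMetaTags(string html)
    {
        Dictionary<string, string> tags =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        foreach (Match m in NameThenContent.Matches(html))
        {
            // If a name occurs twice, the later tag overwrites the earlier one.
            tags[m.Groups[1].Value] = m.Groups[2].Value;
        }
        return tags;
    }
}
```

You would call it as MetaTagSketch.GetMetaTags(html) and look values up by name, e.g. tags["description"]. Tags that put content first, or that use single quotes, are silently skipped.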

/- Sam Judson : Wrox Technical Editor -/
December 29th, 2007, 07:47 AM
Authorized User
Join Date: Sep 2007
Posts: 92

Actually, content is the only required attribute of the meta tag; everything else is optional. Attributes can also appear in any order, so that regexp won't necessarily work.

So what I would do is pull out each meta tag as a whole using a regexp: (<meta.+?>)

Then I would use Substring to get rid of the unnecessary parts:

// Get rid of the leading "<meta " and the trailing ">"
meta = meta.Substring(meta.IndexOf(" ") + 1);
meta = meta.Substring(0, meta.Length - 1);
And now I would split it up:

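// Naive split: breaks if an attribute value contains a space or '='
string[] attrs = meta.Split(' ');
foreach (string attr in attrs)
{
    string[] tmp = attr.Split('=');
    if (tmp.Length < 2) continue; // skip fragments such as a trailing "/"
    string key = tmp[0];
    string val = tmp[1];
}
That splitting can also be done with a regexp: (content|http-equiv|name|scheme)="(.*?)"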

Wow. Looks awful. Regular expressions are like violence. When you use it, you have to use it a lot! :)
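
Putting the two regexps from this post together might look roughly like the sketch below. The names MetaAttributeSketch and Parse are hypothetical, chosen only for this example. It assumes double-quoted attribute values and no ">" inside a value, and the attribute pattern is not anchored, so something like data-name="x" would also be picked up as name. Treat it as a starting point rather than a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not a framework API.
public static class MetaAttributeSketch
{
    // Step 1: grab each meta tag as a whole (lazy match up to the first '>').
    private static readonly Regex MetaTag = new Regex(
        "<meta.+?>", RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Step 2: pick out the known attributes, in whatever order they appear.
    private static readonly Regex Attribute = new Regex(
        "(content|http-equiv|name|scheme)=\"(.*?)\"",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns one attribute dictionary per meta tag, in document order.
    public static List<Dictionary<string, string>> Parse(string html)
    {
        List<Dictionary<string, string>> result =
            new List<Dictionary<string, string>>();

        foreach (Match tag in MetaTag.Matches(html))
        {
            Dictionary<string, string> attrs =
                new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

            foreach (Match a in Attribute.Matches(tag.Value))
            {
                attrs[a.Groups[1].Value] = a.Groups[2].Value;
            }
            result.Add(attrs);
        }
        return result;
    }
}
```

Because each tag gets its own dictionary, tags that only carry http-equiv and content are kept as well, and attribute order no longer matters.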

Copyright (c) 2020 John Wiley & Sons, Inc.