Actually, content is the only required attribute for a meta tag (at least in HTML 4). Everything else is optional. Attributes can also appear in any order, so that regexp won't necessarily work.
So what I would do is pull out each meta tag as a whole using a regexp: (<meta.+?>)
Then I would use substring to get rid of unnecessary stuff:
Code:
// Get rid of those: "<meta " and ">" (plus the "/" of a self-closing "/>")
meta = meta.Substring(meta.IndexOf(" ") + 1);
meta = meta.Substring(0, meta.Length - 1).TrimEnd('/', ' ');
And now I would split it up:
Code:
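// Note: splitting on spaces breaks values that contain spaces,
// e.g. content="text/html; charset=utf-8"
string[] attrs = meta.Split(' ');
foreach (string attr in attrs)
{
    // Split on the first '=' only, since a value may contain '=' itself
    string[] tmp = attr.Split(new[] { '=' }, 2);
    if (tmp.Length < 2) continue; // skip fragments that have no value
    string key = tmp[0];
    string val = tmp[1].Trim('"'); // drop the surrounding quotes
}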
The splitting can also be done with a regexp, which has the advantage of coping with spaces inside quoted values: (content|http-equiv|name|scheme)="(.*?)"
Wow. Looks awful. Regular expressions are like violence. When you use them, you have to use them a lot! :)
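For what it's worth, here is a rough sketch of how the two regexps could be put together. The class name MetaTagParser and the method Parse are just names I made up for illustration, and it assumes double-quoted attribute values and no ">" inside a value, so treat it as a starting point rather than a real HTML parser:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class MetaTagParser
{
    // The same two patterns as in the post. Singleline lets '.' match
    // line breaks, in case a tag is wrapped over several lines.
    static readonly Regex TagRegex = new Regex(
        "<meta.+?>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    static readonly Regex AttrRegex = new Regex(
        "(content|http-equiv|name|scheme)=\"(.*?)\"",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns one dictionary (attribute name -> value) per meta tag found.
    public static List<Dictionary<string, string>> Parse(string html)
    {
        var result = new List<Dictionary<string, string>>();
        foreach (Match tag in TagRegex.Matches(html))
        {
            // Attribute names are case-insensitive in HTML.
            var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (Match attr in AttrRegex.Matches(tag.Value))
            {
                // Group 1 is the attribute name, group 2 the quoted value.
                attrs[attr.Groups[1].Value] = attr.Groups[2].Value;
            }
            result.Add(attrs);
        }
        return result;
    }
}
```

Because the attributes are matched one by one inside each tag, their order no longer matters. For example, passing `<meta http-equiv="Content-Type" content="text/html; charset=utf-8">` to MetaTagParser.Parse should give a single dictionary whose "content" entry is the full value, spaces included.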