Unexpected behavior from the Microsoft web page
Have you seen this problem with the Microsoft web page?
When the site is online and you view its source, the meta tags are written in the known format, but if you save the page to disk, open the saved copy,
and view its source, you will see a new format for the meta tags
(the new format and the known one are both shown further down this page).
The other strange behavior appears when you parse the online page.
I want to get the keywords of this page.
I wrote two versions of the regex, one for each meta format: I parse with the first, and if it returns null I parse with the second.
The problem is that I can't parse the online Microsoft page.
I tried to get the keywords from the online Microsoft web page with both versions of the regex, but both fail,
while parsing the version saved on disk succeeds.
regex1
@"<meta[\s]*content\s*=\s*\""\s*(?<keywords>.+?)\""[\s]*name[\s]*=[\s]*\""?keywords\""?[\s]*>"
regex2
@"<meta[\s]*?name[\s]*?=[\s]*?""[\s]*?keywords[\s]*?\""[\s]*?(?:lang=""en-us"")?[\s]*?content\s*=\s*\""(?<keywords>.+?)\""\s*>"
Known format of the meta tag
<META name="keywords" content="vacation,Greece,sunshine">
new format
<META content="vacation,Greece,sunshine" name=keywords>
Sometimes written like this
<META content="vacation,Greece,sunshine" name="keywords">
Sometimes written like this
<META
content="vacation,Greece,sunshine"
name=keywords>
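
Since the formats above differ only in attribute order, quoting, and line breaks, one alternative to two order-specific patterns is to match the whole tag first and then read its attributes one by one. The following is only a minimal sketch of that idea, not a tested fix for the online page: the class name MetaKeywords and the method Extract are hypothetical, and it assumes that no attribute value contains a > character and that the tag is not closed with /> directly after an unquoted value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetaKeywords
{
    // One <meta ...> tag. [^>]* also matches line breaks, so a tag that is
    // split across several lines is still captured as a whole.
    private static readonly Regex MetaTag = new Regex(
        @"<meta\b(?<attrs>[^>]*)>",
        RegexOptions.IgnoreCase);

    // One name=value pair; the value may be double-quoted, single-quoted, or bare.
    private static readonly Regex Attribute = new Regex(
        @"(?<name>[\w-]+)\s*=\s*(?:""(?<value>[^""]*)""|'(?<value>[^']*)'|(?<value>[^\s""'>]+))");

    // Returns the content of the first meta tag whose name is "keywords",
    // or null when no such tag is found (or it has no content attribute).
    public static string Extract(string html)
    {
        foreach (Match tag in MetaTag.Matches(html))
        {
            string name = null;
            string content = null;

            foreach (Match attr in Attribute.Matches(tag.Groups["attrs"].Value))
            {
                string key = attr.Groups["name"].Value;
                string value = attr.Groups["value"].Value;

                if (key.Equals("name", StringComparison.OrdinalIgnoreCase))
                    name = value;
                else if (key.Equals("content", StringComparison.OrdinalIgnoreCase))
                    content = value;
            }

            if (name != null &&
                name.Trim().Equals("keywords", StringComparison.OrdinalIgnoreCase))
                return content;
        }
        return null;
    }
}
```

Calling MetaKeywords.Extract(html) on each of the three variants shown above should give the same keyword string, because the order of name and content no longer matters and an extra attribute such as lang is simply skipped. Note that RegexOptions.IgnoreCase is needed on the tag pattern here because the tag is written as META in upper case.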
My Regards