Wrox Programmer Forums
C# Programming questions specific to the Microsoft C# language. See also the forum Beginning Visual C# to discuss that specific Wrox book and code.
Welcome to the p2p.wrox.com Forums.

You are currently viewing the C# section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
July 6th, 2003, 09:47 AM
Help in building regex

My question concerns building a regular expression pattern.
I parse the page returned by a Google search and want to extract only the links to the result pages, but the page also contains advertising, links to cached pages, pictures and so on. I started from the most popular regex for href:

a.*href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))

and modified it so that it does not include the trailing >:

a.*href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>[^>]*))

but the output still contains some entries I do not need:

http://www.allsports.com/ ok

http://translate.google.com/translate?hl=en&sl=fr&u=http://www.jeunessesports.gouv.fr/&prev=/search%3Fq%3Dsports%26num%3D50%26hl%3Den%26lr%3D%26ie%3DUTF-8%26oe%3DUTF-8%26sa%3DG

(not ok: the URL we need is only the part embedded in it, after u=)

http://www.sports-central.org/ ok

http://www.dsusa.org/ ok

/search?q=sports&num=50&hl=en&lr=&ie=UTF-8&oe=UTF-8&start=50&sa=N not ok

/about.html not ok

The results page returned by Google is organized like this:
<p class=g><a href=http://dmoz.org/Sports/>Open Directory - <b>Sports</b></....and some other
What we need is:
href=http://dmoz.org/Sports/
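
One possible approach, as a minimal sketch rather than a definitive answer: keep the match inside a single <a ...> tag instead of using the greedy a.*, accept both quoted and unquoted href values, and filter the captured values afterwards. The names GoogleLinkFilter and ExtractResultLinks are hypothetical. The sketch assumes that wanted links are absolute http/https URLs, that links to Google's own hosts should be dropped, that a translate link carries the wanted URL in its u= parameter (as in the sample output above), and that HTML entities such as &amp; have already been decoded.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GoogleLinkFilter
{
    // [^>]*? cannot cross a '>', so the match stays inside one <a ...> tag.
    // The href value is either double-quoted or an unquoted run up to
    // whitespace or '>'.
    private static readonly Regex Href = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*(?:""(?<url>[^""]*)""|(?<url>[^\s>]+))",
        RegexOptions.IgnoreCase);

    // Assumption: translate links embed the real target in a u= parameter.
    private static readonly Regex TranslateTarget =
        new Regex(@"[?&]u=(?<target>[^&]+)");

    public static List<string> ExtractResultLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in Href.Matches(html))
        {
            string url = m.Groups["url"].Value;

            // Drop relative links such as /about.html or /search?q=...
            if (!url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
                !url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
                continue;

            Uri uri;
            if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
                continue;

            bool isGoogle =
                uri.Host.Equals("google.com", StringComparison.OrdinalIgnoreCase) ||
                uri.Host.EndsWith(".google.com", StringComparison.OrdinalIgnoreCase);

            if (!isGoogle)
            {
                links.Add(url);
                continue;
            }

            // Google-hosted link: keep only an embedded u= target, if any.
            Match t = TranslateTarget.Match(url);
            if (t.Success)
                links.Add(Uri.UnescapeDataString(t.Groups["target"].Value));
        }
        return links;
    }
}
```

Calling GoogleLinkFilter.ExtractResultLinks(pageHtml), where pageHtml is the downloaded page as a string, returns the kept URLs in document order. Filtering in code after a simple match tends to be easier to maintain than putting every exclusion into the pattern itself.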

Also,

I parse the page returned by a Yahoo search and want to extract only the links to the result pages. I use the same regex, but the output again contains information I do not need.

Note that this is one line:

http://drs.yahoo.com/S=2766679/K=spo.../www.espn.com/


The results page returned by Yahoo is organized like this:

<li><big><a href="http://drs.yahoo.com/S=2766679/K=sports/v=2/SID=w/l=WS1/R=36/H=0/*-http://www.jeunesse-sports.gouv.fr/"> ....and some others

What we need is:
http://www.jeunesse-sports.gouv.fr/
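
For the Yahoo case, a sketch of one way to capture only the target: skip lazily over the redirect prefix up to the /*- marker and capture what follows, up to the closing quote. The names YahooRedirect and ExtractTargets are hypothetical, and the sketch assumes every result link is double-quoted and uses the /*- separator seen in the sample line above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class YahooRedirect
{
    // [^""]*? lazily consumes the drs.yahoo.com prefix inside the quoted
    // href value until the first "/*-", and the named group captures the
    // rest of the value up to the closing quote.
    private static readonly Regex Target = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*""[^""]*?/\*-(?<url>[^""]+)""",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractTargets(string html)
    {
        var targets = new List<string>();
        foreach (Match m in Target.Matches(html))
            targets.Add(m.Groups["url"].Value);
        return targets;
    }
}
```

Links that do not contain the /*- marker are simply not matched, so navigation links on the page are left out as a side effect.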
Thank you.

Regards




