Subject: Need RegEx help
Posted By: Snib Post Date: 10/2/2004 12:25:53 PM
Hello,

I need a pattern that parses some HTML code and replaces all src and href values to include the absolute path to the resource.

So <a href='/index.php'> turns into <a href='http://www.mysite.com/index.php'> and <link href="styles.css"/> turns into <link href="http://www.mysite.com/directory/styles.css"/>

I would do it myself but I'm very new to regular expressions and can't figure it out.

Thanks,

-Snib <><
http://www.snibworks.com
There are only two stupid questions: the one you don't ask, and the one you ask more than once ;-)
Reply By: Richard Lightfoot Reply Date: 10/4/2004 3:21:01 AM
Bit new to it myself, but

$picture = "<a href='/index.php'>";
// preg_replace is used here because ereg_replace was removed in PHP 7.
$picture = preg_replace("#<a href='/index\\.php'>#", "<a href='http://www.mysite.com/index.php'>", $picture);
echo $picture;

Should work.


Reply By: Snib Reply Date: 10/4/2004 4:52:23 PM
Richard,

That will replace all instances of <a href='/index.php'> but what if it doesn't link to /index.php? What if I used double quotes (")? What if the tag really looks like this: <a style='color:red' href='/index.php'>. And also I need it to parse <img> tags, <script> tags and <link> tags.
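
A minimal sketch of a pattern that would cover those cases (either quote style, other attributes before the href or src, and the a, img, script and link tags). The helper name find_link_values() and the sample markup are assumptions made up for this example, and a regex like this will not cope with every piece of real-world HTML (unquoted values, for instance):

```php
<?php
// Sketch: find href/src values in <a>, <img>, <script> and <link> tags,
// whatever the quote style or attribute order.
// Capture groups: 1 = tag start up to and including the opening quote,
//                 2 = the quote character, 3 = the URL value.
$pattern = '/(<(?:a|img|script|link)\b[^>]*?\s(?:href|src)\s*=\s*(["\']))(.*?)\2/i';

// find_link_values() is a hypothetical helper name used for this example.
function find_link_values($html, $pattern)
{
    preg_match_all($pattern, $html, $matches);
    return $matches[3]; // the captured URL values, in document order
}

$html = "<a style='color:red' href='/index.php'>"
      . '<link href="styles.css"/>'
      . "<img src='pics/logo.png'>";

print_r(find_link_values($html, $pattern));
```

For the sample markup this should pick up /index.php, styles.css and pics/logo.png. The backreference \2 makes the closing quote match whichever quote opened the value.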

Thanks for helping,

-Snib <><
Reply By: Moharo Reply Date: 10/8/2004 9:21:59 AM
hey Snib

i gotta admit that regular expressions are not my favorites... but this is what i came up with for your problem... (might be buggy)....

<?php

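// Sample input; note that the path has no leading slash.
$teststr = "<a href='index.php'>";
// Prefix the host to the href of <a> and <link> tags (\1 = tag name, \2 = original href).
$newstr = preg_replace("/<(a|link) href='(.*?)'>/","<\\1 href=\"http://www.mysite.com\\2\">",$teststr);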

echo $newstr;

?>

After this script runs, you will not see anything in the browser window, because the output is itself an HTML tag. Look at the page's HTML code ("view source") to see the result.

Hope that helps.

crazy zoltalar


www.campusgrind.com the college portal
Reply By: nikolai Reply Date: 10/20/2004 12:10:54 PM
There's a problem with your regex replacement, Moharo.  Check your own example:  "index.php" gets matched as \2, so your replacement shows up as:

"http://www.mysite.comindex.php"

There are a bunch of cases you need to test for.

1)  Does the link already specify an absolute URL?
    (e.g. http://www.example.com/foo/bar/page.html)

2)  Does the link specify an absolute path?
    (e.g.  /foo/bar/page.html)

3)  Does the link specify a path relative to the current script?
    a) at or below the current path?
        (e.g.   page.html, foo/bar/page.html)
    b) above or a sibling to the current path?
        (e.g.   ../page.html, ../foo/bar/page.html)
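
The three cases above could be told apart with a few string checks. This is only a sketch: classify_link() and its return labels are hypothetical names invented for this example, and it ignores things like mailto: links, protocol-relative URLs (//host/path) and fragment-only links (#top).

```php
<?php
// classify_link() is a hypothetical helper; the labels are made up for this example.
function classify_link($url)
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        return 'absolute-url';   // case 1: scheme and host already present
    }
    if (substr($url, 0, 1) === '/') {
        return 'absolute-path';  // case 2: path starts at the site root
    }
    if (substr($url, 0, 3) === '../') {
        return 'relative-up';    // case 3b: climbs above the current directory
    }
    return 'relative-down';      // case 3a: at or below the current directory
}

echo classify_link('../foo/bar/page.html');
```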


For 1), do nothing.  No replacement necessary.

For 2), simply prepend the host to the path.

For 3), you'll need to calculate the working directory of the currently executing script, and
    for a) append the relative URL to the working directory.
    for b) modify the working directory to reflect the ".."s in the relative path, then append the rest of the path.
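
Roughly, those rules might look like this in PHP. It is a sketch under assumptions, not a finished solution: make_absolute() is a hypothetical helper name, $host and $scriptPath stand in for values you would supply yourself (for example 'http://www.mysite.com' and '/directory/page.php'), and query strings or trailing slashes in the link are not handled.

```php
<?php
// make_absolute() is a hypothetical helper. $host and $scriptPath are
// assumptions, e.g. 'http://www.mysite.com' and '/directory/page.php'.
function make_absolute($url, $host, $scriptPath)
{
    // Case 1: already an absolute URL, leave it alone.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        return $url;
    }
    // Case 2: absolute path, prepend the host.
    if (substr($url, 0, 1) === '/') {
        return $host . $url;
    }
    // Case 3: relative path. Start from the directory of the current script.
    $dir = substr($scriptPath, 0, strrpos($scriptPath, '/'));
    $segments = ($dir === '') ? array() : explode('/', ltrim($dir, '/'));
    foreach (explode('/', $url) as $part) {
        if ($part === '..') {
            array_pop($segments);      // 3b: climb one directory
        } elseif ($part !== '.' && $part !== '') {
            $segments[] = $part;       // 3a: descend or append the file name
        }
    }
    return $host . '/' . implode('/', $segments);
}

echo make_absolute('styles.css', 'http://www.mysite.com', '/directory/page.php');
```

A function like this could be called from the callback of preg_replace_callback, so that each matched href or src value is rewritten as it is found.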



Make sense?


Take care,

Nik
http://www.bigaction.org/
Reply By: Snib Reply Date: 10/20/2004 3:19:35 PM
Welcome back, Nik.

You seem to know your way around regular expressions better than either of us. Could you try to make a pattern for this?

I am still trying myself, unsuccessfully.

Thanks,

-Snib <><
Try new FreshView 0.2!
Reply By: anshul Reply Date: 11/16/2004 5:59:59 AM
This may be a good URL for learning regular expressions:
http://www.webreference.com/js/column5/

