Wrox Programmer Forums
Pro PHP Advanced PHP coding discussions. Beginning-level questions will be redirected to the Beginning PHP forum.
You are currently viewing the Pro PHP section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com
 
Old November 4th, 2003, 09:27 PM
richard.york's Avatar
Wrox Author
 
Join Date: Jun 2003
Posts: 1,706
Thanks: 0
Thanked 6 Times in 6 Posts
perl compatible regular expressions

I was wondering if anyone knew of a good tutorial on perl compatible regular expressions. I am trying to write a regular expression that would replace links in an email program with HTML formatted links.

I tried Nik's example in the following thread:
http://p2p.wrox.com/topic.asp?TOPIC_ID=5482

But I seem to have run into a snag: I designed my program as a class and cannot find a way to call the callback function from within the class. I also tried defining the callback function in global scope, but the regular expression function didn't return the mail body. I've attempted several examples that I found on the web, and none of them bring back the message body.

Here is one example that I tried:
$msg_body = imap_fetchbody($this->mailbox, $mid, $pid);

$msg_body = preg_replace("/([\w\.]+)(@)([\S\.]+)\b/i","<a href=\"mailto:$0\">$0</a>", $msg_body);
$msg_body = preg_replace("(^)"<a href=\"http$3://$4$5\"target=\"_blank\">$2$4$5</a>", $msg_body);

Neither of these looks like a very good solution.

If I comment out the preg_replace calls, the message body shows up; when I use them, I get a blank message body.

I don't know much about regular expressions anyway, so I am at a loss to see where it might be going wrong. None of my PHP books seem to discuss perl compatible regular expressions in any detail, though they do talk quite a bit about POSIX-style regular expressions.

Thanks in advance!
: )
Rich


:::::::::::::::::::::::::::::::::
Smiling Souls
http://www.smilingsouls.net
:::::::::::::::::::::::::::::::::
 
Old November 4th, 2003, 11:37 PM
richard.york's Avatar
Wrox Author
 
Join Date: Jun 2003
Posts: 1,706
Thanks: 0
Thanked 6 Times in 6 Posts

I was able to figure out a way to get Nik's example working.

Apparently my decode function, which decodes the message body from quoted-printable, was creating a conflict, so I moved it to run before the regular expression replacement.

I used create_function() to use preg_replace_callback from within my class.

$msg_body = imap_fetchbody($this->mailbox, $mid, $pid);
$msg_body = $this->decode_message($msg_body, $this->encoding[$mid][$i]);

$pattern = '!\bhttps?://([\w\-]+\.)+[a-zA-Z]{2,3}(/(\S+)?)?\b!';

$msg_body = htmlspecialchars($msg_body);
$msg_body = preg_replace_callback($pattern, create_function('$matches', 'return "<a href=\'".$matches[0]."\' target=\'_new\'>".$matches[0]."</a>";'), $msg_body);
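For anyone reading this on a current PHP version: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so the same idea can be sketched with an anonymous function instead. This is only a minimal sketch; the function name linkify_urls() and the sample text are my own assumptions and not part of the original class.

```php
<?php
// Hypothetical helper (name is an assumption): wraps http/https URLs in anchor tags,
// using the same pattern as the post above.
function linkify_urls(string $text): string
{
    $pattern = '!\bhttps?://([\w\-]+\.)+[a-zA-Z]{2,3}(/(\S+)?)?\b!';

    // Escape first, so the <a> tags generated below are not escaped afterwards.
    $text = htmlspecialchars($text);

    return preg_replace_callback(
        $pattern,
        function (array $matches): string {
            // $matches[0] holds the whole matched URL.
            return "<a href='" . $matches[0] . "' target='_new'>" . $matches[0] . "</a>";
        },
        $text
    );
}

echo linkify_urls('See http://www.example.com/page for details.');
```

The anonymous function is created once per call to linkify_urls() and needs no global name, which is the part create_function() was being used for.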

: )
Rich

 
Old November 5th, 2003, 03:55 PM
Friend of Wrox
 
Join Date: Jun 2003
Posts: 836
Thanks: 0
Thanked 0 Times in 0 Posts

Hey Rich,

I'd recommend reading through PHP's manual pages:
  http://www.php.net/pcre

Check out their "pattern syntax" and "pattern modifiers" pages. Also, search for 'perl regular expression tutorial' on Google; there are lots of hits.


I don't think you need to use create_function() in your case; the problem with that approach is that you create a new unnamed function EVERY time execution reaches that point. I don't think it causes a huge amount of excess overhead, but it's there nonetheless.


I don't have the time to play with your original patterns, but I suspect a couple reasons your patterns are failing:

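1) Check how you access your back references. In the replacement string, preg_replace() accepts either a backslash and a number between 0 and 99 (written \\1 inside a double-quoted PHP string) or the dollar form ($1), so the dollar signs alone shouldn't blank the output; just make sure PHP isn't interpolating them as variables.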

2) Your 2nd pattern isn't a valid string:
  "(^)"<a href=\"http$3://$4$5\"target=\"_blank\">$2$4$5</a>"

The 4th character of your pattern string is a double-quote character, which ends the string and should cause a parse error.
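Here is a rough sketch of how the two replacements might look once the quoting is fixed. The URL pattern below is my own assumption (the original second pattern can't be recovered from the post), and the sample body is made up. Single-quoted strings are used so the HTML double quotes need no escaping.

```php
<?php
// Made-up sample body; in the thread this came from imap_fetchbody().
$msg_body = 'Mail rich@example.com or visit http://www.example.com today.';

// Email addresses: \0 in the replacement stands for the whole match.
$msg_body = preg_replace(
    '/[\w.]+@[\w\-]+(\.[\w\-]+)+/',
    '<a href="mailto:\0">\0</a>',
    $msg_body
);

// URLs: an assumed pattern, not the one from the original post.
$msg_body = preg_replace(
    '!\bhttps?://([\w\-]+\.)+[a-zA-Z]{2,3}(/\S*)?!',
    '<a href="\0" target="_blank">\0</a>',
    $msg_body
);

echo $msg_body;
```

Because the whole replacement is a single-quoted string, there is no stray double quote to end it early, which is what caused the parse error.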

Good luck, and let me know if any more problems come up.





Take care,

Nik
http://www.bigaction.org/
 
Old November 6th, 2003, 12:33 AM
richard.york's Avatar
Wrox Author
 
Join Date: Jun 2003
Posts: 1,706
Thanks: 0
Thanked 6 Times in 6 Posts

Thanks Nik,

I must have overlooked the pattern syntax links when I was looking through the manual. I have been trying out some patterns.

I saw in the user notes at http://www.php.net/preg_replace_callback that someone suggested passing an array with two elements, the first being the class name and the second the method name. Here is the quote:

Quote:
Also, if you want to use a *static* class method for the callback function, you can refer to it like this:
   preg_replace_callback(pattern, array('ClassName', 'methodName'), subject)

In PHP5, from within the class:
   preg_replace_callback(pattern, array('self', 'methodName'), subject)
I tried this and it works (the first method, that is). I'm waiting for PHP 5 to come out of beta before trying the second.
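A minimal sketch of the array-style callback from the quote, in case it helps someone else. The class name Library, the method names, and the email pattern are illustrative assumptions only.

```php
<?php
// Hypothetical class; 'Library', 'mailify' and 'linkEmails' are illustrative names.
class Library
{
    // Static callback: receives the match array from preg_replace_callback().
    public static function mailify(array $matches): string
    {
        return '<a href="mailto:' . $matches[0] . '">' . $matches[0] . '</a>';
    }

    // Calls the static method through the array('ClassName', 'methodName') form.
    public function linkEmails(string $body): string
    {
        return preg_replace_callback(
            '/[\w.\-]+@[\w\-]+(\.[\w\-]+)+/',
            array('Library', 'mailify'),
            $body
        );
    }
}

$lib = new Library();
echo $lib->linkEmails('Write to rich@example.com.');
```

For a non-static method, passing array($this, 'methodName') as the callback should work the same way from inside the class.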

I have been poring over your syntax for a while and cannot seem to get it modified to accept any protocol.

The original I think was this:
$pattern = '!\bhttps?://([\w\-]+\.)+[a-zA-Z]{2,3}(/(\S+)?)?\b!';

I tried changing it to this:
$pattern = '!\b(https?|telnet|ftp)(:\/\/)([\w\-]+\.)+[a-zA-Z]{2,3}(/(\S+)?)?\b!';

And I was also trying to include an optional '/' at the end of the URL... for cases where the url contains only http://www.somesite.com/

I wrote this one for emails which seems to work well... actually I took the example on the zend website and modified it to include more addresses.

$body = preg_replace_callback('/[A-z0-9_\-\.]+[@][A-z0-9_\-]+([.][A-z0-9_\-]+)+[A-z0-9\-]+([.][A-z0-9_\-]+)?+[A-z]?/', array('library', 'mailify'), $body);
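One thing worth checking in a pattern like this: the range [A-z] covers ASCII 65 to 122, which also includes the characters [ \ ] ^ _ and the backtick, so it accepts more than letters. The sketch below shows the difference. The tightened pattern is my own assumption and is not meant as a complete address validator.

```php
<?php
// [A-z] also matches '^' (ASCII 94), which sits between 'Z' and 'a'.
$loose  = preg_match('/^[A-z]+$/', 'abc^def');
$strict = preg_match('/^[A-Za-z]+$/', 'abc^def');
var_dump($loose, $strict);

// Hypothetical tightened email pattern (an assumption, not a full validator).
$email = '/[A-Za-z0-9_.\-]+@[A-Za-z0-9\-]+(\.[A-Za-z0-9\-]+)+/';

// Finds the address even when it follows a mailto: prefix.
preg_match($email, 'mailto:jane.doe@mail.example.co.uk', $m);
echo $m[0], "\n";
```

Spelling the range out as A-Za-z keeps the same intent while excluding the punctuation characters.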

It matches dots in the address and optionally matches sub-domain addresses or double suffix domains, like .co.uk and it matches addresses attached to a mailto: statement.

I would appreciate any comments you might be able to throw my way!

Thanks!
: )
Rich

 
Old November 6th, 2003, 03:07 PM
Friend of Wrox
 
Join Date: Jun 2003
Posts: 836
Thanks: 0
Thanked 0 Times in 0 Posts

Your modified version of the pattern works for recognizing telnet and ftp protocol declarations. The trailing slash doesn't get recognized because the transition from a slash to whitespace (or the end of the line) does NOT constitute a word boundary. I thought that it would...

Remove the last \b in the pattern and the slashes should be recognized.
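A quick sketch to illustrate the word-boundary behaviour, using the original http-only pattern and a made-up sample sentence:

```php
<?php
// Made-up sample text with a URL that ends in a slash.
$text = 'Visit http://www.somesite.com/ now';

$with_boundary    = '!\bhttps?://([\w\-]+\.)+[a-zA-Z]{2,3}(/(\S+)?)?\b!';
$without_boundary = '!\bhttps?://([\w\-]+\.)+[a-zA-Z]{2,3}(/(\S+)?)?!';

preg_match($with_boundary, $text, $a);
preg_match($without_boundary, $text, $b);

// With the trailing \b the engine backs off to the end of "com",
// because "/" followed by a space is not a word boundary.
echo $a[0], "\n"; // http://www.somesite.com
echo $b[0], "\n"; // http://www.somesite.com/
```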

When matching hostnames, most people find it sufficient to just require the top-level domain to be either 2 or 3 characters. All country domains (ws, tv, uk, jp, etc...) and US domain types (net, com, org, edu, gov, mil) will be matched.


Take care,

Nik
http://www.bigaction.org/
 
Old November 6th, 2003, 04:31 PM
richard.york's Avatar
Wrox Author
 
Join Date: Jun 2003
Posts: 1,706
Thanks: 0
Thanked 6 Times in 6 Posts

Thanks Nik, that did the trick.
