Pro PHP: Advanced PHP coding discussions. Beginning-level questions will be redirected to the Beginning PHP forum.
Welcome to the p2p.wrox.com Forums.
You are currently viewing the Pro PHP section of the Wrox Programmer to Programmer discussions. This is a community of software programmers and website developers including Wrox book authors and readers. New member registration was closed in 2019. New posts were shut off and the site was archived into this static format as of October 1, 2020. If you require technical support for a Wrox book please contact http://hub.wiley.com

April 30th, 2004, 09:13 AM
Friend of Wrox
Join Date: Jun 2003
Posts: 256
Thanks: 0
Thanked 0 Times in 0 Posts
Code critique invited.
Hi all,
I've had this chunk of code for some time, and I tend to use it quite a lot, as it is - and certainly it's proven useful as such. It's simple, reliable, pretty fast, effective... and a bit of a hack, so I'd welcome some impartial comments about how it could be improved, or whether it's any use to people, as it is.
Basically what it's for, is finding all the words in a search string, and highlighting them when you output matching results - a bit like the Google highlighter. The highlighting is case-sensitive, finding instances of the search words in three possible guises:
1. all-in-lower-case
2. lower-case-with-first-letter-capitalised
and 3. all-in-capitals
It preserves this in the highlighted results.
The real hack is in how it avoids munging its own highlighting tags during sequential searches. As a disclaimer, I will say that it actually predates preg_replace, and has thus rather evolved, over time. In the interests of a) passing on a passable and useful bit of code that I think adds a visually compelling element to search results, and b) improving it without having to do any work myself (:)), I place it before you all for your scrutiny and withering critique.
Synopsis:
You'll have done a search for something in a database along the lines of:
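$searchString = $_POST['searchString'];
$field = $_POST['field'];
// NB: both values come straight from the form, so check $field against a
// list of known column names and escape $searchString before querying.
// retrieve all contacts from DB
$sql = "SELECT $field ";
$sql .= "FROM table ";
$sql .= "WHERE $field LIKE '%$searchString%' ";
$sql .= "ORDER BY TRIM($field) ASC";
$result = $db->query($sql);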
Where we specify a search string and a field to search. That's pretty straightforward, I know. However, with our list of searched fields, in each case, we can do this:
$searchedField = $row[$field];
$searchedField = highlighter(trim($searchString), $searchedField);
...which shoves it through this thing:
<?php
/* This function runs through the text given to it as '$haystack' and
highlights all matching instances of all words in the search string
'$needle'. It probably seems Byzantine, but I'm sure it will prove
useful, with further refinement. */
function highlighter($needle, $haystack){
    //Break the search string into single words (ignoring repeated spaces)...
    $needles = preg_split('/\s+/', trim($needle), -1, PREG_SPLIT_NO_EMPTY);
    /*We create two easily matched strings. These mark the
    start and end of each highlighted section and will be replaced by
    <span class="highlight"> tags in the final run through.*/
    $regstart = "#¬#~#";
    $regend = "#~#¬#";
    /*The rationale is that these strings are very unlikely to
    actually be part of the string we're searching. If we were to
    insert <span class=> tags directly, they would be liable to
    insertion, themselves, on each subsequent search&replace (if
    we were searching for fragments of "span class=", such as "a" or
    "an" - or the word "class", itself, of course!)*/
    $patterns = array();
    $replacements = array();
    //Then we pattern-match a maximum of four times for each word...
    foreach($needles as $needleword){
        /*Start building our search & replace string arrays, starting
        with the search text as first entered. preg_quote() keeps
        characters such as "." or "/" from being read as regex syntax...*/
        $patterns[] = "/" . preg_quote($needleword, "/") . "/";
        $replacements[] = $regstart . $needleword . $regend;
        /*Then, if the word isn't in lower case already, we'll search
        for it in lowercase*/
        if($needleword != strtolower($needleword)){
            $patterns[] = "/" . preg_quote(strtolower($needleword), "/") . "/";
            $replacements[] = $regstart . strtolower($needleword) . $regend;
        }
        /*Then, if the word doesn't have a capital letter for its
        first letter already, we search with the first letter
        capitalised*/
        if($needleword != ucwords(strtolower($needleword))){
            $needleword = ucwords(strtolower($needleword));
            $patterns[] = "/" . preg_quote($needleword, "/") . "/";
            $replacements[] = $regstart . $needleword . $regend;
        }
        /*Then, finally, if the word isn't capitalised already, we
        search for it in ALL CAPITALS.*/
        if($needleword != strtoupper($needleword)){
            $needleword = strtoupper($needleword);
            $patterns[] = "/" . preg_quote($needleword, "/") . "/";
            $replacements[] = $regstart . $needleword . $regend;
        }
    }//... we do this for each word in turn
    //Now perform the replacements...
    $haystack = preg_replace($patterns, $replacements, $haystack);
    //... then replace our delimiters with the actual <span> tags...
    $haystack = str_replace($regstart, "<span class=\"highlight\">", $haystack);
    $haystack = str_replace($regend, "</span>", $haystack);
    /*(The delimiters are fixed strings, so a plain str_replace does
    the job here.)*/
    //...and then we return our modified string...
    return $haystack;
}
?>
Where "highlight" is, obviously, something pretty distinctive like yellow text on a dark maroon background (what sort of bunch of aesthetically inept losers would adopt a colour scheme like that?)
Anyway, what do you reckon?
Dan
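To make the placeholder rationale concrete, here is a small sketch comparing direct tag insertion with the sentinel trick. The function names and the control-character sentinels ("\x01" and "\x02") are assumptions made for this illustration, not part of the posted code, and it uses str_replace on exact words only to keep the demonstration short.

```php
<?php
// Sketch only: naiveHighlight() and sentinelHighlight() are illustrative names.

// Inserts the real tags straight away, one search word after another.
function naiveHighlight(array $words, string $text): string
{
    foreach ($words as $word) {
        $text = str_replace($word, '<span class="highlight">' . $word . '</span>', $text);
    }
    return $text;
}

// Marks matches with sentinels first and only converts them to tags at the end.
function sentinelHighlight(array $words, string $text): string
{
    // Assumption: these control characters never occur in the text or the search words.
    $start = "\x01";
    $end   = "\x02";
    foreach ($words as $word) {
        $text = str_replace($word, $start . $word . $end, $text);
    }
    return str_replace(
        [$start, $end],
        ['<span class="highlight">', '</span>'],
        $text
    );
}

// The second word ("class") also occurs inside the tag inserted for the first word.
echo naiveHighlight(['lamb', 'class'], 'lamb class'), "\n";
echo sentinelHighlight(['lamb', 'class'], 'lamb class'), "\n";
```

The first call ends up wrapping the word "class" inside the attribute of the span it inserted a moment earlier, which breaks the markup; the second leaves the tags alone because they do not exist until the very last step.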

May 1st, 2004, 06:33 PM
Wrox Author
Join Date: Jun 2003
Posts: 1,706
Thanks: 0
Thanked 6 Times in 6 Posts
I dunno Dan, this looks pretty good to me. I use little hacks like that from time to time, that is, inventing placeholders for data. One of my favorites is to use characters that bear resemblance to HTML entities, such as: &id; where a unique id will be replaced later on. I've done a similar thing in my search program, but I don't think mine was case sensitive. I just did a straight-up replacement of all search words using str_replace, so my approach wasn't quite as advanced. Obviously using preg_replace would be marginally faster, but who's counting the milliseconds?
My $0.02, anyway. Maybe Nik's listening in and has something to say.
Regards,
Rich
::::::::::::::::::::::::::::::::::::::::::
The Spicy Peanut Project
http://www.spicypeanut.net
::::::::::::::::::::::::::::::::::::::::::

May 2nd, 2004, 05:32 AM
Friend of Wrox
Join Date: Jun 2003
Posts: 256
Thanks: 0
Thanked 0 Times in 0 Posts
Aye? Well, you're welcome to use it :). I'll "BSD licence" it.
It's quite cute, when you show your script to the client and perform a search for "ea", say, and the results instantly start filling the screen with every "ea" in "year", "Early" and "BEA Weblogic" highlighted - with the letters still in their correct case.
What I'd like is a suggestion of some means of highlighting the _actual_ search string. At the moment, if you do a search for "mary had a little lamb", the _database_ is searched for all instances of field LIKE '%mary had a little lamb%' (and the search is case-insensitive, by default, of course). However, the results are displayed with all instances of "mary", "a", "had", etc. highlighted (including those in "mary's lamb was eaten by a wolf hiding in the shadows") - which rather detracts from the effect :P.
It probably just requires a bit of thought, but, at the moment, I notice that the sun is shining, outside, so I think I'll just go and do something very unprogrammer-like that involves getting dirty and tired, instead...

May 2nd, 2004, 06:36 AM
Wrox Author
Join Date: Jun 2003
Posts: 1,706
Thanks: 0
Thanked 6 Times in 6 Posts
Quote:
Originally posted by Daniel Walker
What I'd like is a suggestion of some means of highlighting the _actual_ search string. At the moment, if you do a search for "mary had a little lamb", the _database_ is searched for all instances of field LIKE '%mary had a little lamb%' (and the search is case-insensitive, by default, of course). However, the results are displayed with all instances of "mary", "a", "had", etc. highlighted (including those in "mary's lamb was eaten by a wolf hiding in the shadows") - which rather detracts from the effect :P.
|
Ah, I see what you're saying now (sorry I'm a bit dense for the details now and then). Well you probably need some more Google-ish syntax. Do you already have a mechanism in place to specify the search string literally and not as exploded terms?
"Mary had a little lamb" with exploded terms:
It goes in as
SELECT * FROM table
WHERE field LIKE '%word1%'
AND field LIKE '%word2%' ...etc.
First it's exploded on the space, then highlighted using that array.
Whereas, if the search string is delimited by quotation marks, "\"Mary had a little lamb\"", the parts enclosed in quotation marks are supposed to be taken literally.
It goes
SELECT * FROM table
WHERE field LIKE '%search_string%'
Then the term is exploded into bits based on where the quotation marks start and stop.
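As a rough sketch of that splitting step, the function below pulls double-quoted phrases out whole and treats everything else as single words. splitSearchTerms() is a hypothetical name for this illustration, and the handling of stray or empty quotes (they are simply dropped) is an assumption rather than anything Google documents.

```php
<?php
// Sketch only: splitSearchTerms() is a hypothetical helper, not part of any library.
function splitSearchTerms(string $search): array
{
    $terms = [];
    // Match either a double-quoted phrase (captured without its quotes)
    // or a run of non-whitespace characters.
    preg_match_all('/"([^"]+)"|(\S+)/', $search, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        // Group 1 is filled for quoted phrases, group 2 for bare words.
        $term = $match[1] !== '' ? $match[1] : $match[2];
        // Drop stray quotes left over from unbalanced input such as: "mary had
        $term = trim($term, '"');
        if ($term !== '') {
            $terms[] = $term;
        }
    }
    return $terms;
}

print_r(splitSearchTerms('"mary had" a little lamb'));
```

For the example call, the phrase "mary had" stays in one piece while "a", "little" and "lamb" come back as separate terms, so the same array can feed both the WHERE clause and the highlighter.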
I haven't gotten around to implementing "Google" syntax like this myself; the following was my approach to it.
Code:
/*
 * mixed explode_search(void) takes a search term and breaks it down into individual words
 * via the explode() function, this is then passed to an array and a where clause
 * is built from the search term array and a field array.
 *
 * (Both functions are methods of a search class that holds $this->search_fields.)
 * NB: the search words go into the SQL as-is; escape them for your database first.
 */
function explode_search()
{
    $where = "";
    if (isset($_GET["search"]))
    {
        // $_GET values are already URL-decoded by PHP, so no urldecode() here
        $search = trim($_GET["search"]);
        if (strpos($search, " ") !== false)
        {
            $search = explode(" ", $search);
            for ($n = 0; $n < count($this->search_fields); $n++)
            {
                $where .= $this->loop_search($search, $n, $this->search_fields[$n]);
            }
        }
        else
        {
            for ($n = 0; $n < count($this->search_fields); $n++)
            {
                if ($n == 0) $where = $this->search_fields[$n]." LIKE '%".$search."%'";
                else $where .= " OR ".$this->search_fields[$n]." LIKE '%".$search."%'";
            }
        }
    }
    return $where;
}
/*
 * loop_search() is a function called upon by explode_search() to build a
 * where clause.
 */
function loop_search($search, $n, $search_field)
{
    $where = "";
    for ($i = 0; $i < count($search); $i++)
    {
        if ($n == 0 && $i == 0)
        {
            $where = $search_field." LIKE '%".$search[$i]."%'";
        }
        else
        {
            $where .= ($i == 0)? " OR ".$search_field." LIKE '%".$search[$i]."%'" : " AND ".$search_field." LIKE '%".$search[$i]."%'";
        }
    }
    return $where;
}
$where = $this->explode_search();
It takes a pre-defined list of fields and builds the whole WHERE query. It could be easily modified to do Google syntax, but I haven't yet gotten around to it.
With this approach you can save the search term array when it's built here, then pass it along to your highlighter function; then there's no need to explode it there, and the regular expressions in that function will just deal with the words or phrases you pass to it, rather than only exploded search terms.
That'd be the way I'd go about it anyway :).
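For anyone adapting this today, a minimal sketch of the same WHERE-building idea with placeholders instead of values pasted into the SQL might look like the following. buildWhere() is a hypothetical helper, the field names are assumed to come from a fixed list in your own code (never from the form), and the backslash escaping of % and _ assumes MySQL's default LIKE escape character.

```php
<?php
// Sketch only: buildWhere() is a hypothetical helper, not part of the code above.
// It returns the WHERE text plus the values to bind to its ? placeholders.
function buildWhere(array $fields, array $terms): array
{
    $groups = [];
    $params = [];
    foreach ($fields as $field) {
        $likes = [];
        foreach ($terms as $term) {
            $likes[] = "$field LIKE ?";
            // Escape LIKE wildcards so a typed % or _ is matched literally
            // (assumes backslash is the LIKE escape character).
            $escaped  = strtr($term, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']);
            $params[] = '%' . $escaped . '%';
        }
        if ($likes) {
            // Every term must match within one field...
            $groups[] = '(' . implode(' AND ', $likes) . ')';
        }
    }
    // ...but any one of the fields may be the one that matches.
    return [implode(' OR ', $groups), $params];
}

list($where, $params) = buildWhere(['name', 'notes'], ['mary', 'lamb']);
echo $where, "\n";
```

The returned pair is meant to be handed to a prepared statement, for example PDO's prepare() with the WHERE text and execute() with the parameter array, so the search words never become part of the SQL string itself.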
Regards,
Rich
::::::::::::::::::::::::::::::::::::::::::
The Spicy Peanut Project
http://www.spicypeanut.net
::::::::::::::::::::::::::::::::::::::::::

May 3rd, 2004, 07:20 AM
Friend of Wrox
Join Date: Jun 2003
Posts: 256
Thanks: 0
Thanked 0 Times in 0 Posts
Right. I prefer the "exact match" approach, since it is usually what the user intends, in my
experience. However, having just read your code, it suddenly hit me like a brick on the forehead
why the Google search engine asks for double quotes around exact match searches: it
isn't just some quaint little shorthand that it happens to be using, but a request, on the part
of Google, for the user to insert a delimiter (the " sign) that it can
then shove through its sausage maker. My mistake was doing the explode within the highlighter,
by default. I should make the explode an option for all portions of the text passed to it without
double quotes around it and push all double-quoted text through the mill, unexploded. Database
searches for text that wasn't double quoted could be handled by the code you have given, above
(explode it and then search for each LIKE '%word%'), while (as far as the database search was
concerned) text that was quoted could be handled by code like that in the original post.
I suppose the reason I want to find out how this is done is partly because it's useful, in itself,
but mostly because I understand that they (Google) achieve the same effect with software running
on standard Linux boxes... so it must be doable :).

May 10th, 2004, 08:21 AM
Friend of Wrox
Join Date: Jun 2003
Posts: 256
Thanks: 0
Thanked 0 Times in 0 Posts
For what it's worth, I completely replaced my somewhat cumbersome process of
three/four-way search by using stristr to parse out matching blocks of text
in a case-insensitive manner - preserving the exact matches and adding them
to an array of replacements on an if(!in_array(... basis.
This has fixed one of the longstanding problems with the code, since it finds
and highlights suitable matches for sources of bicapitalisation, such as
surnames like McDonald, MacDonald, O'Neill, D'Acre, etc., as well as camel-casing
in quoted code, shift-key-obsessive language names like JavaScript, etc. All of
these shared a sequence of capitalisation that did not come close to matching the
somewhat simplistic rules I had originally been using.
This, coupled with an improved method of searching that I'm building, using wildcards
that the user can input, and double quotes to indicate exact matches for word sequences,
should make for a much more useful piece of code. I'll probably write this into a web
article and post it up, when I'm done, but what I've described here is a brief overview.
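The rewritten code itself isn't posted, so the following is only a guess at the general shape of that idea. It uses stripos and substr rather than stristr (an assumption made to keep the loop short), and collectMatches() is a made-up name.

```php
<?php
// Sketch only: collectMatches() is an illustrative name, not the author's code.
// It returns each distinct spelling of $needle as it actually appears in $haystack.
function collectMatches(string $needle, string $haystack): array
{
    $found  = [];
    $length = strlen($needle);
    if ($length === 0) {
        return $found;
    }
    $offset = 0;
    // stripos() finds the next occurrence regardless of (ASCII) letter case.
    while (($pos = stripos($haystack, $needle, $offset)) !== false) {
        // Keep the text exactly as it is written in the haystack.
        $actual = substr($haystack, $pos, $length);
        if (!in_array($actual, $found, true)) {
            $found[] = $actual;
        }
        $offset = $pos + $length;
    }
    return $found;
}

print_r(collectMatches('mcdonald', 'McDonald met MCDONALD and mcdonald; McDonald left.'));
```

Each spelling in the returned array can then be wrapped in turn, so a bicapitalised name such as "McDonald" is highlighted without any rule about where the capitals fall.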

May 11th, 2004, 04:25 PM
Friend of Wrox
Join Date: Jun 2003
Posts: 256
Thanks: 0
Thanked 0 Times in 0 Posts
Maintaining this dialogue with myself even further, I see how stupid I've been to
attempt this mechanistic approach to isolating matching strings in a procedural
manner, when Perl-compatible regular expressions (the preg_ functions) could have done it for me.
By saying:
$haystack = preg_replace('/('.preg_quote($needle, '/').')/i','<span class="highlight">$1</span>',$haystack);
I'd have been able to wrap <span>s around all matching instances
of needle without needing to do all that elaborate search&replace stuff.
Oh well, time to do some proper reading up on RegExs, I suppose.
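Extending that one-liner to several words or phrases at once could look something like this sketch. highlightTerms() is a hypothetical helper; it assumes the terms arrive as an array (phrases included) and joins them into a single case-insensitive alternation, so the inserted tags are never scanned again and no placeholder strings are needed.

```php
<?php
// Sketch only: highlightTerms() is a hypothetical helper built on preg_replace().
function highlightTerms(array $terms, string $haystack): string
{
    // Longest first, so "mary had" wins over "mary" at the same position.
    usort($terms, function ($a, $b) {
        return strlen($b) <=> strlen($a);
    });

    $alternatives = [];
    foreach ($terms as $term) {
        if ($term !== '') {
            // preg_quote() stops characters such as "." or "/" acting as regex syntax.
            $alternatives[] = preg_quote($term, '/');
        }
    }
    if (!$alternatives) {
        return $haystack;
    }

    // One pattern, one pass: the /i flag matches any capitalisation and
    // $1 puts the text back exactly as it appeared.
    $pattern = '/(' . implode('|', $alternatives) . ')/i';
    return preg_replace($pattern, '<span class="highlight">$1</span>', $haystack);
}

echo highlightTerms(['a', 'mary had'], 'Mary had a lamb'), "\n";
```

Because the text is scanned once from left to right, the "a" inside "Mary had" is consumed by the longer phrase rather than being highlighted separately.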