Subject: Chapter 15 Search Problem
Posted By: PhilM
Post Date: 10/28/2004 2:42:29 PM
I've only just noticed that I have this problem.
The search function in the forum runs without any errors, but it returns "No articles found that match the search term(s) test". Obviously, I have many posts containing the word "test". I also have the FULLTEXT KEY index on (subject,body) on forum_posts in the database.
Since it isn't reporting an error, I don't know how to fix it.
Anyone have any pointers as to what's wrong?

Reply By: PhilM
Reply Date: 10/28/2004 4:59:06 PM
Again, I've managed to find the answer myself!
The search function is working fine. The reason it won't display my results is explained in the MySQL Manual. For anyone who is interested, the link is: http://dev.mysql.com/doc/mysql/en/Fulltext_Search.html
And I quote what it says:
quote: MySQL uses a very simple parser to split text into words. A ``word'' is any sequence of characters consisting of letters, digits, `'', or `_'. Some words are ignored in full-text searches:
* Any word that is too short is ignored. The default minimum length of words that will be found by full-text searches is four characters.
* Words in the stopword list are ignored. A stopword is a word such as ``the'' or ``some'' that is so common that it is considered to have zero semantic value. There is a built-in stopword list.
The default minimum word length and stopword list can be changed as described in section 13.6.4 Fine-Tuning MySQL Full-Text Search.
Every correct word in the collection and in the query is weighted according to its significance in the collection or query. This way, a word that is present in many documents has a lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Conversely, if the word is rare, it receives a higher weight. The weights of the words are then combined to compute the relevance of the row.
Such a technique works best with large collections (in fact, it was carefully tuned this way). For very small tables, word distribution does not adequately reflect their semantic value, and this model may sometimes produce bizarre results. For example, although the word ``MySQL'' is present in every row of the articles table, a search for the word produces no results:
mysql> SELECT * FROM articles
    -> WHERE MATCH (title,body) AGAINST ('MySQL');
Empty set (0.00 sec)
The search result is empty because the word ``MySQL'' is present in at least 50% of the rows. As such, it is effectively treated as a stopword. For large datasets, this is the most desirable behavior--a natural language query should not return every second row from a 1GB table. For small datasets, it may be less desirable.
A word that matches half of rows in a table is less likely to locate relevant documents. In fact, it will most likely find plenty of irrelevant documents. We all know this happens far too often when we are trying to find something on the Internet with a search engine. It is with this reasoning that rows containing the word are assigned a low semantic value for the particular dataset in which they occur. A given word may exceed the 50% threshold in one dataset but not another.
The 50% threshold has a significant implication when you first try full-text searching to see how it works: If you create a table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results. Be sure to insert at least three rows, and preferably many more.
Explaining that in the book would have saved me a lot of time.
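For anyone hitting the same thing on a small test forum, here is a rough sketch of how to check for it and one possible workaround. It is not from the book. It assumes the forum_posts table and the FULLTEXT index on (subject,body) mentioned in my first post, and MySQL 4.0.1 or later, since that is where boolean full-text searches became available. As far as I can tell from the manual, boolean mode does not apply the 50% threshold, though the minimum word length and the stopword list still apply.

```sql
-- Check the minimum indexed word length
-- (the manual text quoted above gives four characters as the default).
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Natural-language search: subject to the 50% threshold, so on a small
-- table where most posts contain "test" it can return an empty set.
SELECT * FROM forum_posts
WHERE MATCH (subject, body) AGAINST ('test');

-- Boolean-mode search: the 50% threshold is not applied, so rows
-- containing "test" are returned even if most of the table matches.
SELECT * FROM forum_posts
WHERE MATCH (subject, body) AGAINST ('test' IN BOOLEAN MODE);
```

Note that boolean-mode results are not automatically sorted by relevance, so this is probably better suited to testing on a small table than as a drop-in replacement for the search query.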

Reply By: czambran
Reply Date: 11/2/2004 4:08:18 PM
Actually, one of the authors of the book went through this before. You could have searched the forum for the answer.
Christian