doFuzzy
Now it is time to develop a way to do fuzzy searching. What we want to do is find keywords that are similar to the terms the user searched for when those terms do not exist in our keyword list. To make this happen, we will use PHP's similar_text() function.
The similar_text() function provides very good results, but at the price of performance: according to the PHP manual its complexity is O(N**3), where N is the length of the longest string. Running a very large number of keywords through similar_text() could make the application sluggish. One alternative is the levenshtein() function, whose complexity is O(m*n) for strings of length m and n, so it performs much better. The drawback is that levenshtein() produces results that are not quite as accurate as those of similar_text().
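The similar_text() function takes three parameters. The first and second are the words to compare, and the third is an optional variable, passed by reference, that receives the similarity as a percentage. The function itself returns the number of matching characters. A similarity of 100% means the two strings are identical, and it goes down from there. The levenshtein() function is called in a similar way, but it hands back its result differently: it returns the edit distance between the two strings, that is, the minimum number of character insertions, replacements, and deletions needed to turn one into the other. Its optional third, fourth, and fifth parameters set the cost of each of those operations; they do not receive a result. In this case, a lower number means a closer match, and 0 means the strings are identical. If you choose to use the levenshtein() function, keep this in mind when sorting the array later in the script. You will need to sort it in normal (ascending) order rather than in reverse.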
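To make the difference concrete, here is a small sketch that runs both functions on the same pair of words. The words "World" and "Word" are sample values chosen for illustration, and compareWords() is a hypothetical helper, not part of the search class.

```php
<?php
// Hypothetical helper: compares two words with both functions.
function compareWords($a, $b)
{
    // similar_text() returns the number of matching characters and
    // writes the similarity percentage into its third argument.
    $common = similar_text($a, $b, $percent);

    // levenshtein() returns the edit distance directly.
    $distance = levenshtein($a, $b);

    return array(
        'common'   => $common,
        'percent'  => $percent,
        'distance' => $distance,
    );
}

$result = compareWords('World', 'Word');
printf(
    "common=%d percent=%.2f distance=%d\n",
    $result['common'],
    $result['percent'],
    $result['distance']
);
// prints "common=4 percent=88.89 distance=1"
```

Note that the two scores point in opposite directions: a better match raises the percentage but lowers the distance.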
Now that we have covered a little bit of theory behind this function, let's take a look at how it will actually work.
Select all keywords from the keywords table that start with the same character as one of our search terms
Build a list of matches for each search term
Merge the search term matches
Build query
Run query and display results
Select all keywords from table
The first thing we need to do is query the database and select all the keywords from the keywords table that start with the same character as one of our search terms. To match against only the first letter of the keyword, we will use MySQL's LEFT() function in our query. We will also use the DISTINCT keyword so that each keyword is retrieved only once. The query will be built dynamically, as we have done several other times. We will take the results from this query and store them in an array.
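<?php
// Build "LEFT(keyword,1) IN ('a', 'b', ...)" from the first character
// of each search term, escaping it so a quote cannot break the query.
$match = "LEFT(keyword,1) IN ('"
       . addslashes(substr($this->_searchterms[0], 0, 1)) . "'";
for ($i = 1; $i < $this->_numterms; $i++) {
    $match .= ", '" . addslashes(substr($this->_searchterms[$i], 0, 1)) . "'";
}
$match .= ")";

// DISTINCT returns each keyword only once.
$query = "SELECT DISTINCT keyword FROM keywords WHERE $match";
$keywords = $this->_db->fetch($query);
?>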
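As an illustration of what this step produces, the sketch below rebuilds the WHERE clause outside the class. firstLetterClause() is a hypothetical stand-alone function and the search terms are sample data; addslashes() stands in for whatever escaping your database layer provides.

```php
<?php
// Hypothetical stand-alone version of the clause-building step.
function firstLetterClause(array $terms)
{
    $letters = array();
    foreach ($terms as $term) {
        // Take the first character and escape it for use in a quoted string.
        $letters[] = "'" . addslashes(substr($term, 0, 1)) . "'";
    }
    // Drop repeated letters so the IN list stays short.
    $letters = array_unique($letters);

    return "LEFT(keyword,1) IN (" . implode(", ", $letters) . ")";
}

echo firstLetterClause(array('serch', 'engin', 'speed')), "\n";
// prints "LEFT(keyword,1) IN ('s', 'e')"
```

Two of the three sample terms start with "s", so the clause ends up with only two letters.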
Build a list of matches for each search term
Now, we need to take each search term and each keyword and let the similar_text() function determine how alike they are. We will use foreach loops to iterate through each search term and keyword. Before we begin, we will use the reset() function to return the array pointer back to the first element of the $_searchterms class variable.
Once we have determined how similar each keyword is to a search term, we remove any keyword that is less than 60% similar. The array_filter() function does this for us: it calls a callback for each element and keeps only the elements for which the callback returns TRUE.
<?php
$matches = array();
reset($this->_searchterms);
foreach ($this->_searchterms as $term) {
    $matches[$term] = array();
    foreach ($keywords as $keyword) {
        $word = $keyword['keyword'];
        // The third argument receives the similarity as a percentage.
        similar_text($term, $word, $matches[$term][$word]);
    }
    // Keep only the keywords that are at least 60% similar.
    $matches[$term] = array_filter($matches[$term], function ($percent) {
        return $percent >= 60;
    });
}
?>
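If you opt for levenshtein() instead, the same step might look like the sketch below. closeMatches() is a hypothetical stand-alone function, the keyword list is sample data, and the maximum distance of 2 is an assumed cutoff playing the role of the 60% threshold.

```php
<?php
// Hypothetical levenshtein() variant of the matching step.
function closeMatches($term, array $words, $maxDistance = 2)
{
    $matches = array();
    foreach ($words as $word) {
        $distance = levenshtein($term, $word);
        // Lower is better here, so keep words at or below the cutoff.
        if ($distance <= $maxDistance) {
            $matches[$word] = $distance;
        }
    }
    // Ascending sort puts the closest keyword first and keeps the keys.
    asort($matches);
    return $matches;
}

// "search" (distance 1) is listed before "speech" (distance 2);
// "server" and "serial" are dropped because their distance is 3.
print_r(closeMatches('serch', array('server', 'search', 'serial', 'speech')));
```

A fixed distance is harsher on short words than on long ones, so the cutoff is worth tuning against your own keyword list.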
Merge the search term matches
At this point, we will have an associative array named $matches that has an element for each search term, with that search term as the key. Each of these elements is also an associative array, keyed by keyword with the similarity percentage as the value. What we now need to do is combine these inner arrays into one. To do this we will use the array_merge() function. Because we don't know how many search terms we are dealing with, we cannot write the call out by hand, so we will use call_user_func_array() to pass the inner arrays as arguments.
The call_user_func_array() function calls a function and takes its arguments from an array. We pass it array_values($matches) so that the inner arrays are handed over as plain positional arguments. It would also be possible to build the array_merge() call as a string and run it through eval(), but eval() executes whatever the string contains, and the search terms come from the user, so it is safer to avoid it here. After the call, the merged result is stored in the $merged variable. When the same keyword matched more than one search term, array_merge() keeps the value from the later term.
If we only have one search term, we don't need to go through all this, so we will just assign the contents of the only inner array to the variable $merged. Once we have the data in the $merged array, we will sort it in reverse order while maintaining key relationships, using the arsort() function. Then we will extract the names of the keys, as they are the similar keywords, and store them in an array called $search.
<?php
if ($this->_numterms > 1) {
    // Merge the per-term match lists into a single keyword => percent array.
    $merged = call_user_func_array('array_merge', array_values($matches));
} else {
    $merged = $matches[$this->_searchterms[0]];
}
// Highest similarity first, keeping the keyword keys intact.
arsort($merged);
$search = array_keys($merged);
?>
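To see what the merge and sort do to the data, here is a small stand-alone sketch. The $matches array is made-up sample data in the shape described above, and mergeMatches() is a hypothetical helper. It uses call_user_func_array(), which is one way to call array_merge() with a variable number of arrays.

```php
<?php
// Hypothetical helper: merges per-term matches and ranks the keywords.
function mergeMatches(array $matches)
{
    // array_values() discards the search-term keys so the inner arrays
    // are passed to array_merge() as plain positional arguments.
    $merged = call_user_func_array('array_merge', array_values($matches));

    // Sort by similarity, highest first, keeping keyword => percent pairs.
    arsort($merged);
    return array_keys($merged);
}

// Made-up sample data: search term => (keyword => similarity percentage).
$matches = array(
    'serch' => array('search' => 91, 'searching' => 71),
    'engin' => array('engine' => 88, 'engines' => 80),
);

print_r(mergeMatches($matches));
// The keywords come back as: search, engine, engines, searching
```

The result mixes keywords from both search terms, ordered only by how similar each one was to the term it matched.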
Build query
Now we have a list of similar keywords that exist in our keywords table. We must now run a query to determine which URLs they are associated with and how frequently they occur. Before we run the query, however, we will escape any special characters in the $search array by passing it through the array_walk() function and specifying the function _slashit(), which calls addslashes() on each array element.
Then we will build the query as we have in the past. If we have rows in the result set, we will return them. Otherwise we will return FALSE.
<?php
// Nothing was similar enough, so there is nothing to look up.
if (count($search) == 0) {
    return FALSE;
}
// Escape each keyword before it goes into the query.
array_walk($search, array($this, '_slashit'));
$match = "keywords.keyword IN ('" . $search[0] . "'";
$numwords = count($search);
for ($i = 1; $i < $numwords; $i++) {
    $match .= ", '" . $search[$i] . "'";
}
$match .= ")";
$query = "SELECT urls.url, keywords.keyword, COUNT(*) AS counter "
       . "FROM urls, keywords "
       . "WHERE $match "
       . "AND keywords.url_id = urls.id "
       . "GROUP BY keywords.url_id, keywords.keyword "
       . "ORDER BY counter DESC";
$result = $this->_db->fetch($query);
// Return the rows if there are any, otherwise FALSE.
if (count($result) > 0) {
    $return = $result;
} else {
    $return = FALSE;
}
return $return;
?>
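Where the database layer supports prepared statements, the keyword list can be bound as parameters instead of being escaped by hand. The sketch below only builds the SQL text and the parameter list; buildUrlQuery() is a hypothetical function, and how the pair is executed depends on your database class, which is not shown here.

```php
<?php
// Hypothetical helper: returns the SQL text and the values to bind to it.
function buildUrlQuery(array $search)
{
    if (count($search) == 0) {
        return FALSE; // nothing to look up
    }
    // One "?" placeholder per keyword, e.g. "?, ?, ?".
    $placeholders = implode(', ', array_fill(0, count($search), '?'));

    $sql = "SELECT urls.url, keywords.keyword, COUNT(*) AS counter "
         . "FROM urls, keywords "
         . "WHERE keywords.keyword IN ($placeholders) "
         . "AND keywords.url_id = urls.id "
         . "GROUP BY keywords.url_id, keywords.keyword "
         . "ORDER BY counter DESC";

    return array($sql, array_values($search));
}

// The apostrophe needs no escaping because the value is never
// placed into the SQL text itself.
list($sql, $params) = buildUrlQuery(array('search', "o'reilly"));
echo $sql, "\n";
```

With PDO, for example, and assuming $pdo is an open connection, the pair could be run with $stmt = $pdo->prepare($sql) followed by $stmt->execute($params).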
Source:
http://codewalkers.com/tutorials/46/26.html