Creating a Search Application

    _harvest
Here is where all the real work is being done. In this function we will collect the keywords for one URL and store them in the database. We start off by calling the _checkURL() function to determine the validity of the URL, and then get the source of the URL with the _getData() function.

Next we take the string that contains the source and split it into individual words and store them in an array. We can do this fairly easily with the preg_split() function. We will split the string at every occurrence of a white space character, a comma, or a period.
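
As a rough illustration of the splitting step, the sketch below runs the same pattern over a short sample string. The sample text is made up for the example, and the PREG_SPLIT_NO_EMPTY flag is an addition that drops the empty strings preg_split() would otherwise return when the text starts or ends with a delimiter.

```php
<?php
// Sample text standing in for the page source returned by _getData().
$data = "PHP is fun, and fast. Try it.\n";

// Split on runs of whitespace, commas, or periods (same pattern as the tutorial).
// PREG_SPLIT_NO_EMPTY (an addition here) discards empty leading/trailing pieces.
$words = preg_split('/[\s,.]+/', $data, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```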

Then, we will use the array_walk() function and have it call the _prune() function for each array element. You may notice that the array_walk() function call is a little different than you have seen it in the past. For the second parameter, we have to pass it an array that contains the $this pointer as the first element and the name of the function as the second element. This is needed because we are calling a class function.

After the array_walk() function completes its task, we then use the sort function on the array of words. We are not so concerned about actually sorting the array, but we want to force our numerical keys to be sequential. After the array_walk() function finishes, it is very likely that we will have gaps in our enumerated array. As an example, we could have keys 0, 1, and 2 and then it might skip to key 6. In order to renumber our numerical keys, we can simply run the array through the sort function.
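
Passing &$words at call time, as the listing below does, is a fatal error from PHP 5.4 onward. The sketch below shows one possible way to get the same effect on a current PHP version: array_filter() removes the unwanted words and array_values() renumbers the keys without sorting. The keep_word() function and the sample data are assumptions made for the example and only cover part of what _prune() checks.

```php
<?php
// Hypothetical stand-in for _prune(): returns true for words worth keeping.
function keep_word($word, array $stopwords)
{
    return strlen($word) >= 3 && !in_array($word, $stopwords, true);
}

$stopwords = array('and', 'but', 'are', 'the');
$words     = array('the', 'search', 'is', 'fuzzy', 'and', 'fast');

// array_filter() drops rejected words but preserves the original keys,
// so the result has gaps: keys 1, 3 and 5 survive here.
$kept = array_filter($words, function ($word) use ($stopwords) {
    return keep_word($word, $stopwords);
});

// array_values() renumbers the keys 0..n-1 without changing the order.
$renumbered = array_values($kept);

print_r(array_keys($kept));   // keys with gaps
print_r($renumbered);         // sequential keys
```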

The next step we need to take is to insert the URL into the urls table. We should first look to see if it already exists, and if it does delete the keywords associated with it in the keywords table. Doing this will allow us to refresh the information in our database from time to time. If we did not check for the existence of the URL, we could end up with the same URL indexed multiple times.

The final step of the _harvest() function is to insert the keywords into the keywords table. Because we will have a variable number of keywords and those keywords will be ever changing, we need to construct the SQL query dynamically.

We will accomplish this by using the count() function to determine how many words are in the $words array and then adding each word to a variable called $values. We will add the first value to the $values variable outside of the loop so that we can format the SQL query properly with commas in the right places. The $url_id used in the $values variable is taken from the id of the URL in the urls table.

<?php
function _harvest($url) {
    if(!$this->_checkURL($url)) {
        echo "URL is not valid ($url).<br />\n";
    } elseif ($data = $this->_getData($url)) {
        $words = preg_split ("/[\s,.]+/", $data);
        array_walk ($words, array($this, '_prune'), &$words);
        sort ($words);
        $url_id = $this->_db->getone("SELECT id FROM urls WHERE url='$url'");
        if($url_id) {
            $this->_db->query("DELETE FROM keywords WHERE url_id=$url_id");
        } else {
            $this->_db->query("INSERT INTO urls SET url='$url'");
            $url_id = mysql_insert_id();
        }
        $values = "($url_id, '$words[0]')";
        $numwords = count ($words);
        for ($i = 1; $i < $numwords; $i++) {
            $values .= ", ($url_id, '$words[$i]')";
        }
        $this->_db->query("INSERT INTO keywords VALUES $values");
    }
}
?>  
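
The comma handling described above can also be left to implode(). This is a minimal sketch, assuming the words have already been escaped for SQL as _prune() does; build_keyword_values() is a hypothetical helper, not part of the tutorial's class. It returns an empty string for an empty word list, a case the loop above does not guard against.

```php
<?php
// Hypothetical helper: turns a URL id and a list of (already escaped) words
// into the "(id, 'word'), (id, 'word')" list used after VALUES.
function build_keyword_values($url_id, array $words)
{
    $rows = array();
    foreach ($words as $word) {
        $rows[] = "(" . (int) $url_id . ", '" . $word . "')";
    }
    // implode() puts the separator only between rows, so there is no
    // special case for the first element.
    return implode(', ', $rows);
}

$values = build_keyword_values(7, array('search', 'fuzzy'));
if ($values !== '') {
    $sql = "INSERT INTO keywords VALUES $values";
    echo $sql, "\n";
}
```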

Source: http://codewalkers.com/tutorials/46/16.html

    process
This function iterates through an array of URLs and calls the _harvest() function for each. This is the function we will call from our scripts to access the class.

<?php
function process() {
    foreach($this->_urlarray as $url) {
        $this->_harvest($url);
    }
}
?>  


Source: http://codewalkers.com/tutorials/46/17.html

    Harvest_Keywords Class
Now, we will glue all the bits together so that you can see the class file in its entirety. At the beginning of the class file, we have used the require statement to include the database class from the earlier section.

<?php
require('dbclass.php');

class Harvest_Keywords {

    var $_db;
    var $_urlarray;
    var $_stopwords = array ('and', 'but', 'are', 'the');
    var $_allowwords = array ('c++', 'ado', 'vb');

    function Harvest_Keywords($urls) {
        $this->_db = new DB_Class('test', 'username', 'password');
        $this->_urlarray = trim ($urls);
        $this->_urlarray = explode ("\n", $this->_urlarray);
    }

    function _prune (&$item, $key, $array) {
        $item = strtolower ($item);
        if (((preg_match ("/[^a-z0-9'\?!-]/", $item))
           || (strlen ($item) < 3)
           || (in_array($item, $this->_stopwords)))
           && (!in_array($item, $this->_allowwords))) {

             unset($array[$key]);
        } else {
            $item = addslashes(preg_replace("/[^a-z0-9'-]/i",
                                            '', $item));
        }
    }

    function _checkURL($url) {
        return preg_match ("/http:\/\/(.*)\.(.*)/i", $url);
    }

    function _getData($url) {
        $filehandle = @fopen($url, 'r');
        if(!$filehandle) {
            echo "Could not open URL ($url).<br />\n";
            $return = FALSE;
        } else {
            $data = fread($filehandle, 25000);
            fclose($filehandle);
            $data = strip_tags ($data);
            $data = str_replace('&nbsp;', ' ', $data);
            $return = $data;
        }
        return $return;
    }

    function _harvest($url) {
        if(!$this->_checkURL($url)) {
            echo "URL is not valid ($url).<br />\n";
        } elseif ($data = $this->_getData($url)) {
            $words = preg_split ("/[\s,.]+/", $data);
            array_walk ($words, array($this, '_prune'), &$words);
            sort ($words);
            $url_id = $this->_db->getone("SELECT id FROM urls "
                                         . "WHERE url='$url'");
            if($url_id) {
                $this->_db->query("DELETE FROM keywords "
                                  . "WHERE url_id=$url_id");
            } else {
                $this->_db->query("INSERT INTO urls SET url='$url'");
                $url_id = mysql_insert_id();
            }
            $values = "($url_id, '$words[0]')";
            $numwords = count ($words);
            for ($i = 1; $i < $numwords; $i++) {
                $values .= ", ($url_id, '$words[$i]')";
            }
            $this->_db->query("INSERT INTO keywords VALUES $values");
        }
    }

    function process() {
        foreach($this->_urlarray as $url) {
            $this->_harvest($url);
        }
    }
}
?>  

Source: http://codewalkers.com/tutorials/46/18.html

    harvest.php
Now that we have created the harvesting class, we can code the harvest.php script. This is essentially just an HTML form that posts back to itself. Because we have placed all the code that does the actual keyword harvesting in a class, this script is clean and keeps the presentation separate from the logic.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Harvest Keywords</title>
</head>
<body>
<?php
require('harvestclass.php');
if(isset($_POST['submit'])) {
    $gather = new Harvest_Keywords($_POST['urls']);
    $gather->process();
    echo "Keywords Harvested.<br />\n";
}
?>
<h1>Harvest</h1>
<p>Enter URLs to harvest keywords from, each on its own line:</p>
<form method="POST" action="<?php echo $_SERVER['PHP_SELF']; ?>">
<p><textarea name="urls" cols=50 rows=10></textarea></p>
<p><input type="submit" name="submit" value="Submit"></p>
</form>
</body>
</html>  
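
Browsers normally submit textarea line breaks as a carriage return plus line feed, so splitting on "\n" alone can leave a trailing "\r" on each URL. The sketch below is one way to split the submitted text more defensively; split_url_lines() is a hypothetical helper and not part of the Harvest_Keywords class.

```php
<?php
// Hypothetical helper: split textarea input into clean, non-empty lines.
function split_url_lines($text)
{
    // \R matches any line-break sequence (\r\n, \n or \r).
    $lines = preg_split('/\R/', $text);
    // Trim stray whitespace around each line.
    $lines = array_map('trim', $lines);
    // Drop blank lines and renumber the keys.
    return array_values(array_filter($lines, 'strlen'));
}

$urls = split_url_lines("http://example.com/a\r\n\r\n  http://example.org/b \r\n");
print_r($urls);
```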

Source: http://codewalkers.com/tutorials/46/19.html

Searching
So, we now have a method to populate our tables with URLs and keywords. The next thing we need to accomplish is to create a search class. Our search class will provide methods for exact keyword searching and fuzzy searching. Let's start off with an overview of how the class will be built.

Class variables - As with the harvesting class, we will have some variables in this class to hold data that each function needs access to.
Constructor - The constructor of this search class will connect to the database and then place the search terms into a class variable as an array. It will also record the number of elements in that array in another class variable.
doSearch - This function provides the exact keyword matching search.
_highpercent - This is a private function that will be used along with an array_walk() function call.
_slashit - This private function will also be used for an array_walk() function call.
doFuzzy - In this function we will do a fuzzy search.

Source: http://codewalkers.com/tutorials/46/20.html

  Class Variables
Three private class variables will be declared. They will be used to store the database connection identifier, the search terms, and the number of search terms.

<?php
var $_db;
var $_searchterms;
var $_numterms;
?>  

Source: http://codewalkers.com/tutorials/46/21.html

  Constructor
The first task in the constructor is to connect to the database. We will, again, use the database class from earlier in this tutorial. Second, we will add slashes to the search terms provided to this function if the magic_quotes_gpc directive is not turned on. Then, we will separate the search terms and store them in a class variable as an array. If only one word was entered, we will end up with an array of only one element. Last, we will store the number of search terms in a class variable.

<?php
function Search($keywords) {
    $this->_db = new DB_Class('test', 'username', 'password');
    if (!get_magic_quotes_gpc()) {
        $keywords = addslashes ($keywords);
    }
    $this->_searchterms = explode(' ', $keywords);
    $this->_numterms = count($this->_searchterms);
}
?>  
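
Splitting with explode(' ', ...) yields empty terms when the input contains consecutive or leading spaces. The sketch below shows one way to tidy the terms before they are stored; parse_search_terms() is a hypothetical helper, and the lower-casing step is an assumption based on the harvester storing its keywords in lower case.

```php
<?php
// Hypothetical helper: split a raw search string into lower-case terms.
function parse_search_terms($keywords)
{
    // Split on any run of whitespace and discard empty pieces.
    $terms = preg_split('/\s+/', trim($keywords), -1, PREG_SPLIT_NO_EMPTY);
    // The harvester stores keywords in lower case, so match that here.
    return array_map('strtolower', $terms);
}

$terms = parse_search_terms("  Fuzzy   Search ");
print_r($terms);
```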

Source: http://codewalkers.com/tutorials/46/22.html

  doSearch
In this function we will do an exact keyword search. This search is relatively straightforward, as it only consists of a single SQL query. Because the number of search terms can change, we need to dynamically create the query. This query will result in URLs being returned along with the number of times those URLs contained the search terms. The results will be ordered from highest to lowest by the number of search term matches.

This function will return the data from the query, or FALSE if there were not any rows in the result set.

<?php
function doSearch() {
    $match = "keywords.keyword in ('"
           . $this->_searchterms[0] . "'";
    for ($i = 1; $i < $this->_numterms; $i++) {
        $match .= ", '" . $this->_searchterms[$i] . "'";
    }
    $match .= ")";
    $query = "SELECT urls.url, count(*) as counter "
           . "FROM urls, keywords "
           . "WHERE $match "
           . "AND keywords.url_id = urls.id "
           . "GROUP BY keywords.url_id "
           . "ORDER BY counter DESC";
    $result = $this->_db->fetch($query);
    if (count($result) > 0)
        $return = $result;
    else
        $return = FALSE;

    return $return;
}
?>  
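
One way to avoid concatenating user input into the IN (...) list is to generate one placeholder per term and hand the terms to the database driver separately. The sketch below only builds the SQL string and the parameter list; it assumes a driver with positional ? placeholders such as PDO, which the tutorial's DB_Class may not offer. build_exact_query() is a hypothetical name.

```php
<?php
// Hypothetical helper: returns array($sql, $params) for an exact keyword search.
function build_exact_query(array $terms)
{
    if (count($terms) === 0) {
        return array(null, array());   // nothing to search for
    }
    // One "?" per search term: "?, ?, ?"
    $placeholders = implode(', ', array_fill(0, count($terms), '?'));
    $sql = "SELECT urls.url, COUNT(*) AS counter "
         . "FROM urls, keywords "
         . "WHERE keywords.keyword IN ($placeholders) "
         . "AND keywords.url_id = urls.id "
         . "GROUP BY keywords.url_id "
         . "ORDER BY counter DESC";
    return array($sql, array_values($terms));
}

list($sql, $params) = build_exact_query(array('fuzzy', 'search'));
echo $sql, "\n";
print_r($params);
```

With PDO, the pair could then be passed to prepare() and execute($params).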

Source: http://codewalkers.com/tutorials/46/23.html

  _highpercent
This private function of the class will be used along with the array_walk() function in our fuzzy search. It will determine if an element of an array is less than 60. If it is, it will unset that element from the array.

<?php
function _highpercent(&$item, $key, $array) {
    if ($item < 60)
        unset($array["$key"]);
}
?>  
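
The same threshold can be applied without array_walk() by filtering on the score. This is a small sketch, assuming an array of keyword => percentage pairs like the one the fuzzy search builds; the sample values are invented.

```php
<?php
// Sample similarity scores (keyword => percentage); values are invented.
$scores = array('search' => 90.9, 'serve' => 60.0, 'sugar' => 36.4);

// Keep only entries scoring 60 or more; array_filter() preserves the keys.
$kept = array_filter($scores, function ($percent) {
    return $percent >= 60;
});

print_r($kept);
```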

Source: http://codewalkers.com/tutorials/46/24.html

  _slashit
Another private function that will be used with array_walk(), this one will add slashes to an element of an array.

<?php
function _slashit(&$item, $key) {
    $item = addslashes($item);
}
?>  

Source: http://codewalkers.com/tutorials/46/25.html

  doFuzzy
Now it is time to develop a way to do fuzzy searching. What we want to do is find words that are similar to the ones the user searched for and didn't exist in our keyword list. To make this happen, we will use the similar_text() function of PHP.

The similar_text() function provides very good results but at the price of performance. Running a very large number of keywords through similar_text() could make the application sluggish. One alternative is the levenshtein() function, which performs much better than similar_text(). The drawback is that levenshtein() produces results that are not quite as accurate as similar_text().

The similar_text() function takes the two words to compare and an optional third parameter, a variable passed by reference in which it stores the percentage of how similar they are. A similarity of 100% means the words are exactly the same, and it goes down from there. The levenshtein() function is called in much the same way, with one difference: it returns the edit distance between the two words rather than storing a percentage (its optional extra parameters set the cost of insertions, replacements, and deletions). In this case, a lower number means a closer match. If you choose to use the levenshtein() function, keep this in mind when sorting the array later in the script. You will need to sort it in normal order rather than in reverse.
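
The difference between the two functions is easiest to see side by side. A short sketch using two sample words:

```php
<?php
// similar_text() returns the number of matching characters and stores
// the similarity percentage in the optional third argument (by reference).
$common = similar_text('World', 'Word', $percent);

// levenshtein() returns the edit distance: the number of single-character
// insertions, replacements and deletions needed to turn one string into the other.
$distance = levenshtein('World', 'Word');

printf("similar_text: %d common characters, %.1f%% similar\n", $common, $percent);
printf("levenshtein:  distance %d\n", $distance);
```

Here a higher percentage and a lower distance both indicate a closer match, which is why the sort direction has to be flipped when switching functions.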

Now that we have covered a little bit of theory behind this function, let's take a look at how it will actually work.

Select all keywords from the keywords table that start with the same character as one of our search terms.
Build a list of matches for each search term
Merge the search term matches
Build query
Run query and display results

Select all keywords from table
The first thing we need to do is query the database and select all the keywords from the keywords table that start with the same character as one of our search terms. To match against only the first letter of the keyword, we will use the LEFT function of MySQL in our query. We will also use the SQL DISTINCT keyword to retrieve only unique keywords. The query will be built dynamically as we have done several other times. We will take the results from this query and store them in an array.

<?php
$match = "LEFT(keyword,1) in ('"
       . substr($this->_searchterms[0], 0, 1) . "'";
for ($i = 1; $i < $this->_numterms; $i++) {
    $match .= ", '" . substr($this->_searchterms[$i], 0, 1) . "'";
}
$match .= ")";
$query = "SELECT DISTINCT(keyword) FROM keywords WHERE $match";
$keywords = $this->_db->fetch($query);
?>  
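
When several search terms begin with the same letter, the loop above repeats that letter in the IN list, which is harmless but redundant. Below is a sketch of collecting the distinct first characters beforehand; first_letters() is a hypothetical helper and assumes single-byte text, as substr() does.

```php
<?php
// Hypothetical helper: distinct first characters of the search terms.
function first_letters(array $terms)
{
    $letters = array();
    foreach ($terms as $term) {
        if ($term !== '') {
            // Using the character as the key removes duplicates.
            $letters[substr($term, 0, 1)] = true;
        }
    }
    // Digit keys are turned into integers by PHP, so cast back to strings.
    return array_map('strval', array_keys($letters));
}

print_r(first_letters(array('search', 'simple', 'fuzzy')));
```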


Build a list of matches for each search term
Now, we need to take each search term and each keyword and let the similar_text() function determine how alike they are. We will use foreach loops to iterate through each search term and keyword. Before we begin, we will use the reset() function to return the array pointer back to the first element of the $_searchterms class variable.

Once we determine how similar each keyword is in relation to a search term, we use the array_walk() function to call the _highpercent() function for each element to remove any keywords that are found to be less than 60% similar.

<?php
reset($this->_searchterms);
foreach ($this->_searchterms as $term) {
    foreach ($keywords as $keyword) {
        $word = $keyword['keyword'];
        similar_text($term, $word, $matches["$term"]["$word"]);
    }
    array_walk ($matches["$term"],
                array($this, '_highpercent'),
                &$matches["$term"]);
}
?>  
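
The scoring step can also be written without unsetting elements from inside an array_walk() callback, by applying the 60% threshold while the scores are collected. In this sketch score_keywords() is a hypothetical function, and it takes a plain list of keyword strings rather than the row arrays returned by the tutorial's database class.

```php
<?php
// Hypothetical function: for each search term, score every keyword with
// similar_text() and keep those that are at least $min percent similar.
function score_keywords(array $terms, array $keywords, $min = 60)
{
    $matches = array();
    foreach ($terms as $term) {
        $matches[$term] = array();
        foreach ($keywords as $word) {
            similar_text($term, $word, $percent);
            if ($percent >= $min) {
                $matches[$term][$word] = $percent;
            }
        }
    }
    return $matches;
}

$matches = score_keywords(array('serch'), array('search', 'sugar'));
print_r($matches);
```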


Merge the search term matches
At this point, we will have an associative array named $matches that has an element for each search term, with that search term as the key. Each of these elements is also an associative array. What we now need to do is combine these inner associative arrays into one. To do this we will use the array_merge() function. Because we don't know how many search terms we are dealing with, we will also use the eval() function to accomplish this task.

The eval() function takes a string and evaluates it as PHP code. What we are going to do is build a string that contains a PHP statement, much as we did for the queries earlier. Once we build the string, we will pass it to the eval() function and let it evaluate it. Upon evaluation, the inner arrays will be merged and stored in the $merged variable.

If we only have one search term, we don't need to go through all this so we will just assign the contents of the only inner array to the variable $merged. Once we have the data in the $merged array we will sort it in reverse order, and maintain key relationships, with the arsort() function. Then we will extract the names of keys, as they are the similar keywords, and store them in an array called $search.

<?php
if($this->_numterms > 1) {
    $merge = '$merged = array_merge($matches["'
           . $this->_searchterms[0] . '"]';
    for ($i = 1; $i < $this->_numterms; $i++) {
        $merge .= ', $matches["'
                . $this->_searchterms[$i] . '"]';
    }
    $merge .= ');';
    eval ($merge);
} else {
    $merged = $matches[$this->_searchterms[0]];
}
arsort($merged);
$search = array_keys($merged);
?>  
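
The same merge can be written without eval() by looping over the inner arrays, which also avoids building PHP code from user-supplied search terms. The sketch below is an alternative rather than the tutorial's method: merge_matches() is a hypothetical function, and unlike array_merge() it keeps the highest score when a keyword matches more than one search term.

```php
<?php
// Hypothetical function: flatten term => (keyword => percent) into one
// keyword => percent array, keeping the best score for each keyword.
function merge_matches(array $matches)
{
    $merged = array();
    foreach ($matches as $scores) {
        foreach ($scores as $word => $percent) {
            if (!isset($merged[$word]) || $percent > $merged[$word]) {
                $merged[$word] = $percent;
            }
        }
    }
    // Highest similarity first, keeping the keyword keys.
    arsort($merged);
    return $merged;
}

// Invented sample scores for two search terms.
$matches = array(
    'serch' => array('search' => 90.9, 'serve' => 60.0),
    'engin' => array('engine' => 95.0, 'search' => 20.0),
);
$search = array_keys(merge_matches($matches));
print_r($search);
```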


Build query
Now, we have a list of similar keywords that exist in our keywords table. We must now run a query to determine which URLs they are associated with and how frequently they occur. Before we run the query, however, we will escape any special characters in the $search array by passing it through the array_walk() function and specifying the _slashit() function, which calls addslashes() on each array element.

Then, we will build the query as we have in the past. If we have rows in the result set, we will return them. Otherwise we will return FALSE.

<?php
array_walk($search, array($this, '_slashit'));
$match = "keywords.keyword in ('"
       . $search[0] . "'";
$numwords = count ($search);
for ($i = 1; $i < $numwords; $i++) {
    $match .= ", '" . $search[$i] . "'";
}
$match .= ")";
$query = "SELECT urls.url, keywords.keyword, count(*) as counter "
       . "FROM urls, keywords "
       . "WHERE $match "
       . "AND keywords.url_id = urls.id "
       . "GROUP BY keywords.url_id, keywords.keyword "
       . "ORDER BY counter DESC";
$result = $this->_db->fetch($query);
if (count($result) > 0)
    $return = $result;
else
    $return = FALSE;

return $return;
?>  

Source: http://codewalkers.com/tutorials/46/26.html

  Search Class
Now, we present the search class script as a whole.

<?php
require('dbclass.php');

class Search {

    var $_db;
    var $_searchterms;
    var $_numterms;

    function Search($keywords) {
        $this->_db = new DB_Class('test', 'username', 'password');
        if (!get_magic_quotes_gpc()) {
            $keywords = addslashes ($keywords);
        }
        $this->_searchterms = explode(' ', $keywords);
        $this->_numterms = count($this->_searchterms);
    }

    function doSearch() {
        $match = "keywords.keyword in ('"
               . $this->_searchterms[0] . "'";
        for ($i = 1; $i < $this->_numterms; $i++) {
            $match .= ", '" . $this->_searchterms[$i] . "'";
        }
        $match .= ")";
        $query = "SELECT urls.url, count(*) as counter "
               . "FROM urls, keywords "
               . "WHERE $match "
               . "AND keywords.url_id = urls.id "
               . "GROUP BY keywords.url_id "
               . "ORDER BY counter DESC";
        $result = $this->_db->fetch($query);
        if (count($result) > 0)
            $return = $result;
        else
            $return = FALSE;

        return $return;
    }

    function _highpercent(&$item, $key, $array) {
        if ($item < 60)
            unset($array["$key"]);
    }

    function _slashit(&$item, $key) {
        $item = addslashes($item);
    }

    function doFuzzy() {
        $match = "LEFT(keyword,1) in ('"
               . substr($this->_searchterms[0], 0, 1) . "'";
        for ($i = 1; $i < $this->_numterms; $i++) {
            $match .= ", '" . substr($this->_searchterms[$i], 0, 1)
                    . "'";
        }
        $match .= ")";
        $query = "SELECT DISTINCT(keyword) FROM keywords "
               . "WHERE $match";
        $keywords = $this->_db->fetch($query);
        reset($this->_searchterms);
        foreach ($this->_searchterms as $term) {
            foreach ($keywords as $keyword) {
                $word = $keyword['keyword'];
                similar_text($term, $word,
                $matches["$term"]["$word"]);
            }
            array_walk ($matches["$term"],
                        array($this, '_highpercent'),
                        &$matches["$term"]);
        }
        if($this->_numterms > 1) {
            $merge = '$merged = array_merge($matches["'
                   . $this->_searchterms[0] . '"]';
            for ($i = 1; $i < $this->_numterms; $i++) {
                $merge .= ', $matches["'
                        . $this->_searchterms[$i] . '"]';
            }
            $merge .= ');';
            eval ($merge);
        } else {
            $merged = $matches[$this->_searchterms[0]];
        }
        arsort($merged);
        $search = array_keys($merged);
        array_walk($search, array($this, '_slashit'));
        $match = "keywords.keyword in ('"
               . $search[0] . "'";
        $numwords = count ($search);
        for ($i = 1; $i < $numwords; $i++) {
            $match .= ", '" . $search[$i] . "'";
        }
        $match .= ")";
        $query = "SELECT urls.url, keywords.keyword, "
               . "count(*) as counter "
               . "FROM urls, keywords "
               . "WHERE $match "
               . "AND keywords.url_id = urls.id "
               . "GROUP BY keywords.url_id, keywords.keyword "
               . "ORDER BY counter DESC";
        $result = $this->_db->fetch($query);
        if (count($result) > 0)
            $return = $result;
        else
            $return = FALSE;

        return $return;
    }
}
?>  

Source: http://codewalkers.com/tutorials/46/27.html

  search.php
Now, we can create a script called search.php and actually do some searching. In this script, we will display a text box to allow for search terms to be entered. When the form is submitted, we will first do an exact keyword match. If that does not yield any results, we will perform a fuzzy search.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Search</title>
</head>
<body>
<h1>Search</h1>
<p>Enter keywords to search for:</p>
<form method="POST" action="<?php echo $_SERVER['PHP_SELF']; ?>">
<p><input type="text" name="search_term" size="20">
<input type="submit" name="submit" value="Submit"></p>
</form>
<?php

require('searchclass.php');

if (isset ($_POST['submit'])) {
    $search = new Search($_POST['search_term']);
    // Try an exact keyword match first.
    $results = $search->doSearch();
    if($results) {
        echo "<p><b>Your search results:</b></p>\n";
        echo "<p>";
        foreach($results as $row) {
            echo "<a href=\"{$row['url']}\">"
                 . "{$row['url']}</a><br />\n";
        }
        echo "</p>\n";
    } else {
        // No exact matches, so fall back to the fuzzy search.
        $results = $search->doFuzzy();
        echo "<p><b>No matches! ";
        if($results) {
            echo "These pages contain similar words "
                 . "to what you searched for:</b></p>\n";
            echo "<p><i>(Similar terms are in parentheses)</i></p>\n";
            echo "<p>";
            foreach($results as $row) {
                echo "<a href=\"{$row['url']}\">{$row['url']}"
                     . "</a>&nbsp;({$row['keyword']})<br />\n";
            }
            echo "</p>\n";
        } else {
            echo "No similar words either.</b></p>\n";
        }
    }
}
?>
</body>
</html>
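
Because the URLs and keywords come from harvested pages, it is safer to escape them before echoing them into HTML. The following is a minimal sketch; result_link() is a hypothetical helper that is not part of the tutorial's script.

```php
<?php
// Hypothetical helper: build one result line with HTML-escaped values.
function result_link($url, $keyword = null)
{
    $safe_url = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    $html = "<a href=\"$safe_url\">$safe_url</a>";
    if ($keyword !== null) {
        // Show the similar keyword in parentheses, as the fuzzy results do.
        $html .= '&nbsp;(' . htmlspecialchars($keyword, ENT_QUOTES, 'UTF-8') . ')';
    }
    return $html . "<br />\n";
}

echo result_link('http://example.com/?a=1&b=2', 'search');
```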

Source: http://codewalkers.com/tutorials/46/28.html
