Get top 100 words / keywords from a text with PHP

Submitted by Anonymous (not verified) on Wed, 12/02/2009 - 18:45

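This is useful if you want to generate keywords dynamically from content, or simply to rank the words of a plain-text or HTML document by how often they appear, while excluding very common words such as "the", "on", "and" and "to". You can pass a custom limit on the number of words returned and a custom, space-separated list of words to ignore. Words shorter than three bytes (three letters in plain ASCII text) are always skipped, and matching against the ignore list is case-sensitive.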

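<?php

function top_words($str, $limit = 100, $ignore = "") {

    // Default list of common words to skip (space-separated, case-sensitive).
    if (!$ignore) $ignore = "the of to and a in for is The that on said with be was by";

    $ignore_arr = explode(" ", $ignore);

    $str = trim($str);

    // Replace HTML entities such as &amp;, &nbsp; or &#8217; with a space.
    $str = preg_replace('#&\#?[a-z0-9]{2,8};#i', " ", $str);

    // Replace punctuation and special characters with a space.
    // The "u" modifier keeps multibyte UTF-8 characters (°, §, ´, umlauts) intact.
    $str = preg_replace('#[()°^!"§$%&/{}\[\]=?´`,;.:\-_\#\'~+*]#u', " ", $str);

    // Collapse runs of whitespace into a single space.
    $str = preg_replace('#\s+#u', " ", $str);

    $arraw = explode(" ", $str);

    $arr = array(); // word => number of occurrences

    foreach ($arraw as $v) {
        $v = trim($v);
        // Skip words shorter than 3 bytes and words on the ignore list.
        if (strlen($v) < 3 || in_array($v, $ignore_arr, true)) continue;

        if (!isset($arr[$v])) $arr[$v] = 0;
        $arr[$v]++;
    }

    // Sort by frequency, highest first, keeping the words as keys.
    arsort($arr);

    // preserve_keys = true: otherwise numeric words such as "2009" would be renumbered.
    return array_keys(array_slice($arr, 0, $limit, true));
}

// usage:
// $meta_keywords = implode(", ", top_words( strip_tags( $html_content ) ) );
?>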
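If you would rather count "PHP" and "php" as the same word, the following is one possible variant. It is a sketch, not the original author's code: the name `top_words_ci`, the array-based ignore list (expected in lower case) and the choice to decode entities with `html_entity_decode()` instead of deleting them are all assumptions. It lets `array_count_values()` do the counting that the loop above does by hand, and treats any run of Unicode letters or digits as a word.

```php
<?php
// Hypothetical variant of top_words(): case-insensitive and Unicode-aware.
// The ignore list is an array of lower-case words instead of a space-separated string.
function top_words_ci(string $str, int $limit = 100, array $ignore = ['the', 'of', 'to', 'and', 'a', 'in', 'for', 'is', 'that', 'on', 'said', 'with', 'be', 'was', 'by']): array
{
    // Decode entities (&amp; -> &) so they do not leave fragments behind.
    $str = html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Lower-case so "PHP" and "php" are counted together; mb_strtolower()
    // handles non-ASCII letters when the mbstring extension is loaded.
    $str = function_exists('mb_strtolower') ? mb_strtolower($str, 'UTF-8') : strtolower($str);

    // A "word" is a run of Unicode letters/digits; everything else separates words.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Keep words of 3+ characters (not bytes) that are not on the ignore list.
    $words = array_filter($words, function ($w) use ($ignore) {
        return preg_match_all('/./su', $w) >= 3 && !in_array($w, $ignore, true);
    });

    $counts = array_count_values($words); // word => occurrences
    arsort($counts);                      // most frequent first

    // preserve_keys = true keeps numeric words such as "2009" as keys;
    // strval() turns those integer keys back into strings.
    return array_map('strval', array_keys(array_slice($counts, 0, $limit, true)));
}
```

For meta keywords it is used the same way as the original, e.g. `$meta_keywords = implode(', ', top_words_ci(strip_tags($html_content), 20));`. Pass the result through `htmlspecialchars()` before writing it into a `<meta name="keywords">` attribute.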
