Get top 100 words / keywords from a text with PHP

Submitted by Anonymous (not verified) on Wed, 12/02/2009 - 18:45

This is useful if you want to generate keywords dynamically from content, or simply rank the words in a text or HTML document by how often they appear, while excluding very common words such as "the", "on", "and" and "to". You can set a custom limit on the number of words returned and supply your own space-separated list of words to ignore.

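<?php
function top_words($str, $limit=100, $ignore=""){

// Default stop words. Matching is case-sensitive, hence both "the" and "The".
if(!$ignore) $ignore = "the of to and a in for is The that on said with be was by";

$ignore_arr = explode(" ", $ignore);

$str = trim($str);
// Replace HTML entities such as &amp; or &#8217; with a space.
$str = preg_replace("/&#?[a-z0-9]{2,7};/i", " ", $str);
// Replace punctuation with a space.
$str = preg_replace("#[()°^!\"§\$%&/{(\[)\]=}?´`,;.:\-_\#'~+*]#", " ", $str);
// Collapse runs of whitespace into a single space.
$str = preg_replace("#\s+#", " ", $str);
$arraw = explode(" ", $str);

$arr = array();
foreach($arraw as $v){
$v = trim($v);
// Skip words shorter than 3 characters and words on the ignore list.
if(strlen($v)<3 || in_array($v, $ignore_arr)) continue;
// Start the counter at 1 on first sight to avoid an undefined-index notice.
$arr[$v] = isset($arr[$v]) ? $arr[$v] + 1 : 1;
}

// Sort by frequency, highest first.
arsort($arr);

// preserve_keys=true so that numeric words such as "2009" are not renumbered.
return array_keys( array_slice($arr, 0, $limit, true) );
}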

// usage:
// $meta_keywords = implode(", ", top_words( strip_tags( $html_content ) ) );
?>
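
The function above counts "The" and "the" as different words, which is why both appear in the default ignore list. As a rough sketch only, and not part of the original snippet, a case-insensitive variant could lowercase the text first. The name top_words_ci is hypothetical, and the sketch assumes plain ASCII text: any non-ASCII letter is treated as a word separator.

```php
<?php
// Hypothetical case-insensitive variant of top_words().
// Assumption: input is mostly ASCII; other bytes act as separators.
function top_words_ci($str, $limit = 100, $ignore = "")
{
    if (!$ignore) {
        // Same default stop words as the original, minus the capitalised "The".
        $ignore = "the of to and a in for is that on said with be was by";
    }
    $ignore_arr = explode(" ", strtolower($ignore));

    // Split on every run of characters that is not a letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($str), -1, PREG_SPLIT_NO_EMPTY);

    $counts = array();
    foreach ($words as $w) {
        // Skip short words and stop words, as the original does.
        if (strlen($w) < 3 || in_array($w, $ignore_arr)) {
            continue;
        }
        $counts[$w] = isset($counts[$w]) ? $counts[$w] + 1 : 1;
    }

    // Most frequent words first.
    arsort($counts);

    // preserve_keys = true keeps numeric words such as "2009" intact.
    return array_keys(array_slice($counts, 0, $limit, true));
}

// Example: limit the result to the two most frequent words.
$text = "The cat sat on the mat. The cat saw another cat; the mat was red.";
print_r(top_words_ci($text, 2));
```

The third argument works as in the original: pass a space-separated string such as "cat mat" to replace the default ignore list.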
