PHP Classes

The crawler appears to be broken

Recommend this page to a friend!

      GoogleCrawler  >  All threads  >  The crawler appears to be broken  >  (Un) Subscribe thread alerts  
Subject:The crawler appears to be broken
Summary:The crawler appears to be broken - no results are returned
Messages:3
Author:Ian Noble
Date:2011-08-25 13:35:06
Update:2011-08-28 18:51:36
 

  1. The crawler appears to be broken   Reply   Report abuse  
Picture of Ian Noble Ian Noble - 2011-08-25 13:35:06
The crawler appears to be broken - no results are returned.

The pages appear to be extracted from Google but not parsed correctly. Noticed this a couple of days ago.

  2. Re: The crawler appears to be broken   Reply   Report abuse  
Picture of Ian Noble Ian Noble - 2011-08-26 16:15:52 - In reply to message 1 from Ian Noble
Amending the code in GoogleCrawler.class.php appears to fix:

// echo '<pre>' . htmlspecialchars($this->content) . '</pre><br />';

// $pos = strpos($this->content, '<h2 class=r><a href="', $npos);
$pos = strpos($this->content, '<h2 class="r"><a class="l" href="', $npos);
while($pos != false)
{
$curr = array();
// $npos = strpos($this->content, '"', $pos + 22);
$npos = strpos($this->content, '"', $pos + 34);
if($npos == false)
break;
$curr['url'] = html_entity_decode(substr($this->content, $pos+33, $npos-$pos-33));

$pos = strpos($this->content, '>', $npos);
$npos = strpos($this->content, '</a>', $pos);
if($pos == false || $npos == false)
break;
$curr['title'] = substr($this->content, $pos+1, $npos-$pos-1);

// $pos = strpos($this->content, '<div class=std>', $npos);
$pos = strpos($this->content, '<div class="std"><span class="s">', $npos);
$npos = strpos($this->content, '</span>', $pos);
if($pos == false || $npos == false)
break;
$curr['description'] = substr($this->content, $pos+33, $npos-$pos-33);

// $pos = strpos($this->content, '<nobr><a class=fl href="', $npos);
// $tmppos=strpos($this->content, '<h2 class=r><a href="', $npos);
$pos = strpos($this->content, '<nobr><a href="', $npos);
$tmppos=strpos($this->content, '<h2 class="r"><a class="l" href="', $npos);
if($pos && (!$tmppos || $pos < $tmppos))
{
$npos = strpos($this->content, '"', $pos+15);
$curr['cache-url'] = html_entity_decode(substr($this->content, $pos+15, $npos-$pos-14));
}
else
$curr['cache-url'] = null;
$pos = $tmppos;
$this->results[] = $curr;
}


  3. Re: The crawler appears to be broken   Reply   Report abuse  
Picture of Jonathan Schmidt-Dominé Jonathan Schmidt-Dominé - 2011-08-28 18:51:36 - In reply to message 2 from Ian Noble
Noticed it too, but had no time in the last few days, will fix it now, thanks.