What is a Google crawler?
“Crawler” (sometimes also called a “robot” or “spider”) is a generic term for any program that automatically discovers and scans websites by following links from one webpage to another. Google’s main crawler is called Googlebot.
The following table lists the common Google crawlers used by various products and services at Google, the user agents you may see in your referrer logs, and how to specify each crawler in robots.txt, in robots meta tags, and in X-Robots-Tag HTTP directives:
- The user agent token is used in the User-agent: line in robots.txt to match a crawler type when writing crawl rules for your site. Some crawlers have more than one token, as shown in the table; you need to match only one crawler token for a rule to apply. This list is not complete but covers most of the crawlers you might see on your website.
- The full user agent string is a full description of the crawler; it appears in the HTTP request and in your web logs.
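As a minimal sketch of how a user agent token is used in a crawl rule (the token `Googlebot-Image` and the `/photos/` path are illustrative examples, not taken from this document's table):

```
# Rules for Google's image crawler only; "Googlebot-Image" is its user agent token.
User-agent: Googlebot-Image
Disallow: /photos/

# Rules for every other crawler.
User-agent: *
Allow: /
```

A crawler matches a group by its token, so a crawler with multiple tokens is covered as soon as any one of its tokens matches a `User-agent:` line.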
User agents in robots.txt
Where several user agents are recognized in the robots.txt file, Google follows the most specific one. If you want all of Google to be able to crawl your pages, you don’t need a robots.txt file at all. If you want to block or allow all of Google’s crawlers’ access to some of your content, you can do this by specifying Googlebot as the user agent.
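For example, a robots.txt file might contain both a general and a more specific group (the paths here are illustrative):

```
User-agent: Googlebot
Disallow: /private/

User-agent: Googlebot-News
Disallow: /private/
Disallow: /drafts/
```

When Googlebot-News crawls this site, it follows only the more specific `Googlebot-News` group, so both `Disallow` rules apply to it; Google's other crawlers follow the `Googlebot` group.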
Some pages use multiple robots meta tags to specify directives for different crawlers. In that case, Google uses the sum of the negative directives, and Googlebot follows both the noindex and nofollow directives. More detailed information about controlling how Google crawls and indexes your site is available in the Google Search Central documentation.
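A page with directives for different crawlers might look like this (a sketch of the pattern described above):

```
<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">
```

Here the first tag applies to all crawlers and the second only to Googlebot, so Googlebot ends up honoring both negative directives: noindex and nofollow.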
Controlling crawl speed
Each Google crawler accesses sites for a specific purpose and at its own rate. Google uses algorithms to determine the optimal crawl rate for each site. If a Google crawler is crawling your site too often, you can reduce the crawl rate.