Top Search Engines and Their Robot Names

A robots.txt file tells search engine bots which parts of a website they may or may not crawl. It can block an entire website or only specific directories and pages. If you don't want to hide any directory or page, you don't strictly need one. Many site owners still add a robots.txt file to discourage unwanted crawlers. Keep in mind that it is only a request: well-behaved bots follow it, but spam crawlers can simply ignore it. Each major search engine's crawler identifies itself with a specific robot name, and you can target that name in the file. You can generate a robots.txt file for your website with a robots.txt generator tool.

In a 2015 Google update, Google stated that tag, category and author pages appearing in search results could be treated as duplicate content. To keep crawlers away from those pages, you can use the following robots.txt rules:

User-agent: *
Disallow: /tag/
Disallow: /category/
Disallow: /author/
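You can check rules like these before uploading them with Python's standard-library `urllib.robotparser`. The minimal sketch below parses the rules above from a string and asks whether a crawler may fetch a few URLs. The `example.com` domain and the sample paths are hypothetical placeholders. The stdlib parser only approximates how each search engine evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The tag/category/author rules from the article, as robots.txt content.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/
Disallow: /author/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Hypothetical paths on a placeholder domain.
    for path in ["/tag/seo/", "/category/news/", "/author/admin/", "/blog/robots-txt-guide/"]:
        url = "http://www.example.com" + path
        # Prints e.g. "/tag/seo/ blocked" and "/blog/robots-txt-guide/ allowed".
        print(path, "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
```

Each `Disallow` value is matched as a path prefix. For that reason `/tag/seo/` is blocked and an ordinary post under `/blog/` is not.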
 
To block a specific page with robots.txt, add a rule like this:

User-agent: *
Disallow: /office-spce-in-india.php
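The following sketch shows that this rule works as a prefix match. It again uses Python's `urllib.robotparser` and hypothetical URLs on `example.com`. It checks the exact page, two URLs that start with the same path, and an unrelated page.

```python
from urllib.robotparser import RobotFileParser

# The single-page rule from the article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /office-spce-in-india.php
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs on a placeholder domain.
urls = [
    "http://www.example.com/office-spce-in-india.php",            # exact page
    "http://www.example.com/office-spce-in-india.php?city=pune",  # same prefix + query
    "http://www.example.com/office-spce-in-india.php.bak",        # same prefix
    "http://www.example.com/contact.php",                         # unrelated page
]
for url in urls:
    # True means the crawler may fetch the URL; False means it is blocked.
    print(url, rp.can_fetch("Googlebot", url))
```

Python's parser only does plain prefix matching. Google documents extra `*` and `$` wildcard support, so a rule ending in `$` may behave differently there than in this sketch.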

This blocks every URL whose path starts with /office-spce-in-india.php. You can also list your sitemap's URL in the robots.txt file, which points crawlers to the pages you want them to find.

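User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml

An empty Disallow line blocks nothing. The Sitemap line applies regardless of which User-agent group it follows.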
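As a quick sanity check, Python's `urllib.robotparser` can read the sitemap entry back from such a file. Its `site_maps()` method is available from Python 3.8 onward. The sketch below assumes the same placeholder `example.com` sitemap URL and confirms that an empty `Disallow` leaves the site open.

```python
from urllib.robotparser import RobotFileParser

# robots.txt that allows everything and declares a sitemap (placeholder URL).
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the list of Sitemap URLs, or None if there are none.
print(rp.site_maps())  # ['http://www.example.com/sitemap.xml']

# An empty Disallow value blocks nothing, so any page is allowed.
print(rp.can_fetch("Googlebot", "http://www.example.com/any-page.html"))  # True
```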

These are the major search engines and their robot (user-agent) names:

Search Engine - Robot Name
Google - Googlebot
Yahoo - Slurp
MSN Search - msnbot
Bing - bingbot
Yandex - YandexBot
Ask/Teoma - teoma
Baidu - baiduspider
Alexa/Wayback - ia_archiver
DMOZ Checker - robozilla
GigaBlast - gigabot
Cuil - twiceler
Naver - naverbot, yeti
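The robot names above are the values you put in a `User-agent` line to give a specific crawler its own rules. The sketch below uses a hypothetical policy, not a recommendation. It keeps Baiduspider out of a `/private/` directory, blocks ia_archiver entirely and allows every other crawler. It then checks the policy with Python's `urllib.robotparser` against a placeholder URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical per-bot policy; the "*" group applies to all other crawlers.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /private/

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

url = "http://www.example.com/private/report.html"  # placeholder URL
for bot in ["Googlebot", "Baiduspider", "ia_archiver", "YandexBot"]:
    # Expected: Googlebot True, Baiduspider False, ia_archiver False, YandexBot True
    print(bot, rp.can_fetch(bot, url))
```

Python's parser uses the first group whose name appears, case-insensitively, in the crawler's user-agent token, and otherwise falls back to the `*` group. Individual search engines document their own matching rules, so treat this as an approximation.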
