Should Have a robots.txt File
Your robots.txt file tells
search engine spiders which pages, files and directories they may crawl,
and which ones are off-limits. robots.txt is a plain-text file placed in your
root directory (i.e. the same place as the index page for www.example.com),
so your robots file would be located at www.example.com/robots.txt.
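Because the file always sits at the root of the host, its address can be
worked out from any page on the site. Here is a minimal sketch using
Python's standard urllib.parse module; the helper name robots_url and the
example page path are made up for illustration.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url."""
    # A leading slash makes urljoin resolve against the host root,
    # discarding whatever path the page itself had.
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://www.example.com/articles/index.html"))
# prints http://www.example.com/robots.txt
```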
If you do not have a robots.txt
file, looking through raw logs can become a bit of a hassle. When a
well-behaved bot visits your site, it will first request the robots.txt file.
If you do not have one, this results in a 404 "page not found"
error. If you search your logs for 404 errors (to find any pages
that you may have linked to incorrectly, or pages that have been removed
or moved), you will also have to wade through every one of the robots.txt
not-found errors.
Not having this file can also
result in skewed log analysis when you are determining percentages of
page requests and page errors.
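To see how much of your 404 noise comes from missing robots.txt requests,
you could separate them out when scanning a log. The sketch below assumes
your server writes Common Log Format lines; the function name split_404s
and the sample entries are invented for this example, so adjust the
pattern to whatever your own logs look like.

```python
import re
from collections import Counter

# Captures the request path and the status code from a Common Log Format line.
LOG_LINE = re.compile(r'"[A-Z]+ (\S+)[^"]*" (\d{3})')


def split_404s(lines):
    """Count 404s, keeping robots.txt misses apart from real broken links."""
    robots_misses = 0
    broken = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or match.group(2) != "404":
            continue
        path = match.group(1)
        # Ignore any query string when checking for the robots file.
        if path.split("?", 1)[0] == "/robots.txt":
            robots_misses += 1
        else:
            broken[path] += 1
    return robots_misses, broken


# Hypothetical sample entries, made up for this example.
sample = [
    '192.0.2.1 - - [10/Oct/2005:13:55:36 -0700] "GET /robots.txt HTTP/1.0" 404 512',
    '192.0.2.1 - - [10/Oct/2005:13:55:37 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2005:14:02:10 -0700] "GET /old-page.html HTTP/1.0" 404 512',
    '192.0.2.9 - - [10/Oct/2005:14:05:44 -0700] "GET /robots.txt HTTP/1.0" 404 512',
]

misses, broken = split_404s(sample)
print(misses)        # 2
print(dict(broken))  # {'/old-page.html': 1}
```

In this made-up sample, two of the three 404 errors are robots.txt misses,
which is the kind of skew described above.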
If you have a custom 404 page,
where visitors would get a "Sorry, this page may have been moved
on MySite.com..." message, every missing robots.txt request is served that
page too, which can result in excess bandwidth usage.
It might not be much, but if you pay by bandwidth used,
or find that you are constantly going over your bandwidth allowance,
every KB saved means less money you pay out.
Do keep in mind that you should
NEVER list any hidden directories in your robots.txt file. This file can
be viewed by anyone (meaning your competitors can go and see any hidden
directories you have listed). There are also some rogue bots that deliberately
try to view the pages and directories that the robots.txt file says are
off-limits. In these cases, you should disallow
robots using meta tags on the pages themselves instead.
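A robots meta tag goes in the head of the page itself, for example
<meta name="robots" content="noindex, nofollow">, so nothing about the
page appears in the public robots.txt file. As a rough illustration of how
a compliant spider might read that tag, here is a sketch built on Python's
standard html.parser module; the class name RobotsMetaParser and the sample
page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for item in attr_map.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


# Hypothetical page, made up for this example.
page = """<html><head>
<title>Members only</title>
<meta name="robots" content="noindex, nofollow">
</head><body>Private stuff</body></html>"""

parser = RobotsMetaParser()
parser.feed(page)
print(sorted(parser.directives))  # ['nofollow', 'noindex']
```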
You should always check that
your syntax is correct. You do not want to accidentally block all search
engine spiders from your site through a mistake in your file.
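One way to catch such a mistake before uploading the file is to run a few
of your important URLs through a parser and see which ones come back as
blocked. The sketch below uses Python's standard urllib.robotparser module;
the helper name blocked_urls, the two sets of rules and the page list are
all invented for this example.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls, user_agent: str = "*"):
    """Return the URLs that the given robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules: the first blocks one directory, the second
# blocks the entire site because of a bare "/".
intended = """User-agent: *
Disallow: /cgi-bin/
"""
mistake = """User-agent: *
Disallow: /
"""

# Hypothetical pages that should stay open to spiders.
pages = [
    "http://www.example.com/",
    "http://www.example.com/articles/robots.html",
]

print(blocked_urls(intended, pages))  # []
print(blocked_urls(mistake, pages))   # both pages are listed as blocked
```

An empty list means none of the pages you care about are shut out. Note
that a lone "Disallow: /" blocks everything, while an empty "Disallow:"
blocks nothing, so a single character makes all the difference.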
You can use our robots.txt
tutorial to create your own file, or you can use our free
robots.txt generator to have one created for you.