The proper way to use the robots.txt file
When optimizing a web site, most webmasters don’t consider
using the robots.txt file, yet it is a very important file for
your site. It lets spiders and crawlers know what they can
and cannot index. This is helpful for keeping them out of
folders that you do not want indexed, such as the admin or stats
folder, and away from content that they cannot index.
Here is a list of the fields that you can include in a robots.txt
file and their meanings:
1) User-agent: In this field you name the specific robot that the
access policy applies to, or use “*” for all robots. The examples
further down show both.
2) Disallow: In this field you specify the files and folders to
exclude from the crawl.
3) #: The number sign marks a comment.
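To make the three elements above concrete, here is a minimal Python sketch that reads robots.txt text, drops comments, and groups the Disallow paths under their User-agent lines. The function name parse_robots and the sample text are made up for illustration; this is a simplified reading of the format, not how any particular crawler works.

```python
def parse_robots(text):
    """Group Disallow paths under the User-agent lines they follow."""
    groups = []        # each group: {"agents": [...], "disallow": [...]}
    current = None
    seen_rule = False  # True once the current group has a Disallow line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # everything after "#" is a comment
        if ":" not in line:
            continue  # blank line, comment-only line, or something unknown
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # A User-agent line that follows a Disallow line starts a new group.
            if current is None or seen_rule:
                current = {"agents": [], "disallow": []}
                groups.append(current)
                seen_rule = False
            current["agents"].append(value)
        elif field == "disallow" and current is not None:
            seen_rule = True
            if value:  # an empty Disallow blocks nothing
                current["disallow"].append(value)
    return groups


# Hypothetical sample text using the folders mentioned in this article.
sample = """\
# keep crawlers out of the private folders
User-agent: *
Disallow: /admin/   # trailing comment
Disallow: /stats/
"""

print(parse_robots(sample))
```

The sample contains one group that applies to all robots and lists the two folders.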
Here are some examples of a robots.txt file for redball.com:
User-agent: *
Disallow:
The above would let all spiders index all content.
Here is another example:
User-agent: *
Disallow: /cgi-bin/
The above would block all spiders from indexing the cgi-bin
directory.
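A Disallow value is matched against the beginning of the requested path, which is why one line can cover a whole directory. The short Python sketch below shows that idea only; the helper is_blocked and the file names are hypothetical, and real crawlers apply more rules than this.

```python
def is_blocked(path, disallowed):
    """Return True if path starts with any non-empty Disallow value."""
    return any(rule and path.startswith(rule) for rule in disallowed)


rules = ["/cgi-bin/"]

print(is_blocked("/cgi-bin/form.pl", rules))  # True: inside the blocked directory
print(is_blocked("/index.html", rules))       # False: outside it
print(is_blocked("/cgi-binary.html", rules))  # False: the trailing slash matters
```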
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
In the above example, googlebot can index everything, while all
other spiders cannot index admin.php or the cgi-bin, admin, and
stats directories. Notice that you can block single files like
admin.php.
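If you want to check rules like these before uploading them, Python's standard urllib.robotparser module can parse the text and answer whether a given robot may fetch a URL. The sketch below is one way to do that, assuming the last example above; the helper allowed and the agent name SomeOtherBot are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The last example from this article, one directive per line.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Ask the parsed rules whether agent may fetch path on redball.com."""
    return robots.can_fetch(agent, "http://redball.com" + path)


print(allowed("googlebot", "/admin/"))        # True: googlebot has an empty Disallow
print(allowed("SomeOtherBot", "/admin/"))     # False: falls under the "*" group
print(allowed("SomeOtherBot", "/admin.php"))  # False: single file is blocked
print(allowed("SomeOtherBot", "/index.html")) # True: not listed anywhere
```

This only tells you how Python's parser reads the file. Individual spiders may interpret edge cases differently, so treat it as a sanity check.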
About the author:
Jimmy Whisenhunt is the owner of VIP Enterprises